You searched for subject: (en NATURAL LANGUAGE PROCESSING).
Showing records 1 – 30 of 132,897 total matches.

Anna University
1.
Balaji J.
Pattern based bootstrapping approaches for natural language processing of morphologically rich languages.
Degree: 2015, Anna University
URL: http://shodhganga.inflibnet.ac.in/handle/10603/35514
▼ This thesis attempts to tackle Natural Language Processing (NLP) tasks by exploiting the special characteristics of morphologically rich languages. In this thesis we use Tamil as an example to show how computational approaches to such morphologically rich languages need to be different. Our initial work used these special characteristics to build rule-based systems. However, as is the case with most rule-based systems, only the natural language sentences of a specific domain could be tackled. As a result of our experience in building the rule-based systems, we were able to identify the linguistic features that could be effectively used for the NLP processing of morphologically rich languages. In order to overcome the limitations of rule-based approaches, we next attempted to explore machine learning approaches. One of the common machine learning approaches used for languages such as English is supervised learning. Supervised approaches require a large, labour-intensive annotated and labelled corpus, which is not available for resource-scarce languages such as Tamil. Unsupervised approaches, on the other hand, take a long time to converge to a solution. We first attempted an unsupervised approach for semantic relation extraction. From our experience with the unsupervised approach, we found that the partially free word order characteristic of a morphologically rich language did not lend itself to fast convergence to a solution. In this context we decided that semi-supervised approaches that require a limited number of trained samples could be attempted.
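As a rough illustration of the semi-supervised direction mentioned at the end of the abstract, the sketch below runs a generic self-training (bootstrapping) loop: a classifier trained on a few labelled samples repeatedly labels an unlabelled pool and keeps only its most confident predictions. The toy data, the classifier and the confidence threshold are invented placeholders, not the pattern-based method developed in the thesis.

```python
# Generic self-training (bootstrapping) loop: train on a small labelled seed set,
# then repeatedly add the classifier's most confident predictions on unlabelled data.
# Toy two-cluster data, LogisticRegression and the 0.95 threshold are illustrative
# placeholders, not the pattern-based bootstrapping developed in the thesis.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, size=(110, 5))    # invented "positive" cluster
X_neg = rng.normal(loc=-2.0, size=(110, 5))    # invented "negative" cluster
X_train = np.vstack([X_pos[:10], X_neg[:10]])  # only 20 labelled seed examples
y_train = np.array([1] * 10 + [0] * 10)
X_pool = np.vstack([X_pos[10:], X_neg[10:]])   # 200 unlabelled examples

for _ in range(5):                              # a few bootstrapping rounds
    if len(X_pool) == 0:
        break
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    probs = clf.predict_proba(X_pool)
    confident = probs.max(axis=1) >= 0.95       # keep only high-confidence predictions
    if not confident.any():
        break
    X_train = np.vstack([X_train, X_pool[confident]])
    y_train = np.concatenate([y_train, probs[confident].argmax(axis=1)])
    X_pool = X_pool[~confident]                 # drop newly labelled items from the pool

print(len(X_train) - 20, "examples added by bootstrapping")
```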
References: pp. 284–303.
Advisors/Committee Members: Geetha T V.
Subjects/Keywords: Natural Language Processing

Pontifical Catholic University of Rio de Janeiro
2.
PEDRO HENRIQUE THOMPSON FURTADO.
[en] AUTOMATIC INTERPRETATION OF EQUIPMENT OPERATION
REPORTS.
Degree: 2017, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=30732
▼ [pt] The operational units of PETROBRAS's Exploration and Production (E&P) area use daily reports to record situations and events at Stationary Production Units (UEPs), the well-known offshore oil production platforms. One of these reports, the SITOP (Operational Situation of the Maritime Units), is a daily free-text document that presents numerical information (production indices, some flow rates, etc.) and, above all, textual information. The textual part, although unstructured, holds an extremely valuable database of the history of events in the production environment, such as valve breakages, failures in process equipment, the start and end of maintenance work, manoeuvres carried out, responsibilities, etc. The value of these data is high, but so is the cost of searching for information, since it demands the attention of company technicians to read an enormous quantity of documents. The objective of the present work is to develop a natural language processing model to identify named entities in the SITOP texts and to extract relations between these entities, described formally in a domain ontology applied to events in offshore oil and gas processing units. This yields a method for automatically structuring the information contained in these operational reports. The results obtained show that the methodology is useful for this case, although open to improvement on several fronts. Relation extraction achieves better results than entity identification, which can be explained by the difference in the number of classes between the two tasks. We also find that increasing the amount of data is one of the most important factors for improving the learning and the efficiency of the methodology as a whole.
[en] The operational units at the Exploration and Production (E&P) area at PETROBRAS make use of daily reports to register situations and events from their Stationary Production Units (SPUs), the well-known petroleum production platforms. One of these reports, called SITOP (the Portuguese acronym for Offshore Units Operational Situation), is a daily document in free-text format that presents numerical information and, mainly, textual information about the operational situation of the offshore units. The textual section, although unstructured, stores a valuable database of historical events in the production environment, such as valve breakages, failures in processing equipment, beginning and end of maintenance activities, actions executed, responsibilities, etc. The value of these data is high, but so is the cost of searching for relevant information, which consumes many hours of attention from technicians and engineers reading the large number of documents. The goal of this dissertation is to develop a model of natural language processing to recognize named entities and extract relations among them, described formally as a domain ontology applied to events in offshore oil and gas…
Advisors/Committee Members: HELIO CORTES VIEIRA LOPES.
Subjects/Keywords: [pt] GAS NATURAL; [en] NATURAL GAS; [pt] PETROLEO; [en] PETROLEUM; [pt] ONTOLOGIAS; [en] ONTOLOGIES; [pt] APRENDIZADO AUTOMATICO; [en] AUTOMATIC LEARNING; [pt] PROCESSAMENTO DE LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING
3.
Doval, Yerai.
Seeking robustness in a multilingual world: from pipelines to embeddings.
Degree: 2019, Universidad da Coruña
URL: http://hdl.handle.net/2183/24535
▼ [Abstract] In this dissertation, we study two approaches to overcome the challenges posed by processing user-generated non-standard multilingual text content as it is found on the Web nowadays.
Firstly, we present a traditional discrete pipeline approach where we preprocess the input text so that it can be more easily handled later by other systems. This implies dealing first with the multilinguality concern by identifying the language of the input and, next, managing the language-specific non-standard writing phenomena involved by means of text normalization and word (re-)segmentation techniques.
Secondly, we analyze the inherent limitations of this type of discrete models, taking us to an approach centred on the use of continuous word embedding models. In this case, the explicit preprocessing of the input is replaced by the encoding of the linguistic characteristics and other nuances of non-standard texts in the embedding space. We aim to obtain continuous models that not only overcome the limitations of discrete models but also align with the current state of the art in Natural Language Processing (NLP), dominated by systems based on neural networks.
The results obtained after extensive experimentation showcase the capabilities of word embeddings to effectively support the multilingual and non-standard phenomena of user-generated texts. Furthermore, all this is accomplished within a conceptually simple and modular framework which does not sacrifice system integration. Such embedding models can be readily used as a fundamental building block for state-of-the-art neural networks which are, in turn, used in virtually any NLP task.
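As a minimal illustration of why subword information helps with the non-standard spellings discussed above, the sketch below compares words by the overlap of their character trigrams. The example words and the Jaccard measure are illustrative choices only, not the embedding models trained in the dissertation.

```python
# Character-trigram Jaccard similarity: a toy stand-in for the subword-aware
# embeddings discussed above; robust to the kind of spelling variation found
# in user-generated text. Example words are made up for illustration.
def char_ngrams(word, n=3):
    padded = f"<{word}>"                       # boundary markers, as in FastText-style models
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b):
    na, nb = char_ngrams(a), char_ngrams(b)
    return len(na & nb) / len(na | nb)         # Jaccard overlap of trigram sets

print(similarity("tomorrow", "tmrrow"))        # non-standard spelling still overlaps
print(similarity("tomorrow", "yesterday"))     # unrelated word: much lower overlap
```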
Advisors/Committee Members: Vilares, Jesús (advisor), Vilares, Manuel (advisor).
Subjects/Keywords: Procesamiento en lenguaje natural (Informática); Ensamblado de palabras; Natural language processing; Word segmentation
4.
Alkhazi, Ibrahim.
Compression-based parts-of-speech tagger for the Arabic language.
Degree: PhD, 2019, Bangor University
URL: https://research.bangor.ac.uk/portal/en/theses/compressionbased-partsofspeech-tagger-for-the-arabic-language(076552a6-32ee-41ff-9255-7abc6489c010).html ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.801316
▼ The Arabic language is a morphologically complex language that causes various difficulties for NLP systems such as POS tagging. The motivation of this research is to investigate the development and training of a compression-based Arabic POS tagger using the PPM algorithm. Adopting the algorithm for Arabic POS tagging may increase efficiency and reduce the ambiguity problem of the Arabic language. The best text compression algorithms can be applied to NLP tasks, often with state-of-the-art results. This research examines the use of tag-based compression of larger Arabic resources to re-evaluate the performance of tag-based compression, which may reveal POS-related linguistic aspects of the Arabic language. We also found that tag-based text compression of Arabic text can be used as a means of evaluating the performance and quality of Arabic POS taggers. The results of the experiments show that tag-based compression of the text can effectively be used for assessing the performance of Arabic POS taggers when they are used to tag different types of Arabic text, and also as a means of comparing the performance of two Arabic POS taggers on the same text.
With the rapid growth of Arabic text on the Web, studies that address the problems of classification and segmentation of the Arabic language are limited compared to other languages, and most of them implement word-based and feature extraction algorithms. This research adopts a PPM character-based compression scheme to classify and segment Classical Arabic (CA) and Modern Standard Arabic (MSA) texts. An initial experiment using the PPM classification method on samples of text resulted in an accuracy of 95.5%, an average precision of 0.958, an average recall of 0.955 and an average F-measure of 0.954, using the concept of minimum cross-entropy. Segmenting the CA and MSA text using the PPM compression algorithm obtained an accuracy of 86%, an average precision of 0.869, an average recall of 0.86 and an average F-measure of 0.859.
This research also describes the creation of the new Bangor Arabic Annotated Corpus (BAAC), a Modern Standard Arabic (MSA) corpus that comprises 50K words manually annotated with parts of speech. To evaluate the quality of the corpus, the Kappa coefficient and a direct percent agreement for each tag were calculated; a Kappa value of 0.956 was obtained, with an average observed agreement of 94.25%. The corpus was used to evaluate the widely used Madamira Arabic POS tagger and to further investigate compression models for text compressed using POS tags. A new annotation tool was also developed and employed for the annotation process of the BAAC.
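The classification idea summarised above, assigning a document to the class whose compression model encodes it most cheaply (minimum cross-entropy), can be sketched as follows. The thesis uses PPM; the sketch substitutes zlib only because it ships with Python, and the training snippets are invented placeholders.

```python
# Compression-based classification by (approximate) minimum cross-entropy:
# the class whose training text compresses a document most cheaply wins.
# zlib is a convenient stand-in for the PPM models used in the thesis, and the
# training snippets are invented placeholders.
import zlib

def extra_bytes(training_text: str, document: str) -> int:
    base = len(zlib.compress(training_text.encode("utf-8"), 9))
    both = len(zlib.compress((training_text + " " + document).encode("utf-8"), 9))
    return both - base            # cost of encoding the document given the class model

training = {
    "classical": "thou art the ancient scripture of old verily spoken",
    "modern":    "the government announced new economic measures today",
}

def classify(document: str) -> str:
    return min(training, key=lambda label: extra_bytes(training[label], document))

# the "modern" snippet shares more substrings with this query, so it should win
print(classify("the ministry announced measures"))
```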
Subjects/Keywords: Language modelling; natural language processing

Pontifical Catholic University of Rio de Janeiro
5.
CARLOS EDUARDO MEGER CRESTANA.
[en] A TOKEN CLASSIFICATION APPROACH TO DEPENDENCY
PARSING.
Degree: 2010, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=16458
▼ [pt] One of the most important tasks in Natural Language Processing is syntactic parsing, in which the structure of a sentence is determined according to a given grammar, conveying the meaning of a sentence from the meaning of the words it contains. Dependency-grammar-based parsing consists of identifying, for each word, the other word in the sentence that governs it. The output of a dependency parser is thus a tree whose nodes are the words of the sentence. This simple yet rich structure is used in a wide variety of applications, among them Question Answering, Machine Translation, Information Extraction, and Semantic Role Labeling. State-of-the-art dependency parsing systems use transition-based or graph-based models. This dissertation presents a token-by-token classification approach to dependency parsing, creating a special set of classes that allow the correct identification of a word's head in the sentence. Using this set of classes, any classification algorithm can be trained to correctly identify the governing word of each word in the sentence. Moreover, this set of classes makes it possible to treat projective and non-projective dependency relations alike, avoiding pseudo-projective approaches. To evaluate its effectiveness, we apply the Entropy Guided Transformation Learning algorithm to the corpora made publicly available for the CoNLL 2006 shared task. These experiments were carried out on three corpora in different languages: Danish, Dutch, and Portuguese. Performance was evaluated using the Unlabeled Attachment Score metric. Our results show that the resulting models achieve results above the average of the CoNLL systems. Furthermore, our results indicate that token-by-token classification is a promising approach to the dependency parsing problem.
[en] One of the most important tasks in Natural Language Processing is syntactic parsing, where the structure of a sentence is inferred according to a given grammar. Syntactic parsing, thus, tells us how to determine the meaning of the sentence from the meaning of the words in it. Syntactic parsing based on dependency grammars is called dependency parsing. The dependency-based syntactic parsing task consists in identifying a head word for each word in an input sentence. Hence, its output is a rooted tree, where the nodes are the words in the sentence. This simple, yet powerful, structure is used in a great variety of applications, like Question Answering, Machine Translation, Information Extraction and Semantic Role Labeling. State-of-the-art dependency parsing systems use transition-based or graph-based models. This dissertation presents a token classification approach to dependency parsing, by creating a special tagging set that helps to correctly find the head of a token. Using this tagging style, any…
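The record does not spell out the special tag set, but the general trick of reducing dependency parsing to per-token classification can be illustrated by encoding each token's head as a label. The relative-offset scheme below is our own illustrative choice, not necessarily the class set defined in the dissertation.

```python
# Encoding a dependency tree as one class label per token, so that any ordinary
# classifier/tagger can predict heads. The relative-offset scheme below is an
# illustrative choice, not necessarily the tag set defined in the dissertation.
sentence = ["John", "saw", "a", "dog"]
heads    = [2, 0, 4, 2]          # 1-based head index per token, 0 = root ("saw")

def encode(heads):
    # label each token with the signed distance to its head, or "ROOT"
    return ["ROOT" if h == 0 else f"{h - (i + 1):+d}" for i, h in enumerate(heads)]

def decode(labels):
    # recover 1-based head indices from the labels
    return [0 if lab == "ROOT" else i + 1 + int(lab) for i, lab in enumerate(labels)]

labels = encode(heads)
print(list(zip(sentence, labels)))   # [('John', '+1'), ('saw', 'ROOT'), ('a', '+1'), ('dog', '-2')]
assert decode(labels) == heads
```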
Advisors/Committee Members: RUY LUIZ MILIDIU.
Subjects/Keywords: [pt] APRENDIZAGEM; [en] LEARNING; [pt] PROCESSAMENTO DA LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING; [pt] CLASSIFICACAO TOKEN-A-TOKEN

Pontifical Catholic University of Rio de Janeiro
6.
ERALDO LUIS REZENDE FERNANDES.
[en] ENTROPY GUIDED FEATURE GENERATION FOR STRUCTURE
LEARNING.
Degree: 2014, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23812
▼ [pt] Structure learning consists of learning a mapping from input variables to structured outputs from examples of input-output pairs. Several important problems can be modelled in this way. Natural language processing provides many tasks that can be formulated and solved through structure learning. For example, dependency parsing involves recognizing a tree implicit in a sentence. Feature generation is an important subtask of structure learning. Usually, this subtask is performed by an expert who builds complex and discriminative feature templates by combining the basic features available in the input. This is a limited and expensive way to generate features and is recognized as a modelling bottleneck. In this work, we propose an automatic feature generation method for structure learning problems. This method is entropy guided, since it is based on the conditional entropy of local output variables given the basic input features. We experimentally compare the proposed method with two alternative feature generation methods: manual generation and polynomial kernel methods. Our results show that the entropy-guided feature generation method is superior to both alternatives in different respects. Our method is much cheaper than the manual method and computationally faster than the kernel-based method. Additionally, it allows its generalization power to be controlled more easily than kernel methods. We evaluate our method on nine datasets involving five computational linguistics tasks and four languages. The resulting systems achieve results comparable to the current best systems and, in particular for part-of-speech tagging, chunking, quotation extraction and coreference resolution, obtain the best known results for different languages such as Arabic, Chinese, English and Portuguese. Additionally, our coreference resolution system took first place in the Conference on Computational Natural Language Learning 2012 Shared Task. The winning system was determined by the average performance across three languages: Arabic, Chinese and English. Our system obtained the best performance in all three languages evaluated. Our feature generation method naturally extends the structure learning framework and is not restricted to natural language processing tasks.
[en] Structure learning consists in learning a mapping
from inputs to structured outputs by means of a sample of correct
input-output pairs. Many important problems fit into this setting.
Natural language processing provides several tasks that can be
formulated and solved as structure learning problems. Dependency
parsing, for instance, involves the prediction of a tree underlying
a sentence. Feature generation is an important subtask of structure
learning which, usually, is partially solved by a…
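A rough sketch of the entropy-guided idea, ranking basic features by the conditional entropy of the output label given each feature, is shown below. The toy dataset and the restriction to single features (rather than the full feature templates built in the thesis) are simplifications of ours.

```python
# Rank basic features by H(label | feature): lower conditional entropy means the
# feature is more informative for the local output variable. Toy data and the
# single-feature ranking are simplifications; the thesis builds feature *templates*.
import math
from collections import Counter, defaultdict

# each example: (basic feature values, output label) -- invented toy data
examples = [
    ({"pos": "DET", "cap": False}, "B-NP"),
    ({"pos": "NOUN", "cap": False}, "I-NP"),
    ({"pos": "VERB", "cap": False}, "O"),
    ({"pos": "DET", "cap": True}, "B-NP"),
    ({"pos": "NOUN", "cap": True}, "I-NP"),
]

def conditional_entropy(feature):
    by_value = defaultdict(Counter)
    for feats, label in examples:
        by_value[feats[feature]][label] += 1
    total = len(examples)
    h = 0.0
    for counts in by_value.values():
        n = sum(counts.values())
        h_value = -sum((c / n) * math.log2(c / n) for c in counts.values())
        h += (n / total) * h_value          # weight each feature value by its probability
    return h

for feat in ["pos", "cap"]:
    print(feat, round(conditional_entropy(feat), 3))   # "pos" scores lower, i.e. more informative
```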
Advisors/Committee Members: RUY LUIZ MILIDIU.
Subjects/Keywords: [pt] ENTROPIA; [en] ENTROPY; [pt] APRENDIZADO DE ESTRUTURAS; [pt] GERACAO DE ATRIBUTOS; [pt] PROCESSAMENTO DE LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING
7.
Konstas, Ioannis.
Joint models for concept-to-text generation.
Degree: PhD, 2014, University of Edinburgh
URL: http://hdl.handle.net/1842/8926
▼ Much of the data found on the world wide web is in numeric, tabular, or other non-textual format (e.g., weather forecast tables, stock market charts, live sensor feeds), and thus inaccessible to non-experts or laypersons. However, most conventional search engines and natural language processing tools (e.g., summarisers) can only handle textual input. As a result, data in non-textual form remains largely inaccessible. Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input, and holds promise for rendering non-linguistic data widely accessible. Several successful generation systems have been produced in the past twenty years. They mostly rely on human-crafted rules or expert-driven grammars, implement a pipeline architecture, and usually operate in a single domain.
In this thesis, we present several novel statistical models that take as input a set of database records and generate a description of them in natural language text. Our unique idea is to combine the processes of structuring a document (document planning), deciding what to say (content selection) and choosing the specific words and syntactic constructs specifying how to say it (lexicalisation and surface realisation) in a uniform joint manner. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). This joint representation allows individual processes (i.e., document planning, content selection, and surface realisation) to communicate and influence each other naturally. We recast generation as the task of finding the best derivation tree for a set of input database records and our grammar, and describe several algorithms for decoding in this framework that allow us to intersect the grammar with additional information capturing fluency and syntactic well-formedness constraints. We implement our generators using the hypergraph framework. Contrary to traditional systems, we learn all the necessary document, structural and linguistic knowledge from unannotated data. Additionally, we explore a discriminative reranking approach on the hypergraph representation of our model, by including more refined content selection features.
Central to our approach is the idea of porting our models to various domains; we experimented on four widely different domains, namely sportscasting, weather forecast generation, booking flights, and troubleshooting guides. The performance of our systems is competitive and often superior compared to state-of-the-art systems that use domain-specific constraints, explicit feature engineering or labelled data.
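To make the grammar-based generation idea slightly more concrete, here is a toy weighted grammar that expands a start symbol into a short forecast-style sentence while filling terminals from a database record. The rules, weights and record fields are invented for illustration; the thesis instead learns its grammar from data and searches for the best derivation rather than sampling.

```python
# Toy probabilistic grammar for record-to-text generation (illustrative only;
# not the grammars induced or the decoding used in the thesis).
import random

RULES = {
    "S":    [(["TEMP", ",", "SKY", "."], 0.6), (["SKY", ",", "TEMP", "."], 0.4)],
    "TEMP": [(["temperatures between", "<min>", "and", "<max>", "degrees"], 1.0)],
    "SKY":  [(["mostly", "<sky>", "skies"], 0.7), (["<sky>", "conditions"], 0.3)],
}

def generate(symbol, record):
    if symbol.startswith("<"):                 # terminal slot filled from the database record
        return [str(record[symbol[1:-1]])]
    if symbol not in RULES:                    # literal terminal word(s)
        return [symbol]
    expansions, weights = zip(*RULES[symbol])
    rhs = random.choices(expansions, weights=weights)[0]
    out = []
    for sym in rhs:
        out.extend(generate(sym, record))
    return out

record = {"min": 5, "max": 12, "sky": "cloudy"}   # an invented weather record
print(" ".join(generate("S", record)))
```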
Subjects/Keywords: 006.3; natural language generation; natural language processing

Delft University of Technology
8.
Wang, Zina (author).
Unsupervised and Supervised Learning of Complex Relation Instances Extraction in Natural Language.
Degree: 2020, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:2a448e6d-e1e8-4b01-a97e-dcf2e9074560
▼ Relation extraction has been considered one of the most popular topics nowadays, thanks to its common application in knowledge graphs, machine reading and other artificial intelligence sub-fields. However, this field has long suffered from data hunger: annotating large high-quality datasets for relation extraction is troublesome and time-consuming. This thesis project mainly focuses on efficient ways of annotating text datasets for extracting complex relations between entities. Moreover, we put some effort into comparing the influence of different components in the pipeline. The main contributions of this project are the comparisons and analysis regarding the influence of components that are in place for the majority of relation extraction models, and a clear literature review together with a summary of available datasets in the relation extraction flow.
Advisors/Committee Members: Verwer, S.E. (mentor), Delft University of Technology (degree granting institution).
Subjects/Keywords: Natural Language Understanding; Natural Language Processing

Ryerson University
9.
Krezolek, Michal Andrzej.
Natural Language Learning.
Degree: 2010, Ryerson University
URL: https://digital.library.ryerson.ca/islandora/object/RULA%3A635
▼ This thesis is a small step towards automated learning of natural languages. With the use of a parser that incorporates machine-learning algorithms, our algorithm is able to learn meanings of words representing relations in simple sentences that describe relative positions of two points on a 2D plane. Our SentenceLearner program can create simple sentences describing relations between two points on another 2D plane using data collected by a statistical parser from sentences given for training, based on n-grams of five words. In this thesis I show that association of simple relations expressed in training sentences with the positional relations of a corresponding pair of points on a 2D plane is possible without the use of any machine-learning algorithm in some circumstances.
Advisors/Committee Members: Ryerson University (Degree grantor).
Subjects/Keywords: Natural language processing; Machine learning
10.
Irvine, Ann.
Using Comparable Corpora to Augment Statistical Machine Translation Models in Low Resource Settings.
Degree: 2014, Johns Hopkins University
URL: http://jhir.library.jhu.edu/handle/1774.2/38018
▼ Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, or pairs of translated sentences. In this thesis, we directly incorporate comparable corpora into the estimation of end-to-end SMT models. In contrast to parallel corpora, comparable corpora are pairs of monolingual corpora that have some cross-lingual similarities, for example topic or publication date, but that do not necessarily contain any direct translations. Comparable corpora are more readily available in large quantities than parallel corpora, which require significant human effort to compile. We use comparable corpora to estimate machine translation model parameters and show that doing so improves performance in settings where a limited amount of parallel data is available for training. The major contributions of this thesis are the following:
* We release ‘language packs’ for 151 human languages, which include bilingual dictionaries, comparable corpora of Wikipedia document pairs, comparable corpora of time-stamped news text that we harvested from the web, and, for non-roman script languages, dictionaries of name pairs, which are likely to be transliterations.
* We present a novel technique for using a small number of example word translations to learn a supervised model for bilingual lexicon induction which takes advantage of a wide variety of signals of translation equivalence that can be estimated over comparable corpora.
* We show that using comparable corpora to induce new translations and estimate new phrase table feature functions improves end-to-end statistical machine translation performance for low resource language pairs as well as domains.
* We present a novel algorithm for composing multiword phrase translations from multiple unigram translations and then use comparable corpora to prune the large space of hypothesis translations. We show that these induced phrase translations improve machine translation performance beyond that of component unigrams.
This thesis focuses on critical low resource machine translation settings, where insufficient parallel corpora exist for training statistical models. We experiment with both low resource language pairs and low resource domains of text. We present results from our novel error analysis methodology, which show that most translation errors in low resource settings are due to unseen source language words and phrases and unseen target language translations. We also find room for fixing errors due to how different translations are weighted, or scored, in the models. We target both error types; we use comparable corpora to induce new word and phrase translations and estimate novel translation feature scores. Our experiments show that augmenting baseline SMT systems with new translations and features estimated over comparable corpora improves translation performance significantly. Additionally, our techniques expand the applicability of statistical machine translation to those language pairs for which zero parallel text is available.
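The bilingual lexicon induction contribution, scoring candidate translations by combining several signals of translation equivalence estimated over comparable corpora, might be sketched as follows. The two signals, their equal weights and the toy candidates are stand-ins, not the feature set or supervised model learned in the thesis.

```python
# Toy bilingual lexicon induction: score candidate translations by combining two
# signals that can be estimated from comparable corpora. The signals, weights and
# candidates are illustrative stand-ins for the richer learned model in the thesis.
import difflib

def orthographic_similarity(src, tgt):
    # edit-distance-style similarity; useful for cognates and transliterations
    return difflib.SequenceMatcher(None, src, tgt).ratio()

def temporal_similarity(src_counts, tgt_counts):
    # cosine similarity of date-bucketed frequency profiles from time-stamped news
    dot = sum(a * b for a, b in zip(src_counts, tgt_counts))
    norm = (sum(a * a for a in src_counts) ** 0.5) * (sum(b * b for b in tgt_counts) ** 0.5)
    return dot / norm if norm else 0.0

def score(src, tgt, src_profile, tgt_profile, w_orth=0.5, w_temp=0.5):
    return (w_orth * orthographic_similarity(src, tgt)
            + w_temp * temporal_similarity(src_profile, tgt_profile))

# invented example: Spanish 'problema' with two English candidate translations
src_profile = [3, 0, 5, 1]                       # mentions per week in source-language news
candidates = {"problem": [2, 0, 4, 1], "table": [0, 4, 0, 3]}
for tgt, tgt_profile in candidates.items():
    print(tgt, round(score("problema", tgt, src_profile, tgt_profile), 3))
```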
Advisors/Committee Members: Callison-Burch, Chris (advisor).
Subjects/Keywords: machine translation; natural language processing

University of Georgia
11.
Tarkhadkar, Sagar Santosh.
Mutaimpact miner.
Degree: 2014, University of Georgia
URL: http://hdl.handle.net/10724/30031
▼ Manual curation of knowledge from biomedical literature is both expensive and time consuming. Scientific publications in biomedicine have an enormous amount of valuable information on gene mutations and their impacts, which is significant in
addressing multiple research problems. In this thesis, we have developed a text mining system for extracting and curating mutation impacts from full text scientific documents. The objective of this system is to populate biomedical knowledge-bases with
accurate knowledge regarding mutation impacts, in a semi-automated way. We have used a number of Natural Language Processing tasks in developing this system. Furthermore, a curation module allows the scientists to decide if the mutation impact
information is suitable to be included to the knowledge base, hence eliminating the possibility of adding incorrect data. Our prototype system has been used in the Protein Kinase domain, but can be adapted to work in other domains, in the
future.
Subjects/Keywords: text mining; Natural Language Processing

Vanderbilt University
12.
Davis, Mary Feller.
Determining the Use of Electronic Medical Records in Genetic Studies of Multiple Sclerosis.
Degree: PhD, Human Genetics, 2013, Vanderbilt University
URL: http://hdl.handle.net/1803/14936
▼ The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time-consuming. Much is known about the genetic risk of acquiring MS, but little is understood about the effect of genetics on the clinical course. This work uses
natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and key clinical traits of disease course. 5,789 individuals with MS were identified by algorithm. Algorithms were also developed with high precision and specificity to extract detailed features of the clinical course of MS, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale scores, timed 25-foot walk scores, and MS medications. DNA was available for 1,221 individuals through BioVU. These samples and 2,587 control samples were genotyped on the ImmunoChip. After extensive sample and SNP quality control, replication of known MS risk loci confirmed that the genetic architecture of this EMR-derived population is similar to that of other published MS datasets. Genetic analyses of seven clinical traits were performed using the data extracted from the medical records: age at diagnosis, age and CNS origin of first neurological symptom, presence of oligoclonal bands, Multiple Sclerosis Severity Score, timed 25-foot walk, and time to secondary progressive MS. No outstanding results were observed, but many interesting results require further investigation. This work shows the potential of using EMR-derived data in research studies of disease course.
Advisors/Committee Members: Jonathan L. Haines (committee member), Subramaniam Sriram (committee member), Joshua C. Denny (committee member), Thomas M. Aune (committee member), William S. Bush (Committee Chair).
Subjects/Keywords: genetic association; natural language processing
13.
Choe, Do Kook.
Toward Solving Penn Treebank Parsing.
Degree: Department of Computer Science, 2017, Brown University
URL: https://repository.library.brown.edu/studio/item/bdr:733298/
▼ A natural language parser recovers the latent grammatical structures of sentences. In many natural language processing (NLP) applications, parsing is applied to sentences first and the parses along with their sentences are fed to subsequent NLP systems. For example, Google parses the entire web and applies a series of NLP programs to index the web, and the quality of search results depends on the quality of parses. Parsing is difficult because sentences are ambiguous: a sentence has different syntactic structures depending on its meaning. For example, the sentence "Eugene wears a bow tie with polka dots" can have very different meanings depending on what "with polka dots" modifies. It is natural for us humans to infer that "with polka dots" modifies "a bow tie" because we have common sense that "with polka dots" rarely (if ever) describes an action like "wears." Computers, however, lack common sense and learn such a relationship from large amounts of text by just looking for statistical patterns. We explore four ways of improving parsing in this thesis: creating training data of high-quality parses using paraphrases; a model combination technique applied to n-best parsing; a generative reranker based on a language model; and a discriminative parser inspired by neural machine translation. Our parse-reranker achieves human-level performance on the standard Penn Treebank dataset.
Advisors/Committee Members: Charniak, Eugene (Advisor), Littman, Michael (Reader), Sudderth, Erik (Reader), Tellex, Stefanie (Reader).
Subjects/Keywords: Natural language processing (Computer science)

University of Arizona
14.
Luri Rodriguez, Ignacio.
Listening to the Market: Text Analysis Approaches to Consumer Research.
Degree: 2020, University of Arizona
URL: http://hdl.handle.net/10150/641701
▼ Language is central to human interaction, thinking, and sense-making. Witty marketing communicators and loquacious consumer research scholars can bend and shape
language to great effect and be admired for it. Marketing has traditionally been a diverse discipline, borrowing from many parents and employing a variety of methods or approaches to understanding consumers and the market. This dissertation presents three essays exploring consumer behavior from a perspective influenced by theory and methods from
language- and discourse-centric disciplines.
The first essay employs ethnographic methods informed by discourse analysis of marketing communications and in-person service encounters to better understand services. We integrate the disconnected research streams of role theory in service encounters and cocreation. Our findings challenge common definitions of a service script, defining it instead as the product of imagined service encounters that serves as a template for cocreation in consumers' minds.
The second essay examines U.S. news media on the topic of debt in order to reveal how public discourse frames debt to distribute responsibility and guide action.
The data analysis begins at the qualitative level of discourse analysis and hermeneutics, followed by a corpus research approach, complemented with a neural network-based word embedding technique used in
Natural Language Processing (NLP). Our findings reveal two dominant metaphors in public debt conversations: debt as weight, and debt as captivity. These metaphors frame the discourse, creating narratives with contrasting assignments of responsibility in the market and proposed marketing actions.
The third essay utilizes the same U.S. news article database as the second essay to answer different research questions. We join a growing and impactful stream of consumer research that harnesses the power of big textual data for marketing insight. We make a methodological contribution by developing and training a topic-detection bidirectional long short-term memory (Bi-LSTM) neural network to classify a large, unstructured corpus. We then sequentially ran a dynamic Latent Dirichlet Allocation (LDA) to identify the narratives predominant for each type of debt and how they change over the ten-year period (2010-2019).
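For readers unfamiliar with the modelling step, a minimal topic-model sketch in scikit-learn is given below; it uses plain LDA on four invented sentences, whereas the essay describes a Bi-LSTM topic detector followed by a dynamic Latent Dirichlet Allocation over a ten-year news corpus.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Four invented snippets standing in for a decade of news articles.
docs = [
    "student loan debt weighs heavily on graduates",
    "borrowers feel crushed under the weight of medical debt",
    "credit card debt traps consumers in a cycle of payments",
    "families feel imprisoned by mortgage debt they cannot escape",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Two topics purely for illustration; the essay fits a dynamic model per year.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"topic {k}:", top_terms)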
Advisors/Committee Members: Schau, Hope J (advisor), Ghosh, Bikram (committeemember), Taillard, Marie (committeemember), Sias, Rick (committeemember).
Subjects/Keywords: discourse;
marketing;
natural language processing
APA (6th Edition):
Luri Rodriguez, I. (2020). Listening to the Market: Text Analysis Approaches to Consumer Research
. (Doctoral Dissertation). University of Arizona. Retrieved from http://hdl.handle.net/10150/641701
Chicago Manual of Style (16th Edition):
Luri Rodriguez, Ignacio. “Listening to the Market: Text Analysis Approaches to Consumer Research
.” 2020. Doctoral Dissertation, University of Arizona. Accessed January 21, 2021.
http://hdl.handle.net/10150/641701.
MLA Handbook (7th Edition):
Luri Rodriguez, Ignacio. “Listening to the Market: Text Analysis Approaches to Consumer Research
.” 2020. Web. 21 Jan 2021.
Vancouver:
Luri Rodriguez I. Listening to the Market: Text Analysis Approaches to Consumer Research
. [Internet] [Doctoral dissertation]. University of Arizona; 2020. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/10150/641701.
Council of Science Editors:
Luri Rodriguez I. Listening to the Market: Text Analysis Approaches to Consumer Research
. [Doctoral Dissertation]. University of Arizona; 2020. Available from: http://hdl.handle.net/10150/641701

University of Guelph
15.
Stantic, Daniel.
A Unified Probabilistic Model for Aspect-Level Sentiment Analysis.
Degree: MS, School of Computer Science, 2016, University of Guelph
URL: https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9616
► In this thesis, we develop a new probabilistic model for aspect-level sentiment analysis based on POSLDA, a topic classifier that incorporates syntax modelling for better…
(more)
▼ In this thesis, we develop a new probabilistic model for aspect-level sentiment analysis based on POSLDA, a topic classifier that incorporates syntax modelling for better performance. POSLDA separates semantic words from purely functional words and restricts its topic modelling to the semantic words. We take this a step further by modelling the probability of a semantic word expressing sentiment based on its part-of-speech class and then modelling its sentiment if it is a sentiment word. We restructure the popular approach of topic-sentiment distributions within documents and add a few novel heuristic improvements. Our experiments demonstrate that our model produces results competitive with state-of-the-art systems. In addition to the model, we develop a multi-threaded version of the popular Gibbs sampling algorithm that can perform inference over 1000 times faster than the traditional implementation while preserving the quality of the results.
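The abstract mentions Gibbs sampling without detail; as background, a bare-bones collapsed Gibbs sampler for plain LDA is sketched below. It omits everything POSLDA adds (part-of-speech classes, sentiment modelling, the heuristics above), and all data and hyperparameters are illustrative.

import numpy as np

def lda_gibbs(docs, vocab_size, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for vanilla LDA; docs are lists of word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))     # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))    # topic-word counts
    nk = np.zeros(n_topics)                   # topic totals
    assignments = []
    for d, doc in enumerate(docs):
        zs = rng.integers(0, n_topics, size=len(doc))
        assignments.append(zs)
        for w, k in zip(doc, zs):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional over topics for this token.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                assignments[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Tiny demo: four "documents" over a six-word vocabulary.
docs = [[0, 1, 0, 2], [1, 0, 2, 2], [3, 4, 5, 4], [4, 5, 3, 3]]
doc_topic, topic_word = lda_gibbs(docs, vocab_size=6, n_topics=2, iters=100)
print(np.round(doc_topic, 1))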
Advisors/Committee Members: Song, Fei (advisor).
Subjects/Keywords: natural language processing; sentiment analysis
APA (6th Edition):
Stantic, D. (2016). A Unified Probabilistic Model for Aspect-Level Sentiment Analysis. (Masters Thesis). University of Guelph. Retrieved from https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9616
Chicago Manual of Style (16th Edition):
Stantic, Daniel. “A Unified Probabilistic Model for Aspect-Level Sentiment Analysis.” 2016. Masters Thesis, University of Guelph. Accessed January 21, 2021.
https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9616.
MLA Handbook (7th Edition):
Stantic, Daniel. “A Unified Probabilistic Model for Aspect-Level Sentiment Analysis.” 2016. Web. 21 Jan 2021.
Vancouver:
Stantic D. A Unified Probabilistic Model for Aspect-Level Sentiment Analysis. [Internet] [Masters thesis]. University of Guelph; 2016. [cited 2021 Jan 21].
Available from: https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9616.
Council of Science Editors:
Stantic D. A Unified Probabilistic Model for Aspect-Level Sentiment Analysis. [Masters Thesis]. University of Guelph; 2016. Available from: https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9616

University of Houston
16.
Kulkarni, Akshay Bhavani Kumar 1994-.
Early Detection of Depression.
Degree: MS, Computer Science, 2018, University of Houston
URL: http://hdl.handle.net/10657/3089
► Depression is a mental disorder that affects more than 300 million people worldwide. An individual suffering from depression functions poorly in life, is prone to…
(more)
▼ Depression is a mental disorder that affects more than 300 million people worldwide. An individual suffering from depression functions poorly in life, is prone to other diseases, and, in the worst case, depression leads to suicide. Many impediments prevent expert care from reaching people suffering from depression in time, such as the social stigma associated with mental disorders, a lack of trained health-care professionals, and ignorance of the signs of depression owing to a lack of awareness of the disease. Moreover, the World Health Organization (WHO) claims that individuals who are depressed are often not correctly diagnosed and others who are misdiagnosed are prescribed antidepressants. Thus, there is a strong need to automatically assess the risk of depression.
Identification of depression from social media has been framed as a classification problem in the field of Natural Language Processing (NLP). In this work we study NLP approaches that can successfully extract information from textual data to enhance identification of depression. These NLP approaches perform feature extraction to build document representations. The main issues in detecting depression in a social media environment are data scarcity for users with depression and the inherent noise associated with social media data. We attempt to address those issues by using representations that can naturally cope with a social media environment. Specifically, we propose the use of Distributed Term Representations (DTRs) to capture information that can be used by supervised machine learning methods for learning and classifying users suffering from depression. Experimental evaluation provides evidence that DTRs are more effective for depression detection than traditional representations such as Bag of Words (BOW) and representations based on neural word embeddings. In fact, we have obtained state-of-the-art results with the Document Occurrence Representation (DOR) for depression detection (F1-score 0.66 on the depressed class). For early detection of depression, we have obtained the lowest reported Early Risk Detection Error (ERDE) using Pyramidal, a newly adapted method for computing document representations.
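A small sketch of the intuition behind distributed term representations such as DOR, under the simplifying assumption that a term is represented by its normalized occurrence pattern across documents and a document by the average of its terms' vectors; the posts, labels, and classifier below are fabricated and do not reflect the thesis's data or exact formulation.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy posts; 1 = depression-related, 0 = control (fabricated examples).
posts = [
    "i feel hopeless and tired all the time",
    "nothing matters anymore i cannot sleep",
    "great run this morning feeling energetic",
    "enjoyed dinner with friends tonight",
]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
term_doc = vec.fit_transform(posts).T.toarray().astype(float)   # terms x docs
# Each term's vector: how the term distributes over the document collection.
term_vectors = term_doc / np.maximum(term_doc.sum(axis=1, keepdims=True), 1)

def doc_representation(text):
    """Average of the document-occurrence vectors of the terms in the text."""
    ids = [vec.vocabulary_[t] for t in vec.build_analyzer()(text) if t in vec.vocabulary_]
    if not ids:
        return np.zeros(term_vectors.shape[1])
    return term_vectors[ids].mean(axis=0)

X = np.vstack([doc_representation(p) for p in posts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([doc_representation("so tired and hopeless lately")]))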
Advisors/Committee Members: Solorio, Thamar (advisor), Gonzalez, Fabio A. (committee member), Eick, Christoph F. (committee member).
Subjects/Keywords: Natural Language Processing; Health care
APA (6th Edition):
Kulkarni, A. B. K. 1. (2018). Early Detection of Depression. (Masters Thesis). University of Houston. Retrieved from http://hdl.handle.net/10657/3089
Chicago Manual of Style (16th Edition):
Kulkarni, Akshay Bhavani Kumar 1994-. “Early Detection of Depression.” 2018. Masters Thesis, University of Houston. Accessed January 21, 2021.
http://hdl.handle.net/10657/3089.
MLA Handbook (7th Edition):
Kulkarni, Akshay Bhavani Kumar 1994-. “Early Detection of Depression.” 2018. Web. 21 Jan 2021.
Vancouver:
Kulkarni ABK1. Early Detection of Depression. [Internet] [Masters thesis]. University of Houston; 2018. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/10657/3089.
Council of Science Editors:
Kulkarni ABK1. Early Detection of Depression. [Masters Thesis]. University of Houston; 2018. Available from: http://hdl.handle.net/10657/3089

University of Victoria
17.
Sedghi, Elham.
A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods.
Degree: Department of Computer Science, 2017, University of Victoria
URL: http://hdl.handle.net/1828/7879
► Early detection and treatment of stroke can save lives. Before any procedure is planned, the patient is traditionally subjected to a brain scan such as…
(more)
▼ Early detection and treatment of stroke can save lives. Before any procedure is planned, the patient is traditionally subjected to a brain scan such as Magnetic Resonance Imaging (MRI) in order to make sure he/she receives a safe treatment. Before any imaging is performed, the patient is checked into the Emergency Room (ER) and clinicians from the Stroke Rapid Assessment Unit (SRAU) perform an evaluation of the patient's signs and symptoms. The question we address in this thesis is: Can Data Mining (DM) algorithms be employed to reliably predict the occurrence of stroke in a patient based on the signs and symptoms gathered by the clinicians and other staff in the ER or the SRAU? A reliable DM algorithm would be very useful in helping clinicians make a better decision on whether to escalate the case or classify it as a non-life-threatening mimic and not put the patient through unnecessary imaging and tests. Such an algorithm would not only make the lives of patients and clinicians easier but would also enable hospitals to cut down on their costs. Most of the signs and symptoms gathered by clinicians in the ER or the SRAU are stored in free-text format in hospital information systems. Using techniques from
Natural Language Processing (NLP), the vocabularies of interest can be extracted and classified. A big challenge in this process is that medical narratives are full of misspelled words and clinical abbreviations. It is a well-known fact that the quality of data mining results crucially depends on the quality of the input data. In this thesis, as a first contribution, we describe a procedure to preprocess the raw data and transform it into clean, well-structured data that can be effectively used by DM learning algorithms. Another contribution of this thesis is a set of carefully crafted rules to detect negated meaning in free-text sentences. Using these rules, we were able to get the correct semantics of sentences and provide much more useful datasets to DM learning algorithms. This thesis consists of three main parts. In the first part, we focus on building classifiers to reliably distinguish stroke and Transient Ischemic Attack (TIA) from mimic cases. For this, we used text extracted from the "chief complaint" and "history of patient illness" fields available in the patients' files at the Victoria General Hospital (VGH). In collaboration with stroke specialists, we identified a well-defined set of stroke-related keywords. Next, we created practical tools to accurately assign keywords from this set to each patient. Then, we performed extensive experiments to find the right learning algorithm to build the best classifier that provides a good balance between sensitivity, specificity, and a host of other quality indicators. In the second part, we focus on the most important mimic case, migraine, and how to effectively distinguish it from stroke or TIA. This is a challenging problem because migraine has many signs and symptoms that are similar to those of stroke or TIA. Another challenge we address is…
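The negation rules themselves are not given in the abstract; the following NegEx-style sketch conveys the general idea (a trigger word negates terms within a short window), with the trigger list, window size, and example sentence invented for illustration.

import re

NEG_TRIGGERS = {"no", "not", "denies", "without", "negative for"}  # illustrative list
SCOPE = 5  # number of tokens a trigger is assumed to negate

def negated_terms(sentence, terms):
    """Return which of the given clinical terms appear under negation."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    negated = set()
    for i, tok in enumerate(tokens):
        if tok in NEG_TRIGGERS or " ".join(tokens[i:i + 2]) in NEG_TRIGGERS:
            window = tokens[i + 1:i + 1 + SCOPE]
            negated.update(t for t in terms if t in window)
    return negated

sentence = "Patient denies facial droop but reports sudden weakness on the left side."
print(negated_terms(sentence, {"droop", "weakness", "numbness"}))
# -> {'droop'}  ('weakness' falls outside the negation window)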
Advisors/Committee Members: Thomo, Alex (supervisor), Weber, Jens H. (supervisor).
Subjects/Keywords: Data Mining; Natural Language Processing
APA (6th Edition):
Sedghi, E. (2017). A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods. (Thesis). University of Victoria. Retrieved from http://hdl.handle.net/1828/7879
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Sedghi, Elham. “A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods.” 2017. Thesis, University of Victoria. Accessed January 21, 2021.
http://hdl.handle.net/1828/7879.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Sedghi, Elham. “A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods.” 2017. Web. 21 Jan 2021.
Vancouver:
Sedghi E. A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods. [Internet] [Thesis]. University of Victoria; 2017. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/1828/7879.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Sedghi E. A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods. [Thesis]. University of Victoria; 2017. Available from: http://hdl.handle.net/1828/7879
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Minnesota
18.
Exley, Andrew.
Improvements to a Speech Repair Parser.
Degree: PhD, Computer Science, 2016, University of Minnesota
URL: http://hdl.handle.net/11299/181724
► Parsing is a common task for speech-recognition systems, but many parsers ignore the possibility of speech errors and repairs, which are very common in conversational…
(more)
▼ Parsing is a common task for speech-recognition systems, but many parsers ignore the possibility of speech errors and repairs, which are very common in conversational language. The goal of this thesis is to examine a parsing system that can handle these occurrences and improve its performance by incorporating systems that use linguistic knowledge about speech errors and repairs. The basis for this thesis is a system for incremental parsing. The thesis shows additions that can be made to that system to allow for detection of speech errors and repairs. That is shown to be an improvement on previous incremental systems. An extension to the system is introduced which incorporates ideas about human short term memory and its relationship to speech errors. The system is then tested with many different configurations. Finally, the thesis concludes with a summary and discussion of the various results and lays out possible avenues for future work.
Subjects/Keywords: Natural Language Processing; Parsing
APA (6th Edition):
Exley, A. (2016). Improvements to a Speech Repair Parser. (Doctoral Dissertation). University of Minnesota. Retrieved from http://hdl.handle.net/11299/181724
Chicago Manual of Style (16th Edition):
Exley, Andrew. “Improvements to a Speech Repair Parser.” 2016. Doctoral Dissertation, University of Minnesota. Accessed January 21, 2021.
http://hdl.handle.net/11299/181724.
MLA Handbook (7th Edition):
Exley, Andrew. “Improvements to a Speech Repair Parser.” 2016. Web. 21 Jan 2021.
Vancouver:
Exley A. Improvements to a Speech Repair Parser. [Internet] [Doctoral dissertation]. University of Minnesota; 2016. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/11299/181724.
Council of Science Editors:
Exley A. Improvements to a Speech Repair Parser. [Doctoral Dissertation]. University of Minnesota; 2016. Available from: http://hdl.handle.net/11299/181724

Virginia Tech
19.
Patil, Supritha Basavaraj.
Analysis of Moving Events Using Tweets.
Degree: MS, Computer Science and Applications, 2019, Virginia Tech
URL: http://hdl.handle.net/10919/90884
► News now travels faster on social media than through news channels. Information from social media can help retrieve minute details that might not be emphasized…
(more)
▼ News now travels faster on social media than through news channels. Information from social media can help retrieve minute details that might not be emphasized in the news. People tend to describe their actions or sentiments in tweets. I aim to study whether such collections of tweets are dependable sources for identifying the paths of moving events. For events like hurricanes, Twitter can help in analyzing people's reactions to such moving events. These may include actions such as dislocation, or emotions during different phases of the event. The results obtained in the experiments concur with the actual paths of the events with respect to the regions affected and the timing. The frequency of tweets increases during event peaks. The number of affected locations identified is significantly larger than in news wires.
Advisors/Committee Members: Fox, Edward A. (committeechair), Lee, Sunshin (committee member), Prakash, Bodicherla Aditya (committee member).
Subjects/Keywords: Natural Language Processing; Twitter
APA (6th Edition):
Patil, S. B. (2019). Analysis of Moving Events Using Tweets. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/90884
Chicago Manual of Style (16th Edition):
Patil, Supritha Basavaraj. “Analysis of Moving Events Using Tweets.” 2019. Masters Thesis, Virginia Tech. Accessed January 21, 2021.
http://hdl.handle.net/10919/90884.
MLA Handbook (7th Edition):
Patil, Supritha Basavaraj. “Analysis of Moving Events Using Tweets.” 2019. Web. 21 Jan 2021.
Vancouver:
Patil SB. Analysis of Moving Events Using Tweets. [Internet] [Masters thesis]. Virginia Tech; 2019. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/10919/90884.
Council of Science Editors:
Patil SB. Analysis of Moving Events Using Tweets. [Masters Thesis]. Virginia Tech; 2019. Available from: http://hdl.handle.net/10919/90884

Virginia Tech
20.
Kakusa, Takondwa Lisungu.
Use of Assembly Inspired Instructions in the Allowance of Natural Language Processing in ROS.
Degree: MS, Computer Engineering, 2018, Virginia Tech
URL: http://hdl.handle.net/10919/84521
► Natural Language processing is a growing field and widely used in both industrial and and commercial cases. Though it is difficult to create a natural…
(more)
▼ Natural language processing is a growing field, widely used in both industrial and commercial settings. Though it is difficult to create a natural language system that can robustly react to and handle every situation, it is quite possible to design a system that reacts to a specific instruction or scenario. The problem with current natural language systems used in machines, though, is that they focus on single instructions, working to complete the instruction given and then waiting for the next one. In this way they are not set up to respond to possible conditions that are explained to them.
The system designed and explained in this thesis aims to fix this problem by introducing a method of adjusting to these conditions. The contributions of this thesis are: a set of instruction types that allows for conditional statements within natural language instructions; a modular system built on ROS that allows for more robust communication and integration; and an interconnection between the written text and the derived instructions that makes sentence construction more seamless and natural for the user.
The work in this thesis is limited in focus to the objective of obstacle traversal. The ideas and methodology, though, can be seen to extend into future work in the area.
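The thesis's instruction types and ROS integration are not detailed in the abstract; as a stand-in, the sketch below turns a conditional natural-language instruction into a (condition, action) pair that a controller could poll. The regular-expression patterns and example sentences are invented for illustration.

import re

# Hypothetical patterns: "if <condition>, <action>" or "<action> if <condition>".
PATTERNS = [
    re.compile(r"^if (?P<cond>.+?),\s*(?P<act>.+)$", re.I),
    re.compile(r"^(?P<act>.+?) if (?P<cond>.+)$", re.I),
]

def parse_instruction(text):
    """Split a conditional instruction into (condition, action); condition is None if unconditional."""
    for pat in PATTERNS:
        m = pat.match(text.strip().rstrip("."))
        if m:
            return m.group("cond").strip(), m.group("act").strip()
    return None, text.strip().rstrip(".")

for sentence in ["If you see an obstacle, turn left.",
                 "Stop if the battery is low.",
                 "Move forward two meters."]:
    print(parse_instruction(sentence))
# ('you see an obstacle', 'turn left')
# ('the battery is low', 'Stop')
# (None, 'Move forward two meters')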
Advisors/Committee Members: Hsiao, Michael S. (committeechair), Zeng, Haibo (committee member), Patterson, Cameron D. (committee member).
Subjects/Keywords: Natural Language Processing; Robotics; ROS
APA (6th Edition):
Kakusa, T. L. (2018). Use of Assembly Inspired Instructions in the Allowance of Natural Language Processing in ROS. (Masters Thesis). Virginia Tech. Retrieved from http://hdl.handle.net/10919/84521
Chicago Manual of Style (16th Edition):
Kakusa, Takondwa Lisungu. “Use of Assembly Inspired Instructions in the Allowance of Natural Language Processing in ROS.” 2018. Masters Thesis, Virginia Tech. Accessed January 21, 2021.
http://hdl.handle.net/10919/84521.
MLA Handbook (7th Edition):
Kakusa, Takondwa Lisungu. “Use of Assembly Inspired Instructions in the Allowance of Natural Language Processing in ROS.” 2018. Web. 21 Jan 2021.
Vancouver:
Kakusa TL. Use of Assembly Inspired Instructions in the Allowance of Natural Language Processing in ROS. [Internet] [Masters thesis]. Virginia Tech; 2018. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/10919/84521.
Council of Science Editors:
Kakusa TL. Use of Assembly Inspired Instructions in the Allowance of Natural Language Processing in ROS. [Masters Thesis]. Virginia Tech; 2018. Available from: http://hdl.handle.net/10919/84521

University of Sydney
21.
Pink, Glen Alan.
Slot Filling
.
Degree: 2017, University of Sydney
URL: http://hdl.handle.net/2123/17055
► Slot filling (SF) is the task of automatically extracting facts about particular entities from unstructured text, and populating a knowledge base (KB) with these facts.…
(more)
▼ Slot filling (SF) is the task of automatically extracting facts about particular entities from unstructured text, and populating a knowledge base (KB) with these facts. These structured KBs enable applications such as structured web queries and question answering. SF is typically framed as a query-oriented setting of the related task of relation extraction. Throughout this thesis, we reflect on how SF is a task with many distinct problems. We demonstrate that recall is a major limiter on SF system performance. We contribute an analysis of typical SF recall loss, and find a substantial amount of loss occurs early in the SF pipeline. We confirm that accurate NER and coreference resolution are required for high-recall SF. We measure upper bounds using a naïve graph-based semi-supervised bootstrapping technique, and find that only 39% of results are reachable using a typical feature space. We expect that this graph-based technique will be directly useful for extraction, and this leads us to frame SF as a label propagation task. We focus on a detailed graph representation of the task which reflects the behaviour and assumptions we want to model based on our analysis, including modifying the label propagation process to model multiple types of label interaction. Analysing the graph, we find that a large number of errors occur in very close proximity to training data, and identify that this is of major concern for propagation. While there are some conflicts caused by a lack of sufficient disambiguating context—we explore adding additional contextual features to address this—many of these conflicts are caused by subtle annotation problems. We find that lack of a standard for how explicit expressions of relations must be in text makes consistent annotation difficult. Using a strict definition of explicitness results in 20% of correct annotations being removed from a standard dataset. We contribute several annotation-driven analyses of this problem, exploring the definition of slots and the effect of the lack of a concrete definition of explicitness: annotation schema do not detail how explicit expressions of relations need to be, and there is large scope for disagreement between annotators. Additionally, applications may require relatively strict or relaxed evidence for extractions, but this is not considered in annotation tasks. We demonstrate that annotators frequently disagree on instances, dependent on differences in annotator world knowledge and thresholds on making probabilistic inference. SF is fundamental to enabling many knowledge-based applications, and this work motivates modelling and evaluating SF to better target these tasks.
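As a generic illustration of framing extraction as label propagation (the thesis builds a far richer, task-specific graph and modifies the propagation process itself), scikit-learn's LabelPropagation can spread a few seed labels to unlabeled candidate mentions represented as feature vectors; the two-dimensional features and labels below are invented.

import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Each row is a toy feature vector for a candidate (entity, slot-filler) mention;
# label 1 = expresses the slot, 0 = does not, -1 = unlabeled (to be propagated).
X = np.array([
    [1.0, 0.9], [0.9, 1.0],      # seed positives
    [0.1, 0.0], [0.0, 0.2],      # seed negatives
    [0.8, 0.8], [0.2, 0.1],      # unlabeled candidates
])
y = np.array([1, 1, 0, 0, -1, -1])

model = LabelPropagation(kernel="rbf", gamma=5.0).fit(X, y)
print(model.transduction_)   # labels inferred for every node, seeds included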
Subjects/Keywords: natural language processing;
information;
extraction
APA (6th Edition):
Pink, G. A. (2017). Slot Filling
. (Thesis). University of Sydney. Retrieved from http://hdl.handle.net/2123/17055
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Pink, Glen Alan. “Slot Filling
.” 2017. Thesis, University of Sydney. Accessed January 21, 2021.
http://hdl.handle.net/2123/17055.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Pink, Glen Alan. “Slot Filling
.” 2017. Web. 21 Jan 2021.
Vancouver:
Pink GA. Slot Filling
. [Internet] [Thesis]. University of Sydney; 2017. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/2123/17055.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Pink GA. Slot Filling
. [Thesis]. University of Sydney; 2017. Available from: http://hdl.handle.net/2123/17055
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Texas – Austin
22.
-7062-2970.
Advances in statistical script learning.
Degree: PhD, Computer Science, 2017, University of Texas – Austin
URL: http://hdl.handle.net/2152/63480
► When humans encode information into natural language, they do so with the clear assumption that the reader will be able to seamlessly make inferences based…
(more)
▼ When humans encode information into
natural language, they do so with the
clear assumption that the reader will be able to seamlessly make inferences
based on world knowledge. For example, given the sentence "Mrs. Dalloway said she would buy the flowers herself," one can make a number of probable
inferences based on event co-occurrences: she bought flowers, she went to a
store, she took the flowers home, and so on.
Observing this, it is clear that many different useful
natural language
end-tasks could benefit from models of events as they typically co-occur
(so-called script models).
Robust question-answering systems must be able to infer highly-probable implicit
events from what is explicitly stated in a text, as must robust
information-extraction systems that map from unstructured text to formal
assertions about relations expressed in the text. Coreference resolution
systems, semantic role labeling, and even syntactic parsing systems could, in
principle, benefit from event co-occurrence models.
To this end, we present a number of contributions related to statistical
event co-occurrence models. First, we investigate a method of incorporating
multiple entities into events in a count-based co-occurrence model. We find that
modeling multiple entities interacting across events allows for improved
empirical performance on the task of modeling sequences of events in documents.
Second, we give a method of applying Recurrent Neural Network sequence models
to the task of predicting held-out predicate-argument structures from documents.
This model allows us to easily incorporate entity noun information, and can
allow for more complex, higher-arity events than a count-based co-occurrence
model. We find the neural model improves performance considerably over the
count-based co-occurrence model.
Third, we investigate the performance of a sequence-to-sequence encoder-decoder
neural model on the task of predicting held-out predicate-argument events from
text. This model does not explicitly model any external syntactic information,
and does not require a parser. We find the text-level model to be competitive in
predictive performance with an event level model directly mediated by an
external syntactic analysis.
Finally, motivated by this result, we investigate incorporating features derived
from these models into a baseline noun coreference resolution system. We find
that, while our additional features do not appreciably improve top-level
performance, we can nonetheless provide empirical improvement on a number of
restricted classes of difficult coreference decisions.
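A toy version of the count-based event co-occurrence idea, predicting a held-out event from the other events in a document; the (predicate, dependency) event tuples and the scoring function are fabricated stand-ins for the dissertation's models, which also handle multiple entities and neural sequence prediction.

from collections import Counter
from itertools import permutations

# Toy "documents" as sequences of (predicate, dependency) events for one protagonist.
docs = [
    [("buy", "subj"), ("go", "subj"), ("pay", "subj")],
    [("buy", "subj"), ("pay", "subj"), ("leave", "subj")],
    [("order", "subj"), ("pay", "subj"), ("leave", "subj")],
]

pair_counts = Counter()
event_counts = Counter()
for events in docs:
    event_counts.update(events)
    pair_counts.update(permutations(events, 2))  # ordered co-occurrence pairs

def predict_missing(context, candidates):
    """Score each candidate event by a PMI-like association with the context events."""
    def score(cand):
        return sum(pair_counts[(c, cand)] / (event_counts[c] * event_counts[cand])
                   for c in context)
    return max(candidates, key=score)

context = [("buy", "subj"), ("go", "subj")]
print(predict_missing(context, [("pay", "subj"), ("order", "subj"), ("leave", "subj")]))
# -> ('pay', 'subj')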
Advisors/Committee Members: Mooney, Raymond J. (Raymond Joseph) (advisor), Chambers, Nathanael (committee member), Erk, Katrin (committee member), Stone, Peter (committee member).
Subjects/Keywords: Natural language processing; Machine learning
APA (6th Edition):
-7062-2970. (2017). Advances in statistical script learning. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/63480
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-7062-2970. “Advances in statistical script learning.” 2017. Doctoral Dissertation, University of Texas – Austin. Accessed January 21, 2021.
http://hdl.handle.net/2152/63480.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-7062-2970. “Advances in statistical script learning.” 2017. Web. 21 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-7062-2970. Advances in statistical script learning. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2017. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/2152/63480.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-7062-2970. Advances in statistical script learning. [Doctoral Dissertation]. University of Texas – Austin; 2017. Available from: http://hdl.handle.net/2152/63480
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
23.
Tausczik, Yla Rebecca.
Changing group dynamics through computerized language feedback.
Degree: PhD, Psychology, 2012, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2012-08-5971
► Why do some groups of people work well together while others do not? It is commonly accepted that effective groups communicate well. Yet one of…
(more)
▼ Why do some groups of people work well together while others do not? It is commonly accepted that effective groups communicate well. Yet one of the biggest roadblocks facing the study of group communication is that it is extremely difficult to capture real-world group interactions and analyze the words people use in a timely manner. This project overcame this limitation in two ways. First, a broader and more systematic study of group processes was conducted by using a computerized text analysis program (Linguistic Inquiry and Word Count) that automatically codes
natural language using pre-established rules. Groups that work well together typically exchange more knowledge and establish good social relationships, which is reflected in the way that they use words. The group dynamics of over 500 student discussion groups interacting via group chat were assessed by studying their
language use. Second, a
language feedback system was built to experimentally test the importance of certain group processes on group satisfaction and performance. It is now possible to provide
language feedback by
processing natural language dialogue using computerized text analysis in real time. The
language feedback system can change the way the group works by providing individualized recommendations. In this way it is possible to manipulate group processes naturalistically. Together these studies provided evidence that important group processes can be detected even using simplistic
natural language processing, and preliminary evidence that providing real-time feedback based on the words students use in a group discussion can improve learning by changing how the group works together.
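For context, the kind of closed-vocabulary word counting performed by tools such as Linguistic Inquiry and Word Count can be sketched as below; the category lists are tiny invented stand-ins for LIWC's actual dictionaries.

import re

# Tiny illustrative category dictionaries; the real dictionaries are far larger.
CATEGORIES = {
    "we_words": {"we", "us", "our"},
    "question_words": {"what", "why", "how", "who"},
    "agreement": {"yes", "agree", "right", "exactly"},
}

def category_rates(text):
    """Fraction of words falling in each category, a crude proxy for group dynamics."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

chat_turn = "Why don't we try the second approach? Yes, exactly, our data support it."
print(category_rates(chat_turn))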
Advisors/Committee Members: Pennebaker, James W. (advisor), Cormack, Lawrence K. (committee member), Gosling, Samuel D. (committee member), Graesser, Arthur C. (committee member), Henderson, Marlone D. (committee member).
Subjects/Keywords: Teamwork; Natural language processing; Intervention
APA (6th Edition):
Tausczik, Y. R. (2012). Changing group dynamics through computerized language feedback. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2012-08-5971
Chicago Manual of Style (16th Edition):
Tausczik, Yla Rebecca. “Changing group dynamics through computerized language feedback.” 2012. Doctoral Dissertation, University of Texas – Austin. Accessed January 21, 2021.
http://hdl.handle.net/2152/ETD-UT-2012-08-5971.
MLA Handbook (7th Edition):
Tausczik, Yla Rebecca. “Changing group dynamics through computerized language feedback.” 2012. Web. 21 Jan 2021.
Vancouver:
Tausczik YR. Changing group dynamics through computerized language feedback. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2012. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/2152/ETD-UT-2012-08-5971.
Council of Science Editors:
Tausczik YR. Changing group dynamics through computerized language feedback. [Doctoral Dissertation]. University of Texas – Austin; 2012. Available from: http://hdl.handle.net/2152/ETD-UT-2012-08-5971

Pontifical Catholic University of Rio de Janeiro
24.
SILVANO NOGUEIRA BUBACK.
[en] USING MACHINE LEARNING TO BUILD A TOOL THAT HELPS
COMMENTS MODERATION.
Degree: 2012, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19232
► [pt] Uma das mudanças trazidas pela Web 2.0 é a maior participação dos usuários na produção do conteúdo, através de opiniões em redes sociais ou…
(more)
▼ [pt] Uma das mudanças trazidas pela Web 2.0 é a maior
participação dos usuários na produção do conteúdo, através de
opiniões em redes sociais ou comentários nos próprios sites de
produtos e serviços. Estes comentários são muito valiosos para seus
sites pois fornecem feedback e incentivam a participação e
divulgação do conteúdo. Porém excessos podem ocorrer através de
comentários com palavrões indesejados ou spam. Enquanto para alguns
sites a própria moderação da comunidade é suficiente, para outros
as mensagens indesejadas podem comprometer o serviço. Para auxiliar
na moderação dos comentários foi construída uma ferramenta que
utiliza técnicas de aprendizado de máquina para auxiliar o
moderador. Para testar os resultados, dois corpora de comentários
produzidos na Globo.com foram utilizados, o primeiro com 657.405
comentários postados diretamente no site, e outro com 451.209
mensagens capturadas do Twitter. Nossos experimentos mostraram que
o melhor resultado é obtido quando se separa o aprendizado dos
comentários de acordo com o tema sobre o qual está sendo
comentado.
[en] One of the main changes brought by Web 2.0 is the increase of user participation in content generation, mainly through opinions in social networks and comments on news and service sites. These comments are valuable to the sites because they bring feedback and motivate other people to participate and to spread the content. On the other hand, these comments also bring some kinds of abuse, such as bad words and spam. While for some sites their own community moderation is enough, for others the unwanted messages may compromise the service. In order to help these sites, a tool that uses machine learning techniques was built to assist comment moderation. As a test to compare results, two datasets captured from Globo.com were used: the first with 657,405 comments posted directly on the site and the second with 451,209 messages captured from Twitter. Our experiments show that the best results are achieved when learning is done separately for each subject being commented on.
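A minimal sketch of the per-subject training idea on fabricated comments: one classifier is fitted for each subject area rather than a single global model. The pipeline, features, and examples are illustrative only and are not the tool described in the dissertation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (subject, comment, is_abusive) -- fabricated examples.
data = [
    ("sports", "great goal by the striker", 0),
    ("sports", "that referee is an idiot", 1),
    ("sports", "what a beautiful match", 0),
    ("sports", "the coach is a clown", 1),
    ("politics", "interesting policy proposal", 0),
    ("politics", "these politicians are crooks", 1),
    ("politics", "good analysis of the budget", 0),
    ("politics", "another corrupt liar elected", 1),
]

# One pipeline per subject, mirroring the finding that per-subject learning works best.
models = {}
for subject in {s for s, _, _ in data}:
    texts = [c for s, c, _ in data if s == subject]
    labels = [y for s, _, y in data if s == subject]
    models[subject] = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

print(models["sports"].predict(["the referee is a clown"]))  # likely flagged as abusive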
Advisors/Committee Members: MARCO ANTONIO CASANOVA, MARCO ANTONIO CASANOVA, MARCO ANTONIO CASANOVA.
Subjects/Keywords: [pt] CLASSIFICACAO DE TEXTOS; [en] TEXT CLASSIFICATION; [pt] PROCESSAMENTO DA LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING; [pt] SVM; [en] SVM; [pt] BOOSTING; [en] BOOSTING
APA (6th Edition):
BUBACK, S. N. (2012). [en] USING MACHINE LEARNING TO BUILD A TOOL THAT HELPS
COMMENTS MODERATION. (Thesis). Pontifical Catholic University of Rio de Janeiro. Retrieved from http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19232
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
BUBACK, SILVANO NOGUEIRA. “[en] USING MACHINE LEARNING TO BUILD A TOOL THAT HELPS
COMMENTS MODERATION.” 2012. Thesis, Pontifical Catholic University of Rio de Janeiro. Accessed January 21, 2021.
http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19232.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
BUBACK, SILVANO NOGUEIRA. “[en] USING MACHINE LEARNING TO BUILD A TOOL THAT HELPS
COMMENTS MODERATION.” 2012. Web. 21 Jan 2021.
Vancouver:
BUBACK SN. [en] USING MACHINE LEARNING TO BUILD A TOOL THAT HELPS
COMMENTS MODERATION. [Internet] [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2012. [cited 2021 Jan 21].
Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19232.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
BUBACK SN. [en] USING MACHINE LEARNING TO BUILD A TOOL THAT HELPS
COMMENTS MODERATION. [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2012. Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19232
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Pontifical Catholic University of Rio de Janeiro
25.
LUCIANA ROSA REDLICH.
[en] TRAFFIC EVENTS MODELING BASED ON CLIPPING OF HUGE
QUANTITY OF DATA FROM THE WEB.
Degree: 2015, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23940
► [pt] Este trabalho consiste no desenvolvimento de um modelo que auxilie na análise de eventos ocorridos no trânsito das grandes cidades. Utilizando uma grande massa…
(more)
▼ [pt] Este trabalho consiste no desenvolvimento de um
modelo que auxilie na análise de eventos ocorridos no trânsito das
grandes cidades. Utilizando uma grande massa de dados publicados na
Internet, em especial no twitter, por usuários comuns, este
trabalho fornece uma ontologia para eventos do trânsito publicados
em notícias da internet e uma aplicação que use o modelo proposto
para realizar consultas aos eventos modelados. Para isso, as
notícias publicadas em linguagem natural são processadas, isto é,
as entidades relevantes no texto são identificadas e depois
estruturadas de tal forma que seja feita uma analise semântica da
notícia publicada. As notícias publicadas são estruturadas no
modelo proposto de eventos e com isso é possível que sejam feitas
consultas sobre suas propriedades e relacionamentos, facilitando
assim a análise do processo do trânsito e dos eventos ocorridos
nele.
[en] This work proposes a traffic event model to assist the analysis of traffic events in big cities. It aims to provide not only an ontology for traffic events reported in news published on the Internet, but also a prototype of a software architecture that uses the proposed model to perform queries on the events, drawing on the huge quantity of data published on the Internet by regular users, especially on Twitter. To do so, the news published in natural language is processed: the relevant entities in the text are identified and structured so that a semantic analysis of them can be made. The reported news is structured in the proposed event model, and thus queries about event properties and relationships can be answered. As a consequence, this work facilitates the analysis of the traffic process and of the events that occur in it.
Advisors/Committee Members: HELIO CORTES VIEIRA LOPES.
Subjects/Keywords: [pt] APRENDIZADO DE MAQUINA; [en] MACHINE LEARNING; [pt] ONTOLOGIAS; [en] ONTOLOGIES; [pt] EVENTO; [en] EVENT; [pt] PROCESSAMENTO DE LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING
APA (6th Edition):
REDLICH, L. R. (2015). [en] TRAFFIC EVENTS MODELING BASED ON CLIPPING OF HUGE
QUANTITY OF DATA FROM THE WEB. (Thesis). Pontifical Catholic University of Rio de Janeiro. Retrieved from http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23940
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
REDLICH, LUCIANA ROSA. “[en] TRAFFIC EVENTS MODELING BASED ON CLIPPING OF HUGE
QUANTITY OF DATA FROM THE WEB.” 2015. Thesis, Pontifical Catholic University of Rio de Janeiro. Accessed January 21, 2021.
http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23940.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
REDLICH, LUCIANA ROSA. “[en] TRAFFIC EVENTS MODELING BASED ON CLIPPING OF HUGE
QUANTITY OF DATA FROM THE WEB.” 2015. Web. 21 Jan 2021.
Vancouver:
REDLICH LR. [en] TRAFFIC EVENTS MODELING BASED ON CLIPPING OF HUGE
QUANTITY OF DATA FROM THE WEB. [Internet] [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2015. [cited 2021 Jan 21].
Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23940.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
REDLICH LR. [en] TRAFFIC EVENTS MODELING BASED ON CLIPPING OF HUGE
QUANTITY OF DATA FROM THE WEB. [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2015. Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23940
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Pontifical Catholic University of Rio de Janeiro
26.
MIGUEL MENDES DE BRITO.
[en] DEEP LEARNING APPLIED TO TEXT CHUNKING.
Degree: 2019, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=38016
► [pt] O Processamento de Linguagem natural é uma área de pesquisa que explora como computadores podem entender e manipular textos em linguagem natural. Dentre as…
(more)
▼ [pt] O Processamento de Linguagem natural é uma área
de pesquisa que explora como computadores podem entender e
manipular textos em linguagem natural. Dentre as tarefas mais
conhecidas em PLN está a de rotular sequências de texto. O problema
de segmentação de texto em sintagmas é um dos problemas que pode
ser abordado como rotulagem de sequências. Para isto, classificamos
quais palavras pertencem a um sintagma, onde cada sintagma
representa um grupo disjunto de palavras sintaticamente
correlacionadas. Este tipo de segmentação possui importantes
aplicações em tarefas mais complexas de processamento de linguagem
natural, como análise de dependências, tradução automática,
anotação de papéis semânticos, identificação de orações e outras. O
objetivo deste trabalho é apresentar uma arquitetura de rede neural
profunda para o problema de segmentação textual em sintagmas para a
língua portuguesa. O corpus usado nos experimentos é o Bosque, do
projeto Floresta Sintá(c)tica. Baseado em trabalhos recentes na
área, nossa abordagem supera o estado-da-arte para o português ao
alcançar um F(beta)=1 de 90,51, que corresponde a um aumento de
2,56 em comparação com o trabalho anterior. Além disso, como forma
de comprovar a qualidade do segmentador, usamos os rótulos obtidos
pelo nosso sistema como um dos atributos de entrada para a tarefa
de análise de dependências. Esses atributos melhoraram a acurácia
do analisador em 0,87.
[en] Natural Language Processing is a research field that explores how computers can understand and manipulate natural language texts. Sequence tagging is amongst the most well-known tasks in NLP. Text chunking is one of the problems that can be approached as a sequence tagging problem: we classify which words belong to a chunk, where each chunk represents a disjoint group of syntactically correlated words. This type of chunking has important applications in more complex natural language processing tasks, such as dependency parsing, machine translation, semantic role labeling, clause identification and more. The goal of this work is to present a deep neural network architecture for the Portuguese text chunking problem. The corpus used in the experiments is the Bosque, from the Floresta Sintá(c)tica project. Based on recent work in the field, our approach surpasses the state of the art for Portuguese by achieving an F(beta=1) of 90.51, which corresponds to an increase of 2.56 over the previous work. In addition, in order to attest to the chunker's effectiveness, we use the tags obtained by our system as features for the dependency parsing task. These features improved the accuracy of the parser by 0.87.
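To make the sequence-tagging framing concrete, the sketch below assigns an IOB chunk tag to every token using a trivial most-frequent-tag baseline on an invented two-sentence training set; the thesis instead trains a deep neural network on the Bosque corpus.

from collections import Counter, defaultdict

# Tiny invented training set: (token, IOB chunk tag) sequences.
train = [
    [("the", "B-NP"), ("parser", "I-NP"), ("improved", "B-VP"), ("accuracy", "B-NP")],
    [("the", "B-NP"), ("model", "I-NP"), ("tags", "B-VP"), ("each", "B-NP"), ("word", "I-NP")],
]

# Most-frequent-tag baseline: remember the commonest tag for every word form.
tag_counts = defaultdict(Counter)
for sentence in train:
    for token, chunk_tag in sentence:
        tag_counts[token][chunk_tag] += 1

def tag(tokens, default="O"):
    return [(t, tag_counts[t].most_common(1)[0][0] if t in tag_counts else default)
            for t in tokens]

print(tag(["the", "parser", "tags", "each", "sentence"]))
# [('the', 'B-NP'), ('parser', 'I-NP'), ('tags', 'B-VP'), ('each', 'B-NP'), ('sentence', 'O')]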
Advisors/Committee Members: SERGIO COLCHER.
Subjects/Keywords: [pt] APRENDIZADO DE MAQUINA; [en] MACHINE LEARNING; [pt] PROCESSAMENTO DE LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING; [pt] SEGMENTACAO TEXTUAL; [en] TEXT CHUNKING; [pt] APRENDIZADO PROFUNDO; [en] DEEP LEARNING
APA (6th Edition):
BRITO, M. M. D. (2019). [en] DEEP LEARNING APPLIED TO TEXT CHUNKING. (Thesis). Pontifical Catholic University of Rio de Janeiro. Retrieved from http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=38016
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
BRITO, MIGUEL MENDES DE. “[en] DEEP LEARNING APPLIED TO TEXT CHUNKING.” 2019. Thesis, Pontifical Catholic University of Rio de Janeiro. Accessed January 21, 2021.
http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=38016.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
BRITO, MIGUEL MENDES DE. “[en] DEEP LEARNING APPLIED TO TEXT CHUNKING.” 2019. Web. 21 Jan 2021.
Vancouver:
BRITO MMD. [en] DEEP LEARNING APPLIED TO TEXT CHUNKING. [Internet] [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2019. [cited 2021 Jan 21].
Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=38016.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
BRITO MMD. [en] DEEP LEARNING APPLIED TO TEXT CHUNKING. [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2019. Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=38016
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Pontifical Catholic University of Rio de Janeiro
27.
[No author].
[en] AN APPROACH TO ANSWERING NATURAL LANGUAGE QUESTIONS IN
PORTUGUESE FROM ONTOLOGIES AND KNOWLEDGE BASES.
Degree: 2020, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=47744
► [pt] Nos últimos anos temos visto o crescimento do volume de dados não estruturados gerados naWeb tradicional, e por isso aWeb Semântica nasceu como um…
(more)
▼ [pt] Nos últimos anos temos visto o crescimento do
volume de dados não estruturados gerados naWeb tradicional, e por
isso aWeb Semântica nasceu como um paradigma que se propõe a
estruturar o conteúdo da Web de uma forma flexível, por meio de
ontologias de domínio e o modelo RDF, tornando os computadores
capazes de processar automaticamente esses dados e possibilitando a
geração de mais informação e conhecimento. Mas para tornar estas
informações acessíveis para usuários de outros domínios, é
necessário que haja uma maneira mais conveniente de consultar estas
bases de conhecimento. A área de Processamento de Linguagem Natural
(PLN) forneceu ferramentas para permitir que a linguagem natural
(falada ou escrita) seja um meio conveniente para realizar
consultas em bases de conhecimento. Contudo, para que o uso da
linguagem natural seja realmente efetivo, é necessário um método
que converta uma pergunta ou pedido em linguagem natural em uma
consulta estruturada. Tendo em vista este objetivo, o presente
trabalho propõe uma abordagem que converte uma pergunta/pedido em
Português em uma consulta estruturada na linguagem SPARQL, por meio
do uso de árvores de dependências e ontologias estruturada em
grafos, e que também permite o enriquecimento dos resultados das
perguntas/pedidos por meio da geração de perguntas
relacionadas.
[en] In recent years we have seen the growth of the volume of unstructured data generated on the traditional Web. The Semantic Web was therefore born as a paradigm that proposes to structure the content of the Web flexibly, through domain ontologies and the RDF model, making computers capable of automatically processing this data and enabling the generation of more information and knowledge. However, to make this information accessible to users in other domains, there needs to be a more convenient way of querying these knowledge bases. The Natural Language Processing (NLP) area has provided tools that allow natural language, spoken or written, to be a convenient way to query knowledge bases. For the use of natural language to be truly effective, though, a method is required that converts a natural language question or request into a structured query. With this objective, the present work proposes an approach that converts a question/request in Portuguese into a structured query in the SPARQL language, through the use of dependency trees and ontologies structured as graphs, and that also enables the enrichment of question/request results by generating related questions.
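As a much-simplified stand-in for the question-to-SPARQL step (the dissertation uses dependency trees and a graph-structured ontology, not the regular-expression template shown here), the sketch below maps one hypothetical question pattern to a SPARQL string; the property names, template, and example are invented, DBpedia-style placeholders.

import re

# Hypothetical template: "quem escreveu X?" / "who wrote X?" -> query on an assumed author property.
TEMPLATE = re.compile(r"^(?:quem escreveu|who wrote) (?P<work>.+?)\??$", re.I)

def to_sparql(question):
    """Return a SPARQL query string for the matched template, or None."""
    m = TEMPLATE.match(question.strip())
    if not m:
        return None
    work = m.group("work").strip()
    return (
        "SELECT ?author WHERE {\n"
        f'  ?work rdfs:label "{work}"@pt .\n'
        "  ?work dbo:author ?author .\n"
        "}"
    )

print(to_sparql("Quem escreveu Dom Casmurro?"))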
Advisors/Committee Members: SIMONE DINIZ JUNQUEIRA BARBOSA.
Subjects/Keywords: [pt] ONTOLOGIA; [en] ONTOLOGY; [pt] WEB SEMANTICA; [en] SEMANTIC WEB; [pt] PROCESSAMENTO DE LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING; [pt] BASE DE CONHECIMENTO; [en] KNOWLEDGE BASES
APA (6th Edition):
author], [. (2020). [en] AN APPROACH TO ANSWERING NATURAL LANGUAGE QUESTIONS IN
PORTUGUESE FROM ONTOLOGIES AND KNOWLEDGE BASES. (Thesis). Pontifical Catholic University of Rio de Janeiro. Retrieved from http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=47744
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
author], [No. “[en] AN APPROACH TO ANSWERING NATURAL LANGUAGE QUESTIONS IN
PORTUGUESE FROM ONTOLOGIES AND KNOWLEDGE BASES.” 2020. Thesis, Pontifical Catholic University of Rio de Janeiro. Accessed January 21, 2021.
http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=47744.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
author], [No. “[en] AN APPROACH TO ANSWERING NATURAL LANGUAGE QUESTIONS IN
PORTUGUESE FROM ONTOLOGIES AND KNOWLEDGE BASES.” 2020. Web. 21 Jan 2021.
Vancouver:
author] [. [en] AN APPROACH TO ANSWERING NATURAL LANGUAGE QUESTIONS IN
PORTUGUESE FROM ONTOLOGIES AND KNOWLEDGE BASES. [Internet] [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2020. [cited 2021 Jan 21].
Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=47744.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
author] [. [en] AN APPROACH TO ANSWERING NATURAL LANGUAGE QUESTIONS IN
PORTUGUESE FROM ONTOLOGIES AND KNOWLEDGE BASES. [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2020. Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=47744
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Pontifical Catholic University of Rio de Janeiro
28.
[No author].
[pt] SEGMENTAÇÃO SEMÂNTICA DE VAGAS DE EMPREGO: ESTUDO
COMPARATIVO DE ALGORITMOS CLÁSSICOS DE APRENDIZADO DE
MÁQUINA.
Degree: 2020, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=49087
► [pt] This work demonstrates how web mining, natural language processing, and machine learning can be combined to improve the understanding of job…
(more)
▼ [pt] This work demonstrates how web mining,
natural language processing, and machine learning can be combined
to improve the understanding of job postings by semantically
segmenting the texts of their descriptions. To this end, textual
data were collected from three major job-posting sites: Catho,
LinkedIn, and VAGAS.com.br. Based on the literature, this work
proposes a simplified semantic structure in which each sentence of a
job description can belong to one of these classes:
Responsibilities, Requirements, Benefits, and Others. With this
idea, the semantic segmentation task can be recast as sentence
segmentation followed by classification. Using Python as the tool,
several ways of building features from text, both lexical and
semantic, are tried out, along with four classic machine learning
algorithms: Naive Bayes, Logistic Regression, Support Vector
Machine, and Random Forest. As a result, this work delivers a
classifier (Logistic Regression with binary representation) with
95.58 percent accuracy, without overfitting and without degrading
the classifications due to class imbalance, which is comparable to
the state of the art for Text Classification. This classifier was
trained and validated on Catho data, but it was also tested on data
from VAGAS.com.br (88.60 percent) and LinkedIn (91.14 percent),
providing evidence that what it learned generalizes to data from
other sites. In addition, the classifier was used for semantic
segmentation of the job postings and obtained a Pk metric of 3.67
percent and a WindowDiff metric of 4.78 percent, which is comparable
to the state of the art in Text Segmentation. Finally, two indirect
contributions of this work are worth highlighting: 1) a structure
for thinking about and analyzing job postings, and 2) an indication
that classic algorithms can also reach the state of the art and
therefore should always be tried.
[en] This dissertation demonstrates how web mining,
natural language processing, and machine learning can be combined
to improve understanding of job openings by semantically segmenting
the texts of their descriptions. To achieve this purpose, textual
data were collected from three major job sites: Catho, LinkedIn and
VAGAS.com.br. Based on the literature, this work proposes a
simplified semantic structure in which each sentence of the job
description can belong to one of these classes: Responsibilities,
Requirements, Benefits and Others. With this idea, the semantic
segmentation task can be rethought as a sentence segmentation
followed by a classification. Using Python as a tool, some ways of
constructing features from texts are tried out, both lexical and
semantic, and four classic machine learning algorithms: Naïve
Bayes, Logistic Regression, Support Vector Machine, and Random
Forest. As a result, this work presents a classifier (Logistic
Regression with binary representation) with 95.58 percent…
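As a hedged sketch of the classification stage described above (not code from the dissertation), the snippet below wires up the best-performing combination the abstract reports, a binary bag-of-words representation with Logistic Regression, using scikit-learn; the sentences and labels are invented placeholders rather than the Catho data.

# Rough sketch (assumed stack: scikit-learn) of the "binary
# representation + Logistic Regression" sentence classifier.
# Training examples below are invented, not from the collected corpora.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Elaborar relatórios gerenciais semanais",            # Responsibilities
    "Experiência com Python e SQL",                       # Requirements
    "Vale refeição e plano de saúde",                     # Benefits
    "Empresa localizada no centro do Rio de Janeiro",     # Others
]
labels = ["responsibilities", "requirements", "benefits", "others"]

# binary=True keeps only the presence/absence of each token (the
# "binary representation"), instead of raw counts or tf-idf weights.
model = make_pipeline(
    CountVectorizer(binary=True),
    LogisticRegression(max_iter=1000),
)
model.fit(sentences, labels)

print(model.predict(["Conhecimento em aprendizado de máquina é desejável"]))

The boundaries implied by such sentence-level labels could then be scored with the Pk and WindowDiff segmentation metrics cited in the abstract, for instance via NLTK's nltk.metrics.segmentation module.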
Advisors/Committee Members: EDUARDO SANY LABER.
Subjects/Keywords: [pt] APRENDIZADO DE MAQUINA; [pt] VAGA DE EMPREGO; [pt] PROCESSAMENTO DE LINGUAGEM NATURAL; [en] MACHINE LEARNING; [en] JOB VACANCIES; [en] NATURAL LANGUAGE PROCESSING
APA (6th Edition):
[No author]. (2020). [pt] SEGMENTAÇÃO SEMÂNTICA DE VAGAS DE EMPREGO: ESTUDO
COMPARATIVO DE ALGORITMOS CLÁSSICOS DE APRENDIZADO DE
MÁQUINA. (Thesis). Pontifical Catholic University of Rio de Janeiro. Retrieved from http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=49087
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
[No author]. “[pt] SEGMENTAÇÃO SEMÂNTICA DE VAGAS DE EMPREGO: ESTUDO
COMPARATIVO DE ALGORITMOS CLÁSSICOS DE APRENDIZADO DE
MÁQUINA.” 2020. Thesis, Pontifical Catholic University of Rio de Janeiro. Accessed January 21, 2021.
http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=49087.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
[No author]. “[pt] SEGMENTAÇÃO SEMÂNTICA DE VAGAS DE EMPREGO: ESTUDO
COMPARATIVO DE ALGORITMOS CLÁSSICOS DE APRENDIZADO DE
MÁQUINA.” 2020. Web. 21 Jan 2021.
Vancouver:
[No author]. [pt] SEGMENTAÇÃO SEMÂNTICA DE VAGAS DE EMPREGO: ESTUDO
COMPARATIVO DE ALGORITMOS CLÁSSICOS DE APRENDIZADO DE
MÁQUINA. [Internet] [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2020. [cited 2021 Jan 21].
Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=49087.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
[No author]. [pt] SEGMENTAÇÃO SEMÂNTICA DE VAGAS DE EMPREGO: ESTUDO
COMPARATIVO DE ALGORITMOS CLÁSSICOS DE APRENDIZADO DE
MÁQUINA. [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2020. Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=49087
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

McMaster University
29.
Cui, Yexin.
Language Identification on Short Textual Data.
Degree: MASc, 2020, McMaster University
URL: http://hdl.handle.net/11375/25126
► Language identification is the task of automatically detecting the language(s) in which a given text or document is written, and is also the very first step…
▼ Language identification is the task of automatically detecting the language(s) in which a given text or document is written, and it is also the very first step of further natural language processing tasks. The task has been well studied over past decades; however, most work has focused on long texts rather than short ones, which have proved more challenging due to the insufficiency of syntactic and semantic information. In this work, we present approaches to this problem based on deep learning techniques, on traditional methods, and on their combination. The proposed ensemble model, composed of a learning-based method and a dictionary-based method, achieves 89.6% accuracy on our newly generated gold test set, surpassing the Google Translate API by 3.7% and the industry-leading tool langid.py by 26.1%.
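No implementation accompanies this record; the sketch below is only a rough, assumed illustration of the kind of ensemble the abstract names, pairing a learning-based identifier (the off-the-shelf langid.py package) with a dictionary-based vote. The stop-word lists and the combination rule are invented for this example and are not the thesis's actual components.

# Rough sketch of a two-component ensemble for short-text language
# identification: a trained model (langid.py) plus a dictionary vote.
# Word lists and the combination rule are illustrative placeholders.
import langid

STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "pt": {"de", "que", "não", "uma", "para"},
    "fr": {"le", "les", "est", "une", "pour"},
}

def dictionary_vote(text):
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def identify(text):
    model_lang, _score = langid.classify(text)  # (language, raw score)
    dict_lang = dictionary_vote(text)
    # Toy rule: for very short inputs, prefer a decisive dictionary vote;
    # otherwise trust the learned model.
    if dict_lang is not None and len(text.split()) <= 3:
        return dict_lang
    return model_lang

print(identify("thanks for the help"), identify("obrigado pela ajuda"))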
Thesis
Master of Applied Science (MASc)
Advisors/Committee Members: Chen, Jun, Electrical and Computer Engineering.
Subjects/Keywords: Natural Language Processing; Language identification; Textual data
APA (6th Edition):
Cui, Y. (2020). Language Identification on Short Textual Data. (Masters Thesis). McMaster University. Retrieved from http://hdl.handle.net/11375/25126
Chicago Manual of Style (16th Edition):
Cui, Yexin. “Language Identification on Short Textual Data.” 2020. Masters Thesis, McMaster University. Accessed January 21, 2021.
http://hdl.handle.net/11375/25126.
MLA Handbook (7th Edition):
Cui, Yexin. “Language Identification on Short Textual Data.” 2020. Web. 21 Jan 2021.
Vancouver:
Cui Y. Language Identification on Short Textual Data. [Internet] [Masters thesis]. McMaster University; 2020. [cited 2021 Jan 21].
Available from: http://hdl.handle.net/11375/25126.
Council of Science Editors:
Cui Y. Language Identification on Short Textual Data. [Masters Thesis]. McMaster University; 2020. Available from: http://hdl.handle.net/11375/25126

Pontifical Catholic University of Rio de Janeiro
30.
EDUARDO DE JESUS COELHO REIS.
[en] MORPHOSYNTACTIC ANNOTATION BASED ON MORPHOLOGICAL
CONTEXT.
Degree: 2016, Pontifical Catholic University of Rio de Janeiro
URL: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28461
► [pt] Labeling the grammatical classes along a sentence - part-of-speech tagging - is one of the first natural language processing tasks, providing…
(more)
▼ [pt] Labeling the grammatical classes along a
sentence - part-of-speech tagging - is one of the first natural
language processing tasks, providing important features for carrying
out higher-complexity tasks. Word-level text representation has been
widely adopted, either through a conventional sparse codification,
e.g. bag-of-words, or through a distributed representation, such as
the sophisticated word-embedding models used to describe syntactic
and semantic information. An important problem with this kind of
codification is the lack of morphological aspects. Moreover, current
systems present a per-token accuracy around 97 percent, but, when
evaluated per sentence, they show a more modest result, with an
accuracy around 55-57 percent. In this work, we demonstrate how to
use n-grams to automatically derive sparse, morphological features
for text processing. This representation allows neural networks to
perform the POS tagging task from a character-level representation.
Furthermore, we introduce a regularization strategy capable of
selecting specific features for each neuron. Embedding this
regularization in our models produces two variants. The first shares
the globally selected n-grams among all neurons of a layer, while
the second performs an individual selection for each neuron, so that
each neuron is sensitive only to the n-grams that stimulate it the
most. Using the presented approach, we generate a large number of
features that represent relevant character-level morphosyntactic
characteristics. Our POS tagger reaches 96.67 percent accuracy on
the Mac-Morpho corpus for Portuguese.
[en] Part-of-speech tagging is one of the primary
stages in natural language processing, providing useful features
for performing higher complexity tasks. Word level representations
have been largely adopted, either through a conventional sparse
codification, such as bag-of-words, or through a distributed
representation, like the sophisticated word embedded models used to
describe syntactic and semantic information. A central issue on
these codifications is the lack of morphological aspects. In
addition, recent taggers present per-token accuracies around 97
percent. However, when using a per-sentence metric, even good taggers
show modest accuracies, scoring around 55-57 percent. In this work,
we demonstrate how to use n-grams to automatically derive
morphological sparse features for text processing. This
representation allows neural networks to perform POS tagging from a
character-level input. Additionally, we introduce a regularization
strategy capable of selecting specific features for each layer
unit. As a result, regarding n-grams selection, using the embedded
regularization in our models produces two variants. The first one
shares globally selected features among all layer units, whereas
the second operates individual…
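To make the character-level n-gram idea concrete, a hedged sketch follows: sparse character n-gram features are derived per token with scikit-learn and fed to a plain classifier. It stands in for, and is far simpler than, the regularized neural tagger the abstract describes; the toy tokens and tags are invented rather than drawn from the Mac-Morpho corpus.

# Rough sketch (not the thesis model): character n-gram features per
# token, showing the kind of morphological signal (suffixes such as
# "-ndo", "-mente", plural "-s") a character-level POS tagger can use.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tokens = ["correndo", "correu", "rapidamente", "casa", "casas", "bonito"]
tags   = ["VERB",     "VERB",   "ADV",         "NOUN", "NOUN",  "ADJ"]

# char_wb n-grams of length 2-4 produce sparse, morphology-aware features.
features = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))
tagger = make_pipeline(features, LogisticRegression(max_iter=1000))
tagger.fit(tokens, tags)

print(tagger.predict(["falando", "lentamente"]))  # likely VERB, ADV from shared suffixes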
Advisors/Committee Members: RUY LUIZ MILIDIU.
Subjects/Keywords: [pt] REDES NEURAIS; [en] NEURAL NETWORKS; [pt] PROCESSAMENTO DE LINGUAGEM NATURAL; [en] NATURAL LANGUAGE PROCESSING; [pt] PART OF SPEECH TAGGING; [pt] REPRESENTACAO MORFOLOGICA; [pt] N GRAMS; [pt] REGULARIZACAO ESPARSA
APA (6th Edition):
REIS, E. D. J. C. (2016). [en] MORPHOSYNTACTIC ANNOTATION BASED ON MORPHOLOGICAL
CONTEXT. (Thesis). Pontifical Catholic University of Rio de Janeiro. Retrieved from http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28461
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
REIS, EDUARDO DE JESUS COELHO. “[en] MORPHOSYNTACTIC ANNOTATION BASED ON MORPHOLOGICAL
CONTEXT.” 2016. Thesis, Pontifical Catholic University of Rio de Janeiro. Accessed January 21, 2021.
http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28461.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
REIS, EDUARDO DE JESUS COELHO. “[en] MORPHOSYNTACTIC ANNOTATION BASED ON MORPHOLOGICAL
CONTEXT.” 2016. Web. 21 Jan 2021.
Vancouver:
REIS EDJC. [en] MORPHOSYNTACTIC ANNOTATION BASED ON MORPHOLOGICAL
CONTEXT. [Internet] [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2016. [cited 2021 Jan 21].
Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28461.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
REIS EDJC. [en] MORPHOSYNTACTIC ANNOTATION BASED ON MORPHOLOGICAL
CONTEXT. [Thesis]. Pontifical Catholic University of Rio de Janeiro; 2016. Available from: http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28461
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
◁ [1] [2] [3] [4] [5] … [4430] ▶