You searched for +publisher:"University of Illinois – Urbana-Champaign" +contributor:("Roth, Dan")
Showing records 1 – 30 of 67 total matches.

University of Illinois – Urbana-Champaign
1.
Muddireddy, Pavankumar Reddy.
Fine-grained entity typing system - design and analysis.
Degree: MS, Electrical and Computer Engineering, 2018, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/101214
Named entity recognition (NER) is a natural language processing (NLP) task that involves identifying mentions (spans of text) denoting entities in a given text document and assigning them a semantic category/type from a given taxonomy. It is considered to be one of the fundamental tasks in NLP and forms the basis for higher-level understanding. In this thesis, we deal with fine-grained entity type recognition, a variant of the classic NER task in which the usual types are subdivided into fine-grained types. We show that current approaches, which rely only on local context, are insufficient to fully solve the problem. We systematically identify the fundamental challenges and misconceptions that underlie the assumptions, approaches, and evaluation methodologies of this task and propose improvements and alternatives. We do this by first analyzing the role of context and background knowledge in the task of fine-grained entity typing. Second, we introduce a modular architecture for fine-grained typing of entities and show that a rather simple instantiation of these modules reaches state-of-the-art performance.
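As a toy illustration of how context drives fine-grained typing, the following Python sketch refines a coarse type into a subtype using context cue words; the taxonomy, cue sets, and scoring rule are invented for this example and are not the architecture described in the thesis.

# Toy illustration of fine-grained entity typing: a coarse NER type is refined
# into fine-grained subtypes using context words. All types, keywords, and the
# scoring rule are invented; the thesis's modular system is far richer.
TAXONOMY = {
    "person": ["person/athlete", "person/politician", "person/artist"],
}
TYPE_CUES = {
    "person/athlete":    {"scored", "match", "team", "league"},
    "person/politician": {"elected", "senate", "campaign", "vote"},
    "person/artist":     {"album", "exhibition", "painted", "stage"},
}

def fine_type(coarse_type, context_tokens):
    """Pick the fine-grained subtype whose cue words overlap the context most."""
    candidates = TAXONOMY[coarse_type]
    overlap = lambda t: len(TYPE_CUES[t] & set(context_tokens))
    return max(candidates, key=overlap)

print(fine_type("person", "she was elected after a long campaign".split()))
# -> person/politician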
Advisors/Committee Members: Roth, Dan (advisor).
Subjects/Keywords: named entity recognition; NER; fine-grained named entity recognition; finet; figer; information retrieval; entity typing; fine-grained typing
APA (6th Edition):
Muddireddy, P. R. (2018). Fine-grained entity typing system - design and analysis. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/101214
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Muddireddy, Pavankumar Reddy. “Fine-grained entity typing system - design and analysis.” 2018. Thesis, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/101214.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Muddireddy, Pavankumar Reddy. “Fine-grained entity typing system - design and analysis.” 2018. Web. 01 Mar 2021.
Vancouver:
Muddireddy PR. Fine-grained entity typing system - design and analysis. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2018. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/101214.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Muddireddy PR. Fine-grained entity typing system - design and analysis. [Thesis]. University of Illinois – Urbana-Champaign; 2018. Available from: http://hdl.handle.net/2142/101214
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Illinois – Urbana-Champaign
2.
Mangipudi, Bhargav.
Evaluating exact and approximate algorithms for integer linear programming formulations of MAP inference.
Degree: MS, Computer Science, 2017, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/99111
Structured prediction tasks involve an inference step that produces coherent label assignments over the output structure. This can be achieved by constraining the output using prior knowledge about the domain. This paradigm is called Constrained Conditional Models (CCM); it augments the learning of conditional models with declarative constraints. The MAP inference problem in the CCM framework can be solved by formulating it as an Integer Linear Programming (ILP) problem. This ILP formulation is generally relaxed to a Linear Programming problem by dropping the integrality constraints, which makes it tractable. In this work, we evaluate other approximate inference algorithms for the MAP estimate in structured prediction tasks in the CCM framework. We model the constrained structured prediction problem as a factor graph and use several graphical-model-based algorithms. We evaluate these methods for solution quality and computation time on NLP tasks of varying complexity. For large-scale problems, the tradeoff between inference time and the approximation quality of the solution is a crucial aspect. Furthermore, these inference solvers are provided as black-box implementations in Saul, a declarative programming language for structured prediction tasks.
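To make the inference problem concrete, the following Python sketch performs exact MAP inference for a tiny Constrained Conditional Model by scoring every assignment that satisfies a declarative constraint; the label set, scores, and constraint are invented, and real ILP solvers avoid the brute-force enumeration used here.

# Minimal sketch of MAP inference in a Constrained Conditional Model (CCM).
# Everything here (scores, the constraint, variable names) is a toy illustration,
# not the thesis's actual Saul/ILP implementation.
from itertools import product

LABELS = ["PER", "ORG", "O"]

# Local (unary) scores for three output variables, e.g. from a trained classifier.
local_scores = [
    {"PER": 2.0, "ORG": 0.5, "O": 0.1},
    {"PER": 0.3, "ORG": 1.8, "O": 0.4},
    {"PER": 0.2, "ORG": 0.2, "O": 1.5},
]

def satisfies_constraints(assignment):
    # Declarative prior knowledge: at least one variable must be a named entity.
    return any(label != "O" for label in assignment)

def exact_map(scores):
    """Exhaustive (exact) MAP inference; an ILP solver finds this without enumeration."""
    best, best_score = None, float("-inf")
    for assignment in product(LABELS, repeat=len(scores)):
        if not satisfies_constraints(assignment):
            continue
        score = sum(s[l] for s, l in zip(scores, assignment))
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

print(exact_map(local_scores))   # -> (('PER', 'ORG', 'O'), 5.3)

Approximate solvers of the kind evaluated in the thesis (e.g., message passing on the factor graph) trade the exactness of this answer for much lower inference time on large problems.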
Advisors/Committee Members: Roth, Dan (advisor).
Subjects/Keywords: Structured inference; Constrained conditional models; Graphical models
APA (6th Edition):
Mangipudi, B. (2017). Evaluating exact and approximate algorithms for integer linear programming formulations of MAP inference. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/99111
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Mangipudi, Bhargav. “Evaluating exact and approximate algorithms for integer linear programming formulations of MAP inference.” 2017. Thesis, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/99111.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Mangipudi, Bhargav. “Evaluating exact and approximate algorithms for integer linear programming formulations of MAP inference.” 2017. Web. 01 Mar 2021.
Vancouver:
Mangipudi B. Evaluating exact and approximate algorithms for integer linear programming formulations of MAP inference. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2017. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/99111.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Mangipudi B. Evaluating exact and approximate algorithms for integer linear programming formulations of MAP inference. [Thesis]. University of Illinois – Urbana-Champaign; 2017. Available from: http://hdl.handle.net/2142/99111
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Illinois – Urbana-Champaign
3.
Jiang, Yiming.
Improvements and augmentations to Learning Based Java: a Java based learning based programming language.
Degree: MS, Computer Science, 2016, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/90827
Machine Learning (ML) is the science of enabling computers to learn without being explicitly programmed. ML is pervasive today, with applications in speech recognition, recommendation systems, fraud detection, and many more that we may not be aware of. To facilitate a rapid pace of development, it is important to create a framework with modularity and reusability. Learning Based Java (LBJava) was introduced by the Cognitive Computation Group (CCG) to achieve this goal.
This thesis extends and introduces multiple components in LBJava. We begin with a comprehensive literature review related to Learning Based Programming (LBP) and LBJava.
We then introduce regression evaluation metrics to LBJava. In addition, we introduce Adaptive Subgradient (AdaGrad) learning for regression, and add a comprehensive tutorial with an example on regression. Furthermore, we extend both the SGD and AdaGrad algorithms to classification, and evaluate various learning algorithms, with sparse and dense features, using large programmatically generated datasets.
Moreover, we introduce neural networks, in particular the multilayer perceptron (MLP), to LBJava, and report some miscellaneous supporting work.
Lastly, we summarize the extended and added components and provide recommendations for future work.
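LBJava itself is a Java framework; purely as an illustration of the AdaGrad-for-regression update mentioned above, here is a self-contained Python sketch (the toy data and all names are hypothetical, and this is not LBJava code).

import math
import random

def adagrad_regression(examples, dim, lr=0.5, eps=1e-8, epochs=20):
    """Fit w to minimize squared error with per-feature adaptive learning rates (AdaGrad)."""
    w = [0.0] * dim
    g_sq = [0.0] * dim          # running sum of squared gradients, one per feature
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            grad = [(pred - y) * xi for xi in x]     # gradient of 0.5*(pred-y)^2
            for i in range(dim):
                g_sq[i] += grad[i] ** 2
                w[i] -= lr * grad[i] / (math.sqrt(g_sq[i]) + eps)
    return w

# Toy data: y = 2*x0 - x1 with a little noise.
random.seed(0)
data = [((x0, x1), 2 * x0 - x1 + random.gauss(0, 0.01))
        for x0, x1 in [(random.random(), random.random()) for _ in range(200)]]
print(adagrad_regression(data, dim=2))   # approximately [2.0, -1.0]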
Advisors/Committee Members: Roth, Dan (advisor).
Subjects/Keywords: Machine Learning
APA (6th Edition):
Jiang, Y. (2016). Improvements and augmentations to Learning Based Java: a Java based learning based programming language. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/90827
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Jiang, Yiming. “Improvements and augmentations to Learning Based Java: a Java based learning based programming language.” 2016. Thesis, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/90827.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Jiang, Yiming. “Improvements and augmentations to Learning Based Java: a Java based learning based programming language.” 2016. Web. 01 Mar 2021.
Vancouver:
Jiang Y. Improvements and augmentations to Learning Based Java: a Java based learning based programming language. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2016. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/90827.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Jiang Y. Improvements and augmentations to Learning Based Java: a Java based learning based programming language. [Thesis]. University of Illinois – Urbana-Champaign; 2016. Available from: http://hdl.handle.net/2142/90827
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Illinois – Urbana-Champaign
4.
Duncan, Chase.
A study of coherence in entity linking.
Degree: MS, Computer Science, 2018, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/102393
Entity linking (EL) is the task of mapping entities, such as persons, locations, organizations, etc., in text to a corresponding record in a knowledge base (KB) like Wikipedia or Freebase. In this thesis we present, for the first time, a controlled study of one aspect of this problem called coherence. Further, we show that many state-of-the-art models for EL reduce to the same basic architecture. Based on this general model we suggest that any system can theoretically benefit from using coherence, although most do not. Our experimentation suggests that this is because the common approaches to measuring coherence among entities produce only weak signals. Therefore, we argue that the way forward for research into coherence in EL is not to seek new methods for performing inference but rather better methods for representing and comparing entities based on existing structured data resources such as DBpedia and Wikidata.
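The "same basic architecture" referred to above can be sketched as a joint objective that combines per-mention local scores with a pairwise coherence term; the candidates, scores, and weight below are toy values, not the systems studied in the thesis.

# Schematic form of a general entity-linking model: each mention gets a local
# (mention-candidate) score, plus a global coherence term over the jointly
# chosen entities. All scores here are invented toy numbers.
from itertools import product

candidates = {
    "Gore":     ["Al_Gore", "Gore,_Virginia"],
    "Missouri": ["Missouri", "Missouri_River"],
}
local = {("Gore", "Al_Gore"): 0.9, ("Gore", "Gore,_Virginia"): 0.2,
         ("Missouri", "Missouri"): 0.7, ("Missouri", "Missouri_River"): 0.4}
coherence = {frozenset({"Al_Gore", "Missouri"}): 0.6}   # default 0 for other pairs

def link(mentions, weight=1.0):
    """Jointly choose one candidate per mention, trading local score vs. coherence."""
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[m] for m in mentions)):
        score = sum(local[(m, e)] for m, e in zip(mentions, choice))
        for i in range(len(choice)):
            for j in range(i + 1, len(choice)):
                score += weight * coherence.get(frozenset({choice[i], choice[j]}), 0.0)
        if score > best_score:
            best, best_score = choice, score
    return best

print(link(["Gore", "Missouri"]))   # -> ('Al_Gore', 'Missouri')

The thesis's observation is that, in models of this shape, the coherence term as usually computed adds only a weak signal on top of the local scores.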
Advisors/Committee Members: Roth, Dan (advisor).
Subjects/Keywords: Entity Linking; Machine Learning; NLP
APA (6th Edition):
Duncan, C. (2018). A study of coherence in entity linking. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/102393
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Duncan, Chase. “A study of coherence in entity linking.” 2018. Thesis, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/102393.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Duncan, Chase. “A study of coherence in entity linking.” 2018. Web. 01 Mar 2021.
Vancouver:
Duncan C. A study of coherence in entity linking. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2018. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/102393.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Duncan C. A study of coherence in entity linking. [Thesis]. University of Illinois – Urbana-Champaign; 2018. Available from: http://hdl.handle.net/2142/102393
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Illinois – Urbana-Champaign
5.
Chen, Liang-Wei.
Extending Wikification: Nominal discovery, nominal linking, and the grounding of nouns.
Degree: MS, Computer Science, 2017, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/99195
Mention discovery, entity linking, and grounding are crucial steps in natural language understanding. Compared with named entities, the detection and linking of nominals have been relatively little studied, yet they are essential, since the grounding of nouns enriches the information available to readers of a document. In this thesis, we address these problems by extending the Illinois Cross-lingual Wikifier with nominal linking and sense disambiguation. We train a nominal detector with dictionary-based post-processing to discover nominal mentions and classify them into predefined type categories. For nominal linking, we propose a coreference model that captures pairwise features between a named entity and a nominal, and we integrate it with several linking heuristics. Finally, we ground nouns to their Wikipedia titles by adjusting the ranker of the Wikifier with extra features and training on common nouns. Our proposed approaches show competitive performance on benchmark datasets.
Advisors/Committee Members: Roth, Dan (advisor).
Subjects/Keywords: Wikification; Nominal entity recognition; Nominal entity disambiguation; Concept disambiguation; Natural language processing
APA (6th Edition):
Chen, L. (2017). Extending Wikification: Nominal discovery, nominal linking, and the grounding of nouns. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/99195
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Chen, Liang-Wei. “Extending Wikification: Nominal discovery, nominal linking, and the grounding of nouns.” 2017. Thesis, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/99195.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Chen, Liang-Wei. “Extending Wikification: Nominal discovery, nominal linking, and the grounding of nouns.” 2017. Web. 01 Mar 2021.
Vancouver:
Chen L. Extending Wikification: Nominal discovery, nominal linking, and the grounding of nouns. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2017. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/99195.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Chen L. Extending Wikification: Nominal discovery, nominal linking, and the grounding of nouns. [Thesis]. University of Illinois – Urbana-Champaign; 2017. Available from: http://hdl.handle.net/2142/99195
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Illinois – Urbana-Champaign
6.
Rozovskaya, Alla.
Automated methods for text correction.
Degree: PhD, 0301 0301, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/46875
Development of automatic text correction systems has a long history in natural language processing research. This thesis considers the problem of correcting writing mistakes made by non-native English speakers. We address several types of errors commonly exhibited by non-native English writers – misuse of articles, prepositions, noun number, and verb properties – and build a robust, state-of-the-art system that combines machine learning methods and linguistic knowledge.
The proposed approach is distinguished from other related work in several respects. First, several machine learning methods are compared to determine which methods are most effective for this problem. Earlier evaluations, because they are based on incomparable data sets, have questionable conclusions. Our results reverse these conclusions and pave the way for the next contribution.
Using the important observation that mistakes made by non-native writers are systematic, we develop models that utilize knowledge about error regularities with minimal annotation costs. Our approach differs from earlier ones that either built models that had no knowledge about error regularities or required a lot of annotated data.
Next, we develop special strategies for correcting errors on open-class words. These errors, while being very prevalent among non-native English speakers, are the least studied and are not well-understood linguistically. The challenges that these mistakes present are addressed in a linguistically-informed approach.
Finally, a novel global approach to error correction is proposed that considers grammatical dependencies among error types and addresses these via joint learning and joint inference. The systems and techniques described in this thesis are evaluated empirically and competitively in the context of several shared tasks, where they have demonstrated superior performance. In particular, our system ranked first in the most prestigious competition in the natural language processing field, the CoNLL-2013 shared task on text correction. Based on the analysis of this system, four design principles that are crucial for building a state-of-the-art error correction system are identified.
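One common way to exploit the "error regularities" idea, sketched below in Python, is to inject artificial errors into well-formed text according to an assumed confusion distribution over articles; the probabilities are invented and the sketch only illustrates the general idea, not the thesis's exact method.

# Illustration of exploiting error regularities when training a corrector:
# inject artificial article errors into clean text with probabilities that mimic
# how often learners confuse each article. The confusion probabilities are made up.
import random

CONFUSION = {                        # P(writer produces column | correct row); "" = missing article
    "a":   {"a": 0.90, "the": 0.07, "": 0.03},
    "the": {"a": 0.05, "the": 0.90, "": 0.05},
    "":    {"a": 0.02, "the": 0.04, "": 0.94},
}

def corrupt(correct_article, rng=random):
    """Sample the article a (simulated) learner would write given the correct one."""
    r, acc = rng.random(), 0.0
    for written, p in CONFUSION[correct_article].items():
        acc += p
        if r < acc:
            return written
    return correct_article

random.seed(1)
pairs = [(a, corrupt(a)) for a in ["the"] * 10 + ["a"] * 10]
print(pairs)   # (correct, written) training pairs for an article-correction classifier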
Advisors/Committee Members: Roth, Dan (advisor), Cole, Jennifer S. (Committee Chair), Roth, Dan (committee member), Hockenmaier, Julia C. (committee member), Hirst, Graeme (committee member).
Subjects/Keywords: text correction; grammatical error correction; English as a second language (ESL) error correction; automated methods for text correction
APA (6th Edition):
Rozovskaya, A. (2014). Automated methods for text correction. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/46875
Chicago Manual of Style (16th Edition):
Rozovskaya, Alla. “Automated methods for text correction.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/46875.
MLA Handbook (7th Edition):
Rozovskaya, Alla. “Automated methods for text correction.” 2014. Web. 01 Mar 2021.
Vancouver:
Rozovskaya A. Automated methods for text correction. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/46875.
Council of Science Editors:
Rozovskaya A. Automated methods for text correction. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/46875

University of Illinois – Urbana-Champaign
7.
Ratinov, Lev.
Exploiting knowledge in NLP.
Degree: PhD, 0112, 2012, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/31198
In recent decades, society has come to depend more and more on computers for a large number of tasks. The first steps in NLP applications involve identification of topics, entities, concepts, and relations in text. Traditionally, statistical models have been successfully deployed for these problems. However, the major trend so far has been “scaling up by dumbing down”: applying sophisticated statistical algorithms that operate on very simple or low-level features of the text. This trend is also exemplified by expressions such as "we present a knowledge-lean approach", which have traditionally been viewed as a positive statement, one that will help papers get into top conferences. This thesis suggests that it is essential to use knowledge in NLP, proposes several ways of doing so, and provides case studies on several fundamental NLP problems.
It is clear that humans use a lot of knowledge when understanding text. Consider the text "Carnahan campaigned with Al Gore whenever the vice president was in Missouri." and ask two questions: (1) who is the vice president? (2) is this sentence about politics or sports? A knowledge-lean NLP approach will have great difficulty answering the first question, and will require a lot of training data to answer the second one. People, on the other hand, can answer both questions effortlessly.
We are not the first to suggest that NLP requires knowledge. One of the first such large-scale efforts, CYC, started in 1984, and by 1995 had consumed a person-century of effort collecting 100,000 concepts and 1,000,000 commonsense axioms, including "You can usually see people's noses, but not their hearts". Unfortunately, such an effort has several problems. (a) The set of facts we can deduce is significantly larger than 1M. For example, in the axiom above, "heart" can be replaced by any internal organ or tissue, as well as by a bank account, thoughts, etc., leading to thousands of axioms. (b) The axioms often do not hold. For example, if a person is standing with their back to you, you cannot see their nose, and during open-heart surgery, you can see someone's heart. (c) Matching the concepts to natural-language expressions is challenging. For example, "Al Gore" can be referred to as "Democrat", "environmentalist", "vice president", or "Nobel prize laureate", among other things. The idea of "buying a used car" can also be expressed as "purchasing a pre-owned automobile". Lexical variability in text makes using knowledge challenging. Instead of focusing on obtaining a large set of logic axioms, we focus on using knowledge-rich features in NLP solutions.
We have used three sources of knowledge: a large corpus of unlabeled text, encyclopedic knowledge derived from Wikipedia, and first-order-logic-like constraints within a machine learning framework. Namely, we have developed a Named Entity Recognition system which uses word representations induced from unlabeled text and gazetteers extracted from Wikipedia to achieve new state of the art…
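As an illustration of the kind of knowledge-rich features the thesis advocates, the Python sketch below builds gazetteer-membership and word-cluster features for a token; the gazetteers and cluster ids are toy stand-ins for the Wikipedia-derived and corpus-derived resources described above.

# Sketch of knowledge-rich NER features: gazetteer membership (a tiny list
# standing in for Wikipedia-derived gazetteers) and word-cluster ids (standing
# in for representations induced from unlabeled text). All resources are toy stand-ins.
GAZETTEERS = {
    "politicians": {"al gore", "carnahan"},
    "us_states":   {"missouri", "illinois"},
}
WORD_CLUSTERS = {"gore": "0110", "missouri": "0111", "campaigned": "1010"}

def token_features(tokens, i):
    """Feature dict for token i: surface form, cluster prefix, gazetteer hits."""
    word = tokens[i].lower()
    feats = {"word=" + word: 1}
    cluster = WORD_CLUSTERS.get(word)
    if cluster:
        feats["cluster4=" + cluster[:4]] = 1
    for name, entries in GAZETTEERS.items():
        # unigram gazetteer match; real systems also match multi-word entries
        if word in {w for entry in entries for w in entry.split()}:
            feats["in_gazetteer=" + name] = 1
    return feats

print(token_features("Carnahan campaigned with Al Gore in Missouri".split(), 4))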
Advisors/Committee Members: Roth, Dan (advisor), Roth, Dan (Committee Chair), Han, Jiawei (committee member), Zhai, ChengXiang (committee member), Mihalcea, Rada (committee member).
Subjects/Keywords: Machine Learning; Natural language processing (NLP); Text Classification; Co-reference Resolution; Concept Disambiguation; Information Extraction; Named Entity Recognition; Semi-Supervised Learning.
APA (6th Edition):
Ratinov, L. (2012). Exploiting knowledge in NLP. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/31198
Chicago Manual of Style (16th Edition):
Ratinov, Lev. “Exploiting knowledge in NLP.” 2012. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/31198.
MLA Handbook (7th Edition):
Ratinov, Lev. “Exploiting knowledge in NLP.” 2012. Web. 01 Mar 2021.
Vancouver:
Ratinov L. Exploiting knowledge in NLP. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2012. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/31198.
Council of Science Editors:
Ratinov L. Exploiting knowledge in NLP. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2012. Available from: http://hdl.handle.net/2142/31198

University of Illinois – Urbana-Champaign
8.
Chang, Ming-Wei.
Structured prediction with indirect supervision.
Degree: PhD, 0112, 2012, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/29768
Structured tasks, which often involve many interdependent decisions for each example, are the backbone of many important applications, such as natural language processing tasks. The models built for structured tasks need to be capable of assigning values to a set of interdependent variables. In this thesis, we point out that the strong dependencies between the decisions in structured tasks can be exploited to simplify both the learning task and the annotation effort – it is sometimes possible to supply partial and indirect supervision to only some of the target variables, or to other variables that are derivatives of the target variables, and thus reduce the supervision effort significantly.
Based on this intuition, this thesis addresses the problem of reducing the cost of labeling for structured tasks. We tackle this problem by developing advanced machine learning algorithms that can learn and generalize from indirect supervision in addition to labeled data. Indirect supervision can come in the form of constraints or weaker supervision signals. Our proposed learning frameworks can handle both structured output problems and problems with latent structures. We demonstrate the effectiveness of the learning-with-indirect-supervision framework on many natural language processing tasks.
Advisors/Committee Members: Roth, Dan (advisor), Roth, Dan (Committee Chair), DeJong, Gerald F. (committee member), Hoiem, Derek W. (committee member), Smith, Noah (committee member).
Subjects/Keywords: Machine learning; natural language processing; structural learning; indirect supervision; constraint driven learning
APA (6th Edition):
Chang, M. (2012). Structured prediction with indirect supervision. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/29768
Chicago Manual of Style (16th Edition):
Chang, Ming-Wei. “Structured prediction with indirect supervision.” 2012. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/29768.
MLA Handbook (7th Edition):
Chang, Ming-Wei. “Structured prediction with indirect supervision.” 2012. Web. 01 Mar 2021.
Vancouver:
Chang M. Structured prediction with indirect supervision. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2012. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/29768.
Council of Science Editors:
Chang M. Structured prediction with indirect supervision. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2012. Available from: http://hdl.handle.net/2142/29768

University of Illinois – Urbana-Champaign
9.
Tsai, Chen-Tse.
Concept and entity grounding using indirect supervision.
Degree: PhD, Computer Science, 2017, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/98336
Extracting and disambiguating entities and concepts is a crucial step toward understanding natural language text. In this thesis, we consider the problem of grounding concepts and entities mentioned in text to one or more knowledge bases (KBs). A well-studied scenario of this problem is the one in which documents are given in English and the goal is to identify concept and entity mentions and find the corresponding Wikipedia entries they refer to. We extend this problem in two directions. First, we study identifying and grounding entities written in any language to the English Wikipedia. Second, we investigate using multiple KBs that do not contain the rich textual and structural information that Wikipedia does.
These more involved settings pose a few additional challenges beyond those addressed in the standard English Wikification problem. Key among them is that no supervision is available to facilitate training machine learning models. The first extension, cross-lingual Wikification, introduces problems such as recognizing multilingual named entities mentioned in text, translating non-English names into English, and computing word similarity across languages. Since it is impossible to acquire manually annotated examples for all languages, building models for all languages in Wikipedia requires exploring indirect or incidental supervision signals that already exist in Wikipedia. For the second setting, we need to deal with the fact that most KBs do not contain the rich information Wikipedia has; consequently, the main supervision signal used to train Wikification rankers no longer exists. In this thesis, we show that supervision signals can be obtained by carefully examining the redundancy and relations between multiple KBs. By developing algorithms and models that harvest these incidental signals, we can achieve better performance on these tasks.
Advisors/Committee Members: Roth, Dan (advisor), Roth, Dan (Committee Chair), Chang, Kevin (committee member), Zhai, ChengXiang (committee member), Mihalcea, Rada (committee member).
Subjects/Keywords: Wikification; Entity linking; Cross-lingual wikification; Named entity recognition; Indirect supervision; Incidental supervision; Entity disambiguation; Concept disambiguation
APA (6th Edition):
Tsai, C. (2017). Concept and entity grounding using indirect supervision. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/98336
Chicago Manual of Style (16th Edition):
Tsai, Chen-Tse. “Concept and entity grounding using indirect supervision.” 2017. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/98336.
MLA Handbook (7th Edition):
Tsai, Chen-Tse. “Concept and entity grounding using indirect supervision.” 2017. Web. 01 Mar 2021.
Vancouver:
Tsai C. Concept and entity grounding using indirect supervision. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2017. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/98336.
Council of Science Editors:
Tsai C. Concept and entity grounding using indirect supervision. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2017. Available from: http://hdl.handle.net/2142/98336

University of Illinois – Urbana-Champaign
10.
Pasternack, Jeffrey.
Knowing Who to Trust and What to Believe in the Presence of Conflicting Information.
Degree: PhD, 0112, 2012, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/29516
The Information Age has created an increasing abundance of data and has, thanks to the rise of the Internet, made that knowledge instantly available to humans and computers alike. This is not without caveats, however: though we may read a document, ask an expert, or locate a fact nearly effortlessly, we lack a ready means to determine whether we should actually believe them.
We seek to address this problem with a computational trust system capable of substituting for the user's informed, subjective judgment, with the understanding that truth is not objective and instead depends upon one's prior knowledge and beliefs, a philosophical point with deep practical implications.
First, however, we must consider the even more basic question of how the trustworthiness of an information source can be expressed: measuring the trustworthiness of a person, document, or publisher as the mere percentage of true claims it makes can be extraordinarily misleading at worst, and uninformative at best. Rather than providing simple accuracy, we provide a comprehensive set of trust metrics, calculating the source's truthfulness, completeness, and bias, and providing the user with our trust judgment in a way that is both understandable and actionable.
We then consider the trust algorithm itself, starting with the baseline of determining the truth by taking a simple vote that assumes all information sources are equally trustworthy. We quickly move on to fact-finders, iterative algorithms capable of estimating the trustworthiness of the source in addition to the believability of the claims, and proceed to incorporate increasing amounts of information and declarative prior knowledge into the fact-finder's trust decision via the Generalized and Constrained Fact-Finding frameworks, while still maintaining the relative simplicity and tractability of standard fact-finders.
Ultimately, we introduce Latent Trust Analysis, a new type of probabilistic trust model that provides the first strongly principled view of information trust and a wide array of advantages over preceding methods, with a semantically crisp generative story that explains how sources "generate" their assertions in claims. Such explanations can be used to justify trust decisions to the user, and, moreover, the transparent mechanics make the models highly flexible, e.g., by applying regularization via Bayesian prior probabilities. Furthermore, as probabilistic models they naturally support semi-supervised and supervised learning when the truth of some claims or the trustworthiness of some sources is already known, unlike fact-finders, which perform only unsupervised learning. Finally, with Generalized Constrained Models, a new structured learning technique, we can apply declarative prior knowledge to Latent Trust Analysis models just as we can with Constrained Fact-Finding.
Together, these trust algorithms create a spectrum of approaches that trade increasing complexity for greater information utilization, performance, and flexibility, although even the…
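A baseline fact-finder of the kind generalized in the thesis can be sketched in a few lines of Python: source trustworthiness and claim belief are estimated by alternating updates (a "Sums"-style iteration). The assertion data below is invented and the sketch omits the generalizations, constraints, and probabilistic models discussed above.

# Baseline iterative fact-finder: source trustworthiness and claim belief are
# estimated jointly by alternating updates. The assertion data is invented.
assertions = {            # source -> set of claims it asserts
    "site_a": {"c1", "c2"},
    "site_b": {"c1"},
    "site_c": {"c3"},
}
claims = {c for cs in assertions.values() for c in cs}

belief = {c: 1.0 for c in claims}
for _ in range(20):
    trust = {s: sum(belief[c] for c in cs) / len(cs) for s, cs in assertions.items()}
    belief = {c: sum(t for s, t in trust.items() if c in assertions[s]) for c in claims}
    top = max(belief.values())                 # normalize to keep values bounded
    belief = {c: b / top for c, b in belief.items()}

print(sorted(belief.items()))   # c1 (asserted by two sources) ends up most believed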
Advisors/Committee Members: Roth, Dan (advisor), Gil, Yolanda (committee member), Han, Jiawei (committee member), Zhai, ChengXiang (committee member).
Subjects/Keywords: trust; trustworthiness; information trustworthiness; information trust; comprehensive trust metrics; subjective truth; fact-finders; fact-finding; factfinders; factfinding; generalized fact-finding; generalized fact-finders; constrained fact-finding; constrained fact-finders; Generalized Constrained Models (GCMs); latent trust analysis; latent trustworthiness analysis; Latent Trust Analysis (LTA); belief; information filtering; structured learning; constrained structured learning; constrained learning
APA (6th Edition):
Pasternack, J. (2012). Knowing Who to Trust and What to Believe in the Presence of Conflicting Information. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/29516
Chicago Manual of Style (16th Edition):
Pasternack, Jeffrey. “Knowing Who to Trust and What to Believe in the Presence of Conflicting Information.” 2012. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/29516.
MLA Handbook (7th Edition):
Pasternack, Jeffrey. “Knowing Who to Trust and What to Believe in the Presence of Conflicting Information.” 2012. Web. 01 Mar 2021.
Vancouver:
Pasternack J. Knowing Who to Trust and What to Believe in the Presence of Conflicting Information. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2012. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/29516.
Council of Science Editors:
Pasternack J. Knowing Who to Trust and What to Believe in the Presence of Conflicting Information. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2012. Available from: http://hdl.handle.net/2142/29516

University of Illinois – Urbana-Champaign
11.
Zhao, Bo.
Truth finding in databases.
Degree: PhD, 0112, 2013, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/42470
In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this thesis, we propose probabilistic models that can automatically infer true records and source quality without any supervision, on both categorical and numerical data. We further develop a new entity matching framework that considers source quality based on truth-finding models.
On categorical data, in contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable, due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem on categorical data.
In practice, numerical data is not only ubiquitous but also of high value, e.g., prices, weather, census data, polls, and economic statistics. Quality issues on numerical data can also be even more common and severe than on categorical data due to its characteristics. Therefore, in this thesis we propose a new truth-finding method specially designed for handling numerical data. Based on Bayesian probabilistic models, our method can leverage the characteristics of numerical data in a principled way when modeling the dependencies among source quality, truth, and claimed values. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches in both effectiveness and efficiency.
We further observe that modeling source quality not only can help decide the truth but also can help match entities across different sources. Therefore, as a natural next step, we integrate truth finding with entity matching so that we can infer entity matches, true entity attributes, and source quality jointly. This is the first entity matching approach that involves modeling source quality and truth finding. Experiments show that our approach can outperform state-of-the-art baselines.
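The role of the two error types can be illustrated with a toy calculation: a claim asserted by a source with a low false-positive rate is stronger evidence than one asserted by a source that asserts liberally. The rates below are invented, and the thesis's actual models are richer (Bayesian, and they also account for sources that do not assert a claim).

# Toy version of the idea that two error types matter: each source has its own
# false-positive and false-negative rate, and a claim asserted by a source with a
# low false-positive rate is stronger evidence. Numbers are invented.
import math

sources = {            # name -> (false_positive_rate, false_negative_rate)
    "db_a": (0.10, 0.30),
    "db_b": (0.40, 0.05),
}

def log_odds_true(asserting, prior=0.5):
    """Log-odds that a claim is true, given only the sources asserting it."""
    lo = math.log(prior / (1 - prior))
    for name in asserting:
        fp, fn = sources[name]
        lo += math.log((1 - fn) / fp)   # likelihood ratio of this source asserting a true claim
    return lo

print(log_odds_true({"db_a"}), log_odds_true({"db_b"}))
# db_a's assertion counts for more: it rarely asserts false facts (low FP rate)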
Advisors/Committee Members: Han, Jiawei (advisor), Han, Jiawei (Committee Chair), Zhai, ChengXiang (committee member), Roth, Dan (committee member), Yu, Philip S. (committee member).
Subjects/Keywords: data integration; truth finding; data fusion; data quality; entity matching; data mining; probabilistic graphical models
APA (6th Edition):
Zhao, B. (2013). Truth finding in databases. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/42470
Chicago Manual of Style (16th Edition):
Zhao, Bo. “Truth finding in databases.” 2013. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/42470.
MLA Handbook (7th Edition):
Zhao, Bo. “Truth finding in databases.” 2013. Web. 01 Mar 2021.
Vancouver:
Zhao B. Truth finding in databases. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2013. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/42470.
Council of Science Editors:
Zhao B. Truth finding in databases. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2013. Available from: http://hdl.handle.net/2142/42470

University of Illinois – Urbana-Champaign
12.
Le, Hieu.
Efficient data to decision pipelines for embedded and social sensing.
Degree: PhD, 0112, 2013, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/42487
This dissertation presents results of our studies in making data-to-decision pipelines for embedded and social sensing efficient. Due to the pervasive presence of wired sensors, wireless sensors, and mobile devices, the amount of data about the physical world (environmental measurements, traffic, etc.) and human societies (news, trends, conversations, intelligence reports, etc.) is reaching an unprecedented rate and volume. This motivates us to optimize the way information is collected from sensors and social entities. Two challenges are addressed: (i) how can we gather data such that throughput is maximized given the physical constraints of the communication medium? and (ii) how can we process inherently unreliable data generated by large networks of information and social sources? We present some essential solutions addressing these challenges in this dissertation. The dissertation is organized in two parts. Part I presents our solution to maximizing bit-level data throughput by utilizing multiple radio channels in applications equipped with wireless sensors. Part II presents our solution to dealing with the large amount of information contributed by unvetted sources.
Advisors/Committee Members: Abdelzaher, Tarek F. (advisor), Abdelzaher, Tarek F. (Committee Chair), Nahrstedt, Klara (committee member), Roth, Dan (committee member), Szymanski, Boleslaw (committee member).
Subjects/Keywords: Social Sensing; Data Distillation; Data to Decision; Big Data; Data Science
APA (6th Edition):
Le, H. (2013). Efficient data to decision pipelines for embedded and social sensing. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/42487
Chicago Manual of Style (16th Edition):
Le, Hieu. “Efficient data to decision pipelines for embedded and social sensing.” 2013. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/42487.
MLA Handbook (7th Edition):
Le, Hieu. “Efficient data to decision pipelines for embedded and social sensing.” 2013. Web. 01 Mar 2021.
Vancouver:
Le H. Efficient data to decision pipelines for embedded and social sensing. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2013. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/42487.
Council of Science Editors:
Le H. Efficient data to decision pipelines for embedded and social sensing. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2013. Available from: http://hdl.handle.net/2142/42487

University of Illinois – Urbana-Champaign
13.
Duan, Huizhong.
Intent modeling and automatic query reformulation for search engine systems.
Degree: PhD, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/46834
Understanding and modeling users' intent in search queries is an important topic in studying search engine systems. Good understanding of search intent is required in order to achieve better search accuracy and better user experience. In this thesis work, I identify and study three major problems in the subject: ambiguous search intent, ineffective query formulation, and vague relevance criteria. To systematically study these problems, the thesis consists of three parts. In the first part, I study search intent ambiguity in search engine queries and propose a click-pattern-based method that captures ambiguous search intent based on behavioral difference rather than semantic difference. Analysis shows that the proposed method is more accurate and robust in measuring query ambiguity. In the second part, I study how to provide query formulation support to facilitate users in expressing search intent. Query completion and correction, and syntactic query reformulation, are proposed and studied in this part. Experiments show that the proposed query formulation support methods can help users formulate more effective queries and alleviate search difficulty. In the third part, I study how to model search intent so that we can gain insights about users' behaviors and leverage the knowledge to improve search engines. Two topics are studied in this part: modeling search intent with data-level representation, and discovering coordinated shopping intent in product search. It is shown that the proposed methods can not only discover meaningful user intent but also improve search and other related applications. The proposed models and algorithms in the thesis are general and can be applied to improve search accuracy in potentially many different search engines. As a systematic study on intent modeling and automatic query reformulation in search engine systems, this thesis work also provides a road map for future exploration of intent understanding and analysis.
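A simplified, related signal for query ambiguity is the entropy of a query's click distribution over results; the Python sketch below computes it on invented click counts and is only a stand-in for the click-pattern model described above.

# Click entropy as a simple query-ambiguity signal (invented click counts).
import math

def click_entropy(clicks):
    """Shannon entropy of a query's click distribution over result URLs."""
    total = sum(clicks.values())
    return -sum((c / total) * math.log2(c / total) for c in clicks.values() if c)

focused   = {"python.org": 95, "docs.python.org": 5}
ambiguous = {"python.org": 40, "wikipedia.org/Python_(snake)": 35, "montypython.com": 25}
print(click_entropy(focused), click_entropy(ambiguous))
# the ambiguous query's clicks are spread more evenly -> higher entropy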
Advisors/Committee Members: Zhai, ChengXiang (advisor), Zhai, ChengXiang (Committee Chair), Han, Jiawei (committee member), Roth, Dan (committee member), Kiciman, Emre (committee member).
Subjects/Keywords: search intent; query reformulation
APA (6th Edition):
Duan, H. (2014). Intent modeling and automatic query reformulation for search engine systems. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/46834
Chicago Manual of Style (16th Edition):
Duan, Huizhong. “Intent modeling and automatic query reformulation for search engine systems.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/46834.
MLA Handbook (7th Edition):
Duan, Huizhong. “Intent modeling and automatic query reformulation for search engine systems.” 2014. Web. 01 Mar 2021.
Vancouver:
Duan H. Intent modeling and automatic query reformulation for search engine systems. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/46834.
Council of Science Editors:
Duan H. Intent modeling and automatic query reformulation for search engine systems. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/46834

University of Illinois – Urbana-Champaign
14.
Sondhi, Parikshit.
Autonomous agents for serving complex information needs.
Degree: PhD, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/46907
Over the past few decades, two prominent paradigms for information seeking, search engines and recommendation systems, have been developed. However, neither of these is well suited to serving queries that represent complex information needs (e.g., medical case-based queries). As a result, users increasingly turn to web communities such as HealthBoards and Yahoo! Answers, making them extremely popular. However, not all queries posted there receive informative answers or are answered in a timely manner.
In this work we present a novel paradigm for information service in which autonomous agents help dissatisfied users in web communities by proactively posting responses to their unresolved queries. The main contribution of this work is to concretely define three application tasks based on this paradigm in the healthcare domain, and to show that it is indeed feasible to develop agents capable of generating meaningful responses with high accuracy.
The first task involved designing an agent for resolving physician case-based queries using literature data. We addressed the problem via methods that utilized available biomedical semantic resources and showed that a precision at 10 of up to 0.48 could be achieved. The second study involved resolving layperson queries on web forums by finding similar discussion threads. This task was more challenging due to the noisy nature of forum data and the unsuitability of existing semantic resources. We developed novel shallow semantic information extraction techniques for the problem, and our methods utilized them to achieve a best precision at 5 of 0.54. Finally, the third task was to design an autonomous agent for resolving general healthcare questions on community question answering (cQA) websites. This task required more detailed semantic information in the form of a database containing precise medical entities, verbose text descriptions, and the relations between them. These were obtained by using health information websites as an information source. We proposed a principled probabilistic model for the problem, and it was found to resolve over 30% of the questions correctly.
Overall, our results clearly suggest that autonomous agents are not only feasible, but can also deliver considerable value to both expert and layperson users of web forums and cQA websites. We believe such autonomous agents have great potential, and our work opens up an exciting new area of research.
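The evaluation above is reported as precision at k (for example, a precision at 10 of up to 0.48). As a point of reference only, the short Python sketch below shows how that metric is conventionally computed; the function name, document identifiers, and relevance judgments are invented for illustration and are not taken from the thesis.

def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k retrieved items that are judged relevant."""
    top_k = ranked_items[:k]
    return sum(1 for item in top_k if item in relevant) / k

# Toy example: 5 of the top 10 retrieved documents are relevant, so P@10 = 0.5.
ranked = [f"doc{i}" for i in range(1, 11)]
print(precision_at_k(ranked, relevant={"doc1", "doc3", "doc4", "doc7", "doc9"}, k=10))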
Advisors/Committee Members: Zhai, ChengXiang (advisor), Zhai, ChengXiang (Committee Chair), Roth, Dan (committee member), Sun, Jimeng (committee member), Schatz, Bruce R. (committee member).
Subjects/Keywords: Autonomous agents; complex information needs; forum search; case retrieval; question answering; reliability; knowledge-based Question Resolution (KBQR)
APA (6th Edition):
Sondhi, P. (2014). Autonomous agents for serving complex information needs. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/46907
Chicago Manual of Style (16th Edition):
Sondhi, Parikshit. “Autonomous agents for serving complex information needs.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/46907.
MLA Handbook (7th Edition):
Sondhi, Parikshit. “Autonomous agents for serving complex information needs.” 2014. Web. 01 Mar 2021.
Vancouver:
Sondhi P. Autonomous agents for serving complex information needs. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/46907.
Council of Science Editors:
Sondhi P. Autonomous agents for serving complex information needs. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/46907

University of Illinois – Urbana-Champaign
15.
Wang, Chi.
Mining latent entity structures from massive unstructured and interconnected data.
Degree: PhD, 0112, 2015, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/72967
The “big data” era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge to social media, news, and everyone’s daily life. Valuable knowledge about multi-typed entities is often hidden in unstructured or loosely structured but interconnected data. Mining latent structured information around entities uncovers semantic structures from massive unstructured data and hence enables many high-impact applications, including taxonomy or knowledge base construction, multi-dimensional data analysis, and information or social network analysis.
A mining framework is proposed to solve and integrate a chain of tasks: hierarchical topic discovery, topical phrase mining, entity role analysis, and entity relation mining. It reveals two main forms of structure: topical and relational. The topical structure summarizes the topics associated with entities at various granularities, such as the research areas in computer science. The framework enables recursive construction of a phrase-represented and entity-enriched topic hierarchy from text-attached information networks, and it achieves breakthroughs in both quality and computational efficiency. The relational structure recovers hidden relationships among entities, such as advisor-advisee. A probabilistic graphical modeling approach is proposed; the method can utilize heterogeneous attributes and links to capture all kinds of semantic signals, including constraints and dependencies, to recover the hierarchical relationship with the best known accuracy.
Advisors/Committee Members: Han, Jiawei (advisor), Han, Jiawei (Committee Chair), Zhai, ChengXiang (committee member), Roth, Dan (committee member), Chakrabarti, Kaushik (committee member).
Subjects/Keywords: data mining; text mining; information network; social network; network analysis; probabilistic graphical model; topic model; phrase mining; relation mining; Information Extraction
APA (6th Edition):
Wang, C. (2015). Mining latent entity structures from massive unstructured and interconnected data. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/72967
Chicago Manual of Style (16th Edition):
Wang, Chi. “Mining latent entity structures from massive unstructured and interconnected data.” 2015. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/72967.
MLA Handbook (7th Edition):
Wang, Chi. “Mining latent entity structures from massive unstructured and interconnected data.” 2015. Web. 01 Mar 2021.
Vancouver:
Wang C. Mining latent entity structures from massive unstructured and interconnected data. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2015. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/72967.
Council of Science Editors:
Wang C. Mining latent entity structures from massive unstructured and interconnected data. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2015. Available from: http://hdl.handle.net/2142/72967

University of Illinois – Urbana-Champaign
16.
Guo, Ruiqi.
Scene understanding with complete scenes and structured representations.
Degree: PhD, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/50564
Humans can understand scenes with abundant detail: they see layouts, surfaces, and the shapes of objects, among other details. By contrast, many machine-based scene analysis algorithms use simple representations to parse scenes, mainly bounding boxes and pixel labels, and apply only to visible regions. We believe we should move to deeper levels of scene analysis, embracing a more comprehensive, structured representation.
In this dissertation, we focus on analyzing scenes to their complete extent and in structured detail. First, our work uses a structured representation that is closer to human interpretation, with a mixture of layout, functional objects, and clutter. We developed annotation tools and collected a dataset of 1449 rooms annotated with detailed 3D models.
Another feature of our work is that we understand scenes to their complete extent, even parts beyond the line of sight. We present a simple framework that detects the visible portion of a scene with appearance-based models and then infers the occluded portion with a contextual approach, integrating context from surrounding regions, a spatial prior, and the shape regularity of background surfaces. Our method is applicable to 2D images and can also be used to infer support surfaces in 3D scenarios. Our complete-surface prediction quantitatively outperforms relevant baselines, especially when surfaces are occluded.
Finally, we present a system that interprets single-view RGB-D images of indoor scenes into our proposed representation. Such a scene interpretation is useful for robotics and visual reasoning but difficult to produce due to the well-known challenge of segmenting objects, the high degree of occlusion, and the diversity of objects in indoor scenes. We take a data-driven approach, generating sets of potential object regions, matching them with regions in training images, and transferring and aligning associated 3D models while encouraging them to be consistent with observed depths. To the best of our knowledge, this is the first automatic system capable of interpreting scenes into 3D models with similar levels of detail.
Advisors/Committee Members: Hoiem, Derek W. (advisor), Hoiem, Derek W. (Committee Chair), Forsyth, David A. (committee member), Roth, Dan (committee member), Urtasun, Raquel (committee member).
Subjects/Keywords: Scene Understanding; Computer Vision; Machine Learning; Computer Graphics; Image Parsing; Image Segmentation; RGB-D images
APA (6th Edition):
Guo, R. (2014). Scene understanding with complete scenes and structured representations. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/50564
Chicago Manual of Style (16th Edition):
Guo, Ruiqi. “Scene understanding with complete scenes and structured representations.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/50564.
MLA Handbook (7th Edition):
Guo, Ruiqi. “Scene understanding with complete scenes and structured representations.” 2014. Web. 01 Mar 2021.
Vancouver:
Guo R. Scene understanding with complete scenes and structured representations. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/50564.
Council of Science Editors:
Guo R. Scene understanding with complete scenes and structured representations. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/50564

University of Illinois – Urbana-Champaign
17.
Wang, Hongning.
Computational user intent modeling.
Degree: PhD, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/50572
User modeling is essential for any information service system (e.g., search engines, recommender systems, and computational advertising) to optimize its service to the end users. The level of user understanding directly determines the upper bound of optimality that such a system can achieve when assisting its users. Unfortunately, due to the limited support in current human-computer interaction interfaces, users are restricted to expressing their complex information needs via simple keyword queries or predefined categories, which are too shallow to capture the higher-level latent intents that influence users' decisions and preferences. As a result, there is great demand for effective computational models that analyze users' generated data and their behavior patterns when they interact with such systems, and that understand users' underlying intents, so as to enable the systems to provide optimal and personalized services for each individual user.
This dissertation aims at developing general and effective computational methods for user modeling based on two specific types of user-generated data. First, a novel opinionated text mining problem called Latent Aspect Rating Analysis (LARA) is proposed and studied. Clearly distinct from previous work in opinion analysis, which mostly focuses on integrated entity-level opinions, LARA for the first time reveals individual users' latent sentiment preferences at the level of topical aspects in an unsupervised manner. A prototype system called ReviewMiner has been developed based on the techniques proposed in the LARA work. Second, users' interaction patterns recorded in search engine logs (e.g., their issued queries and clicked documents) are explored for understanding their longitudinal information-seeking behaviors. Various important problems related to users' search behaviors have been addressed, including long-term search task identification, personalized ranking model adaptation prediction, and task-level search satisfaction.
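To make the aspect-level idea concrete, here is a highly simplified Python sketch of the regression intuition behind latent aspect rating analysis: an overall rating is treated as a weighted combination of per-aspect ratings, and the fitted weights indicate how strongly each aspect is emphasized. The aspect names and numbers are invented, and LARA itself goes further by inferring the aspect ratings from review text in an unsupervised manner; this is not the thesis implementation.

import numpy as np

aspects = ["cleanliness", "location", "service"]
aspect_ratings = np.array([[5, 3, 4], [2, 5, 3], [4, 4, 5], [3, 2, 2]], dtype=float)  # one row per review
overall = np.array([4.3, 3.4, 4.4, 2.5])                                              # overall ratings

# Least-squares fit of per-aspect emphasis weights from the observed ratings.
weights, *_ = np.linalg.lstsq(aspect_ratings, overall, rcond=None)
print(dict(zip(aspects, weights.round(2))))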
Advisors/Committee Members: Zhai, ChengXiang (advisor), Zhai, ChengXiang (Committee Chair), Han, Jiawei (committee member), Roth, Dan (committee member), Gabrilovich, Evgeniy (committee member).
Subjects/Keywords: User modeling; Opinion mining; Search log analysis; Search personalization; Latent structural model
APA (6th Edition):
Wang, H. (2014). Computational user intent modeling. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/50572
Chicago Manual of Style (16th Edition):
Wang, Hongning. “Computational user intent modeling.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/50572.
MLA Handbook (7th Edition):
Wang, Hongning. “Computational user intent modeling.” 2014. Web. 01 Mar 2021.
Vancouver:
Wang H. Computational user intent modeling. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/50572.
Council of Science Editors:
Wang H. Computational user intent modeling. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/50572

University of Illinois – Urbana-Champaign
18.
Choi, Jaesik.
Lifted Inference for Relational Hybrid Models.
Degree: PhD, 0112, 2012, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/32004
Probabilistic Graphical Models (PGMs) promise to play a prominent role in many complex real-world systems. Probabilistic Relational Graphical Models (PRGMs) scale the representation and learning of PGMs. Answering questions using PRGMs enables many current and future applications, such as medical informatics, environmental engineering, financial forecasting, and robot localization. Scaling inference algorithms to large models is a key challenge for scaling up current applications and enabling future ones.
This thesis presents new insights into large-scale probabilistic graphical models. It provides fresh ideas for maintaining a compact structure when answering questions or performing inference about large, continuous models. The insights result in a key contribution, the Lifted Relational Kalman filter (LRKF), an efficient estimation algorithm for large-scale linear dynamic systems. It shows that the new relational Kalman filter enables scaling the exact vanilla Kalman filter from 1,000 to 1,000,000,000 variables. Another key contribution of this thesis is a proof that typically used probabilistic first-order languages, including Markov Logic Networks (MLNs) and First-Order Probabilistic Models (FOPMs), can be reduced to compact probabilistic graphical representations under reasonable conditions. Specifically, this thesis shows that aggregate operators and existential quantification in these languages are accurately approximated by linear constraints in the Gaussian distribution. In general, probabilistic first-order languages are transformed into nonparametric variational models where lifted inference algorithms can efficiently solve inference problems.
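For orientation, the Python sketch below shows one predict/update cycle of the vanilla (propositional) Kalman filter that the Lifted Relational Kalman filter is described as scaling up; the matrices, noise levels, and toy observation are illustrative assumptions and are not taken from the thesis.

import numpy as np

def kalman_step(mu, Sigma, A, Q, H, R, z):
    """One predict/update cycle of a linear-Gaussian Kalman filter."""
    # Predict: propagate the state estimate through the linear dynamics.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: correct the prediction with the observation z.
    S = H @ Sigma_pred @ H.T + R                 # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new

# Toy usage: a two-variable state observed directly with noise.
mu, Sigma = np.zeros(2), np.eye(2)
A, Q = np.eye(2), 0.1 * np.eye(2)
H, R = np.eye(2), 0.5 * np.eye(2)
mu, Sigma = kalman_step(mu, Sigma, A, Q, H, R, z=np.array([1.0, 0.8]))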
Advisors/Committee Members: Amir, Eyal (advisor), Amir, Eyal (Committee Chair), Roth, Dan (committee member), LaValle, Steven M. (committee member), Poole, David (committee member).
Subjects/Keywords: Probabilistic Graphical Models; Relational Hybrid Models; Lifted Inference; First-Order Probabilistic Models; Probabilistic Logic; Kalman filter; Relational Kalman filter; Variational Learning; Markov Logic Networks
APA (6th Edition):
Choi, J. (2012). Lifted Inference for Relational Hybrid Models. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/32004
Chicago Manual of Style (16th Edition):
Choi, Jaesik. “Lifted Inference for Relational Hybrid Models.” 2012. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/32004.
MLA Handbook (7th Edition):
Choi, Jaesik. “Lifted Inference for Relational Hybrid Models.” 2012. Web. 01 Mar 2021.
Vancouver:
Choi J. Lifted Inference for Relational Hybrid Models. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2012. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/32004.
Council of Science Editors:
Choi J. Lifted Inference for Relational Hybrid Models. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2012. Available from: http://hdl.handle.net/2142/32004

University of Illinois – Urbana-Champaign
19.
Hajishirzi, Hannaneh.
Action-centered reasoning for probabilistic dynamic domains.
Degree: PhD, 0112, 2012, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/29669
This dissertation focuses on modeling stochastic dynamic domains, using representations and algorithms that combine logical AI ideas and probabilistic methods. We introduce new tractable and highly accurate algorithms for reasoning in those complex domains. Furthermore, we apply these algorithms to tasks of narrative understanding and web page monitoring.
We model stochastic dynamic domains with a factored logical representation that uses a graphical model to represent a prior distribution over initial states. Our representation uses sequences of actions (represented in logical form) to represent transitions.
We introduce an algorithm for reasoning in stochastic dynamic domains (in propositional and relational fashions) based on subroutines for reasoning in deterministic substructure of the domain. Our algorithm takes advantage of the factored logical representation and efficient subroutines for logical progression and regression. The tractability of the algorithm results from the tractability of these underlying subroutines. Our theoretical and empirical results show significant improvement of our algorithm over previous approaches for reasoning. Our novel algorithm for reasoning in probabilistic dynamic domains samples sequences of deterministic actions corresponding to an observed sequence of probabilistic actions. This algorithm is built upon a novel exact and tractable algorithm to reason about deterministic dynamic domains with a probabilistic prior.
We apply the dynamic domain representation and our algorithms to the understanding of narratives. For that, we introduce a label-free iterative learning approach that outperforms the state-of-the-art approach that uses labeled data. We also apply dynamic domains, and reasoning about them, to the problem of monitoring changes in web pages to direct crawlers; for that, we introduce a greedy algorithm that outperforms the state-of-the-art algorithms.
Advisors/Committee Members: Amir, Eyal (advisor), Roth, Dan (committee member), Hockenmaier, Julia C. (committee member), Mueller, Erik T. (committee member).
Subjects/Keywords: Probabilistic reasoning; Action theories; Statistical Relational Models; Narrative Understanding; Web Monitoring
APA (6th Edition):
Hajishirzi, H. (2012). Action-centered reasoning for probabilistic dynamic domains. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/29669
Chicago Manual of Style (16th Edition):
Hajishirzi, Hannaneh. “Action-centered reasoning for probabilistic dynamic domains.” 2012. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/29669.
MLA Handbook (7th Edition):
Hajishirzi, Hannaneh. “Action-centered reasoning for probabilistic dynamic domains.” 2012. Web. 01 Mar 2021.
Vancouver:
Hajishirzi H. Action-centered reasoning for probabilistic dynamic domains. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2012. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/29669.
Council of Science Editors:
Hajishirzi H. Action-centered reasoning for probabilistic dynamic domains. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2012. Available from: http://hdl.handle.net/2142/29669

University of Illinois – Urbana-Champaign
20.
Sorokin, Alexander.
Expanding the limits of predictive methods: from supervised learning to novel sensors and massive human supervision.
Degree: PhD, 0112, 2012, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/29722
The mission of machine learning is to empower computers to make generalizations from available data: labeled and unlabeled. The more labeled data we have, the better the predictions we'll make, but labeled data usually comes at a cost and should be used sparingly. In some cases, the nature of the prediction problem can be changed by using a different sensor modality or by obtaining a different kind of annotation.
In this dissertation, we first present methods to enhance predictive ability by improving the use of existing data: by constructing feature spaces for human activity recognition and by developing semi-supervised methods for object recognition.
We then develop methods for collecting, storing, and visualizing information about activity in an indoor office environment. By using a dense array of simple motion sensors, we can track people in the office space while preserving reasonable expectations of privacy.
We develop methods for efficient access to data annotation services via crowdsourcing. We develop tools for formalizing interactions in the domain of computer vision. By designing a general-purpose toolkit, we present a computational abstraction of otherwise undefined human abilities. To ensure the high quality of crowdsourced annotations, we developed a programmatic gold framework. By automatically generating gold-standard data for crowdsourced tasks, we can present clear expectations to the workers, provide in-task training, and explicitly measure worker accuracy.
Crowdsourced annotations present an opportunity to reformulate what an AI agent should be able to do. An indoor robot can safely operate in an environment with unknown objects. To interact with the objects, however, it must have a detailed model of each object: a semantic label, a visual model for recognition, and a geometry model for grasp and manipulation planning. We develop a robot supervision framework where crowdsourced on-demand annotations allow a robot to collect necessary information about unseen objects, build object models, and proceed to manipulate these previously unseen objects.
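A minimal Python sketch of the programmatic-gold idea described above, under the assumption of a simple labeling task: items with automatically generated known answers are mixed into a batch, and each worker is scored on those items alone. The function names and toy data are hypothetical and are not taken from the thesis.

import random

def build_batch(task_items, gold_items, gold_fraction=0.2, seed=0):
    """Mix gold items (with known answers) into a batch of real task items."""
    rng = random.Random(seed)
    n_gold = max(1, int(gold_fraction * len(task_items)))
    batch = [{"item": it, "gold": None} for it in task_items]
    batch += [{"item": it, "gold": ans} for it, ans in rng.sample(gold_items, n_gold)]
    rng.shuffle(batch)
    return batch

def worker_accuracy(batch, answers):
    """Accuracy of a worker's answers, measured on the embedded gold items only."""
    gold = [(i, q["gold"]) for i, q in enumerate(batch) if q["gold"] is not None]
    return sum(1 for i, g in gold if answers[i] == g) / len(gold)

# Toy usage with made-up image-labeling items.
batch = build_batch(["img_01", "img_02", "img_03", "img_04"],
                    [("gold_cat", "cat"), ("gold_dog", "dog")])
answers = ["cat" if q["item"] == "gold_cat" else "dog" for q in batch]
print(worker_accuracy(batch, answers))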
Advisors/Committee Members: Forsyth, David A. (advisor), Roth, Dan (committee member), Hoiem, Derek W. (committee member), Hockenmaier, Julia C. (committee member), Bradski, Gary R. (committee member).
Subjects/Keywords: crowdsourcing; computer vision; robotics; semi-supervised learning; object recognition; human activity recognition; sensor networks
APA (6th Edition):
Sorokin, A. (2012). Expanding the limits of predictive methods: from supervised learning to novel sensors and massive human supervision. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/29722
Chicago Manual of Style (16th Edition):
Sorokin, Alexander. “Expanding the limits of predictive methods: from supervised learning to novel sensors and massive human supervision.” 2012. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/29722.
MLA Handbook (7th Edition):
Sorokin, Alexander. “Expanding the limits of predictive methods: from supervised learning to novel sensors and massive human supervision.” 2012. Web. 01 Mar 2021.
Vancouver:
Sorokin A. Expanding the limits of predictive methods: from supervised learning to novel sensors and massive human supervision. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2012. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/29722.
Council of Science Editors:
Sorokin A. Expanding the limits of predictive methods: from supervised learning to novel sensors and massive human supervision. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2012. Available from: http://hdl.handle.net/2142/29722

University of Illinois – Urbana-Champaign
21.
Connor, Michael.
Minimal supervision for language learning: bootstrapping global patterns from local knowledge.
Degree: PhD, 0112, 2012, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/29824
A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children begin in solving this problem when learning their first languages? To experiment with different representations that children may use to begin understanding language, we have built a computational model for this early point in language acquisition. This system, BabySRL, learns from transcriptions of natural child-directed speech and makes use of psycholinguistically plausible background knowledge and realistically noisy semantic feedback to begin to classify sentences at the level of "who does what to whom."
Starting with simple, psycholinguistically motivated representations of sentence structure, the BabySRL is able to learn from full semantic feedback, as well as from a supervision signal derived from partial semantic background knowledge. In addition, we combine the BabySRL with an unsupervised Hidden Markov Model part-of-speech tagger, linking clusters with syntactic categories using background noun knowledge so that they can be used to parse input for the SRL system. The results show that the proposed shallow representations of sentence structure are robust to reductions in parsing accuracy, and that the contribution of alternative representations of sentence structure to successful semantic role labeling varies with the integrity of the parsing and argument-identification stages. Finally, we enable the BabySRL to improve both an intermediate syntactic representation and its final semantic role classification. Using this system, we show that it is possible for a simple learner in a plausible (noisy) setup to begin comprehending simple semantics when initialized with a small amount of concrete noun knowledge and some simple syntax-semantics mapping biases, before acquiring any specific verb knowledge.
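To make the notion of a shallow, word-order-based sentence representation concrete, the Python sketch below implements a tiny agent-first baseline in the spirit of the representations discussed above: the first noun before the verb is labeled as the agent and the first noun after it as the patient. This is an illustration only, not the BabySRL system; the tag scheme and example sentence are assumptions.

def shallow_roles(tokens, tags):
    """tokens/tags are parallel lists; tags use 'N' for noun, 'V' for verb."""
    try:
        v = tags.index("V")                      # position of the (first) verb
    except ValueError:
        return {}
    roles = {}
    pre = [i for i in range(v) if tags[i] == "N"]
    post = [i for i in range(v + 1, len(tokens)) if tags[i] == "N"]
    if pre:
        roles[tokens[pre[0]]] = "A0 (agent)"
    if post:
        roles[tokens[post[0]]] = "A1 (patient)"
    return roles

print(shallow_roles(["the", "girl", "kisses", "the", "frog"], ["D", "N", "V", "D", "N"]))
# {'girl': 'A0 (agent)', 'frog': 'A1 (patient)'}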
Advisors/Committee Members: Roth, Dan (advisor), Fisher, Cynthia L. (committee member), Stevenson, Suzanne (committee member), DeJong, Gerald F. (committee member), Hockenmaier, Julia C. (committee member).
Subjects/Keywords: Machine Learning; Natural Language Processing; Language acquisition; Psycholinguistics
APA (6th Edition):
Connor, M. (2012). Minimal supervision for language learning: bootstrapping global patterns from local knowledge. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/29824
Chicago Manual of Style (16th Edition):
Connor, Michael. “Minimal supervision for language learning: bootstrapping global patterns from local knowledge.” 2012. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/29824.
MLA Handbook (7th Edition):
Connor, Michael. “Minimal supervision for language learning: bootstrapping global patterns from local knowledge.” 2012. Web. 01 Mar 2021.
Vancouver:
Connor M. Minimal supervision for language learning: bootstrapping global patterns from local knowledge. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2012. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/29824.
Council of Science Editors:
Connor M. Minimal supervision for language learning: bootstrapping global patterns from local knowledge. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2012. Available from: http://hdl.handle.net/2142/29824

University of Illinois – Urbana-Champaign
22.
Girlea, Codruta Liliana.
Deception detection in dialogues.
Degree: PhD, Computer Science, 2017, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/99091
In the social media era, it is commonplace to engage in written conversations. People sometimes even form connections across large distances, in writing. However, human communication is in large part non-verbal. This means it is now easier for people to hide their harmful intentions. At the same time, people can now get in touch with more people than ever before. This puts vulnerable groups at higher risk for malevolent interactions, such as bullying, trolling, or predatory behavior. Furthermore, such growing behaviors have most recently led to waves of fake news and a growing industry of deceit creators and deceit detectors. There is now an urgent need for both theory that explains deception and applications that automatically detect deception.
In this thesis I address this need with a novel application that learns from examples and detects deception reliably in natural-language dialogues. I formally define the problem of deception detection and identify several domains where it is useful. I introduce and evaluate new psycholinguistic features of deception in written dialogues for two datasets. My results shed light on the connection between language, deception, and perception. They also underline the challenges and difficulty of assessing perceptions from written text.
To automatically learn to detect deception I first introduce an expressive logical model and then present a probabilistic model that simplifies the first and is learnable from labeled examples. I introduce a belief-over-belief formalization, based on Kripke semantics and situation calculus. I use an observation model to describe how utterances are produced from the nested beliefs and intentions. This allows me to easily make inferences about these beliefs and intentions given utterances, without needing to explicitly represent perlocutions. The agents’ belief states are filtered with the observed utterances, resulting in an updated Kripke structure.
I then translate my formalization to a practical system that can learn from a small dataset and is able to perform well using very little structural background knowledge in the form of a relational dynamic Bayesian network structure.
Advisors/Committee Members: Amir, Eyal (advisor), Girju, Roxana (advisor), Amir, Eyal (Committee Chair), Roth, Dan (committee member), Hockenmaier, Julia (committee member), Shahaf, Dafna (committee member).
Subjects/Keywords: Natural language dialogues; Beliefs over beliefs; Psycholinguistics; Deception detection; Dynamic Bayesian networks
APA (6th Edition):
Girlea, C. L. (2017). Deception detection in dialogues. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/99091
Chicago Manual of Style (16th Edition):
Girlea, Codruta Liliana. “Deception detection in dialogues.” 2017. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/99091.
MLA Handbook (7th Edition):
Girlea, Codruta Liliana. “Deception detection in dialogues.” 2017. Web. 01 Mar 2021.
Vancouver:
Girlea CL. Deception detection in dialogues. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2017. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/99091.
Council of Science Editors:
Girlea CL. Deception detection in dialogues. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2017. Available from: http://hdl.handle.net/2142/99091

University of Illinois – Urbana-Champaign
23.
Bisk, Yonatan Yitzhak.
Unsupervised grammar induction with Combinatory Categorial Grammars.
Degree: PhD, Computer Science, 2015, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/89027
Language is a highly structured medium for communication. An idea starts in the speaker's mind (semantics) and is transformed into a well-formed, intelligible sentence via the specific syntactic rules of a language. We aim to discover the fingerprints of this process in the choice and location of words used in the final utterance. What is unclear is how much of this latent process can be discovered from the linguistic signal alone and how much requires shared non-linguistic context, knowledge, or cues.
Unsupervised grammar induction is the task of analyzing strings in a language to discover the latent syntactic structure of the language without access to labeled training data. Successes in unsupervised grammar induction shed light on the amount of syntactic structure that is discoverable from raw or part-of-speech tagged text. In this thesis, we present a state-of-the-art grammar induction system based on Combinatory Categorial Grammars. Our choice of syntactic formalism enables the first labeled evaluation of an unsupervised system. This allows us to perform an in-depth analysis of the system’s linguistic strengths and weaknesses. In order to completely eliminate reliance on any supervised systems, we also examine how performance is affected when we use induced word clusters instead of gold-standard POS tags. Finally, we perform a semantic evaluation of induced grammars, providing unique insights into future directions for unsupervised grammar induction systems.
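For readers unfamiliar with the formalism, the Python sketch below illustrates the two basic CCG combination rules, forward and backward application, over category strings. It is a toy illustration of the grammar machinery rather than anything from the induction system described above, and it handles only the simple, non-nested argument categories used in the example.

def forward_apply(left, right):
    """X/Y combined with Y to its right yields X (forward application)."""
    if left.endswith("/" + right):
        return left[: -(len(right) + 1)].strip("()")
    return None

def backward_apply(left, right):
    """Y combined with X\\Y to its right yields X (backward application)."""
    if right.endswith("\\" + left):
        return right[: -(len(left) + 1)].strip("()")
    return None

# A transitive verb such as "kisses" gets category (S\NP)/NP: it first consumes the
# object NP to its right, then the subject NP to its left.
vp = forward_apply("(S\\NP)/NP", "NP")      # -> "S\\NP"
print(vp, backward_apply("NP", vp))         # -> S\NP S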
Advisors/Committee Members: Hockenmaier, Julia (advisor), Hockenmaier, Julia (Committee Chair), Eisner, Jason (committee member), Roth, Dan (committee member), Zhai, ChengXiang (committee member).
Subjects/Keywords: Combinatory Categorial Grammar (CCG); Grammar Induction; Unsupervised Methods
APA (6th Edition):
Bisk, Y. Y. (2015). Unsupervised grammar induction with Combinatory Categorial Grammars. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/89027
Chicago Manual of Style (16th Edition):
Bisk, Yonatan Yitzhak. “Unsupervised grammar induction with Combinatory Categorial Grammars.” 2015. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/89027.
MLA Handbook (7th Edition):
Bisk, Yonatan Yitzhak. “Unsupervised grammar induction with Combinatory Categorial Grammars.” 2015. Web. 01 Mar 2021.
Vancouver:
Bisk YY. Unsupervised grammar induction with Combinatory Categorial Grammars. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2015. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/89027.
Council of Science Editors:
Bisk YY. Unsupervised grammar induction with Combinatory Categorial Grammars. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2015. Available from: http://hdl.handle.net/2142/89027

University of Illinois – Urbana-Champaign
24.
Massung, Sean Alexander.
Beyond topic-based representations for text mining.
Degree: PhD, Computer Science, 2017, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/97736
A massive amount of online information is natural language text: newspapers, blog articles, forum posts and comments, tweets, scientific literature, government documents, and more. While all kinds of online information are useful in general, textual information is especially important: it is the most natural, most common, and most expressive form of information. Text representation plays a critical role in application tasks like classification or information retrieval, since the quality of the underlying feature space directly impacts each task's performance. Because of this importance, many different approaches have been developed for generating text representations. By far the most common way to generate features is to segment text into words and record their n-grams. While simple term features perform relatively well in topic-based tasks, not all downstream applications are of a topical nature that words alone can capture. For example, determining the native language of an English essay writer will depend on more than just word choice. Methods that compete with topic-based representations (such as neural networks) are often not interpretable or rely on massive amounts of training data. This thesis proposes three novel contributions to generate and analyze a large space of non-topical features.
First, structural parse tree features are based solely on the structural properties of a parse tree, ignoring all of the syntactic categories in the tree. An important advantage of these "skeletons" over regular syntactic features is that they can capture global tree structures without causing problems of data sparseness or overfitting.
Second, SyntacticDiff explicitly captures differences in a text document with respect to a reference corpus, creating features that are easily explained as weighted word edit differences. These edit features are especially useful since they are derived from information not present in the current document, capturing a type of comparative feature.
Third, Cross-Context Lexical Analysis is a general framework for analyzing similarities and differences in both term meaning and representation with respect to different, potentially overlapping partitions of a text collection. The representations analyzed by CCLA are not limited to topic-based features.
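As a concrete illustration of the first contribution, the Python sketch below derives a structural "skeleton" from a bracketed parse by discarding every syntactic category and word and keeping only the bracketing; such shape-only features could then be counted like n-grams. The Penn-Treebank-style input format is an assumption, and this is an illustration rather than the thesis implementation.

def skeleton(bracketed_parse):
    """Keep only the parentheses of a bracketed parse, i.e. the pure tree shape."""
    return "".join(ch for ch in bracketed_parse if ch in "()")

parse = "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"
print(skeleton(parse))   # ((()())(()))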
Advisors/Committee Members: Zhai, ChengXiang (advisor), Zhai, ChengXiang (Committee Chair), Hockenmaier, Julia (committee member), Roth, Dan (committee member), LaValle, Steve (committee member), Mei, Qiaozhu (committee member).
Subjects/Keywords: Text mining; Text representation; Feature representation; Natural language processing
APA (6th Edition):
Massung, S. A. (2017). Beyond topic-based representations for text mining. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/97736
Chicago Manual of Style (16th Edition):
Massung, Sean Alexander. “Beyond topic-based representations for text mining.” 2017. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/97736.
MLA Handbook (7th Edition):
Massung, Sean Alexander. “Beyond topic-based representations for text mining.” 2017. Web. 01 Mar 2021.
Vancouver:
Massung SA. Beyond topic-based representations for text mining. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2017. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/97736.
Council of Science Editors:
Massung SA. Beyond topic-based representations for text mining. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2017. Available from: http://hdl.handle.net/2142/97736

University of Illinois – Urbana-Champaign
25.
Shirazi, Afsaneh H.
Reasoning with models of probabilistic knowledge over probabilistic knowledge.
Degree: PhD, 0112, 2011, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/24239
In multi-agent systems, the knowledge of agents about other agents' knowledge often plays a pivotal role in their decisions. In many applications, this knowledge involves uncertainty. This uncertainty may be about the state of the world or about the other agents' knowledge. In this thesis, we answer the question of how to model this probabilistic knowledge and reason about it efficiently.
Modal logics enable representation of knowledge and belief by explicit reference to classical logical formulas in addition to references to those formulas' truth values. Traditional modal logics (see e.g. [Fitting, 1993; Blackburn et al., 2007]) cannot easily represent scenarios involving degrees of belief. Works that combine modal logics and probabilities apply the representation power of modal operators for representing beliefs over beliefs, and the representation power of probability for modeling graded beliefs. Most tractable approaches apply a single model that is either engineered or learned, and reasoning is done within that model. Present model-based approaches of this kind are limited in that either their semantics is restricted to having all agents share a common prior on world states, or they resort to reasoning algorithms that do not scale to large models.
In this thesis we provide the first sampling-based algorithms for model-based reasoning in such combinations of modal logics and probability. We examine a different point in the expressivity-tractability tradeoff for that combination than has been examined before, and we examine both general models and models that use Bayesian networks to represent agents' subjective probabilistic beliefs. We provide exact inference algorithms for the two representations, together with correctness results, and show that they are faster than comparable previous ones when some structural conditions hold. We also present sampling-based algorithms, show that they converge under relaxed conditions and may not converge otherwise, demonstrate the methods on some examples, and examine the performance of our algorithms experimentally.
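As a toy illustration of sampling-based inference (much simpler than the thesis algorithms, which operate over combinations of modal logic and probability), the Python sketch below estimates a posterior in a two-variable Bayesian network by rejection sampling; the network and its probabilities are invented.

import random

def sample_world(rng):
    knowledgeable = rng.random() < 0.3               # prior P(K) = 0.3
    p_correct = 0.9 if knowledgeable else 0.2        # P(C | K) and P(C | not K)
    return knowledgeable, rng.random() < p_correct

def estimate_posterior(n_samples=100_000, seed=0):
    """Estimate P(K | C) by keeping only samples consistent with the evidence C."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_samples):
        k, c = sample_world(rng)
        if c:
            kept += 1
            hits += k
    return hits / kept if kept else float("nan")

print(round(estimate_posterior(), 3))   # approx. 0.66 (exact value is 0.27 / 0.41)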
Advisors/Committee Members: Amir, Eyal (advisor), Amir, Eyal (Committee Chair), Roth, Dan (committee member), Forsyth, David A. (committee member), Chekuri, Chandra S. (committee member).
Subjects/Keywords: Probabilistic Knowledge; Bayesian Networks; Modal Logic
APA (6th Edition):
Shirazi, A. H. (2011). Reasoning with models of probabilistic knowledge over probabilistic knowledge. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/24239
Chicago Manual of Style (16th Edition):
Shirazi, Afsaneh H. “Reasoning with models of probabilistic knowledge over probabilistic knowledge.” 2011. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/24239.
MLA Handbook (7th Edition):
Shirazi, Afsaneh H. “Reasoning with models of probabilistic knowledge over probabilistic knowledge.” 2011. Web. 01 Mar 2021.
Vancouver:
Shirazi AH. Reasoning with models of probabilistic knowledge over probabilistic knowledge. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2011. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/24239.
Council of Science Editors:
Shirazi AH. Reasoning with models of probabilistic knowledge over probabilistic knowledge. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2011. Available from: http://hdl.handle.net/2142/24239

University of Illinois – Urbana-Champaign
26.
Ji, Ming.
Semi-supervised learning and relevance search on networked data.
Degree: PhD, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/46856
Real-world data entities are often connected by meaningful relationships, forming large-scale networks. With the rapid growth of social networks and online relational data, it is widely recognized that networked data are playing increasingly important roles in people's daily life. Based on whether or not the nodes and edges carry different semantic meanings, networks can be roughly categorized into heterogeneous and homogeneous networks. Although homogeneous networks have been studied for decades, some problems still remain unsolved. Heterogeneous networks are much more complicated than homogeneous networks and have not been explored until recently. Therefore, effective and principled algorithms for mining both homogeneous and heterogeneous networks are in great demand.
In this thesis, two important and closely related problems, semi-supervised learning and relevance search, are studied on both homogeneous and heterogeneous networks. Unlike many existing models, the algorithms developed in this thesis are theoretically grounded, widely applicable with minimal constraints, and provide more informative mining results. First, a label selection criterion is proposed to improve the effectiveness of existing semi-supervised learning models on networks. Second, ranking and semi-supervised learning are integrated to improve the informativeness of the results. Third, a relevance search algorithm that fully considers the geometric structure of homogeneous networked data is designed. Finally, the relevance search problem between different types of nodes on heterogeneous networks is studied, and the proposed solution is applied to a network constructed from unstructured text data. The research results introduced in this thesis provide advanced principles and the first few steps toward a complete and systematic solution for mining networked data.
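For context, a minimal Python sketch of graph-based label propagation, the classic form of semi-supervised learning on a homogeneous network that this line of work builds on, is shown below; the graph, labels, and parameter values are made up for illustration and do not come from the thesis.

import numpy as np

def propagate_labels(W, y, labeled_mask, n_iters=50, alpha=0.9):
    """W: symmetric adjacency matrix; y: one-hot labels (zero rows for unlabeled nodes)."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))            # symmetrically normalized adjacency
    F = y.copy()
    for _ in range(n_iters):
        F = alpha * (S @ F) + (1 - alpha) * y  # spread labels, anchored at labeled nodes
    F[labeled_mask] = y[labeled_mask]          # clamp the known labels
    return F.argmax(axis=1)

# Toy 4-node chain graph; nodes 0 and 3 are labeled with classes 0 and 1.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
print(propagate_labels(W, y, labeled_mask=np.array([True, False, False, True])))  # [0 0 1 1]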
Advisors/Committee Members: Han, Jiawei (advisor), Han, Jiawei (Committee Chair), Roth, Dan (committee member), Huang, Thomas S. (committee member), Chen, Yuguo (committee member), Ye, Jieping (committee member).
Subjects/Keywords: Data Mining; Machine Learning; Semi-supervised Learning; Search; Heterogeneous Networks; Graphs
APA (6th Edition):
Ji, M. (2014). Semi-supervised learning and relevance search on networked data. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/46856
Chicago Manual of Style (16th Edition):
Ji, Ming. “Semi-supervised learning and relevance search on networked data.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/46856.
MLA Handbook (7th Edition):
Ji, Ming. “Semi-supervised learning and relevance search on networked data.” 2014. Web. 01 Mar 2021.
Vancouver:
Ji M. Semi-supervised learning and relevance search on networked data. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/46856.
Council of Science Editors:
Ji M. Semi-supervised learning and relevance search on networked data. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/46856

University of Illinois – Urbana-Champaign
27.
Kantor, Arthur.
Pronunciation modeling for large vocabulary speech recognition.
Degree: PhD, 0112, 2011, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/18276
The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of units used to define the pronunciations of words. Other approaches model pronunciation implicitly by using long-duration acoustic context to more accurately classify the spoken pronunciation unit.
This thesis is a study of the relative ability of the acoustic and the pronunciation models to capture pronunciation variability in a nearly state of the art conversational telephone speech recognizer. Several methods are tested, each designed to improve the modeling accuracy of the recognizer. Some of the experiments result in a lower word error rate, but many do not, apparently because, in different ways, the accuracy gained by one part of the recognizer comes at the expense of accuracy lost or transferred from another part of the recognizer.
Pronunciation variability is modeled with two approaches: from above with explicit pronunciation modeling and from below with implicit pronunciation modeling within the acoustic model. Both approaches make use of long duration context, explicitly by considering long-duration pronunciation units and implicitly by having the acoustic model consider long-duration speech segments.
Some pronunciation models address the pronunciation variability problem by introducing multiple pronunciations per word to cover more of the variants observed in conversational speech. However, this can potentially increase the confusability between words. This thesis studies the relationship between pronunciation perplexity and lexical ambiguity, which has informed the design of the explicit pronunciation models presented here.
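To make the notion of pronunciation perplexity concrete, the Python sketch below computes it from a probability distribution over a word's dictionary pronunciations: a word with a single pronunciation has perplexity 1, and more variants with more uniform probabilities raise it. The numbers are invented, and the thesis's exact definition may differ in detail.

import math

def pronunciation_perplexity(pron_probs):
    """pron_probs: probabilities of the pronunciation variants of a word (summing to 1)."""
    entropy = -sum(p * math.log2(p) for p in pron_probs if p > 0)
    return 2 ** entropy

print(pronunciation_perplexity([1.0]))            # 1.0  (a canonical-only dictionary entry)
print(pronunciation_perplexity([0.6, 0.3, 0.1]))  # approx. 2.45 (three variants of one word)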
Advisors/Committee Members: Hasegawa-Johnson, Mark A. (advisor), Hasegawa-Johnson, Mark A. (Committee Chair), Roth, Dan (committee member), Fleck, Margaret M. (committee member), Livescu, Karen (committee member).
Subjects/Keywords: automatic speech recognition (ASR); Large-Vocabulary Continuous Speech Recognition (LVCSR); Pronunciation modeling; Conversational speech recognition
APA (6th Edition):
Kantor, A. (2011). Pronunciation modeling for large vocabulary speech recognition. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/18276
Chicago Manual of Style (16th Edition):
Kantor, Arthur. “Pronunciation modeling for large vocabulary speech recognition.” 2011. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/18276.
MLA Handbook (7th Edition):
Kantor, Arthur. “Pronunciation modeling for large vocabulary speech recognition.” 2011. Web. 01 Mar 2021.
Vancouver:
Kantor A. Pronunciation modeling for large vocabulary speech recognition. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2011. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/18276.
Council of Science Editors:
Kantor A. Pronunciation modeling for large vocabulary speech recognition. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2011. Available from: http://hdl.handle.net/2142/18276

University of Illinois – Urbana-Champaign
28.
Guzman Rivera, Abner.
Multi-output structured learning.
Degree: PhD, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/50456
Real-world applications of Machine Learning (ML) require modeling and reasoning about complex, heterogeneous and high-dimensional data. Probabilistic Inference and Structured-Output Prediction (SOP) are frameworks within ML, which enable systems to learn and reason about complex output spaces by exploiting conditional independence assumptions. SOP systems are capable of coping with exponentially large numbers of possibilities, e.g., all segmentations of an image (i.e., labelings of every pixel with a semantic category); all English translations of a Chinese sentence; or all 3D configurations of a fixed-length sequence of (a priori unknown) amino acids. Indeed, SOP has led to state-of-the-art results in applications from various fields [Bakir et al., 2007].
Despite their success and generality, the application of SOP systems to real-world tasks is most severely limited by intractability issues. In brief, intractability is a consequence of high-order interactions in real-world phenomena. For this reason, researchers adopt performance-limiting simplifying assumptions (e.g., of conditional independence) within their models and forgo optimality guarantees in their inference algorithms. Learning SOP models from data is also intractable in general, and thus further approximations are introduced in the learning task. Additionally, labeled training data is expensive and most often limited and biased. As a consequence of all of these difficulties, the SOP systems used in practice are plagued with limitations and inaccuracies.
Further complicating the above is the fact that uncertainty is inherent to real-world applications of SOP, e.g., the data input to SOP systems is noisy, incomplete or otherwise ambiguous – in some cases, the input-output mapping is in effect one-to-many. As a result, the distributions over outputs we are interested in modeling are in general multi-modal.
In this work, we propose to increase the expressivity and performance of SOP models by specifying and training models to produce fixed-size tuples of structured outputs. We achieve this by constructing “portfolios” of structured prediction models that make independent predictions at test time but that are trained jointly to produce sets of relevant and diverse hypotheses.
In some sense, the motivation for decomposition in this thesis is akin to the spirit of mixture models or ensemble approaches. However, in this work we dispense with component weights and delay commitment to single predictions. In doing so, we advocate for pipelined approaches where multiple hypotheses are fed forward for refinement, aggregation, simulation, etc. or as inputs to increasingly complex predictive tasks. In these settings, it is often practical and advantageous for certain stages to be informed by higher order features (e.g., inter-hypothesis features), additional information available at test-time (e.g., generative procedure, temporal or textual context) or a user/expert in the loop.
We show that our methods lead to predictions of higher accuracy compared to…
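The flavor of this joint training can be conveyed with a toy Python sketch (not code from the thesis): K simple predictors share the data, but on each example only the member whose prediction is currently best receives a gradient update, so the members specialize and jointly cover a multi-modal output distribution. The one-dimensional data, the linear model form, and the hyperparameters are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)

# Ambiguous one-to-many data: each x has two plausible outputs, +x and -x.
x = rng.uniform(-1, 1, size=200)
y = np.where(rng.random(200) < 0.5, x, -x)

K, lr = 2, 0.1
w = rng.normal(size=K)                      # K linear predictors y_hat = w[k] * x

for epoch in range(200):
    for xi, yi in zip(x, y):
        preds = w * xi
        k = np.argmin((preds - yi) ** 2)    # oracle assignment: best member wins
        w[k] -= lr * 2 * (preds[k] - yi) * xi   # update only that member

print(np.round(w, 2))  # members typically settle near +1 and -1,
                       # each covering one mode of the output distribution

A single predictor trained on the same data would average the two modes and predict near zero everywhere; the portfolio keeps both hypotheses available for a downstream stage or a user to choose between.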
Advisors/Committee Members: Rutenbar, Robin A. (advisor), Rutenbar, Robin A. (Committee Chair), Forsyth, David A. (committee member), Roth, Dan (committee member), Batra, Dhruv (committee member), Kohli, Pushmeet (committee member).
Subjects/Keywords: Structured Output Prediction; Structured Learning; Multi-Output Structured Learning; Multiple Outputs
APA (6th Edition):
Guzman Rivera, A. (2014). Multi-output structured learning. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/50456
Chicago Manual of Style (16th Edition):
Guzman Rivera, Abner. “Multi-output structured learning.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/50456.
MLA Handbook (7th Edition):
Guzman Rivera, Abner. “Multi-output structured learning.” 2014. Web. 01 Mar 2021.
Vancouver:
Guzman Rivera A. Multi-output structured learning. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/50456.
Council of Science Editors:
Guzman Rivera A. Multi-output structured learning. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/50456

University of Illinois – Urbana-Champaign
29.
Li, Yanen.
A systematic study of multi-level query understanding.
Degree: PhD, 0112, 2014, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/50592
Search and information retrieval technologies have significantly transformed the way people seek information and acquire knowledge from the internet. To further improve the search accuracy and usability of current-generation search engines, one of the most important research challenges is for a search engine to accurately understand a user’s intent or information need underlying the query.
This thesis presents a systematic study of query understanding. I propose a conceptual framework in which there are different levels of query understanding, and these levels have a natural logical dependency. I then present my studies addressing important research questions within this framework.
First, as a major type of query alteration, I addressed the query spelling correction problem by modeling all major types of spelling errors with a generalized Hidden Markov Model. Second, query segmentation is the most important type of linguistic signal in queries; I proposed a probabilistic model that identifies query segmentations using clickthrough data. Third, synonym finding is an important challenge for semantic annotation of queries; I proposed a compact clustering framework that jointly mines entity attribute synonyms for a set of inputs from multiple information sources. Finally, in dynamic query understanding, I introduced the horizontal skipping bias, which is unique to the query auto-completion (QAC) process, and proposed a novel two-dimensional click model for the QAC process with emphasis on this behavior.
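To make the spelling-correction component concrete, here is a minimal noisy-channel sketch in Python: hidden states are candidate corrections for each query token, emissions score the observed (possibly misspelled) token, and transitions come from a bigram language model, decoded with Viterbi. The candidate sets, error probabilities, and bigram scores are toy values, not the generalized HMM proposed in the thesis.

import math

def viterbi(observed, candidates, emit_logp, trans_logp):
    """observed: query tokens; candidates[t]: correction candidates for token t."""
    # best[s] = (log score, best path ending in state s) at the current position
    best = {s: (emit_logp(observed[0], s), [s]) for s in candidates[0]}
    for t in range(1, len(observed)):
        new_best = {}
        for s in candidates[t]:
            score, path = max(
                (prev_score + trans_logp(prev, s) + emit_logp(observed[t], s),
                 prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[s] = (score, path + [s])
        best = new_best
    return max(best.values())[1]

# Toy error model and bigram language model (assumed values for illustration).
def emit_logp(observed_tok, intended_word):   # P(typed token | intended word)
    return math.log(0.9 if observed_tok == intended_word else 0.1)

def trans_logp(prev_word, word):              # bigram P(word | prev_word)
    likely = {("new", "york"), ("york", "hotels")}
    return math.log(0.6 if (prev_word, word) in likely else 0.05)

query = ["new", "yrok", "hotels"]
candidates = [["new"], ["yrok", "york"], ["hotels"]]
print(viterbi(query, candidates, emit_logp, trans_logp))
# ['new', 'york', 'hotels'] -- the language model outweighs the small
# emission penalty for correcting "yrok" to "york".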
Advisors/Committee Members: Zhai, ChengXiang (advisor), Zhai, ChengXiang (Committee Chair), Han, Jiawei (committee member), Schatz, Bruce R. (committee member), Roth, Dan (committee member), Hsu, Bo-June (committee member).
Subjects/Keywords: Web Search; Query Understanding; Multi-Level Query Understanding; Query Spelling Correction; Query Segmentation; Query Semantics; Query Auto-Completion
APA (6th Edition):
Li, Y. (2014). A systematic study of multi-level query understanding. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/50592
Chicago Manual of Style (16th Edition):
Li, Yanen. “A systematic study of multi-level query understanding.” 2014. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/50592.
MLA Handbook (7th Edition):
Li, Yanen. “A systematic study of multi-level query understanding.” 2014. Web. 01 Mar 2021.
Vancouver:
Li Y. A systematic study of multi-level query understanding. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2014. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/50592.
Council of Science Editors:
Li Y. A systematic study of multi-level query understanding. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2014. Available from: http://hdl.handle.net/2142/50592
30.
Yu, Xiaodong.
Character language models for generalization of multilingual named entity recognition.
Degree: MS, Computer Science, 2019, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/104934
State-of-the-art Named Entity Recognition (NER) models usually achieve high performance on entities that they have seen in training data, but significantly lower performance on unseen entities. This is one of the key reasons for the performance degradation observed when NER models are evaluated on new domains. Motivated by this observation, quantified for the first time in this thesis, we study an improved, multi-domain and multi-lingual capability for identifying “what is a name”.
Character-level patterns have been widely used as features in English Named Entity Recognition (NER) systems. However, to date there has been no direct investigation of the inherent differences between name and non-name tokens in text, nor whether this property holds across multiple languages. The key contribution of this thesis is to develop a Character-level Language Model (CLM) that, as we show, allows us to better learn “what is a name”. We analyze the capabilities of corpus-agnostic Character-level Language Models (CLMs) in the binary task of distinguishing name tokens from non-name tokens and demonstrate that CLMs provide a simple yet powerful model for capturing these differences. Specifically, we show that they can identify named entity tokens in a diverse set of languages at close to the performance of full NER systems. Moreover, by adding very simple CLM-based features we can significantly improve the performance of an off-the-shelf NER system for multiple languages.
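The core idea can be sketched in a few lines of Python: train one character-level model on name tokens and one on non-name tokens, then label a token by which model assigns it the higher probability. The tiny training lists and the add-one-smoothed character bigram model below are illustrative stand-ins for the corpus-agnostic CLMs studied in the thesis.

import math
from collections import Counter

class CharBigramLM:
    """Add-one-smoothed character bigram model over lowercased tokens."""
    def __init__(self, tokens):
        self.bigrams, self.context = Counter(), Counter()
        for tok in tokens:
            chars = ["<s>"] + list(tok.lower()) + ["</s>"]
            for a, b in zip(chars, chars[1:]):
                self.bigrams[(a, b)] += 1
                self.context[a] += 1
        self.vocab_size = len({b for _, b in self.bigrams}) + 1

    def logprob(self, tok):
        chars = ["<s>"] + list(tok.lower()) + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) /
                     (self.context[a] + self.vocab_size))
            for a, b in zip(chars, chars[1:]))

# Toy training data standing in for name / non-name token lists.
name_lm = CharBigramLM(["obama", "merkel", "illinois", "chicago", "barcelona"])
word_lm = CharBigramLM(["the", "running", "quickly", "house", "before"])

def looks_like_name(token):
    # Label a token by which character LM gives it the higher probability.
    return name_lm.logprob(token) > word_lm.logprob(token)

for tok in ["springfield", "walking"]:
    print(tok, looks_like_name(tok))

With realistic amounts of training text, the ratio of the two log-probabilities can also be fed to an existing NER tagger as an additional feature, which is the usage the thesis reports improvements from.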
Advisors/Committee Members: Roth, Dan (advisor).
Subjects/Keywords: Character Language Models; Named Entity Recognition; Generalization; Multilingual; Multilingual Named Entity Recognition; NER
APA (6th Edition):
Yu, X. (2019). Character language models for generalization of multilingual named entity recognition. (Thesis). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/104934
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Yu, Xiaodong. “Character language models for generalization of multilingual named entity recognition.” 2019. Thesis, University of Illinois – Urbana-Champaign. Accessed March 01, 2021.
http://hdl.handle.net/2142/104934.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Yu, Xiaodong. “Character language models for generalization of multilingual named entity recognition.” 2019. Web. 01 Mar 2021.
Vancouver:
Yu X. Character language models for generalization of multilingual named entity recognition. [Internet] [Thesis]. University of Illinois – Urbana-Champaign; 2019. [cited 2021 Mar 01].
Available from: http://hdl.handle.net/2142/104934.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Yu X. Character language models for generalization of multilingual named entity recognition. [Thesis]. University of Illinois – Urbana-Champaign; 2019. Available from: http://hdl.handle.net/2142/104934
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
◁ [1] [2] [3] ▶