You searched for +publisher:"University of Texas – Austin" +contributor:("Baldridge, Jason").
Showing records 1 – 23 of 23 total matches.

University of Texas – Austin
1.
Rajani, Nazneen Fatema.
New topic detection in microblogs and topic model evaluation using topical alignment.
Degree: MS in Computer Sciences, Computer Science, 2014, University of Texas – Austin
URL: http://hdl.handle.net/2152/25914
This thesis deals with topic model evaluation and new topic detection in microblogs. Microblogs are short and thus may not carry any contextual clues, so it is challenging to apply traditional natural language processing algorithms to such data. Graphical models have traditionally been used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets meant for specific domains. However, the advantage of not requiring annotated data comes with a drawback: evaluation is difficult. The problem is aggravated when the data comprises microblogs, which are unstructured and noisy.
We demonstrate the application of three such models to microblogs: Latent Dirichlet Allocation, the Author-Topic model, and the Author-Recipient-Topic model. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also address the problem of topic modeling on short text by using clustering techniques, which helps boost the performance of our models.
Topical alignment is used for large-scale assessment of topical relevance by comparing topics to manually generated domain-specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our study comparing topic models reveals interesting traits of Twitter messages, users, and their interactions, and establishes that jointly modeling author-recipient pairs and tweet content leads to qualitatively better topic discovery.
This thesis gives a new direction to the well-known problem of topic discovery in microblogs. Trend prediction or topic discovery for microblogs is an extensive research area. We propose using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure correspondence between a set of topics from the current week and a set of topics from the previous week to quantify five types of misalignment, including junk, fused, missing, and repeated topics. Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular, so-called junk topics are more likely to be new topics, and missing topics are likely to have died out.
To get more insight into the nature of microblogs we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make interesting inferences about Twitter messages and their content. Our study reveals that although a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.
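The week-over-week alignment idea in this abstract can be sketched in a few lines: represent each topic as a word-probability dictionary, compare the two weeks' topic sets with a similarity measure, and flag unmatched topics. This is only an illustrative sketch; the cosine measure, the threshold, and the toy topics are assumptions, not the thesis's actual method.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse word distributions (dicts)."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def align_topics(current, previous, threshold=0.5):
    """Current-week topics with no close previous-week match are candidate
    new topics; previous-week topics with no close current-week match are
    candidate 'missing' (dead) topics."""
    new = [i for i, t in enumerate(current)
           if max((cosine(t, u) for u in previous), default=0.0) < threshold]
    missing = [j for j, u in enumerate(previous)
               if max((cosine(u, t) for t in current), default=0.0) < threshold]
    return new, missing

# Toy topics: word -> probability (hypothetical data)
prev_week = [{"game": 0.5, "team": 0.5}, {"rain": 0.6, "storm": 0.4}]
curr_week = [{"game": 0.6, "team": 0.4}, {"election": 0.7, "vote": 0.3}]
new, missing = align_topics(curr_week, prev_week)
# the election/vote topic is new this week; rain/storm has disappeared
```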
Advisors/Committee Members: Baldridge, Jason (advisor).
Subjects/Keywords: Topic models; Topical alignment
APA (6th Edition):
Rajani, N. F. (2014). New topic detection in microblogs and topic model evaluation using topical alignment. (Masters Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/25914
Chicago Manual of Style (16th Edition):
Rajani, Nazneen Fatema. “New topic detection in microblogs and topic model evaluation using topical alignment.” 2014. Masters Thesis, University of Texas – Austin. Accessed February 28, 2021.
http://hdl.handle.net/2152/25914.
MLA Handbook (7th Edition):
Rajani, Nazneen Fatema. “New topic detection in microblogs and topic model evaluation using topical alignment.” 2014. Web. 28 Feb 2021.
Vancouver:
Rajani NF. New topic detection in microblogs and topic model evaluation using topical alignment. [Internet] [Masters thesis]. University of Texas – Austin; 2014. [cited 2021 Feb 28].
Available from: http://hdl.handle.net/2152/25914.
Council of Science Editors:
Rajani NF. New topic detection in microblogs and topic model evaluation using topical alignment. [Masters Thesis]. University of Texas – Austin; 2014. Available from: http://hdl.handle.net/2152/25914

University of Texas – Austin
2.
Ding, Weiwei, 1985-.
Weakly supervised part-of-speech tagging for Chinese using label propagation.
Degree: MA, Linguistics, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-05-3193
Part-of-speech (POS) tagging is one of the most fundamental and crucial tasks in natural language processing. Chinese POS tagging is challenging because it also involves word segmentation. This report focuses on improving unsupervised POS tagging using hidden Markov models with expectation-maximization parameter estimation (EM-HMM). The traditional EM-HMM system uses a dictionary, which constrains possible tag sequences and initializes the model parameters. This is a very crude initialization: the emission parameters are set uniformly in accordance with the tag dictionary. To improve on this, word alignments can be used. Word alignments are word-level translation correspondence pairs generated from parallel text between two languages; in this report, Chinese-English word alignments are used. The performance is expected to improve, as the two resources are complementary: the dictionary provides information on word types, while word alignment provides information on word tokens. However, word alignment is found to be of limited benefit.
This report therefore proposes another method. To improve dictionary coverage and obtain better POS distributions, Modified Adsorption, a label propagation algorithm, is used. We construct a graph connecting word tokens to feature types (such as word unigrams and bigrams) and connecting those tokens to information from knowledge sources, such as a small tag dictionary, Wiktionary, and word alignments. The core idea is to use a small amount of supervision, in the form of a tag dictionary, to acquire POS distributions for each word (both known and unknown) and provide this as an improved initialization for EM learning of the HMM. We find this strategy works very well, especially when we have a small tag dictionary: label propagation provides a better initialization for the EM-HMM method because it greatly increases the coverage of the dictionary. In addition, label propagation is flexible enough to incorporate many kinds of knowledge. However, results also show that some resources, such as word alignments, are not easily exploited with label propagation.
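The "crude initialization" criticized in this abstract can be made concrete: with a tag dictionary, each known word's emission probability is spread uniformly over only its dictionary tags, and is zero elsewhere. A minimal sketch (the dictionary entries below are invented for illustration):

```python
def init_emissions(tag_dict, tagset):
    """Uniform emission initialization from a tag dictionary: each word
    emits only its dictionary tags, with equal probability for each."""
    emissions = {}
    for word, tags in tag_dict.items():
        p = 1.0 / len(tags)
        emissions[word] = {t: (p if t in tags else 0.0) for t in tagset}
    return emissions

# Hypothetical toy dictionary: word -> set of allowed tags
tag_dict = {"走": {"V"}, "书": {"N"}, "好": {"ADJ", "V"}}
tagset = {"N", "V", "ADJ"}
em = init_emissions(tag_dict, tagset)
# "好" is ambiguous between ADJ and V, so each gets probability 0.5
```

In a full EM-HMM system these probabilities would seed the emission table before the first E-step; the label propagation approach described above instead supplies non-uniform per-word distributions here.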
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member).
Subjects/Keywords: Chinese part-of-speech tagging; Hidden Markov model; Expectation maximization; Label propagation
APA (6th Edition):
Ding, W. (2011). Weakly supervised part-of-speech tagging for Chinese using label propagation. (Masters Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2011-05-3193
Chicago Manual of Style (16th Edition):
Ding, Weiwei, 1985-. “Weakly supervised part-of-speech tagging for Chinese using label propagation.” 2011. Masters Thesis, University of Texas – Austin. Accessed February 28, 2021.
http://hdl.handle.net/2152/ETD-UT-2011-05-3193.
MLA Handbook (7th Edition):
Ding, Weiwei, 1985-. “Weakly supervised part-of-speech tagging for Chinese using label propagation.” 2011. Web. 28 Feb 2021.
Vancouver:
Ding W. Weakly supervised part-of-speech tagging for Chinese using label propagation. [Internet] [Masters thesis]. University of Texas – Austin; 2011. [cited 2021 Feb 28].
Available from: http://hdl.handle.net/2152/ETD-UT-2011-05-3193.
Council of Science Editors:
Ding W. Weakly supervised part-of-speech tagging for Chinese using label propagation. [Masters Thesis]. University of Texas – Austin; 2011. Available from: http://hdl.handle.net/2152/ETD-UT-2011-05-3193

University of Texas – Austin
3.
DeLozier, Grant Hollis.
Data and methods for Gazetteer Independent Toponym Resolution.
Degree: MA, Linguistics, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/38766
This thesis looks at the computational task of toponym resolution from multiple perspectives. In its common form, the task requires transforming a place name (e.g., Washington) into some grounded representation of that place, typically a point (latitude, longitude) geometry. In recent years, toponym resolution (TR) systems have advanced beyond heuristic techniques to more complex machine-learned classifiers, and impressive gains have been made. Despite these advances, a number of issues remain. This thesis examines aspects of typical TR approaches in a critical light and proposes solutions and new methods. In particular, I am critical of the dependence of existing approaches on gazetteer matching and of their under-utilization of complex geometric data types. I also outline some shortcomings of existing toponym corpora and detail a new corpus and annotation tool that I helped develop.
In earlier work, I explored whether TR systems could be built without dependence on gazetteer lookups. That work, which I expand on and review in this thesis, showed that competitive accuracies can be achieved without these human-curated resources. Additionally, I demonstrate through error analysis that the largest advantage of a gazetteer-matching component lies in ontology correction and matching, not in disambiguation or grounding.
These new approaches are tested on pre-existing TR corpora as well as a new corpus in a novel domain. In the process of detailing the new corpus, I remark on many challenges and design decisions that must be made in toponym resolution and propose a new evaluation metric.
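Evaluation in toponym resolution typically measures the great-circle distance between a predicted point and the gold point, since toponyms ground to (latitude, longitude) geometries. A minimal sketch of such an error metric using the haversine formula (the coordinates below are illustrative, showing how far apart two plausible groundings of "Washington" are):

```python
import math

def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points."""
    earth_radius_km = 6371.0
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def mean_error_km(predicted, gold):
    """Mean error distance across a set of toponym predictions."""
    return sum(haversine_km(p, g) for p, g in zip(predicted, gold)) / len(gold)

# Two groundings of "Washington": D.C. vs. Washington State (approximate points)
dc = (38.9072, -77.0369)
wa_state = (47.7511, -120.7401)
```

Picking the wrong "Washington" here costs thousands of kilometers of error, which is why grounding and disambiguation are evaluated by distance rather than exact match.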
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member).
Subjects/Keywords: Toponym Resolution; Geolocation; Annotation
APA (6th Edition):
DeLozier, G. H. (2016). Data and methods for Gazetteer Independent Toponym Resolution. (Masters Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/38766
Chicago Manual of Style (16th Edition):
DeLozier, Grant Hollis. “Data and methods for Gazetteer Independent Toponym Resolution.” 2016. Masters Thesis, University of Texas – Austin. Accessed February 28, 2021.
http://hdl.handle.net/2152/38766.
MLA Handbook (7th Edition):
DeLozier, Grant Hollis. “Data and methods for Gazetteer Independent Toponym Resolution.” 2016. Web. 28 Feb 2021.
Vancouver:
DeLozier GH. Data and methods for Gazetteer Independent Toponym Resolution. [Internet] [Masters thesis]. University of Texas – Austin; 2016. [cited 2021 Feb 28].
Available from: http://hdl.handle.net/2152/38766.
Council of Science Editors:
DeLozier GH. Data and methods for Gazetteer Independent Toponym Resolution. [Masters Thesis]. University of Texas – Austin; 2016. Available from: http://hdl.handle.net/2152/38766

University of Texas – Austin
4.
Sudan, Nikita Maple.
Using social network information in recommender systems.
Degree: MS in Engineering, Electrical and Computer Engineering, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-08-3855
Recommender systems are used to select online information relevant to a given user. Traditional (memory-based) recommenders explore the user-item rating matrix and make recommendations based on users who have rated similarly or items that have been rated similarly. With the growing popularity of social networks, recommender systems can benefit from combining the history of user preferences with information from the social/trust network of users. This thesis explores two techniques for combining user-item rating history with trust network information to make better user-item rating predictions. The first approach (SCOAL [5]) simultaneously co-clusters and learns separate models for each co-cluster. The co-clustering is based on the user features as well as the rating history. This captures the intuition that certain groups of users have similar preferences for certain groups of items. The grouping of users is affected by similarity in rating behavior and by the trust network. The second, graph-based label propagation approach (MAD [27]) works in a transductive setting and propagates ratings of user-item pairs directly on the user social graph. We evaluate both approaches on two large public datasets from Epinions.com and Flixster.com.
The thesis is among the first to explore the role of distrust in rating prediction. Since distrust is not as transitive as trust (an enemy's enemy need not be an enemy or a friend), distrust cannot directly replace trust in trust propagation approaches. By using a low-dimensional representation of the original trust network in SCOAL, we use distrust as it is and do not propagate it. Using SCOAL, we can pinpoint the groups of users and the groups of items that share a preference model. Both SCOAL and MAD are able to seamlessly integrate side information, such as item-subject and item-author information, into the trust-based rating prediction model.
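For contrast with the model-based approaches above, a memory-based recommender predicts a rating as a similarity-weighted average over other users' ratings, and a trust network can simply supply the neighbor set. A rough sketch (the ratings data and neighbor choice are invented; this is not SCOAL or MAD):

```python
import math

def sim(ra, rb):
    """Cosine similarity over the co-rated items of two users' rating dicts."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(ra[i] ** 2 for i in common))
    nb = math.sqrt(sum(rb[i] ** 2 for i in common))
    return dot / (na * nb)

def predict(user, item, ratings, neighbors):
    """Similarity-weighted average of neighbor ratings for `item`.
    `neighbors` could come from the user's trust network."""
    pairs = [(sim(ratings[user], ratings[n]), ratings[n][item])
             for n in neighbors if item in ratings[n]]
    denom = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / denom if denom else None

ratings = {
    "alice": {"movie1": 5, "movie2": 3},
    "bob":   {"movie1": 5, "movie2": 3, "movie3": 4},
    "carol": {"movie1": 1, "movie3": 2},
}
p = predict("alice", "movie3", ratings, neighbors=["bob", "carol"])
```

Restricting `neighbors` to trusted users is the simplest way to inject a trust network into such a recommender; the thesis's approaches integrate trust (and distrust) into the learned models instead.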
Advisors/Committee Members: Ghosh, Joydeep (advisor), Baldridge, Jason (committee member).
Subjects/Keywords: Social networks; Trust; Recommender systems; Rating prediction; Co-clustering; Label propagation
APA (6th Edition):
Sudan, N. M. (2011). Using social network information in recommender systems. (Masters Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2011-08-3855
Chicago Manual of Style (16th Edition):
Sudan, Nikita Maple. “Using social network information in recommender systems.” 2011. Masters Thesis, University of Texas – Austin. Accessed February 28, 2021.
http://hdl.handle.net/2152/ETD-UT-2011-08-3855.
MLA Handbook (7th Edition):
Sudan, Nikita Maple. “Using social network information in recommender systems.” 2011. Web. 28 Feb 2021.
Vancouver:
Sudan NM. Using social network information in recommender systems. [Internet] [Masters thesis]. University of Texas – Austin; 2011. [cited 2021 Feb 28].
Available from: http://hdl.handle.net/2152/ETD-UT-2011-08-3855.
Council of Science Editors:
Sudan NM. Using social network information in recommender systems. [Masters Thesis]. University of Texas – Austin; 2011. Available from: http://hdl.handle.net/2152/ETD-UT-2011-08-3855

University of Texas – Austin
5.
Garrette, Daniel Hunter.
Inducing grammars from linguistic universals and realistic amounts of supervision.
Degree: PhD, Artificial intelligence, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/44478
The best-performing NLP models to date are learned from large volumes of manually annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains. This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input, even that which can be collected in just a few hours, can provide enormous advantages if we have learning algorithms that can appropriately exploit it. This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation. Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence. This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.
Advisors/Committee Members: Baldridge, Jason (advisor), Mooney, Raymond J. (Raymond Joseph) (advisor), Ravikumar, Pradeep (committee member), Scott, James G (committee member), Smith, Noah A (committee member).
Subjects/Keywords: Computer science; Artificial intelligence; Natural language processing; Machine learning; Bayesian statistics; Grammar induction; Parsing; Computational linguistics
APA (6th Edition):
Garrette, D. H. (2015). Inducing grammars from linguistic universals and realistic amounts of supervision. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/44478
Chicago Manual of Style (16th Edition):
Garrette, Daniel Hunter. “Inducing grammars from linguistic universals and realistic amounts of supervision.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed February 28, 2021.
http://hdl.handle.net/2152/44478.
MLA Handbook (7th Edition):
Garrette, Daniel Hunter. “Inducing grammars from linguistic universals and realistic amounts of supervision.” 2015. Web. 28 Feb 2021.
Vancouver:
Garrette DH. Inducing grammars from linguistic universals and realistic amounts of supervision. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Feb 28].
Available from: http://hdl.handle.net/2152/44478.
Council of Science Editors:
Garrette DH. Inducing grammars from linguistic universals and realistic amounts of supervision. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/44478
6.
-9911-0186.
Text-based document geolocation and its application to the digital humanities.
Degree: PhD, Linguistics, 2015, University of Texas – Austin
URL: http://hdl.handle.net/2152/40313
This dissertation investigates automatic geolocation of documents (i.e., identification of their location, expressed as latitude/longitude coordinates) based on the text of those documents rather than metadata. I assert that such geolocation can be performed using text alone, at sufficient accuracy for use in real-world applications. Although some corpora contain metadata in abundance (e.g., home location, time zone, friends, and followers in Twitter), it is lacking in others, such as many corpora of primary-source documents in the digital humanities, an area to which document geolocation has hardly been applied. To this end, I first develop methods for accurate text-based geolocation and then apply them to newly annotated corpora in the digital humanities. The geolocation methods I develop use both uniform and adaptive (k-d tree) grids over the Earth's surface, culminating in a hierarchical logistic-regression-based technique that achieves state-of-the-art results on well-known corpora (Twitter user feeds, Wikipedia articles, and Flickr image tags). In the second part of the dissertation I develop a new NLP task: text-based geolocation of historical corpora. Because there are no existing corpora to test on, I create and annotate two new corpora of significantly different natures (a 19th-century travel log and a large set of Civil War archives). I show how my methods produce good geolocation accuracy even given the relatively small amount of annotated data available, which can be further improved using domain adaptation. I then use the predictions on the much larger unannotated portion of the Civil War archives to generate and analyze geographic topic models, showing how they can be mined to produce interesting revelations concerning various Civil War-related subjects.
Finally, I develop a new geolocation technique for text-only corpora involving co-training between document-geolocation and toponym-resolution models, using a gazetteer to inject additional information into the training process. To evaluate this technique I develop a new metric, the closest toponym error distance, on which I show improvements compared with a baseline geolocator.
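The grid-based geolocation idea can be sketched as a naive Bayes classifier over uniform grid cells: estimate a per-cell unigram distribution from training documents, then pick the cell that maximizes the smoothed log-likelihood of a test document's words. This is an illustrative toy, not the dissertation's hierarchical logistic-regression model; the grid size, smoothing, and data are assumptions.

```python
import math
from collections import Counter, defaultdict

def cell_of(lat, lon, deg=5.0):
    """Map a point to a uniform grid cell of `deg` x `deg` degrees."""
    return (int(lat // deg), int(lon // deg))

def train(docs):
    """Per-cell unigram counts from (lat, lon, tokens) training docs."""
    counts = defaultdict(Counter)
    for lat, lon, tokens in docs:
        counts[cell_of(lat, lon)].update(tokens)
    return counts

def predict_cell(tokens, counts, alpha=0.1):
    """Cell maximizing add-alpha-smoothed unigram log-likelihood."""
    vocab = {w for c in counts.values() for w in c}
    best, best_ll = None, float("-inf")
    for cell, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        ll = sum(math.log((c[w] + alpha) / total) for w in tokens)
        if ll > best_ll:
            best, best_ll = cell, ll
    return best

# Hypothetical training documents with locations
docs = [
    (30.3, -97.7, ["bbq", "tacos", "capitol"]),      # Austin-area text
    (40.7, -74.0, ["subway", "broadway", "bagel"]),  # NYC-area text
]
counts = train(docs)
```

An adaptive k-d tree grid, as used in the dissertation, would replace `cell_of` with cells that subdivide until each holds roughly equal amounts of training data.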
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member), Beaver, David (committee member), Mooney, Ray (committee member), Lease, Matt (committee member).
Subjects/Keywords: Geolocation; Computational linguistics; Natural language processing; Digital humanities
APA (6th Edition):
-9911-0186. (2015). Text-based document geolocation and its application to the digital humanities. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/40313
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-9911-0186. “Text-based document geolocation and its application to the digital humanities.” 2015. Doctoral Dissertation, University of Texas – Austin. Accessed February 28, 2021.
http://hdl.handle.net/2152/40313.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-9911-0186. “Text-based document geolocation and its application to the digital humanities.” 2015. Web. 28 Feb 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-9911-0186. Text-based document geolocation and its application to the digital humanities. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2015. [cited 2021 Feb 28].
Available from: http://hdl.handle.net/2152/40313.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-9911-0186. Text-based document geolocation and its application to the digital humanities. [Doctoral Dissertation]. University of Texas – Austin; 2015. Available from: http://hdl.handle.net/2152/40313
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
7.
-9322-9685.
Discovering latent structures in syntax trees and mixed-type data.
Degree: PhD, Operations Research and Industrial Engineering, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/68368
Gibbs sampling is a widely applied algorithm for estimating parameters in statistical models. This thesis uses Gibbs sampling to solve practical problems, especially in natural language processing and with mixed-type data. It comprises three independent studies. The first study presents a Bayesian model for learning latent annotations. The technique is capable of parsing sentences in a wide variety of languages, produces results that are on par with or surpass previous approaches in accuracy, and shows promising potential for parsing low-resource languages. The second study presents a method to automatically complete annotations from partially annotated sentence data, with the help of Gibbs sampling. The algorithm significantly reduces the time required to annotate sentences for natural language processing, without a significant drop in annotation accuracy. The last study proposes a novel factor model for uncovering latent factors and exploring covariation among multiple outcomes of mixed types, including binary, count, and continuous data. Gibbs sampling is used to estimate the model parameters. The algorithm successfully discovers correlation structures of mixed-type data in both simulated and real-world datasets.
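As a reminder of how Gibbs sampling works in the simplest case: draw each variable in turn from its conditional distribution given the others. For a standard bivariate normal with correlation rho, both conditionals are univariate normals, so the sampler fits in a few lines. This is a textbook illustration, unrelated to the thesis's specific models.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iters=20000, burn_in=2000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each conditional is univariate normal, so we alternate
    x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)  # conditional standard deviation
    x = y = 0.0
    samples = []
    for i in range(n_iters):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        if i >= burn_in:  # discard burn-in draws
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8)
mean_x = sum(x for x, _ in samples) / len(samples)
emp_exy = sum(x * y for x, y in samples) / len(samples)
# mean_x should be near 0 and emp_exy near rho = 0.8
```

The thesis applies the same alternate-from-the-conditionals pattern to much larger state spaces (latent tree annotations, factor loadings) where the joint distribution cannot be sampled directly.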
Advisors/Committee Members: Dimitrov, Nedialko B. (advisor), Baldridge, Jason (committee member), Hasenbein, John (committee member), Khajavirad, Aida (committee member), Scott, James (committee member).
Subjects/Keywords: Gibbs sampling; Natural language processing; Bayesian statistics; Factor analysis; Syntax trees parsing
APA (6th Edition):
-9322-9685. (2016). Discovering latent structures in syntax trees and mixed-type data. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/68368
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-9322-9685. “Discovering latent structures in syntax trees and mixed-type data.” 2016. Doctoral Dissertation, University of Texas – Austin. Accessed February 28, 2021.
http://hdl.handle.net/2152/68368.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-9322-9685. “Discovering latent structures in syntax trees and mixed-type data.” 2016. Web. 28 Feb 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-9322-9685. Discovering latent structures in syntax trees and mixed-type data. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2016. [cited 2021 Feb 28].
Available from: http://hdl.handle.net/2152/68368.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-9322-9685. Discovering latent structures in syntax trees and mixed-type data. [Doctoral Dissertation]. University of Texas – Austin; 2016. Available from: http://hdl.handle.net/2152/68368
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

University of Texas – Austin
8.
-2270-3295.
Supervision for syntactic parsing of low-resource languages.
Degree: PhD, Linguistics, 2016, University of Texas – Austin
URL: http://hdl.handle.net/2152/45746
Developing tools for computational linguistics work in low-resource scenarios often requires creating resources from scratch, especially in highly specialized domains or languages with few existing tools or little prior research. Due to practical constraints on project costs and sizes, the resources created in these circumstances often differ from large-scale resources in both quantity and quality, and working with them poses a distinctly different set of challenges than working with larger, more established resources. There are different approaches to handling these challenges, including many variations aimed at reducing or eliminating the annotations needed to train models for various tasks. This work considers the task of low-resource syntactic parsing and looks at the relative benefits of different methods of supervision. I argue that the benefits of doing some amount of supervision almost always outweigh the costs associated with that annotation; unsupervised or minimally supervised methods are often surpassed with surprisingly small amounts of supervision. This work is primarily concerned with identifying and classifying sources of supervision that are both useful and practical in low-resource scenarios, along with analyzing the performance of systems that make use of these different supervision sources and the behavior of the minimally trained annotators who provide them. Additionally, I demonstrate several cases where linguistic theory and computational performance are directly connected. Maintaining a focus on the linguistic side of computational linguistics can provide many benefits, especially when working with languages where the correct analysis of various phenomena may still be very much unsettled.
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member), Mooney, Ray (committee member), Dyer, Chris (committee member), Beavers, John (committee member).
Subjects/Keywords: Linguistics; Computer science; Computational linguistics; Natural language processing; NLP; Parsing; Low-resource; Supervision
APA (6th Edition):
-2270-3295. (2016). Supervision for syntactic parsing of low-resource languages. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/45746
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
9.
Evans, James Spencer.
Characterizing the relationship in social media between language and perspective on science-based reasoning as justification for belief.
Degree: MA, Linguistics, 2014, University of Texas – Austin
URL: http://hdl.handle.net/2152/26188
▼ Beliefs that are not the result of science-based interpretation of evidence (e.g., belief in ghosts or belief that prayer is effective) are extremely common. Science enthusiasts have expressed interest in automatic detection of non-science-based claims. This thesis intends to provide some first steps toward a solution, specifically aimed at detecting Twitter users who are likely or unlikely to take a science-based perspective on all topics. As part of this thesis, a set of Twitter users was labeled as being either "pro-science" (i.e. as having the view that beliefs are rational if and only if they are in accord with science-based reasoning) or "non-pro-science" (i.e. as having the view that beliefs may be reasonable even if they are not in accord with science-based reasoning). Word frequency ratios relative to a neutral dataset, and a simple topic alignment technique, suggest considerable linguistic divergence between the pro-science and non-pro-science users. High-accuracy logistic regression classification using linguistic features of users' recent tweets supports that idea. Supervised classification experiments suggest that the pro-science and non-pro-science perspectives are not only detectable from linguistic features, but that they can be abstracted away from particular topics (i.e. that the two perspectives are not inherently topic-specific). Results from distantly supervised classification suggest that using easily acquired, weakly labeled data may be preferable for some applications to the much slower process of individually labeling data, despite its markedly lower accuracy than the fully supervised approach. The best classifier obtained in this thesis has an accuracy of 93.9%.
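The classification setup above can be illustrated with a toy bag-of-words logistic regression classifier. This is a sketch only: the feature set, training data, and hyperparameters here are invented, not the thesis's actual configuration.

```python
import math
from collections import defaultdict

def featurize(tweet):
    # unigram bag-of-words features
    return tweet.lower().split()

def train_logreg(data, epochs=200, lr=0.5):
    """data: list of (tweet, label) pairs, label 1 = pro-science, 0 = not."""
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for text, y in data:
            feats = featurize(text)
            z = b + sum(w[f] for f in feats)
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = y - p                        # log-likelihood gradient
            b += lr * g
            for f in feats:
                w[f] += lr * g
    return w, b

def predict(w, b, tweet):
    return 1 if b + sum(w[f] for f in featurize(tweet)) > 0 else 0

# toy training data (invented examples standing in for labeled users' tweets)
train = [
    ("peer reviewed trials show the evidence", 1),
    ("controlled evidence supports the claim", 1),
    ("my horoscope predicted this outcome", 0),
    ("ghosts are real trust me", 0),
]
w, b = train_logreg(train)
```

In the distantly supervised variant described above, the hand labels would be replaced by weak labels harvested automatically, trading some accuracy for much cheaper data.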
Advisors/Committee Members: Baldridge, Jason (advisor).
Subjects/Keywords: Twitter; Perspective; Social media; Perspective classification
10.
Ponvert, Elias Franchot.
Global models for temporal relation classification.
Degree: MA, Linguistics, 2008, University of Texas – Austin
URL: http://hdl.handle.net/2152/19160
▼ Temporal relation classification is one of the most challenging areas of natural language processing. Advances in this area have direct relevance to improving practical applications, such as question-answering and summarization systems, as well as informing theoretical understanding of temporal meaning realization in language. With the development of annotated textual materials, this domain is now accessible to empirical machine-learning oriented approaches, where systems treat temporal relation processing as a classification problem: i.e. a decision about which label (before, after, identity, etc.) to assign to a pair (i, j) of event indices in a text. Most reported systems in this new research domain utilize classifiers that make decisions effectively in isolation, without explicitly utilizing the decisions made about other indices in a document. In this work, we present a new strategy for temporal relation classification that utilizes global models of temporal relations in a document, choosing the optimal classification for all pairs of indices in a document subject to global constraints which may be linguistically motivated. We propose and evaluate two applications of global models to temporal semantic processing: joint prediction of situation entities with temporal relations, and temporal relations prediction guided by global coherence constraints.
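The global-decoding idea can be sketched as a brute-force search over joint label assignments subject to a transitivity constraint. This is an illustrative toy with invented scores; the thesis's actual models, label inventory, and constraint sets are richer.

```python
import itertools

LABELS = ("before", "after", "identity")

def consistent(assign, events):
    # enforce transitivity wherever the composition of two relations is defined
    for i, j, k in itertools.combinations(events, 3):
        r1, r2 = assign[(i, j)], assign[(j, k)]
        if r1 == "identity":
            required = r2
        elif r2 == "identity":
            required = r1
        elif r1 == r2:
            required = r1             # e.g. before + before = before
        else:
            continue                  # before/after mix: left unconstrained here
        if assign[(i, k)] != required:
            return False
    return True

def global_decode(events, scores):
    """scores[(i, j)][label] = local classifier confidence for that label."""
    pairs = list(itertools.combinations(events, 2))
    best, best_score = None, float("-inf")
    for labels in itertools.product(LABELS, repeat=len(pairs)):
        assign = dict(zip(pairs, labels))
        if not consistent(assign, events):
            continue
        s = sum(scores[p][l] for p, l in assign.items())
        if s > best_score:
            best, best_score = assign, s
    return best

# toy scores: locally, (A, C) prefers "after", but the globally coherent
# assignment flips it to "before"
events = ["A", "B", "C"]
scores = {
    ("A", "B"): {"before": 0.9, "after": 0.05, "identity": 0.05},
    ("B", "C"): {"before": 0.9, "after": 0.05, "identity": 0.05},
    ("A", "C"): {"before": 0.4, "after": 0.5, "identity": 0.1},
}
decoded = global_decode(events, scores)
```

Exhaustive search is exponential in the number of pairs, which is why real systems use structured optimization rather than enumeration; the toy only shows how global constraints can overrule an isolated local decision.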
Advisors/Committee Members: Baldridge, Jason (advisor).
Subjects/Keywords: Temporal relation classification; Temporal semantic processing; Natural language processing
11.
Speriosu, Michael Adrian.
Methods and applications of text-driven toponym resolution with indirect supervision.
Degree: PhD, Linguistics, 2013, University of Texas – Austin
URL: http://hdl.handle.net/2152/21303
▼ This thesis addresses the problem of toponym resolution. Given an ambiguous placename like Springfield in some natural language context, the task is to automatically predict the location on the earth's surface the author is referring to. Many previous efforts use hand-built heuristics to attempt to solve this problem, looking for specific words in close proximity such as Springfield, Illinois, and disambiguating any remaining toponyms to possible locations close to those already resolved. Such approaches require the data to take a fairly specific form in order to perform well, thus they often have low coverage. Some have applied machine learning to this task in an attempt to build more general resolvers, but acquiring large amounts of high quality hand-labeled training material is difficult. I discuss these and other approaches found in previous work before presenting several new toponym resolvers that rely neither on hand-labeled training material prepared explicitly for this task nor on particular co-occurrences of toponyms in close proximity in the data to be disambiguated. Some of the resolvers I develop reflect the intuition of many heuristic resolvers that toponyms nearby in text tend to (but do not always) refer to locations nearby on Earth, but do not require toponyms to occur in direct sequence with one another. I also introduce several resolvers that use the predictions of a document geolocation system (i.e. one that predicts a location for a piece of text of arbitrary length) to inform toponym disambiguation. Another resolver takes into account these document-level location predictions, knowledge of different administrative levels (country, state, city, etc.), and predictions from a logistic regression classifier trained on automatically extracted training instances from Wikipedia in a probabilistic way. It takes advantage of all content words in each toponym's context (both local window and whole document) rather than only toponyms. 
One resolver I build that extracts training material for a machine learned classifier from Wikipedia, taking advantage of link structure and geographic coordinates on articles, resolves 83% of toponyms in a previously introduced corpus of news articles correctly, beating the strong but simplistic population baseline. I introduce a corpus of Civil War related writings not previously used for this task on which the population baseline does poorly; combining a Wikipedia informed resolver with an algorithm that seeks to minimize the geographic scope of all predicted locations in a document achieves 86% blind test set accuracy on this dataset. After providing these high performing resolvers, I form the groundwork for more flexible and complex approaches by transforming the problem of toponym resolution into the traveling purchaser problem, modeling the probability of a location given its toponym's textual context and the geographic distribution of all locations mentioned in a document as two components of an objective function to be minimized. As one solution to this incarnation of…
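The population baseline that the Wikipedia-informed resolvers are measured against can be sketched in a few lines. The gazetteer entries and population figures below are illustrative placeholders, not real gazetteer data.

```python
# hypothetical gazetteer: toponym -> candidates as (name, lat, lon, population)
GAZETTEER = {
    "springfield": [
        ("Springfield, Illinois", 39.80, -89.65, 114_000),
        ("Springfield, Massachusetts", 42.10, -72.59, 155_000),
        ("Springfield, Missouri", 37.21, -93.29, 169_000),
    ],
}

def resolve_population_baseline(toponym):
    # pick the most populous candidate: strong on news text,
    # but blind to context (hence its poor showing on the Civil War corpus)
    candidates = GAZETTEER.get(toponym.lower())
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[3])
```

The text-driven resolvers described above replace the population criterion with evidence from the toponym's textual context and the document's overall geographic focus.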
Advisors/Committee Members: Baldridge, Jason (advisor).
Subjects/Keywords: Toponym resolution; Semi-supervised learning; Computational linguistics
12.
Mielens, Jason David.
Unknown word sequences in HPSG.
Degree: MA, Linguistics, 2014, University of Texas – Austin
URL: http://hdl.handle.net/2152/26312
▼ This work investigates the properties of unknown words in HPSG, in particular the phenomenon of multi-word unknown expressions: sequences of multiple unknown words in a row. It first presents a study determining the relative frequency of multi-word unknown expressions, followed by a survey of the efficacy of a variety of techniques for handling them. The techniques comprise modified versions of approaches from the existing unknown-word prediction literature as well as novel ones, and they are evaluated with particular attention to how they fare on sentences with many unknown words and long unknown sequences.
Advisors/Committee Members: Baldridge, Jason (advisor).
Subjects/Keywords: Parsing; HPSG; Unknowns; CRF; Multi-word expressions
13.
Skiles, Erik David.
Document geolocation using language models built from lexical and geographic similarity.
Degree: MA, Linguistics, 2012, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2012-05-5717
▼ This thesis investigates the automatic identification of the location of documents. This process of geolocation aids in toponym resolution, document summarization, and geographic-based marketing. I focus on minimally supervised methods to examine both the lexical similarities and the geographic similarities between documents. This method predicts the location of a document as a single point on the earth’s surface. Three data sets are used to evaluate this method: a set of geotagged Wikipedia articles and two sets of Twitter feeds. For Wikipedia, the combined method obtains a median error of 12.1 kilometers and an improvement in mean error to 164 kilometers. The large Twitter data shows the greatest improvement from this method with a median error of 333 kilometers, down from the previous best of 463 kilometers.
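The error figures quoted above are great-circle distances between predicted and true coordinates. A median-error evaluation can be sketched with the standard haversine formula; this is an assumed evaluation setup, not the thesis's actual code.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    # great-circle distance between two (lat, lon) points given in degrees
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def median_error_km(predicted, gold):
    # median is robust to the long tail of badly geolocated documents,
    # which is why both median and mean error are reported
    return statistics.median(haversine_km(p, g) for p, g in zip(predicted, gold))
```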
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member).
Subjects/Keywords: Geolocation
14.
Hafner, Simon.
Typesafe NLP pipelines on Spark.
Degree: MA, Linguistics, 2014, University of Texas – Austin
URL: http://hdl.handle.net/2152/28654
▼ Natural language pipelines consist of various natural language algorithms that use the annotations of a previous algorithm to compute more annotations. These algorithms tend to be expensive in terms of computational power. Therefore it is advantageous to parallelize them in order to reduce the time necessary to analyze a large document collection. The goal of this project was to develop a new framework to encapsulate algorithms such that they may be used as part of a pipeline without any additional work. The framework consists of a custom-built data structure called Slab which implements type safety and functional transparency to integrate itself into the Scala programming language. Because of this integration, it is possible to use Spark, a MapReduce framework, to parallelize the pipeline on a cluster. To assess the performance of the new framework, a pipeline based on the OpenNLP library was created. An existing pipeline implemented in UIMA, an industry standard for natural language pipeline frameworks, served as a baseline in terms of performance. The pipeline created from the new framework processed the corpus in about half the time.
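The Slab idea can be approximated in miniature: an immutable container where each pipeline stage returns a new value carrying one more annotation layer. This Python sketch illustrates only the concept; the actual framework is written in Scala and relies on its type system for the safety guarantees.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Slab:
    # immutable text plus named annotation layers; stages never mutate a slab
    text: str
    annotations: dict = field(default_factory=dict)

    def with_layer(self, name, values):
        layers = dict(self.annotations)
        layers[name] = values
        return Slab(self.text, layers)

def tokenize(slab):
    return slab.with_layer("tokens", slab.text.split())

def count_tokens(slab):
    # reads the previous stage's annotations, as later pipeline stages do
    return slab.with_layer("n_tokens", len(slab.annotations["tokens"]))
```

Because each stage is a pure function over immutable values, stages compose freely and can be mapped over a distributed collection of documents, which is the role Spark plays in the thesis.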
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (advisor).
Subjects/Keywords: Natural language processing; NLP; Pipelines; Spark; Slab
15.
Brewster, Joshua Blake.
Dependency based CCG derivation and application.
Degree: MA, Linguistics, 2010, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2010-12-2563
▼ This paper presents and evaluates an algorithm to translate a dependency treebank into a Combinatory Categorial Grammar (CCG) lexicon. The dependency relations between a head and a child in a dependency tree are exploited to determine how CCG categories should be derived by making a functional distinction between adjunct and argument relations. Derivations are performed for an English (CoNLL08 shared task treebank) and an Italian (Turin University Treebank) dependency treebank, each requiring a number of preprocessing steps.
In order to determine the adequacy of the lexicons, dubbed DepEngCCG and DepItCCG, they are compared via two methods to preexisting CCG lexicons derived from similar or equivalent sources (CCGbank and TutCCG). First, a number of metrics are used to compare the state of the lexicons, including category complexity and category growth. Second, to measure the potential applicability of the lexicons in NLP tasks, the derived English CCG lexicon and CCGbank are compared in a sentiment analysis task. While the numeric measurements show promising results for the quality of the lexicons, the sentiment analysis task fails to generate a usable comparison.
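The core of the translation, in which arguments extend the head's category while adjuncts become modifiers, can be sketched as follows. This is a simplification of the algorithm; the directionality and bracketing conventions here are assumptions for illustration.

```python
def head_category(result, left_args, right_args):
    # arguments make the head a function over them: a transitive verb with a
    # subject NP on its left and an object NP on its right gets ((S\NP)/NP)
    cat = result
    for arg in left_args:
        cat = f"({cat}\\{arg})"
    for arg in right_args:
        cat = f"({cat}/{arg})"
    return cat

def adjunct_category(head_cat, side):
    # adjuncts become modifiers: X/X before the head, X\X after it,
    # so adding or removing an adjunct never changes the head's category
    return f"({head_cat}/{head_cat})" if side == "left" else f"({head_cat}\\{head_cat})"
```

The functional distinction matters because argument children change the head's category while adjunct children do not, which keeps the derived lexicon's categories from growing with every optional modifier.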
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member).
Subjects/Keywords: CCG derivation; Dependency treebank; Computational linguistics; Combinatory Categorial Grammar; Lexicon; Dependency grammars
16.
Speriosu, Michael Adrian.
Semisupervised sentiment analysis of tweets based on noisy emoticon labels.
Degree: MA, Linguistics, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-08-3823
▼ There is high demand for computational tools that can automatically label tweets (Twitter messages) as having positive or negative sentiment, but great effort and expense would be required to build a large enough hand-labeled training corpus on which to apply standard machine learning techniques. Going beyond current keyword-based heuristic techniques, this paper uses emoticons (e.g. ':)' and ':(') to collect a large training set with noisy labels using little human intervention and trains a Maximum Entropy classifier on that training set. Results on two hand-labeled test corpora are compared to various baselines and a keyword-based heuristic approach, with the machine learned classifier significantly outperforming both.
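The noisy-labeling step can be sketched directly. This is a minimal version; the thesis's actual emoticon inventory and filtering are more extensive.

```python
POSITIVE = (":)", ":-)")
NEGATIVE = (":(", ":-(")

def emoticon_label(tweet):
    # noisy label from emoticons; None means unusable as training data
    pos = any(e in tweet for e in POSITIVE)
    neg = any(e in tweet for e in NEGATIVE)
    if pos == neg:                 # neither present, or contradictory signals
        return None
    return "positive" if pos else "negative"

def build_training_set(tweets):
    data = []
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is None:
            continue
        # strip the emoticons so the classifier can't simply memorize them
        text = tweet
        for e in POSITIVE + NEGATIVE:
            text = text.replace(e, "")
        data.append((text.strip(), label))
    return data
```

A Maximum Entropy (multinomial logistic regression) classifier is then trained on the resulting (text, label) pairs exactly as it would be on hand-labeled data.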
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member).
Subjects/Keywords: Sentiment analysis; Tweets; Emoticons; Noisy labels; Maximum Entropy classifier; Machine learned classifier
17.
Wing, Benjamin Patai.
Data-rich document geotagging using geodesic grids.
Degree: MA, Linguistics, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-05-3632
▼ This thesis investigates automatic geolocation (i.e. identification of the location, expressed as latitude/longitude coordinates) of documents. Geolocation can be an effective means of summarizing large document collections and is an important component of geographic information retrieval. We describe several simple supervised methods for document geolocation using only the document’s raw text as evidence. All of our methods predict locations in the context of geodesic grids of varying degrees of resolution. We evaluate the methods on geotagged Wikipedia articles and Twitter feeds. For Wikipedia, our best method obtains a median prediction error of just 11.8 kilometers. Twitter geolocation is more challenging: we obtain a median error of 479 km, an improvement on previous results for the dataset.
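A flat latitude/longitude grid of the kind described can be sketched as follows. The 1-degree default is an arbitrary illustration; the thesis evaluates grids of varying resolution, with per-cell word distributions compared to the document (e.g. by KL divergence) to pick the best cell.

```python
def grid_cell(lat, lon, degrees_per_cell=1.0):
    # map a coordinate to a discrete grid cell; geolocation then reduces
    # to predicting the most likely cell for a document
    row = int((lat + 90.0) // degrees_per_cell)
    col = int((lon + 180.0) // degrees_per_cell)
    return row, col

def cell_center(row, col, degrees_per_cell=1.0):
    # the location predicted for a document is its best cell's center point
    lat = row * degrees_per_cell - 90.0 + degrees_per_cell / 2.0
    lon = col * degrees_per_cell - 180.0 + degrees_per_cell / 2.0
    return lat, lon
```

Finer grids give more precise predictions but leave less training text per cell, which is the resolution trade-off the thesis explores.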
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (committee member).
Subjects/Keywords: Geospatial data; Geographical positions; Geodatabases; Computational linguistics; Geolocation; Geographic information retrieval; Wikipedia; Twitter; KL divergence; Geotagging
18.
Palmer, Alexis Mary.
Semi-automated annotation and active learning for language documentation.
Degree: PhD, Linguistics, 2009, University of Texas – Austin
URL: http://hdl.handle.net/2152/19805
▼ By the end of this century, half of the approximately 6000 extant languages will cease to be transmitted from one generation to the next. The field of language documentation seeks to make a record of endangered languages before they reach the point of extinction, while they are still in use. The work of documenting and describing a language is difficult and extremely time-consuming, and resources are extremely limited. Developing efficient methods for making lasting records of languages may increase the amount of documentation achieved within budget restrictions. This thesis approaches the problem from the perspective of computational linguistics, asking whether and how automated language processing can reduce human annotation effort when very little labeled data is available for model training. The task addressed is morpheme labeling for the Mayan language Uspanteko, and we test the effectiveness of two complementary types of machine support: (a) learner-guided selection of examples for annotation (active learning); and (b) annotator access to the predictions of the learned model (semi-automated annotation). Active learning (AL) has been shown to increase efficacy of annotation effort for many different tasks. Most of the reported results, however, are from studies which simulate annotation, often assuming a single, infallible oracle. In our studies, crucially, annotation is not simulated but rather performed by human annotators. We measure and record the time spent on each annotation, which in turn allows us to evaluate the effectiveness of machine support in terms of actual annotation effort. We report three main findings with respect to active learning. First, in order for efficiency gains reported from active learning to be meaningful for realistic annotation scenarios, the type of cost measurement used to gauge those gains must faithfully reflect the actual annotation cost. 
Second, the relative effectiveness of different selection strategies in AL seems to depend in part on the characteristics of the annotator, so it is important to model the individual oracle or annotator when choosing a selection strategy. And third, the cost of labeling a given instance from a sample is not a static value but rather depends on the context in which it is labeled. We report two main findings with respect to semi-automated annotation. First, machine label suggestions have the potential to increase annotator efficacy, but the degree of their impact varies by annotator, with annotator expertise a likely contributing factor. At the same time, we find that implementation and interface must be handled very carefully if we are to accurately measure gains from semi-automated annotation. Together these findings suggest that simulated annotation studies fail to model crucial human factors inherent to applying machine learning strategies in real annotation settings.
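A minimal uncertainty-sampling loop of the kind studied here looks like the following. This is a generic sketch; the thesis's selection strategies, cost measurements, and annotator modeling go well beyond it.

```python
def margin_uncertainty(probs):
    # a small margin between the top two labels means an uncertain prediction
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)

def select_for_annotation(pool, label_probs, batch_size=2):
    """pool: unlabeled items; label_probs: item -> label distribution.
    Returns the items the current model is least sure about."""
    return sorted(pool, key=lambda x: margin_uncertainty(label_probs(x)),
                  reverse=True)[:batch_size]
```

In the setting described above, each selected item would also be timed as it is annotated, since the findings show that gains must be measured against actual annotation effort rather than a flat per-instance cost.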
Advisors/Committee Members: Baldridge, Jason (advisor), Erk, Katrin (advisor), England, Nora (committee member), Mooney, Raymond (committee member), Woodbury, Anthony (committee member).
Subjects/Keywords: Active learning; Computational linguistics; Language documentation; Language endangerment; Uspanteko; Semi-automated annotation; Interlinear text; Annotator expertise
APA (6th Edition):
Palmer, A. M. (2009). Semi-automated annotation and active learning for language documentation. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/19805
19.
Moon, Taesun, Ph. D.
Word meaning in context as a paraphrase distribution : evidence, learning, and inference.
Degree: PhD, Linguistics, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-08-4143
► In this dissertation, we introduce a graph-based model of instance-based, usage meaning that is cast as a problem of probabilistic inference. The main aim of…
(more)
▼ In this dissertation, we introduce a graph-based model of instance-based usage meaning that is cast as a problem of probabilistic inference. The main aim of this model is to provide a flexible platform that can be used to explore multiple hypotheses about usage meaning computation. Our model takes up and extends the proposals of Erk and Pado [2007] and McCarthy and Navigli [2009] by representing usage meaning as a probability distribution over potential paraphrases. We use undirected graphical models to infer this probability distribution for every content word in a given sentence. Graphical models represent complex probability distributions through a graph. In the graph, nodes stand for random variables, and edges stand for direct probabilistic interactions between them. The lack of edges between any two variables reflects independence assumptions. In our model, we represent each content word of the sentence through two adjacent nodes: the observed node represents the surface form of the word itself, and the hidden node represents its usage meaning. The distribution over values that we infer for the hidden node is a paraphrase distribution for the observed word. To encode the fact that lexical semantic information is exchanged between syntactic neighbors, the graph contains edges that mirror the dependency graph for the sentence. Further knowledge sources that influence the hidden nodes are represented through additional edges that, for example, connect to the document topic. The integration of adjacent knowledge sources is accomplished in a standard way by multiplying factors and marginalizing over variables.
Evaluating on a paraphrasing task, we find that our model outperforms the current state-of-the-art usage vector model [Thater et al., 2010] on all parts of speech except verbs, where the previous model wins by a small margin. But our main focus is not on the numbers but on the fact that our model is flexible enough to encode different hypotheses about usage meaning computation. In particular, we concentrate on five questions (with minor variants):
- Nonlocal syntactic context: Existing usage vector models only use a word's direct syntactic neighbors for disambiguation or for inferring some other meaning representation. Would it help to instead have contextual information "flow" along the entire dependency graph, with each word's inferred meaning relying on the paraphrase distributions of its neighbors?
- Influence of collocational information: In some cases, it is intuitively plausible to use the selectional preference of a neighboring word towards the target to determine its meaning in context. How does incorporating selectional preferences into the model affect performance?
- Non-syntactic bag-of-words context: To what extent can non-syntactic information in the form of bag-of-words context help in inferring meaning?
- Effects of parametrization: We experiment with two transformations of the maximum-likelihood estimates: one interpolates several MLEs, and the other exponentiates pointwise mutual information. Which performs…
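The inference scheme this record describes (multiplying unary and pairwise factors, then marginalizing over the neighboring hidden variables) can be illustrated with a two-word toy graph. The vocabulary, scores, and dependency edge below are invented for illustration and are not the model's actual factors:

```python
# Toy candidate paraphrases for two dependency-linked content words.
PARA = {"bank": ["shore", "institution"], "steep": ["sheer", "expensive"]}

# Unary factors: compatibility of a paraphrase with the observed word
# (stand-ins for corpus-derived scores).
unary = {("bank", "shore"): 0.5, ("bank", "institution"): 0.5,
         ("steep", "sheer"): 0.6, ("steep", "expensive"): 0.4}

# Pairwise factor on the dependency edge steep -> bank: paraphrases of
# syntactic neighbors should be semantically compatible.
pair = {("sheer", "shore"): 0.9, ("sheer", "institution"): 0.1,
        ("expensive", "shore"): 0.2, ("expensive", "institution"): 0.8}

def paraphrase_marginal(target):
    """Exact marginal over the hidden paraphrase node for `target`:
    sum the product of all factors over the other hidden variable,
    then normalize."""
    other = "steep" if target == "bank" else "bank"
    scores = {}
    for t_para in PARA[target]:
        total = 0.0
        for o_para in PARA[other]:
            edge = (pair[(t_para, o_para)] if target == "steep"
                    else pair[(o_para, t_para)])
            total += unary[(target, t_para)] * unary[(other, o_para)] * edge
        scores[t_para] = total
    z = sum(scores.values())
    return {p: s / z for p, s in scores.items()}

m = paraphrase_marginal("bank")
# in the context of "steep", the terrain sense of "bank" dominates
```

The real model runs this kind of marginalization over every content word of a full sentence, with edges mirroring the whole dependency graph rather than a single link.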
Advisors/Committee Members: Erk, Katrin (advisor), Baldridge, Jason (committee member), Bannard, Colin (committee member), Dhillon, Inderjit (committee member), Mooney, Raymond (committee member).
Subjects/Keywords: Computational linguistics; Lexical semantics; Probabilistic graphical models; Natural language processing; Word sense disambiguation; Paraphrasing
APA (6th Edition):
Moon, T. (2011). Word meaning in context as a paraphrase distribution : evidence, learning, and inference. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2011-08-4143
20.
Ponvert, Elias Franchot.
Unsupervised partial parsing.
Degree: PhD, Linguistics, 2011, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2011-08-3991
► The subject matter of this thesis is the problem of learning to discover grammatical structure from raw text alone, without access to explicit instruction or…
(more)
▼ The subject matter of this thesis is the problem of learning, by a computer or computational process, to discover grammatical structure from raw text alone, without access to explicit instruction or annotation; in other words, unsupervised parser induction, or simply unsupervised parsing.
This work presents a method for raw-text unsupervised parsing that is simple but nevertheless achieves state-of-the-art results on treebank-based direct evaluation. The approach presented in this dissertation constrains learned models differently than previous work. Specifically, I focus on a sub-task of full unsupervised parsing called unsupervised partial parsing. In essence, the strategy is to learn to segment a string of tokens into a set of non-overlapping constituents, or chunks, each one or more tokens in length. This strategy has a number of advantages: it is fast and scalable, it is based on well-understood and extensible natural language processing techniques, and it produces predictions about human language structure which are useful for human language technologies. The models developed for unsupervised partial parsing recover base noun phrases and local constituent structure with high accuracy compared to strong baselines.
Finally, these models may be applied in a cascaded fashion for the prediction of full constituent trees: first segmenting a string of tokens into local phrases, then re-segmenting to predict higher-level constituent structure. This simple strategy leads to an unsupervised parsing model which produces state-of-the-art results for constituent parsing of English, German and Chinese. This thesis presents, evaluates and explores these models and strategies.
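The cascaded strategy this record describes (chunk a token string, then treat each chunk as a single unit and chunk again) can be sketched generically. The pairing chunker below is a made-up stand-in for the learned segmenter, purely to show the cascade mechanics:

```python
def cascade(tokens, chunker, levels=2):
    """Apply a chunker repeatedly, treating each predicted chunk as a
    single unit at the next level, to build higher-level constituent
    structure from local phrases.

    `chunker` is any function mapping a sequence of units to a list of
    non-overlapping groups (here a stand-in for a learned segmenter).
    """
    units = list(tokens)
    for _ in range(levels):
        if len(units) <= 1:
            break
        units = chunker(units)
    return units

# Illustrative stand-in chunker: group adjacent units pairwise,
# left to right (a real model predicts chunk boundaries instead).
def pair_chunker(units):
    return [tuple(units[i:i + 2]) for i in range(0, len(units), 2)]

tree = cascade(["the", "dog", "saw", "a", "cat"], pair_chunker)
# level 1: [('the', 'dog'), ('saw', 'a'), ('cat',)]
# level 2: [(('the', 'dog'), ('saw', 'a')), (('cat',),)]
```

Each pass produces non-overlapping constituents over the previous level's units, which is exactly the re-segmentation step the abstract describes.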
Advisors/Committee Members: Baldridge, Jason (advisor), Bannard, Colin (committee member), Beaver, David I. (committee member), Erk, Katrin E. (committee member), Mooney, Raymond J. (committee member).
Subjects/Keywords: Computational linguistics; Natural language processing; Unsupervised; Parsing; Chunking; Text processing
APA (6th Edition):
Ponvert, E. F. (2011). Unsupervised partial parsing. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2011-08-3991

University of Texas – Austin
21.
Denis, P. (Pascal).
New learning models for robust reference resolution.
Degree: PhD, Linguistics, 2007, University of Texas – Austin
URL: http://hdl.handle.net/2152/3566
► An important challenge for the automatic understanding of natural language texts is the correct computation of the discourse entities that are mentioned therein —persons, locations,…
(more)
▼ An important challenge for the automatic understanding of natural language texts is the correct computation of the discourse entities that are mentioned therein: persons, locations, abstract objects, and so on. The problem of mapping linguistic expressions onto these underlying entities is known as reference resolution. Recent years of research in computational reference resolution have seen the emergence of machine learning approaches, which are much more robust and better performing than their rule-based predecessors. Unfortunately, perfect performance is still out of reach for these systems. Broadly defined, the aim of this dissertation is to improve on these existing systems by exploring more advanced machine learning models, which are: (i) able to more adequately encode the structure of the problem, and (ii) able to make better use of the information sources given to the system.
Starting with the sub-task of anaphora resolution, we propose to model this task as a ranking problem and no longer as a classification problem (as is done in existing systems). A ranker offers a potentially better way to model this task by directly including the comparison between antecedent candidates as part of its training criterion. We find that the ranker delivers significant performance improvements over classification-based systems, and is also computationally more attractive in terms of training time and learning rate than its rivals.
The ranking approach is then extended to the larger problem of coreference resolution. The main goal is to see whether the better antecedent selection capabilities offered by the ranking approach can also benefit the larger coreference resolution task. The extension is two-fold. First, we design various specialized ranker models for different types of referential expressions (e.g., pronouns, definite descriptions, proper names). Besides its linguistic appeal, this division of labor also has the potential of learning better model parameters. Second, we augment these rankers with a model that determines the discourse status of mentions and that is used to filter out the "non-anaphoric" mentions. As shown by various experiments, this combined strategy results in significant performance improvements over the single-model, classification-based approach on the three main coreference metrics: the standard MUC metric, as well as the more representative B³ and CEAF metrics.
Finally, we show how the task of coreference resolution can be recast as a linear optimization problem. In particular, we use the framework of Integer Linear Programming (ILP) to: (i) combine the predictions of three local models (namely, a standard pairwise coreference classifier, a discourse status classifier, and a named entity classifier) in a joint, global inference, and (ii) integrate various other global constraints (such as transitivity constraints) to better capture the dependencies between coreference decisions. Tested on the ACE datasets, our ILP formulations deliver significant f-score…
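The role of transitivity constraints in the ILP formulation this record mentions can be shown with a tiny brute-force stand-in. The mention names and scores below are hypothetical, and a real system would pass the same objective and constraints to an ILP solver rather than enumerate assignments:

```python
from itertools import combinations, product

def best_coreference(n_mentions, link_score):
    """Exhaustive stand-in for the ILP: choose binary link variables
    x_ij maximizing the sum of per-link scores, subject to transitivity.
    Coreference is an equivalence relation, so any two links in a
    triangle of mentions force the third.
    """
    pairs = list(combinations(range(n_mentions), 2))
    best, best_val = None, float("-inf")
    for assign in product([0, 1], repeat=len(pairs)):
        x = dict(zip(pairs, assign))
        if any(x[(i, j)] + x[(j, k)] + x[(i, k)] == 2
               for i, j, k in combinations(range(n_mentions), 3)):
            continue  # exactly two links in a triangle: inconsistent
        val = sum(x[p] * link_score[p] for p in pairs)
        if val > best_val:
            best, best_val = x, val
    return best

# Hypothetical classifier scores: links 0-1 and 1-2 look good locally,
# but 0-2 is strongly dispreferred; transitivity blocks the greedy
# choice of taking both good links, forcing a globally consistent one.
scores = {(0, 1): 1.0, (1, 2): 0.8, (0, 2): -3.0}
sol = best_coreference(3, scores)
```

This captures point (ii) of the abstract: local pairwise decisions are replaced by a single global inference in which the constraints mediate between conflicting link scores.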
Advisors/Committee Members: Baldridge, Jason (advisor), Asher, Nicholas (advisor).
Subjects/Keywords: Reference (Linguistics); English language – Discourse analysis
APA (6th Edition):
Denis, P. (2007). New learning models for robust reference resolution. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/3566

University of Texas – Austin
22.
Ramanujam, Srivatsan.
Factorial Hidden Markov Models for full and weakly supervised supertagging.
Degree: MA, Computer Sciences, 2009, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2009-08-350
► For many sequence prediction tasks in Natural Language Processing, modeling dependencies between individual predictions can be used to improve prediction accuracy of the sequence as…
(more)
▼ For many sequence prediction tasks in Natural Language Processing, modeling dependencies between individual predictions can be used to improve prediction accuracy of the sequence as a whole. Supertagging involves assigning lexical entries to words based on a lexicalized grammatical theory such as Combinatory Categorial Grammar (CCG).
Previous work has used Bayesian HMMs to learn taggers for POS tagging and supertagging separately. Modeling them jointly has the potential to produce more robust and accurate supertaggers trained with less supervision, and thereby to help in the creation of useful models for new languages and domains.
Factorial Hidden Markov Models (FHMMs) support joint inference for multiple sequence prediction tasks. Here, I use them to jointly predict part-of-speech tag and supertag sequences with varying levels of supervision. I show that supervised training of FHMM models improves performance compared to standard HMMs, especially when labeled training material is scarce. Secondly, FHMMs trained from tag dictionaries rather than labeled examples also perform better than a standard HMM. Finally, I show that an FHMM and a maximum entropy Markov model can complement each other in a single-step co-training setup that improves the performance of both models when limited labeled training material is available.
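The joint decoding idea behind this record's FHMM can be approximated by running Viterbi over the product of the two hidden chains, so each decoded state is a (POS, supertag) pair. This is an illustrative simplification (exact product-space decoding with joint transition tables, rather than factored FHMM inference), and the toy parameters are invented:

```python
from itertools import product

def joint_viterbi(words, pos_tags, supertags, trans, emit):
    """Viterbi over the product of two hidden chains: a state is a
    (POS, supertag) pair, so the two tagging tasks are decoded jointly.
    `trans[(s, s2)]` and `emit[(s, w)]` are stand-in probabilities;
    missing entries get a tiny smoothing value.
    """
    states = list(product(pos_tags, supertags))
    v = [{s: emit.get((s, words[0]), 1e-9) for s in states}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for s in states:
            prev, score = max(
                ((p, v[-1][p] * trans.get((p, s), 1e-9)) for p in states),
                key=lambda t: t[1])
            col[s] = score * emit.get((s, w), 1e-9)
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # backtrace from the best final state
    s = max(v[-1], key=v[-1].get)
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return list(reversed(path))

pos = ["N", "V"]
stags = ["NP", "S\\NP"]
emit = {(("N", "NP"), "dogs"): 0.9, (("V", "S\\NP"), "bark"): 0.9}
trans = {(("N", "NP"), ("V", "S\\NP")): 0.8}
path = joint_viterbi(["dogs", "bark"], pos, stags, trans, emit)
# each step yields a (POS, supertag) pair decoded jointly
```

The gain the thesis reports comes from letting the POS and supertag chains inform each other during training and inference, which the paired-state view makes concrete.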
Advisors/Committee Members: Mooney, Raymond J. (Raymond Joseph) (advisor), Baldridge, Jason (committee member).
Subjects/Keywords: Hidden Markov Models; Bayesian Models; Categorial Grammar; Supertagging; Joint Inference
APA (6th Edition):
Ramanujam, S. (2009). Factorial Hidden Markov Models for full and weakly supervised supertagging. (Masters Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2009-08-350

University of Texas – Austin
23.
Hoyt, Frederick MacNeill.
Negative concord in Levantine Arabic.
Degree: PhD, Linguistics, 2010, University of Texas – Austin
URL: http://hdl.handle.net/2152/ETD-UT-2010-08-1763
► This dissertation is a study of negative concord in Levantine Arabic (Israel/Palestine, Jordan, Lebanon, Syria), where negative concord is the failure of an n-word to…
(more)
▼ This dissertation is a study of negative concord in Levantine Arabic (Israel/Palestine, Jordan, Lebanon, Syria), where negative concord is the failure of an n-word to express negative meaning distinctly when in syntagm with another negative expression. A set of n-words is identified, including the never-words <ʔɛbadan> and <bɪlmarra> "never, not once, not at all," the negative minimizers <hawa> and <qɛšal> "nothing," and the negative scalar focus particle <wala> "not (even) (one), not a (single)." Each can be used to express negation in sentence fragments and other constructions with elliptical interpretations, such as gapping and coordination. Beyond that, the three categories differ syntactically and semantically. I present analyses of these expressions that treat them as having different morphological and semantic properties. The data support an ambiguity analysis for wala-phrases, and a syntactic analysis of them together with the never-words, indicating that a single, uniform theory of negative concord should be rejected for Levantine Arabic.
The dissertation is the first such work to explicitly identify negative concord in Levantine Arabic and to provide a detailed survey and analysis of it. The description includes subtle points of variation between regional varieties of Levantine, as well as an in-depth analysis of the usage of n-words. It also adds a large new data set to the body of data that has been reported on negative concord, and has several implications for theories on the subject. The dissertation also makes a contribution to computational linguistics as applied to Arabic, because the analyses are couched in Combinatory Categorial Grammar, a formalism that is used both for linguistic theorizing and for a variety of practical applications, including text parsing and text generation. The semantic generalizations reported here are also important for practical computational tasks, because they provide a way to correctly calculate the negative or positive polarity of utterances in a negative concord language, which is essential for computational tasks such as machine translation or sentiment analysis.
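The polarity point in the last paragraph is mechanical enough to sketch: under negative concord, any number of co-occurring negative expressions contributes a single semantic negation, whereas in a double-negation language negations cancel pairwise. The romanized mini-lexicon below is an invented illustration, not the dissertation's inventory:

```python
# Hypothetical mini-lexicon of negative expressions, romanized for
# illustration only.
N_WORDS = {"ma", "wala", "abadan", "bilmarra"}

def clause_polarity(tokens, negative_concord=True):
    """Polarity of a clause given its negative expressions.

    In a negative-concord language, one or more co-occurring negative
    expressions yield a single semantic negation; in a double-negation
    language, negations cancel pairwise (parity matters).
    """
    n = sum(1 for t in tokens if t in N_WORDS)
    if negative_concord:
        return "negative" if n > 0 else "positive"
    return "negative" if n % 2 == 1 else "positive"

# "ma shuft wala hada" ('I didn't see anyone'): two negative
# expressions, but a single semantic negation under concord.
print(clause_polarity(["ma", "shuft", "wala", "hada"]))          # negative
print(clause_polarity(["ma", "shuft", "wala", "hada"], False))   # positive
```

A sentiment or translation system that assumed pairwise cancellation would flip the polarity of exactly these sentences, which is the practical stake the abstract describes.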
Advisors/Committee Members: Baldridge, Jason (advisor), Beaver, David I. (committee member), Beavers, John (committee member), Abboud, Peter F. (committee member), Benmamoun, Abbas (committee member), Steedman, Mark J. (committee member).
Subjects/Keywords: Arabic language; Levantine Arabic; Colloquial Arabic; Syntax; Semantics; Negation; Negative concord
APA (6th Edition):
Hoyt, F. M. (2010). Negative concord in Levantine Arabic. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/ETD-UT-2010-08-1763