University of Southern California
Deciphering natural language.
Degree: PhD, Computer Science, 2011, University of Southern California
Most state-of-the-art techniques used in natural
language processing (NLP) are supervised and require labeled
training data. For example, statistical language translation
requires huge amounts of bilingual data for training translation
systems. But such data does not exist for all language pairs and
domains. Using human annotation to create new bilingual resources
is not a scalable solution. This raises a key research challenge:
How can we circumvent the problem of limited labeled resources for
NLP applications? Interestingly, cryptanalysts and archaeologists
have tackled similar challenges in solving "decipherment
problems". This thesis work aims to bring together techniques from
classical cryptography, NLP and machine learning. We introduce a
novel approach called "natural language decipherment" that can
solve natural language problems without labeled (parallel) data. A
wide variety of NLP problems can be formulated as decipherment
tasks – for example, in statistical language translation one can
view the foreign-language text as a cipher for English. Instead of
relying on parallel training data, decipherment uses knowledge of
the target language (e.g., English) and large quantities of readily
available monolingual source (cipher) data to induce bilingual
connections between the source and target languages. Using
decipherment techniques, we make headway in attacking a hierarchy
of problems ranging from letter substitution decipherment to
sequence labeling problems (such as part-of-speech tagging) to
language translation. Along the way, we make several key
contributions – novel unsupervised algorithms that search for
minimized models during decipherment and achieve state-of-the-art
results on a number of important natural language tasks. Unlike
conventional approaches, these decipherment methods can be easily
extended to multiple domains and languages (especially
resource-poor languages), thereby helping to spread the impact and
benefits of NLP research.
Advisors/Committee Members: Knight, Kevin (Committee Chair), Marcu, Daniel (Committee Member), Chiang, David (Committee Member), Teng, Shang-Hua (Committee Member), Narayanan, Shrikanth S. (Committee Member).
Subjects/Keywords: natural language processing; machine learning; computational decipherment; artificial intelligence; statistics
APA (6th Edition):
Ravi, S. (2011). Deciphering natural language. (Doctoral Dissertation). University of Southern California. Retrieved from http://digitallibrary.usc.edu/cdm/compoundobject/collection/p15799coll127/id/448537/rec/1788
Chicago Manual of Style (16th Edition):
Ravi, Sujith. “Deciphering natural language.” 2011. Doctoral Dissertation, University of Southern California. Accessed December 14, 2019.
MLA Handbook (7th Edition):
Ravi, Sujith. “Deciphering natural language.” 2011. Web. 14 Dec 2019.
Vancouver:
Ravi S. Deciphering natural language. [Internet] [Doctoral dissertation]. University of Southern California; 2011. [cited 2019 Dec 14]. Available from: http://digitallibrary.usc.edu/cdm/compoundobject/collection/p15799coll127/id/448537/rec/1788.
Council of Science Editors:
Ravi S. Deciphering natural language. [Doctoral Dissertation]. University of Southern California; 2011. Available from: http://digitallibrary.usc.edu/cdm/compoundobject/collection/p15799coll127/id/448537/rec/1788