Sybrandt, Justin George.
Exploiting Latent Features of Text and Graphs.
Degree: PhD, School of Computing, 2020, Clemson University
As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings – semantically rich vectors of latent features – to convert human constructs for machine understanding. In this dissertation, we focus on information available within biomedical science, including human-written abstracts of scientific papers as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristic-based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state of the art in hypergraph partitioning quality. Having gained newfound intuition about graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criterion derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.
Advisors/Committee Members: Ilya Safro, Amy Apon, Sez Atamturktur, Brian Dean, Alexander Herzog.
Subjects/Keywords: Conditional Text Generation; Graph Embedding; Hypergraph Partitioning; Hypothesis Generation; Literature-based Discovery; Text Embedding
APA (6th Edition):
Sybrandt, J. G. (2020). Exploiting Latent Features of Text and Graphs. (Doctoral Dissertation). Clemson University. Retrieved from https://tigerprints.clemson.edu/all_dissertations/2592
Chicago Manual of Style (16th Edition):
Sybrandt, Justin George. “Exploiting Latent Features of Text and Graphs.” 2020. Doctoral Dissertation, Clemson University. Accessed July 08, 2020.
MLA Handbook (7th Edition):
Sybrandt, Justin George. “Exploiting Latent Features of Text and Graphs.” 2020. Web. 08 Jul 2020.
Vancouver:
Sybrandt JG. Exploiting Latent Features of Text and Graphs. [Internet] [Doctoral dissertation]. Clemson University; 2020. [cited 2020 Jul 08]. Available from: https://tigerprints.clemson.edu/all_dissertations/2592
Council of Science Editors:
Sybrandt JG. Exploiting Latent Features of Text and Graphs. [Doctoral Dissertation]. Clemson University; 2020. Available from: https://tigerprints.clemson.edu/all_dissertations/2592