University of Washington
Enabling Community-Driven Proteomics.
Degree: PhD, 2019, University of Washington
As the field of proteomics matures, it faces several computational challenges. This dissertation describes three new computational methods to address some of these challenges: Metapeptides, Param-Medic and GLEAMS.

Metapeptides constructs a database from site-specific metagenomic sequencing of microbial community samples to facilitate identification of mass spectra. Even massive public databases offer incomplete coverage of a given microbial community sample. Metaproteomes assembled from site-specific metagenomic sequencing offer better coverage but fail to include all the variability present in sequencing data. Metapeptides constructs a small, sample-targeted peptide database optimized for database search, offering superior sequence coverage and providing a dramatic boost to metaproteomic database search sensitivity at a controlled false discovery rate (FDR).

Param-Medic infers optimal database search parameters directly from mass spectrometry data. Tight precursor and fragment mass tolerances can increase database search sensitivity at a given FDR. However, too-tight tolerances reduce sensitivity by improperly excluding match candidates and lowering match scores. Param-Medic infers optimal precursor and fragment tolerances by analyzing pairs of acquired spectra that are likely to have been generated by the same peptide ion, yielding more high-confidence identifications at a given FDR than tolerances based on per-instrument best practice or even determined by experts.

GLEAMS embeds mass spectra into a low-dimensional space in which spectra generated by the same peptide are close together, enabling rapid propagation of sequence identifications among communities of nearby spectra. Public proteomics repositories contain billions of spectra from researchers around the world, but traditional data analysis workflows fail to take advantage of those data. GLEAMS detects communities of spectra that represent the same peptide. Identifications can be propagated from identified to unidentified spectra, and unidentified communities can then be characterized by targeted downstream analysis. GLEAMS enables identification of 8% more spectra in a sample repository of five million spectra at low computational expense. Scaled up to an entire public repository, GLEAMS offers an efficient, community-driven approach to proteomics data analysis.
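
The idea of propagating identifications among communities of nearby spectra can be illustrated with a minimal sketch. This is not the GLEAMS implementation: the abstract does not say how communities are detected or how labels are propagated, so the single-linkage grouping, the `max_distance` threshold, the majority vote, and the function name `propagate_identifications` are all assumptions made for illustration. The toy 2-D coordinates stand in for learned embeddings.

```python
from collections import Counter
from math import dist


def propagate_identifications(embeddings, labels, max_distance):
    """Propagate peptide labels within communities of nearby embedded spectra.

    embeddings: list of equal-length coordinate tuples, one per spectrum.
    labels: list of peptide strings, or None for unidentified spectra.
    max_distance: hypothetical threshold; spectra at most this far apart are linked.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the community root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of spectra within max_distance (O(n^2); fine for a sketch).
    for i in range(n):
        for j in range(i + 1, n):
            if dist(embeddings[i], embeddings[j]) <= max_distance:
                parent[find(i)] = find(j)

    # Collect the known labels in each community.
    votes = {}
    for i, label in enumerate(labels):
        if label is not None:
            votes.setdefault(find(i), Counter())[label] += 1

    # Give each unidentified spectrum its community's most common label, if any.
    result = list(labels)
    for i, label in enumerate(labels):
        if label is None and find(i) in votes:
            result[i] = votes[find(i)].most_common(1)[0][0]
    return result


# Toy data: one community with a single identified spectrum, one with none.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
labels = ["PEPTIDEK", None, None, None, None]
propagated = propagate_identifications(embeddings, labels, max_distance=0.5)
```

In this toy run the first three spectra form one community and inherit the single known label, while the last two form a community with no identified member and stay unlabeled, which corresponds to the "unidentified communities" the abstract leaves for targeted downstream analysis. The all-pairs loop would not scale to millions of spectra; a real system would need an approximate nearest-neighbor index or similar.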
Advisors/Committee Members: Noble, William S (advisor).
Subjects/Keywords: computational biology; deep learning; machine learning; mass spectrometry; metaproteomics; proteomics; Biostatistics; Computer science; Genetics
APA (6th Edition):
May, D. (2019). Enabling Community-Driven Proteomics. (Doctoral Dissertation). University of Washington. Retrieved from http://hdl.handle.net/1773/43391
Chicago Manual of Style (16th Edition):
May, Damon. “Enabling Community-Driven Proteomics.” 2019. Doctoral Dissertation, University of Washington. Accessed March 20, 2019.
MLA Handbook (7th Edition):
May, Damon. “Enabling Community-Driven Proteomics.” 2019. Web. 20 Mar 2019.
Vancouver:
May D. Enabling Community-Driven Proteomics. [Internet] [Doctoral dissertation]. University of Washington; 2019. [cited 2019 Mar 20]. Available from: http://hdl.handle.net/1773/43391.
Council of Science Editors:
May D. Enabling Community-Driven Proteomics. [Doctoral Dissertation]. University of Washington; 2019. Available from: http://hdl.handle.net/1773/43391