You searched for +publisher:"University of Saskatchewan" +contributor:("Kusalik, Tony").
Showing records 1 – 20 of 20 total matches.
No search limiters apply to these results.
1.
Haque, H M Zabir 1991-.
Sample Size Evaluation and Comparison of K-Means Clusterings of RNA-Seq Gene Expression Data.
Degree: 2018, University of Saskatchewan
URL: http://hdl.handle.net/10388/11741
The process by which DNA is transformed into gene products, such as RNA and proteins, is called gene expression. Gene expression profiling quantifies the expression of genes (amount of RNA) in a particular tissue at a particular time. Two commonly used high-throughput techniques for gene expression analysis are DNA microarrays and RNA-Seq, with RNA-Seq being the newer technique based on high-throughput sequencing.
Statistical analysis is needed to deal with complex datasets — one commonly used statistical tool is clustering. Clustering comparison is an existing area dedicated to comparing multiple clusterings from one or more clustering algorithms. However, there has been limited application of cluster comparisons to clusterings of RNA-Seq gene expression data. In particular, cluster comparisons are useful in order to test the differences between clusterings obtained using a single algorithm when using different samples for clustering.
Here we use a metric for cluster comparisons that is a variation of existing metrics. The metric is simply the minimal number of genes that need to be moved from one cluster to another in one given clustering to produce another given clustering. As the metric only has genes (or elements) as units, it is easy to interpret for RNA-Seq analysis. Moreover, three different algorithmic techniques — brute force, branch-and-bound, and maximal bipartite matching — for computing the proposed metric exactly are compared in terms of time to compute, with bipartite matching being significantly more time efficient.
This metric is then applied to the important issue of understanding the effect of increasing the number of RNA-Seq samples on clusterings. Three datasets were used where a large number of samples were available: mouse embryonic stem cell tissue data, Drosophila melanogaster data from multiple tissues and micro-climates, and a mouse multi-tissue dataset. For each, a reference clustering was computed from all of the samples, and then it was compared to clusterings created from smaller subsets of the samples. All clusterings were created using a standard heuristic K-means clustering algorithm, while systematically varying the number of clusters and using both Euclidean distance and Manhattan distance. The clustering comparisons suggest that, for the three large datasets tested, adding more RNA-Seq samples beyond some small number has a limited impact on K-means clusterings using either Euclidean distance or Manhattan distance (Manhattan distance gives a higher variation). That is, the clusterings compiled based on a limited number of samples were all either quite similar to the reference clustering or did not improve as additional samples were added. These findings were the same for different numbers of clusters. The methods developed could also be applied to other clustering comparison problems.
Advisors/Committee Members: McQuillan, Ian; Kusalik, Tony; Horsch, Michael; Schneider, David.
Subjects/Keywords: Clusterings; Cluster Comparison Distance; Clustering Distance; Gene Expression; RNA Sequencing
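Illustrative sketch (not taken from the thesis): the cluster comparison metric described in the abstract above, the minimal number of elements that must be moved between clusters to turn one clustering into another, can be computed for small inputs by brute force over cluster matchings. The Python below assumes the metric equals the number of elements minus the largest total overlap achievable by a one-to-one pairing of clusters; the function name move_distance and the label-list input format are hypothetical. The abstract reports that bipartite matching computes the metric far faster than brute force, so this version is only practical for a handful of clusters.

```python
from itertools import permutations


def move_distance(a, b):
    """Minimal number of elements to move between clusters of `a` to obtain `b`.

    `a` and `b` are label lists over the same elements (hypothetical format):
    a[i] and b[i] are the cluster labels of element i in each clustering.
    """
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same elements")

    # Map each distinct label to a row/column index, in order of first appearance.
    index_a = {label: i for i, label in enumerate(dict.fromkeys(a))}
    index_b = {label: j for j, label in enumerate(dict.fromkeys(b))}

    # Pad to a square table so clusterings with different cluster counts work.
    k = max(len(index_a), len(index_b))
    overlap = [[0] * k for _ in range(k)]
    for label_a, label_b in zip(a, b):
        overlap[index_a[label_a]][index_b[label_b]] += 1

    # Brute force: try every one-to-one pairing of clusters and keep the
    # pairing under which the most elements already sit in matching clusters.
    best = max(
        sum(overlap[i][pairing[i]] for i in range(k))
        for pairing in permutations(range(k))
    )
    return len(a) - best
```

For example, move_distance([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]) counts the single element that changed cluster, and relabelling the clusters without moving any element gives a distance of 0.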
2.
Khan, Nazifa Azam 1991-.
Chromosome Descrambling Order Analysis in ciliates.
Degree: 2016, University of Saskatchewan
URL: http://hdl.handle.net/10388/7557
Ciliates are a type of unicellular eukaryotic organism that has two types of nuclei within each cell; one is called the macronucleus (MAC) and the other is known as the micronucleus (MIC). During mating, ciliates exchange their MIC, destroy their own MAC, and create a new MAC from the genetic material of their new MIC. The process of developing a new MAC from the exchanged new MIC is known as gene assembly in ciliates, and it consists of a massive amount of DNA excision from the micronucleus, and the rearrangement of the rest of the DNA sequences. During the gene assembly process, the DNA segments that get eliminated are known as internal eliminated segments (IESs), and the remaining DNA segments that are rearranged in an order that is correct for creating proteins, are called macronuclear destined segments (MDSs).
A topic of interest is to predict the correct order to descramble a gene or chromosomal segment. A prediction can be made based on the principle of parsimony, whereby the smallest sequence of operations is likely close to the actual number of operations that occurred. Interestingly, the order of MDSs in the newly assembled 22,354 Oxytricha trifallax MIC chromosome fragments provides evidence that multiple parallel recombinations occur, where the structure of the chromosomes allows for interleaving between two sections of the developing macronuclear chromosome in a manner that can be captured with a common string operation called the shuffle operation (the shuffle operation on two strings results in a new string by weaving together the first two, while preserving the order within each string). Thus, we studied four similar systems involving applications of shuffle to see how the minimum number of operations needed to assemble differs between the types. Two algorithms for each of the first two systems have been implemented that are both shown to be optimal. And, for the third and fourth systems, four and two heuristic algorithms, respectively, have been implemented. The results from these algorithms revealed that, in most cases, the third system gives the minimum number of applications of shuffle to descramble, but whether the best implemented algorithm for the third system is optimal or not remains an open question. The best implemented algorithm for the third system showed that 96.63% of the scrambled micronuclear chromosome fragments of Oxytricha trifallax can be descrambled by only 1 or 2 applications of shuffle. This small number of steps lends theoretical evidence that some structural component is enforcing an alignment of segments in a shuffle-like fashion, and then parallel recombination is taking place to enable MDS rearrangement and IES elimination.
Another problem of interest is to classify segments of the MIC into MDSs and IESs; this is the second topic of the thesis, and is a matter of determining the right "class label", i.e. MDS or IES, on each nucleotide. Thus, training data of labelled input sequences was used with hidden Markov models (HMMs), which is a well-known supervised machine…
Advisors/Committee Members: McQuillan, Ian; Kusalik, Tony; Eramian, Mark; Wu, FangXiang.
Subjects/Keywords: Micronuclear; Macronuclear; Gene Assembly; Macronuclear Destined Segments; Internal Eliminated Segments; Scrambled; Descrambling.
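Illustrative sketch (not taken from the thesis): the shuffle operation described in the abstract above weaves two strings together while preserving the order within each. The Python below enumerates all shuffles of two short strings and checks whether a given string is a shuffle of two others. The function names shuffles and is_shuffle are hypothetical, and the four descrambling systems studied in the thesis are not modelled here.

```python
from functools import lru_cache


def shuffles(u, v):
    """Return the set of all interleavings of u and v that keep each string's order."""
    if not u:
        return {v}
    if not v:
        return {u}
    # The result starts either with the first symbol of u or the first symbol of v.
    return ({u[0] + rest for rest in shuffles(u[1:], v)}
            | {v[0] + rest for rest in shuffles(u, v[1:])})


def is_shuffle(w, u, v):
    """Check whether w can be formed by weaving u and v together."""
    if len(w) != len(u) + len(v):
        return False

    @lru_cache(maxsize=None)
    def ok(i, j):
        # i symbols of u and j symbols of v have been consumed so far.
        if i == len(u) and j == len(v):
            return True
        k = i + j
        take_u = i < len(u) and u[i] == w[k] and ok(i + 1, j)
        take_v = j < len(v) and v[j] == w[k] and ok(i, j + 1)
        return take_u or take_v

    return ok(0, 0)
```

Enumeration grows quickly with string length, so shuffles is only suitable for short inputs; is_shuffle uses memoisation and needs at most (len(u) + 1) * (len(v) + 1) states.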
3.
Jin, Lingling.
Interruptional Activity and Simulation of Transposable Elements.
Degree: 2017, University of Saskatchewan
URL: http://hdl.handle.net/10388/8084
Transposable elements (TEs) are interspersed DNA sequences that can move or copy to new positions within a genome. The active TEs along with the remnants of many transposition events over millions of years constitute 46.69% of the human genome. TEs are believed to promote speciation and their activities play a significant role in human disease. The 22 AluY and 6 AluS TE subfamilies have been the most active TEs in recent human history, whose transposition has been implicated in several inherited human diseases and in various forms of cancer by integrating into genes. Therefore, understanding the transposition activities is very important.
Recently, there has been some work done to quantify the activity levels of active Alu transposable elements based on variation in the sequence. Here, given this activity data, an analysis of TE activity based on the position of mutations is conducted. Two different methods/simulations are created to computationally predict so-called harmful mutation regions in the consensus sequence of a TE; that is, mutations that occur in these regions decrease the transposition activities dramatically. The methods are applied to AluY, the youngest and most active Alu subfamily, to identify the harmful regions lying in its consensus, and verifications are presented using the activity of AluY elements and the secondary structure of the AluYa5 RNA, providing evidence that the method is successfully identifying harmful mutation regions. A supplementary simulation also shows that the identified harmful regions covering the AluYa5 RNA functional regions are not occurring by chance. Therefore, mutations within the harmful regions alter the mobile activity levels of active AluY elements. One of the methods is then applied to two additional TE families, the Alu family and the L1 family, to detect the harmful regions in these elements computationally.
Understanding and predicting the evolution of these TEs is of interest in understanding their powerful evolutionary force in shaping their host genomes. In this thesis, a formal model of TE fragments and their interruptions is devised that provides definitions that are compatible with biological nomenclature, while still providing a suitable formal foundation for computational analysis. Essentially, this model is used for fixing terminology that was misleading in the literature, and it helps to describe further TE problems in a precise way. Indeed, later chapters include two other models built on top of this model: the sequential interruption model and the recursive interruption model, both used to analyze their activity throughout evolution.
The sequential interruption model is defined between TEs that occur in a genomic sequence to estimate how often TEs interrupt other TEs, which has been shown to be useful in predicting their ages and their activity throughout evolution. Here, this prediction from the sequential interruptions is shown to be closely related to a classic matrix optimization problem: the Linear Ordering Problem (LOP). By…
Advisors/Committee Members: Vassileva, Julita; Kusalik, Tony; McQuillan, Ian; Keil, Mark.
Subjects/Keywords: Transposable elements; transpositional activity; theoretical modelling
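Illustrative sketch (not taken from the thesis): the Linear Ordering Problem named in the abstract above asks for a single ordering, applied to both the rows and the columns of a square matrix, that maximizes the sum of the entries above the diagonal. The Python below solves tiny instances by brute force. The function name linear_ordering and the example matrix are hypothetical, and reading entry [i][j] as a count of how often element i interrupts element j is an assumption; the truncated abstract does not show how the thesis builds or solves its matrix.

```python
from itertools import permutations


def linear_ordering(matrix):
    """Brute-force Linear Ordering Problem for a small square matrix.

    Returns (order, score), where `order` lists the original indices in their
    new positions and `score` is the sum of entries above the diagonal after
    reordering rows and columns by `order`.
    """
    n = len(matrix)

    def score(order):
        # Entry (order[i], order[j]) lands above the diagonal whenever i < j.
        return sum(matrix[order[i]][order[j]]
                   for i in range(n) for j in range(i + 1, n))

    best = max(permutations(range(n)), key=score)
    return list(best), score(best)


# Hypothetical 3x3 count matrix used only to exercise the function.
example = [
    [0, 1, 5],
    [4, 0, 2],
    [0, 3, 0],
]
order, total = linear_ordering(example)
```

Brute force examines n! orderings, so it is only usable for very small matrices; the problem is NP-hard in general, which is why larger instances are normally handled with heuristics or integer programming.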
4.
Ovens, Katie 1990-.
Integrating biclustering techniques with de novo gene regulatory network discovery using RNA-seq from skeletal tissues.
Degree: 2016, University of Saskatchewan
URL: http://hdl.handle.net/10388/7552
In order to improve upon stem cell therapy for osteoarthritis, it is necessary to understand the molecular and cellular processes behind bone development and the differences from cartilage formation. To further elucidate these processes would provide a means to analyze the relatedness of bone and cartilage tissue by determining genes that are expressed and regulated for stem cells to differentiate into skeletal tissues. It would also contribute to the classification of differences in normal skeletogenesis and degenerative conditions involving these tissues. The three predominant skeletal tissues of interest are bone, immature cartilage and mature cartilage. Analysis of the transcriptome of these skeletal tissues using RNA-seq technology was performed using differential expression, clustering and biclustering algorithms, to detect similarly expressed genes, which provides evidence for genes potentially interacting together to produce a particular phenotype. Identifying key regulators in the gene regulatory networks (GRNs) driving cartilage and bone development and the differences in the GRNs they drive will facilitate a means to make comparisons between the tissues at the transcriptomic level.
Due to a small number of available samples for gene expression data in bone, immature and mature cartilage, it is necessary to determine how the number of samples influences the ability to make accurate GRN predictions. Machine learning techniques for GRN prediction that can incorporate multiple data types have not been well evaluated for complex organisms, nor has RNA-seq data been used often for evaluating these methods. Therefore, techniques identified to work well with microarray data were applied to RNA-seq data from mouse embryonic stem cells, where more samples are available for evaluation compared to the skeletal tissue RNA-seq samples. The RNA-seq data was combined with ChIP-seq data to determine if the machine learning methods outperform simple, correlation-based methods that have been evaluated using RNA-seq data alone. Two of the best performing GRN prediction algorithms from previous large-scale evaluations, which are incapable of incorporating data beyond expression data, were used as a baseline to determine if the addition of multiple data types could help reduce the number of gene expression samples. It was also necessary to identify a biclustering algorithm that could identify potentially biologically relevant modules. Publicly available ChIP-seq and RNA-seq samples from embryonic stem cells were used to measure the performance and consistency of each method, as there was a well-established network in mouse embryonic stem cells to compare results. The methods were then compared to cMonkey2, a biclustering method used in conjunction with ChIP-seq for two important transcription factors in the embryonic stem cell network. This was done to determine if any of these GRN prediction methods could potentially use the small number of skeletal tissue samples available to determine transcription factors orchestrating the…
Advisors/Committee Members: McQuillan, Ian; Eames, Brian; Stanley, Kevin; Kusalik, Tony; Eskiw, Chris.
Subjects/Keywords: Gene Regulatory Networks; Biclustering; Skeletal Tissues
5.
Aziz, Syed Umair.
Improved Inference of Ecological Interaction Types.
Degree: 2020, University of Saskatchewan
URL: http://hdl.handle.net/10388/13096
Inference of microbial interaction types allows us to understand the growth and development of microbial life forms found on earth. Numerous methods have been proposed to infer the interaction type(s) of microbes in a microbial community using a population dynamics model. However, due to the dynamic behaviour of microbial communities, these methods can result in erroneous inferences. A method proposed by Xiao et al. in 2017, which models the dynamic behaviour of a microbial community using sample abundance data, overcomes many of these issues but suffers from a high failure rate of inference, lower confidence in inferred interactions, and slower execution speed than the existing algorithms. In this thesis, we propose an improved, more efficient, and more effective approach to infer the microbial interaction types of larger microbial communities (N>10). Our findings demonstrate that our approach is faster, more fault tolerant, and more scalable than the state of the art from 2017, and that it can infer microbial interactions with increased confidence.
Advisors/Committee Members: Stanley, Kevin; Kusalik, Tony; Siciliano, Steven; Mondal, Debajyoti; Peak, Derek.
Subjects/Keywords: microbial interactions; unsupervised
6.
Kopas, Logan.
Techniques to Improve Deep Learning for Phenotype Prediction from Genotype Data.
Degree: 2020, University of Saskatchewan
URL: http://hdl.handle.net/10388/13148
We show that by representing Single Nucleotide Polymorphism (SNP) data to a neural network in a way that incorporates quality scores and avoids filtering out low-quality SNPs, we are able to increase the effectiveness of a deep neural network for phenotype prediction from genotype in some cases. We also show that we are able to significantly increase the predictive power of a neural network by making use of transfer learning. We demonstrate these results on a Whole Genome Sequencing (WGS) Neisseria gonorrhoeae dataset, where we predict Antimicrobial Resistance (AMR), as well as on an exome sequencing Lens culinaris dataset, where we predict 3 growing rate phenotypes.
Advisors/Committee Members: Kusalik, Tony; Schneider, Dave; Stavness, Ian; Bett, Kirsten; Zhang, Xuekui.
Subjects/Keywords: Deep learning; bioinformatics; genotype; phenotype prediction
7.
Yi, Xin 1989-.
Plant Seed Identification.
Degree: 2017, University of Saskatchewan
URL: http://hdl.handle.net/10388/7813
Plant seed identification is routinely performed for seed certification in seed trade, phytosanitary certification for the import and export of agricultural commodities, and regulatory monitoring, surveillance, and enforcement. Current identification is performed manually by seed analysts with limited aiding tools. Extensive expertise and time is required, especially for small, morphologically similar seeds. Computers are, however, especially good at recognizing subtle differences that humans find difficult to perceive. In this thesis, a 2D, image-based computer-assisted approach is proposed.
Plant seeds are extremely small compared with everyday objects. The microscopic images of plant seeds are usually degraded by defocus blur due to the high magnification of the imaging equipment. It is necessary and beneficial to differentiate the in-focus and blurred regions, given that usually only sharp regions carry distinctive information for identification. If the object of interest, the plant seed in this case, is in focus within a single image frame, the amount of defocus blur can be employed as a cue to separate the object from the cluttered background. If the defocus blur is strong enough to obscure the object itself, sharp regions of multiple image frames acquired at different focal distances can be merged together to make an all-in-focus image. This thesis describes a novel non-reference sharpness metric which exploits the distribution difference of uniform LBP patterns in blurred and non-blurred image regions. It runs in real time on a single-core CPU and responds much better to low-contrast sharp regions than the competitor metrics. Its benefits are shown in both defocus segmentation and focal stacking.
With the obtained all-in-focus seed image, a scale-wise pooling method is proposed to construct its feature representation. Since the imaging settings in lab testing are well constrained, the seed objects in the acquired image can be assumed to have measurable scale and controllable scale variance. The proposed method utilizes real pixel scale information and allows for accurate comparison of seeds across scales. By cross-validation on our high-quality seed image dataset, a better identification rate (95%) was achieved than with pre-trained convolutional-neural-network-based models (93.6%). It offers an alternative method for image-based identification with all-in-focus object images of limited scale variance.
The very first digital seed identification tool of its kind was built and deployed for testing in the seed laboratory of the Canadian Food Inspection Agency (CFIA). The proposed focal stacking algorithm was employed to create all-in-focus images, whereas scale-wise pooling feature representation was used as the image signature. Throughput, workload, and identification rate were evaluated, and seed analysts reported significantly lower mental demand (p = 0.00245) when using the provided tool compared with manual identification. Although the identification rate in the practical test is only around 50%, I have…
Advisors/Committee Members: Vassileva, Julita; Eramian, Mark; Horsch, Michael C.; Kusalik, Tony; Neufeld, Eric; Bui, Francis M.; Wang, Ruojing.
Subjects/Keywords: seed identification; defocus segmentation; focal stacking; fine-grained; seed test; defocus blur; object recognition; out-of-focus; sharpness metric
8.
Asavajaru, Akarin.
Structural characterization of viral glycoprotein GP2a, GP3, and GP4 of Porcine Reproductive and Respiratory Syndrome Virus.
Degree: 2019, University of Saskatchewan
URL: http://hdl.handle.net/10388/12537
▼ Porcine Reproductive and Respiratory Syndrome Virus (PRRSV) is one of the most economically significant pathogens in the pork-producing industry worldwide. PRRSV is a major player in the porcine respiratory disease complex, which affects swine of all ages and which can result in reproductive failure in sows. PRRSV is a single-stranded, positive-sense RNA virus. The virus encodes seven glycoproteins, of which four are envelope glycoproteins, namely GP2a (ORF2a), GP3 (ORF3), GP4 (ORF4), and GP5 (ORF5). Currently, there is little information about the three-dimensional structure of the glycoproteins, their interactions with each other, and virion formation. The results from a few studies point toward an essential role for GP2a/GP4 and GP3 in PRRSV assembly; however, very little is known about the function of these minor structural proteins. We hypothesize that the formation of a complex of GP2a, GP3, and GP4 alters the three-dimensional structure of these proteins, resulting in limited access to neutralizing epitopes on GP2a, GP3, and GP4. The purpose of this study is to shed light on the three-dimensional structure of the glycoproteins as part of a complex and by themselves. Two different expression systems were used to express and purify the ectodomains of the glycoproteins thought to be important in the complex formation. Nuclear magnetic resonance spectroscopy (NMR) was used to test folding of these ectodomains to enable three-dimensional structure prediction using computational bioinformatics.
Advisors/Committee Members: Gerdts, Volker, Dmitriev, Oleg, Kusalik, Tony, Tikoo, Suresh, Leung, Adelaine, Rubin, Joe.
Subjects/Keywords: Viral glycoprotein; PRRSV
9.
Yuan, Zheng.
A feature-based deisotoping method for tandem mass spectra.
Degree: 2012, University of Saskatchewan
URL: http://hdl.handle.net/10388/ETD-2012-04-452
▼ For high-resolution tandem mass spectra, the determination of monoisotopic masses of fragment ions plays a key role in the subsequent peptide and protein identification. It can directly influence the subsequent analysis of mass spectra, including peptide determination and quantification. However, there are two difficulties in detecting fragment ions. First, in some cases many real fragment ions have very low intensity and can be removed as noise peaks by accident. Numerous noisy peaks in tandem mass spectra can cause either false negative or false positive fragment ions. Second, due to the existence of heavy isotopes in nature, more than one isotopic peak for each fragment ion is resolved in high-resolution tandem mass spectra. Though isotopic peaks can provide useful information, such as compound composition and charge states, they can increase the computational cost if peptide identification is done without removing them. In addition, isotopic peaks can overlap, which could result in incorrect interpretation of the masses of fragment ions.
In bottom-up proteomics, proteins are first cleaved into smaller peptides, which are then analyzed. Since tandem mass spectra of smaller peptides are easier to interpret than those of intact proteins, bottom-up spectra are most often used in the identification of peptides and proteins. In this paper, to increase the accuracy of peptide identification and reduce the complexity of tandem mass spectral analysis, we present a new algorithm for deisotoping bottom-up spectra. Isotopic-cluster graphs are constructed to describe the relationship between all possible isotopic clusters. Based on the relationships in isotopic-cluster graphs, each possible isotopic cluster is evaluated with a score function that is built by combining non-intensity and intensity features of fragment ions. The non-intensity features are used to prevent fragment ions with low intensity from being removed. Dynamic programming is adopted to find the paths with the highest score, which are presumably the most reliable isotopic clusters. Experimental results show that the average Mascot scores and F-scores of identified peptides from spectra processed by our deisotoping method are greater than those obtained with the widely used YADA and MS-Deconv software.
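As a rough illustration of why overlapping isotopic peaks are ambiguous, the sketch below enumerates candidate isotopic clusters purely from peak spacing: isotopic peaks of an ion with charge z are spaced by roughly 1.00335/z in m/z. This is not the thesis's isotopic-cluster graph or scoring function; the function name `candidate_clusters`, the tolerance, and the maximum charge are assumptions chosen for the example.

```python
ISOTOPE_SPACING = 1.00335  # approx. 13C - 12C mass difference, in Da


def candidate_clusters(mzs, max_charge=3, tol=0.01):
    """Return (first_mz, charge, n_peaks) for runs of evenly spaced peaks.

    mzs: sorted list of peak m/z values (hypothetical input format).
    tol: assumed absolute m/z tolerance for matching an expected peak.
    """
    clusters = []
    for i, start in enumerate(mzs):
        for z in range(1, max_charge + 1):
            step = ISOTOPE_SPACING / z
            n = 1
            expected = start + step
            for mz in mzs[i + 1:]:
                if abs(mz - expected) <= tol:
                    # Peak continues the run; look for the next isotope.
                    n += 1
                    expected = mz + step
                elif mz > expected + tol:
                    # Passed the expected position without a match.
                    break
            if n >= 2:
                clusters.append((start, z, n))
    return clusters


peaks = [500.000, 500.502, 501.003, 700.000, 701.003]
clusters = candidate_clusters(peaks)
```

With these made-up peaks, the peak at 500.000 starts both a charge-1 and a charge-2 candidate, so spacing alone cannot decide between them. Choosing among such competing candidates is the step the abstract assigns to the score function and dynamic programming.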
Advisors/Committee Members: Wu, FangXiang, Kusalik, Tony, Gopalan, Selvaraj.
Subjects/Keywords: tandem mass spectra; deisotoping; features; overlapping; isotopic-cluster graphs; dynamic programming.
10.
Zhang, Jian.
Parallel algorithms for real-time peptide-spectrum matching.
Degree: 2010, University of Saskatchewan
URL: http://hdl.handle.net/10388/etd-12132010-114248
▼ Tandem mass spectrometry is a powerful experimental tool used in molecular biology to determine the composition of protein mixtures. It has become a standard technique for protein identification. Due to the rapid development of mass spectrometry technology, the instrument can now produce a large number of mass spectra which are used for peptide identification. The increasing data size demands efficient software tools to perform peptide identification.
In a tandem mass spectrometry experiment, peptide ion selection algorithms generally select only the most abundant peptide ions for further fragmentation. Because of this, the low-abundance proteins in a sample rarely get identified. To address this problem, researchers developed the notion of a 'dynamic exclusion list', which maintains a list of newly selected peptide ions and ensures that these peptide ions do not get selected again for a certain time. In this way, other peptide ions get more opportunity to be selected and identified, allowing for identification of peptides of lower abundance.
However, a better method is to also include the identification results in the 'dynamic exclusion list' approach. In order to do this, a real-time peptide identification algorithm is required.
In this thesis, we introduce methods to improve the speed of peptide identification so that the 'dynamic exclusion list' approach can use the peptide identification results without affecting the throughput of the instrument. Our work is based on RT-PSM, a real-time program for peptide-spectrum matching with statistical significance. We profile the speed of RT-PSM and find that the peptide-spectrum scoring module is the most time-consuming portion.
Given the profiling results, we introduce methods to parallelize the peptide-spectrum scoring algorithm. In this thesis, we propose two parallel algorithms using different technologies. We introduce parallel peptide-spectrum matching using SIMD instructions. We implemented and tested the parallel algorithm on the Intel SSE architecture. The test results show that an 18-fold speedup on the entire process is obtained. The second parallel algorithm is developed using NVIDIA CUDA technology. We describe two CUDA kernels based on different algorithms and compare the performance of the two kernels. The more efficient algorithm is integrated into RT-PSM. The time measurement results show that a 190-fold speedup on the scoring module is achieved and a 26-fold speedup on the entire process is obtained. We perform profiling on the CUDA version again to show that the scoring module has been optimized sufficiently to the point where it is no longer the most time-consuming module in the CUDA version of RT-PSM.
In addition, we evaluate the feasibility of creating a metric index to reduce the number of candidate peptides. We describe evaluation methods, and show that general indexing methods are not likely feasible for RT-PSM.
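The 'dynamic exclusion list' idea described above can be sketched in a few lines. This is a minimal illustration only, not code from RT-PSM or from any instrument vendor: the class and function names, the 30-second window, the m/z tolerance, and the top-2 selection are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusionList:
    """Remembers recently selected precursor m/z values for a fixed time."""
    duration_s: float = 30.0  # assumed exclusion window, in seconds
    tol_mz: float = 0.01      # assumed m/z matching tolerance
    _entries: list = field(default_factory=list)  # (mz, expiry_time) pairs

    def _purge(self, now):
        # Drop entries whose exclusion window has elapsed.
        self._entries = [(mz, t) for mz, t in self._entries if t > now]

    def is_excluded(self, mz, now):
        self._purge(now)
        return any(abs(mz - m) <= self.tol_mz for m, _ in self._entries)

    def add(self, mz, now):
        self._entries.append((mz, now + self.duration_s))


def select_precursors(peaks, exclusion, now, top_n=2):
    """Pick the top_n most intense peaks that are not currently excluded.

    peaks: iterable of (mz, intensity) tuples from one survey scan.
    """
    chosen = []
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == top_n:
            break
        if not exclusion.is_excluded(mz, now):
            exclusion.add(mz, now)
            chosen.append(mz)
    return chosen


scan = [(500.0, 100.0), (600.0, 80.0), (700.0, 10.0), (800.0, 5.0)]
exclusion = DynamicExclusionList()
first = select_precursors(scan, exclusion, now=0.0)
second = select_precursors(scan, exclusion, now=1.0)
later = select_precursors(scan, exclusion, now=40.0)
```

In this toy run the second scan is forced to pick the two weaker ions because the two strongest are still excluded, which is how lower-abundance peptides get a chance to be fragmented. Feeding identification results back into the list, as the abstract proposes, would require deciding in real time which entries to keep or release; that part is not sketched here.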
Advisors/Committee Members: McQuillan, Ian, Wu, FangXiang, Kim, Theodore, Kusalik, Tony, Teng, Daniel.
Subjects/Keywords: Bioinformatics; SIMD; Parallel; GPU; Computer Science
11.
Lee, Chel Hee.
Imprecise Prior for Imprecise Inference on Poisson Sampling Model.
Degree: 2014, University of Saskatchewan
URL: http://hdl.handle.net/10388/ETD-2014-04-1495
▼ Prevalence is a valuable epidemiological measure about the burden of disease in a community for planning health services; however, true prevalence is typically underestimated and there exists no reliable method of confirming the estimate of this prevalence in question. This thesis studies imprecise priors for the development of a statistical reasoning framework regarding this epidemiological decision making problem. The concept of imprecise probabilities introduced by Walley (1991) is adopted for the construction of this inferential framework in order to model prior ignorance and quantify the degree of imprecision associated with the inferential process.
The study is restricted to the standard and zero-truncated Poisson sampling models that give an exponential family with a canonical log-link function because of the mechanism involved with the estimation of population size. A three-parameter exponential family of posteriors which includes the normal and log-gamma as limiting cases is introduced by applying normal priors on the canonical parameter of the Poisson sampling models. The canonical parameters simplify dealing with families of priors as Bayesian updating corresponds to a translation of the family in the canonical hyperparameter space. The canonical link function creates a linear relationship between regression coefficients of explanatory variables and the canonical parameters of the sampling distribution. Thus, normal priors on the regression coefficients induce normal priors on the canonical parameters leading to a higher-dimensional exponential family of posteriors whose limiting cases are again normal or log-gamma.
All of these implementations are synthesized to build the ipeglim package (Lee, 2013) that provides a convenient method for characterizing imprecise probabilities and visualizing their translation, soft-linearity, and focusing behaviours. A characterization strategy for imprecise priors is introduced for instances when there exists a state of complete ignorance. The learning process of an individual intentional unit, the agreement process between several intentional units, and situations concerning prior-data conflict are graphically illustrated. Finally, the methodology is applied for re-analyzing the data collected from the epidemiological disease surveillance of three specific cases – Cholera epidemic (Dahiya, 1973), Down’s syndrome (Zelterman, 1988), and the female users of methamphetamine and heroin (Böhning, 2009).
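For orientation, the two sampling models named in this abstract can be written in their usual textbook form. The notation below (y for a count, \lambda for the Poisson mean, \theta for the canonical parameter, x_i and \beta for covariates and regression coefficients) is an assumption for illustration and is not taken from the thesis.

```latex
% Standard Poisson model with canonical parameter \theta = \log\lambda
P(Y = y \mid \theta) = \frac{\exp\!\left(y\theta - e^{\theta}\right)}{y!},
\qquad y = 0, 1, 2, \dots

% Zero-truncated Poisson model (counts of zero are never observed)
P(Y = y \mid Y > 0, \lambda) = \frac{\lambda^{y}}{\left(e^{\lambda} - 1\right) y!},
\qquad y = 1, 2, \dots

% Canonical log link for regression: the canonical parameter is linear
% in the coefficients
\theta_i = \log \lambda_i = x_i^{\top} \beta
```

Because \theta_i is linear in \beta, a normal prior on \beta gives a normal prior on each \theta_i, which matches the relationship the abstract describes.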
Advisors/Committee Members: Bickis, Mikelis, Lim, June Hyun-Ja, Janzen, Bonnie, Kusalik, Tony.
Subjects/Keywords: imprecise probabilities; zero-truncated Poisson regression
12.
Sun, Jian.
PARALLEL COMPUTING ALGORITHMS FOR TANDEM.
Degree: 2013, University of Saskatchewan
URL: http://hdl.handle.net/10388/ETD-2013-04-1115
▼ Tandem mass spectrometry, also known as MS/MS, is an analytical technique to measure the mass-to-charge ratio of charged ions and is widely used in genomics, proteomics, and metabolomics. There are two types of automatic methods for interpreting tandem mass spectra: de novo methods and database searching methods. Both need massive computational resources and complicated comparison algorithms. The real-time peptide-spectrum matching (RT-PSM) algorithm is a database searching method to interpret tandem mass spectra under strict time constraints. Restricted by the hardware and architecture of an individual workstation, the RT-PSM algorithm has to sacrifice accuracy in order to provide the required processing speed. The peptide-spectrum similarity scoring module is the most time-consuming of the four modules in the RT-PSM algorithm and is also the core of the algorithm.
In this study, a multi-core computing algorithm is developed for individual workstations. Moreover, a distributed computing algorithm is designed for a cluster. The improved algorithms can meet the speed requirement of RT-PSM without sacrificing accuracy. With some expansion, the distributed computing algorithm can also support different PSM algorithms. Simulation results show that, compared with the original RT-PSM, the parallelized version achieves a 25- to 34-fold speed-up, depending on the individual workstation. A cluster with 240 CPU cores could accelerate the similarity score module 210 times compared with the single-thread similarity score module, and the whole peptide identification process 85 times compared with the single-thread peptide identification process.
Advisors/Committee Members: Wu, Fangxiang, Zhang, Chris, Kusalik, Tony, Teng, Daniel.
Subjects/Keywords: real-time peptide-spectrum matching (RT-PSM) algorithm; tandem mass spectrum; parallel computing algorithm; multi-core computing algorithm; distributed computing algorithm; peptide identification
13.
Mahmud, MD Sowgat Ibne.
Formal Model and Simulation of the Gene Assembly Process in Ciliates.
Degree: 2013, University of Saskatchewan
URL: http://hdl.handle.net/10388/ETD-2013-11-1292
▼ The construction process of the functional macronucleus in certain types of ciliates is known as the ciliate gene assembly process. It consists of a massive amount of DNA excision from the micronucleus and the rearrangement of the rest of the DNA sequences (in the case of stichotrichous ciliates). While several computational models have tried to represent certain parts of the gene assembly process, the real process remains not completely understood. In this research, a new formal model called the Computational 2JLP model is introduced based on the recent biological 2JLP model.
For justifying the formal model, a simulation is created and tested with real data. Several parameters are introduced in the model that are used to test ambiguities or edge cases of the biological model. Parameters are systematically tested from the simulation to try to find their optimal values. Interestingly, a negative correlation is found between a parameter (which is used to filter out scnRNAs that are similar to IES specific sequences from the macronucleus) and the outcome of the simulation. It indicates that if a scnRNA consists of both an MDS and IES, then from the perspective of maximizing the outcome of the simulation, it is desirable to filter out this scnRNA.
The simulator successfully performs the gene assembly process whether the inputs are scrambled or unscrambled DNA sequences. It is desirable for this model to serve as a foundation for future computational and mathematical study, and to help inform and refine the biological model.
Advisors/Committee Members: McQuillan, Ian, Keil, Mark, Kusalik, Tony, Wu, Fangxiang.
Subjects/Keywords: Computational 2JLP model; gene assembly; ciliates
14.
Berg, Arnie.
Exploring the Behaviour of the Hidden Markov Model on CpG Island Prediction.
Degree: 2013, University of Saskatchewan
URL: http://hdl.handle.net/10388/ETD-2013-04-1030
▼ DNA can be represented abstractly as a language with only four nucleotides represented by the letters A, C, G, and T, yet the arrangement of those four letters plays a major role in determining the development of an organism. Understanding the significance of certain arrangements of nucleotides can unlock the secrets of how the genome achieves its essential functionality. Regions of DNA particularly enriched with cytosine (C nucleotides) and guanine (G nucleotides), especially the CpG di-nucleotide, are frequently associated with biological function related to gene expression, and concentrations of CpGs referred to as "CpG islands" are known to collocate with regions upstream from gene coding sequences within the promoter region. The pattern of occurrence of these nucleotides, relative to adenine (A nucleotides) and thymine (T nucleotides), lends itself to analysis by machine-learning techniques such as Hidden Markov Models (HMMs) to predict the areas of greater enrichment. HMMs have been applied to CpG island prediction before, but often without an awareness of how the outcomes are affected by the manner in which the HMM is applied.
Two main findings of this study are:
1. The outcome of a HMM is highly sensitive to the setting of the initial probability estimates.
2. Without the appropriate software techniques, HMMs cannot be applied effectively to large data such as whole eukaryotic chromosomes.
Both of these factors are rarely considered by users of HMMs, but are critical to a successful application of HMMs to large DNA sequences. In fact, these shortcomings were discovered through a close examination of published results of CpG island prediction using HMMs, and without being addressed, can lead to an incorrect implementation and application of HMM theory.
A first-order HMM is developed and its performance compared to two other historical methods, the Takai and Jones method and the UCSC method from the University of California Santa Cruz. The HMM is then extended to a second-order to acknowledge that pairs of nucleotides define CpG islands rather than single nucleotides alone, and the second-order HMM is evaluated in comparison to the other methods. The UCSC method is found to be based on properties that are not related to CpG islands, and thus is not a fair comparison to the other methods. Of the other methods, the first-order HMM method and the Takai and Jones method are comparable in the tests conducted, but the second-order HMM method demonstrates superior predictive capabilities. However, these results are valid only when taking into consideration the highly sensitive outcomes based on initial estimates, and finding a suitable set of estimates that provide the most appropriate results.
The first-order HMM is applied to the problem of producing synthetic data that simulates the characteristics of a DNA sequence, including the specified presence of CpG islands, based on the model parameters of a trained HMM. HMM analysis is applied to the synthetic data to explore its fidelity in generating data with similar…
Advisors/Committee Members: Kusalik, Tony, Harkness, Troy, McQuillan, Ian, Wu, FangXiang.
Subjects/Keywords: CpG islands; Hidden Markov Model; synthetic data; Baum-Welch; Viterbi; methylation
15.
Lin, Wenjun.
Filtering Methods for Mass Spectrometry-based Peptide Identification Processes.
Degree: 2013, University of Saskatchewan
URL: http://hdl.handle.net/10388/ETD-2013-10-1271
▼ Tandem mass spectrometry (MS/MS) is a powerful tool for identifying peptide sequences. In a typical experiment, incorrect peptide identifications may result due to noise contained in the MS/MS spectra and to the low quality of the spectra. Filtering methods are widely used to remove the noise and improve the quality of the spectra before the subsequent spectra identification process. However, existing filtering methods often use features and empirically assigned weights. These weights may not reflect the reality that the contribution (reflected by weight) of each feature may vary from dataset to dataset. Therefore, filtering methods that can adapt to different datasets have the potential to improve peptide identification results.
This thesis proposes two adaptive filtering methods, denoising and quality assessment, both of which improve the efficiency and effectiveness of peptide identification. First, the denoising approach employs an adaptive method for picking signal peaks that is more suitable for the datasets of interest. By applying the approach to two tandem mass spectra datasets, about 66% of peaks (likely noise peaks) can be removed. The number of peptides identified later by peptide identification on those datasets increased by 14% and 23%, respectively, compared to previous work (Ding et al., 2009a). Second, the quality assessment method estimates the probabilities of spectra being high quality based on quality assessments of the individual features. The probabilities are estimated by solving a constrained optimization problem. Experimental results on two datasets illustrate that searching only the high-quality tandem spectra determined using this method saves about 56% and 62% of database searching time and loses 9% of high-quality spectra.
Finally, the thesis suggests future research directions including feature selection and clustering of peptides.
Advisors/Committee Members: Wu, Fang-Xiang, Zhang, Wenjun (Chris), Aryan, Saadat M., Kusalik, Tony, Purves, Randall.
Subjects/Keywords: Tandem mass spectrometry; peptide identification; denoise; quality assessment.

University of Saskatchewan
16.
Liu, Jing.
Scrambling analysis of ciliates.
Degree: 2009, University of Saskatchewan
URL: http://hdl.handle.net/10388/etd-09092009-154551
▼ Ciliates are a class of organisms which undergo a genetic process called gene descrambling after mating. In order to better understand the problem, a literature review of past works has been presented in this thesis. This includes a brief summary of both the relevant biology and bioinformatics literature. Then, a formal definition of scrambling systems is developed which attempts to model the problem of sequence alignment between scrambled and descrambled genes. With this system, sequences can be classified into relevant functional segments. It also provides a framework whereby we can compare various ciliate sequence alignment algorithms. After that, a new method of predicting the various functional segments is studied. This method shows better coverage, and usually a better labelling score with certain parameters. Then we discuss several recent hypotheses as to how ciliates naturally descramble genes. An algorithm suite is developed to test these hypotheses. With the tests, we are able to computationally check which factors are potentially the most important. According to the current results with 247 pointer sequences of 13 micronuclear genes, examining repeats which are the same distance together with either the sequence or the size, as the real pointers, is almost always enough information to guide descrambling. Indeed, the real pointer sequence is the unique repeat 92.7% and 94.3% of the time within the 247 pointers, from the left and right respectively, using only the pointer distance and the pointer sequence information.
Advisors/Committee Members: McQuillan, Ian, Kusalik, Tony, Keil, Mark, Wu, Fangxiang.
Subjects/Keywords: theoretical computer science; ciliate scrambling system; scrambling analysis; sequence alignment; bioinformatics

17.
Jin, Lingling.
Multiple sequence alignment augmented by expert user constraints.
Degree: 2010, University of Saskatchewan
URL: http://hdl.handle.net/10388/etd-04092010-093748
▼ Sequence alignment has become one of the most common tasks in bioinformatics. Most of the existing sequence alignment methods use general scoring schemes. But these alignments are sometimes not completely relevant because they do not necessarily provide the desired information. It would be extremely difficult, if not impossible, to include any possible objective into an algorithm. Our goal is to allow a working biologist to augment a given alignment with additional information based on their knowledge and objectives.
In this thesis, we will formally define constraints and compatible constraint sets for an alignment which require some positions of the sequences to be aligned together. Using this approach, one can align some specific segments such as domains within protein sequences by inputting constraints (the positions of the segments on the sequences), and the algorithm will automatically find an optimal alignment in which the segments are aligned together.
A necessary prerequisite of calculating an alignment is that the constraints inputted be compatible with each other, and we will develop algorithms to check this condition for both pairwise and multiple sequence alignments. The algorithms are based on a depth-first search on a graph that is converted from the constraints and the alignment. We then develop algorithms to perform pairwise and multiple sequence alignments satisfying these compatible constraints.
Using straightforward dynamic programming for pairwise sequence alignment satisfying a compatible constraint set, an optimal alignment corresponds to a path going through the dynamic programming matrix, and as we are only using single-position constraints, a constraint can be represented as a point on the matrix, so a compatible constraint set is a set of points. We try to determine a new path, rather than the original path, that achieves the highest score which goes through all the compatible constraint set points. The path is a concatenation of sub-paths, so that only the scores in the sub-matrices need to be calculated. This means the time required to get the new path decreases as the number of constraints increases, and it also varies as the positions of the points change. It can be further reduced by using the information from the original alignment, which can offer a significant speed gain.
We then use exact and progressive algorithms to find multiple sequence alignments satisfying a compatible constraint set, which are extensions of pairwise sequence alignments. With exact algorithms for three sequences, where constraints are represented as lines, we discuss a method to force the optimal path to cross the constraint lines. And with progressive algorithms, we use a set of pairwise alignments satisfying compatible constraints to construct multiple sequence alignments progressively. Because they are more complex, we leave some extensions as future work.
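The idea of a constrained pairwise alignment as a concatenation of sub-paths between constraint points can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a linear gap penalty with illustrative scores (match 1, mismatch -1, gap -1), treats a simple "strictly increasing in both coordinates" test as the compatibility check, computes only the score rather than the alignment itself, and does not model the reuse of the original alignment. The function names `nw_score` and `constrained_score` are hypothetical.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of strings a and b."""
    # prev holds the previous row of the dynamic programming matrix.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def constrained_score(a, b, constraints, match=1, mismatch=-1, gap=-1):
    """Best global alignment score in which a[i] is aligned with b[j]
    for every 0-based pair (i, j) in constraints."""
    pts = sorted(constraints)
    # Assumed compatibility test for single-position constraints:
    # the points must be strictly increasing in both coordinates.
    for (i1, j1), (i2, j2) in zip(pts, pts[1:]):
        if not (i1 < i2 and j1 < j2):
            raise ValueError("incompatible constraints")
    total, pi, pj = 0, 0, 0
    for i, j in pts:
        # Only the sub-matrix between consecutive constraint points is filled.
        total += nw_score(a[pi:i], b[pj:j], match, mismatch, gap)
        # The constrained pair itself is forced onto the diagonal.
        total += match if a[i] == b[j] else mismatch
        pi, pj = i + 1, j + 1
    # Remaining suffixes after the last constraint point.
    total += nw_score(a[pi:], b[pj:], match, mismatch, gap)
    return total
```

Because each sub-matrix covers only the region between two consecutive points, the total number of cells filled shrinks as constraints are added, which matches the abstract's remark that the running time decreases with more constraints and depends on where the points lie.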
Advisors/Committee Members: McQuillan, Ian, Angel, Joseph F., Osgood, Nathaniel, Kusalik, Tony.
Subjects/Keywords: multiple sequence alignment; constraint; compatible constraint set

18.
Haakensen, Monique Chantelle.
Genetic markers for beer-spoilage by lactobacilli and pediococci.
Degree: 2009, University of Saskatchewan
URL: http://hdl.handle.net/10388/etd-09032009-175224
▼ The brewing industry has considerable economic impact worldwide; therefore, demand exists for a better understanding of the organisms that cause beer-spoilage. Low nutrient levels, depleted oxygen levels, high alcohol levels, and the presence of antimicrobial hop compounds all play a role in making beer an inhospitable environment for most microorganisms. Nonetheless, there are bacteria that are resistant to all of these selective pressures. The most common beer-spoilage bacteria are the Gram-positive lactic acid bacteria Lactobacillus and Pediococcus. It is currently believed that hop-resistance is the key factor(s) permitting Lactobacillus and Pediococcus bacteria to grow in beer. However, it is likely that in addition, ethanol-tolerance and the ability to acquire nutrients also play roles in the beer-spoilage ability of Lactobacillus and Pediococcus isolates. The ability of Lactobacillus and Pediococcus to grow in beer was assessed and correlated to the presence of previously described beer-spoilage related genes, as well as with the presence of novel genes identified in this study. Molecular and culture-based techniques for detection and differentiation between Lactobacillus and Pediococcus isolates that can and cannot grow in beer were established and described in detail. Interestingly, beer-spoilage related proteins were often found to share homology with multi-drug transporters. As such, the presence of these beer-spoilage associated genes was also compared to the ability of isolates to grow in the presence of a variety of antibiotics and, unexpectedly, beer-spoiling bacteria were found to be more susceptible to antibiotics than were non beer-spoiling isolates of the same genus. Additionally, it was found that isolates of Lactobacillus and Pediococcus that can grow in beer do not group phylogenetically. 
In order to fully appreciate the relationship of speciation with beer-spoilage, phylogenetic and whole genome/proteome studies were conducted to clarify the taxonomy of the Lactobacillus and Pediococcus genera. Through the research in this thesis, a greater understanding of the mechanism(s) enabling bacteria to grow in beer has been gained and taxonomy of the genera Lactobacillus and Pediococcus has been clarified.
Advisors/Committee Members: Ziola, Barry, Korber, Darren, Qureshi, Mabood, Qualtiere, Lou, Phister, Trevor, Deneer, Harry, Kusalik, Tony.
Subjects/Keywords: genetics; Lactobacillus; beer-spoilage; Pediococcus; phylogeny

19.
Yin, Yaling.
Multiple hypothesis testing and multiple outlier identification methods.
Degree: 2010, University of Saskatchewan
URL: http://hdl.handle.net/10388/etd-04132010-132400
▼ Traditional multiple hypothesis testing procedures, such as that of Benjamini and Hochberg, fix an error rate and determine the corresponding rejection region. In 2002 Storey proposed a fixed rejection region procedure and showed numerically that it can gain more power than the fixed error rate procedure of Benjamini and Hochberg while controlling the same false discovery rate (FDR). In this thesis it is proved that when the number of alternatives is small compared to the total number of hypotheses, Storey’s method can be less powerful than that of Benjamini and Hochberg. Moreover, the two procedures are compared by setting them to produce the same FDR. The difference in power between Storey’s procedure and that of Benjamini and Hochberg is near zero when the distance between the null and alternative distributions is large, but Benjamini and Hochberg’s procedure becomes more powerful as the distance decreases. It is shown that modifying the Benjamini and Hochberg procedure to incorporate an estimate of the proportion of true null hypotheses as proposed by Black gives a procedure with superior power.
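For readers unfamiliar with the fixed error rate approach, the Benjamini and Hochberg step-up procedure can be sketched as follows. This is a generic illustration, not code from the thesis. The function name `benjamini_hochberg` and the optional `pi0` argument are assumptions made for this example: `pi0` stands in for an externally supplied estimate of the proportion of true null hypotheses, and with `pi0=1.0` the function reduces to the standard procedure.

```python
def benjamini_hochberg(pvalues, alpha=0.05, pi0=1.0):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= k * alpha / (m * pi0), then reject the k smallest.
    pi0 is an assumed, externally estimated proportion of true nulls.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * pi0):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

A value of `pi0` below 1 raises every threshold, so the adaptive variant can only reject the same hypotheses or more, which is the mechanism behind the power gain described in the abstract.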
Multiple hypothesis testing can also be applied to regression diagnostics. In this thesis, a Bayesian method is proposed to test multiple hypotheses, of which the i-th null and alternative hypotheses are that the i-th observation is not an outlier versus it is, for i=1,...,m. In the proposed Bayesian model, it is assumed that outliers have a mean shift, where the proportion of outliers and the mean shift respectively follow a Beta prior distribution and a normal prior distribution. It is proved in the thesis that for the proposed model, when there exists more than one outlier, the marginal distributions of the deletion residual of the i-th observation under both null and alternative hypotheses are doubly noncentral t distributions. The “outlyingness” of the i-th observation is measured by the marginal posterior probability that the i-th observation is an outlier given its deletion residual. An importance sampling method is proposed to calculate this probability. This method requires the computation of the density of the doubly noncentral F distribution and this is approximated using Patnaik’s approximation. An algorithm is proposed in this thesis to examine the accuracy of Patnaik’s approximation. The comparison of this algorithm’s output with Patnaik’s approximation shows that the latter can save massive computation time without losing much accuracy.
The proposed Bayesian multiple outlier identification procedure is applied to some simulated data sets. Various simulation and prior parameters are used to study the sensitivity of the posteriors to the priors. The area under the ROC curves (AUC) is calculated for each combination of parameters. A factorial design analysis on AUC is carried out by choosing various simulation and prior parameters as factors. The resulting AUC values are high for various selected parameters, indicating that the proposed method can identify the majority of outliers within tolerable errors.…
Advisors/Committee Members: Bickis, Mik, Soteros, Chris, Murdoch, Duncan, Martin, John, Kusalik, Tony, Laverty, Bill, Srinivasan, Raj.
Subjects/Keywords: mean shift; noncentrality parameter; area under ROC curve; receiver operating characteristic; false discovery rate; microarray; doubly noncentral t distribution; pentapeptide; amino acid sequence similarity

20.
Nacenta Sanchez, Miguel Angel.
Cross-display object movement in multi-display environments.
Degree: 2009, University of Saskatchewan
URL: http://hdl.handle.net/10388/etd-01062010-123426
▼ Many types of multi-display environments (MDEs) are emerging that allow users to better interact with computers. In these environments, being able to move visual objects (such as window icons or the cursor) from one display to another is a fundamental activity.
This dissertation focuses on understanding how human performance of cross-display actions is affected by the design of cross-display object movement interaction techniques. Three main aspects of cross-display actions are studied: how displays are referred to by the system and the users, how spatial actions are planned, and how actions are executed. Each of these three aspects is analyzed through laboratory experiments that provide empirical evidence on how different characteristics of interaction techniques affect performance.
The results further our understanding of cross-display interaction and can be used by designers of new MDEs to create more efficient multi-display interfaces.
Advisors/Committee Members: Gutwin, Carl, Mandryk, Regan, Kusalik, Tony, Jamali, Nadeem, Elias, Lorin, Bailey, Brian, Schneider, Kevin.
Subjects/Keywords: pointing; mouse control; cross-display object movement; multi-surface environments; multi-display environments; human-computer interaction; input; perspective; interaction techniques; remote pointing; dimensional overlap; stimulus-response compatibility