Indexing and knowledge discovery of gaussian mixture models and multiple-instance learning.
Degree: PhD, Mathematik, Informatik und Statistik, 2018, Ludwig-Maximilians-Universität
As the quantity and variety of generated and stored data grow, both manual and automatic analysis become increasingly challenging in many modern applications, such as biometric identification and content-based image retrieval. In this thesis, we consider two typical, related inherent structures of objects: Multiple-Instance (MI) objects and Gaussian Mixture Models (GMM). In both approaches, each object is represented by a set. For MI, each object is a set of vectors from a multi-dimensional space. For GMM, each object is a set of multivariate Gaussian distribution functions, which can approximate arbitrary distributions concisely. Both approaches are powerful and natural because they can express (1) that an object is additively composed of several components or (2) that an object may exhibit several different, alternative kinds of behavior. Thus we can model, for example, an image that depicts a set of different things (1). Likewise, we can model a sports player who has performed differently in different games (2). We can use GMM to approximate MI objects and vice versa. Both directions of approximation can be appealing: GMM are more concise, whereas the individual components of MI objects are less complex.
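The two directions of approximation mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the thesis: it assumes diagonal covariances, and the names `gmm_to_mi`, `mi_to_gmm`, and the `bandwidth` parameter are hypothetical. A GMM is turned into an MI object by sampling vectors from it, and an MI object is turned into a GMM by placing one equally weighted Gaussian on each instance (a kernel-density style approximation).

```python
import random
from typing import List, Tuple

# Hypothetical representation: (weight, mean vector, per-dimension variance).
# Diagonal covariance is an assumption made for brevity.
Component = Tuple[float, List[float], List[float]]


def gmm_to_mi(gmm: List[Component], n_instances: int, seed: int = 0) -> List[List[float]]:
    """Approximate a GMM by an MI object: a set of vectors sampled from the mixture."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in gmm]
    instances = []
    for _ in range(n_instances):
        # Pick a component according to its weight, then sample from that Gaussian.
        _, mean, var = rng.choices(gmm, weights=weights, k=1)[0]
        instances.append([rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)])
    return instances


def mi_to_gmm(instances: List[List[float]], bandwidth: float = 1.0) -> List[Component]:
    """Approximate an MI object by a GMM with one equally weighted Gaussian per instance."""
    weight = 1.0 / len(instances)
    return [(weight, list(x), [bandwidth ** 2] * len(x)) for x in instances]
```

The second direction yields as many components as there are instances, so it is not concise; a practical conversion would more likely fit a small number of components, for example with expectation-maximization.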
A similarity measure quantifies how alike two objects are. On this basis, indexing and similarity search play essential roles in data mining, providing efficient and, in some cases, indispensable support for a variety of algorithms such as classification and clustering. This thesis aims to solve challenges in the indexing and knowledge discovery of complex data using MI objects and GMM.
For the indexing of GMM, several techniques are available, including universal index structures and GMM-specific methods. However, the well-known approaches either perform poorly or have too many limitations. To exploit the parametric form of GMM and to handle the problem that GMM may differ in their number of components, we propose the Gaussian Components based Index (GCI) for efficient queries on GMM. GCI decomposes GMM into their components and stores n-lets of Gaussian components, whose parameter vectors have uniform length, in traditional index structures. We introduce an efficient pruning strategy that filters out unqualified GMM using the so-called Matching Probability (MP) as the similarity measure. MP integrates the joint probability density of two objects over the whole space. GCI achieves better performance than its competitors on both synthetic and real-world data. To further increase its efficiency, we propose a strategy to store GMM components in a normalized way. This strategy improves the ability to filter out unqualified GMM. Based on the normalized transformation, we derive a set of novel similarity measures for GMM.
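To make MP concrete, the following sketch computes it under two stated assumptions: MP is read as the integral of the product of the two mixture densities, and all components have diagonal covariance. Under these assumptions the integral has a closed form, because the integral of the product of two Gaussians N(x; m1, v1) and N(x; m2, v2) equals the density of N(m1; m2, v1 + v2). The function names and the tuple representation are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Tuple

# Hypothetical representation: (weight, mean vector, per-dimension variance).
# Diagonal covariance is an assumption, not a statement about the thesis.
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def gaussian_overlap(mu1: List[float], var1: List[float],
                     mu2: List[float], var2: List[float]) -> float:
    """Integral over R^d of the product of two diagonal Gaussian densities."""
    result = 1.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = v1 + v2  # variances add in the closed form
        result *= math.exp(-((m1 - m2) ** 2) / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
    return result


def matching_probability(g1: GMM, g2: GMM) -> float:
    """Weighted sum of pairwise component overlaps between two mixtures."""
    return sum(w1 * w2 * gaussian_overlap(m1, v1, m2, v2)
               for w1, m1, v1 in g1
               for w2, m2, v2 in g2)
```

Because the result is a sum over pairs of components, each pair contributes independently, which is consistent with the idea of decomposing GMM into components before indexing. For a single one-dimensional standard Gaussian compared with itself, `matching_probability` returns 1/sqrt(4π), approximately 0.2821.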
Since MP is not a metric (i.e., a symmetric, positive definite distance function guaranteeing the triangle inequality), which would be essential for the application of various analysis techniques, we…
Advisors/Committee Members: Böhm, Christian (advisor).
APA (6th Edition):
Zhou, L. (2018). Indexing and knowledge discovery of gaussian mixture models and multiple-instance learning. (Doctoral Dissertation). Ludwig-Maximilians-Universität. Retrieved from https://edoc.ub.uni-muenchen.de/21737/
Chicago Manual of Style (16th Edition):
Zhou, Linfei. “Indexing and knowledge discovery of gaussian mixture models and multiple-instance learning.” 2018. Doctoral Dissertation, Ludwig-Maximilians-Universität. Accessed December 16, 2018.
MLA Handbook (7th Edition):
Zhou, Linfei. “Indexing and knowledge discovery of gaussian mixture models and multiple-instance learning.” 2018. Web. 16 Dec 2018.
Vancouver:
Zhou L. Indexing and knowledge discovery of gaussian mixture models and multiple-instance learning. [Internet] [Doctoral dissertation]. Ludwig-Maximilians-Universität; 2018. [cited 2018 Dec 16]. Available from: https://edoc.ub.uni-muenchen.de/21737/.
Council of Science Editors:
Zhou L. Indexing and knowledge discovery of gaussian mixture models and multiple-instance learning. [Doctoral Dissertation]. Ludwig-Maximilians-Universität; 2018. Available from: https://edoc.ub.uni-muenchen.de/21737/