You searched for subject:(Data Mining Procedure). One record found.

1. Liu, Peng. Adaptive Mixture Estimation and Subsampling PCA.

Degree: PhD, Sciences, 2009, Case Western Reserve University School of Graduate Studies

Data mining is important in scientific research, knowledge discovery and decision making. A typical challenge in data mining is that a data set may be too large to be loaded into computer memory all at once for analysis. Even if it can be loaded all at once, too many nuisance features may mask important information in the data. In this dissertation, two new methodologies for analyzing large data are studied. The first methodology is concerned with adaptive estimation of mixture parameters in heterogeneous populations of large-n data. Our adaptive estimation procedures, the partial EM (PEM) and its Bayesian variants (BMAP and BPEM), work well for large or streaming data. They can also handle the situation in which later-stage data may contain extra components (a.k.a. "contaminations" or "intrusions") and hence have applications in network traffic analysis and intrusion detection. Furthermore, the partial EM estimate is consistent and efficient. It compares well with a full EM estimate when a full EM procedure is feasible. The second methodology is about subsampling large-p data for selecting important features under the principal component analysis (PCA) framework. Our new method is called subsampling PCA (SPCA). Diagnostic tools for choosing parameter values, such as subsample size and iteration number, in our SPCA procedure are developed. It is shown through analysis and simulation that the SPCA can overcome the masking effect of nuisance features and pick up the important variables and major components. Its application to gene expression data analysis is also demonstrated.

Advisors/Committee Members: Sun, Jiayang (Advisor).

Subjects/Keywords: Statistics; large data; data mining; mixture models; Gaussian mixtures; parameter estimation; adaptive procedure; partial EM; high-dimensional data; large p small n; dimension reduction; feature selection; subsampling

APA (6th Edition):

Liu, P. (2009). Adaptive Mixture Estimation and Subsampling PCA. (Doctoral Dissertation). Case Western Reserve University School of Graduate Studies. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=case1220644686

Chicago Manual of Style (16th Edition):

Liu, Peng. “Adaptive Mixture Estimation and Subsampling PCA.” 2009. Doctoral Dissertation, Case Western Reserve University School of Graduate Studies. Accessed May 05, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1220644686.

MLA Handbook (7th Edition):

Liu, Peng. “Adaptive Mixture Estimation and Subsampling PCA.” 2009. Web. 05 May 2021.

Vancouver:

Liu P. Adaptive Mixture Estimation and Subsampling PCA. [Internet] [Doctoral dissertation]. Case Western Reserve University School of Graduate Studies; 2009. [cited 2021 May 05]. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=case1220644686.

Council of Science Editors:

Liu P. Adaptive Mixture Estimation and Subsampling PCA. [Doctoral Dissertation]. Case Western Reserve University School of Graduate Studies; 2009. Available from: http://rave.ohiolink.edu/etdc/view?acc_num=case1220644686.