1.
Cho, Jang Ik.
Partial EM Procedure for Big-Data Linear Mixed Effects Model, and Generalized PPE for High-Dimensional Data in Julia.

Degree: PhD, Epidemiology and Biostatistics, 2018, Case Western Reserve University School of Graduate Studies

URL: http://rave.ohiolink.edu/etdc/view?acc_num=case152845439167999

Methodologically, this dissertation contributes to two areas in Statistics: Linear mixed effects models for big data and Test of equal covariance for high-dimensional data. Scientifically,…
Subjects/Keywords: Statistics; Biostatistics; Biomedical Research; Health; Mining; Partial EM, Big Data, Mixed Effects, SCAN, ECHO, Projection Pursuit, PPE, Covariance, High-dimensional Data, Clustering

…Partial EM Procedure for Big-data Linear Mixed Effects Model, and Generalized PPE for High… …Projection Pursuit Ellipse (PPE) to test for equal variance in high-dimensional data is… …to two areas in Statistics: Linear mixed effects models for big data and Test of equal… …capacity. As a solution, we proposed a new modern approach to Big-data Linear Mixed Effects… …Bartlett's test and a modern benchmark for high dimensional p data.
Part I
Big-data Linear…

2.
Soledad Espezua Llerena.
Redução dimensional de dados de alta dimensão e poucas amostras usando Projection Pursuit [Dimensionality reduction of high-dimensional, small-sample data using Projection Pursuit].

Degree: 2013, University of São Paulo

URL: http://www.teses.usp.br/teses/disponiveis/18/18153/tde-10102013-150240/

Reducing the dimensionality of datasets is an important step in pattern recognition and machine learning processes. Projection Pursuit (PP) has…

Subjects/Keywords: Classificação; Dados de microarranjo; Projection Pursuit; Redução dimensional; Classification; Dimensionality reduction; Microarray data; Projection Pursuit

University of Minnesota

3.
Datta, Abhirup.
Statistical Methods for Large Complex Datasets.

Degree: PhD, Biostatistics, 2016, University of Minnesota

URL: http://hdl.handle.net/11299/199089

Modern technological advancements have enabled massive-scale collection, processing and storage of information triggering the onset of the 'big data' era where in every two days…

Subjects/Keywords: Big data; High dimensional data; Large spatial data

KTH

4.
Lannsjö, Fredrik.
Forecasting the Business Cycle using Partial Least Squares.

Degree: Mathematical Statistics, 2014, KTH

URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-151378

Partial Least Squares is both a regression method and a tool for variable selection that is especially appropriate for models based on numerous (possibly…

Subjects/Keywords: Quantitative Forecast; Partial Least Squares; Variable Selection; High-dimensional Regression; Big Data; Business Cycle; Leading Indicators

Rice University

5.
Yang, Yuchen.
Convergence of K-indicators Clustering with Alternating Projection Algorithms.

Degree: MA, Engineering, 2017, Rice University

URL: http://hdl.handle.net/1911/105482

Data clustering is a fundamental unsupervised machine learning problem, and the most widely used method of data clustering over the decades is k-means. Recently, a…

Subjects/Keywords: Data clustering; Alternating Projection

University of Waterloo

6.
Xie, Yijun.
Applications of Projection Pursuit in Functional Data Analysis: Goodness-of-fit, Forecasting, and Change-point Detection.

Degree: 2021, University of Waterloo

URL: http://hdl.handle.net/10012/16710

Dimension reduction methods for functional data have been avidly studied in recent years. However, existing methods are primarily based on summarizing the data by their…

Subjects/Keywords: functional data analysis; dimension reduction; projection pursuit

NSYSU

7.
Tai, Chiech-an.
An Automatic Data Clustering Algorithm based on Differential Evolution.

Degree: Master, Computer Science and Engineering, 2013, NSYSU

URL: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0730113-152814

As one of the traditional optimization problems, clustering still plays a vital role in both theoretical and practical research today. Although many successful clustering…

Subjects/Keywords: automatic clustering; data clustering; high-dimensional dataset; histogram analysis; differential evolution

University of California – Riverside

8.
Zakaria, Jesin.
Developing Efficient Algorithms for Data Mining Large Scale High Dimensional Data.

Degree: Computer Science, 2013, University of California – Riverside

URL: http://www.escholarship.org/uc/item/660316zp

Data mining and knowledge discovery has attracted a great deal of attention in information technology in recent years. The rapid progress of computer hardware technology…

Subjects/Keywords: Computer science; Clustering; Data Mining; High Dimensional Data; Scalable; Time Series

University of Adelaide

9.
Conway, Annie.
Clustering of proteomics imaging mass spectrometry data.

Degree: 2016, University of Adelaide

URL: http://hdl.handle.net/2440/112036

This thesis presents a toolbox for the exploratory analysis of multivariate data, in particular proteomics imaging mass spectrometry data. Typically such data consist of 15000…

Subjects/Keywords: clustering; proteomics; multivariate data analysis; high-dimensional data analysis; machine learning

York University

10.
Li, Xuan.
Statistical Inference for High-Dimensional Genetic Data.

Degree: PhD, Mathematics & Statistics, 2019, York University

URL: http://hdl.handle.net/10315/35894

This dissertation focuses on three types of high-dimensional genetic data: protein sequences, DNA methylation data, and microRNA expression data. The four major parts are presented…

Subjects/Keywords: Statistics; Statistical genetics; High-dimensional data; Clustering categorical data; Model-based clustering; Two-sample problem

NSYSU

11.
Hsu, Jen-Hao.
A Study of Partial Periodic Utility Mining.

Degree: Master, Computer Science and Engineering, 2017, NSYSU

URL: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0814117-230426

The existing studies related to partial periodic pattern mining only consider the frequency of patterns in periodic segment data to determine their significance, and the…

Subjects/Keywords: data mining; high utility; partial periodic pattern; projection; utility upper bound

Texas A&M University

12.
Song, Qifan.
Variable Selection for Ultra High Dimensional Data.

Degree: PhD, Statistics, 2014, Texas A&M University

URL: http://hdl.handle.net/1969.1/153224

Variable selection plays an important role for the high dimensional data analysis. In this work, we first propose a Bayesian variable selection approach for ultra-high…

Subjects/Keywords: High Dimensional Variable Selection; Big Data; Penalized Likelihood Approach; Posterior Consistency

IUPUI

13.
Cheung, Chung Ching.
A-Optimal Subsampling For Big Data General Estimating Equations.

Degree: 2019, IUPUI

URL: http://hdl.handle.net/1805/20022

Indiana University-Purdue University Indianapolis (IUPUI)

A significant hurdle for analyzing big data is the lack of effective technology and statistical inference methods. A popular approach…

Subjects/Keywords: Subsampling; Big Data; A-optimality; General Estimating Equations; High Dimensional Statistics

University of Minnesota

14.
Wang, Boxiang.
Modern Classification with Big Data.

Degree: PhD, Statistics, 2018, University of Minnesota

URL: http://hdl.handle.net/11299/216325

Rapid advances in information technologies have ushered in the era of "big data" and revolutionized the scientific research across many disciplines, including economics, genomics, neuroscience,…

Subjects/Keywords: Big data; Classification; High-dimensional analysis; Machine learning; Optimization

Deakin University

15.
Huynh, Viet Huu.
Towards scalable Bayesian nonparametric methods for data analytics.

Degree: School of Information Technology, 2017, Deakin University

URL: http://hdl.handle.net/10536/DRO/DU:30103238

Resorting big data to actionable information involves dealing with four dimensions of challenges in big data (called four V's): volume, variety, velocity, veracity. In this…

Subjects/Keywords: big data; data mining; multi-level clustering

National University of Ireland – Galway

16.
Fallah, Lida.
Aspects of modeling and application of survival-type data.

Degree: 2018, National University of Ireland – Galway

URL: http://hdl.handle.net/10379/7349

Survival analysis is a collection of methods for analyzing data where the outcome of interest is the time to an event and some of the observations…

Subjects/Keywords: Survival analysis; Mixture models; EM algorithm; Longitudinal studies; Mixed models; High-dimensional data; Mathematics, Statistics, and Applied Mathematics; Biostatistics

17.
Freyaldenhoven, Simon.
Essays on Factor Models and Latent Variables in Economics.

Degree: Department of Economics, 2018, Brown University

URL: https://repository.library.brown.edu/studio/item/bdr:792643/

This dissertation examines the modeling of latent variables in economics in a variety of settings. The first two chapters contribute to the growing body of…

Subjects/Keywords: high dimensional data

Purdue University

18.
Cheung, Chung Ching.
A-OPTIMAL SUBSAMPLING FOR BIG DATA GENERAL ESTIMATING EQUATIONS.

Degree: Mathematics, 2019, Purdue University

URL: http://hdl.handle.net/10.25394/pgs.8986571.v1

A significant hurdle for analyzing big data is the lack of effective technology and statistical inference methods. A popular approach for analyzing data with…

Subjects/Keywords: Statistics; subsampling; general estimating equations; a-optimality; big data; High Dimensional Data

19.
Mohebi, Ehsan.
Nonsmooth optimization models and algorithms for data clustering and visualization.

Degree: PhD, 2015, Federation University Australia

URL: http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/77001

Cluster analysis deals with the problem of organization of a collection of patterns into clusters based on a similarity measure. Various distance functions can be…

Subjects/Keywords: Cluster analysis; Clustering problems; Cluster structure; Data set; High dimensional data visualization; Algorithms; Similarity measures

University of Melbourne

20.
Rathore, Punit.
Big data cluster analysis and its applications.

Degree: 2018, University of Melbourne

URL: http://hdl.handle.net/11343/219493

The increasing prevalence of Internet of things (IoT) technologies, smartphones, and social media services generates a huge amount of data, popularly known as ‘big data’.…

Subjects/Keywords: big data clustering; cluster analysis; high-dimensional data; streaming data; smart city; internet of things; online anomaly detection; online change point detection; intelligent transportation; large-scale trajectory data; trajectory prediction; scalable algorithms; single linkage; cluster validation

21.
Ahfock, Daniel Christian.
New statistical perspectives on efficient Big Data algorithms for…

Degree: PhD, 2019, University of Cambridge

URL: https://doi.org/10.17863/CAM.38965 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.774731

This thesis is focused on the development of computationally efficient procedures for regression modelling with datasets containing a large number of observations. Standard algorithms be…

Subjects/Keywords: Bayesian model selection; Random projection; Big Data

University of Cambridge

22.
Ahfock, Daniel Christian.
New statistical perspectives on efficient Big Data algorithms for…

Degree: PhD, 2019, University of Cambridge

URL: https://www.repository.cam.ac.uk/handle/1810/291805 ; https://www.repository.cam.ac.uk/bitstream/1810/291805/3/77a82354-7e80-4b67-95f4-d9b667591ca9_confirmations.txt ; https://www.repository.cam.ac.uk/bitstream/1810/291805/4/license.txt ; https://www.repository.cam.ac.uk/bitstream/1810/291805/5/77a82354-7e80-4b67-95f4-d9b667591ca9.zip ; https://www.repository.cam.ac.uk/bitstream/1810/291805/6/dissertation_v2.pdf.txt ; https://www.repository.cam.ac.uk/bitstream/1810/291805/7/dissertation_v2.pdf.jpg

This thesis is focused on the development of computationally efficient procedures for regression modelling with datasets containing a large number of observations. Standard algorithms be…

Subjects/Keywords: Bayesian model selection; Random projection; Big Data

Virginia Tech

23.
Sun, Jinhui.
Robust Feature Screening Procedures for Mixed Type of Data.

Degree: PhD, Statistics, 2016, Virginia Tech

URL: http://hdl.handle.net/10919/73709

High dimensional data have been frequently collected in many fields of scientific research and technological development. The traditional idea of best subset selection methods, which…

Subjects/Keywords: ultra-high dimensional variable selection; feature screening; mixed type of data

Penn State University

24.
Huang, Yuan.
Projection Test for…

Degree: 2015, Penn State University

URL: https://submit-etda.libraries.psu.edu/catalog/26249

Testing the population mean is fundamental in statistical inference. When the dimensionality of a population is high, traditional Hotelling's T² test becomes practically infeasible due…

Subjects/Keywords: High-dimensional data; Hotelling's T2 test; Projection test; One-sample problem; Two-sample problem

25.
ANTONINO, Victor Oliveira.
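Mapas auto-organizáveis com topologia variante no tempo para categorização em subespaços em dados de alta dimensionalidade e vistas múltiplas [Self-organizing maps with time-varying topology for subspace categorization in high-dimensional, multi-view data].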

Degree: 2016, Federal University of Pernambuco

URL: https://repositorio.ufpe.br/handle/123456789/18623

Methods and algorithms in unsupervised machine learning have been applied to a variety of significant problems. An explosion in the availability of data from various sources…

Subjects/Keywords: Dados em Alta Dimensionalidade; Campo Receptivo Local; Aprendizagem por Relevância; Mapas Auto-Organizáveis; Agrupamento em Subespaços; High-Dimensional Data; Local Receptive Field; Relevance Learning; Self-Organizing Maps (SOMs); Subspace Clustering

Duquesne University

26.
Baumgardner, Adam.
Accounting for Correlation in the Analysis of Randomized Controlled Trials with Multiple Layers of Clustering.

Degree: MS, Computational Mathematics, 2016, Duquesne University

URL: https://dsc.duq.edu/etd/296

A common goal in medical research is to determine the effect that a treatment has on subjects over time. Unfortunately, the analysis of data from…

Subjects/Keywords: Longitudinal Data; Mixed Effects Models

27.
Perrot, Alexandre.
La visualisation d’information à l’ère du Big Data : résoudre les problèmes de scalabilité par l’abstraction multi-échelle : Information Visualization in the…

Degree: Docteur es, Informatique, 2017, Bordeaux

URL: http://www.theses.fr/2017BORD0775

The increase in the amount of data to be visualized, driven by the Big Data phenomenon, brings new challenges for the field of information visualization.…

Subjects/Keywords: Mégadonnées; Partitionnement; Visualisation; Big Data; Clustering; Visualization

University of Tennessee – Knoxville

28.
Lu, Yuping.
Advances in Big Data Analytics: Algorithmic Stability and…

Degree: 2019, University of Tennessee – Knoxville

URL: https://trace.tennessee.edu/utk_graddiss/5514

Analysis of what has come to be called “big data” presents a number of challenges as data continues to grow in size, complexity and heterogeneity.…

Subjects/Keywords: big data; robustness; clustering; paraclique; outlier detection

University of Minnesota

29.
Traganitis, Panagiotis.
Scalable and Ensemble Learning for Big Data.

Degree: PhD, Electrical/Computer Engineering, 2019, University of Minnesota

URL: http://hdl.handle.net/11299/206358

The turn of the decade has trademarked society and computing research with a “data deluge.” As the number of smart, highly accurate and Internet-capable devices…

Subjects/Keywords: Big Data; clustering; Ensemble; learning; subspace; unsupervised

University of Minnesota

30.
Traganitis, Panagiotis.
Large-scale Clustering using Random Sketching and Validation.

Degree: M.S.E.E., Electrical Engineering, 2015, University of Minnesota

URL: http://hdl.handle.net/11299/175489

The advent of high-speed Internet, modern devices and global connectivity has introduced the world to massive amounts of data, that are being generated, communicated and…

Subjects/Keywords: big data; clustering; random; sketching; SkeVa; validation

