Full Record

Author
Title Sparsity control for robustness and social data analysis.
URL
Publication Date
Date Accessioned
Degree PhD
Discipline/Department Electrical Engineering
Degree Level doctoral
University/Publisher University of Minnesota
Abstract The information explosion propelled by the advent of personal computers, the Internet, and global-scale communications has rendered statistical learning from data increasingly important for analysis and processing. The ability to mine valuable information from unprecedented volumes of data will facilitate preventing or limiting the spread of epidemics and diseases, identifying trends in global financial markets, protecting critical infrastructure including the smart grid, and understanding the social and behavioral dynamics of emergent social-computational systems. Along with data that adhere to postulated models, present in large volumes of data are also those that do not – the so-termed outliers. This thesis contributes to several issues that pertain to resilience against outliers, a fundamental aspect of statistical inference tasks such as estimation, model selection, prediction, classification, tracking, and dimensionality reduction, to name a few. The recent upsurge of research toward compressive sampling and parsimonious signal representations hinges on signals being sparse, either naturally or after projecting them onto a proper basis. The present thesis introduces a neat link between sparsity and robustness against outliers, even when the signals involved are not sparse. It is argued that controlling the sparsity of model residuals leads to statistical learning algorithms that are computationally affordable and universally robust to outlier models. Even though focus is placed first on robustifying linear regression, the universality of the developed framework is highlighted through diverse generalizations that pertain to: i) the information used for selecting the sparsity-controlling parameters; ii) the nominal data model; and iii) the criterion adopted to fit the chosen model. Explored application domains include preference measurement for consumer utility function estimation in marketing, and load curve cleansing – a critical task in power systems engineering and management. Finally, robust principal component analysis (PCA) algorithms are developed to extract the most informative low-dimensional structure from (grossly corrupted) high-dimensional data. Beyond its ties to robust statistics, the developed outlier-aware PCA framework is versatile enough to accommodate novel and scalable algorithms that: i) track the low-rank signal subspace as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes when used to identify aberrant responses in personality assessment surveys, unveil communities in social networks, and detect intruders in video surveillance data.
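
As an illustration of the sparsity-robustness link described in the abstract, the following minimal Python sketch pairs each datum with an outlier variable and uses an l1 penalty to control how many of them are nonzero. It is only a toy rendition of the general idea, not the thesis's algorithms; the function name robust_lin_reg, the plain block-coordinate solver, and the fixed penalty lam are illustrative assumptions.

# Toy sketch (not the thesis's algorithms): robust linear regression obtained by
# attaching an outlier variable o_i to each datum and controlling the sparsity of
# o via an l1 penalty, i.e. minimizing ||y - X@theta - o||^2 + lam*||o||_1.
import numpy as np

def robust_lin_reg(X, y, lam=1.0, n_iter=50):            # hypothetical helper
    n = X.shape[0]
    o = np.zeros(n)                                       # one outlier variable per datum
    for _ in range(n_iter):                               # block-coordinate descent
        # theta-step: ordinary least squares on the outlier-compensated responses
        theta, *_ = np.linalg.lstsq(X, y - o, rcond=None)
        # o-step: soft-threshold the residuals at lam/2 (closed-form l1 update)
        r = y - X @ theta
        o = np.sign(r) * np.maximum(np.abs(r) - lam / 2.0, 0.0)
    return theta, o

# Toy usage: a handful of gross outliers should be absorbed by o, leaving theta clean.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
y[:10] += 8.0                                             # inject gross outliers
theta_hat, o_hat = robust_lin_reg(X, y, lam=1.0)
print(theta_hat)                                          # close to [1, -2, 0.5]
print(np.count_nonzero(o_hat))                            # roughly the 10 corrupted points

Data points whose outlier variable comes out nonzero are the ones the fit effectively discounts; sweeping lam trades robustness against efficiency, which is the role of the sparsity-controlling parameters mentioned in the abstract.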
Subjects/Keywords Big data; Compressed sensing; Outlier rejection; Social data analysis; Sparsity; Statistical learning; Electrical Engineering
Language en
Country of Publication us
Record ID oai:conservancy.umn.edu:11299/129576
Repository umn
Date Retrieved
Date Indexed 2018-11-20
Note University of Minnesota Ph.D. dissertation. May 2012. Major: Electrical Engineering. Advisor: Professor Georgios B. Giannakis. 1 computer file (PDF); ix, 126 pages, appendices pp. 110-115.

Sample Search Hits

…advent of personal computers, the Internet, and the global-scale communications has rendered statistical learning from data increasingly important for analysis and processing. At any given time instant and all around the globe, large volumes of data are…

…engagement. Thus, a holistic approach to preference measurement, analysis, and management (PM for short) holds the keys to understanding and engineering SoCS. PM has a long history in marketing, retailing, product…

…design, healthcare, and also psychology and behavioral sciences, where conjoint analysis (CA - the PM ‘workhorse’) is commonly used [55,60,90]. In a nutshell, the goal of PM is to learn the utility function of an individual or group of…

…This has led a recent panel of experts on online data collection [71, p. 108] to suggest that ‘researchers should use exploratory data analysis and systematic data mining to identify and eliminate records with anomalous data patterns or to…

…adopted to fit the chosen model. Accordingly, we can divide the contributions of this thesis in three interrelated thrusts: [T1] Robust learning for conjoint analysis. Driven by the explosion of web-collected metric and choice-based preference…

…the context of PM, the nonparametric models investigated in this thrust can capture interdependencies among product attributes, an attractive feature lacking with linear utilities. [T3] Robust principal component analysis. The goal here is to…

…robustify principal component analysis (PCA), thus enabling the possibility of extracting informative low-dimensional structure from (grossly corrupted) high-dimensional data. Real-time algorithms are developed to process data as it is…
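
To make the snippet's notion of outlier-aware PCA concrete, here is a hedged Python sketch in the same toy style as above (again not the algorithms developed in the thesis): it models the data matrix as a low-rank part plus a sparse outlier matrix and alternates a truncated SVD with entrywise soft-thresholding. The name robust_pca, the fixed rank, and lam are illustrative assumptions.

# Toy sketch (illustrative only): outlier-aware PCA fitting Y ~ L + O with L of
# rank r and O a sparse outlier matrix, by alternating minimization.
import numpy as np

def robust_pca(Y, rank=2, lam=1.0, n_iter=100):           # hypothetical helper
    O = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # low-rank step: best rank-r approximation of the outlier-compensated data
        U, s, Vt = np.linalg.svd(Y - O, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # outlier step: entrywise soft-thresholding of the residual at lam/2
        R = Y - L
        O = np.sign(R) * np.maximum(np.abs(R) - lam / 2.0, 0.0)
    return L, O                                            # nonzeros of O flag outliers

The returned L plays the role of the informative low-dimensional structure, while the support of O marks the grossly corrupted entries.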

…[T3] follows next along with a succinct literature review per thrust. Moreover, contributions of this thesis in each case are pointed out. 1.2.1 Robust learning for conjoint analysis To address the challenges outlined in Section 1.1.2…
