University of Texas – Austin

Yan, Bowei. Theoretical analysis for convex and non-convex clustering algorithms.

Degree: Statistics, 2018, University of Texas – Austin

URL: http://hdl.handle.net/2152/68254

Clustering is one of the most important unsupervised learning problems in the machine learning and statistics communities. Given a set of observations, the goal is to find the latent cluster assignment of the data points. The observations can be either covariates associated with each data point or relational networks representing the affinity between pairs of nodes. We study community detection in stochastic block models and clustering in mixture models. The two problems are closely related, and similar techniques can be applied to solve them.
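
Because a latent cluster assignment is only identifiable up to a relabeling of the clusters, one common way to score an estimate (assumed here for illustration, not taken from the thesis) is the fraction of misclustered points, minimized over all label permutations. The brute-force sketch below uses a hypothetical function name and assumes labels are integers 0..k-1; enumerating all k! permutations is only practical for small k, and an assignment solver such as the Hungarian algorithm avoids that cost for larger k.

```python
from itertools import permutations


def misclustering_rate(true_labels, est_labels, k):
    """Fraction of points whose estimated label disagrees with the true label,
    minimized over all relabelings of the k estimated clusters.
    Labels are assumed to be integers in 0..k-1."""
    n = len(true_labels)
    best = n
    for perm in permutations(range(k)):
        # perm maps each estimated label to a candidate true label
        errors = sum(1 for t, e in zip(true_labels, est_labels) if perm[e] != t)
        best = min(best, errors)
    return best / n
```

For example, `misclustering_rate([0, 0, 1, 1], [1, 1, 0, 0], 2)` returns `0.0`, since swapping the two estimated labels recovers the truth exactly.
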
It is common practice to assume an underlying model for the data-generating process in order to analyze it properly. Given a predefined partition of the data points, generative models can be defined for both types of observations. For covariates, the mixture model is one of the most flexible and widely used models: each cluster $i$ comes from some distribution $D_i$, and the overall distribution is a convex combination of the $k$ cluster distributions, $\sum_{i=1}^{k} \pi_i D_i$, with mixing weights $\pi_i \ge 0$ and $\sum_{i=1}^{k} \pi_i = 1$. We assume that the data are Gaussian or sub-Gaussian and analyze two algorithms: 1) the Expectation-Maximization (EM) algorithm, which is notoriously non-convex and sensitive to local optima, and 2) a convex relaxation of the k-means algorithm. We show that both methods are consistent under certain conditions when the signal-to-noise ratio is relatively high, and we obtain upper bounds on the error rate when the signal-to-noise ratio is low. When there are outliers in the data set, we show that the semi-definite relaxation is more robust than spectral methods.
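
To make the mixture model and EM concrete, here is a minimal sketch for a one-dimensional Gaussian mixture. It assumes a known common variance (`var`, default 1.0) and estimates only the mixing weights and component means; the function names and this simplified setting are illustrative assumptions, not the thesis's analysis, which covers Gaussian and sub-Gaussian data more generally.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def sample_mixture(n, weights, mus, var=1.0, seed=0):
    """Draw n points from sum_i weights[i] * N(mus[i], var); also return labels."""
    rng = random.Random(seed)
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(mus[z], math.sqrt(var)) for z in labels], labels


def em_gmm_1d(xs, mus, var=1.0, iters=100):
    """EM for a 1-D Gaussian mixture with known common variance `var`.
    `mus` holds the initial means; returns (mixing weights, means)."""
    k = len(mus)
    mus = list(mus)
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities resp[n][j] = P(z_n = j | x_n)
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, var) for w, m in zip(weights, mus)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights and means from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
    return weights, mus
```

With well-separated components, e.g. `xs, _ = sample_mixture(2000, [0.3, 0.7], [-4.0, 4.0])` followed by `em_gmm_1d(xs, [-1.0, 1.0])`, the estimates land near the true weights and means. Starting both means at the same value, e.g. `[0.0, 0.0]`, leaves EM stuck at a symmetric stationary point where both estimated means equal the sample mean, a small illustration of EM's sensitivity to initialization.
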
For networks, we consider the Stochastic Block Model (SBM), in which the probability that an edge is present is fully determined by the cluster assignments of the pair of nodes. We use a semi-definite programming (SDP) relaxation to learn the clustering matrix and discuss the role of the model parameters. Most SDP relaxations of the SBM require the number of communities as an input, which is a strong requirement for many real-world applications. In this thesis, we propose adding a nuclear-norm regularization, which we show can exactly recover both the number of communities and the cluster memberships even when the number of communities is unknown.
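
The SBM described above can be simulated in a few lines. The sketch below assumes an undirected graph without self-loops and a K×K matrix `B` of connection probabilities; `B` and the helper names are hypothetical. It also builds the normalized clustering matrix $X = Z(Z^\top Z)^{-1}Z^\top$, one common choice for the matrix an SDP relaxation targets. Because $X$ is a positive semidefinite projection onto a K-dimensional subspace, its trace (equal to its nuclear norm) is exactly the number of communities K. This only illustrates why a nuclear-norm term can carry information about K; it is not the thesis's regularized SDP.

```python
import random


def sample_sbm(labels, B, seed=0):
    """Adjacency matrix of an undirected SBM without self-loops: nodes u < v
    are joined independently with probability B[labels[u]][labels[v]]."""
    rng = random.Random(seed)
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < B[labels[u]][labels[v]]:
                A[u][v] = A[v][u] = 1
    return A


def clustering_matrix(labels):
    """Normalized clustering matrix X = Z (Z^T Z)^{-1} Z^T:
    X[u][v] = 1/|C_k| if u and v both lie in community C_k, else 0."""
    sizes = {}
    for z in labels:
        sizes[z] = sizes.get(z, 0) + 1
    n = len(labels)
    return [[1.0 / sizes[labels[u]] if labels[u] == labels[v] else 0.0
             for v in range(n)] for u in range(n)]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))
```

For three communities of sizes 30, 30 and 40, `trace(clustering_matrix(labels))` is 3 up to floating-point rounding, whatever connection probabilities are used to sample the graph.
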
In many real-world networks, both the network structure and node covariates are observed simultaneously. For this setting, we present a regularization-based method that effectively combines the two sources of information. The proposed method works especially well when the covariates and the network contain complementary information.
*Advisors/Committee Members: Sarkar, Purnamrita (advisor), Caramanis, Constantine (committee member), Mueller, Peter (committee member), Walker, Stephen (committee member).*

Subjects/Keywords: Clustering; Sub-gaussian; Mixture model; Community detection; EM algorithm; Semi-definite programming; Stochastic block model

APA (6th Edition):

Yan, B. (2018). Theoretical analysis for convex and non-convex clustering algorithms. (Thesis). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/68254

Note: this citation may be lacking information needed for this citation format:

Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Yan, Bowei. “Theoretical analysis for convex and non-convex clustering algorithms.” 2018. Thesis, University of Texas – Austin. Accessed November 16, 2018. http://hdl.handle.net/2152/68254.

Note: this citation may be lacking information needed for this citation format:

Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Yan, Bowei. “Theoretical analysis for convex and non-convex clustering algorithms.” 2018. Web. 16 Nov 2018.

Vancouver:

Yan B. Theoretical analysis for convex and non-convex clustering algorithms. [Internet] [Thesis]. University of Texas – Austin; 2018. [cited 2018 Nov 16]. Available from: http://hdl.handle.net/2152/68254.

Note: this citation may be lacking information needed for this citation format:

Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Yan B. Theoretical analysis for convex and non-convex clustering algorithms. [Thesis]. University of Texas – Austin; 2018. Available from: http://hdl.handle.net/2152/68254

Note: this citation may be lacking information needed for this citation format:

Not specified: Masters Thesis or Doctoral Dissertation