You searched for +publisher:"Cornell University" +contributor:("Wells, Martin Timothy").
Showing records 1–30 of 44 total matches.

Cornell University
1.
Zhu, Liao.
MULTI-FACTOR MODELS USING HIGH DIMENSIONAL APPROACHES.
Degree: M.S., Statistics, Statistics, 2019, Cornell University
URL: http://hdl.handle.net/1813/70070
The Capital Asset Pricing Model (CAPM) has been widely studied, and hundreds of papers attempt to add a few new factors to it. In this paper, instead of adding only a few factors, we introduce a new system of high-dimensional approaches for studying thousands of factors together. As a result, the fitting power is dramatically improved, and the fitted model also comes with a strong economic explanation.
Advisors/Committee Members: Wells, Martin Timothy (chair), Jarrow, Robert A. (committee member).
Subjects/Keywords: Asset pricing model; high-dimensional statistics; LASSO; machine learning; minimax prototype clustering; multi-factor model
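The abstract and keywords mention LASSO for studying thousands of factors together. As a rough, hypothetical sketch (not the thesis's actual pipeline), sparse regression over many more candidate factors than observations might look like this, where only a handful of factors truly drive the response:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_obs, n_factors = 200, 1000            # far more candidate factors than observations
X = rng.standard_normal((n_obs, n_factors))
beta = np.zeros(n_factors)
beta[:5] = [1.5, -2.0, 1.0, 0.8, -1.2]  # only 5 factors carry real signal (made-up values)
y = X @ beta + 0.1 * rng.standard_normal(n_obs)

# L1 penalty zeroes out most coefficients, keeping a sparse factor set
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(len(selected))
```

The L1 penalty drives most coefficients exactly to zero, so `selected` should recover roughly the five true factors; the design, coefficients, and penalty level here are all illustrative assumptions.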

Cornell University
2.
Proulx, Jade.
Pulsed Light Based Treatments As A Non-Thermal Strategy For Microbial Control On Cheese Surface.
Degree: M.S., Food Science and Technology, Food Science and Technology, 2015, Cornell University
URL: http://hdl.handle.net/1813/39361
Cheese products made from pasteurized milk often undergo post-pasteurization contamination when subjected to cutting, slicing, or packaging, either at the processing plant or in retail environments. Post-processing cross-contamination has significant health and economic consequences when it leads to hospitalizations or to losses from microbial spoilage. Pulsed Light (PL) treatment, which consists of short, high-energy light pulses and is known to effectively inactivate microorganisms on surfaces, was evaluated as a solution for surface decontamination of cheeses, either as a stand-alone treatment or in combination with the antimicrobials nisin and natamycin. The effect of PL on color change, oxidative stability, and onset of mold growth was also investigated. Slices of white cheddar and processed cheese were spot-inoculated with Pseudomonas fluorescens 1150, Escherichia coli ATCC 25922, and Listeria innocua FSL C2-008 at a concentration of either 5 or 7 log CFU/slice. The inoculated samples were exposed to PL doses of 1.2 to 13.4 J/cm2, directly or through UV-transparent packaging. For combination treatments, cheese slices were dipped into a 2.5% Nisaplin (nisin) or a 50 ppm Natamax (natamycin) solution prior to inoculation. The antimicrobial treatments were tested both before and after PL application. Survivors were recovered and enumerated by standard plate counting (SPC). When survivor counts fell below the SPC detection limit, the most probable number (MPN) technique was used. Experiments were performed in triplicate and data were analyzed using a general linear model. Color change, oxidative stability, and onset of molding were monitored periodically on non-inoculated cheddar cheese samples stored at 6 °C for one month. Color measurements were taken before and after PL treatment and expressed as CIELAB values.
Development of lipid peroxides was monitored colorimetrically as a measure of oxidative stability, and the onset of molding was assessed visually on a daily basis. PL treatment alone was most effective against E. coli, achieving a maximum reduction of 5.4 ± 0.1 log CFU at a dose of 13.2 J/cm2. For P. fluorescens, a maximum reduction of 3.7 ± 0.8 log CFU was obtained, while a 3.4 ± 0.2 log CFU maximum reduction was achieved for L. innocua. The packaging, inoculum level, and cheese type had no effect on L. innocua and P. fluorescens inactivation levels, while E. coli's response was more variable, depending on treatment conditions. PL combination treatments with antimicrobials showed that the presence of natamycin in cheese may interfere with PL, while a synergistic effect between PL and nisin was observed against Listeria, and only when nisin was applied after the PL treatment. PL was also found to be effective in extending the shelf life of cheese. Treatment of cheddar cheese slices with 9 PL pulses, or 10.1 J/cm2, delayed the onset of molding by a week and slowed the rate of molding after that. In terms of the effect of PL on cheese quality, no significant color…
Advisors/Committee Members: Moraru, Carmen I (chair), Wells, Martin Timothy (committee member).
Subjects/Keywords: Pulsed Light; Cheese; Food safety
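The reductions in the abstract are reported in log CFU. A log reduction is simply the difference of base-10 logarithms of the initial and surviving counts; the counts below are made up for illustration:

```python
import math

def log_reduction(initial_cfu, surviving_cfu):
    """Log10 reduction between initial and surviving microbial counts."""
    return math.log10(initial_cfu) - math.log10(surviving_cfu)

# Hypothetical example: a 10^7 CFU inoculum reduced to about 40 survivors
print(round(log_reduction(1e7, 4e1), 2))  # → 5.4
```

A 5.4 log reduction thus means the surviving population is about 10^5.4 (roughly 250,000) times smaller than the inoculum.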

Cornell University
3.
Eilertson, Kirsten.
Estimation And Inference Of Random Effect Models With Applications To Population Genetics And Proteomics.
Degree: PhD, Statistics, 2011, Cornell University
URL: http://hdl.handle.net/1813/30679
In this dissertation I present two methodologies for estimation and inference of random effect models with applications to population genetics and proteomics. The first methodology presented, SnIPRE, is designed for identifying genes under natural selection. SnIPRE is a "McDonald-Kreitman" type of analysis, in that it is based on MK table data, and it has an advantage over other types of statistics because it is robust to demography. Similar to the MKprf method, SnIPRE makes use of genome-wide information to increase power, but it is nonparametric in the sense that it makes no assumptions about (and does not require estimation of) parameters such as mutation rate and species divergence time in order to identify genes under selection. In simulations SnIPRE outperforms both the MK statistic and the two versions of MKprf considered. With the right assumptions, SnIPRE may be used to estimate population parameters, and in chapter 3 we discuss the robustness of the method to the assumption of independent sites. I also propose a procedure for more precise estimation of the confidence bounds of the selection effect, and then apply our method to Drosophila and human-chimp comparison data. PROWLRE, an empirical Bayes method for analyzing shotgun-proteomics data, is introduced in the final chapter. While a fully Bayesian implementation of this model is straightforward, the empirical Bayes implementation is more challenging. I present an EM algorithm designed for fitting this latent variable model and then compare the results to the Bayesian estimation on simulated and synthetic data.
Advisors/Committee Members: Bustamante, Carlos D. (chair), Booth, James (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: Mixed Effect Models; Natural Selection; Proteomics

Cornell University
4.
Bar, Haim.
Parallel Testing, And Variable Selection - A Mixture-Model Approach With Applications In Biostatistics.
Degree: PhD, Statistics, 2012, Cornell University
URL: http://hdl.handle.net/1813/29325
We develop efficient and powerful statistical methods for high-dimensional data, where the sample size is much smaller than the number of features (the so-called 'large p, small n' problem). We deal with three important problems. First, we develop a mixture-model approach to parallel testing for unequal variances in two-sample experiments. The treatment effect on the variance has received little attention in the statistical literature, which has so far focused mostly on the effect on the mean. The effect on the variance is increasingly recognized in the recent biological literature, and we develop an empirical Bayes approach for testing differences in variance when the number of tests is large. We show that the model is useful in a wide range of applications, that our method is much more powerful than traditional tests for unequal variances, and that it is robust to the normality assumption. Second, we extend these ideas and develop a novel bivariate normal model that tests for both differential expression and differential variation between the two groups. We show in simulations that this new method yields a substantial gain in power when differential variation is present. Through a three-step estimation approach, in which we apply the Laplace approximation and the EM algorithm, we obtain a computationally efficient method that is particularly well suited to 'large p, small n' situations. Third, we deal with the problem of variable selection where the number of putative variables is large, possibly much larger than the sample size. We develop a model-based, empirical Bayes approach. By treating the putative variables as random effects, we obtain shrinkage estimation, which results in increased power and significantly faster convergence compared with simulation-based methods. Furthermore, we employ computational tricks that allow us to increase the speed of our algorithm, to handle a very large number of putative variables, and to control the multicollinearity in the model. The motivation for developing this approach is QTL analysis, but our method is applicable to a broad range of applications. We use two widely studied data sets and show that our model selection algorithm yields excellent results.
Advisors/Committee Members: Booth, James (chair), Wells, Martin Timothy (committee member), Strawderman, Robert Lee (committee member).

Cornell University
5.
Wan, Muting.
Model-Based Classification With Applications To High-Dimensional Data In Bioinformatics.
Degree: PhD, Statistics, 2015, Cornell University
URL: http://hdl.handle.net/1813/39389
In recent years, sparse classification problems have emerged in many fields of study. Finite mixture models have been developed to facilitate Bayesian inference where parameter sparsity is substantial. Shrinkage estimation allows strength borrowing across features in light of the parallel nature of multiple hypothesis tests. Important examples that incorporate shrinkage estimation and finite mixture models for sparse classification include the hierarchical model in Smyth (2004) and the explicit mixture model in Bar et al. (2010) for Bayesian microarray analysis. Classification with finite mixture models is based on the posterior expectation of latent indicator variables. These quantities are typically estimated using the expectation-maximization (EM) algorithm in an empirical Bayes approach or Markov chain Monte Carlo (MCMC) in a fully Bayesian approach. MCMC is limited in applicability where high-dimensional data are involved because its sampling-based nature leads to slow computations and hard-to-monitor convergence. In a fully Bayesian framework, we investigate the feasibility and performance of variational Bayes (VB) approximation and apply the VB approach to fully Bayesian versions of several finite mixture models that have been proposed in bioinformatics. We find that it achieves desirable speed and accuracy in sparse classification with hierarchical mixture models for high-dimensional data. Another example of sparse classification in bioinformatics solvable via model-based approaches is expression quantitative trait loci (eQTL) detection, in which determining whether the association between a gene and a given single nucleotide polymorphism (SNP) is significant is regarded as classifying genes as null or non-null with respect to that SNP. High-dimensionality of the data not only causes difficulties in computations, but also renders the confounding impact of unwanted variation in the data irrefutable. Model-based approaches that account for unwanted variation by incorporating a factor analysis term representing hidden factors and their effects have been adopted in applications such as differential analysis and eQTL detection. HEFT (Gao et al., 2014) is a fast approach for model-based eQTL identification while simultaneously learning hidden effects. We develop a hierarchical mixture model-based empirical Bayes approach for sparse classification while simultaneously accounting for unwanted variation, as well as a family of model-based approaches that simplify it with the aim of attractive computational efficiency. We investigate the feasibility and performance of these model-based approaches in comparison with HEFT using several real data examples in bioinformatics.
Advisors/Committee Members: Booth, James (chair), Hooker, Giles J. (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: Bayesian inference; Linear mixed models; Bioinformatics
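The abstract notes that classification with finite mixture models rests on the posterior expectation of latent indicator variables, typically estimated with EM. A toy sketch of that idea, under assumptions chosen here for illustration (a fixed N(0, 1) null component and a free non-null component, with made-up simulated statistics):

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated per-feature statistics: 90% null N(0, 1), 10% non-null N(3, 1)
z = np.concatenate([rng.normal(0, 1, 900), rng.normal(3, 1, 100)])

def norm_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# EM for the non-null proportion and mean; the null component is held fixed at N(0, 1)
pi1, mu1 = 0.5, 1.0
for _ in range(200):
    # E-step: posterior probability that each feature is non-null (latent indicator)
    p1 = pi1 * norm_pdf(z, mu1, 1.0)
    p0 = (1 - pi1) * norm_pdf(z, 0.0, 1.0)
    w = p1 / (p0 + p1)
    # M-step: update mixing proportion and non-null mean from the posteriors
    pi1 = w.mean()
    mu1 = (w * z).sum() / w.sum()

labels = w > 0.5   # classify features as non-null by thresholding the posterior
print(round(pi1, 2), round(mu1, 2))
```

The posterior weights `w` are exactly the quantities the abstract describes: each feature is classified as null or non-null according to its posterior indicator probability.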

Cornell University
6.
Bao, Tong.
Essays On Social Effects And Social Media.
Degree: PhD, Management, 2011, Cornell University
URL: http://hdl.handle.net/1813/33632
Two significant phenomena have emerged from recent internet development: consumers are influenced by social networks, and consumers engage in both consumption and production of user-generated content. This dissertation studies social influence and social media. In Chapter 1, we study how the summer internship employer choices of MBA students at a major university are influenced by the choices made by their fellow students. We develop a simultaneous model of each individual's choice as a function of other students' choices. Our model of interdependence in decision making is structural and equilibrium-based. Also, the model is general enough to allow both positive and negative effects of average group choices on any individual's decision. The structure of our data enables us to identify endogenous social effects separately from exogenous or correlated effects. Specifically, in our data we see each student making choices about whether or not to apply for each job opening; exogenous and correlated effects do not vary in this sample, and therefore endogenous effects are identified. We employ a two-stage procedure to address the endogeneity of choices: we estimate empirical choice probabilities in the first stage, and taste parameters for employer attributes and peer influence in the second stage. Our results show that, as expected, students prefer jobs with strong employer attributes (e.g., high salary, large firm size). In addition, students are influenced by their peers' choices. However, in contrast to previous studies, we find negative (rather than positive) social effects. That is, strong attributes also make an internship employer less attractive, leading to a lower choice probability relative to cases of zero or positive social effects. This negative social effect is consistent with congestion, i.e., students are aware that a good internship will attract the interest of more students, thus lowering the odds of getting it. We find that these negative social effects are stronger for students with more work experience and stronger GMAT scores. While positive social effects lead to concentration of choices, negative social effects help prevent concentration. In Chapter 2, we analyze how large content-sharing websites operate for companies like Google and Yahoo.
A content website provider needs to understand content users to achieve different objectives. Consumers searching content take the sampling probability as given in deciding consumption, and producers are motivated by endorsement. Sampling probability is a key policy instrument. Endorsement may explain why a small number of producers generate most content. Individual behaviors alone cannot explain the genesis and persistence of sampling probability and endorsement. Two distinguishing features of content, being free and non-rival, preclude application of celebrated market equilibrium theory. We develop a content equilibrium from first principles. Consumers and producers can be compatible, and their interaction gives rise to endogenous sampling probability and endorsement. Inequality arises:…
Advisors/Committee Members: Kadiyali, Vrinda (chair), Gupta, Sachin (coChair), Huttenlocher, Daniel Peter (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: Marketing; Social network; Social Media

Cornell University
7.
Zhao, Yue.
Contributions To The Statistical Inference For The Semiparametric Elliptical Copula Model.
Degree: PhD, Statistics, 2015, Cornell University
URL: http://hdl.handle.net/1813/41052
This thesis addresses aspects of the statistical inference problem for the semiparametric elliptical copula model. A copula (function) for a continuous multivariate distribution is the joint distribution function of the transformed marginal distributions, where the transformation is the probability integral transform. As such, a copula is a tool to couple or decouple the multivariate dependence structure from the behavior of the individual margins. The semiparametric elliptical copula model is the family of distributions whose dependence structures are specified by parametric elliptical copulas but whose marginal distributions are left unspecified. The elliptical copula is in turn uniquely characterized by a characteristic generator and a copula correlation matrix Σ. In the first part of this thesis, we address the estimation of Σ. A natural estimate for Σ is the plug-in estimator Σ̂ based on Kendall's tau statistic. We first obtain a sharp bound on the operator norm of Σ̂ − Σ. Then, we study a factor model of Σ, for which we propose a refined estimator Σ̃ obtained by fitting a low-rank matrix plus a diagonal matrix to Σ̂ using least squares with a nuclear-norm penalty on the low-rank matrix. The bound on the operator norm of Σ̂ − Σ serves to scale the penalty term, and we obtain finite-sample oracle inequalities for Σ̃. We provide data-driven versions of all our estimation procedures. In the second part of this thesis, we specialize to a subset of the semiparametric elliptical copula model and study the classification of two distributions that have the same Gaussian copula but that are otherwise arbitrary in high dimensions. Under this semiparametric Gaussian copula setting, we derive an accurate semiparametric estimator of the log density ratio, which leads to our empirical decision rule and a bound on its associated excess risk. Our estimation procedure takes advantage of the potential sparsity as well as the low noise condition in the problem, which allows us to achieve a faster convergence rate of the excess risk than is possible in the existing literature on semiparametric Gaussian copula classification. We demonstrate the efficiency of our semiparametric empirical decision rule by showing that the bound on the excess risk nearly achieves a convergence rate of n^(−1/2) in the simple setting of Gaussian distribution classification.
Advisors/Committee Members: Wegkamp, Marten H. (chair), Bunea, Florentina (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: semiparametric; elliptical copula
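The plug-in estimator based on Kendall's tau mentioned in the abstract exploits a standard identity for elliptical copulas: each copula correlation entry satisfies Σ_jk = sin(π τ_jk / 2), where τ_jk is Kendall's tau. A minimal bivariate sketch of the idea (the simulated data and sample size are illustrative, not the thesis's setting):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
rho = 0.6  # true copula correlation for a bivariate Gaussian example
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# Kendall's tau is rank-based, so it is invariant to the unknown margins
tau, _ = kendalltau(x[:, 0], x[:, 1])
sigma_hat = np.sin(np.pi * tau / 2)   # plug-in estimate of the copula correlation
print(round(sigma_hat, 2))
```

Because tau depends only on ranks, `sigma_hat` would be unchanged under any monotone transformation of the margins, which is what makes the estimator natural in a semiparametric copula model.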

Cornell University
8.
Earls, Cecilia.
Bayesian Hierarchical Gaussian Process Models For Functional Data Analysis.
Degree: PhD, Statistics, 2014, Cornell University
URL: http://hdl.handle.net/1813/38765
This dissertation encompasses a breadth of topics in the area of functional data analysis, where each function is modeled as a Gaussian process within the framework of a Bayesian hierarchical model. As Gaussian processes cannot be worked with directly in this context, a foundational aspect of this work illustrates that using a finite approximation to each process is sufficient to provide good estimates throughout the entire process. More importantly, it is established that using a finite approximation of a bivariate random process within the estimation procedure also provides good estimates throughout the entire bivariate process. With this result, the mean and covariance functions associated with a Gaussian process can be considered as random effects within a Bayesian hierarchical model. Inference for both parameters is based upon their posterior distributions, which provide not only estimates of these parameters but also quantify variation in them. Here we also propose Bayesian hierarchical models for smoothing, functional linear regression, and functional registration. The registration model introduced here is shown to compare favorably with the best registration methods currently available, as measured by the Sobolev Least Squares criterion. Within this registration framework, an Adapted Variational Bayes algorithm is introduced to address the computational costs associated with inference in high-dimensional Bayesian models. With multiple examples, both simulated and using real data, it is shown that this algorithm results in registered function estimates that closely agree with corresponding estimates obtained from an MCMC sampling scheme. With this algorithm, functional prediction is considered for the first time in a registration context. The final area of inference for functional data proposed for the first time here is a combined registration and factor analysis model. This model is shown to outperform currently available registration methods for data in which the registered functions vary in more than one functional direction. The models presented here are applied to several simulated data sets as well as data from the Berkeley Growth Study, functional sea-surface temperature data, and a juggling data set.
Advisors/Committee Members: Hooker, Giles J. (chair), Wells, Martin Timothy (committee member), Booth, James (committee member).
Subjects/Keywords: Functional data; registration; Covariance estimation
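The finite approximation of a Gaussian process described in the abstract can be illustrated by evaluating the process on a finite grid: given a covariance kernel (a squared-exponential kernel is assumed here purely for illustration), sample paths on the grid are draws from a multivariate normal:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 50)                      # finite grid approximating [0, 1]

# Squared-exponential covariance function (assumed kernel, for illustration)
def se_kernel(s, u, variance=1.0, length=0.2):
    return variance * np.exp(-0.5 * ((s[:, None] - u[None, :]) / length) ** 2)

K = se_kernel(t, t) + 1e-6 * np.eye(len(t))    # small jitter for numerical stability
L = np.linalg.cholesky(K)                      # K = L L^T
draws = L @ rng.standard_normal((len(t), 3))   # three GP sample paths on the grid
print(draws.shape)
```

Refining the grid refines the approximation; the hierarchical models in the thesis build on this finite-dimensional representation rather than on the infinite-dimensional process itself.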

Cornell University
9.
Lee, Kwan Seung.
Law And Legal Environments Surrounding Law Firms.
Degree: M.S., Industrial and Labor Relations, 2015, Cornell University
URL: http://hdl.handle.net/1813/41122
ABSTRACT: The research on law and organizations has moved through multiple phases: law as an incentive for or constraint on organizational compliance, law as an element of the wider institutional environment surrounding organizations, and law whose meaning is in turn shaped by organizations' proactive practices. Additionally, an emerging strand of entrepreneurship research has begun to view legislative change as a source of business opportunities. However, few attempts have been made to analyze the relationship between law and organizations by combining existing perspectives on law and organizations with those on entrepreneurship. In this article, I first examine the direct positive economic impact of law on law firms through the opportunities it provides for a new legal service, since entrepreneurship research suggests that law can influence organizations' structural decisions about whether to practice employment law. Next, I argue that the legal environment also indirectly affects law firms negatively through client organizations: an employee-friendly legal environment makes client organizations more heedful of potential negative consequences, and their proactive, careful responses in turn alleviate the pressure on law firms to practice employment law. Using a hierarchical logistic regression and controlling for other variables, my findings show that an increase in the number of employment complaints leads to a higher likelihood of law firms practicing employment law, while a progressive state legal environment decreases that likelihood.
Advisors/Committee Members: Tolbert, Pamela S. (chair), Colvin, Alexander James (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: law firms; employment law; law and organizations

Cornell University
10.
Mentch, Lucas.
Ensemble Trees And CLTs: Statistical Inference In Machine Learning.
Degree: PhD, Statistics, 2015, Cornell University
URL: http://hdl.handle.net/1813/41126
As data grow in size and complexity, scientists are relying more heavily on learning algorithms that can adapt to underlying relationships in the data without imposing a formal model structure. These learning algorithms can produce very accurate predictions, but they form something of a black box and are thus very difficult to analyze. Classical statistical models, on the other hand, insist on a more rigid structure but are intuitive and easy to interpret. The fundamental goal of this work is to bridge these approaches by developing limiting distributions and formal statistical inference procedures for broad classes of ensemble learning methods. This is accomplished by drawing a connection between the structure of subsampled ensembles and U-statistics. In particular, we extend the existing theory of U-statistics to include infinite-order and random-kernel cases and develop the relevant asymptotic theory for these new classes of estimators. This allows us to produce confidence intervals for predictions generated by supervised learning ensembles like bagged trees and random forests. We also develop formal testing procedures for feature significance and extend these to hypothesis tests for additivity. When a large number of test points is required or the additive structure is particularly complex, we employ random projections and utilize recent theoretical developments. Finally, we further extend these ideas and propose an alternative permutation scheme to address the problem of variable selection with random forests.
Advisors/Committee Members: Hooker, Giles J. (chair), Wegkamp, Marten H. (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: U-statistics; Random Forests; Bagging
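The subsampled-ensemble structure that this thesis connects to U-statistics can be sketched with a toy base learner. This is an illustrative mock-up, not the thesis's estimator: the base learner is a depth-one regression stump, and the interval below uses a naive Monte Carlo standard error over the ensemble rather than the infinite-order U-statistic variance derived in the work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise
n = 500
x = rng.uniform(-1, 1, n)
y = 2 * x + rng.normal(0, 0.3, n)

def stump_predict(xs, ys, x0):
    """Fit a depth-one regression stump on (xs, ys) and predict at x0."""
    best_split, best_sse = None, np.inf
    for s in np.quantile(xs, np.linspace(0.1, 0.9, 9)):
        left, right = ys[xs <= s], ys[xs > s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_split = sse, s
    left, right = ys[xs <= best_split], ys[xs > best_split]
    return left.mean() if x0 <= best_split else right.mean()

# Subsampled ensemble: each base learner sees a size-k subsample without replacement
x0, B, k = 0.5, 200, 100
preds = np.array([
    stump_predict(x[idx], y[idx], x0)
    for idx in (rng.choice(n, size=k, replace=False) for _ in range(B))
])

est = preds.mean()
se = preds.std(ddof=1) / np.sqrt(B)   # naive Monte Carlo SE, not the U-statistic variance
ci = (est - 1.96 * se, est + 1.96 * se)
```

Because each base learner is a symmetric function of a size-k subsample, the ensemble mean has the form of an (incomplete) U-statistic, which is what licenses a normal limiting distribution for the prediction.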

Cornell University
11.
Gilbert, Daniel E.
Luck, Fairness and Bayesian Tensor Completion.
Degree: PhD, Statistics, 2019, Cornell University
URL: http://hdl.handle.net/1813/67532
This thesis contains papers on three diverse topics. The first topic is luck in games, and how to measure it. Game theory is the study of tractable games that may be used to model more complex systems. Board games, video games, and sports, however, are intractable by design, so "ludological" theories about these games as complex phenomena should be grounded in empiricism. A first "ludometric" concern is the empirical measurement of the amount of luck in various games. We argue against a narrow view of luck that includes only factors outside any player's control, and advocate for a holistic definition of luck as complementary to the variation in effective skill within a population of players. We introduce two metrics for luck in a game for a given population, one information-theoretic and one Bayesian, and discuss the estimation of these metrics using sparse, high-dimensional regression techniques. Finally, we apply these techniques to compare the amount of luck between various professional sports, between Chess and Go, and between two hobby board games: Race for the Galaxy and Seasons. The second topic centers on matrix and tensor completion, frameworks for a wide range of problems including collaborative filtering, missing data, and image reconstruction. Missing entries are estimated by leveraging an assumption that the matrix or tensor is low-rank. Most existing Bayesian techniques encourage rank-sparsity by modelling factorized matrices and tensors with Normal-Gamma priors. However, the Horseshoe prior and other "global-local" formulations provide tuning-parameter-free solutions which may better achieve simultaneous rank-sparsity and missing-value recovery. We find these global-local priors outperform commonly used alternatives in simulations and in a collaborative filtering task predicting board game ratings. The third topic is a review and novel perspective on fairness in algorithms.
A substantial portion of the literature on fairness in algorithms proposes, analyzes, and operationalizes simple formulaic criteria for assessing fairness. Two of these criteria, Equalized Odds and Calibration by Group, have gained significant attention for their simplicity and intuitive appeal, but also for their incompatibility. This chapter provides a perspective, using graphical models, on the meaning and consequences of these and other fairness criteria, revealing Equalized Odds and related criteria to be ultimately misleading. An assessment of various graphical models suggests that fairness criteria should be case-specific and sensitive to the nature of the information the algorithm processes.
Advisors/Committee Members: Wells, Martin Timothy (chair), Booth, James (committee member), Wilson, Andrew Gordon (committee member).
Subjects/Keywords: luck; matrix completion; tensor completion; Statistics; bayesian; equalized odds; fairness
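The low-rank completion framework behind the second topic can be sketched with plain alternating least squares on the observed entries. The thesis's Bayesian global-local (Horseshoe-type) priors are swapped here for a simple fixed ridge term and a known rank, purely for illustration; the setup below (dimensions, rank, observation rate) is made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground-truth rank-2 matrix with roughly 40% of entries observed
n, m, r = 30, 20, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))
mask = rng.random((n, m)) < 0.4

# Alternating least squares on observed entries (small ridge term for stability)
U = rng.normal(scale=0.1, size=(n, r))
V = rng.normal(scale=0.1, size=(m, r))
lam = 1e-3
for _ in range(50):
    for i in range(n):
        obs = mask[i]
        U[i] = np.linalg.solve(V[obs].T @ V[obs] + lam * np.eye(r),
                               V[obs].T @ M[i, obs])
    for j in range(m):
        obs = mask[:, j]
        V[j] = np.linalg.solve(U[obs].T @ U[obs] + lam * np.eye(r),
                               U[obs].T @ M[obs, j])

# Mean absolute error on the *unobserved* entries
err = np.abs((U @ V.T - M)[~mask]).mean()
```

The Bayesian formulations discussed in the thesis replace the fixed rank and ridge penalty with priors on the factor matrices, so that the effective rank is inferred rather than chosen.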

Cornell University
12.
Narayanan, Rajendran.
Shrinkage Estimation For Penalised Regression, Loss Estimation And Topics On Largest Eigenvalue Distributions.
Degree: PhD, Statistics, 2012, Cornell University
URL: http://hdl.handle.net/1813/31137
The dissertation can be broadly classified into four projects, presented in four chapters: (a) Stein estimation for l1-penalised regression and model selection, (b) loss estimation for model selection, (c) largest eigenvalue distributions of random matrices, and (d) the maximum domain of attraction of the Tracy-Widom distribution. In the first project, we construct Stein-type shrinkage estimators for the coefficients of a linear model, based on a convex combination of the Lasso and the least squares estimator. Since the Lasso constraint set is a closed and bounded polyhedron (a cross-polytope), we observe that under a general quadratic loss function, we can treat the Lasso solution as a metric projection of the least squares estimator onto the constraint set. We derive analytical expressions for the decision-theoretic risk difference of the proposed Stein-type estimators and the Lasso, and establish data-based, verifiable conditions for risk gains of the proposed estimator over the Lasso. Following Stein's Unbiased Risk Estimation (SURE) framework, we further derive expressions for unbiased estimates of prediction error for selecting the optimal tuning parameter. In the second project, we consider the following problem. For a random vector X, estimation of the unknown location parameter θ using an estimator d(X) is often accompanied by a loss function L(d(X), θ). Performance of such an estimator is usually evaluated using the risk of d(X). We consider estimating the loss function using an estimator λ(X) which is conditional on the actual observations, as opposed to an average over the sampling distribution of d(X). In this context, we consider estimating the loss function when the unknown mean vector θ of a multivariate normal distribution with an arbitrary covariance matrix is estimated using both the MLE and a shrinkage estimator.
We derive sufficient conditions for inadmissibility of the unbiased estimators of loss for such a random vector. We further establish conditions for improved estimators of the loss function for a linear model when the Lasso is used as a model selection tool, and exhibit such an improved estimator. The largest eigenvalue of the Gaussian and Jacobi ensembles plays an important role in classical multivariate analysis and random matrix theory. Historically, the exact distribution of the largest eigenvalue has required extensive tables or the use of specialised software. More recently, asymptotic approximations for the cumulative distribution function of the largest eigenvalue in both settings have been shown to have the Tracy-Widom limit. Our main results concern using a unified approach to derive the exact cumulative distribution function of the largest eigenvalue in both settings in terms of elements of a matrix that have explicit scalar analytical forms. In the fourth chapter, the maximum of i.i.d. Tracy-Widom distributed random variables arising from the Gaussian unitary ensemble is shown to belong to the Gumbel domain of attraction. This theoretical result…
Advisors/Committee Members: Wells, Martin Timothy (chair), Strawderman, Robert Lee (committee member), Nussbaum, Michael (committee member).
Subjects/Keywords: Shrinkage Estimation; Loss Estimation; Distribution of Largest Eigenvalue; Domain of Attraction of Tracy-Widom
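The first project's idea, combining the Lasso with least squares, can be mimicked in the orthonormal-design case, where the Lasso reduces to soft-thresholding the least squares estimate. This is a simulation sketch only: the fixed mixing weight and threshold below are placeholders, whereas the dissertation derives data-based conditions and SURE-tuned parameters rather than fixing them.

```python
import numpy as np

rng = np.random.default_rng(2)

def soft(z, t):
    """Soft-thresholding operator: the lasso solution under an orthonormal design."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Sparse coefficient vector: a few strong signals, the rest zero
p, sigma, t, alpha = 50, 1.0, 1.0, 0.5
beta = np.zeros(p)
beta[:5] = 3.0

reps = 2000
risks = {"ols": 0.0, "lasso": 0.0, "combo": 0.0}
for _ in range(reps):
    z = beta + sigma * rng.normal(size=p)      # least squares estimate (orthonormal design)
    lasso = soft(z, t)                         # lasso estimate
    combo = alpha * z + (1 - alpha) * lasso    # convex combination, fixed weight
    for name, est in (("ols", z), ("lasso", lasso), ("combo", combo)):
        risks[name] += ((est - beta) ** 2).sum() / reps
```

By convexity of quadratic loss, the combination's risk is at most the weighted average of the two component risks, so it beats least squares whenever the Lasso does.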

Cornell University
13.
Kirtland, Kelly Meredith.
Outlier Detection and Multicollinearity in Sequential Variable Selection: A Least Angle Regression-Based Approach.
Degree: PhD, Statistics, 2017, Cornell University
URL: http://hdl.handle.net/1813/47809
As lasso regression has grown exceedingly popular as a tool for coping with variable selection in high-dimensional data, diagnostic methods have not kept pace. The primary difficulty of outlier detection in high-dimensional data is the inability to examine all subspaces, either simultaneously or sequentially. I explore the impact of outliers on lasso variable selection and penalty parameter estimation, and propose a tree-like outlier nominator based on the LARS algorithm. The least angle regression outlier nomination (LARON) algorithm follows variable selection paths and prediction summaries for the original data set and for data subsets obtained by removing potential outliers. This provides visual insight into the effect of specific points on lasso fits while allowing for a data-directed exploration of various subspaces.
Simulation studies indicate that LARON is generally more powerful at detecting outliers than standard diagnostics applied to lasso models after fitting. One reason for this improvement is that observations with unusually high influence can inflate the penalty parameter and result in a severely underfit model. We explore this result through simulations, and theoretically using a lasso homotopy adapted for online observations. Additionally, LARON is able to explore multiple subspaces, while post-hoc diagnostics rely on a variable selection that has already occurred under the possible influence of an unusual observation. However, LARON underperforms random nomination when attempting to detect high-leverage, non-influential points located in minor eigenvalue directions in high-dimensional settings. The lack of detection appears to result from a robustness of the lasso's variable selection process against such points.
A new R package implementing the LARON algorithm is presented, and its ability to detect multicollinearity in the data, even when masked by high-leverage points, is described. The package is then used to analyze simulated data and several real data sets.
Advisors/Committee Members: Velleman, Paul F (chair), Wells, Martin Timothy (committee member), Hooker, Giles J. (committee member).
Subjects/Keywords: Statistics; LARS; lasso; multicollinearity; outlier nomination; sequential; variable selection
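The nomination idea, refitting the lasso with candidate points removed and measuring how far the fit moves, can be sketched with a leave-one-out loop. This is a simplification of LARON, which follows full LARS paths rather than single refits; the coordinate-descent solver, the penalty value, and the planted +25 outlier below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def lasso_cd(X, y, lam, iters=200):
    """Cyclic coordinate descent for the lasso: (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(iters):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # partial residual excluding j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return b

# Sparse linear model plus one gross outlier planted at index 0
n, p = 60, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 2.0
y = X @ beta + rng.normal(0, 0.5, n)
y[0] += 25.0

full = lasso_cd(X, y, lam=0.1)

# Nomination: points whose removal moves the lasso fit the most are suspects
influence = np.array([
    np.linalg.norm(lasso_cd(np.delete(X, i, 0), np.delete(y, i), 0.1) - full)
    for i in range(n)
])
nominated = int(influence.argmax())
```

The planted outlier dominates the influence ranking here; LARON's contribution is doing this kind of exploration along the whole solution path, and recursively on nominated subsets, rather than one point at a time at a fixed penalty.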

Cornell University
14.
Ji, Pengsheng.
Selected Topics In Nonparametric Testing And Variable Selection For High Dimensional Data.
Degree: PhD, Statistics, 2012, Cornell University
URL: http://hdl.handle.net/1813/31124
Part I: The Gaussian white noise model has been used as a general framework for nonparametric problems. The asymptotic equivalence of this model to density estimation and nonparametric regression has been established by Nussbaum (1996) and Brown and Low (1996). In Chapter 1, we consider testing for the presence of a signal in Gaussian white noise with intensity n^(-1/2), when the alternatives are given by smoothness ellipsoids with an L2-ball of radius ρ removed. It is known that, for a fixed Sobolev-type ellipsoid Σ(β, M) of smoothness β and size M, the radius rate ρ ≍ n^(-4β/(4β+1)) is the critical separation rate, in the sense that the minimax error of the second kind over α-tests stays asymptotically strictly between 0 and 1 (Ingster, 1982). In addition, Ermakov (1990) found the sharp asymptotics of the minimax error of the second kind at the separation rate. For adaptation over both β and M in that context, it is known that a log log-penalty over the separation rate for ρ is necessary for a nonzero asymptotic power. Here, following an example in nonparametric estimation related to the Pinsker constant, we investigate the adaptation problem over the ellipsoid size M only, for fixed smoothness degree β. It is established that the Ermakov-type sharp asymptotics can be preserved in that adaptive setting if ρ → 0 slower than the separation rate. The penalty for adaptation in that setting turns out to be a sequence tending to infinity arbitrarily slowly. In Chapter 2, motivated by the sharp asymptotics of nonparametric estimation for non-Gaussian regression (Golubev and Nussbaum, 1990), we extend Ermakov's sharp asymptotics for the minimax testing errors to the nonparametric regression model with nonnormal errors. The paper entitled "Sharp Asymptotics for Risk Bounds in Nonparametric Testing with Uncertainty in Error Distributions" is in preparation. This part is joint work with Michael Nussbaum.
Part II: Consider a linear model Y = Xβ + z, z ~ N(0, I_n). Here, X = X_(n,p), where both p and n are large but p > n. We model the rows of X as iid samples from N(0, (1/n)Ω), where Ω is a p × p correlation matrix, which is unknown to us but presumably sparse. The vector β is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose Univariate Penalization Screening (UPS) for variable selection. This is a Screen and Clean method, where we screen with univariate thresholding and clean with a penalized MLE. It has two important properties: Sure Screening and Separable After Screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation. We measure the performance of a procedure by the Hamming distance, and use an asymptotic framework where p → ∞ and other quantities (e.g., n, sparsity level, and strength of signals) are linked…
Advisors/Committee Members: Nussbaum, Michael (chair), Booth, James (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: minimax hypothesis testing; graph; phase diagram; screen and clean; Hamming distance; variable selection
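The Screen and Clean flow of UPS can be sketched in a p > n setting: a univariate screen cuts the problem down to a handful of survivors, and a fit on those survivors cleans the selection. The thresholds below are ad hoc, and ordinary least squares stands in for the penalized MLE of the actual procedure; signal positions and strengths are made-up values.

```python
import numpy as np

rng = np.random.default_rng(4)

# p > n sparse regression; columns scaled so each has roughly unit norm
n, p = 80, 300
X = rng.normal(size=(n, p)) / np.sqrt(n)
beta = np.zeros(p)
beta[[5, 50, 200]] = 10.0
y = X @ beta + rng.normal(size=n)

# Screen: univariate scores |X_j' y|; keep coordinates above an ad hoc cutoff
scores = np.abs(X.T @ y)
survivors = np.where(scores > 3.5)[0]

# Clean: least squares on the survivors (a stand-in for the penalized MLE),
# then keep coefficients that are clearly nonzero
coef, *_ = np.linalg.lstsq(X[:, survivors], y, rcond=None)
selected = {int(j) for j in survivors[np.abs(coef) > 3.0]}
```

The screen reduces a 300-variable problem to a regression small enough to fit directly, which is the computational point of the Sure Screening and Separable After Screening properties.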

Cornell University
15.
Chen, Maximillian.
Dimension Reduction And Inferential Procedures For Images.
Degree: PhD, Statistics, 2014, Cornell University
URL: http://hdl.handle.net/1813/37105
High-dimensional data analysis has been a prominent topic of statistical research in recent years due to the growing presence of high-dimensional electronic data. Much of the current work has been done on analyzing samples of high-dimensional multivariate data; far less research has addressed samples of matrix-variate data. The population value decomposition (PVD), which originated in Crainiceanu et al. (2011), is a method for dimension reduction of a population of massive images. Images are decomposed into a product of two orthogonal matrices with population-specific features and one matrix with subject-specific features. The problems of finding the optimal row and column dimensions of reduction for the population of data matrices, and of inference in the PVD framework, have yet to be solved. To find the optimal row and column dimensions, we base our methods on the low-rank approximation methods and optimization procedures of Manton et al. (2003). In order to develop our inferential procedures, we assume our data to be matrix normally distributed. We introduce likelihood-ratio tests, score tests, and regression-based inferential procedures for the one-, two-, and k-sample problems, and derive the distributions of the resulting test statistics. Results of applying these inferential procedures to simulated facial imaging data are discussed.
Advisors/Committee Members: Wells, Martin Timothy (chair), Diciccio, Thomas J (committee member), Booth, James (committee member).
Subjects/Keywords: imaging data; dimension reduction; hypothesis testing
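The PVD construction can be sketched directly: subject-level singular vectors are pooled, a second SVD extracts shared population matrices P and D, and each image then reduces to a small subject-specific core. Dimensions and ranks below are arbitrary, and the images are noiseless low-rank matrices by construction, so the reconstruction is essentially exact; real imaging data would only be approximated.

```python
import numpy as np

rng = np.random.default_rng(5)

# A population of 20 low-rank "images" sharing row and column subspaces
P_true = np.linalg.qr(rng.normal(size=(40, 3)))[0]   # shared row features
D_true = np.linalg.qr(rng.normal(size=(30, 3)))[0]   # shared column features
images = [P_true @ np.diag(rng.uniform(1, 5, 3)) @ D_true.T for _ in range(20)]

# PVD-style estimation: pool subject-level singular vectors, then SVD the pools
left = np.hstack([np.linalg.svd(Y)[0][:, :3] for Y in images])
right = np.hstack([np.linalg.svd(Y)[2].T[:, :3] for Y in images])
P = np.linalg.svd(left)[0][:, :3]
D = np.linalg.svd(right)[0][:, :3]

# Each image reduces to a small subject-specific core V_i between shared P and D
Y = images[0]
V = P.T @ Y @ D
rel_err = np.linalg.norm(P @ V @ D.T - Y) / np.linalg.norm(Y)
```

Each 40 x 30 image is summarized by a 3 x 3 core once the population matrices are fixed, which is the dimension-reduction payoff; choosing those core dimensions optimally is the open problem the thesis addresses.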

Cornell University
16.
Li, James.
Tensor (Multidimensional Array) Decomposition, Regression And Software For Statistics And Machine Learning.
Degree: PhD, Statistics, 2014, Cornell University
URL: http://hdl.handle.net/1813/37082
This thesis illustrates connections between statistical models for tensors, introduces a novel linear model for tensors with three modes, and implements tensor software in the form of an R package. Tensors, or multidimensional arrays, are a natural generalization of the vectors and matrices that are ubiquitous in statistical modeling. However, while matrix algebra has been well studied and plays a crucial role in the interaction between data and the parameters of any given model, the algebra of higher-order arrays has been relatively overlooked in data analysis and statistical theory. The emergence of multilinear datasets, where observations are vector-variate, matrix-variate, or even tensor-variate, only serves to emphasize the relative lack of statistical understanding around tensor data structures. In the first half of the thesis, we highlight classic tensor algebraic results and models used in image analysis, chemometrics, and psychometrics, and connect them to recent statistical models. The second half of the thesis features a linear model based on a recently introduced tensor multiplication. For this model, we prove some of the classic properties that we would expect from a 3-tensor generalization of matrix ordinary least squares. We also apply our model to a functional dataset to demonstrate one possible usage. We conclude with an exposition of the software developed to facilitate tensor modeling and manipulation in R. This software implements many of the classic tensor decomposition models as well as our own linear model.
Advisors/Committee Members: Wells, Martin Timothy (chair), Booth, James (committee member), Bien, Jacob (committee member).
Subjects/Keywords: tensor; multilinear; multidimensional; linear regression; tensor least squares; inference; machine learning; prediction
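The flavor of multilinear modeling can be sketched with mode-n unfoldings and a truncated higher-order SVD. This is standard Tucker/HOSVD machinery, not the thesis's specific 3-mode regression model or tensor multiplication; the tensor sizes and ranks are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

def unfold(T, mode):
    """Mode-n unfolding: the given mode becomes the rows, the rest are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Build a 3-mode tensor with exact multilinear rank (2, 2, 2)
G = rng.normal(size=(2, 2, 2))                          # core tensor
A = [rng.normal(size=(d, 2)) for d in (10, 12, 14)]     # factor matrices
T = np.einsum('abc,ia,jb,kc->ijk', G, A[0], A[1], A[2])

# Truncated HOSVD: leading left singular vectors of each unfolding
U = [np.linalg.svd(unfold(T, m))[0][:, :2] for m in range(3)]
core = np.einsum('ijk,ia,jb,kc->abc', T, U[0], U[1], U[2])
T_hat = np.einsum('abc,ia,jb,kc->ijk', core, U[0], U[1], U[2])

rel_err = np.linalg.norm(T_hat - T) / np.linalg.norm(T)
```

Unfoldings are the bridge between tensor models and familiar matrix algebra: each mode's structure becomes ordinary column-space information, which is how classical decompositions generalize to higher-order arrays.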

Cornell University
17.
Cunningham, Caitlin.
Markov Methods For Identifying Chip-Seq Peaks.
Degree: PhD, Statistics, 2013, Cornell University
URL: http://hdl.handle.net/1813/33826
Used to analyze protein interactions with DNA, chromatin immunoprecipitation sequencing (ChIP-seq) uses high-throughput sequencing technologies to map millions of short DNA "reads" to a reference genome. Because the majority of reads map to a binding region for a specific protein of interest, a large read count at any given position indicates the presence of a binding region, so scientists seek "peaks," areas of high counts along the genome. This thesis presents several methods to identify binding regions using hidden Markov model techniques. Unlike existing methods, the final model, HiDe-Peak, accounts both for several major covariates, including mappability and GC content, and for the dependence between counts present in the dataset. On real data, HiDe-Peak performs in line with existing methods, and in simulations it outperforms its competitors.
Advisors/Committee Members: Booth, James (chair), Hooker, Giles J. (committee member), Wells, Martin Timothy (committee member).

Cornell University
18.
Zhao, Ping.
Essays On Customer Relationship Management In Social Networks.
Degree: PhD, Management, 2014, Cornell University
URL: http://hdl.handle.net/1813/36151
The past decade has witnessed explosive growth in social media. It has never been so easy for customers to connect and interact with each other within various forms of social networks. This trend provides companies with unprecedented opportunities to enhance firm-customer relationships by leveraging the power of social influence among customers. However, it also poses potential risks when negative word-of-mouth goes viral across customer networks. Traditional customer relationship management (CRM) research treats customers as independent, an assumption that might no longer apply to today's networked customers. This dissertation therefore investigates approaches to embedding social network analysis components in customer relationship management techniques, thus finding a way to harness the power of social influence to improve the efficiency of companies' CRM efforts. Essay 1 of this dissertation studies the "group-to-one" social influence of strong and weak ties within a social network, using a framework comprising a social interaction model, a social influence model, and a tie strength measure. It is found that the social influence mechanism through strong and weak ties is complex. Sharing and reciprocity play an important role in the way social influence affects customers' purchasing behaviors. It is also found that, as a whole, weak ties are more influential than strong ties. Essay 2 models customers' defection decisions within a social network. By jointly estimating a dyadic-level tie strength model and an individual-level defection decision model, it is found that customers who actively interact with others tend to have strong ties with them, and that customers with strong ties tend to have a stronger influence on other customers' defections than customers with weak ties. In this essay, a new approach to measuring customers' social network value is also proposed.
Essay 3 discusses promising research opportunities in integrating social network analysis (SNA) components into customer relationship management (CRM). It briefly reviews the four critical aspects of CRM: acquisition, retention, growth, and firm-customer relationship dynamics. Within each aspect, the discussion focuses on the possible impact of social network components on CRM models and on how to combine CRM and SNA in modeling efforts.
Advisors/Committee Members: Rao, Vithala R. (chair), Wells, Martin Timothy (committee member), Narayan, Vishal (committee member).
Subjects/Keywords: Customer Relationship Management; Social Network Analysis; Marketing Modeling

Cornell University
19.
Bucca Olea, Mauricio Esteban.
SOCIOECONOMIC INEQUALITY AMONG RACIAL GROUPS IN THE UNITED STATES.
Degree: PhD, Sociology, 2018, Cornell University
URL: http://hdl.handle.net/1813/59572
This dissertation investigates socioeconomic inequalities across racial groups in the United States and asks several intertwined questions: Does the process of intergenerational mobility differ for Blacks and Whites? Does skin color affect the educational and labor market outcomes of different racial groups? Is the increased educational resemblance of spouses related to the takeoff in income inequality among Black and White households? Three separate articles, each using unique data and methods, provide new approaches to these questions. The first article compares sibling correlations in income - a measure of social immobility - across Black and White populations and explains the higher mobility rates displayed by Blacks. Using Bayesian models for dispersion, I find that Blacks display lower sibling correlations than Whites due to the larger income heterogeneity among children of the same family. This pattern is partially explained by the poorer socioeconomic standing of Black parents, but part of the Black-White difference remains unexplained. The second article studies the effects of skin color on the educational attainment and earnings of individuals of different racial groups. Using regression and sibling fixed-effects models, I find that, after accounting for the higher socioeconomic status of lighter-skinned families, skin color has no effect on educational attainment, but it has a positive effect on the income of Black men and women. Additionally, this study finds that Blacks are the group displaying the largest variability in skin color, while Whites' skin tone is almost invariant. Finally, the third article uses micro-simulations to understand the impact of trends and patterns of educational assortative mating on the increase of income inequality for Black and White families. The results provide an exhaustive confirmation of the minor or null effect of educational assortative mating on income inequality, ruling out some possible explanations for this finding.
Results suggest that this null effect is not due to offsetting trends for Blacks and Whites; to countervailing effects of educational expansion and changing assortative behavior; to insufficiently strong changes in assortative mating and selection into marriage; or to the use of methods unable to detect complex patterns. Taken together, the three articles address key discussions in the social stratification literature, informed by principled and innovative empirical strategies.
Advisors/Committee Members: Weeden, Kim (chair), Wells, Martin Timothy (committee member), Bischoff, Kendra (committee member), Maralani, Vida (committee member).
Subjects/Keywords: Sociology; race; Inequality; Demography; Mobility

Cornell University
20.
Chen, Wei.
Methods For High Dimensional Matrix Computation And Diagnostics Of Distributed System.
Degree: PhD, Operations Research, 2014, Cornell University
URL: http://hdl.handle.net/1813/37139
Big data provides opportunities but also brings new challenges to modern scientific computing. In this thesis, we conduct sparse principal component analysis (SPCA) on high-dimensional matrices. We propose a modified curvilinear algorithm to solve eigenvalue optimization with orthogonality constraints, and combine it with an augmented Lagrangian method to improve its computational efficiency. We compare our algorithm against standard PCA on the recovery of low-rank tensors and on a mean-reverted statistical arbitrage strategy. The explosion of big data has also influenced the development of distributed computing systems. For debugging purposes, we are interested in predicting server run time from system data early in the process. We study discriminative models in functional data analysis and introduce generative models that capture server regime-change behaviors. We also design computational methods, including a blocked Gibbs sampler, to improve the accuracy and efficiency of model estimation.
Advisors/Committee Members: Wells, Martin Timothy (chair), Tang, Ao (committee member), Turnbull, Bruce William (committee member).
Subjects/Keywords: sparse principal component analysis; generative models; discriminative models

Cornell University
21.
Tecuapetla Gomez, Inder.
Asymptotic Inference For Locally Stationary Processes.
Degree: PhD, Statistics, 2013, Cornell University
URL: http://hdl.handle.net/1813/34256
The study of locally stationary processes comprises theory and methods for a class of processes that describe random phenomena whose fluctuations occur in both time and space. We consider three aspects of locally stationary processes that have not been explored in the already vast literature on these nonstationary processes. We begin by studying the asymptotic efficiency of simple hypothesis tests via large deviation principles. We establish analogues of classic results such as Stein's lemma, the Chernoff bound, and the more general Hoeffding bound. These results are based on a large deviation principle for the log-likelihood ratio test statistic between two locally stationary Gaussian processes, which is obtained and presented in the first chapter. In the second chapter we consider the Bayesian estimation of two parameters of a locally stationary process: the trend and the time-varying spectral density function. Under smoothness conditions on the latter function, we obtain the asymptotic normality and efficiency, with respect to a broad class of loss functions, of Bayesian estimators. In passing, we also show the asymptotic equivalence between Bayesian estimators and the maximum likelihood estimate. Our concluding fourth chapter explores the time-varying spectral density estimation problem from the point of view of Le Cam's theory of statistical experiments. We establish that the estimation of a time-varying spectral density function can be asymptotically construed as a white noise problem with drift. This result is based on Le Cam's connection theorem.
Advisors/Committee Members: Nussbaum, Michael (chair), Resnick, Sidney Ira (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: Locally stationary processes; Large deviations; Bayesian estimation; Le Cam theory of statistical experiments

Cornell University
22.
Clement, David.
Estimating Equation Methods For Longitudinal And Survival Data.
Degree: PhD, Statistics, 2011, Cornell University
URL: http://hdl.handle.net/1813/33515
This thesis analyzes censored data in recurrent event, longitudinal, and survival settings. In Chapter 2, a straightforward, flexible methodology is proposed to estimate parameters indexing the conditional means and variances of the interevent times in a recurrent event process. In Chapter 3, we analyze discretely and informatively observed multivariate continuous longitudinal data; missingness and terminal events are introduced in Chapter 4. In Chapters 3 and 4, the inter-event times are considered a nuisance and the goal is to estimate parameters driving the longitudinal process. To do this, we propose an innovative conditional estimating equation that can model individual trajectories. Finally, Chapter 5 uses these subject-specific trajectories to estimate parameters indexing the terminal event process and predict future survival for arbitrary subjects.
Advisors/Committee Members: Strawderman, Robert Lee (chair), Hooker, Giles J. (committee member), Wells, Martin Timothy (committee member).

Cornell University
23.
Federman, Jessica.
The Effect Of Distrust On Cognitive Flexibility And Knowledge Transfer.
Degree: PhD, Industrial and Labor Relations, 2014, Cornell University
URL: http://hdl.handle.net/1813/37072
In recent times, more attention in the management and educational psychology literatures has been devoted to understanding how and why creative potential is not being achieved, despite the pressing need for innovative thinking and an increased capacity to transfer knowledge more adaptively. Scholars have argued that creativity and the capacity to use knowledge adaptively are often minimal unless cognitive flexibility (Day & Goldstone, 2012), variability, and cognitive incongruity are introduced (Hatano & Inagaki, 1992). Relatedly, recent research has demonstrated that distrust serves as a processing influence that enables individuals to think more flexibly and creatively (Mayer & Mussweiler, 2011). The goal of the current research was to investigate whether distrust affects the capacity to process information in more flexible ways, leading to an increase in knowledge transfer. Across three experimental studies, I primed a psychological mindset by having participants complete a scrambled-sentence task made of words synonymous with distrust, synonymous with trust, or of neutral meaning. Study 1 measured the effect of distrust on the capacity to solve an immediate analogical transfer problem. Study 2 measured whether distrust aids in discrediting irrelevant information when solving an immediate analogical transfer problem. Study 3 measured the effect of distrust on the capacity to solve an analogical transfer problem after a delay of four days.
Advisors/Committee Members: Bell, Bradford (chair), Dyer, Lee D (chair), Wells, Martin Timothy (committee member), Hammer, Tove Helland (committee member).

Cornell University
24.
Lee, Kwan Seung.
NONCOMPETE AGREEMENTS: HISTORY, DIFFUSION, AND CONSEQUENCES.
Degree: PhD, Industrial and Labor Relations, 2019, Cornell University
URL: http://hdl.handle.net/1813/67298
Given the widespread use of noncompete agreements (NCAs) and their substantial impact on both employees and the overall economy, it is worth investigating more closely the various implications of NCA use. Three big questions about NCAs remain. First, even though existing studies have focused on state-law differences in NCA enforcement, ironically few have paid attention to how the U.S. state laws governing NCAs developed. Second, in contrast to the growing research on the consequences of NCAs, no studies have examined how NCAs have diffused or what caused organizations to use them. Last, studies investigating the consequences of NCAs have relied on the proxy measure of differing state NCA laws, and a dearth of research has looked into the organizations and employees that actually have NCAs (cf. Marx, 2011). In the three chapters of my dissertation, I attempt to answer each of the above questions in turn. I begin by briefly introducing the history of NCAs and the variation in NCA enforceability across states, with an emphasis on how existing research on NCAs has utilized such variation to investigate various phenomena related to organizational NCA use. I indicate limitations of prior studies and suggest ways to improve understanding of NCAs. Then, I investigate what factors facilitated or slowed the adoption of NCAs with CEOs by Standard & Poor's (S&P) 500 firms. I present predictors at different levels inside and outside the organization. Finally, I examine changes to CEO compensation and tenure due to having NCAs. The moderating effects of CEOs' employment pathways and of state NCA laws are also discussed in detail.
Advisors/Committee Members: Tolbert, Pamela S. (chair), Wells, Martin Timothy (committee member), Colvin, Alexander James (committee member), Rissing, Benjamin A (committee member).
Subjects/Keywords: CEO tenure; contested practices; noncompete agreement; Diffusion; Management; Labor relations; Organization theory; CEO compensation

Cornell University
25.
Tan, Hui Fen.
Interpretable Approaches to Opening Up Black-Box Models.
Degree: PhD, Statistics, 2019, Cornell University
URL: http://hdl.handle.net/1813/67545
In critical domains such as healthcare, finance, and criminal justice, merely knowing what was predicted, and not why, may be insufficient to deploy a machine learning model. This dissertation proposes new methods to open up black-box models, with the goal of helping both creators and users of machine learning models increase their trust in and understanding of the models. The first part of this dissertation proposes new post-hoc, global explanations for black-box models, developed using model-agnostic distillation techniques or by leveraging known structure specific to the black-box model. First, we propose a distillation approach to learning global additive explanations that describe the relationship between input features and model predictions, showing via a user study with expert users that distilled additive explanations have fidelity, accuracy, and interpretability advantages over non-additive explanations. Second, we work specifically on tree ensembles, leveraging tree structure to construct a similarity metric for gradient-boosted tree models. We use this similarity metric to select prototypical observations in each class, presenting an alternative to other tree ensemble interpretability methods such as seeking one tree that best represents the ensemble or computing feature importance. The second part of this dissertation studies the use of interpretability approaches to probe and debug black-box models in algorithmic fairness settings. Here, black-box takes on another meaning: many risk-scoring models for high-stakes decisions, such as credit scoring and judicial bail, are proprietary and opaque, not lending themselves to easy inspection or validation. We propose Distill-and-Compare, an approach to probing such risk-scoring models by leveraging additional information on the ground-truth outcomes that the risk-scoring model was intended to predict. We find that interpretability approaches can help uncover previously unknown sources of bias.
Finally, we provide a concrete case study that uses the interpretability methods proposed in this dissertation to debug black-box models, in this case a hybrid Human + Machine recidivism prediction model. Our methods revealed that human and COMPAS decision making anchored on the same features and hence did not differ significantly enough to harness the promise of hybrid Human + Machine decision making, concluding this dissertation on interpretability approaches for real-world settings.
Advisors/Committee Members: Hooker, Giles J. (chair), Wells, Martin Timothy (committee member), Joachims, Thorsten (committee member), Caruana, Rich A. (committee member).
Subjects/Keywords: Statistics; black-box models; explanations; tree ensembles; Computer science; Interpretability; machine learning; fairness

Cornell University
26.
Zhu, Fan.
Factor Models For Call Price Surface Without Static Arbitrage.
Degree: PhD, Operations Research, 2012, Cornell University
URL: http://hdl.handle.net/1813/29307
Although stochastic volatility models and local volatility models are very popular among market practitioners for exotic option pricing and hedging, they have several critical defects in both theory and practice. We develop a new methodology for equity exotic option pricing and hedging within the market-based approach framework. We build stochastic factor models for the whole surface of European call option prices directly from market data, and then use these models to price exotic options, which are not liquidly traded. The factor models are built on the Karhunen-Loeve decomposition, which can be viewed as an infinite-dimensional PCA. We develop the mathematical framework for centered and uncentered versions of the Karhunen-Loeve decomposition and study how to incorporate critical shape constraints. The shape constraints are important because the no-static-arbitrage conditions must be satisfied by our factor models. We discuss this methodology theoretically and investigate it by applying it to simulated data.
Advisors/Committee Members: Wells, Martin Timothy (chair), Jarrow, Robert A. (committee member), Jackson, Peter (committee member), Nussbaum, Michael (committee member).
APA (6th Edition):
Zhu, F. (2012). Factor Models For Call Price Surface Without Static Arbitrage. (Doctoral Dissertation). Cornell University. Retrieved from http://hdl.handle.net/1813/29307
Chicago Manual of Style (16th Edition):
Zhu, Fan. “Factor Models For Call Price Surface Without Static Arbitrage.” 2012. Doctoral Dissertation, Cornell University. Accessed January 26, 2021.
http://hdl.handle.net/1813/29307.
MLA Handbook (7th Edition):
Zhu, Fan. “Factor Models For Call Price Surface Without Static Arbitrage.” 2012. Web. 26 Jan 2021.
Vancouver:
Zhu F. Factor Models For Call Price Surface Without Static Arbitrage. [Internet] [Doctoral dissertation]. Cornell University; 2012. [cited 2021 Jan 26].
Available from: http://hdl.handle.net/1813/29307.
Council of Science Editors:
Zhu F. Factor Models For Call Price Surface Without Static Arbitrage. [Doctoral Dissertation]. Cornell University; 2012. Available from: http://hdl.handle.net/1813/29307

Cornell University
27.
Ho, Jing-Mao.
Social Statisticalization: Number, State, Science.
Degree: PhD, Sociology, 2019, Cornell University
URL: http://hdl.handle.net/1813/67698
► This dissertation explains why society becomes statisticalized—a form of rationalization that influences society through the production, consumption, and dissemination of statistical numbers. In this general…
(more)
▼ This dissertation explains why society becomes statisticalized—a form of rationalization that influences society through the production, consumption, and dissemination of statistical numbers. In this general social process, people increasingly depend on statistics to make decisions, justify practices, and update knowledge. As a result, social statistics are able to change human behavior, reconfigure social relations, shape political discourse, and constitute cultural norms. Ultimately, statistical rationality not only reproduces but also reinforces a variety of defining characteristics of modern society, such as efficiency, standardization, formalization, calculability, predictability, and the replacement of humans with technologies. In Chapter 1, I ask: Why do modern states routinely keep official statistics on their societies? This chapter presents arguments about how gathering official statistics, as a technology of state power, facilitates modern states' engagements in democratic state building, capitalist state building, colonial and post-colonial state building, and war making in world society. These arguments are illustrated by historical case studies and tested by cross-national longitudinal analyses of the worldwide establishment of National Statistical Systems (NSSes) in 157 countries from 1826 to 2010. The analyses demonstrate that, although there are regional and temporal variations, democratization, capitalist development, aggregate war onsets, colonization, and inter-governmental linkages generally prompt modern states to establish NSSes. To get a grip on their societies, modern states need to collect official statistics that keep those societies under their watchful eyes. In Chapter 2, I ask: How does the statistics profession become globally institutionalized?
This chapter analyzes the worldwide founding of national statistics associations from 1833 to 2011, arguing that the statistics profession emerged in the nineteenth-century West and spread to other parts of the world in the twentieth century. Based on an integrated institutionalist framework, I conduct event history analysis to demonstrate that the global professionalization of statistics is driven by both national characteristics and the world polity. On one hand, democracy and national material capacity generally encourage the establishment of the statistics profession, while the effect of colonialism runs in the opposite direction. On the other hand, intergovernmental organizations create a world polity in which the statistics profession can be diffused and constructed. Separate analyses for the pre-1945 and post-1945 eras indicate that, while there are regional and temporal variations, the world polity is a robust factor throughout the entire period of analysis. In Chapter 3, I ask: Why does the American state routinely collate statistical data on crime? Surprisingly, the relationship between the state and criminal statistics is undertheorized. This chapter develops a theoretical framework that triangulates…
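The event history analyses of founding dates described above are, in discrete time, logistic regressions on a country-year risk set: each country contributes one row per year until the event occurs. A minimal sketch of that design on synthetic data follows; the country count, covariate, and hazard parameters are all invented for illustration and bear no relation to the thesis's actual estimates.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic country-year risk set: each country is observed yearly until it
# "founds" a national statistics association (the event) or the window ends.
# The single covariate x stands in for something like a democracy score;
# all parameters here are invented for illustration.
n_countries, max_years = 300, 50
rows = []
for _ in range(n_countries):
    x = rng.standard_normal()
    hazard = 1 / (1 + np.exp(-(-3.0 + 1.0 * x)))   # true yearly founding hazard
    for _ in range(max_years):
        event = rng.random() < hazard
        rows.append((x, float(event)))
        if event:
            break                                   # country leaves the risk set
data = np.array(rows)
X = np.column_stack([np.ones(len(data)), data[:, 0]])
y = data[:, 1]

def logit_newton(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson: the discrete-time
    event history model fit on the stacked country-year rows."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - mu)
        hess = (X * (mu * (1 - mu))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

beta_hat = logit_newton(X, y)   # [intercept, effect of x]
```

Countries that never experience the event are right-censored at the end of the window, which this stacked-rows construction handles automatically.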
Advisors/Committee Members: Strang, David (chair), Wells, Martin Timothy (committee member), Berezin, Mabel M. (committee member), Ziewitz, Malte Carsten (committee member).
Subjects/Keywords: Statistics; Knowledge; institution; rationalization; state; Sociology; science
APA (6th Edition):
Ho, J. (2019). Social Statisticalization: Number, State, Science. (Doctoral Dissertation). Cornell University. Retrieved from http://hdl.handle.net/1813/67698
Chicago Manual of Style (16th Edition):
Ho, Jing-Mao. “Social Statisticalization: Number, State, Science.” 2019. Doctoral Dissertation, Cornell University. Accessed January 26, 2021.
http://hdl.handle.net/1813/67698.
MLA Handbook (7th Edition):
Ho, Jing-Mao. “Social Statisticalization: Number, State, Science.” 2019. Web. 26 Jan 2021.
Vancouver:
Ho J. Social Statisticalization: Number, State, Science. [Internet] [Doctoral dissertation]. Cornell University; 2019. [cited 2021 Jan 26].
Available from: http://hdl.handle.net/1813/67698.
Council of Science Editors:
Ho J. Social Statisticalization: Number, State, Science. [Doctoral Dissertation]. Cornell University; 2019. Available from: http://hdl.handle.net/1813/67698

Cornell University
28.
Steingrimsson, Jon.
Information Recovery With Missing Data When Outcomes Are Right Censored.
Degree: PhD, Statistics, 2015, Cornell University
URL: http://hdl.handle.net/1813/41100
► This dissertation focuses on using information more efficiently in several settings where some observations are right-censored, building on the semiparametric efficiency theory developed in Robins et…
(more)
▼ This dissertation focuses on using information more efficiently in several settings where some observations are right-censored, building on the semiparametric efficiency theory developed in Robins et al. (1994). Chapter 2 focuses on estimation of the regression parameter in the semiparametric accelerated failure time model when the data are collected under a case-cohort design. Previously proposed methods of estimation use some form of Horvitz–Thompson estimator, which is known to be inefficient, and the main aim of Chapter 2 is to improve the efficiency of estimating the regression parameter of the accelerated failure time model in case-cohort studies. We derive the semiparametric information bound and propose a more practical class of augmented estimators motivated by the augmentation theory developed in Robins et al. (1994). We develop large-sample properties, identify the most efficient estimator within the class of augmented estimators, and give practical guidance on how to calculate the estimator. Regression trees are non-parametric methods that use reduction in loss to partition the covariate space into binary partitions, creating a prediction model that is easily interpreted and visualized. When some observations are censored, the full-data loss function is not a function of the observed data, and Molinaro et al. (2004) used inverse probability weighted estimators to extend these loss functions to right-censored outcomes. Motivated by semiparametric efficiency theory, Chapter 3 extends the approach of Molinaro et al. (2004) by using doubly robust loss functions that make better use of information on censored observations, in addition to being more robust to the modeling choices that need to be made. Regression trees are known to suffer from instability, with minor changes in the data sometimes resulting in very different trees. Ensemble-based methods that average several trees have been shown to lead to prediction models that usually have smaller prediction error.
One such ensemble-based method is random forests (Breiman, 2001), and in Chapter 4 we use the regression tree methodology developed in Chapter 3 as the building block for random forests.
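The inverse-probability-weighted starting point from Molinaro et al. (2004) that the abstract mentions can be sketched concretely: estimate the censoring survival function G by Kaplan-Meier, then weight each uncensored squared error by 1/G(Y_i). The sketch below uses synthetic data (the distributions are illustrative) and shows only the IPCW loss; the doubly robust augmentation developed in the thesis is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy right-censored data: latent event times T and censoring times C;
# we observe Y = min(T, C) and delta = 1{T <= C}. Distributions are
# illustrative only.
n = 500
t_event = rng.exponential(2.0, n)
t_cens = rng.exponential(4.0, n)
y = np.minimum(t_event, t_cens)
delta = (t_event <= t_cens).astype(float)

def km_censoring_survival(y, delta):
    """Kaplan-Meier estimate of the censoring survival function G,
    evaluated at each subject's own observed time. A censoring 'event'
    is delta == 0; ties are handled only crudely in this sketch."""
    order = np.argsort(y)
    d_sorted = delta[order]
    at_risk = np.arange(len(y), 0, -1)
    g_sorted = np.cumprod(1 - (1 - d_sorted) / at_risk)
    g = np.empty(len(y))
    g[order] = g_sorted
    return g

G = km_censoring_survival(y, delta)

def ipcw_loss(pred, y, delta, G, eps=1e-8):
    """IPCW squared-error loss: only uncensored subjects contribute,
    reweighted by 1/G(Y_i) to undo the selection censoring induces."""
    w = delta / np.maximum(G, eps)
    return np.sum(w * (y - pred) ** 2) / np.sum(w)

# The IPCW-weighted mean is the constant prediction minimizing this loss --
# exactly what a censored regression tree would fit in each leaf.
w = delta / np.maximum(G, 1e-8)
leaf_pred = np.sum(w * y) / np.sum(w)
```

A tree-growing routine would evaluate this loss on candidate splits of the covariate space; the doubly robust version adds an augmentation term that also draws information from the censored observations.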
Advisors/Committee Members: Hooker, Giles J. (chair), Strawderman, Robert Lee (co-chair), Wells, Martin Timothy (committee member), Ruppert, David (committee member).
Subjects/Keywords: Missing Data; Semiparametric Theory; Censored Data
APA (6th Edition):
Steingrimsson, J. (2015). Information Recovery With Missing Data When Outcomes Are Right Censored. (Doctoral Dissertation). Cornell University. Retrieved from http://hdl.handle.net/1813/41100
Chicago Manual of Style (16th Edition):
Steingrimsson, Jon. “Information Recovery With Missing Data When Outcomes Are Right Censored.” 2015. Doctoral Dissertation, Cornell University. Accessed January 26, 2021.
http://hdl.handle.net/1813/41100.
MLA Handbook (7th Edition):
Steingrimsson, Jon. “Information Recovery With Missing Data When Outcomes Are Right Censored.” 2015. Web. 26 Jan 2021.
Vancouver:
Steingrimsson J. Information Recovery With Missing Data When Outcomes Are Right Censored. [Internet] [Doctoral dissertation]. Cornell University; 2015. [cited 2021 Jan 26].
Available from: http://hdl.handle.net/1813/41100.
Council of Science Editors:
Steingrimsson J. Information Recovery With Missing Data When Outcomes Are Right Censored. [Doctoral Dissertation]. Cornell University; 2015. Available from: http://hdl.handle.net/1813/41100

Cornell University
29.
Martin, Sean.
The Role Of Moral Identity In Newcomers' Socialization.
Degree: PhD, Management, 2013, Cornell University
URL: http://hdl.handle.net/1813/34082
► This dissertation addresses the role of moral identity - or the self-importance that people place on morality as a central part of their identity -…
(more)
▼ This dissertation addresses the role of moral identity - or the self-importance that people place on morality as a central part of their identity - in the experiences of organizational newcomers. I address the role of moral identity from two sides. First, I explore how organizational forces in the form of ethical socialization interact with individual traits to determine whether moral identity increases or decreases. Second, I explore the role of individuals' moral identity in helping them transition during the socialization process from being organizational outsiders to insiders. In the process, I address how moral identity influences the way that individuals come to think about their work. Finally, I conclude by discussing future studies that can and will follow in this vein. Specifically, I look at an important tool that can be used to influence individuals' values - the organizational story. I posit several research questions concerning how organizational stories may be used to achieve different outcomes. I explore the empirical questions in the field using a longitudinal survey design. The sample site is a large IT firm that prioritizes ethics and organizational values.
Advisors/Committee Members: Detert, James Roland (chair), Dragoni, Lisa (committee member), Wells, Martin Timothy (committee member), Gino, Francesca (committee member).
Subjects/Keywords: Moral Identity; Business Ethics; Socialization
APA (6th Edition):
Martin, S. (2013). The Role Of Moral Identity In Newcomers' Socialization. (Doctoral Dissertation). Cornell University. Retrieved from http://hdl.handle.net/1813/34082
Chicago Manual of Style (16th Edition):
Martin, Sean. “The Role Of Moral Identity In Newcomers' Socialization.” 2013. Doctoral Dissertation, Cornell University. Accessed January 26, 2021.
http://hdl.handle.net/1813/34082.
MLA Handbook (7th Edition):
Martin, Sean. “The Role Of Moral Identity In Newcomers' Socialization.” 2013. Web. 26 Jan 2021.
Vancouver:
Martin S. The Role Of Moral Identity In Newcomers' Socialization. [Internet] [Doctoral dissertation]. Cornell University; 2013. [cited 2021 Jan 26].
Available from: http://hdl.handle.net/1813/34082.
Council of Science Editors:
Martin S. The Role Of Moral Identity In Newcomers' Socialization. [Doctoral Dissertation]. Cornell University; 2013. Available from: http://hdl.handle.net/1813/34082

Cornell University
30.
Logsdon, Benjamin.
Sparse Model Building From Genome-Wide Variation With Graphical Models.
Degree: PhD, Computational Biology, 2011, Cornell University
URL: http://hdl.handle.net/1813/29227
► High-throughput sequencing and expression characterization have led to an explosion of phenotypic and genotypic molecular data underlying both experimental studies and outbred populations. We…
(more)
▼ High-throughput sequencing and expression characterization have led to an explosion of phenotypic and genotypic molecular data underlying both experimental studies and outbred populations. We develop a novel class of algorithms to reconstruct sparse models among these molecular phenotypes (e.g., expression products) and genotypes (e.g., single nucleotide polymorphisms), via both a Bayesian hierarchical model, for the setting where the sample size is much smaller than the model dimension (i.e., p ≫ n), and the well-characterized adaptive lasso algorithm. Specifically, we propose novel approaches to the problems of increasing power to detect additional loci in genome-wide association studies using our variational algorithm, efficiently learning directed cyclic graphs from expression and genotype data using the adaptive lasso, and constructing genome-wide undirected graphs among genotype, expression, and downstream phenotype data using an extension of the variational feature selection algorithm. The Bayesian hierarchical model is derived for a parametric multiple regression model with a mixture prior of a point mass and a normal distribution for each regression coefficient, and appropriate priors for the set of hyperparameters. When combined with a probabilistic consistency bound on the model dimension, this approach leads to very sparse solutions without the need for cross-validation. We use a variational Bayes approximate inference approach in our algorithm, in which we impose a complete factorization across all parameters for the approximate posterior distribution and then minimize the Kullback–Leibler divergence between the approximate and true posterior distributions. Since the prior distribution is non-convex, we restart the algorithm many times to find multiple posterior modes, and combine information across all discovered modes in an approximate Bayesian model averaging framework to reduce the variance of the posterior probability estimates.
We perform analyses of three major publicly available datasets: the HapMap 2 genotype and expression data collected on immortalized lymphoblastoid cell lines, the genome-wide gene expression and genetic marker data collected for a yeast intercross, and genome-wide gene expression, genetic marker, and downstream phenotype data related to weight in a mouse F2 intercross. Based on both simulations and data analysis, we show that our algorithms can outperform other state-of-the-art model selection procedures when including thousands to hundreds of thousands of genotypes and expression traits, in terms of aggressively controlling the false discovery rate and generating rich simultaneous statistical models.
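Of the two estimation routes named in this abstract, the adaptive lasso is the easier to sketch: penalize each coefficient by a weight inversely proportional to a pilot estimate, which can be implemented by rescaling columns around a plain lasso solver. The sketch below uses a toy genotype-to-phenotype regression with invented dimensions, signal sizes, and penalty level; the variational spike-and-slab machinery the thesis develops is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sparse regression toy problem standing in for genotype -> phenotype
# mapping: n samples, p predictors, only the first three truly nonzero.
# (Dimensions, signal sizes, and the penalty level are illustrative.)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Plain lasso, (1/2)||y - Xb||^2 + lam * ||b||_1, solved by
    coordinate descent with soft-thresholding updates."""
    beta = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)
    r = y.copy()                      # residual for beta = 0
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r += X[:, j] * beta[j]    # remove coordinate j's contribution
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta

# Adaptive lasso (Zou, 2006): penalize coordinate j by lam / |beta_init_j|.
# Equivalently, rescale columns by |beta_init_j|, run the plain lasso,
# and scale the solution back.
beta_init = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS pilot (n > p here)
w = np.abs(beta_init)
beta_ad = lasso_cd(X * w, y, lam=30.0) * w

support = np.flatnonzero(np.abs(beta_ad) > 1e-6)
```

In the p ≫ n regime the abstract targets, the OLS pilot is unavailable and would be replaced by something like a ridge or marginal estimate; the rescaling trick itself is unchanged.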
Advisors/Committee Members: Mezey, Jason G. (chair), Clark, Andrew (committee member), Bustamante, Carlos D. (committee member), Wells, Martin Timothy (committee member).
Subjects/Keywords: Variational Bayes; Gene expression network reconstruction; Graphical models
APA (6th Edition):
Logsdon, B. (2011). Sparse Model Building From Genome-Wide Variation With Graphical Models. (Doctoral Dissertation). Cornell University. Retrieved from http://hdl.handle.net/1813/29227
Chicago Manual of Style (16th Edition):
Logsdon, Benjamin. “Sparse Model Building From Genome-Wide Variation With Graphical Models.” 2011. Doctoral Dissertation, Cornell University. Accessed January 26, 2021.
http://hdl.handle.net/1813/29227.
MLA Handbook (7th Edition):
Logsdon, Benjamin. “Sparse Model Building From Genome-Wide Variation With Graphical Models.” 2011. Web. 26 Jan 2021.
Vancouver:
Logsdon B. Sparse Model Building From Genome-Wide Variation With Graphical Models. [Internet] [Doctoral dissertation]. Cornell University; 2011. [cited 2021 Jan 26].
Available from: http://hdl.handle.net/1813/29227.
Council of Science Editors:
Logsdon B. Sparse Model Building From Genome-Wide Variation With Graphical Models. [Doctoral Dissertation]. Cornell University; 2011. Available from: http://hdl.handle.net/1813/29227