Full Record


Title Partial EM Procedure for Big-Data Linear Mixed Effects Model, and Generalized PPE for High-Dimensional Data in Julia
Publication Date
Degree PhD
Discipline/Department Epidemiology and Biostatistics
Degree Level doctoral
University/Publisher Case Western Reserve University School of Graduate Studies
Abstract Methodologically, this dissertation contributes to two areas of statistics: linear mixed effects models for big data and tests of equal covariance for high-dimensional data. Scientifically, it helps to comprehensively evaluate the effect of Specialty Care Access Network-Extension for Community Healthcare Outcomes (SCAN-ECHO) training on primary care providers at outpatient clinics in treating diabetes in the VA patient population. In the first part of this dissertation, we introduce three challenges in examining the effect of SCAN-ECHO training on VA diabetic patients and offer a solution to each. The first challenge was data curation for longitudinal variables. As a solution, we developed an R function called "fusion", customized to our data structure, for effective data curation. The second challenge was measurement variability and heterogeneity of the population. Different types of summary measures were used to reduce the variability of the outcome, and longitudinal cluster analysis was conducted to identify similar subgroups within the heterogeneous population. The third challenge was fitting a linear mixed effects model to big data that could not be imported into R because the data exceeded memory capacity. As a solution, we proposed a new approach to the Big-data Linear Mixed Effects Model (bLMM) using a Partial EM (PEM) algorithm and data partitioning. Our PEM procedure was developed to analyze the effect of SCAN-ECHO training on diabetes treatment, but the analytic approach is of interest in its own right (statistical contribution 1) because PEM is a general procedure for fitting LMMs to big data. We evaluated the performance of bLMM PEM by comparing it to three other methods for fitting LMMs: the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm using the entire data, full EM using the entire data, and meta-analysis using data partitions. Finally, for implementation, we applied our PEM procedure to evaluate the effect of SCAN-ECHO training on diabetes treatment.
In the second part of this dissertation, we introduce an improvement to the optimization algorithm for Projection Pursuit Ellipse (PPE) to test for equal variance in high-dimensional data (statistical contribution 2). Many standard multivariate techniques were developed under the assumption that the covariance matrices of different groups are equal. A well-known test for the equality of covariance is Bartlett’s test. However, Bartlett’s test is a function only of the volumes of the covariance matrices and so does not account for their shapes and orientations. In this work we developed a Projection Pursuit Ellipses procedure for high-dimensional data (hPPE) and compared its performance to Bartlett’s test and a modern benchmark for high-dimensional p data.
Subjects/Keywords Statistics; Biostatistics; Biomedical Research; Health; Mining; Partial EM; Big Data; Mixed Effects; SCAN; ECHO; Projection Pursuit; PPE; Covariance; High-dimensional Data; Clustering
Contributors Sun, Jiayang (Advisor); Albert, Jeffrey (Committee Chair)
Language en
Rights unrestricted ; This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
Country of Publication us
Format application/pdf
Record ID oai:etd.ohiolink.edu:case152845439167999
Repository ohiolink
Date Indexed 2021-01-29
Grantor Case Western Reserve University School of Graduate Studies
