Full Record

Author | Jiang, Lai |

Title | Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure |

URL | http://hdl.handle.net/10388/ETD-2015-09-2232 |

Publication Date | 2015 |

Date Available | 2015-01-01 00:00:00 |

University/Publisher | University of Saskatchewan |

Abstract | Feature selection is required in many modern scientific research problems that use high-dimensional data. A typical example is finding the genes most related to a certain disease (e.g., cancer) from high-dimensional gene expression profiles. Eliminating a large number of useless or redundant features poses tremendous difficulties. The expression levels of genes have structure; for example, a group of co-regulated genes with similar biological functions tend to have similar mRNA expression levels. Many statistical methods have been proposed to take the grouping structure into consideration in feature selection and regression, including Group LASSO, Supervised Group LASSO, and regression on group representatives. In this thesis, we propose to use a sophisticated Markov chain Monte Carlo method (Hamiltonian Monte Carlo with restricted Gibbs sampling) to fit T-probit regression with heavy-tailed priors, in order to select among features with grouping structure. We refer to this method as fully Bayesian T-probit. Its main feature is that it selects features within groups automatically, without pre-specification of the grouping structure, and discards noise features more efficiently than LASSO (Least Absolute Shrinkage and Selection Operator). Therefore, the feature subsets selected by fully Bayesian T-probit are significantly sparser than the subsets selected by many other methods in the literature. Such succinct feature subsets are much easier to interpret or understand in light of existing biological knowledge and further experimental investigation. In this thesis, we use simulated and real datasets to demonstrate that the predictive performance of the sparser feature subsets selected by fully Bayesian T-probit is comparable with that of the much larger feature subsets selected by plain LASSO, Group LASSO, Supervised Group LASSO, random forest, penalized logistic regression, and the t-test. 
In addition, we demonstrate that the succinct feature subsets selected by fully Bayesian T-probit have significantly better predictive power than feature subsets of the same size taken from the top features selected by the aforementioned methods. |
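The abstract names the ingredients of the method: a probit link, heavy-tailed Student-t priors on the coefficients, and MCMC. The thesis's actual sampler (Hamiltonian Monte Carlo with restricted Gibbs) is more elaborate; as a rough, hypothetical sketch of those ingredients only, a plain Gibbs sampler for probit regression with t priors written as a normal scale mixture (Albert–Chib latent-variable augmentation) might look like the following. All names and hyperparameters here are illustrative, not taken from the thesis.

```python
import numpy as np
from scipy.stats import truncnorm

def tprobit_gibbs(X, y, nu=1.0, s2=1.0, n_iter=500, seed=0):
    """Gibbs sampler for probit regression with Student-t priors on the
    coefficients, expressed as a scale mixture of normals:
        beta_j | lam_j ~ N(0, lam_j),   lam_j ~ Inv-Gamma(nu/2, nu*s2/2),
    so that marginally beta_j has a t_nu prior with scale sqrt(s2).
    The probit likelihood is handled by Albert-Chib latent variables."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    lam = np.ones(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # 1. Latent z_i ~ N(x_i'beta, 1), truncated to agree with y_i.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)   # z_i > 0 when y_i = 1
        hi = np.where(y == 1, np.inf, -mu)    # z_i < 0 when y_i = 0
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # 2. beta | z, lam ~ N(A^{-1} X'z, A^{-1}), A = X'X + diag(1/lam).
        A = X.T @ X + np.diag(1.0 / lam)
        L = np.linalg.cholesky(A)
        m = np.linalg.solve(A, X.T @ z)
        beta = m + np.linalg.solve(L.T, rng.standard_normal(p))
        # 3. lam_j | beta_j ~ Inv-Gamma((nu+1)/2, (nu*s2 + beta_j^2)/2).
        shape = 0.5 * (nu + 1.0)
        rate = 0.5 * (nu * s2 + beta ** 2)
        lam = 1.0 / rng.gamma(shape, 1.0 / rate)
        draws[t] = beta
    return draws

# Illustrative usage on simulated data: one strong feature, two noise features.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = (X @ np.array([2.0, 0.0, 0.0]) + rng.standard_normal(200) > 0).astype(int)
draws = tprobit_gibbs(X, y, n_iter=400)
post_mean = draws[200:].mean(axis=0)   # discard first half as burn-in
```

The heavy tail of the t prior (small `nu`) is what lets a strong coefficient escape shrinkage while noise coefficients are pulled toward zero, which is the sparsity mechanism the abstract appeals to.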

Subjects/Keywords | Bayesian methods; probit; MCMC; gene expression data; grouping structure |

Contributors | Li, Longhai; Bickis, Mik; Liu, Juxin; Kusalik, Anthony; Stephens, David |

Language | en |

Country of Publication | ca |

Record ID | handle:10388/ETD-2015-09-2232 |

Other Identifiers | TC-SSU-2015092232 |

Repository | sask |

Date Retrieved | 2018-12-03 |

Date Indexed | 2018-12-06 |

Cited Works

- Abramowitz, M. and Stegun, I. A. (1972), Handbook of Mathematical Functions, Dover Publications.
- Alder, B. J. and Wainwright, T. (1959), “Studies in molecular dynamics. I. General method,” The Journal of Chemical Physics, 31, 459–466.
- Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and Levine, A. J. (1999), “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proceedings of the National Academy of Sciences, 96, 6745–6750.
- Andersen, H. C. (1980), “Molecular dynamics simulations at constant pressure and/or temperature,” The Journal of Chemical Physics, 72, 2384–2393.
- Anderson, N. L. and Anderson, N. G. (1998), “Proteome and proteomics: new technologies, new concepts, and new words,” Electrophoresis, 19, 1853–1861.
- Andersson, A., Ritz, C., Lindgren, D., Edén, P., Lassen, C., Heldrup, J., Olofsson, T., Råde, J., Fontes, M., Porwit-Macdonald, A., et al. (2007), “Microarray-based classification of a consecutive series of 121 childhood acute leukemias: prediction of leukemic and genetic subtype as well as of minimal residual disease status,” Leukemia, 21, 1198–1203.
- Andrews, D. and Mallows, C. (1974), “Scale mixtures of normal distributions,” Journal of the Royal Statistical Society. Series B (Methodological), 36, 99–102.
- Armagan, A., Dunson, D. B., and Lee, J. (2013), “Generalized double Pareto shrinkage,” Statistica Sinica, 23, 119.
- Barbieri, M. M. and Berger, J. O. (2004), “Optimal predictive model selection,” The Annals of Statistics, 32, 870–897.
- Ben-Hur, A., Horn, D., Siegelmann, H., and Vapnik, V. (2002), “Support vector clustering,” The Journal of Machine Learning Research, 2, 125–137.
- Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2012), “Bayesian shrinkage,” arXiv preprint arXiv:1212.6088.
- Bishop, J. F. (1999), “Adult acute myeloid leukaemia: update on treatment.” The Medical Journal of Australia, 170, 39–43.
- Boser, B., Guyon, I., and Vapnik, V. (1992), “A training algorithm for optimal margin classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, pp. 144–152.
- Bottolo, L. and Richardson, S. (2010), “Evolutionary stochastic search for Bayesian model exploration,” Bayesian Analysis, 5, 583–618.
- Breiman, L. (2001), “Random forests,” Machine Learning, 45, 5–32.
- Brown, P. J., Vannucci, M., and Fearn, T. (1998), “Multivariate Bayesian variable selection and prediction,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60, 627–641.
- Bureau, A., Dupuis, J., Falls, K., Lunetta, K. L., Hayward, B., Keith, T. P., and Van Eerdewegh, P. (2005), “Identifying SNPs predictive of phenotype using random forests,” Genetic Epidemiology, 28, 171–182.
- Caron, F. and Doucet, A. (2008), “Sparse Bayesian nonparametric regression,” in Proceedings of the 25th International Conference on Machine Learning, ACM, pp. 88–95.
- Carvalho, C. M., Polson, N. G., and Scott, J. G. (2009), “Handling sparsity via the horseshoe,” Journal of Machine Learning Research, 5, 73–80.
- — (2010), “The horseshoe estimator for sparse signals,” Biometrika, 97, 465–480.
- Chang, T.-W. (1983), “Binding of cells to matrixes of distinct antibodies coated on solid surface,” Journal of Immunological Methods, 65, 217–223.
- Cho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T. G., Gabrielian, A. E., Landsman, D., Lockhart, D. J., et al. (1998), “A genome-wide transcriptional analysis of the mitotic cell cycle,” Molecular Cell, 2, 65–73.
- Clarke, R., Ressom, H. W., Wang, A., Xuan, J., Liu, M. C., Gehan, E. A., and Wang, Y. (2008), “The properties of high-dimensional data spaces: implications for exploring gene and protein expression data,” Nature Reviews Cancer, 8, 37–49.
- Cortes, C. and Vapnik, V. (1995), “Support-vector networks,” Machine Learning, 20, 273– 297.
- Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), 1–38.
- Denison, D. G. T., Holmes, C. C., Mallick, B. K., and Smith, A. F. M. (2002), Bayesian Methods for Nonlinear Classification and Regression, Wiley Series in Probability and Statistics, John Wiley & Sons.
- Díaz-Uriarte, R. and De Andres, S. A. (2006), “Gene selection and classification of microarray data using random forest,” BMC Bioinformatics, 7, 3.
- Donoho, D., Johnstone, I., Kerkyacharian, G., and Picard, D. (1995), “Wavelet shrinkage: asymptopia?” Journal of the Royal Statistical Society. Series B (Methodological), 57, 301– 369.
- Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987), “Hybrid Monte Carlo,” Physics Letters B, 195, 216–222.
- Dudoit, S., Fridlyand, J., and Speed, T. (2002), “Comparison of discrimination methods for the classification of tumors using gene expression data,” Journal of the American Statistical Association, 97, 77–87.
- Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al. (2004), “Least angle regression,” The Annals of Statistics, 32, 407–499.
- Fan, J. and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360.
- Firth, D. (1993), “Bias reduction of maximum likelihood estimates,” Biometrika, 80, 27–38.
- Forgy, E. W. (1965), “Cluster analysis of multivariate data: efficiency versus interpretability of classifications,” Biometrics, 21, 768–769.
- Gelman, A. (2006), “Prior distributions for variance parameters in hierarchical models,” Bayesian Analysis, 1, 515–533.
- Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y. (2008), “A weakly informative default prior distribution for logistic and other regression models,” The Annals of Applied Statistics, 2, 1360–1383.
- Gelman, A. and Rubin, D. B. (1992), “Inference from iterative simulation using multiple sequences,” Statistical Science, 457–472.
- Geman, S. and Geman, D. (1984), “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, 721–741.
- Genkin, A., Lewis, D. D., and Madigan, D. (2007), “Large-scale Bayesian logistic regression for text categorization,” Technometrics, 49, 291–304.
- Gentleman, R., Carey, V., Huber, W., Irizarry, R., and Dudoit, S. (2006), Bioinformatics and computational biology solutions using R and Bioconductor, Springer Science & Business Media.
- George, E. I. and McCulloch, R. E. (1993), “Variable selection via Gibbs sampling,” Journal of the American Statistical Association, 88, 881–889.
- Geweke, J. et al. (1991), Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, vol. 196, Federal Reserve Bank of Minneapolis, Research Department.
- Gilks, W. R. (2005), Markov Chain Monte Carlo, Wiley Online Library.
- Grade, M., Hörmann, P., Becker, S., Hummon, A. B., Wangsa, D., Varma, S., Simon, R., Liersch, T., Becker, H., Difilippantonio, M. J., et al. (2007), “Gene expression profiling reveals a massive, aneuploidy-dependent transcriptional deregulation and distinct differences between lymph node–negative and lymph node–positive colon carcinomas,” Cancer Research, 67, 41–56.
- Griffin, J. E. and Brown, P. J. (2010), “Inference with Normal-Gamma prior distributions in regression problems,” Bayesian Analysis, 5, 171–188.
- — (2011), “Bayesian Hyper-Lassos with Non-Convex Penalization,” Australian & New Zealand Journal of Statistics, 53, 423–442.
- Guan, Y. and Stephens, M. (2011), “Bayesian variable selection regression for genome-wide association studies and other large-scale problems,” The Annals of Applied Statistics, 5, 1780–1815.
- Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., and Aebersold, R. (1999), “Quantitative analysis of complex protein mixtures using isotope-coded affinity tags,” Nature Biotechnology, 17, 994–999.
- Hans, C., Dobra, A., and West, M. (2007), “Shotgun stochastic search for large p regression,” Journal of the American Statistical Association, 102, 507–516.
- Hartigan, J. A. and Wong, M. A. (1979), “Algorithm AS 136: A k-means clustering algorithm,” Applied Statistics, 100–108.
- Ho, T. K. (1995), “Random decision forests,” in Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on, IEEE, vol. 1, pp. 278–282.
- Hoerl, A. and Kennard, R. (1970), “Ridge regression: applications to nonorthogonal problems,” Technometrics, 12, 69–82.
- Hoeting, J., Madigan, D., Raftery, A., and Volinsky, C. (1999), “Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors),” Statistical Science, 14, 382–417.
- Holland, P. W. and Welsch, R. E. (1977), “Robust regression using iteratively reweighted least-squares,” Communications in Statistics-Theory and Methods, 6, 813–827.
- Hoffman, M. D. and Gelman, A. (2014), “The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo,” The Journal of Machine Learning Research, 15, 1593–1623.
- Ishwaran, H. and Rao, J. S. (2005a), “Spike and slab gene selection for multigroup microarray data,” Journal of the American Statistical Association, 100, 764–780.
- — (2005b), “Spike and slab variable selection: frequentist and Bayesian strategies,” The Annals of Statistics, 33, 730–773.
- Jerónimo, C., Henrique, R., Hoque, M. O., Mambo, E., Ribeiro, F. R., Varzim, G., Oliveira, J., Teixeira, M. R., Lopes, C., and Sidransky, D. (2004), “A quantitative promoter methylation profile of prostate cancer,” Clinical Cancer Research, 10, 8472–8478.
- Johnson, S. C. (1967), “Hierarchical clustering schemes,” Psychometrika, 32, 241–254.
- Kim, Y., Kim, J., and Kim, Y. (2006), “Blockwise sparse regression,” Statistica Sinica, 16, 375.
- Krishnapuram, B., Carin, L., Figueiredo, M. A., and Hartemink, A. J. (2005), “Sparse multinomial logistic regression: Fast algorithms and generalization bounds,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27, 957–968.
- Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010), “Penalized Regression, Standard Errors, and Bayesian Lassos,” Bayesian Analysis, 5, 369–412.
- Lamnisos, D., Griffin, J. E., and Steel, M. F. (2012), “Cross-validation prior choice in Bayesian probit regression with many covariates,” Statistics and Computing, 22, 359–373.
- Lange, K. L., Little, R. J., and Taylor, J. M. (1989), “Robust statistical modeling using the t distribution,” Journal of the American Statistical Association, 84, 881–896.
- Lee, K., Sha, N., Dougherty, E., Vannucci, M., and Mallick, B. (2003), “Gene selection: a Bayesian variable selection approach,” Bioinformatics, 19, 90.
- Li, F. and Zhang, N. R. (2010), “Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics,” Journal of the American Statistical Association, 105, 1202–1214.
- Li, L. and Yao, W. (2014), “Fully Bayesian Logistic Regression with Hyper-Lasso Priors for High-dimensional Feature Selection,” arXiv, 1405.3319 [stat].
- Lindley, D. V. (1957), “A statistical paradox,” Biometrika, 44, 187–192.
- Liu, C. (2004), “Robit regression: A simple robust alternative to logistic and probit regression,” in Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, John Wiley & Sons, Ltd, pp. 227–238.
- Lunetta, K. L., Hayward, L. B., Segal, J., and Van Eerdewegh, P. (2004), “Screening largescale association study data: exploiting interactions using random forests,” BMC Genetics, 5, 32.
- Lunn, D., Spiegelhalter, D., Thomas, A., and Best, N. (2009), “The BUGS project: Evolution, critique and future directions,” Statistics in Medicine, 28, 3049–3067.
- Ma, S., Song, X., and Huang, J. (2007), “Supervised group Lasso with applications to microarray data analysis,” BMC Bioinformatics, 8, 60.
- MacQueen, J. et al. (1967), “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, California, USA, vol. 1, pp. 281–297.
- Marcotte, E., Pellegrini, M., Ng, H., Rice, D., Yeates, T., and Eisenberg, D. (1999), “Detecting protein function and protein-protein interactions from genome sequences,” Science, 285, 751.
- Meier, L. (2009), grplasso: Fitting user specified models with Group Lasso penalty, R package version 0.4-2.
- Meier, L., Van De Geer, S., and Bühlmann, P. (2008), “The group LASSO for logistic regression,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 53–71.
- Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), “Equation of State Calculations by Fast Computing Machines,” The Journal of Chemical Physics, 21, 1087–1092.
- Michalak, P. (2008), “Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes,” Genomics, 91, 243–248.
- Mitchell, T. J. and Beauchamp, J. J. (1988), “Bayesian variable selection in linear regression,” Journal of the American Statistical Association, 1023–1032.
- Murtagh, F. (1984), “Complexities of hierarchic clustering algorithms: State of the art,” Computational Statistics Quarterly, 1, 101–113.
- Neal, R. (2011), “MCMC Using Hamiltonian Dynamics,” in Handbook of Markov Chain Monte Carlo, CRC Press, pp. 113–162.
- Neal, R. M. (1995), “Bayesian learning for neural networks,” Ph.D. thesis, University of Toronto.
- Norris, J. R. (1998), Markov chains, Cambridge University Press.
- World Health Organization (2014), World Cancer Report 2014.
- Paik, S., Shak, S., Tang, G., Kim, C., Baker, J., Cronin, M., Baehner, F. L., Walker, M. G., Watson, D., Park, T., et al. (2004), “A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer,” New England Journal of Medicine, 351, 2817–2826.
- Park, M. Y., Hastie, T., and Tibshirani, R. (2007), “Averaged gene expressions for regression,” Biostatistics, 8, 212–227.
- Pepe, M. S. (2000), “Receiver operating characteristic methodology,” Journal of the American Statistical Association, 95, 308–311.
- Plummer, M. et al. (2003), “JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling,” in Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, vol. 124, p. 125.
- Polson, N. G. and Scott, J. G. (2010), “Shrink globally, act locally: Sparse Bayesian regularization and prediction,” Bayesian Statistics, 9, 501–538.
- — (2012a), “Good, great, or lucky? Screening for firms with sustained superior performance using heavy-tailed priors,” The Annals of Applied Statistics, 6, 161–185.
- — (2012b), “Local shrinkage rules, Lévy processes and regularized regression,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74, 287–311.
- — (2012c), “On the half-Cauchy prior for a global scale parameter,” Bayesian Analysis, 7, 887–902.
- Raftery, A. E. and Lewis, S. M. (1992), “One long run with diagnostics: implementation strategies for Markov chain Monte Carlo” (comment on “Practical Markov Chain Monte Carlo”), Statistical Science, 493–497.
- Robert, C. P. (2013), “On the Jeffreys-Lindley’s paradox,” ArXiv, 1303.5973v, 1–13.
- Ročková, V. and George, E. I. (2014), “EMVS: The EM approach to Bayesian variable selection,” Journal of the American Statistical Association, 109, 828–846.
- Rokach, L. and Maimon, O. (2005), “Clustering methods,” in Data Mining and Knowledge Discovery Handbook, Springer, pp. 321–352.
- Rossky, P., Doll, J., and Friedman, H. (1978), “Brownian dynamics as smart Monte Carlo simulation,” The Journal of Chemical Physics, 69, 4628–4633.
- Sha, N., Vannucci, M., Tadesse, M. G., Brown, P. J., Dragoni, I., Davies, N., Roberts, T. C., Contestabile, A., Salmon, M., and Buckley, C. (2004), “Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage,” Biometrics, 60, 812–819.
- Shafer, G. (1982), “Lindley’s paradox,” Journal of the American Statistical Association, 77, 325–334.
- Shevade, S. K. and Keerthi, S. S. (2003), “A simple and efficient algorithm for gene selection using sparse logistic regression,” Bioinformatics, 19, 2246–2253.
- Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D., and Futcher, B. (1998), “Comprehensive identification of cell cycle– regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, 9, 3273–3297.
- Steinberg, D. and Colla, P. (2009), “CART: classification and regression trees,” The Top Ten Algorithms in Data Mining, 9, 179.
- Struyf, A., Hubert, M., and Rousseeuw, P. (1997), “Clustering in an object-oriented environment,” Journal of Statistical Software, 1, 1–30.
- Tibshirani, R. (1996), “Regression shrinkage and selection via the LASSO,” Journal of the Royal Statistical Society. Series B (Methodological), 58, 267–288.
- Tibshirani, R., Chu, G., Narasimhan, B., and Li, J. (2011), “SAMR: Significance Analysis of Microarrays,” R Package Version, 2.
- Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. (2002), “Diagnosis of multiple cancer types by shrunken centroids of gene expression,” Proceedings of the National Academy of Sciences of the United States of America, 99, 6567–6572.
- Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005), “Sparsity and smoothness via the fused LASSO,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 91–108.
- Tibshirani, R., Walther, G., and Hastie, T. (2001), “Estimating the number of clusters in a data set via the gap statistic,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 411–423.
- Tibshirani, R. et al. (1997), “The LASSO method for variable selection in the Cox model,” Statistics in Medicine, 16, 385–395.
- Tolosi, L. and Lengauer, T. (2011), “Classification with correlated features: unreliability of feature ranking and solution,” Bioinformatics, 27, 1986–1994.
- Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. (2001), “Missing value estimation methods for DNA microarrays,” Bioinformatics, 17, 520–525.
- Tseng, P. and Yun, S. (2009), “A coordinate gradient descent method for nonsmooth separable minimization,” Mathematical Programming, 117, 387–423.
- Tusher, V. G., Tibshirani, R., and Chu, G. (2001), “Significance analysis of microarrays applied to the ionizing radiation response,” Proceedings of the National Academy of Sciences, 98, 5116–5121.
- van der Pas, S. L., Kleijn, B. J. K., and van der Vaart, A. W. (2014), “The horseshoe estimator: Posterior concentration around nearly black vectors,” arXiv:1404.0202 [math, stat].
- Vapnik, V. (1995), The Nature of Statistical Learning Theory, Springer.
- Wahba, G. (1990), Spline models for observational data, Society for Industrial Mathematics.
- Wang, X., Istepanian, R., and Song, Y. (2003), “Microarray image enhancement by denoising using stationary wavelet transform,” NanoBioscience, 2, 184–189.
- Wang, Z., Liu, H., and Zhang, T. (2014), “Optimal computational and statistical rates of convergence for sparse nonconvex learning problems,” Annals of Statistics, 42, 2164.
- Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., and Vapnik, V. (2001), “Feature selection for SVMs,” Advances in Neural Information Processing Systems, 668– 674.
- Wolpert, R. L. et al. (2004), “A Conversation with James O. Berger,” Statistical Science, 19, 205–218.
- Yi, G., Sze, S.-H., and Thon, M. R. (2007), “Identifying clusters of functionally related genes in genomes,” Bioinformatics, 23, 1053–1060.
- Yuan, M. and Lin, Y. (2006), “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 49–67.
- Zhao, P., Rocha, G., and Yu, B. (2009), “The composite absolute penalties family for grouped and hierarchical variable selection,” The Annals of Statistics, 3468–3497.
- Zorn, C. (2005), “A solution to separation in binary response models,” Political Analysis, 13, 157–170.
- Zou, H. (2006), “The Adaptive LASSO and Its Oracle Properties,” Journal of the American Statistical Association, 101, 1418–1429.