A Comparison of Parametric and Semi-Parametric Models for Microarray Data Analysis

Authors

  • Linda Chaba, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
  • John Odhiambo, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
  • Bernard Omolo, Division of Mathematics and Computer Science, University of South Carolina-Upstate, 800 University Way, Spartanburg, South Carolina, USA

DOI:

https://doi.org/10.6000/1929-6029.2017.06.04.1

Keywords:

Copula, Goodness-of-fit, Melanoma, Microarray, Power, Type I error

Abstract

Microarray technology has revolutionized genomic studies by enabling the study of differential expression of thousands of genes simultaneously. Parametric, nonparametric and semi-parametric statistical methods have been proposed for gene selection within the last sixteen years. In an effort to find the “gold standard”, the performance of some common parametric and nonparametric methods has been compared in terms of power to select differentially expressed genes and other desirable properties. However, no such comparisons have been conducted between parametric and semi-parametric models. In this study, we compared a semi-parametric model based on copulas with a parametric model (the quantitative trait analysis or QTA model) in terms of power and the ability to control the Type I error rate. In addition, we proposed a simple algorithm for choosing an optimal copula. The two approaches were applied to a publicly available melanoma cell lines dataset for validation. Both methods performed well in terms of power, but the copula approach was notably better. In terms of Type I error rate control, the two methods were comparable. More methods for selecting an optimal copula for gene expression data need to be developed, as the proposed procedure is limited to copulas that permit both negative and positive dependence.
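As an illustration of the two criteria used in the comparison, the sketch below estimates a Type I error rate and power by Monte Carlo simulation. It is not the authors' procedure: a plain Pearson correlation test with a Fisher z approximation stands in for both the QTA and copula models, and the sample size, effect size, number of simulations and all function names are assumptions chosen for the example.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(rho, n_samples=30, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated genes declared significant at level alpha.

    Each simulated "gene" is a vector of expression values x paired with a
    quantitative trait y whose true correlation with x is rho (assumed
    bivariate normal purely for illustration).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n_samples)]
        y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]
        if fisher_z_pvalue(pearson_r(x, y), n_samples) < alpha:
            rejections += 1
    return rejections / n_sims


# With rho = 0 every rejection is a false positive, so the rate estimates
# the Type I error; with rho != 0 the same rate estimates power.
type_i_error = rejection_rate(rho=0.0)
power = rejection_rate(rho=0.5)
print(f"Estimated Type I error: {type_i_error:.3f}")
print(f"Estimated power:        {power:.3f}")
```

A method controls the Type I error rate when the first estimate stays close to the nominal level alpha, and two methods can then be ranked by the second estimate at a fixed effect size.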


Published

2017-12-08

How to Cite

Chaba, L., Odhiambo, J., & Omolo, B. (2017). A Comparison of Parametric and Semi-Parametric Models for Microarray Data Analysis. International Journal of Statistics in Medical Research, 6(4), 134–143. https://doi.org/10.6000/1929-6029.2017.06.04.1

Section

General Articles
