Using Copulas to Select Prognostic Genes in Melanoma Patients

Authors

  • Linda Chaba, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
  • John Odhiambo, Strathmore Institute of Mathematical Sciences, Strathmore University, Ole Sangale Road, Nairobi, Kenya
  • Bernard Omolo, Division of Mathematics and Computer Science, University of South Carolina-Upstate, 800 University Way, Spartanburg, South Carolina, USA

DOI:

https://doi.org/10.6000/1929-6029.2017.06.03.3

Keywords:

Copula, False discovery rate, Melanoma, Microarray, Power

Abstract

Melanoma of the skin is the fifth and seventh most commonly diagnosed cancer in men and women, respectively, in the USA. Gene signatures prognostic for outcomes such as overall and distant metastasis-free survival have shown promise in identifying therapeutic targets for primary and metastatic melanoma. However, most of these signatures have been selected using statistics that depend entirely on parametric distributional assumptions about the data (e.g. t-statistics). In this study, we assessed how relaxing these parametric assumptions affects the power of the models used for gene selection. We developed a semi-parametric model for feature selection that does not depend on the distributions of the covariates; this copula-based model assumes only that the marginal distributions of the covariates are continuous. Simulations indicated that the copula-based model had reasonable power at various levels of the false discovery rate (FDR). These results were validated in a publicly available melanoma dataset. Relaxing parametric assumptions on microarray data may yield procedures that have good power for differential gene expression analysis.
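Two ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. The first is that a copula-based approach can work from rank-based pseudo-observations, which do not change under monotone transformations of a covariate and so do not depend on its marginal distribution. The second is selecting features at a target FDR level. The function names below are hypothetical, and the Benjamini-Hochberg step-up rule is used here only as one common FDR-controlling procedure; the abstract does not state which procedure was applied.

```python
import math


def pseudo_observations(values):
    """Map a sample to (0, 1) using scaled ranks, rank / (n + 1).

    Assumes no ties, which is consistent with continuous margins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Step-up rule: find the largest k such that the k-th smallest
    p-value is at most k * q / m, then reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= k * q / m:
            cutoff = k
    return sorted(order[:cutoff])


# Illustrative (made-up) expression values for one gene.
expression = [3.2, 1.5, 2.7]
u_raw = pseudo_observations(expression)
# A monotone transform (here exp) leaves the pseudo-observations unchanged.
u_exp = pseudo_observations([math.exp(x) for x in expression])

# Illustrative (made-up) per-gene p-values.
selected = benjamini_hochberg([0.001, 0.2, 0.012, 0.6, 0.025], q=0.05)
```

Because only ranks enter the pseudo-observations, any statistic computed from them inherits the invariance to the marginal distributions that the abstract describes.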

Published

2017-08-03

How to Cite

Chaba, L., Odhiambo, J., & Omolo, B. (2017). Using Copulas to Select Prognostic Genes in Melanoma Patients. International Journal of Statistics in Medical Research, 6(3), 114–122. https://doi.org/10.6000/1929-6029.2017.06.03.3

Section

General Articles
