Using Copulas to Select Prognostic Genes in Melanoma Patients
DOI:
https://doi.org/10.6000/1929-6029.2017.06.03.3
Keywords:
Copula, False discovery rate, Melanoma, Microarray, Power
Abstract
Melanoma of the skin is the fifth and seventh most commonly diagnosed cancer in men and women, respectively, in the USA. Gene signatures that are prognostic for outcomes such as overall survival and distant metastasis-free survival have shown promise in identifying therapeutic targets for primary and metastatic melanoma. However, most of these signatures were selected using statistics that rely on parametric assumptions about the distribution of the data (e.g., t-statistics). In this study, we assessed how relaxing these parametric assumptions affects the power of the models used for gene selection. We developed a semi-parametric model for feature selection that does not depend on the distributions of the covariates. This copula-based model assumes only that the marginal distributions of the covariates are continuous. Simulations indicated that the copula-based model had reasonable power at various levels of the false discovery rate (FDR). These results were validated in a publicly available melanoma dataset. Relaxing parametric assumptions on microarray data may yield procedures with good power for differential gene expression analysis.
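To make the idea of distribution-free gene screening with FDR control concrete, the following is a minimal sketch and not the authors' model. It assumes a continuous, uncensored outcome (real survival data are censored, which this sketch ignores), replaces each variable by rank-based pseudo-observations so that only continuity of the margins is used, measures dependence with a normal-scores (Gaussian copula) correlation, and applies the Benjamini-Hochberg step-up procedure. The function names, the toy data, and the normal approximation for the p-values are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(x):
    """Scaled ranks r_i / (n + 1) in (0, 1); assumes no ties (continuous margins)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def normal_scores_corr(x, y):
    """Rank-based estimate of a Gaussian-copula correlation (normal scores)."""
    nd = NormalDist()
    zx = [nd.inv_cdf(u) for u in pseudo_observations(x)]
    zy = [nd.inv_cdf(u) for u in pseudo_observations(y)]
    # Normal scores are symmetric about zero, so no centering is needed.
    sxy = sum(a * b for a, b in zip(zx, zy))
    sxx = sum(a * a for a in zx)
    syy = sum(b * b for b in zy)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank  # largest rank whose p-value falls under its threshold
    return sorted(order[:k])


def select_genes(expr, outcome, q=0.05):
    """expr: one list of expression values per gene; outcome: one value per patient."""
    n = len(outcome)
    nd = NormalDist()
    pvals = []
    for gene in expr:
        # Under independence the rank correlation has variance 1 / (n - 1),
        # so this z-score is approximately standard normal (an approximation).
        z = normal_scores_corr(gene, outcome) * math.sqrt(n - 1)
        pvals.append(2.0 * (1.0 - nd.cdf(abs(z))))
    return benjamini_hochberg(pvals, q)


# Hypothetical toy data: 60 patients, 200 genes, the first 10 truly associated.
random.seed(0)
n_patients, n_genes, n_true = 60, 200, 10
outcome = [random.gauss(0, 1) for _ in range(n_patients)]
expr = []
for j in range(n_genes):
    if j < n_true:
        # Skewed, non-normal margin that is monotonically related to the outcome.
        expr.append([math.exp(t + random.gauss(0, 0.5)) for t in outcome])
    else:
        expr.append([random.expovariate(1.0) for _ in range(n_patients)])

selected = select_genes(expr, outcome, q=0.05)
print(len(selected), "genes selected at FDR level 0.05")
```

Because the statistic depends on the data only through ranks, any strictly increasing transformation of a gene's expression values (for example, a log transform) leaves the selection unchanged, which is the practical meaning of not depending on the covariate distributions.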
License
Copyright (c) 2017 Linda Chaba, John Odhiambo, Bernard Omolo
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Policy for Journals/Articles with Open Access
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.