Validation of Gene Expression Profiles in Genomic Data through Complementary Use of Cluster Analysis and PCA-Related Biplots
DOI:
https://doi.org/10.6000/1929-6029.2012.01.02.09
Keywords:
Microarrays, cluster stability, multivariate visualization, Principal Components Analysis, cell polarity
Abstract
High-throughput genomic assays are used in molecular biology to explore patterns of joint expression of thousands of genes.
These methodologies have developed considerably over the last decade, creating a concurrent need for appropriate methods to analyze the massive amounts of data they generate.
Identifying sets of genes and samples with similar expression values, and validating these results, are two critical issues in such investigations because of their clinical implications. From a statistical perspective, unsupervised class discovery methods such as Cluster Analysis are generally adopted.
However, applications of Cluster Analysis rely mainly on hierarchical techniques, without considering other methods. This is partly due to software availability and to the ease of representing results through a heatmap, which makes it possible to visualize the clustering of genes and samples simultaneously on the same graphical device. One drawback of this strategy is that cluster stability is often neglected, leading to over-interpretation of results.
Moreover, validation of results using external datasets is still a subject of discussion, since it is well known that batch effects may affect gene expression results even after normalization.
In this paper we compare several clustering algorithms (hierarchical, k-means, model-based, Affinity Propagation) and stability indices to discover common patterns of expression and to assess clustering reliability, and we propose a rank-based passive projection of Principal Components for validation purposes.
Results are presented from a study involving 23 tumor cell lines and 76 genes related to a specific biological pathway, derived from a publicly available dataset.
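The abstract does not spell out how the rank-based passive projection is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that expression values are replaced by within-sample ranks, that Principal Components are fitted on the reference dataset alone, and that external samples are then projected onto those fixed axes without influencing them. The function names, the synthetic data, and the choice of two components are all hypothetical.

```python
import numpy as np

def rank_within_samples(X):
    """Replace each sample's (row's) values by their ranks 1..p across genes.
    Assumption: values are continuous, so ties are not handled specially."""
    return np.argsort(np.argsort(X, axis=1), axis=1) + 1.0

def fit_pca(R, n_components=2):
    """Fit Principal Components on the reference rank matrix (samples x genes)."""
    center = R.mean(axis=0)                      # per-gene mean rank
    _, _, Vt = np.linalg.svd(R - center, full_matrices=False)
    loadings = Vt[:n_components].T               # genes x components
    scores = (R - center) @ loadings             # reference sample coordinates
    return center, loadings, scores

def project_passively(R_new, center, loadings):
    """Place new samples on the reference axes; the axes are left unchanged."""
    return (R_new - center) @ loadings

# Synthetic stand-ins: 23 reference samples and 5 external samples, 76 genes.
rng = np.random.default_rng(0)
reference = rng.normal(size=(23, 76))
external = 2.0 * rng.normal(size=(5, 76)) + 3.0  # mimics a shift and scale change

center, loadings, ref_scores = fit_pca(rank_within_samples(reference))
ext_scores = project_passively(rank_within_samples(external), center, loadings)
```

Because ranks within a sample are unchanged by any strictly increasing transformation of that sample's values, a sample-wide shift or rescaling of the external data does not move its projected position. This is one reason a rank-based projection might be less sensitive to batch effects, although it does not address gene-specific batch differences.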
License
Copyright (c) 2012 Niccolò Bassani, Federico Ambrogi, Danila Coradini, Patrizia Boracchi, Elia Biganzoli
This work is licensed under a Creative Commons Attribution 4.0 International License.