Exploring the Performance of Methods to Deal Multicollinearity: Simulation and Real Data in Radiation Epidemiology Area
DOI:
https://doi.org/10.6000/1929-6029.2018.07.02.2
Keywords:
Lasso Regression, Multicollinearity, Organs volume modelling, Partial Least Squares Regression, Principal Components Regression, Ridge Regression.
Abstract
Multicollinearity has long been acknowledged as a problem in statistical modelling; however, it is left untreated in most published papers, and methods for correcting it are still rarely used. One important reason is that, despite the many methods proposed, little is known about their strengths or performance. We compare the statistical properties and performance of four main techniques for correcting multicollinearity, namely Ridge Regression (R-R), Principal Components Regression (PC-R), Partial Least Squares Regression (PLS-R), and Lasso Regression (L-R), in both a simulation study and two real data examples in which heart and thyroid volumes are modelled as a function of clinical and anthropometric parameters. Across the different levels of collinearity examined, R-R, PC-R and PLS-R behaved in a broadly similar way, with a slight advantage for PLS-R: in all implemented cases, PLS-R provided the smallest root mean square error (RMSE). When the degree of collinearity was moderate, low or very low, L-R also performed similarly to the other methods. Furthermore, the correction methods yielded stable and trustworthy parameter estimates for the predictors in the models of heart and thyroid volumes. This work therefore helps to document the performance of these methods, but only for situations ranging from low to very high multicollinearity.
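To make the effect of collinearity on estimate stability concrete, the following is a minimal sketch and not the authors' simulation design. It covers only ordinary least squares and ridge regression (PC-R, PLS-R and L-R are omitted), uses two predictors without an intercept, and measures the RMSE of the coefficient estimates rather than the RMSE reported in the abstract. The sample size, true coefficients, noise level, penalty value and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def simulate(n, rho, beta, sigma, rng):
    """Draw n rows of two predictors with correlation rho, plus a noisy response."""
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = z1
        x2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2
        y = beta[0] * x1 + beta[1] * x2 + rng.gauss(0, sigma)
        rows.append((x1, x2, y))
    return rows


def fit(rows, lam):
    """Solve (X'X + lam*I) b = X'y for two predictors; lam = 0 gives OLS."""
    s11 = sum(x1 * x1 for x1, _, _ in rows) + lam
    s22 = sum(x2 * x2 for _, x2, _ in rows) + lam
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    t1 = sum(x1 * y for x1, _, y in rows)
    t2 = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    # Closed-form inverse of the 2x2 (penalized) normal-equation matrix.
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def coefficient_rmse(rho, lam, reps=200, n=50, beta=(1.0, 0.5), sigma=1.0, seed=1):
    """RMSE of the estimated coefficient vector over repeated simulated datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        b = fit(simulate(n, rho, beta, sigma, rng), lam)
        total += (b[0] - beta[0]) ** 2 + (b[1] - beta[1]) ** 2
    return math.sqrt(total / reps)


def vif(rho):
    """Variance inflation factor for two predictors with correlation rho."""
    return 1.0 / (1.0 - rho ** 2)


if __name__ == "__main__":
    # Illustrative penalty; in practice it would be tuned, e.g. by cross-validation.
    penalty = 5.0
    for rho in (0.2, 0.9, 0.99):
        ols = coefficient_rmse(rho, 0.0)
        ridge = coefficient_rmse(rho, penalty)
        print(f"rho={rho:.2f}  VIF={vif(rho):6.1f}  OLS={ols:.3f}  ridge={ridge:.3f}")
```

Under these assumptions, the coefficient RMSE of the unpenalized fit grows sharply as the correlation approaches 1, while the penalized fit remains far more stable. That is the kind of behavior a comparison across collinearity levels is designed to expose.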
License
Copyright (c) 2018 Mickaël Dubocq, Nadia Haddy, Boris Schwartz, Carole Rubino, Florent Dayet, Florent de Vathaire, Ibrahima Diallo, Rodrigue S. Allodji
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.