Relaxed Adaptive Lasso for Classification on High-Dimensional Sparse Data with Multicollinearity
DOI:
https://doi.org/10.6000/1929-6029.2023.12.13
Keywords:
High-dimensional sparse data, machine learning, multicollinearity, penalized logistic regression, variable selection method
Abstract
High-dimensional sparse data with multicollinearity are frequently found in medical data. Models fitted to such data can show poor predictive accuracy when applied to a new data set. The Least Absolute Shrinkage and Selection Operator (Lasso) is a popular machine-learning algorithm for variable selection and parameter estimation. The adaptive Lasso extends it by placing an adaptive weight on the l1-norm penalty; this weight depends on a power of the coefficient estimates. We therefore focus on 1) the power of the adaptive weight in the penalty function, and 2) the two-stage variable selection method. This study proposes relaxed adaptive Lasso sparse logistic regression. We compared the performance of the different penalty functions using the mean of the predicted mean squared error (MPMSE) in the simulation study and classification accuracy in a real-data application. The results showed that the proposed method performed best on high-dimensional sparse data with multicollinearity. In addition, when a support vector machine was used as the classifier, the proposed method was also the best option for the variable selection process.
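The two ingredients named in the abstract, a power-weighted l1 penalty and a two-stage fit, can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a ridge logistic fit as the initial estimator, adaptive weights of the form lam / |b_j|^gamma, a plain proximal-gradient solver, and a lightly ridge-penalized refit on the selected variables as a simplified stand-in for the relaxation stage. All function names, parameter values (lam, gamma, ridge, rho) and the simulated data are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_logistic(X, y, l1_weights, l2=0.0, step=0.2, iters=800):
    """Proximal gradient descent on the mean logistic loss with a
    per-coefficient l1 penalty and an optional ridge term.
    The intercept is left unpenalized."""
    n, p = len(X), len(l1_weights)
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals (fitted probability minus label) at the current iterate.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        b0 -= step * sum(resid) / n
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n + l2 * beta[j]
            beta[j] = soft_threshold(beta[j] - step * grad, step * l1_weights[j])
    return b0, beta


def relaxed_adaptive_lasso(X, y, lam=0.05, gamma=1.0, ridge=0.01):
    """Two-stage sketch: adaptive-Lasso selection, then a refit on the
    selected variables (assumed simplification of the relaxation step)."""
    p = len(X[0])
    # Initial estimate (assumption): ridge logistic regression.
    _, init = fit_logistic(X, y, [0.0] * p, l2=ridge)
    # Adaptive weights: small initial estimates receive large penalties.
    weights = [lam / (abs(b) ** gamma + 1e-8) for b in init]
    # Stage 1: weighted-l1 (adaptive Lasso) fit used for variable selection.
    _, stage1 = fit_logistic(X, y, weights)
    selected = [j for j in range(p) if stage1[j] != 0.0]
    # Stage 2: refit on the selected columns with only a light ridge term.
    Xs = [[row[j] for j in selected] for row in X]
    b0, refit = fit_logistic(Xs, y, [0.0] * len(selected), l2=ridge)
    coef = [0.0] * p
    for j, b in zip(selected, refit):
        coef[j] = b
    return b0, coef, selected


# Simulated data (hypothetical): correlated predictors, sparse true signal.
random.seed(0)
n, p, rho = 150, 8, 0.7
true_beta = [2.0, -1.5] + [0.0] * (p - 2)
X, y = [], []
for _ in range(n):
    z = random.gauss(0, 1)  # shared factor inducing pairwise correlation rho
    row = [math.sqrt(rho) * z + math.sqrt(1 - rho) * random.gauss(0, 1)
           for _ in range(p)]
    prob = sigmoid(sum(b * x for b, x in zip(true_beta, row)))
    X.append(row)
    y.append(1 if random.random() < prob else 0)

b0, coef, selected = relaxed_adaptive_lasso(X, y)
print("selected variables:", selected)
print("refit coefficients:", [round(c, 3) for c in coef])
```

Raising gamma sharpens the contrast between the penalties on large and small initial estimates, which is the "power of the adaptive weight" the abstract refers to. In practice lam, gamma and the amount of relaxation would be tuned, for example by cross-validation, rather than fixed as they are here.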
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.