Application of Generalized Additive Models to the Evaluation of Continuous Markers for Classification Purposes
DOI:
https://doi.org/10.6000/1929-6029.2015.04.03.8
Keywords:
Discriminatory capability, ROC, AUC, optimal cutpoint, biomarker, plasma glucose
Abstract
Background: The receiver operating characteristic (ROC) curve and derived measures such as the area under the curve (AUC) are often used to evaluate the discriminatory capability of a continuous biomarker in distinguishing between alternative states of health. However, if the marker shows an irregular distribution, with diseased subjects predominating in noncontiguous regions, classification using a single cutpoint is not appropriate and can lead to erroneous conclusions. This study sought to describe a procedure for improving the discriminatory capacity of a continuous biomarker by using generalized additive models (GAMs) for binary data.
Methods: A new classification rule is obtained by using logistic GAM regression models to transform the original biomarker; the predicted probabilities from the model serve as the new, transformed continuous biomarker. We propose using this transformed biomarker to establish the optimal cut-offs or intervals on which to base the classification. This methodology is applied to several controlled scenarios and to real data from a prospective study of patients undergoing surgery at a university teaching hospital, in which plasma glucose is examined as a biomarker of postoperative infection.
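The transformation described in the Methods can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' implementation: the data are simulated (healthy subjects in the centre, diseased subjects in two separate tails), and a Gaussian-kernel smoother of disease status on the marker stands in for the logistic GAM. The names `smooth_risk`, `auc`, and the bandwidth value are hypothetical choices made for this sketch.

```python
import math
import random


def auc(scores, labels):
    """Empirical AUC (Mann-Whitney): P(diseased score > healthy score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def smooth_risk(x_train, y_train, bandwidth):
    """Return a function x -> estimated P(D = 1 | x).

    Assumption: a Nadaraya-Watson kernel smoother is used here as a simple
    stand-in for the logistic GAM fit described in the text.
    """
    def risk(x0):
        weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in x_train]
        return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)
    return risk


# Simulated scenario (hypothetical): disease concentrated in noncontiguous regions.
rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(200)]
diseased = [rng.gauss(rng.choice([-2.5, 2.5]), 0.7) for _ in range(200)]
marker = healthy + diseased
status = [0] * len(healthy) + [1] * len(diseased)

# Transformed biomarker = estimated disease probability at each marker value.
risk = smooth_risk(marker, status, bandwidth=0.4)
transformed = [risk(x) for x in marker]

auc_raw = auc(marker, status)
auc_transformed = auc(transformed, status)
print(f"AUC, original marker:    {auc_raw:.3f}")
print(f"AUC, transformed marker: {auc_transformed:.3f}")
```

Both AUCs here are apparent (in-sample) values, so the transformed figure is optimistic; a real analysis would need some form of validation or resampling.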
Results: Both the theoretical scenarios and the real data show that when the relationship between the risk marker and the disease is not monotone, using the new transformed biomarker improves discriminatory capacity. Moreover, in these situations, an optimal interval seems more reasonable than a single cutpoint for defining lower and higher disease-risk categories.
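To see why a single cut-off on the transformed scale corresponds to an interval on the original scale, consider the following sketch. It assumes a hypothetical U-shaped risk function, logit P(D = 1 | x) = -3 + 0.8 x², chosen only for illustration (it is not taken from the study), and uses the Youden index (sensitivity + specificity - 1) as one common criterion for choosing the cut-off.

```python
import math
import random


def risk(x):
    """Hypothetical U-shaped risk: logit P(D = 1 | x) = -3 + 0.8 * x**2."""
    return 1.0 / (1.0 + math.exp(-(-3.0 + 0.8 * x * x)))


def youden(scores, labels, cutoff):
    """Sensitivity + specificity - 1 for the rule 'diseased if score > cutoff'."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    return tp / n_pos + tn / n_neg - 1.0


# Simulated data (hypothetical): marker uniform on [-3, 3], status drawn from risk(x).
rng = random.Random(1)
xs = [rng.uniform(-3.0, 3.0) for _ in range(1000)]
ys = [1 if rng.random() < risk(x) else 0 for x in xs]
ps = [risk(x) for x in xs]  # transformed marker

# Best single cutpoint on the original marker, searched over a grid.
raw_grid = [-3.0 + 6.0 * i / 100 for i in range(101)]
best_raw_j = max(youden(xs, ys, c) for c in raw_grid)

# Best cut-off on the transformed (probability) scale.
prob_grid = [i / 100 for i in range(1, 100)]
best_cut = max(prob_grid, key=lambda c: youden(ps, ys, c))
best_trans_j = youden(ps, ys, best_cut)

# Map the probability cut-off back to the original scale:
# risk(x) <= c  <=>  |x| <= sqrt((logit(c) + 3) / 0.8), i.e. a low-risk interval.
logit_c = math.log(best_cut / (1.0 - best_cut))
half_width = math.sqrt(max(0.0, (logit_c + 3.0) / 0.8))

print(f"Youden index, single cutpoint on original marker: {best_raw_j:.3f}")
print(f"Youden index, cut-off on transformed marker:      {best_trans_j:.3f}")
print(f"Low-risk interval on original scale: ({-half_width:.2f}, {half_width:.2f})")
```

Under this assumed risk function, values of the marker inside the interval are classified as lower risk and values in either tail as higher risk, which is a rule that no single cutpoint on the original marker can express.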
Conclusions: Statistical tools that allow for greater flexibility (e.g., GAMs) can optimize the classificatory capacity of a potential marker in ROC analysis. It is therefore important to question linearity in marker-outcome relationships in order to avoid erroneous conclusions.
License
Copyright (c) 2015 Mónica López-Ratón, Mar Rodríguez-Girondo, María Xosé Rodríguez-Álvarez, Carmen Cadarso-Suárez, Francisco Gude
This work is licensed under a Creative Commons Attribution 4.0 International License.