Imputation of Missing Data for a Continuous Variable with an Ordinal form of Risk Function: When to Apply the Transformation?
DOI: https://doi.org/10.6000/1929-6029.2014.03.04.6
Keywords: Missing data, risk function, transformation, Multiple Imputation.
Abstract
Introduction: Imputation of missing data and selection of an appropriate risk function are both important. Sometimes a variable that is continuous by nature is offered to the regression model as an ordinal variable. Our aim is to investigate whether to offer the continuous form of the variable to the imputation phase and its ordinal form to the modeling phase, or whether to offer the ordinal version to both phases.
Material and Methods: The outcome was use of diet as a body change approach, and the main variable of interest was Body Mass Index (BMI). We randomly deleted 10%, 20%, and 40% of BMI values. In strategies 1 and 2, BMI was offered to the imputation phase as a continuous variable (BMIC) and as an ordinal variable (BMIO), respectively. Missing data were imputed using linear and polytomous regression, respectively. In strategy 1, after imputation, BMIC was categorized (named BMICO) and offered to the modeling phase. In strategy 2, after imputation of BMIO values, this variable was offered to the logistic model (named BMIOO). We compared the two strategies at Events Per Variable (EPV) of 75, 10, and 5.
Results: At EPVs of 75 and 10, no remarkable difference was seen. However, at an EPV of 5, strategy 2 was superior. At 20% and 40% missing rates, strategy 1 was 2.21 and 3.67 times more likely, respectively, to produce Severe Relative Bias (SRB). At the high missing rate, power was higher in strategy 2 (90% versus 83%).
Conclusions: When EPV is low and the missing rate is high, categorizing the variable before imputation of missing data produces less SRB and leads to higher power.
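The two strategies described under Material and Methods can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the simulated data, the covariate `age`, the BMI cut-points in `CUTS`, and the helper names. It performs a single stochastic imputation per strategy, whereas full multiple imputation would repeat the draw several times and pool the results. The polytomous model is written here as a plain multinomial (softmax) regression fitted by gradient descent, which may differ from the software routine the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one fully observed covariate, BMI, and a binary outcome.
n = 500
age = rng.normal(30, 8, n)
bmi = 22 + 0.1 * (age - 30) + rng.normal(0, 3, n)
diet = rng.binomial(1, 1 / (1 + np.exp(-(bmi - 24) / 3)))

CUTS = np.array([18.5, 25.0, 30.0])  # assumed cut-points, not the paper's
N_CLASSES = len(CUTS) + 1

def categorize(x):
    """Map continuous BMI to ordinal codes 0..3."""
    return np.digitize(x, CUTS)

# Delete 20% of BMI values completely at random.
missing = rng.random(n) < 0.20
obs = ~missing

# Imputation-model predictors: intercept, standardized age, and the outcome.
age_z = (age - age.mean()) / age.std()
X = np.column_stack([np.ones(n), age_z, diet])

def impute_linear(X, y, obs, rng):
    """Strategy 1, step 1: linear regression imputation with residual noise."""
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    resid = y[obs] - X[obs] @ beta
    sigma = resid.std(ddof=X.shape[1])
    out = y.copy()
    out[~obs] = X[~obs] @ beta + rng.normal(0, sigma, (~obs).sum())
    return out

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, k, n_classes, steps=2000, lr=0.1):
    """Multinomial logistic regression fitted by plain gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[k]
    for _ in range(steps):
        W -= lr * X.T @ (softmax(X @ W) - Y) / len(k)
    return W

def impute_polytomous(X, k, obs, rng, n_classes):
    """Strategy 2: draw each missing category from its predicted probabilities."""
    W = fit_softmax(X[obs], k[obs], n_classes)
    P = softmax(X[~obs] @ W)
    out = k.copy()
    out[~obs] = [rng.choice(n_classes, p=p) for p in P]
    return out

# Strategy 1: impute continuous BMI (BMIC), then categorize (BMICO).
bmic = np.where(missing, np.nan, bmi)
bmico = categorize(impute_linear(X, bmic, obs, rng))

# Strategy 2: categorize first (BMIO), then impute the categories (BMIOO).
bmio = np.where(missing, -1, categorize(bmi))  # -1 marks a missing category
bmioo = impute_polytomous(X, bmio, obs, rng, N_CLASSES)

print("BMICO counts:", np.bincount(bmico, minlength=N_CLASSES))
print("BMIOO counts:", np.bincount(bmioo, minlength=N_CLASSES))
```

In either case the completed ordinal variable (`bmico` or `bmioo`) would then be offered to the logistic model for the outcome. The sketch stops before that step and does not attempt to reproduce the reported bias or power figures.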
License
Copyright (c) 2014 Mohammad Reza Baneshi, Behshid Garrusi, Saiedeh Haji-Maghsoudi
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.