Imputation of Missing Data for a Continuous Variable with an Ordinal form of Risk Function: When to Apply the Transformation?

Authors

  • Mohammad Reza Baneshi Research Center for Modeling in Health, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
  • Behshid Garrusi Department of Community Medicine, Neuroscience Research Center, Afzalipour Medical School, Kerman University of Medical Sciences, Kerman, Iran
  • Saiedeh Haji-Maghsoudi Research Center for Social Determinants of Health, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Iran

DOI:

https://doi.org/10.6000/1929-6029.2014.03.04.6

Keywords:

Missing data, risk function, transformation, Multiple Imputation.

Abstract

Introduction: Imputation of missing data and selection of an appropriate risk function are both important. Sometimes a variable that is continuous by nature is offered to the regression model as an ordinal variable. Our aim is to investigate whether to offer the continuous form of the variable to the imputation phase and its ordinal form to the modeling phase, or whether to offer the ordinal version to both phases.

Material and Methods: The outcome was use of diet as a body change approach, and the main variable of interest was Body Mass Index (BMI). We randomly deleted 10%, 20%, and 40% of BMI values. In strategies 1 and 2, BMI was offered to the imputation phase as a continuous variable (BMIC) and as an ordinal variable (BMIO), respectively, and missing data were imputed using linear and polytomous regression, respectively. In strategy 1, after imputation, BMIC was categorized (named BMICO) and offered to the modeling phase. In strategy 2, after imputation of BMIO values, this variable (named BMIOO) was offered to the logistic model. We compared the two strategies at Events Per Variable (EPV) of 75, 10, and 5.
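
The two strategies can be illustrated with a minimal sketch, which is not the authors' code. It assumes simulated data, a single covariate (age), hypothetical BMI cut-points, and a single stochastic imputation rather than full multiple imputation. For strategy 2 it swaps the polytomous regression for a simpler stand-in that draws each missing category from the observed category frequencies. All function and variable names are illustrative.

```python
import random
import statistics

# Hypothetical BMI cut-points; the abstract does not state the ones used.
CUTS = (18.5, 25.0, 30.0)

def categorize(bmi):
    """Map a continuous BMI to an ordinal code 0..3 using CUTS."""
    return sum(bmi >= c for c in CUTS)

def fit_line(x, y):
    """Least squares fit of y = a0 + b1*x; returns (a0, b1, residual SD)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a0 = my - b1 * mx
    resid = [yi - (a0 + b1 * xi) for xi, yi in zip(x, y)]
    return a0, b1, statistics.pstdev(resid)

def strategy1(age, bmi, rng):
    """Impute continuous BMI by linear regression on age, then categorize."""
    obs = [(a, b) for a, b in zip(age, bmi) if b is not None]
    a0, b1, sd = fit_line([a for a, _ in obs], [b for _, b in obs])
    filled = [b if b is not None else a0 + b1 * a + rng.gauss(0, sd)
              for a, b in zip(age, bmi)]
    return [categorize(v) for v in filled]

def strategy2(bmi, rng):
    """Categorize first, then impute the ordinal codes directly.

    Stand-in for polytomous regression: each missing code is drawn
    from the observed category frequencies (an assumption of this sketch).
    """
    cats = [categorize(b) if b is not None else None for b in bmi]
    observed = [c for c in cats if c is not None]
    return [c if c is not None else rng.choice(observed) for c in cats]

# Simulated data (assumption): BMI depends weakly on age.
rng = random.Random(0)
n = 500
age = [rng.uniform(18, 60) for _ in range(n)]
bmi_full = [20 + 0.1 * a + rng.gauss(0, 3) for a in age]

# Delete 20% of BMI values completely at random.
missing = set(rng.sample(range(n), n // 5))
bmi = [None if i in missing else v for i, v in enumerate(bmi_full)]

bmico = strategy1(age, bmi, rng)  # continuous imputation, then categorization
bmioo = strategy2(bmi, rng)       # categorization, then ordinal imputation
```

In a fuller version, each strategy would be repeated to create several imputed data sets, the logistic model would be fitted to each, and the estimates would be pooled.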

Results: At EPVs of 75 and 10, no remarkable difference was seen. However, at an EPV of 5, strategy 2 was superior. At 20% and 40% missing rates, strategy 1 was 2.21 and 3.67 times more likely, respectively, to produce Severe Relative Bias (SRB) than strategy 2. At the high missing rate, power was higher in strategy 2 (90% versus 83%).
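
For readers unfamiliar with the quantities reported here, the following sketch shows how EPV and relative bias are commonly computed. The abstract does not define the cut-off for "severe" relative bias, so the `threshold` parameter below is a hypothetical placeholder, and the function names are illustrative.

```python
def events_per_variable(n_events, n_variables):
    """EPV: number of outcome events divided by number of candidate predictors."""
    return n_events / n_variables

def relative_bias(estimate, reference):
    """Relative bias of an estimate against a reference (e.g. full-data) value."""
    return (estimate - reference) / reference

def is_severe(estimate, reference, threshold=0.5):
    """Flag severe relative bias; the 0.5 threshold is an assumption, not from the paper."""
    return abs(relative_bias(estimate, reference)) > threshold

# Example: 50 events and 10 candidate predictors give an EPV of 5.
epv = events_per_variable(50, 10)
```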

Conclusions: When EPV is low and the missing rate is high, categorizing the variable before imputing missing data produces less severe relative bias (SRB) and leads to higher power.

Published

2014-11-06

How to Cite

Baneshi, M. R., Garrusi, B., & Haji-Maghsoudi, S. (2014). Imputation of Missing Data for a Continuous Variable with an Ordinal form of Risk Function: When to Apply the Transformation? International Journal of Statistics in Medical Research, 3(4), 378–383. https://doi.org/10.6000/1929-6029.2014.03.04.6

Section

General Articles