Supplementing Missing Self-Reported Race Data with a Probability Distribution in Logistic Regression Models
DOI: https://doi.org/10.6000/1929-6029.2015.04.03.2
Keywords: Race and ethnicity, Bayesian Improved Surname Geocoding, up-to-date immunization, direct substitution approach, partial information maximum likelihood estimator
Abstract
Race is often included as an independent variable in health services research, especially in recent studies of racial and ethnic disparities in health care. Although self-reported race is recorded in large electronic health record (EHR) databases, it is sometimes missing. Recently, the Bayesian Improved Surname Geocoding (BISG) method has been used to estimate the probability distribution of race categories for individuals with missing race information. BISG-estimated probability distributions have been used in reporting health care measures, but not in statistical models with dichotomous outcomes. We propose two approaches for incorporating the available probability distribution of an independent categorical variable (e.g., race) into logistic regression models: 1) a direct substitution approach and 2) a partial information maximum likelihood estimator (PIMLE). In examining the association between race and up-to-date immunization status by three years of age among children from an integrated health care organization, 11.3% of 14,903 children had missing self-reported race information but had a BISG-estimated probability distribution over the six race/ethnicity categories. We employed the direct substitution and PIMLE approaches to analyze these undervaccination data. Both approaches included all observations and thus yielded smaller standard errors of the estimated coefficients than the complete-data analyses. Our simulation study showed that the direct substitution approach and PIMLE yielded nearly unbiased coefficient estimates and preserved efficiency when the missing rate of the independent categorical variable was up to 30%.
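The abstract describes the two approaches only in outline, so the following is a minimal pure-Python sketch of one reading of each, not the authors' code. Direct substitution is taken to mean that, for a record with missing race, the 0/1 race indicators in the design matrix are replaced by the BISG probabilities before an ordinary logistic regression is fitted. PIMLE is assumed here to maximize a likelihood in which a record with missing race contributes log Σ_k p_ik P(Y_i = y_i | race = k), where p_ik is the BISG probability of category k, while a record with self-reported race contributes the usual logistic term. Maximizing that likelihood by EM is our own choice of optimizer. The record layout `(y, race, bisg)`, the intercept-plus-race-only model with category 0 as reference, and every function name are hypothetical, and no standard errors are computed.

```python
import math

# Sketch only: the record layout (y, race_or_None, bisg_probs), the function names,
# and the EM fitting strategy are illustrative assumptions, not the paper's code.

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))  # inverse logit

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(rows, ys, ws=None, beta=None, iters=50):
    """Weighted logistic regression by Newton-Raphson; rows already include the intercept."""
    p = len(rows[0])
    ws = ws if ws is not None else [1.0] * len(rows)
    beta = list(beta) if beta is not None else [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for x, y, w in zip(rows, ys, ws):
            mu = expit(sum(b * v for b, v in zip(beta, x)))
            for j in range(p):
                grad[j] += w * (y - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * mu * (1.0 - mu) * x[j] * x[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-12:
            break
    return beta

def design_row(race_probs):
    """Intercept plus race columns 1..K-1; category 0 is the reference."""
    return [1.0] + list(race_probs[1:])

def one_hot(k, n_cat):
    return [1.0 if j == k else 0.0 for j in range(n_cat)]

def race_distribution(race, bisg, n_cat):
    """Self-reported race as a one-hot vector, or the BISG probabilities when race is missing."""
    return one_hot(race, n_cat) if race is not None else list(bisg)

def p_y_given_race(beta, y, k, n_cat):
    """P(Y = y | race = k) under the logistic model."""
    mu = expit(sum(b * v for b, v in zip(beta, design_row(one_hot(k, n_cat)))))
    return mu if y == 1 else 1.0 - mu

def direct_substitution(data, n_cat):
    """Fit with BISG probabilities in place of the 0/1 race indicators when race is missing."""
    rows = [design_row(race_distribution(r, q, n_cat)) for _, r, q in data]
    return fit_logistic(rows, [y for y, _, _ in data])

def mixture_loglik(beta, data, n_cat):
    """Sum over records of log sum_k P(race = k) * P(y | race = k)."""
    total = 0.0
    for y, r, q in data:
        dist = race_distribution(r, q, n_cat)
        total += math.log(sum(dist[k] * p_y_given_race(beta, y, k, n_cat) for k in range(n_cat)))
    return total

def pimle(data, n_cat, em_iters=500, tol=1e-9):
    """PIMLE as assumed here: maximize mixture_loglik by EM, starting from direct substitution."""
    beta = direct_substitution(data, n_cat)
    for _ in range(em_iters):
        rows, ys, ws = [], [], []
        for y, r, q in data:
            dist = race_distribution(r, q, n_cat)
            # E-step: posterior weight of each race category given y and the current beta
            joint = [dist[k] * p_y_given_race(beta, y, k, n_cat) for k in range(n_cat)]
            s = sum(joint)
            for k in range(n_cat):
                rows.append(design_row(one_hot(k, n_cat)))
                ys.append(y)
                ws.append(joint[k] / s)
        # M-step: weighted logistic regression on one pseudo-record per race category
        new_beta = fit_logistic(rows, ys, ws, beta)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

When every record has self-reported race, both functions reduce to the ordinary logistic fit. Because EM never decreases the likelihood and starts from the direct-substitution estimate, `mixture_loglik` at the PIMLE estimate is never lower than at the direct-substitution estimate.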
License
Copyright (c) 2015 Stanley Xu, Komal Narwaney, Sophia Newcomer, Jason Glanz
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Policy for Journals/Articles with Open Access
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
Policy for Journals/Manuscripts with Paid Access
Authors who publish with this journal agree to the following terms:
- The publisher retains copyright.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.