
International Journal of Statistics in Medical Research

Multiple Imputation by Fully Conditional Specification for Dealing with Missing Data in a Large Epidemiologic Study
Pages 287-295
Yang Liu and Anindya De
DOI: http://dx.doi.org/10.6000/1929-6029.2015.04.03.7
Published: 19 August 2015


Abstract: Missing data commonly occur in large epidemiologic studies. Ignoring incompleteness or handling the data inappropriately may bias study results, reduce power and efficiency, and alter important risk/benefit relationships. Standard ways of dealing with missing values, such as complete case analysis (CCA), are generally inappropriate due to the loss of precision and risk of bias. Multiple imputation by fully conditional specification (FCS MI) is a powerful and statistically valid method for creating imputations in large data sets which include both categorical and continuous variables. It specifies the multivariate imputation model on a variable-by-variable basis and offers a principled yet flexible method of addressing missing data, which is particularly useful for large data sets with complex data structures. However, FCS MI is still rarely used in epidemiology, and few practical resources exist to guide researchers in the implementation of this technique. We demonstrate the application of FCS MI in support of a large epidemiologic study evaluating national blood utilization patterns in a sub-Saharan African country. A number of practical tips and guidelines for implementing FCS MI based on this experience are described.

Keywords: Missing data, multiple imputation, fully conditional specification, complete case analysis, blood utilization.
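The variable-by-variable idea behind FCS MI can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the two-column toy data in `rows`, the function names, and the use of simple linear regression with normal noise for both variables. The sketch also draws imputations at fixed regression estimates rather than from the posterior of the parameters, so it understates uncertainty compared with a proper FCS MI procedure. The pooling step follows Rubin's rules.

```python
import random
import statistics

# Hypothetical toy data: (x, y) pairs, None marks a missing value.
rows = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 8.1),
        (5.0, 9.8), (6.0, None), (7.0, 14.3), (8.0, 15.9)]

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y ~ x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return intercept, slope, sd

def fcs_impute(rows, n_iter=10, rng=random):
    """Return one completed copy of two-column data via chained regressions."""
    cols = [[r[j] for r in rows] for j in range(2)]
    miss = [[v is None for v in col] for col in cols]
    # Initialise missing entries with the observed mean of each column.
    for j in range(2):
        mean = statistics.fmean(v for v in cols[j] if v is not None)
        cols[j] = [mean if v is None else v for v in cols[j]]
    for _ in range(n_iter):
        for j in range(2):          # visit each variable in turn
            k = 1 - j               # the other variable is the predictor
            obs = [i for i in range(len(rows)) if not miss[j][i]]
            b0, b1, sd = fit_line([cols[k][i] for i in obs],
                                  [cols[j][i] for i in obs])
            for i in range(len(rows)):
                if miss[j][i]:
                    # Simplification: noise is added, parameters are not redrawn.
                    cols[j][i] = b0 + b1 * cols[k][i] + rng.gauss(0, sd)
    return list(zip(*cols))

def pool_mean(rows, col, m=5, seed=0):
    """Pool the mean of one column over m imputations using Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = [r[col] for r in fcs_impute(rows, rng=rng)]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between
    return statistics.fmean(estimates), total

estimate, total_variance = pool_mean(rows, col=1)
print(estimate, total_variance)
```

A real analysis would use a different conditional model per variable type (for example logistic regression for binary variables), which is the flexibility the abstract attributes to the variable-by-variable specification.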

Application of Generalized Additive Models to the Evaluation of Continuous Markers for Classification Purposes
Pages 296-305
Mónica López-Ratón, Mar Rodríguez-Girondo, María Xosé Rodríguez-Álvarez, Carmen Cadarso-Suárez and Francisco Gude
DOI: http://dx.doi.org/10.6000/1929-6029.2015.04.03.8
Published: 19 August 2015


Abstract: Background: The receiver operating characteristic (ROC) curve and derived measures such as the area under the curve (AUC) are often used to evaluate how well a continuous biomarker discriminates between alternative states of health. However, if the marker shows an irregular distribution, with diseased subjects dominating in noncontiguous regions, classification using a single cutpoint is not appropriate and would lead to erroneous conclusions. This study sought to describe a procedure for improving the discriminatory capacity of a continuous biomarker by using generalized additive models (GAMs) for binary data.

Methods: A new classification rule is obtained by using logistic GAM regression models to transform the original biomarker, with the predicted probabilities serving as the new transformed continuous biomarker. We propose using this transformed biomarker to establish optimal cut-offs or intervals on which to base the classification. This methodology is applied to different controlled scenarios, and to real data from a prospective study of patients undergoing surgery at a University Teaching Hospital, to examine plasma glucose as a biomarker of postoperative infection.

Results: Both the theoretical scenarios and the real data show that when the risk marker-disease relationship is not monotone, using the new transformed biomarker improves discriminatory capacity. Moreover, in these situations, an optimal interval seems more reasonable than a single cutpoint to define lower and higher disease-risk categories.

Conclusions: Using statistical tools which allow for greater flexibility (e.g., GAMs) can optimize the classificatory capacity of a potential marker using ROC analysis. It is therefore important to question linearity in marker-outcome relationships, in order to avoid erroneous conclusions.

Keywords: Discriminatory capability, ROC, AUC, optimal cutpoint, biomarker, plasma glucose.
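The effect described in this abstract can be sketched on a small example. The marker values, labels, and bin edges below are hypothetical, and the binned disease proportion is only a crude stand-in for the smooth risk estimate a logistic GAM would produce; it is not the authors' procedure. The AUC is computed as the probability that a diseased subject scores higher than a healthy one, with ties counted as one half, and it is evaluated on the same data used to estimate the risk.

```python
def auc(scores, labels):
    """AUC as P(score of diseased > score of healthy), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def binned_risk(marker, labels, edges):
    """Replace each marker value by the disease proportion observed in its bin.

    Assumption: a stand-in for GAM predicted probabilities, not a GAM fit.
    """
    def bin_of(x):
        return sum(x >= e for e in edges)
    counts, events = {}, {}
    for x, y in zip(marker, labels):
        b = bin_of(x)
        counts[b] = counts.get(b, 0) + 1
        events[b] = events.get(b, 0) + y
    return [events[bin_of(x)] / counts[bin_of(x)] for x in marker]

# Hypothetical marker with disease concentrated at BOTH low and high values.
marker = [60, 65, 70, 90, 95, 100, 105, 110, 150, 160, 170]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0]

raw_auc = auc(marker, labels)
risk = binned_risk(marker, labels, edges=[80, 140])
transformed_auc = auc(risk, labels)
print(raw_auc, transformed_auc)
```

On these toy data the raw marker discriminates no better than chance, while the transformed marker separates the groups well. Thresholding the transformed marker corresponds to classifying subjects by whether the original marker falls inside or outside an interval, which is the interval-based rule the abstract argues for.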

Statistics and Policy Decisions: Issues in Statistical Analyses
Pages 162-171
Helena Chmura Kraemer
DOI: http://dx.doi.org/10.6000/1929-6029.2015.04.02.1
Published: 21 May 2015


Abstract: When national policy decisions are to be guided by the results of statistical analyses, it is important, to avoid being misled, to look beyond the authors’ conclusions and first to assess the study design, measurement and analytic methods, in order to decide whether a study’s conclusions rest on a solid foundation. In particular, observational studies must be carefully and critically evaluated. Using a widely cited study of the effects of low-level lead exposure on IQ, we illustrate several methodological errors, long known but often ignored. The goal is not to settle the controversies about the effect of lead on IQ, nor to disparage observational studies, for they are the foundation of all studies done to guide policy, but to encourage additional care in the use of such studies to address policy questions.

Keywords: Policy decisions, Statistical Significance, Practical or Policy Significance, Methodological Errors, Lead/IQ Association.

 


On the Relationship between the Reliability and Accuracy of Bio-Behavioral Diagnoses: Simple Math to the Rescue
Pages 172-179
Dom Cicchetti
DOI: http://dx.doi.org/10.6000/1929-6029.2015.04.02.2
Published: 21 May 2015


Abstract: An equivalence between the J statistic (Jack Youden, 1950) and the Kappa statistic (K) of Cohen (1960) was discovered by Helena Kraemer (1982). J is defined as: [Sensitivity (Se) + Specificity (Sp)] – 1. The author (2011) added the remaining two validity components to the J Index, namely, Predicted Positive Accuracy (PPA) and Predicted Negative Accuracy (PNA). The resulting D Index is D = [(Se + Sp) + (PPA + PNA) – 1] / 2. The purpose of this research is to compare J and D as estimates of K, using both actual and simulated data sets. The actual data consisted of ratings of clinical depression and self-reports of gonorrhea. The simulated data sets represented binary diagnoses in which the percentages of Negative and Positive cases followed Identical, Slightly varying, Mildly varying, Moderately varying, or Markedly varying diagnostic patterns. For both the diagnosis of clinical depression and the self-reports of gonorrhea, D produced closer approximations to Kappa. For the simulated data, under both identical and slightly different patterns of assigning Negative and Positive binary diagnoses, K, D and J produced identical results. While J produced acceptably close values to K under the condition of Mild discrepancies in the proportions of Negative and Positive cases, D continued to more closely approximate K. While D more closely estimated K under Markedly varying diagnostic patterns, D produced values under this extreme condition that were closer than would have been predicted. The significance of these findings for future research is discussed.

Keywords: Binary Diagnoses, Diagnostic Reliability, Diagnostic Accuracy.
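The relationship between J and K can be checked numerically from a 2×2 table. The sketch below is illustrative only: the function name and the counts are hypothetical, and the table is assumed to cross a diagnosis (rows) with a criterion (columns). It computes the four validity components named in the abstract, J as Se + Sp – 1, and Cohen's Kappa from observed and chance-expected agreement.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Validity components, Youden's J and Cohen's Kappa from 2x2 counts."""
    n = tp + fp + fn + tn
    se = tp / (tp + fn)    # sensitivity
    sp = tn / (tn + fp)    # specificity
    ppa = tp / (tp + fp)   # predicted positive accuracy
    pna = tn / (tn + fn)   # predicted negative accuracy
    j = se + sp - 1
    p_obs = (tp + tn) / n
    # Chance agreement from the products of the marginal totals.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {"Se": se, "Sp": sp, "PPA": ppa, "PNA": pna, "J": j, "K": kappa}

# Hypothetical counts: positives are assigned at the same rate by both sources.
identical = diagnostic_indices(tp=40, fp=10, fn=10, tn=40)

# Hypothetical counts: the diagnosis assigns positives twice as often.
varying = diagnostic_indices(tp=20, fp=30, fn=5, tn=45)

print(identical["J"], identical["K"])
print(varying["J"], varying["K"])
```

With identical proportions of Positive and Negative cases the two statistics coincide, and they drift apart as the proportions diverge, which is the pattern the simulated conditions in the abstract are designed to explore.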
