
International Journal of Statistics in Medical Research

Development of Predictive Models for Continuous Flow Left Ventricular Assist Device Patients using Bayesian Networks
Pages 423-434
Natasha A. Loghmanpour, Manreet K. Kanwar, Raymond L. Benza, Srinivas Murali and James F. Antaki
DOI:
http://dx.doi.org/10.6000/1929-6029.2014.03.04.11
Published: 06 November 2014


Abstract: Background: Existing prognostic tools for patient selection for ventricular assist devices (VADs), such as the Destination Therapy Risk Score (DTRS) and the newly published HeartMate II Risk Score (HMRS), have limited predictive ability, especially with the current generation of continuous flow VADs (cfVADs). This study aims to use a modern machine learning approach, employing Bayesian Networks (BNs), which overcomes some of the limitations of traditional statistical methods.

Methods: Retrospective data from 144 patients at Allegheny General Hospital and Integris Health System from 2007 to 2011 were analyzed. Forty-three data elements were grouped into four sets: demographics, laboratory tests, hemodynamics, and medications. Patients were stratified by survival at 90 days post-LVAD.

Results: The independent variables were ranked by predictive power and reduced to an optimal set of 10: hematocrit, aspartate aminotransferase, age, heart rate, transpulmonary gradient, mean pulmonary artery pressure, use of diuretics, platelet count, blood urea nitrogen and hemoglobin. Two BNs, Naïve Bayes (NB) and Tree-Augmented Naïve Bayes (TAN), outperformed the DTRS in identifying low-risk patients (specificity: 91% and 93% vs. 78%) and outperformed the HMRS in predicting high-risk patients (sensitivity: 80% and 60% vs. 25%). Both models also exceeded the DTRS and HMRS in accuracy (90% vs. 73% and 84%), Kappa (NB: 0.56, TAN: 0.48, DTRS: 0.14, HMRS: 0.22), and AUC (NB: 80%, TAN: 84%, DTRS: 59%, HMRS: 59%).

Conclusion: The Bayesian Network models developed in this study consistently outperformed the DTRS and HMRS on all metrics. An added advantage is their intuitive graphical structure that closely mimics natural reasoning patterns. This warrants further investigation with an expanded patient cohort, and inclusion of adverse event outcomes.
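For readers less familiar with the metrics reported in this abstract, the following is a minimal sketch of how sensitivity, specificity, accuracy and Cohen's Kappa can be derived from a 2x2 confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study; the sketch assumes the "positive" class is the event of interest (here, a high-risk patient). AUC is omitted because it requires predicted probabilities rather than a single table of counts.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return common binary-classification metrics for a 2x2 confusion matrix.

    tp/fn: positives classified correctly/incorrectly.
    fp/tn: negatives classified incorrectly/correctly.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # share of true positives detected
    specificity = tn / (tn + fp)   # share of true negatives detected
    accuracy = (tp + tn) / n       # overall observed agreement

    # Agreement expected by chance, computed from the marginal totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    # Cohen's Kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not data from the study).
example = classification_metrics(tp=8, fn=2, fp=9, tn=81)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Kappa is often reported alongside accuracy for imbalanced outcomes, since a model can reach high accuracy simply by predicting the majority class, in which case Kappa stays near zero.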

Keywords: Risk Stratification, Heart Failure, Bayesian, Decision Support, Prognosis, VAD, Risk Score.

The Current State of Validation of Administrative Healthcare Databases in Italy: A Systematic Review
Pages 309-320
Iosief Abraha, Massimiliano Orso, Piero Grilli, Francesco Cozzolino, Paolo Eusebi, Paola Casucci, Mauro Marchesi, Maria Laura Luchetta, Luisa Fruttini, Raoul Ciappelloni, Rita De Florio, Gianni Giovannini and Alessandro Montedori
DOI:
http://dx.doi.org/10.6000/1929-6029.2014.03.03.10
Published: 25 August 2014


Abstract: Background: Administrative healthcare databases are widely present in Italy. Our aim was to describe the current state of the validity of healthcare databases in terms of discharge diagnoses (according to the International Classification of Diseases, ICD-9 code) and their output in terms of research.

Methods: A systematic search of electronic databases including Medline and Embase (1995-2013) and of local sources was performed. Inclusion criteria were: healthcare databases in any Italian territory routinely and passively collecting data; patient-level data on medical investigations or procedures; the use of a validation process. The quality of studies was evaluated using the STARD criteria. Citations of the included studies were explored using Scopus and Google Scholar.

Results: The search strategy identified 16 studies, 3 of which were in Italian. Thirteen studies used regional administrative databases from Lombardia, Piemonte, Lazio, Friuli-Venezia Giulia and Veneto. The ICD-9 codes of the following diseases were successfully validated: amyotrophic lateral sclerosis (3 studies in 4 different regional administrative databases), stroke (3 studies), gastrointestinal bleeding (1 study), thrombocytopenia (1 study), epilepsy (1 study), infection (1 study), chronic obstructive pulmonary disease (1 study), Guillain-Barré syndrome (1 study), and cancers (4 studies). The quality of reporting was variable among the studies. Only 6 administrative databases produced further research related to the validated ICD-9 codes.

Conclusion: Administrative healthcare databases in Italy need an extensive process of validation for multiple diagnostic codes to perform high quality epidemiological and health services research.

Keywords: Healthcare databases, Sensitivity, Specificity, Predictive values, Health administrative data, Diagnostic accuracy, Misclassification bias, Health services research, Epidemiology.

A Simple Approach to Sample Size Calculation for Count Data in Matched Cohort Studies
Pages 321-330
Dexiang Gao, Gary K. Grunwald and Stanley Xu
DOI:
http://dx.doi.org/10.6000/1929-6029.2014.03.03.11
Published: 25 August 2014


Abstract: In matched cohort studies, exposed and unexposed individuals are matched on certain characteristics to form clusters to reduce potential confounding effects. Data in these studies are clustered and thus dependent due to matching. When the outcome is a Poisson count, specialized methods have been proposed for sample size estimation. However, in practice the variance of the counts often exceeds the mean (i.e. counts are overdispersed), so that Poisson methods do not apply. We propose a simple approach for calculating statistical power and sample size for clustered Poisson data when the proportion of exposed subjects in a cluster is constant across clusters. We extend the approach to clustered count data with overdispersion. We evaluate these approaches with simulation studies and apply them to a matched cohort study examining the association of parental depression with health care utilization. Simulation results show that the methods for estimating power and sample size performed reasonably well under the scenarios examined and were robust in the presence of mixed exposure proportions up to 30%.
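The overdispersion condition mentioned in this abstract (variance exceeding the mean) can be checked informally with a dispersion index. The sketch below is an illustration only and is not the authors' sample size method; the function name `dispersion_index` and the example counts are hypothetical. Under a Poisson model the ratio of sample variance to sample mean is expected to be close to 1, and values well above 1 suggest overdispersion.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Ratio of sample variance to sample mean for a list of counts.

    Values near 1 are consistent with a Poisson model; values well
    above 1 suggest overdispersion. Requires at least two counts and
    a non-zero mean.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical visit counts per subject (illustration only, not study data).
visits = [0, 1, 0, 2, 9, 0, 1, 12, 0, 3]
print(f"mean = {mean(visits):.2f}, dispersion index = {dispersion_index(visits):.2f}")
```

This is a descriptive check rather than a formal test, and it ignores the matched (clustered) structure that the paper's methods account for.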

Keywords: Clustered Poisson data, Overdispersion, Subject heterogeneity, Statistical power, Sample size.

Comparison of Methods for Clustered Data Analysis in a Non-Ideal Situation: Results from an Evaluation of Predictors of Yellow Fever Vaccine Refusal in the Global TravEpiNet (GTEN) Consortium
Pages 215-223
Sowmya R. Rao, Regina C. LaRocque, Emily S. Jentes, Stefan H.F. Hagmann, Edward T. Ryan, Pauline V. Han, David G. Kleinbaum and Global TravEpiNet Consortium
DOI:
http://dx.doi.org/10.6000/1929-6029.2014.03.03.1
Published: 05 August 2014


Abstract: Not accounting for clustering in data from multiple centers might yield biased estimates and standard errors, potentially leading to incorrect inferences. We fit 15 different models with different correlation structures and with/without adjustment for small clusters, including unadjusted logistic regression, population-averaged models (Generalized Estimating Equations), cluster-specific models (linear and non-linear with random intercept) and survey data analysis methods, to study the association of variables with the probability of declining yellow fever vaccine among patients seeking pre-travel health consultations at 18 US practices in the Global TravEpiNet Consortium from 1 January 2009 to 6 June 2012. Results varied by the method chosen. Generally, when the odds ratio estimates were similar, adjusting for clustering and the small number of clinics increased the standard errors. We chose the random intercept model with the Morel, Bokossa and Neerchal (MBN) adjustment as the preferred method for the GTEN dataset, since it was one of the more conservative models that accounted for clustering, small sample sizes and the random effect due to site. Investigators should not ignore clustering and should consider the appropriate adjustments necessary for their studies.
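To illustrate why accounting for clustering tends to increase standard errors, the sketch below uses the classical design effect for equally sized clusters, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. This is a simplified textbook approximation, not the MBN adjustment or any of the 15 models compared in the paper; the function names and input values are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def inflated_se(naive_se, cluster_size, icc):
    """Scale a standard error that ignores clustering by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical values for illustration only (not estimates from the GTEN data).
naive = 0.10
for icc in (0.0, 0.01, 0.05):
    adjusted = inflated_se(naive, cluster_size=50, icc=icc)
    print(f"ICC = {icc:.2f}: design effect = {design_effect(50, icc):.2f}, SE = {adjusted:.3f}")
```

Even a small intraclass correlation can inflate the standard error noticeably when clusters are large, which is consistent with the abstract's observation that adjusted models were more conservative.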

Keywords: Clustering, cluster size, cluster imbalance, data analysis.