Examining the Probabilities of Type I Error for Unadjusted All Pairwise Comparisons and Bonferroni Adjustment Approaches in Hypothesis Testing for Proportions
Abstract: The aim of this study is to examine the association among the probabilities of Type I error obtained by the Unadjusted All Pairwise Comparisons (UAPC) and Bonferroni-adjustment approaches, the sample size, and the frequency of occurrence of an event (prevalence, proportion) when testing for differences among proportions. In the simulation experiment planned for this purpose, 4 groups were formed and the proportions in each group were chosen between 0.10 and 0.90 so that they were equal in each experiment. Furthermore, the sample sizes were chosen from 20 to 1000. Under these scenarios, the probabilities of Type I error were calculated with both approaches. For each approach, a significant S-curve relationship was found between the probability of Type I error and sample size, whereas a significant quadratic relationship was found between the probability of Type I error and the proportion in each group. Nonlinear functional relations were put forward to estimate the observed Type I error rates obtained by the two approaches when the sample size and the proportion in each group are known. Furthermore, it was found that the Bonferroni-adjustment approach cannot always protect the Type I error level. It was also observed that the probability of Type I error estimated by the functional relation for the UAPC approach is lower than the values calculated using the formula in the literature. Keywords: Proportion comparison, Type I error, Bonferroni adjustment, unadjusted all pairwise comparisons.
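The simulation design described in this abstract can be illustrated with a short sketch: four groups with the same true proportion, all pairwise two-proportion z-tests, and the empirical family-wise Type I error rate with and without Bonferroni adjustment. This is a minimal illustration, not the authors' code; the number of replications, the alpha level, and the default proportion and sample size are assumptions.

```python
# Minimal sketch of the simulation scenario: equal true proportions in k = 4 groups,
# all pairwise two-proportion z-tests, empirical family-wise Type I error rate
# without adjustment (UAPC) and with Bonferroni adjustment.
import numpy as np
from itertools import combinations
from scipy.stats import norm

def pairwise_type1_error(p=0.30, n=100, k=4, alpha=0.05, n_sim=5000, seed=1):
    rng = np.random.default_rng(seed)
    pairs = list(combinations(range(k), 2))
    bonf_alpha = alpha / len(pairs)              # Bonferroni-adjusted per-test level
    reject_uapc = reject_bonf = 0
    for _ in range(n_sim):
        x = rng.binomial(n, p, size=k)           # event counts in each group
        p_hat = x / n
        pvals = []
        for i, j in pairs:
            pooled = (x[i] + x[j]) / (2 * n)
            se = np.sqrt(pooled * (1 - pooled) * (2 / n))
            z = 0.0 if se == 0 else (p_hat[i] - p_hat[j]) / se
            pvals.append(2 * (1 - norm.cdf(abs(z))))
        pvals = np.array(pvals)
        reject_uapc += np.any(pvals < alpha)       # any false rejection, unadjusted
        reject_bonf += np.any(pvals < bonf_alpha)  # any false rejection, Bonferroni
    return reject_uapc / n_sim, reject_bonf / n_sim

print(pairwise_type1_error())  # empirical family-wise Type I error: UAPC vs. Bonferroni
```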
Predictive Modelling of Patient Reported Radiotherapy-Related Toxicity by the Application of Symptom Clustering and Autoregression
Abstract: Patient reported outcome measures (PROMs) are increasingly being used in research to explore the experiences of cancer survivors. Techniques to predict symptoms, with the aim of providing triage care, rely on the ability to analyse trends in symptoms or quality of life and are at present limited. The secondary analysis in this study uses a statistical method involving the application of autoregression (AR) to PROMs in order to predict symptom intensity following radiotherapy, and to explore its feasibility as an analytical tool. The technique is demonstrated using an existing dataset of 94 prostate cancer patients who completed a validated battery of PROMs over time. In addition, the relationship between symptoms was investigated and symptom clusters were identified to determine their value in assisting predictive modeling. Three symptom clusters, namely urinary, gastrointestinal and emotional, were identified. The study indicates that incorporating symptom clustering into predictive modeling helps to identify the most informative predictor variables. The analysis also showed that the degree of rise of symptom intensity during radiotherapy has the ability to predict later radiotherapy-related symptoms. The method was most successful for the prediction of urinary and gastrointestinal symptoms. Quantitative or qualitative prediction was possible for different symptoms. The application of this technique to predict radiotherapy outcomes could lead to increased use of PROMs within clinical practice. This in turn would contribute to improvements in both patient care after radiotherapy and strategies to prevent side effects. In order to further evaluate the predictive ability of the approach, the analysis of a larger dataset with longer follow-up was identified as the next step. Keywords: Predictive Modeling, Patient Reported Outcome Measures, Autoregression, Radiotherapy-Related Side Effects, Longitudinal Study.
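The autoregressive idea, predicting a later symptom score from scores recorded at earlier time points, can be sketched as follows. The data here are synthetic and the number of time points and the 0-10 intensity scale are assumptions; the study's actual PROM battery and model specification are not reproduced.

```python
# Minimal autoregressive-style sketch: regress a later symptom score on scores
# from earlier time points (synthetic trajectories, not the study's dataset).
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_timepoints = 94, 5
# synthetic symptom-intensity trajectories (0-10 scale), tending to rise during treatment
scores = np.clip(np.cumsum(rng.normal(0.5, 1.0, (n_patients, n_timepoints)), axis=1), 0, 10)

X = np.column_stack([np.ones(n_patients), scores[:, :-1]])  # intercept + earlier scores
y = scores[:, -1]                                           # later (follow-up) score
coef, *_ = np.linalg.lstsq(X, y, rcond=None)                # ordinary least squares fit

predicted = X @ coef
print("AR coefficients:", np.round(coef, 2))
print("correlation of predicted vs. observed:", np.round(np.corrcoef(predicted, y)[0, 1], 2))
```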
The Current State of Validation of Administrative Healthcare Databases in Italy: A Systematic Review |
Abstract: Background: Administrative healthcare databases are widely present in Italy. Our aim was to describe the current state of the validity of healthcare databases in terms of discharge diagnoses (according to the International Classification of Diseases, ICD-9 codes) and their output in terms of research. Methods: A systematic search of electronic databases including Medline and Embase (1995-2013) and of local sources was performed. Inclusion criteria were: healthcare databases in any Italian territory routinely and passively collecting data; patient-level data on medical investigations or procedures; and the use of a validation process. The quality of studies was evaluated using the STARD criteria. Citations of the included studies were explored using Scopus and Google Scholar. Results: The search strategy allowed the identification of 16 studies, of which 3 were in Italian. Thirteen studies used regional administrative databases from Lombardia, Piemonte, Lazio, Friuli-Venezia Giulia and Veneto. The ICD-9 codes of the following diseases were successfully validated: amyotrophic lateral sclerosis (3 studies in 4 different regional administrative databases), stroke (3 studies), gastrointestinal bleeding (1 study), thrombocytopenia (1 study), epilepsy (1 study), infection (1 study), chronic obstructive pulmonary disease (1 study), Guillain-Barré syndrome (1 study), and cancer diseases (4 studies). The quality of reporting was variable among the studies. Only 6 administrative databases produced further research related to the validated ICD-9 codes. Conclusion: Administrative healthcare databases in Italy need an extensive process of validation for multiple diagnostic codes to support high-quality epidemiological and health services research. Keywords: Healthcare databases, Sensitivity, Specificity, Predictive values, Health administrative data, Diagnostic accuracy, Misclassification bias, Health services research, Epidemiology.
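For readers unfamiliar with the validity measures these studies report, the sketch below computes sensitivity, specificity and predictive values of a discharge code against a reference standard (e.g. chart review) from a 2x2 table. The counts are hypothetical and are not taken from the reviewed studies.

```python
# Minimal sketch of diagnostic-code validity measures from a 2x2 table:
# tp/fp/fn/tn compare the ICD-9 code with a reference standard (hypothetical counts).
def validity_measures(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),   # coded cases among true cases
        "specificity": tn / (tn + fp),   # uncoded non-cases among true non-cases
        "ppv": tp / (tp + fp),           # true cases among coded cases
        "npv": tn / (tn + fn),           # true non-cases among uncoded cases
    }

print(validity_measures(tp=80, fp=20, fn=10, tn=890))
```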
Development of Predictive Models for Continuous Flow Left Ventricular Assist Device Patients using Bayesian Networks |
Abstract: Background: Existing prognostic tools for patient selection for ventricular assist devices (VADs), such as the Destination Therapy Risk Score (DTRS) and the newly published HeartMate II Risk Score (HMRS), have limited predictive ability, especially with the current generation of continuous flow VADs (cfVADs). This study aims to use a modern machine learning approach, employing Bayesian Networks (BNs), which overcomes some of the limitations of traditional statistical methods. Methods: Retrospective data from 144 patients at Allegheny General Hospital and Integris Health System from 2007 to 2011 were analyzed. 43 data elements were grouped into four sets: demographics, laboratory tests, hemodynamics, and medications. Patients were stratified by survival at 90 days post-LVAD. Results: The independent variables were ranked based on their predictive power and reduced to an optimal set of 10: hematocrit, aspartate aminotransferase, age, heart rate, transpulmonary gradient, mean pulmonary artery pressure, use of diuretics, platelet count, blood urea nitrogen and hemoglobin. Two BNs, Naïve Bayes (NB) and Tree-Augmented Naïve Bayes (TAN), outperformed the DTRS in identifying low risk patients (specificity: 91% and 93% vs. 78%) and outperformed HMRS predictions of high risk patients (sensitivity: 80% and 60% vs. 25%). Both models achieved higher accuracy than the DTRS and HMRS (90% vs. 73% and 84%), higher Kappa (NB: 0.56, TAN: 0.48, DTRS: 0.14, HMRS: 0.22), and higher AUC (NB: 80%, TAN: 84%, DTRS: 59%, HMRS: 59%). Conclusion: The Bayesian Network models developed in this study consistently outperformed the DTRS and HMRS on all metrics. An added advantage is their intuitive graphical structure that closely mimics natural reasoning patterns. This warrants further investigation with an expanded patient cohort and inclusion of adverse event outcomes. Keywords: Risk Stratification, Heart Failure, Bayesian, Decision Support, Prognosis, VAD, Risk Score.
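As an illustration of the Naïve Bayes variant described above (not the study's actual model or cohort), the sketch below trains a Gaussian Naïve Bayes classifier on synthetic data whose features stand in for a few of the reported predictors, and reports accuracy, Kappa and AUC. scikit-learn is assumed to be available, and the outcome generation is purely illustrative.

```python
# Minimal sketch: Gaussian Naive Bayes for a binary 90-day survival outcome,
# evaluated with accuracy, Cohen's kappa and AUC (synthetic data only).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score

rng = np.random.default_rng(42)
n = 144
X = np.column_stack([
    rng.normal(35, 5, n),    # hematocrit (%)
    rng.normal(40, 20, n),   # aspartate aminotransferase (U/L)
    rng.normal(60, 10, n),   # age (years)
    rng.normal(85, 12, n),   # heart rate (bpm)
])
# synthetic outcome loosely tied to the covariates; 1 = survived to 90 days
logits = 0.08 * (X[:, 0] - 35) - 0.02 * (X[:, 1] - 40) - 0.04 * (X[:, 2] - 60)
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = GaussianNB().fit(X_tr, y_tr)
prob = model.predict_proba(X_te)[:, 1]
pred = model.predict(X_te)
print("accuracy:", round(accuracy_score(y_te, pred), 2),
      "kappa:", round(cohen_kappa_score(y_te, pred), 2),
      "AUC:", round(roc_auc_score(y_te, prob), 2))
```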
A Simple Approach to Sample Size Calculation for Count Data in Matched Cohort Studies |
Abstract: In matched cohort studies, exposed and unexposed individuals are matched on certain characteristics to form clusters in order to reduce potential confounding effects. Data in these studies are clustered and thus dependent due to matching. When the outcome is a Poisson count, specialized methods have been proposed for sample size estimation. However, in practice the variance of the counts often exceeds the mean (i.e. counts are overdispersed), so that Poisson methods do not apply. We propose a simple approach for calculating statistical power and sample size for clustered Poisson data when the proportion of exposed subjects in a cluster is constant across clusters. We extend the approach to clustered count data with overdispersion, which is common in practice. We evaluate these approaches with simulation studies and apply them to a matched cohort study examining the association of parental depression with health care utilization. Simulation results show that the methods for estimating power and sample size performed reasonably well under the scenarios examined and were robust in the presence of mixed exposure proportions up to 30%. Keywords: Clustered Poisson data, Overdispersion, Subject heterogeneity, Statistical power, Sample size.
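The sketch below shows one simulation-based way to estimate power for matched count data with a shared cluster frailty (which induces within-cluster correlation and overdispersion). It is not the formulas proposed in the paper: it relies on the standard conditional argument that, given a cluster's total count, the frailty cancels, so the exposure effect can be tested by comparing the pooled share of events in exposed subjects to its null value 1/(1+m). The rate, rate ratio, cluster size m and number of clusters are illustrative assumptions.

```python
# Minimal sketch: simulation-based power for matched clusters of 1 exposed and m
# unexposed subjects, Poisson counts with a shared gamma frailty (overdispersion),
# testing the exposure effect via the conditional (binomial) comparison.
import numpy as np
from scipy.stats import norm

def simulated_power(n_clusters=200, m=2, base_rate=0.5, rate_ratio=1.5,
                    frailty_shape=2.0, alpha=0.05, n_sim=2000, seed=7):
    rng = np.random.default_rng(seed)
    p0 = 1.0 / (1.0 + m)                   # null share of events in the exposed subject
    rejections = 0
    for _ in range(n_sim):
        frailty = rng.gamma(frailty_shape, 1.0 / frailty_shape, n_clusters)
        exposed = rng.poisson(frailty * base_rate * rate_ratio)
        unexposed = rng.poisson(frailty * base_rate * m)   # m unexposed subjects pooled
        total = exposed.sum() + unexposed.sum()
        if total == 0:
            continue
        p_hat = exposed.sum() / total
        se0 = np.sqrt(p0 * (1 - p0) / total)
        z = (p_hat - p0) / se0             # score-type test of H0: rate ratio = 1
        rejections += abs(z) > norm.ppf(1 - alpha / 2)
    return rejections / n_sim

print("estimated power:", simulated_power())
```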