Development of Predictive Models for Continuous Flow Left Ventricular Assist Device Patients using Bayesian Networks

Background: Existing prognostic tools for patient selection for ventricular assist devices (VADs) such as the Destination Therapy Risk Score (DTRS) and newly published HeartMate II Risk Score (HMRS) have limited predictive ability, especially with the current generation of continuous flow VADs (cfVADs). This study aims to use a modern machine learning approach, employing Bayesian Networks (BNs), which overcomes some of the limitations of traditional statistical methods. Methods: Retrospective data from 144 patients at Allegheny General Hospital and Integris Health System from 2007 to 2011 were analyzed. 43 data elements were grouped into four sets: demographics, laboratory tests, hemodynamics, and medications. Patients were stratified by survival at 90 days post LVAD. Results: The independent variables were ranked based on their predictive power and reduced to an optimal set of 10: hematocrit, aspartate aminotransferase, age, heart rate, transpulmonary gradient, mean pulmonary artery pressure, use of diuretics, platelet count, blood urea nitrogen and hemoglobin. Two BNs, Naïve Bayes (NB) and Tree-Augmented Naïve Bayes (TAN) outperformed the DTRS in identifying low risk patients (specificity: 91% and 93% vs. 78%) and outperformed HMRS predictions of high risk patients (sensitivity: 80% and 60% vs. 25%). Both models were more accurate than DTRS and HMRS (90% vs. 73% and 84%), Kappa (NB: 0.56 TAN: 0.48, DTRS: 0.14, HMRS: 0.22), and AUC (NB: 80%, TAN: 84%, DTRS: 59%, HMRS: 59%). Conclusion: The Bayesian Network models developed in this study consistently outperformed the DTRS and HMRS on all metrics. An added advantage is their intuitive graphical structure that closely mimics natural reasoning patterns. This warrants further investigation with an expanded patient cohort, and inclusion of adverse event outcomes.


INTRODUCTION
Ventricular Assist Devices (VADs) have made great progress over the past 25 years in demonstrating clinical benefit and durability, and are steadily gaining traction in the treatment of advanced HF [1].They provide alternatives for patients awaiting transplant (Bridge to Transplant -BTT), for those ineligible for transplant (Destination Therapy -DT) or in patients where the eventual transplant candidacy is unknown (Bridge to Decision -BTD).Current estimates of potential VAD candidates in the US ranges from 80,000 to 200,000 per year [1].However, high incidence of adverse events diminishes the enthusiasm for this therapy, and hence stifles the rate of growth of its acceptance.
Likewise the resulting hospital readmissions of VAD patients -currently exceeding 50% within the first six months of initial dischargenegates potential cost-savings [2].As VAD use steadily increases [3][4][5], this has become an increasingly apparent obstacle to widespread application of this therapy.
This problem can be ameliorated by more judicious selection of patients and by implanting patients earlier in the progression of disease [6,7].This has motivated the development of risk scores to stratify patients based on the probability of clinical outcome.The most commonly cited of such scores for left ventricular assist device (LVAD) therapy is the Destination Therapy Risk Score (DTRS).However, it has demonstrated mediocre performance [8] in estimating mortality when applied to modern continuous flow LVAD (cfLVAD) populations.This is in part due to its derivation from a patient cohort with first generation pulsatile flow pumps [9].Accordingly, a more recent risk score was derived exclusively from patients receiving the HeartMate II, the most widely used cfVAD.This HeartMate II Risk Score (HMRS), however, demonstrated only marginal improvement over the DTRS.This score was able to predict 90-day survival based on five variables with an AUC of 70%, but only when applied to the same data from which it was derived.Furthermore, the predictors of long term (1 year) survival was limited to only two variables, age and implant center experience [10].We hypothesize that this is due to the shortcomings of traditional multivariate analysis, which cannot take into account nonlinear inter-relationships between significant clinical variables.
The purpose of this study was therefore to explore the use of modern Machine Learning methods, specifically Bayesian Networks (BNs) to provide a superior prognostic model for survival of LVAD recipients.The advantages of this algorithm include: its intuitive structure, efficient statistical pattern recognition, and tolerance of missing or erroneous data elements.

Patient Cohort
The dataset was derived retrospectively from a cohort of 144 patients who received an LVAD at Allegheny General Hospital (AGH) (n=88) and Integris Health System (IHS) (n=56) from January 2007 to December 2011.The inclusion criteria were: known device implant date, a minimum of 90 day post-op follow up information, at least 50% completeness of medical history and the use of a cfLVAD.

Pre-Operative Variable Selection and Data Preprocessing
The machine learning methods used for the present study were built upon techniques used previously by our group for decision support of optimal VAD weaning [11] and the need for right ventricular support due to right ventricular failure in LVAD recipients [12][13][14].Retrospective data elements were grouped into four sets (total of 43 elements): demographics (n=9), laboratory tests (n=12), hemodynamics (n=15) and medications (n=7).The predictive class value was 90day outcome and the patients were divided into two risk groups: low risk, defined as those who survived at least 90 days post LVAD implant; and high risk, those who died within 90 days.A subsequent pre-processing step was performed to fill in any missing data elements and discretize continuous variables.Data were extracted from preoperative day 14 to 1.For variables with multiple values, the value closest to the time of surgery was used.In circumstances where data elements were missing, data imputation techniques were implemented: mean for the continuous variables, mode for the category variables.

Bayesian Networks
One technique within the broad field of data mining is the Bayesian Network (BN) [15] classifier, which is an efficient algorithm that can recognize causal relationships or correlations among complex clinical variables, as well as relationships between each variable and the outcome (i.e.survival).BNs are acyclic directed graphs representing joint probability distributions over sets of variables.Every node is the graph represents a random variable.Lack of an arc between two nodes represents conditional independence between the variables that these nodes represent.Nodes are quantified by means of conditional probability tables (CPTs) associated with each node, describing the likelihood of the variable's values conditioned upon the values of adjacent nodes.For each node in the graph network, mutually exclusive and cumulatively exhausted states were defined and the CPT embedded.The joint probability is expressed as: Where P(X i parents(X i )) represents the conditional probability of variable X i .given the occurrence of its parent nodes.
The Naïve Bayes (NB) [16] classifier seen in Figure 1 is a simplified Bayesian classifier that uses Bayes theorem to predict which class a patient belongs to.Let C denote the class (mortality in this case) of the instance (patient), where c 1 =alive and c 2 =dead, and let X={x 1 ,x 2 ,…x 3 } be a vector of pre-operative variables (demographics, laboratory tests etc.) denoting the observed variable values (e.g.female or blood pressured of 90).For a patient with pre-operative variables X, NB uses the following equation to calculate the probability of the patient belonging to c 1 or c 2 : Where p(X i = x i C = c) represents the conditional probability of the pre-operative variable X i in specific states x i in the different classes c 1 and c 2 .In this study, we investigated both the NB and Tree-Augmented Naïve Bayes (TAN) [17,18] (Figure 2), where the former allows for only a single parent node and the latter allows two or more arrows to any single node.For example, NB could have INR connected to (associated with) survival and TAN could have INR connected to (associated with) both survival and age.TAN computes the conditional mutual information between pre-operative variables: ) log 2 P(x j , x k c i ) P(x j x i )P(x k c i ) Where X j and X k are pre-operative variables with the values x j and x k , respectively; C is the class node; P(x j , x k , c i ) is the joint probability of X j = x j and X k = x k ; C = c i in the dataset; P(x j , x k c i ) , P(x j x i ) and P(x k c i ) are the corresponding conditional probabilities.This measures the mutual dependence between two preoperative variables given the class variable.Building the maximum weighted spanning tree using this index, TAN represents the dependences among the preoperative variables, as well as the class variable.The example provided in Figure 2 would be represented as: The addition of edges between nodes allows for dependence between the variables, which is often the case in medical datasets.For additional details of the underlying BN theory and derivation, please refer to [19].The final NB and TAN models were based on a subset of clinical variables (called features) out of the original 43.This was done by a process known as feature selection, which reduces the total number of clinical variables to avoid over-fitting the model to the dataset.Using this method, the set of pre-operative variables was first ranked by Chi-Square ( 2 ) analysis: Where O=observed outcome and E=expected outcome.Once the variables are ranked, the top 10 most predictive variables were included in the model.
A two-step process was used to train and test the model to avoid overfitting to the data.First, a ten-fold cross validation was used for optimization of the algorithm parameter and model comparison.To perform this step, the data is subdivided into 10 sets of size n/10, where the model is trained on 9 of the sets and tested on 1.This is repeated 10 times and the mean accuracy is used to assess the final model's performance.In the second step, the resulting classifier/model was then evaluated on an independent testing/holdout dataset that has not been observed by the model.A training set was comprised of 66% of the data records while the testing set used the remaining 33%.The models were derived, built and implemented using an open-source machine learning software library (WEKA, Waikato Environment for Knowledge Analysis).

Statistical Analysis
The differences between groups (high vs. low risk) were evaluated by chi-square test.The comparison of continuous variables between the high and low risk patients was performed using either the Student t-test when normally distributed or the Wilcoxon rank sum test when distributed otherwise using SAS (version 9.3, Cary North Carolina).Categorical variables were compared using the Fisher exact test [20] with a commercial software package (MedCalc, version 9.5.0.0,Mariakerke, Belgium).Odds ratios (OR) and confidence intervals (CI) were also computed for both the high and low risk patients.
Sensitivity was defined as the ratio of true positives to the sum of true positives and false negatives (i.e.high risk patients correctly predicted as high risk).Specificity was defined as the ratio of true negatives to the sum of true negatives and false positives (i.e.low risk patients correctly predicted as low risk).Standard receiver operator characteristic (ROC) curves were constructed to illustrate overall sensitivity and specificity.The area under the ROC curve (AUC %) was calculated as an index of overall performance.These three metrics in addition to the Kappa statistic was used for comparison to the DTRS and HMRS.

Survival Analysis
Patient survival estimates were plotted using Kaplan-Meier curves.Differences between stratified groups was compared using log-rank statistics [21].The plots were created using JMP (Version 10.SAS Institute Inc., Cary, NC).Survival was calculated from the day of implant.Patients were censored if they received a heart transplant or device explant, which excluded 16 patients from the 144 cohort.

RESULTS
A summary of the pre-implant patient characteristic statistics can be seen in Tables 1 and 2. The overall accuracy, Kappa statistic and AUC % of both the NB and TAN models (Figure 3) exceeded the DTRS and HMRS in all three parameters (Table 3).The associated ROC curves (Figure 4) reveal the dramatic difference in performance between the Bayesian models (Figure 3) and traditional scores (NB: 79.5% and TAN: 83.6% vs. the DTRS: 59% and HMRS: 59%) when each was applied to our patient cohort.Likewise, the confusion matrix (Table 4) reveals the clear superiority of the NB and TAN (80% and 60%, respectively) at correctly identifying high risk patients as compared to the DTRS and HMRS (40% and 25%, respectively).Similarly, the false negatives (high risk patients inaccurately predicted as low risk) were much lower for the NB and TAN (20% and 40%) than both the DTRS and HMRS (60% and 75%).Although the NB and TAN (90.9% and 93.2%) performed similar to the HMRS (93.5%) in terms of specificity, they all outperformed the DTRS (78.2%).The NB, TAN and HMRS had similar frequency of false positives (9.1%, 6.8% and 6.4%), while the DTRS (21.4%) had a much higher percentage of incorrectly classified low risk patients.
Both the NB and TAN BN models were built from datasets using 10 most predictive variables, which emerged from feature selection based on their predictive power.The features listed in order from highest to lowest predictive power: hematocrit (%), aspartate aminotransferase (IU/liter), age, heart rate (beats/min), transpulmonary gradient (mm Hg), mean pulmonary artery pressure (mm Hg), use of diuretics, platelet count ( 109 /liter), blood urea nitrogen (mg/dl) and hemoglobin (g/dl).Many of the predictive features identified overlap with the DTRS (platelet count, mean pulmonary artery pressure, aspartate aminotransferase, hematocrit %, and BUN), yet only one overlapped with the HMRS (Age).There was no common variable that was included in all four predictive models.

Survival Outcome
The Kaplan Meier survival curves are provided in Figures 5-8 and summary in Table 5 for the DTRS, HMRS, NB, and TAN, with the patient population stratified into high and low risk.(The Kaplan Meier curve for the actual survival is provided in Figure 9 for reference).Both the DTRS and HMRS curves were derived from 136 patients (16 of the 144 were censored due to heart transplant within 90 days of VAD implant) and the NB and TAN were derived from 43 patients since the model performance was evaluated on the test set (48 patients, 5 of which were censored due to heart transplant within 90 days of VAD implant).The difference between the two groups was statistically significant for all models (log ranks: DTRS = 0.008, HMRS = 0.006, NB = <0.0001,TAN = 0.001).BMI: body mass index, BTT: bridge to transplant, BTD: bridge to decision, DT: destination therapy, HGB: hemoglobin, WBC: white blood cell count, HCT: hematocrit, ALT: Alanine Aminotransaminase, INR: international normalized ratio, BUN: blood urea nitrogen, AST: Aspartate Aminotransferase, IABP: intra-aortic balloon pump, PA: pulmonary artery, RV: right ventricular, RAP: right atrial pressure, PCWP: pulmonary capillary wedge pressure, TPG: transpulmonary pressure gradient, PVR: pulmonary vascular resistance, CO: cardiac output, CI: cardiac index, HR: heart rate.*Elective -LVAD scheduled as an outpatient, Urgent -LVAD scheduled same admission as heart failure decompensation, Emergent -following extracorporeal membrane oxygenation, IABP, acute coronary syndrome or ventricular tachycardia.†Compared elective vs. urgent and emergent.‡Compared BTT vs. DT.The DTRS (Figure 5) predicted 32 high risk and 96 low risk patients.Survival for high and low risk groups was 94% and 93%, respectively, at 30 days post-LVAD, 72% and 84% at 90 days, 50% and 71% at 180 days, and 22% and 40% at 1 year.
The NB model (Figure 7) predicted 4 high risk and 39 low risk patients.Because of the precipitous death rate of the NB high risk group, survival comparisons were made at 8 and 12 days, which were 75% and 50%, respectively.No patient identified as high risk survived beyond 71 days, consistent with the Kaplan Meier for the actual outcomes (Figure 9).Survival for the low risk group was 97% at 30 days post-LVAD, 95% at 90 days, 79% at 180 days, and 36% at 1 year.
Finally, the TAN model (Figure 8) predicted 5 high risk and 38 low risk patients.Survival for the high risk group was 60% at 30 days post-LVAD, 40% at 90 days, 20% at 180 days and 0% at 1 year.Survival for the low risk group was 95% at 30 days post-LVAD, 92% at 90 days, 79% at 180 days, and 37% at 1 year.

DISCUSSION
Although Bayesian statistics were first introduced (by Bayes) over 300 years ago, their utility in the medical workflow was only recognized within the past 25 years [22].Over the past ten years decision support models based on BNs have been published for a wide variety of medical disciplines [23][24][25][26][27][28].This study is however the first application of such models on a cohort of LVAD recipients.
Nearly all the variables in the models have been previously cited as being predictive for mortality and morbidity of LVAD recipients, including hematocrit [9], aspartate aminotransferas e [9], age [10,29], mean pulmonary artery pressure [9], platelet count [9] and blood urea nitrogen [9].The two classifiers investigate here exhibited different strengths and weaknesses.Although the NB classifier is less sophisticated than the TAN, it performed the best of all four scores that were evaluated.However, as the number patients and variables increase, the performance of TAN is likely to improve, while the NB would likely plateau and then diminish.This is due to the unique characteristic of the TAN classifier that accounts for both the influence of each independent variable on other variables.For example, hemoglobin < 9.25 g/dL was found to decrease the chance of survival twofold irrespective of other variables; whereas the influence of platelet count on survival was dependent upon the mean pulmonary artery pressure.(Data not shown.)With mean PAP less than 45.5 mmHg, platelet count < 123 x 10 3 / L decreased survival twofold; but with PAP greater than this cutoff, the platelet count decreased as much as four-fold.Hematocrit < 29% increased chance of death by 3.5 times, which was further amplified to nearly sixfold when hemoglobin was also between 9.25 and 10.35 g/dL.Blood urea nitrogen > 44.5 mg/dL increased risk of death twofold, but when platelet count > 258 x 10 3 / L, mortality was reduced by nearly the same amount.
The suboptimal performance of the DTRS when applied to a cohort of continuous flow devices has already been published [8].The HMRS was published within the past year, and therefore has yet to be compared to other patient cohorts or against other risk scores.In contrast to the traditional Frequentist statistical methods comprised of weighted combinations of independent variables, BNs provide: 1) a rigorous probabilistic framework in which to perform inference on unknown or predictive variables, 2) a means of learning probabilistic information from data, 3) the ability to capture the nonlinear inter-relationships amongst independent variables 4) the ability to accommodate missing data elements or sparse data records (common to medical datasets) and 5) a visual representation that is easy to interpret by human experts.Overall, these qualities provide more an accurate depiction of human decision making process and improved performance.Consequently they are more likely to be adopted into the clinical workflow than an anonymous "black box" type risk score.
The primary limitation in this study was, the moderate size of patient cohort used to create the models.Another potential limitation was institutional bias by including only two clinical programs.Missing data elements is another major obstacle that is often unavoidable when performing retrospective studies using medical data.Although we were able to use WEKA to substitute the missing variables with suitable surrogates, there is no true replacement for actual measured/observed data.We will address these limitations in future planned studies that expand the number of patient, broaden the endpoints to include adverse events such as stroke and infection, and extend the timeline to 6-months and 1-year.This will also provide an opportunity to update our existing prognostic model for RV failure [12,13].
Another important step towards translation of this model to clinical practice will be to incorporate it into a computer application that may be consulted at key decision points.This will also offer an opportunity to introduce expert knowledge and user customizationboth features hopefully enhancing the ultimate utility of the model.
BNs are able to more closely model the natural clinical decision-making process as compared to traditional risk scores and therefore offer a valuable tool for medical decisions such as identifying candidates most likely to benefit from VAD therapy.This study demonstrated a remarkable improvement over the existing risk scores (DTRS and HMRS) with respect to accuracy, Kappa, sensitivity, and specificity, also illustrated by predicted Kaplan-Meier survival curve that most closely resembles the actual outcomes.The BNs consistently outperformed the DTRS and HMRS due to (1) their abilities to learn from prior probability, (2) account for relationships between variables and (3) tolerate missing data elements.These results encourage continued validation and expansion of the models with a prospective, multicenter study.

Figure 2 :
Figure 2: Representation of a TAN classifier with four pre-operative variables.

Figure 5 :
Figure 5: Kaplan Meier survival curve for the Destination Therapy Risk Score.

Figure 6 :
Figure 6: Kaplan Meier survival curve for the HeartMate II Risk Score.

Figure 7 :
Figure 7: Kaplan Meier survival curve for the Naïve Bayes model.

Figure 8 :
Figure 8: Kaplan Meier survival curve for the Tree-Augmented Naïve Bayes model.

Figure 9 :
Figure 9: Kaplan Meier survival curve for the actual clinical outcomes.