ROC Analysis for Phase II Group Sequential Basket Clinical Trial

Abstract: The basket trial is a recent development in clinical trial practice. It tests the same treatment on several different but related diseases in a single trial, which reduces cost and improves efficiency. A natural question is how the performance of the group sequential basket trial compares with that of the classical group sequential trial. To our knowledge, no formal assessment has appeared in the literature; providing one is the goal of this study. Specifically, we use the receiver operating characteristic (ROC) curve to assess the performance of the two trial designs. We consider two settings, parametric and nonparametric. The former is efficient when the parametric model is correctly specified but can be misleading if the model is incorrect; the latter is less efficient but robust, in that it remains valid whatever the true data-generating model is. Simulation studies suggest that the group sequential basket trial generally outperforms the group sequential trial in both the parametric and nonparametric settings, and that the nonparametric method gives a more accurate evaluation than the parametric one for moderate to large sample sizes.


INTRODUCTION
The basket clinical trial design (for example, [1][2][3][4]) is a recent practice in the clinical trial field. Unlike traditional clinical trials, which examine one treatment for one targeted disease, the basket design examines one treatment on several different (but often related) diseases in a single trial. In this way, it explores more of the treatment's potential and saves cost and time compared with conducting separate trials on the different diseases. Another motivation for this type of design is to examine a common response (such as a biomarker response) across multiple diseases (tumors). The number of patients with a putative biomarker within a single disease is small, which makes it difficult to enroll an adequate number of patients in a conventional trial; a basket trial, which pools the responses to the same biomarker from patients with different diseases, makes the study feasible. The basic assumption underlying the basket trial is that the fundamental classification of disease is the response, not the disease type [5][6][7][8][9]. The disadvantage of this design is that inactive responses from patients with some diseases may dilute the pooled signal and cause the entire trial to fail. Thus, this type of trial has been used primarily in exploratory settings [10].
*Address correspondence to this author at the Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington DC 20057, USA; Tel: 202-687-0766; E-mail: yuanao@hotmail.com

The receiver operating characteristic (ROC) curve is a common tool for evaluating the performance of a test statistic. It is the plot of the true positive rate (TPR) (i.e., the probability of identifying a subject as diseased when the subject is truly diseased) versus the false positive rate (FPR) (i.e., the probability of identifying a subject as diseased when the subject is not diseased). It is widely used in biostatistics, medical diagnostic biomarkers, radiology, psychophysical and medical imaging research, military monitoring, and industrial quality control (Metz, 1978) [11]. The ROC curve shows the trade-off between the TPR and FPR under different thresholds, and so overcomes the limitation of using isolated measurements of TPR and FPR. It is plotted by connecting the points generated by all possible thresholds [12]. It has wide applications in biomedical research (for example, [13,14]).
Here we consider ROC analysis for the phase II basket clinical trial. Two settings are considered, parametric and nonparametric. The former is efficient when the parametric model is correctly specified but can be misleading if the model is incorrect; the latter is less efficient but robust, in that it remains valid whatever the true data-generating model is. Simulation studies suggest that the group sequential basket trial generally outperforms the group sequential trial in both the parametric and nonparametric settings, and that the nonparametric method gives a more accurate evaluation than the parametric one for moderate to large sample sizes.

2. THE METHOD
Group sequential (or multi-stage) designs are commonly used in phase II and III clinical trials to evaluate a new treatment against existing one(s) [15][16][17][18][19][20][21][22][23][24]. In contrast to non-sequential clinical trials, a group sequential trial can stop early, before the planned end, if an extreme outcome is detected at some intermediate stage. Thus, a group sequential basket design is natural for phase II clinical trials. A natural question is how to assess the performance of the group sequential basket trial against the classical group sequential trial. To our knowledge, no formal assessment has appeared in the literature; providing one is the goal of this study. Specifically, we use the receiver operating characteristic curve, a commonly used tool for assessing the performance of a test statistic, for our evaluations.
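In this study we concentrate on continuous responses. The trial has $K$ stages. At stage $k$, there are independent responses $X_{kij}$ for the $i$-th patient with disease type $j$, $j = 1, \ldots, d$; $i = 1, \ldots, n_k$ ($k = 1, \ldots, K$). Note that for fixed $(k, i)$, the vector $X_{ki} = (X_{ki1}, \ldots, X_{kid})$ has at least one non-zero entry, but not necessarily all entries non-zero, as each patient may not have all the diseases. For each fixed $j$, $n_{kj} > n_{k'j}$ for $k > k'$, where $n_{kj}$ is the number of patients with disease $j$ at stage $k$, and we assume that the non-zero $X_{kij}$'s are iid $N(\mu_j, \sigma_j^2)$. For a phase II clinical trial, the total number of patients $N = \sum_k \sum_j n_{kj}$ is often small (typically $10 < N < 100$), with $2 \le d \le 10$ and $2 \le K \le 10$. The hypothetical population mean response is $\mu = (\mu_1, \ldots, \mu_d)$. We are interested in testing the null hypothesis
$$H_0: \mu \le \mu_0 \quad \text{vs} \quad H_1: \mu \ge \mu_0 + \delta \quad (\text{componentwise}),$$
where $\mu_0 = (\mu_{01}, \ldots, \mu_{0d})$ is the given vector of threshold values for the responses to be effective, and $\delta = (\delta_1, \ldots, \delta_d)$ is the vector of pre-specified meaningful differences for each of the diseases. Although the observations $X_{kij}$ are independent, the diseases themselves are dependent via shared common factor(s), for example, the common marker(s) that brought the patients to the trial. Denote $X = (X_1, \ldots, X_d)$, and let $Y = (Y_1, \ldots, Y_d)$ be an iid copy of it but without zero entries. We use a frailty model for the dependence among the disease responses. Let $W$ be the shared common factor of the diseases, with distribution $G$; we assume that, conditional on $W$, the responses from the different diseases are independent, so the joint density of $Y$ is
$$f(y_1, \ldots, y_d) = \int \prod_{j=1}^{d} f_j(y_j \mid w)\, dG(w).$$
In particular, we assume $(Y_j \mid W = w) \sim N(\mu_j + \beta w, \sigma_j^2)$, where $\beta$ is a parameter to be estimated, and $W \sim N(\nu, \tau^2)$, where $(\nu, \tau^2)$ can either be obtained from prior studies or estimated from the current data. Under a positivity condition on the parameters for each $j = 1, \ldots, d$, $Y$ is then multivariate normal; its mean, its covariance matrix $\Omega$, and related notation are given in the Appendix. Thus we can assume $\mathrm{Cov}(Y) = \Omega$, the Pearson covariance matrix. Now let $\Omega$ be the given covariance matrix. Typically the test statistics $T_k = (T_{k1}, \ldots, T_{kd})$ ($k = 1, \ldots, K$) are obtained by applying $\Omega^{-1/2}$ to the vector of standardized stage-$k$ disease-specific statistics. Thus, at the right boundary of the hypotheses, $T_k$ is multivariate normal, with a mean vector determined by $\Omega^{-1/2}$ and the standardized differences, and with a $d$-dimensional diagonal covariance matrix.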
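The frailty construction can be illustrated numerically. The sketch below is a minimal Python/NumPy example, assuming the specific form $Y_j = \mu_j + \beta W + \varepsilon_j$ with $W \sim N(\nu, \tau^2)$ and independent $\varepsilon_j \sim N(0, \sigma_j^2)$; this parameterization and all parameter values are illustrative assumptions, not the formulas of the Appendix. Under it, the marginal covariance is $\beta^2\tau^2\mathbf{1}\mathbf{1}^\top + \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$, and the simulation checks this against the sample covariance. The helper names `frailty_covariance` and `simulate_frailty` are hypothetical.

```python
import numpy as np


def frailty_covariance(beta, tau2, sigma2):
    """Marginal Cov(Y) under the assumed model Y_j | W = w ~ N(mu_j + beta * w, sigma_j^2),
    independent given W, with W ~ N(nu, tau2): beta^2 * tau2 * 11' + diag(sigma^2)."""
    sigma2 = np.asarray(sigma2, dtype=float)
    d = sigma2.size
    return beta**2 * tau2 * np.ones((d, d)) + np.diag(sigma2)


def simulate_frailty(n, mu, beta, nu, tau2, sigma2, rng):
    """Draw n response vectors Y from the assumed shared-frailty model."""
    mu = np.asarray(mu, dtype=float)
    sd = np.sqrt(np.asarray(sigma2, dtype=float))
    w = rng.normal(nu, np.sqrt(tau2), size=(n, 1))      # one shared factor per patient
    eps = rng.normal(0.0, 1.0, size=(n, mu.size)) * sd  # independent disease-specific noise
    return mu + beta * w + eps                           # broadcasts to shape (n, d)


# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(2024)
mu, beta, nu, tau2, sigma2 = [0.1, 0.2, 0.3], 0.5, 0.0, 0.4, [0.3, 0.4, 0.5]
Y = simulate_frailty(200_000, mu, beta, nu, tau2, sigma2, rng)
print(np.round(np.cov(Y, rowvar=False), 3))
print(frailty_covariance(beta, tau2, sigma2))
```

A single shared factor induces the same covariance $\beta^2\tau^2$ between every pair of diseases; disease-specific correlations would require more than one frailty term.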
In practice, it is also of interest to test the effect of the treatment on each disease type, which can be formulated as
$$H_{0,j}: \mu_j \le \mu_{0j} \quad \text{vs} \quad H_{1,j}: \mu_j > \mu_{1j} = \mu_{0j} + \delta_j, \qquad j = 1, \ldots, d.$$
If $H_{0,j}$ is rejected, we conclude that the treatment is likely to be effective for disease $j$.

To test $H_{0,j}$ vs $H_{1,j}$, a simple way is to use the statistic $\bar X_{kj} = n_{kj}^{-1} \sum_{i} X_{kij}$ at the $k$-th interim stage ($k = 1, \ldots, K$) for disease type $j$ ($j = 1, \ldots, d$). However, this is the classical trial, not the basket trial. In the latter, we want to use the information across all the diseases to test each single hypothesis.
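For this, at the $k$-th stage we test $H_{0,j}$ vs $H_{1,j}$ ($j = 1, \ldots, d$) to see which disease types are significant and which are not. Let $S_{kj} = \sum_i X_{kij}$ ($j = 1, \ldots, d$). To borrow information from all the disease types, let $S_k = \sum_j S_{kj}$ and let $\bar X_k$ be the corresponding pooled mean; we then base the test for disease $j$ on its statistic conditional on these pooled quantities. The conditional variance equals the unconditional variance of the disease-$j$ statistic minus a nonnegative correction for the information it shares with the pooled statistics. The level-$\alpha$ cut-off point is obtained accordingly; it depends on the pooled statistics and is therefore random.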
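To make the conditioning step concrete, here is a minimal sketch of how a disease-specific sample mean can be conditioned on the pooled sample mean under joint normality, using the standard formulas $E(A \mid B = b) = E A + \frac{\mathrm{Cov}(A,B)}{\mathrm{Var}(B)}(b - E B)$ and $\mathrm{Var}(A \mid B) = \mathrm{Var}(A) - \mathrm{Cov}(A,B)^2/\mathrm{Var}(B)$. It assumes $n_j$ independent $N(\mu_j, \sigma_j^2)$ responses for disease $j$ and no shared frailty term, so it illustrates the mechanism rather than reproducing the exact conditional statistic used here; the function names are hypothetical. In this simplified setting the conditional variance is $\sigma_j^2/n_j - \sigma_j^4/\sum_i n_i\sigma_i^2$, never larger than $\sigma_j^2/n_j$.

```python
def conditional_normal(mean_a, mean_b, var_a, var_b, cov_ab, b):
    """Mean and variance of A given B = b when (A, B) is bivariate normal."""
    slope = cov_ab / var_b
    return mean_a + slope * (b - mean_b), var_a - slope * cov_ab


def disease_mean_given_pooled(j, mu, sigma2, n, pooled_mean):
    """Condition the disease-j sample mean on the pooled sample mean of all diseases.

    Hypothetical setting: n[j] independent N(mu[j], sigma2[j]) responses for disease j,
    independent across diseases (no shared frailty term)."""
    N = sum(n)
    mean_a = mu[j]                                            # E[Xbar_j]
    mean_b = sum(nj * mj for nj, mj in zip(n, mu)) / N        # E[pooled mean]
    var_a = sigma2[j] / n[j]                                  # Var(Xbar_j)
    var_b = sum(nj * s2 for nj, s2 in zip(n, sigma2)) / N**2  # Var(pooled mean)
    cov_ab = sigma2[j] / N                                    # Cov(Xbar_j, pooled mean)
    return conditional_normal(mean_a, mean_b, var_a, var_b, cov_ab, pooled_mean)


# Hypothetical example: three diseases, ten patients each, unit variances.
mu, sigma2, n = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [10, 10, 10]
m, v = disease_mean_given_pooled(0, mu, sigma2, n, pooled_mean=0.3)
print(m, v, sigma2[0] / n[0])
```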

If $H_{0,j}$ is rejected at stage $k$, the data on the $j$-th disease are removed, and the trial continues based on the remaining data.

Brief Review of ROC and AUC Analysis
The receiver operating characteristic (ROC) curve and the area under the curve (AUC) are common tools for assessing the performance of a test statistic. Here we derive the ROC and AUC for our case, after a brief review of these concepts. The ROC and AUC characterize the relationship between the true positive rate (TPR) and the false positive rate (FPR) of a test statistic $T$. TPR is the probability of classifying a subject as diseased when the subject is truly diseased, and FPR is the probability of classifying a subject as diseased when the subject is truly non-diseased. Let $D$ denote the disease status, with $D = 1$ for truly diseased and $D = 0$ for no disease. Let $F_0$ be the distribution function of $T$ under the null hypothesis of no disease, and $F_1$ that of $T$ under the alternative. Given a threshold value $c$,
$$\mathrm{TPR}(c) = P(T > c \mid D = 1) = 1 - F_1(c), \qquad \mathrm{FPR}(c) = P(T > c \mid D = 0) = 1 - F_0(c),$$
and the ROC curve is defined as
$$\mathrm{ROC}(t) = 1 - F_1\big(F_0^{-1}(1 - t)\big), \qquad t \in [0, 1].$$
Typically, $T$ is constructed such that large values of $T$ correspond to evidence in favor of the alternative, so $F_1(c) \le F_0(c)$ for all $c$, and as a result $\mathrm{ROC}(t) \ge t$. The AUC is defined as the mean value of the ROC curve,
$$\mathrm{AUC} = \int_0^1 \mathrm{ROC}(t)\, dt = P(T_1 > T_0),$$
where $T_0 \sim F_0$ and $T_1 \sim F_1$ are independent. The last equality follows from the change of variable $t = 1 - F_0(c)$:
$$\int_0^1 \mathrm{ROC}(t)\, dt = \int_{-\infty}^{\infty} \big[1 - F_1(c)\big]\, dF_0(c) = P(T_1 > T_0).$$
If we have two treatments, then we have two ROC curves, and their performances are evaluated by comparing their ROC curves. A larger AUC corresponds to a better test (treatment).
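The identity $\mathrm{AUC} = P(T_1 > T_0)$ gives a direct estimator: the fraction of (null, alternative) pairs in which the alternative score is larger, with ties counted as one half (the Mann-Whitney form). The Python sketch below is illustrative only; the simulated scores and the function name are hypothetical. For normal scores with means 0 and 1 and unit variances, the estimate should be close to $\Phi(1/\sqrt{2}) \approx 0.76$.

```python
import numpy as np
from statistics import NormalDist


def auc_mann_whitney(t0, t1):
    """Empirical AUC = P(T1 > T0) + 0.5 * P(T1 = T0), averaged over all (null, alternative) pairs."""
    t0 = np.asarray(t0, dtype=float)[:, None]   # column: scores under D = 0
    t1 = np.asarray(t1, dtype=float)[None, :]   # row: scores under D = 1
    return float((t1 > t0).mean() + 0.5 * (t1 == t0).mean())


rng = np.random.default_rng(1)
t0 = rng.normal(0.0, 1.0, size=2000)   # hypothetical scores under D = 0
t1 = rng.normal(1.0, 1.0, size=2000)   # hypothetical scores under D = 1
print(auc_mann_whitney(t0, t1), NormalDist().cdf(1.0 / 2**0.5))
```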

Parametric ROC
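In many clinical trial studies, parametric models are used for hypothesis testing and ROC analysis. In particular, the normal model is commonly used: $T_0 \sim N(m_0, s_0^2)$ and $T_1 \sim N(m_1, s_1^2)$. Let $\Phi(\cdot)$ be the distribution function of $N(0,1)$ and $\Phi^{-1}(\cdot)$ its inverse, or quantile, function. Write $F_0(c) = 1 - t$, or $c = m_0 + s_0\Phi^{-1}(1 - t)$; then
$$\mathrm{ROC}(t) = 1 - \Phi\left(\frac{m_0 - m_1 + s_0\Phi^{-1}(1 - t)}{s_1}\right) = \Phi\left(\frac{m_1 - m_0 + s_0\Phi^{-1}(t)}{s_1}\right), \qquad \mathrm{AUC} = \Phi\left(\frac{m_1 - m_0}{\sqrt{s_0^2 + s_1^2}}\right).$$
The second equality for the ROC holds because, for a standard normal random variable $Z$, $P(Z > z) = 1 - \Phi(z) = \Phi(-z)$ and $\Phi^{-1}(1 - t) = -\Phi^{-1}(t)$; the AUC follows since $T_1 - T_0 \sim N(m_1 - m_0, s_0^2 + s_1^2)$.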
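The normal (binormal) formulas can be checked numerically. The sketch below, assuming Python 3.8+ for `statistics.NormalDist`, evaluates the normal ROC curve and compares a midpoint-rule integral of it with the closed-form AUC; the parameter values and function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def binormal_roc(t, m0, s0, m1, s1):
    """ROC(t) = Phi((m1 - m0 + s0 * Phi^{-1}(t)) / s1), for 0 < t < 1."""
    return STD.cdf((m1 - m0 + s0 * STD.inv_cdf(t)) / s1)


def binormal_auc(m0, s0, m1, s1):
    """AUC = P(T1 > T0) = Phi((m1 - m0) / sqrt(s0^2 + s1^2))."""
    return STD.cdf((m1 - m0) / (s0**2 + s1**2) ** 0.5)


# Midpoint-rule integral of the ROC curve as a numerical check of the AUC formula.
m = 20_000
params = (0.0, 1.0, 0.8, 1.5)  # hypothetical (m0, s0, m1, s1)
numeric_auc = sum(binormal_roc((i + 0.5) / m, *params) for i in range(m)) / m
print(numeric_auc, binormal_auc(*params))
```

Midpoints are used because $\Phi^{-1}$ is undefined at $t = 0$ and $t = 1$.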
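The ROC curve of the basket trial for the $j$-th disease at the $k$-th interim analysis is the normal ROC above, evaluated at the means and standard deviations of the conditional statistic. In contrast, the classical trial uses the data for each disease $j$ separately, without conditioning on the pooled statistics, and the corresponding ROC curve for the $j$-th disease at the $k$-th interim analysis is the normal ROC evaluated at the unconditional means and standard deviations of the disease-specific statistic. Since the conditional standard deviations under the null and the alternative are no larger than the corresponding unconditional ones, the ROC curve of the basket group sequential trial is expected to perform better than that of the classical group sequential trial.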

Nonparametric ROC
When a specific parametric model is not justified for the observed data, the nonparametric method is preferred for robustness. In this case, the ROC curve is constructed from empirical distribution functions. However, unlike in the classical clinical trial, in the basket trial we need to borrow information from the data on the other diseases to construct the empirical distribution for each disease. To incorporate such shared side information, we use the method of empirical likelihood.
We consider the set-up for empirical likelihood (EL) as in Qin and Lawless (1994) [25]. We first give a general description of the method. Suppose the side information can be incorporated into the EL through an $r$-dimensional known function $g(x) = (g_1(x), \ldots, g_r(x))^\top$ via the relationship
$$E_F[g(X)] = 0,$$
where $E_F[\cdot]$ denotes the expectation with respect to the data distribution function $F$. Since the EL estimate is a nonparametric maximum likelihood estimate of the distribution function, possibly under constraints, it is a step function with jumps at the data points, so we can set $p_i = dF(x_i)$ ($i = 1, \ldots, n$). The EL is defined as
$$L(F) = \prod_{i=1}^{n} p_i,$$
where the $p_i$'s are the nonparametric maximum likelihood estimates of the empirical masses assigned to the observations $x_i$. With the side-information constraints, the EL is
$$\max_{p_1, \ldots, p_n} \prod_{i=1}^{n} p_i \quad \text{subject to} \quad p_i \ge 0, \quad \sum_{i=1}^{n} p_i = 1, \quad \sum_{i=1}^{n} p_i\, g(x_i) = 0.$$
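For a single scalar constraint, this maximization has the well-known Lagrange-multiplier solution $p_i = 1/\{n(1 + \lambda g(x_i))\}$, where $\lambda$ solves $\sum_i g(x_i)/\{1 + \lambda g(x_i)\} = 0$. The Python sketch below finds $\lambda$ by bisection, which works because the left-hand side is strictly decreasing on the interval where every $1 + \lambda g(x_i) > 0$. It is a minimal illustration assuming a scalar $g$ and requiring $0$ to lie strictly inside the range of the $g(x_i)$; the multivariate case needs a vector root-finder (for example, Newton's method), and `el_weights` is a hypothetical name.

```python
import numpy as np


def el_weights(g, tol=1e-12, max_iter=200):
    """EL masses p_i maximizing prod p_i subject to sum p_i = 1 and sum p_i g_i = 0 (scalar g)."""
    g = np.asarray(g, dtype=float)
    n = g.size
    if not (g.min() < 0.0 < g.max()):
        raise ValueError("0 must lie strictly inside the range of g(x_i)")
    lo, hi = -1.0 / g.max(), -1.0 / g.min()   # open interval keeping 1 + lam * g_i > 0

    def h(lam):
        return np.sum(g / (1.0 + lam * g))    # strictly decreasing in lam

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid                          # root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 1.0 / (n * (1.0 + lam * g))


# Hypothetical side information: the mean of the distribution is known to be 3.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
p = el_weights(x - 3.0)
print(p, p.sum(), p @ x)
```

When the known mean equals the sample mean, the constraint is already satisfied, $\lambda = 0$, and the weights reduce to $1/n$.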
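Then we define the nonparametric ROC curve for the basket trial, for the $j$-th disease at stage $k$, based on the weighted empirical distribution functions, as
$$\widehat{\mathrm{ROC}}_{kj}(t) = 1 - \hat F_{1,kj}\big(\hat F_{0,kj}^{-1}(1 - t)\big),$$
where $\hat F_{0,kj}$ and $\hat F_{1,kj}$ are the empirical distribution functions weighted by the empirical likelihood masses $\hat p_i$. In contrast, the nonparametric ROC curve for the classical trial, for the $j$-th disease at stage $k$, is
$$\widetilde{\mathrm{ROC}}_{kj}(t) = 1 - G_{1,kj}\big(G_{0,kj}^{-1}(1 - t)\big),$$
with $G_{0,kj}$ and $G_{1,kj}$ the ordinary (unweighted) empirical distribution functions of the two samples for disease $j$ at stage $k$. Since $\widehat{\mathrm{ROC}}_{kj}(\cdot)$ is constructed with auxiliary information from the data on all the diseases, it is expected to perform better than its counterpart $\widetilde{\mathrm{ROC}}_{kj}(\cdot)$, which uses no such information.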
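A weighted empirical ROC curve can be computed directly from sorted data and cumulative weights. In the minimal sketch below, the weights are assumed to be given (for instance, as empirical-likelihood masses), $F_0^{-1}(u)$ is taken as the smallest support point whose cumulative weight reaches $u$, and setting all weights equal gives the ordinary empirical ROC used for the classical trial. Function names and data are hypothetical.

```python
import numpy as np


def weighted_ecdf(x, w):
    """Sorted support points and normalized cumulative weights of sum_i w_i I(x_i <= x)."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    cw = np.cumsum(np.asarray(w, dtype=float)[order])
    return xs, cw / cw[-1]


def weighted_roc(x0, w0, x1, w1, t):
    """ROC(t) = 1 - F1(F0^{-1}(1 - t)) with weighted empirical F0 (null) and F1 (alternative)."""
    xs0, cw0 = weighted_ecdf(x0, w0)
    xs1, cw1 = weighted_ecdf(x1, w1)
    u = 1.0 - np.asarray(t, dtype=float)
    # F0^{-1}(u): smallest support point whose cumulative weight reaches u
    idx = np.minimum(np.searchsorted(cw0, u, side="left"), xs0.size - 1)
    c = xs0[idx]
    # F1(c): total weight of alternative-sample points <= c (0 if there are none)
    k = np.searchsorted(xs1, c, side="right")
    return 1.0 - np.concatenate(([0.0], cw1))[k]


x0, x1 = [1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]
equal = [1.0, 1.0, 1.0, 1.0]   # equal weights: ordinary empirical distribution
print(weighted_roc(x0, equal, x1, equal, [0.0, 0.25, 0.5]))
```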

SIMULATION STUDY
In this section, we conduct simulation studies to compare the ROC curves of the basket sequential clinical trial with those of the classical sequential trial, for both the parametric and nonparametric cases.
Table 1 lists three mean response vectors for the treatment arm, with a small, a moderate and a large gap, respectively, relative to the mean response of the control group. The fifth row gives the sample size vector of the treatment arm in stage one, and the sixth row that of the control arm in stage one; the sample size vectors of the treatment and control groups in stage two are given analogously. The ROCs for the data in Table 1 are plotted in Figures 1-3. The solid line is the ROC for the basket trial, and the dashed line is the ROC for the independent (classical) trial. The diagonal line is drawn for reference only.
Figure 1 shows the ROCs of the basket and classical trials for the five simulated diseases with large mean differences (gaps) between case and control. The AUCs are given in brackets below each panel, with the number on the left being the AUC for the basket trial and the number on the right that for the classical trial. The ROC of the former is clearly higher overall, and its AUC is larger.
Figure 2 shows the ROCs of the basket and classical trials for the five simulated diseases with small mean differences. In this case the ROCs for the two types of trials are close, and neither is significant.
Figure 3 shows the ROCs of the two types of trials for the five simulated diseases with moderate mean differences. In this case, the ROCs show that the advantage of the basket trial is apparent.
Overall, the three figures for five diseases show that the basket design, with side information from all the disease data, performs better than the traditional design, as indicated by its larger AUC. Meanwhile, the plot with the large gap shows the clearest difference among the three, which means that the advantage becomes more apparent as the gap between the treatment arm and the control group increases. Generally speaking, the ROC curves behave better in stage one than in stage two in all three plots. More specifically, in Figure 1 the largest difference between the AUCs is about 0.166 and the smallest is about 0.011. In Figure 2, the largest difference is 0.032 and the smallest is about 0.002. In Figure 3, the largest difference is about 0.101 and the smallest is about 0.002. Below we perform a simulation with nine diseases; the setup is shown in Table 2, and the corresponding ROCs are shown in Figures 4-5. The advantage of the basket trial over the classical one is again apparent, and the conclusions are similar to those for the five-disease case.

Now we use the setup in Table 1 to examine the performance of the nonparametric ROCs. In Figure 6, for a fixed disease, we compare the empirical ROC (dashed step function) with the parametric normal ROC (dotted line) and the true ROC (solid line) obtained from the true underlying distributions. The sample sizes for case and control are given below the figures. For the small sample sizes (10, 45), the advantage of the empirical ROC is not clear. As the sample size increases, the empirical ROC becomes closer and closer to the true one, while the ROC from the normal model gives a biased estimate of the true ROC.
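The qualitative pattern described for Figure 6 can be reproduced in a toy setting. The sketch below is not the simulation behind Figure 6: it assumes lognormal responses, $\log T_0 \sim N(0, 1)$ and $\log T_1 \sim N(1, 1)$, for which the true ROC is $\Phi(1 + \Phi^{-1}(t))$, and compares the maximum error of the empirical ROC with that of a normal (binormal) fit on the raw scale for a small and a large hypothetical sample size. With the larger sample, the empirical error should shrink, while the misspecified normal fit stays biased. Function names and settings are hypothetical.

```python
import numpy as np
from statistics import NormalDist

STD = NormalDist()


def empirical_roc(x0, x1, t):
    """Empirical ROC: threshold at the ceil((1 - t) * n0)-th order statistic of x0, so FPR <= t."""
    xs0 = np.sort(x0)
    idx = np.clip(np.ceil((1.0 - t) * xs0.size).astype(int) - 1, 0, xs0.size - 1)
    c = xs0[idx]
    return (x1[None, :] > c[:, None]).mean(axis=1)


def normal_fit_roc(x0, x1, t):
    """Binormal ROC with means and SDs estimated from the raw data (misspecified here)."""
    m0, s0, m1, s1 = x0.mean(), x0.std(ddof=1), x1.mean(), x1.std(ddof=1)
    return np.array([STD.cdf((m1 - m0 + s0 * STD.inv_cdf(ti)) / s1) for ti in t])


def true_roc(t):
    """Assumed truth: log T0 ~ N(0, 1), log T1 ~ N(1, 1), so ROC(t) = Phi(1 + Phi^{-1}(t))."""
    return np.array([STD.cdf(1.0 + STD.inv_cdf(ti)) for ti in t])


rng = np.random.default_rng(7)
t = np.linspace(0.05, 0.95, 91)
errors = {}
for n in (50, 5000):
    x0 = np.exp(rng.normal(0.0, 1.0, size=n))   # hypothetical control responses
    x1 = np.exp(rng.normal(1.0, 1.0, size=n))   # hypothetical case responses
    errors[n] = (np.max(np.abs(empirical_roc(x0, x1, t) - true_roc(t))),
                 np.max(np.abs(normal_fit_roc(x0, x1, t) - true_roc(t))))
    print(n, errors[n])   # (max error of empirical ROC, max error of normal-fit ROC)
```

The empirical ROC depends on the data only through their ranks, so it is unaffected by the skewness that distorts the normal fit.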
In Figure 7, we compare the empirical ROCs (dashed line) for the basket trial with side information incorporated, given in (10)-(12), with the empirical ROCs (solid line) from the classical trial. Again, the ROCs from the basket trial outperform those of the classical trial.

A Hypothetical Basket Clinical Trial
A basket clinical trial was conducted by Hyman et al. (2015) [28] to evaluate the efficacy of an oral therapeutic treatment, Vemurafenib. The drug is an inhibitor of the BRAF V600E enzyme and can improve the survival rate of melanoma patients whose cancer has a BRAF V600E mutation. To investigate the treatment effect in patients with nonmelanoma cancers, the trial enrolled 122 such patients. The study included patients with non-small-cell lung cancer (NSCLC), ovarian cancer, colorectal cancer, cholangiocarcinoma, breast cancer, multiple myeloma, and cancers with any other BRAF V600 mutation. Among these groups, the three most common disease groups are colorectal cancer (CC) with 37 patients, NSCLC with 20 patients, and Erdheim-Chester disease (ECD) or Langerhans cell histiocytosis (LCH) with 18 patients.
The original trial used survival rates as the patients' final outcomes. Outcomes such as the percentage of tumor shrinkage may capture more information, so that the trial may be better powered. Since the normal approximation is reasonable for percentages, we use the percentage of tumor shrinkage as the patients' outcome in this hypothetical example. According to a retrospective review by Dadu et al. (2014) [29], most patients treated with Vemurafenib had tumor shrinkage ranging from 3% to 58%. We consider the three most common disease groups with pre-specified sample sizes; here $d = 3$. We assume a two-stage group sequential design with equal allocation between treatment and control for each disease. In the first stage, tumor shrinkage is observed from 10 CC patients treated with Vemurafenib and 9 control CC patients. Similarly, the numbers of patients observed at the first stage for the other two diseases are 5 and 5 for NSCLC, and 5 and 5 for ECD or LCH. The numbers of patients at the second stage for these three diseases are 9 and 8 for CC, 5 and 5 for NSCLC, and 4 and 4 for ECD or LCH. We are interested in testing the null hypothesis $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$, where the mean tumor shrinkage under treatment, $\mu_1 = (0.10, 0.20, 0.58)$, is taken from the aforementioned range for patients treated with Vemurafenib. Here we let $\mu_0 = 0$. The tumor shrinkage is assumed to follow a multivariate normal distribution $N(\mu, \Omega)$. With the frailty coefficient set to 0, the remaining quantities follow from the formula in the Appendix. The diagonal elements of the covariance matrix $\Omega$, calculated from the variance expression for percentages, are (0.3, 0.4, 0.5) for the three diseases, and the correlation between any two of the three diseases within the same patient is assumed to be 0.1. The hypothetical example dataset was simulated from this multivariate normal distribution.
With the normal distribution assumption, we apply the parametric ROC method to the simulated dataset. To borrow information from all the disease types, the values $S = (0.121, 0.247)$ and $\bar X = (0.253, 0.242)$ of the pooled statistics are obtained from the simulated dataset; the distribution of the conditional statistic then follows by substituting these values into the conditional statistic defined in Section 2.

CONCLUDING REMARKS
We evaluated the performance of the basket trial and the classical trial by comparing their ROCs under both a parametric normal model and a nonparametric model. Simulation studies show that the former is generally better, with higher ROCs. We find that when the total sample sizes for case and control are small, say no bigger than (100, 100), the parametric method is preferable; when the sample sizes are (100, 100) or bigger, the nonparametric ROC gives a better fit to the true ROC, while the normal model can give biased results.