Establishing Non-Inferiority of a New Treatment in a Three-Arm Trial : Apply a Step-Down Hierarchical Model in a Papulopustular Acne Study and an Oral Prophylactic Antibiotics Study

Clinical trials comparing a test treatment with an active control therapy have become very popular in drug and medical device development in the last decade. An active-controlled trial without a placebo, however, poses major challenges in design, analysis, and interpretation, such as the determination of the non-inferiority margin and the constancy assumption. When there are no ethical concerns, comparing a test drug with placebo usually provides the most convincing proof of the efficacy of a new treatment. Therefore, when it is ethically justifiable, as in a papulopustular acne study or an oral prophylactic antibiotics study, it may be advisable to conduct a three-arm trial that includes a placebo, an active control, and the new treatment. In this paper, we propose a statistical methodology for a three-arm non-inferiority trial with binary outcomes. We adopt step-down hierarchical hypotheses and give a three-step testing procedure that is more realistic for conducting a clinical trial. We derive an optimal sample size allocation rule that minimizes the total sample size, and hence shortens the duration of the trial, in an ethical and reliable manner. Real examples from a papulopustular acne study and an oral prophylactic antibiotics study are used to demonstrate our methodology.


INTRODUCTION
The main purpose of a clinical trial is to demonstrate the efficacy of a new treatment. Many investigators adopt an active-controlled non-inferiority clinical trial without a placebo because of ethical concerns. Such a trial, however, poses major challenges in design, analysis, and interpretation, such as the determination of the non-inferiority margin and the constancy assumption. Because of the absence of a placebo arm, one cannot assert directly that the test treatment is superior to a placebo. Moreover, many researchers may not pay attention to verifying assay sensitivity, that is, to showing that the active control is better than the placebo in the current trial. When a placebo arm is included in the non-inferiority trial, if ethically justifiable, assay sensitivity can be verified concurrently and the issues discussed above do not arise.
Several studies have presented useful ideas for non-inferiority trial designs. Some of them suggest including a placebo arm in a non-inferiority trial when ethically justifiable (see [1, 2]) and propose statistical methodologies for such a design; these include Pigeot et al. [3], Koch and Röhmel [4], and Hauschke and Pigeot [5] for continuous outcomes; Tang and Tang [6], Kieser and Friede [7], and Hasler [8] for binary outcomes; Mielke et al. [9] for survival data; and Kombrink et al. [10] for censored time-to-event data. A three-arm trial including placebo, active control, and test drug is referred to as the gold standard design [5]. In this design, the hypotheses can be formulated more precisely, the non-inferiority of the test treatment to an active control can be verified, and the efficacy of the test treatment can be assessed directly. Tang and Tang [6] proposed sample size allocation rules for a three-arm clinical trial with binary outcomes based on the rate difference. However, they did not consider optimal sample size determination. Kieser and Friede [7] derived approximate sample size formulas for each patient group and proposed a complete two-step test procedure. Koch and Röhmel [4] and Hauschke and Pigeot [5] suggested comparing the test treatment with a placebo in the first step. Emphasizing the importance of comparing the test treatment with a placebo, they indicated that nothing can rescue such a trial if the superiority of the experimental treatment over a placebo cannot be shown. Hence, we propose a testing procedure with hierarchical hypotheses based on Koch and Röhmel's [4] suggestion and derive an optimal sample size determination for a three-arm trial.
The outline of this paper is as follows. In Section 2, we present the models and hypotheses for a three-arm gold standard design. Null variance estimation based on two methods is proposed, and an optimal sample size allocation is also given. Simulation results for the type I error rate and power are presented in Section 3. We apply the proposed method to a papulopustular acne study and an oral prophylactic antibiotics study in Section 4. Finally, we conclude with a discussion in Section 5.

Model and Hypotheses
In this article, we propose a statistical method for binary outcomes, such as improvement/no improvement and remission/no remission. We consider the primary clinical outcomes under a placebo, an active control, and a test treatment ($X_P$, $X_C$, and $X_T$, respectively) as independent and binomially distributed variables. That is, we assume that $X_k \sim B(n_k, \pi_k)$, where the success rate $\pi_k$ represents the unknown true response probability and $n_k$ is the sample size, for $k = P, C,$ and $T$.
Under the gold standard three-arm design, we adopted step-down hierarchical hypotheses for binomial outcomes. In the first step, we compared the test treatment with the placebo by the following hypothesis:
$$H_{01}: \pi_T - \pi_P \le 0 \quad \text{versus} \quad H_{11}: \pi_T - \pi_P > 0. \tag{1}$$
If the superiority of the experimental treatment over the placebo cannot be shown, nothing can rescue such a trial. Thus, it is reasonable to compare the test treatment with the placebo in the first step. If $H_{01}$ is rejected, we claimed that the test treatment is superior to the placebo and executed the second-step procedure.
In the second step, we compared the active control with the placebo by the following hypothesis:
$$H_{02}: \pi_C - \pi_P \le 0 \quad \text{versus} \quad H_{12}: \pi_C - \pi_P > 0. \tag{2}$$
Similarly, we claimed that the active control is superior to the placebo if $H_{02}$ is rejected. Consequently, assay sensitivity was established.
After both hypotheses $H_{01}$ and $H_{02}$ were rejected, the non-inferiority hypothesis for the test treatment versus the active control was assessed at level $\alpha$ with a pre-specified non-inferiority margin $\delta$ ($\delta > 0$). In other words, we wanted to ensure $\pi_T - \pi_C > -\delta$. The margin $\delta$ can be a function of the difference between the response probabilities $\pi_C$ and $\pi_P$ [3], that is, $\delta = (1-\theta)(\pi_C - \pi_P)$, where $\theta$ is a pre-specified fixed fraction of the active control effect. Koch and Tangen [11] mentioned that the reasonable region for a non-inferiority test is $\theta$ between 0.5 and 0.99. Therefore, the null hypothesis could be simply rewritten as
$$H_{03}: \frac{\pi_T - \pi_P}{\pi_C - \pi_P} \le \theta \quad \text{versus} \quad H_{13}: \frac{\pi_T - \pi_P}{\pi_C - \pi_P} > \theta. \tag{3}$$
If $H_{03}$ is successfully rejected for a given $\theta$, we claim that the test treatment retains more than $\theta \times 100\%$ of the efficacy of the active control compared with the placebo. Therefore, the non-inferiority of the test treatment to the active control is declared.
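The link between the margin on the difference scale and the retained fraction of the control effect can be checked numerically. The sketch below assumes the sign convention $\delta = (1-\theta)(\pi_C - \pi_P)$ with $\delta > 0$ and $\pi_C > \pi_P$; the function names and the numerical rates are hypothetical and chosen only for illustration.

```python
def margin(theta, pi_c, pi_p):
    """Non-inferiority margin delta = (1 - theta) * (pi_c - pi_p)."""
    return (1.0 - theta) * (pi_c - pi_p)


def retained_fraction(pi_t, pi_c, pi_p):
    """Fraction of the control-versus-placebo effect kept by the test arm."""
    return (pi_t - pi_p) / (pi_c - pi_p)


# Illustrative true rates (hypothetical, not taken from either study).
pi_p, pi_c, pi_t, theta = 0.10, 0.80, 0.70, 0.8

delta = margin(theta, pi_c, pi_p)

# Difference form of the alternative: pi_t - pi_c > -delta.
difference_form = (pi_t - pi_c) > -delta
# Ratio form of the alternative: retained fraction > theta.
ratio_form = retained_fraction(pi_t, pi_c, pi_p) > theta

print(delta, difference_form, ratio_form)
```

Both forms give the same decision whenever $\pi_C > \pi_P$, which is why the second step (assay sensitivity) precedes the non-inferiority step.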

Statistical Hierarchical Test Procedures
We described the step-down hierarchical hypotheses in the previous section. According to the hierarchical testing procedures (Figure 1), the familywise error rate (FWER) could be controlled at the same level, $\alpha$.
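The fixed-sequence logic can be sketched in a few lines. This is a minimal illustration that assumes one-sided p-values for the three steps are already available; the function names and the example p-values are hypothetical.

```python
def step_down(p_values, alpha=0.025):
    """Test H01, H02, H03 in order, each at level alpha.

    Testing stops at the first hypothesis that is not rejected, so later
    hypotheses are never examined once an earlier step fails.
    """
    rejected = []
    for name, p in zip(("H01", "H02", "H03"), p_values):
        if p >= alpha:
            break
        rejected.append(name)
    return rejected


def noninferiority_declared(p_values, alpha=0.025):
    """Non-inferiority is claimed only if all three steps reject."""
    return step_down(p_values, alpha) == ["H01", "H02", "H03"]


print(step_down([0.001, 0.010, 0.020]))   # all three steps reject
print(step_down([0.001, 0.200, 0.0001]))  # stops after the first step
```

Because each hypothesis is tested only if all earlier ones were rejected, no multiplicity adjustment of the per-step level is needed to keep the FWER at $\alpha$.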
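Statistical test procedures for the hypotheses in (1) and (2) can be established by conventional methods [12]; we therefore focus on the testing procedure in the third step, which evaluates the non-inferiority of the test treatment to the active control. For further development, we rewrote (3) as $H_{03}: \psi(\boldsymbol{\pi}) \le 0$ versus $H_{13}: \psi(\boldsymbol{\pi}) > 0$ and let
$$\psi(\boldsymbol{\pi}) = \pi_T - \theta\pi_C - (1-\theta)\pi_P,$$
which is a linear combination of $\boldsymbol{\pi} = (\pi_T, \pi_C, \pi_P)$. We obtained the maximum likelihood estimator (MLE)
$$\psi(\hat{\boldsymbol{\pi}}) = \hat{\pi}_T - \theta\hat{\pi}_C - (1-\theta)\hat{\pi}_P, \qquad \hat{\pi}_k = X_k / n_k,$$
and the variance can be estimated by
$$\hat{\sigma}^2 = \frac{\tilde{\pi}_T(1-\tilde{\pi}_T)}{n_T} + \theta^2\,\frac{\tilde{\pi}_C(1-\tilde{\pi}_C)}{n_C} + (1-\theta)^2\,\frac{\tilde{\pi}_P(1-\tilde{\pi}_P)}{n_P},$$
where $\tilde{\pi}_k$ estimates $\pi_k$ under the null hypothesis. We evaluate $\tilde{\pi}_k$ in $\hat{\sigma}^2$ by using two commonly used methods under the null hypothesis (3).
Method I: The value of $\tilde{\pi}_k$ can be estimated by the observed value $\hat{\pi}_k$ [14], but this may fail under a null hypothesis of non-zero difference between groups [15].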
Method II: According to Farrington and Manning [15], the value of $\tilde{\pi}_k$ can be estimated by restricted maximum likelihood under the null hypothesis restriction $\pi_T = \theta\pi_C + (1-\theta)\pi_P$. The resulting third-degree likelihood equation in the rates $\pi_k$ can be solved following Miettinen and Nurminen [16]. The derivation is given in Appendix A.
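Therefore, we obtain the Wald statistic $T = \psi(\hat{\boldsymbol{\pi}}) / \hat{\sigma}$, the estimated contrast divided by its estimated standard error, which is asymptotically standard normally distributed for $\psi(\boldsymbol{\pi}) = 0$. Thus, the null hypothesis (3) was rejected if
$$T > z_{1-\alpha} \tag{6}$$
at the one-sided significance level $\alpha$, where $z_{1-\alpha}$ is the $100(1-\alpha)\%$ quantile of the standard normal distribution.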
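As an illustration of the third-step test, the sketch below computes a Wald statistic with the Method I (observed-rate) variance estimate. It assumes the contrast $\hat{\pi}_T - \theta\hat{\pi}_C - (1-\theta)\hat{\pi}_P$ implied by the retention form of the hypothesis; the function name and the counts are hypothetical, and the restricted maximum likelihood variance of Method II is not implemented here.

```python
import math
from statistics import NormalDist


def wald_statistic(x_t, n_t, x_c, n_c, x_p, n_p, theta):
    """Wald statistic for H03 using observed proportions (Method I)."""
    p_t, p_c, p_p = x_t / n_t, x_c / n_c, x_p / n_p
    # Estimated contrast: test rate minus the theta-weighted mix of
    # control and placebo rates.
    psi_hat = p_t - theta * p_c - (1.0 - theta) * p_p
    # Variance of the contrast with observed rates plugged in.
    var_hat = (
        p_t * (1.0 - p_t) / n_t
        + theta**2 * p_c * (1.0 - p_c) / n_c
        + (1.0 - theta) ** 2 * p_p * (1.0 - p_p) / n_p
    )
    return psi_hat / math.sqrt(var_hat)


alpha = 0.025
z_crit = NormalDist().inv_cdf(1.0 - alpha)

# Hypothetical counts: 80/100 responders on test, 80/100 on control,
# 10/100 on placebo, with retention fraction theta = 0.8.
t_stat = wald_statistic(80, 100, 80, 100, 10, 100, theta=0.8)
reject_h03 = t_stat > z_crit
print(t_stat, z_crit, reject_h03)
```

The function fails with a division by zero if all three observed rates are 0 or 1; a production implementation would need to handle that boundary case.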

Power and Optimal Sample Size Determinations
In this subsection, we formulate the power function of the Wald test and determine the sample size of the test treatment group necessary to achieve a desired power $1-\beta$. Following Chow et al. [12], sample size formulae for Steps 1 and 2 are established (see Appendix B). In Step 3, the power function of the test (6) is given by
$$P(T > z_{1-\alpha} \mid H_{13}), \tag{7}$$
and the power in (7) is approximated by a normal probability, (8), in which $\Phi$ denotes the cumulative distribution function of the standard normal distribution.
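One standard way to arrive at a normal approximation of this kind is sketched below. The symbols are assumptions introduced here for illustration: $\psi = \pi_T - \theta\pi_C - (1-\theta)\pi_P$ is the true contrast, $\hat{\psi}$ is its estimate, $\sigma_1$ is the standard deviation of $\hat{\psi}$ at the true rates, and $\sigma_0$ is the limiting value of the null-restricted standard error used in the test statistic. The published expression may be arranged differently.

```latex
\begin{aligned}
1-\beta
  &\approx P\left(\hat{\psi} > z_{1-\alpha}\,\sigma_0\right) \\
  &= P\left(\frac{\hat{\psi}-\psi}{\sigma_1}
       > \frac{z_{1-\alpha}\,\sigma_0-\psi}{\sigma_1}\right) \\
  &\approx \Phi\left(\frac{\psi - z_{1-\alpha}\,\sigma_0}{\sigma_1}\right)
\end{aligned}
```

Under Method I the two standard deviations coincide ($\sigma_0 = \sigma_1 = \sigma$), and the approximation simplifies to $\Phi(\psi/\sigma - z_{1-\alpha})$.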
Assume that an optimal sample size allocation for the test treatment group, the active control group, and the placebo group can be expressed as $n_T : n_C : n_P = 1 : C_C : C_P$. According to (8), we determined the sample size by requiring the approximate power of observing $T > z_{1-\alpha}$ to be at least the desired level $1-\beta$. This led to a simplification of the procedure in which $n_T$ is determined as the smallest value fulfilling the resulting inequality (9). According to Method I, inequality (9) can be reduced to
$$n_T \ge \frac{(z_{1-\alpha} + z_{1-\beta})^2 \left[\pi_T(1-\pi_T) + \theta^2\pi_C(1-\pi_C)/C_C + (1-\theta)^2\pi_P(1-\pi_P)/C_P\right]}{\left[\pi_T - \theta\pi_C - (1-\theta)\pi_P\right]^2}. \tag{10}$$
In general, the choice of $C_C$ and $C_P$ may be made by clinicians or investigators at the design stage of a clinical trial. Given the values of $C_C$ and $C_P$, we determined the total sample size $N$ with
$$N = n_T (1 + C_C + C_P). \tag{11}$$
To minimize the total sample size $N$, the optimal values of $C_C$ and $C_P$ are obtained by setting the partial derivatives of (11) to zero. In Method I, the solutions for $C_C$ and $C_P$ are
$$C_C = \theta\sqrt{\frac{\pi_C(1-\pi_C)}{\pi_T(1-\pi_T)}} \quad \text{and} \quad C_P = (1-\theta)\sqrt{\frac{\pi_P(1-\pi_P)}{\pi_T(1-\pi_T)}}.$$
In Method II, an iterative procedure can be used to solve for $C_C$ and $C_P$ in (11) and minimize the total sample size $N$ (see Appendix A).
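The Method I calculation can be sketched as follows. The sketch assumes the unrestricted (observed-rate) variance, so that the total sample size is proportional to $(1 + C_C + C_P)\,[\pi_T(1-\pi_T) + \theta^2\pi_C(1-\pi_C)/C_C + (1-\theta)^2\pi_P(1-\pi_P)/C_P]$; minimizing this product gives the closed-form ratios coded below. Function names are hypothetical, the totals are left unrounded, and the Method II iteration preferred in the text is not reproduced.

```python
import math
from statistics import NormalDist


def optimal_allocation_method1(pi_t, pi_c, pi_p, theta):
    """Closed-form ratios (C_C, C_P) minimizing N under Method I."""
    base = pi_t * (1.0 - pi_t)
    c_c = theta * math.sqrt(pi_c * (1.0 - pi_c) / base)
    c_p = (1.0 - theta) * math.sqrt(pi_p * (1.0 - pi_p) / base)
    return c_c, c_p


def total_n_method1(pi_t, pi_c, pi_p, theta, c_c, c_p,
                    alpha=0.025, power=0.8):
    """Unrounded total N for allocation n_T : n_C : n_P = 1 : c_c : c_p."""
    z_sum = NormalDist().inv_cdf(1.0 - alpha) + NormalDist().inv_cdf(power)
    psi = pi_t - theta * pi_c - (1.0 - theta) * pi_p
    unit_var = (
        pi_t * (1.0 - pi_t)
        + theta**2 * pi_c * (1.0 - pi_c) / c_c
        + (1.0 - theta) ** 2 * pi_p * (1.0 - pi_p) / c_p
    )
    n_t = z_sum**2 * unit_var / psi**2
    return n_t * (1.0 + c_c + c_p)


# Design point used in the tables: pi_P = 0.1, pi_C = pi_T = 0.8.
rates = dict(pi_t=0.8, pi_c=0.8, pi_p=0.1, theta=0.8)
c_c_opt, c_p_opt = optimal_allocation_method1(**rates)

totals = {
    "1:1:1": total_n_method1(**rates, c_c=1.0, c_p=1.0),
    "2:2:1": total_n_method1(**rates, c_c=1.0, c_p=0.5),
    "3:2:1": total_n_method1(**rates, c_c=2 / 3, c_p=1 / 3),
    "optimal": total_n_method1(**rates, c_c=c_c_opt, c_p=c_p_opt),
}
for design, n_total in totals.items():
    print(design, math.ceil(n_total))
```

In practice each group size would be rounded up separately, so the realized total can exceed the unrounded value by a few patients.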
We explored the required total sample size $N$ based on Methods I and II for different sample size allocation rules, different combinations of the design parameters $(\pi_P, \pi_C, \pi_T)$ and $\theta$, and for given $\alpha = 0.025$ and $1-\beta = 0.8$ (see Table 1). In Table 1, we considered four sample size allocations (the balanced design, two unbalanced designs, and our proposed optimal sample size allocation), different choices of $\theta$, $(\pi_P, \pi_C) = (0.1, 0.8)$, and $\pi_T = \pi_C$. We found that Method I gives a smaller total sample size $N$ than Method II. Farrington and Manning [15] pointed out, however, that Method I suffers from serious drawbacks: it may underestimate or overestimate the true value of the null variance under the alternative hypothesis, which leads to incorrect sample sizes. Hence, in the following discussion we focus on Method II for precise sample sizes. When $\theta = 0.1$, the sample sizes of Method II for the balanced design, the 2:2:1 design, the 3:2:1 design, and the optimal sample size allocation are 23, 27, 27, and 16, respectively. As seen in Table 1, the sample size of Method II obtained from the optimal allocation design is always smaller than that obtained from the other sample size allocation rules. Furthermore, the total sample size increases as $\theta$ increases when the other design parameters are fixed. This is intuitive, since a larger $\theta$ imposes a stronger requirement on the treatment effect; hence, the required total sample size is larger. In Table 2, we set $\theta = 0.6$ and $0.8$ for the four sample size allocations. The first row is the sample size of the treatment group ($n_T$), while the second row is the total sample size ($N$). For example, in the first row, the sample sizes of Method II for the treatment group $n_T$ under the balanced design, the 2:2:1 design, the 3:2:1 design, and the optimal sample size allocation are 29, 31, 36, and 38, respectively. In the second row, the total sample sizes $N$ for the four sample size allocations are 85, 77, 71, and 67, respectively. The results in Table 2 are similar to those in Table 1. In addition, we find that the total sample size increases as the ratio $\pi_P / \pi_C$ increases when the other design parameters are fixed. In Table 3, the corresponding sample sizes for the three steps are calculated according to Appendix B and Eq. (11). We find that the required sample sizes per group for Step 1 and Step 2 are substantially smaller than the sample size for Step 3.
In Figure 2, we illustrate the sample size reductions from using the optimal allocation instead of the balanced, 2:2:1, and 3:2:1 designs, respectively, given $\pi_P = 0.1$, $\pi_C = 0.8$, and $\pi_T = \pi_C$. As seen in Figure 2, there is at least a 20% sample size reduction when the balanced design is replaced by the optimal allocation. In addition, the sample size reductions are even greater than 30% when $\theta$ is close to 0.1 or 0.9. For the 3:2:1 design, we could save at least 2% of the sample size by switching to the optimal sample size allocation.
Figure 3 presents the total sample size for $\theta = 0.5, 0.6, 0.7, 0.8,$ and $0.9$, with different values of the ratio $\pi_P / \pi_C$ from 0.125 to 1, given $\pi_C = 0.8$ and $\pi_T = \pi_C$. As seen in Figure 3, the total sample size increases with increasing values of $\theta$ and of the ratio $\pi_P / \pi_C$. The total sample size is enormous when $\pi_P / \pi_C$ is close to one and $\theta = 0.9$, which is impractical in clinical trials.

SIMULATION
To examine the performance of proposed optimal sample size allocation in Method II, we conducted a simulation study for the type I error rate and the simulated power.All parameter constellations were simulated with 100,000 replications.

Type I Error Rate
In order to assess the type I error rate, we simulated under the null hypothesis. The sample size allocation designs control the type I error rate quite well at the nominal $\alpha = 0.025$. We concluded that the proposed optimal sample size allocation controls the type I error very well.
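A simulation of this kind can be sketched with the standard library alone. The sketch below is a simplified stand-in, not the simulation reported here: it uses the Method I Wald test, a balanced design, only the third-step hypothesis, a hypothetical seed, and far fewer replications than the 100,000 used in the study. Rates are placed on the boundary of the null hypothesis, $\pi_T = \theta\pi_C + (1-\theta)\pi_P$.

```python
import math
import random
from statistics import NormalDist


def binomial_draw(rng, n, p):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def rejects_h03(x_t, x_c, x_p, n, theta, z_crit):
    """Method I Wald test of H03 with n patients per arm."""
    p_t, p_c, p_p = x_t / n, x_c / n, x_p / n
    psi_hat = p_t - theta * p_c - (1.0 - theta) * p_p
    var_hat = (
        p_t * (1.0 - p_t)
        + theta**2 * p_c * (1.0 - p_c)
        + (1.0 - theta) ** 2 * p_p * (1.0 - p_p)
    ) / n
    if var_hat <= 0.0:
        # Degenerate sample (all rates 0 or 1): treat as no rejection.
        return False
    return psi_hat / math.sqrt(var_hat) > z_crit


def simulated_type1_error(pi_c, pi_p, theta, n, reps, seed=2024, alpha=0.025):
    """Rejection rate of H03 when the true rates sit on the null boundary."""
    rng = random.Random(seed)
    pi_t = theta * pi_c + (1.0 - theta) * pi_p
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(reps):
        x_t = binomial_draw(rng, n, pi_t)
        x_c = binomial_draw(rng, n, pi_c)
        x_p = binomial_draw(rng, n, pi_p)
        hits += rejects_h03(x_t, x_c, x_p, n, theta, z_crit)
    return hits / reps


rate = simulated_type1_error(pi_c=0.8, pi_p=0.1, theta=0.8, n=100, reps=2000)
print(rate)
```

With only 2,000 replications the Monte Carlo standard error is roughly 0.0035, so the estimate is expected to scatter around the nominal level rather than match it exactly.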

Power
To assess the power performance of the proposed method, we set $(\pi_T - \pi_P) / (\pi_C - \pi_P) > \theta$, $\theta = 0.8$, $\alpha = 0.025$, and considered $(\pi_P, \pi_C, \pi_T) = (0.1, 0.8, 0.7), (0.1, 0.8, 0.71), \ldots, (0.1, 0.8, 0.99)$ with a total sample size of $N = 300$. Similarly, we considered the four allocation ratios stated above. Figure 4 shows the power curves for the four allocation designs. For $\theta = 0.8$, the power curves of the unbalanced designs are higher than that of the balanced design. The proposed optimal allocation design is more powerful than the other allocation rules.
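The ordering of the designs can also be illustrated analytically. The sketch below evaluates the Method I normal approximation $\Phi(\psi/\sigma - z_{1-\alpha})$ at a fixed total of 300 patients; it is an approximation under the unrestricted variance, not the simulated Method II power shown in Figure 4. The "Method I optimal" weights $(1, 0.8, 0.15)$ are the closed-form Method I ratios at $(\pi_P, \pi_C, \pi_T) = (0.1, 0.8, 0.8)$ and $\theta = 0.8$, and all names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power_method1(pi_t, pi_c, pi_p, theta, weights, total_n,
                         alpha=0.025):
    """Approximate power of the third-step Wald test (Method I variance)."""
    w_t, w_c, w_p = weights
    scale = total_n / (w_t + w_c + w_p)
    # Group sizes are left unrounded to keep the comparison smooth.
    n_t, n_c, n_p = w_t * scale, w_c * scale, w_p * scale
    psi = pi_t - theta * pi_c - (1.0 - theta) * pi_p
    sigma = math.sqrt(
        pi_t * (1.0 - pi_t) / n_t
        + theta**2 * pi_c * (1.0 - pi_c) / n_c
        + (1.0 - theta) ** 2 * pi_p * (1.0 - pi_p) / n_p
    )
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return NormalDist().cdf(psi / sigma - z_crit)


designs = {
    "1:1:1": (1.0, 1.0, 1.0),
    "2:2:1": (2.0, 2.0, 1.0),
    "3:2:1": (3.0, 2.0, 1.0),
    "Method I optimal": (1.0, 0.8, 0.15),
}
powers = {
    name: approx_power_method1(0.8, 0.8, 0.1, 0.8, weights, total_n=300)
    for name, weights in designs.items()
}
for name, value in powers.items():
    print(name, round(value, 3))
```

At this design point the ordering agrees with the pattern described for Figure 4: the balanced design has the lowest approximate power and the optimal allocation the highest.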

A Papulopustular Acne Study
Papulopustular acne is a common skin disease characterized by androgenic stimulation of sebaceous glands. Acne is a multifactorial disorder with spontaneous resolution in early adult life. Therefore, combined oral contraceptives (COCs) containing antiandrogenic progestogens are suitable candidates for acne treatment. A multinational, multicenter, three-arm, double-blind, randomized trial was conducted in 1326 female patients (16-45 years old) with mild to moderate papulopustular acne, as reported by Ernesta et al. [17]. The standard treatment was a COC containing the potent anti-androgen combination ethinylestradiol (EE)/cyproterone acetate (CPA). Ernesta et al. [17] showed that a new drug, EE/dienogest (DNG), is superior to a placebo and non-inferior to the active control EE/CPA. As they pointed out, there is no binding affinity between the new drug and sex hormone-binding globulin (SHBG). Furthermore, the new drug does not compete with free testosterone for binding to SHBG. Hence, the component testosterone should be decreased to make the estrogen work [17]. In Ernesta et al. [17], the 1326 patients were randomly allocated to the three groups in the proportion 2:2:1 ($n_T = 525$, $n_C = 537$, $n_P = 264$). After treatment with COCs, the improvement rates of acne were reported as $\hat{\pi}_T = 91.9\%$, $\hat{\pi}_C = 90.2\%$, and $\hat{\pi}_P = 76.2\%$. We applied the optimal sample size allocation to this example. As mentioned in Section 2.1 and observed in Figure 3, $\theta$ is reasonably chosen between 0.5 and 0.8. Thus, given $\theta = 0.7$, we need only 836 patients to detect the effect size in this study with $\alpha = 0.025$ and the desired power of 80% using our method. The proposed optimal sample size allocation method reassigned patients to the three groups with the allocation ratio $n_T : n_C : n_P = 1 : C_C : C_P = 1 : 0.64 : 0.41$ ($n_T = 408$, $n_C = 261$, $n_P = 167$). Compared with Ernesta et al. [17], the placebo group is 97 patients smaller under the proposed optimal sample size allocation. According to our method, the results showed that DNG was superior to the placebo and non-inferior to CPA at $\alpha = 0.025$ with 80% power in this study. From an economic point of view, the required total sample size was reduced from 1326 to 836. Thus, the proposed optimal sample size allocation is economically advantageous, saving 37% of the total sample size.

An Oral Prophylactic Antibiotics Study
A large prospective three-arm placebo-controlled trial of 2083 patients who completed flexible cystoscopy (FC) was conducted to examine whether oral prophylactic antibiotics (ciprofloxacin or trimethoprim) reduce the risk of bacteriuria after FC [18]. The treatment group received ciprofloxacin (500 mg), and the active control group received trimethoprim (200 mg). The sample sizes for the three groups were $n_T = 687$, $n_C = 712$, and $n_P = 684$. The sample size allocation ratio was approximately 1:1:1 (balanced design). After FC, the proportions of patients with a negative urine culture were $\hat{\pi}_T = 97.2\%$, $\hat{\pi}_C = 95.4\%$, and $\hat{\pi}_P = 90.9\%$. In this example, we applied the optimal sample size allocation to make the number of patients in the placebo group as small as possible. Given $\theta = 0.7$, the required total sample size is 1272 to detect the effect size in this study with $\alpha = 0.025$ and the desired power of 80% by our method. The new sample size allocation ratio was $n_T : n_C : n_P = 1 : 0.68 : 0.41$ ($n_T = 608$, $n_C = 414$, $n_P = 250$) under the optimal sample size allocation. Compared with Johnson et al. [18], the placebo group is 434 patients smaller under our method. As a result, the proposed method using the optimal sample size allocation design was more ethical than the balanced design. We concluded that ciprofloxacin significantly reduced bacteriuria after FC at $\alpha = 0.025$ with 80% power in this study. Using the optimal sample size allocation, the required total sample size is reduced from 2083 to 1272. Thus, the proposed method saved about 38.9% of the total sample size compared with Johnson et al. [18]. We concluded that the proposed design is economically and ethically better than the balanced design.
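The group sizes quoted for the two examples can be cross-checked with simple arithmetic. The sketch below only recomputes totals, the reduction in the placebo arm, and the relative saving from the numbers stated in the text; the dictionary layout and the function name are illustrative choices.

```python
def relative_saving(n_original, n_new):
    """Fraction of the original total sample size that is saved."""
    return (n_original - n_new) / n_original


studies = {
    "acne": {
        "reported": {"T": 525, "C": 537, "P": 264},
        "optimal": {"T": 408, "C": 261, "P": 167},
    },
    "cystoscopy": {
        "reported": {"T": 687, "C": 712, "P": 684},
        "optimal": {"T": 608, "C": 414, "P": 250},
    },
}

summary = {}
for name, designs in studies.items():
    n_before = sum(designs["reported"].values())
    n_after = sum(designs["optimal"].values())
    summary[name] = {
        "n_before": n_before,
        "n_after": n_after,
        # Reduction in the number of patients randomized to placebo.
        "placebo_reduction": designs["reported"]["P"] - designs["optimal"]["P"],
        "saving_percent": round(100 * relative_saving(n_before, n_after), 1),
    }
    print(name, summary[name])
```

The saving is expressed relative to the originally reported total, which is the natural reading of "reduced from ... to ...".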

DISCUSSION
For two-arm non-inferiority trials, issues such as the choice of non-inferiority margin, the constancy assumption, and assay sensitivity have been debated for years, and the statistical methodology has been challenged. Despite these issues, two-arm non-inferiority trials are needed when a placebo is not an option, as in life-threatening conditions or when disease progression may be irreversible. Three-arm non-inferiority trials may be a choice in other situations where a placebo is acceptable, such as in depression, bipolar disorders, and papulopustular acne. In this article, we proposed an optimal sample size allocation design for a three-arm non-inferiority trial when it is ethically justifiable. Moreover, we use the restricted maximum likelihood method to correct the sample size when the null hypothesis specifies a non-zero difference between groups, because Method I may produce an incorrect sample size under such a null hypothesis. The proposed method can substantially reduce the total required number of patients. Furthermore, more patients can be reassigned to the therapeutic group using the proposed design. Thus, our method is not only desirable from an ethical point of view but also substantially reduces the total sample size needed to achieve a given power.
Our simulation study shows that the optimal sample size allocation design controls the type I error rate fairly well at the nominal level for most practical situations. In addition, the proposed design yields higher power than the other competing sample size allocation designs in each hypothesis test. In conclusion, the use of the proposed design is recommended for three-arm non-inferiority trials.
In some clinical trials, more than one primary endpoint could be investigated for efficacy evaluation, which may result in significant complexity in the design, conduct, analysis, and interpretation of data. In addition, testing multiple hypotheses for multiple primary outcomes may increase the FWER. We hope to address this issue using the rationale of optimal sample size allocation in the future.
Setting the partial derivative of the log-likelihood with respect to $\pi_P$ to zero yields the following third-degree likelihood equation:


Figure 3: Required total sample size based on the optimal sample size allocation design for $\theta = 0.5, 0.6, \ldots, 0.9$ with different values of the ratio $\pi_P / \pi_C$ from 0.125 to 1, given $\pi_C = 0.8$ and $\pi_T = \pi_C$.