Comparison of Post Hoc Multiple Pairwise Testing Procedures as Applied to Small k-Group Logrank Tests

Moonseong Heo; Andrew C. Leon

doi:10.6000/1929-6029.2013.02.02.04

Authors

Moonseong Heo, Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
Andrew C. Leon, Department of Psychiatry; Department of Public Health, Weill Medical College of Cornell University, New York, NY, USA

Keywords:

Logrank test, multiplicity adjustment, post hoc tests, survival analysis

Abstract

The logrank test is widely used to compare groups on the distribution of survival time in the presence of censoring. There is no convention for post hoc pairwise comparisons after a significant omnibus k-group logrank test. This simulation study compares four post hoc pairwise testing procedures: Bonferroni, Dunn-Šidák, Hochberg, and an unadjusted post hoc logrank test procedure. Evaluation criteria include the familywise type I error rate, correct decision rate, number of correctly rejected pairs, and false discovery rate. We demonstrated that, conditional on rejection of the omnibus test, multiplicity adjustments may be unnecessary and can be overly conservative when k is at most 4, that is, when the number of pairwise comparisons is no greater than 6. This is supported by the finding that the unadjusted post hoc logrank test procedure outperformed the others on all criteria except the false discovery rate. Among the adjustments examined, the Hochberg procedure appears to be superior. Data from a clinical trial for suicide prevention illustrate these approaches in a setting where the number of comparison groups is often limited.
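
To make the four procedures concrete, the following is a minimal sketch that applies the textbook definitions of each rule to a set of pairwise p-values. It is not the authors' simulation code. The function name, the pair labels, and the three p-values are hypothetical, and the p-values are assumed to come from pairwise logrank tests run after a significant omnibus test.

```python
def pairwise_decisions(p_values, alpha=0.05):
    """Apply four post hoc rules to pairwise p-values.

    p_values: dict mapping a pair label to its pairwise p-value
              (assumed to be computed elsewhere, e.g. by logrank tests).
    Returns a dict: procedure name -> {pair label: True if rejected}.
    """
    m = len(p_values)  # number of pairwise comparisons, k(k-1)/2
    pairs = list(p_values)

    # Unadjusted: every pair is tested at the nominal level alpha.
    unadjusted = {pair: p_values[pair] <= alpha for pair in pairs}

    # Bonferroni: every pair is tested at alpha / m.
    bonferroni = {pair: p_values[pair] <= alpha / m for pair in pairs}

    # Dunn-Sidak: every pair is tested at 1 - (1 - alpha)^(1/m).
    sidak_level = 1.0 - (1.0 - alpha) ** (1.0 / m)
    dunn_sidak = {pair: p_values[pair] <= sidak_level for pair in pairs}

    # Hochberg step-up: order p-values ascending, find the largest rank i
    # with p_(i) <= alpha / (m - i + 1), and reject ranks 1..i.
    ordered = sorted(pairs, key=lambda pair: p_values[pair])
    cutoff_rank = 0
    for rank, pair in enumerate(ordered, start=1):
        if p_values[pair] <= alpha / (m - rank + 1):
            cutoff_rank = rank
    rejected = set(ordered[:cutoff_rank])
    hochberg = {pair: pair in rejected for pair in pairs}

    return {
        "unadjusted": unadjusted,
        "bonferroni": bonferroni,
        "dunn_sidak": dunn_sidak,
        "hochberg": hochberg,
    }


# Hypothetical p-values for k = 3 groups (3 pairwise comparisons).
example_p = {"A-B": 0.012, "A-C": 0.030, "B-C": 0.040}
decisions = pairwise_decisions(example_p, alpha=0.05)
for name, result in decisions.items():
    print(name, result)
```

With these made-up inputs, Bonferroni and Dunn-Šidák reject only the A-B pair, whereas the Hochberg and unadjusted procedures reject all three. This shows how the single-step adjustments can be more conservative than the step-up rule when the number of comparisons is small.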

Author Biographies

Moonseong Heo, Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA

Professor

Department of Epidemiology and Population Health

Andrew C. Leon, Department of Psychiatry; Department of Public Health, Weill Medical College of Cornell University, New York, NY, USA

Department of Psychiatry, Department of Public Health
