Sample Size and Statistical Power Calculation in Multivariable Analyses: Development and Implementation of "SampleSizeMulti" Packages in R
DOI:
https://doi.org/10.6000/1929-6029.2024.13.24
Keywords:
Sample size, Statistical Inference, Regression Analysis, Epidemiological methods, Software Design, Research Design, correlation coefficient (source: MeSH)
Abstract
This paper presents advanced methodological approaches and practical tools for sample size calculation in epidemiological studies involving multivariable analyses. Traditional sample size calculation methods often fail to account for the complexity of modern statistical analyses, particularly regarding the correlation between covariates in multivariable models.
We introduce a series of R packages (SampleSizeMulti) designed to address these limitations. These packages offer two distinct calculation approaches: one based on the multiple correlation coefficient between covariates (rho-based method) and another utilizing standard errors from previous studies (SE-based method). These complementary approaches provide comprehensive solutions for different association measures commonly used in epidemiological research: prevalence ratios, odds ratios, risk ratios, and hazard ratios.
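The two ideas can be illustrated with a minimal sketch. It is written in Python for illustration only and does not use or reproduce the SampleSizeMulti API; the function names `inflate_for_correlation` and `se_from_ci` are hypothetical. It assumes the widely used variance inflation adjustment, in which a sample size computed for a single covariate is divided by 1 - rho^2, and it assumes that a published confidence interval for a ratio measure is symmetric on the log scale, so that its standard error can be recovered from the interval limits.

```python
import math
from statistics import NormalDist


def inflate_for_correlation(n_unadjusted, rho):
    """Inflate a single-covariate sample size for a multivariable model.

    Assumption: the standard variance inflation factor 1 / (1 - rho^2),
    where rho is the multiple correlation coefficient between the exposure
    of interest and the other covariates in the model.
    """
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1)")
    return math.ceil(n_unadjusted / (1 - rho ** 2) - 1e-9)


def se_from_ci(lower, upper, conf_level=0.95):
    """Recover the standard error of a log ratio (OR, RR, PR, HR).

    Assumption: the published interval is a Wald-type interval,
    symmetric around the point estimate on the log scale.
    """
    if not 0 < lower < upper:
        raise ValueError("limits must satisfy 0 < lower < upper")
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


if __name__ == "__main__":
    # A crude requirement of 300 participants with rho = 0.5 becomes 400.
    print(inflate_for_correlation(300, 0.5))
    # Standard error implied by a hypothetical published 95% CI of 1.2 to 2.5.
    print(round(se_from_ci(1.2, 2.5), 4))
```

With rho = 0 the adjustment leaves the sample size unchanged, and the required size grows without bound as rho approaches 1, which is why the correlation between covariates matters so much for planning.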
The rho-based method explicitly accounts for the multiple correlation coefficient between covariates, which can substantially change the sample size required in a multivariable analysis. The SE-based method draws on the confidence intervals reported in previous studies, offering an alternative when correlation estimates are unavailable but published results exist. Both approaches also integrate key logistical considerations, including rejection rates, eligibility criteria, and expected losses to follow-up, so that researchers obtain realistic estimates of recruitment requirements and timelines.
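The logistical adjustment can be sketched in the same spirit. The following Python example is illustrative only and is not the packages' implementation; the function `recruitment_plan`, its parameter names, and the order in which the rates are applied are all assumptions. It works backwards from the number of analyzable participants to the number who must be enrolled, invited, and screened, and then to an approximate recruitment duration.

```python
import math


def _ceil(x):
    # Round up, with a small tolerance so that floating-point noise
    # (e.g. 500.0000000001) does not add a spurious extra participant.
    return math.ceil(x - 1e-9)


def recruitment_plan(n_required, rejection_rate, eligibility_rate,
                     loss_rate, screened_per_week):
    """Translate an analyzable sample size into recruitment targets.

    Assumptions (hypothetical, for illustration):
      - loss_rate: proportion of enrolled participants lost to follow-up
      - rejection_rate: proportion of eligible people who decline to take part
      - eligibility_rate: proportion of screened people who are eligible
      - screened_per_week: number of people who can be screened each week
    """
    for name, rate in (("rejection_rate", rejection_rate),
                       ("loss_rate", loss_rate)):
        if not 0 <= rate < 1:
            raise ValueError(f"{name} must lie in [0, 1)")
    if not 0 < eligibility_rate <= 1:
        raise ValueError("eligibility_rate must lie in (0, 1]")
    if screened_per_week <= 0:
        raise ValueError("screened_per_week must be positive")

    n_enrolled = _ceil(n_required / (1 - loss_rate))
    n_invited = _ceil(n_enrolled / (1 - rejection_rate))
    n_screened = _ceil(n_invited / eligibility_rate)
    weeks = _ceil(n_screened / screened_per_week)
    return {
        "enrolled": n_enrolled,
        "invited": n_invited,
        "screened": n_screened,
        "weeks": weeks,
    }


if __name__ == "__main__":
    plan = recruitment_plan(n_required=300, rejection_rate=0.5,
                            eligibility_rate=0.25, loss_rate=0.25,
                            screened_per_week=100)
    print(plan)
```

In this hypothetical scenario, 300 analyzable participants require 400 enrolled, 800 invited, and 3200 screened, or about 32 weeks of screening at 100 people per week. This shows how quickly logistical factors can dominate the statistical requirement.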
Seven detailed case studies covering various epidemiological study designs and analytical scenarios demonstrate the practical application of these methods. These examples illustrate how correlation values, standard errors, and logistical factors influence sample size calculations and study planning.
The implementation in R ensures accessibility and reproducibility, while the incorporation of logistical planning tools bridges the gap between theoretical calculations and practical research requirements. These methods represent a significant advancement in study design methodology, potentially improving the quality and efficiency of epidemiological research by ensuring adequate statistical power while optimizing resource utilization.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.