A Smooth Test of Goodness-of-Fit for the Weibull Distribution: An Application to HIV Retention Data

Abstract: In this study, we fit the two-parameter Weibull distribution to HIV retention data and assess the fit using a smooth test of goodness-of-fit. The smooth test described here is a score test and is derived as an extension of Neyman's smooth test. Simulations are conducted to compare the power of the smooth test with that of each of three empirical goodness-of-fit tests for the Weibull distribution. Results show that the smooth tests of order three and four are more powerful than the three empirical goodness-of-fit tests. For validation, we used retention data from an HIV care setting in Kenya.

The Kenya AIDS Response Progress Report 2014 [1] estimates the number of people living with HIV (PLHIV) at 1.6 million, of whom 800,000 are actively on antiretroviral therapy (ART). Retaining patients on ART, however, remains a major challenge. Disruption in care through missed scheduled visits undermines both social and clinical outcomes, including through an increased risk of virological failure [2]. Discontinuation of ART can lead to drug resistance, HIV-related illnesses and death. It has been shown that individuals who miss visits in the first year of treatment have a higher mortality rate [2]. Studies also show that retention of patients on ART remains stable after 12 months of ART initiation, with loss to follow-up (LTFU) being the main cause of attrition [3]. In resource-limited settings, it is common to find patients dropping out of ART treatment, and these dropouts are usually attributed to LTFU [4]. Because of significant dropouts, patients may not realize the benefits of ART. Previous studies have identified associations between frequent LTFU and more severe opportunistic illnesses [5]. More innovation is therefore required for further ART scale-up and to improve retention in care.
Patients who are actively receiving ART are particularly vulnerable to developing drug-resistant infections when virological failure occurs, which could potentially result in broad resistance to ART and transmission of drug-resistant viruses [2]. Determining correct patterns of LTFU and exploring the factors associated with them is therefore crucial in identifying the patients who are at risk of LTFU. Further, analysis of time to LTFU is useful in informing the development of evidence-based interventions that improve patient outcomes [2].

*Address correspondence to this author at the Division of Mathematics & Computer Science, University of South Carolina-Upstate, 800 University Way, Spartanburg, South Carolina, USA; Tel: +1 864-503-5362; Fax: +1 864-503-5930; E-mail: bomolo@uscupstate.edu
In this paper, we fit a two-parameter Weibull distribution to time to LTFU data (HIV retention data). A fitted probability distribution provides a tool for specific risk and probability calculations whose results can support well-informed decisions. An inappropriate distribution may lead to incorrect calculations and, eventually, wrong decisions. Finding the distribution that fits the data best is therefore important, particularly to avoid the time and financial losses that may arise from an invalid model selection. The importance of, and reasons for, fitting probability distributions correctly to data are elaborated by Hahn and Shapiro [6].
Time-to-failure data arise in many fields (e.g., medicine, public health and reliability) and are often assumed to follow the Weibull distribution. The popularity of the Weibull distribution is mainly due to its flexibility and its ability to describe survival data [7].
Here, we adopt a smooth test of goodness-of-fit to assess the fit of the two-parameter Weibull distribution. The smooth test considered here is a score test obtained by extending Neyman's goodness-of-fit approach [8], in which the null hypothesis is nested in a larger class of probability distribution functions [9]. Our study also seeks to validate the result in [10], which showed that smooth tests for the Weibull distribution are more powerful than several other tests, including the Cramér-von Mises, Kolmogorov-Smirnov and Anderson-Darling tests. Furthermore, smooth tests have been shown to reliably assess the fit of the Weibull distribution to data simulated from the log-normal and gamma distributions [11].
We fit the two-parameter Weibull distribution to HIV retention data from two public hospitals in Kenya and assess the fit using a smooth test. Our event of interest is time to first LTFU. Other exits from the program (i.e. death and transfer-out) are removed, as are patients who were still active on ART at the end of the observation period. Under routine programmatic arrangements, patients started on ART are expected to attend the clinic regularly for either continuous monitoring or drug refill. A patient is considered LTFU if s/he fails to show up within 48 hours after a scheduled appointment and cannot be reached in any way. Testing goodness-of-fit for LTFU is therefore important not only to determine which distribution best fits the failure-time data but also to improve the accuracy of the estimated hazard function, which gives the instantaneous risk of LTFU. In our formulation, we propose a score test for the null hypothesis that the failure-time data follow a two-parameter Weibull distribution.
To the best of our knowledge, there is no published article on fitting a parametric distribution to time to LTFU data. This paper therefore contributes to the statistical methodology for analysing LTFU data, which can provide information that is vital for improving retention in HIV care and for streamlining national policy in HIV programming.
In Section 2, we present the methods for our analysis, which include the description of our data, the modelling of time to LTFU, the formulation of the smooth goodness-of-fit test for the Weibull distribution, and the simulations. We also discuss the performance of the smooth test. In Section 3, we discuss the application of the test to locally available HIV retention data and its results. In Section 4, we discuss the simulation and validation results. Finally, in Section 5, we provide some concluding remarks.

Theoretical Framework
Suppose we wish to test the null hypothesis that $X_1, X_2, \ldots, X_n$ is a random sample from a specified continuous distribution with probability density function $f(x; \psi)$, where $\psi = (\psi_1, \psi_2, \ldots, \psi_m)^T$ is a vector of nuisance parameters. We can construct a smooth test by extending Neyman's goodness-of-fit approach of nesting the null hypothesis in a larger class of probability distribution functions [8,9].
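We embed the null probability density function in an order-$k$ alternative as follows:

$$g_k(x; \theta, \psi) = C(\theta, \psi) \exp\left\{ \sum_{i=1}^{k} \theta_i h_i(x; \psi) \right\} f(x; \psi), \qquad (1)$$

where the functions $h_i(x; \psi)$ are orthonormal with respect to $f(x; \psi)$, i.e. $\int h_r(x; \psi) h_s(x; \psi) f(x; \psi)\, dx = \delta_{rs}$ with $h_0(x; \psi) \equiv 1$, $\theta = (\theta_1, \theta_2, \ldots, \theta_k)^T$ is a vector of real parameters, and $C(\theta, \psi)$ is a normalizing constant that ensures $g_k(x; \theta, \psi)$ integrates to one, i.e.

$$C(\theta, \psi)^{-1} = \int \exp\left\{ \sum_{i=1}^{k} \theta_i h_i(x; \psi) \right\} f(x; \psi)\, dx. \qquad (2)$$

Testing the goodness-of-fit of $f(x; \psi)$ is equivalent to testing $H_0: \theta = 0$ against $H_A: \theta \neq 0$. Assuming that the partial derivatives of the log-likelihood function, together with their expectations, exist, the derivation of the score test statistic from the likelihood of the observed random sample $X_1, X_2, \ldots, X_n$ has been discussed extensively in [9,10,12,13].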
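The log-likelihood function is defined as

$$\ell(\theta, \psi) = n \log C(\theta, \psi) + \sum_{j=1}^{n} \sum_{i=1}^{k} \theta_i h_i(X_j; \psi) + \sum_{j=1}^{n} \log f(X_j; \psi). \qquad (3)$$

The partial derivatives of the log-likelihood function (3) generate the score function $U_\theta$ and the asymptotic covariance matrix $\Sigma$ of $U_\theta$ as follows [9, 14]:

$$U_\theta = \frac{1}{\sqrt{n}} \left. \frac{\partial \ell}{\partial \theta} \right|_{\theta = 0} \qquad \text{and} \qquad \Sigma = I_{\theta\theta} - I_{\theta\psi} I_{\psi\psi}^{-1} I_{\psi\theta},$$

where $I_{\theta\theta}$, $I_{\theta\psi}$, $I_{\psi\theta}$ and $I_{\psi\psi}$ are the blocks of the per-observation Fisher information matrix evaluated at $\theta = 0$. The score statistic therefore takes the form

$$S_k(\psi) = U_\theta^T \Sigma^{-1} U_\theta.$$

Here the score function $U_\theta = U_\theta(\psi)$ has $r$th element $\{h_r(X_1; \psi) + h_r(X_2; \psi) + \cdots + h_r(X_n; \psi)\}/\sqrt{n}$, because $\partial \log C(\theta, \psi)/\partial \theta_r = -E_0[h_r(X; \psi)] = 0$ at $\theta = 0$. By orthonormality, $I_{\theta\theta}$ reduces to the $k \times k$ identity matrix $I_k$. In practice the score test is evaluated at $\hat{\psi}_0$, the maximum likelihood estimator of $\psi$ under the null hypothesis:

$$S_k(\hat{\psi}_0) = V^T \hat{\Sigma}^{-1} V, \qquad V_r = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} h_r(x_i; \hat{\psi}_0), \quad r = 1, \ldots, k,$$

where $V = (V_1, \ldots, V_k)^T$ and $\hat{\Sigma}$ is $\Sigma$ evaluated at $\hat{\psi}_0$. When $\psi$ is known, or more generally when $I_{\theta\psi} = 0$, $\Sigma = I_k$ and $S_k = V_1^2 + \cdots + V_k^2$. The score statistic for testing $H_0: \theta = 0$ against $H_A: \theta \neq 0$ is denoted by $S_k(\hat{\psi}_0)$. The order $k$ is chosen from the data through the modified Bayes information criterion (modBIC), given by

$$\text{modBIC}_k = S_k - k \log n,$$

in which, relative to BIC, twice the maximized log-likelihood has been replaced by the score statistic. With $d$ the largest order considered, we define $K$ as the smallest order that maximizes $\text{modBIC}_k$, i.e.

$$K = \min\{k \in \{1, 2, \ldots, d\} : \text{modBIC}_k \geq \text{modBIC}_r,\ r = 1, 2, \ldots, d\}.$$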
This is also referred to as the selection rule [9].
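The selection rule can be illustrated with a short Python sketch. It assumes the components $V_1, \ldots, V_d$ have already been computed and standardized, so that $S_k = V_1^2 + \cdots + V_k^2$ (the case $\Sigma = I_k$); the function names and the component values are hypothetical and are not taken from the paper or from the EWGoF package.

```python
import math

def cumulative_smooth_statistics(components):
    """Return [S_1, ..., S_d] with S_k = V_1^2 + ... + V_k^2.

    Assumes the components V_r are already standardized (covariance I_k).
    """
    stats, total = [], 0.0
    for v in components:
        total += v * v
        stats.append(total)
    return stats

def select_order(components, n):
    """Data-driven order K: the smallest k maximizing modBIC_k = S_k - k ln(n)."""
    stats = cumulative_smooth_statistics(components)
    mod_bic = [s - (k + 1) * math.log(n) for k, s in enumerate(stats)]
    # list.index returns the first position attaining the maximum,
    # which implements the "smallest k" tie-breaking rule.
    k = mod_bic.index(max(mod_bic)) + 1
    return k, stats[k - 1]

# Hypothetical components V_1..V_4 from a sample of size n = 100.
K, S_K = select_order([0.4, 3.1, 0.2, 0.5], n=100)
print(K, round(S_K, 2))  # prints: 2 9.77
```

The penalty $k \log n$ stops the rule from adding components that raise $S_k$ only slightly: here the large second component justifies $K = 2$, but the small third and fourth components do not.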

Proof
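Consider first the case in which $\psi$ is known, and let $h_r(X_1; \psi), h_r(X_2; \psi), \ldots, h_r(X_n; \psi)$ be the independent and identically distributed values of the $r$th orthonormal function. By the orthonormality conditions,

$$E_0[h_r(X; \psi)] = 0 \qquad \text{and} \qquad E_0[h_r(X; \psi)\, h_s(X; \psi)] = \delta_{rs},$$

so each $h_r(X_j; \psi)$ is a standard score with mean 0 and variance 1, and the components $V_r = n^{-1/2} \sum_{j=1}^{n} h_r(X_j; \psi)$, $r = 1, \ldots, k$, are uncorrelated with $E_0[V_r] = 0$ and $\operatorname{Var}_0(V_r) = 1$. Applying the multivariate Central Limit Theorem, $(V_1, \ldots, V_k)^T$ tends to the $k$-variate standard normal distribution $N_k(0, I_k)$ as the sample size becomes sufficiently large, so that the $V_r$ are asymptotically independent $N(0, 1)$ variates. Since the sum of squares of $k$ independent standard normal variates is $\chi^2_k$ distributed,

$$S_k = \sum_{r=1}^{k} V_r^2 \xrightarrow{d} \chi^2_k.$$

When $\psi$ is replaced by its null maximum likelihood estimator $\hat{\psi}_0$, the same argument applied to $\hat{\Sigma}^{-1/2} V$, where $\hat{\Sigma}$ is the estimated asymptotic covariance matrix of $V$, gives the same $\chi^2_k$ limit for $S_k(\hat{\psi}_0)$, provided $\Sigma$ is nonsingular. The asymptotic null distribution of the smooth test statistic therefore depends only on the order $k$ and not on the sample size $n$. This makes the smooth test attractive compared with empirical GOF procedures, whose critical values are usually affected by the sample size. Another feature of the $S_k$ statistic is that its components can be used to indicate an alternative probability distribution that would fit a given dataset [15].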
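The $\chi^2_k$ limit can be checked by simulation in the known-parameter case. This Python sketch uses Neyman's original construction, orthonormal shifted Legendre polynomials applied to the probability integral transform $u = F(x)$ of Weibull data with scale 30 and shape 6 (the values of the simulation study), rather than the extreme value polynomials used for the Weibull smooth test in this paper; the sample size, number of replicates and seed are arbitrary choices. Because each $V_r$ has mean 0 and variance 1 exactly, the Monte Carlo mean of $S_4$ should be close to 4, and the rejection rate at the upper 5% point of $\chi^2_4$ should be close to 0.05.

```python
import math
import random

ROOT3, ROOT5, ROOT7 = math.sqrt(3.0), math.sqrt(5.0), math.sqrt(7.0)

def legendre_components(u_values):
    """V_r = n^{-1/2} * sum_j h_r(u_j), r = 1..4, with h_r the orthonormal
    shifted Legendre polynomials on [0, 1] (Neyman's original choice)."""
    n = len(u_values)
    sums = [0.0, 0.0, 0.0, 0.0]
    for u in u_values:
        y = 2.0 * u - 1.0
        sums[0] += ROOT3 * y
        sums[1] += ROOT5 * (3.0 * y**2 - 1.0) / 2.0
        sums[2] += ROOT7 * (5.0 * y**3 - 3.0 * y) / 2.0
        sums[3] += 3.0 * (35.0 * y**4 - 30.0 * y**2 + 3.0) / 8.0
    return [s / math.sqrt(n) for s in sums]

def simulate_s4(n=100, reps=1000, scale=30.0, shape=6.0, seed=1):
    """Monte Carlo values of S_4 under H0 with known Weibull parameters."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.weibullvariate(scale, shape) for _ in range(n)]
        # Probability integral transform with the known parameters.
        us = [1.0 - math.exp(-((x / scale) ** shape)) for x in xs]
        values.append(sum(v * v for v in legendre_components(us)))
    return values

CHI2_4_UPPER_5PCT = 9.487729  # upper 5% point of the chi-square(4) distribution

s4 = simulate_s4()
mean_s4 = sum(s4) / len(s4)
reject_rate = sum(s > CHI2_4_UPPER_5PCT for s in s4) / len(s4)
print(f"mean of S_4: {mean_s4:.3f} (chi-square(4) mean: 4)")
print(f"rejection rate at the 5% point: {reject_rate:.3f}")
```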

Smooth Test for the Weibull Distribution
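The two-parameter Weibull distribution is defined as

$$f(x; \eta, \beta) = \frac{\beta}{\eta} \left( \frac{x}{\eta} \right)^{\beta - 1} \exp\left\{ -\left( \frac{x}{\eta} \right)^{\beta} \right\}, \qquad x > 0,$$

where $\eta > 0$ is the scale parameter and $\beta > 0$ is the shape parameter.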
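The orthonormal polynomials for the Weibull distribution for the first four orders are derived from the extreme value distribution: if $X$ follows the Weibull distribution with scale $\eta$ and shape $\beta$, then $Y = \ln X$ follows the minimum (type I) extreme value distribution with location $\mu = \ln \eta$ and scale $\sigma = 1/\beta$, and the standardized variable $Z = (Y - \mu)/\sigma = \beta \ln(X/\eta)$ has the parameter-free density $\exp(z - e^{z})$.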
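The log transformation can be checked numerically. The Python sketch below (function names are illustrative) confirms that the Weibull CDF equals the minimum extreme value CDF of $\ln x$ with location $\ln \eta$ and scale $1/\beta$, and that $z = \beta \ln(x/\eta)$ has mean $-\gamma$ (Euler's constant) and variance $\pi^2/6$, the moments of the standard minimum extreme value distribution. The parameter values 30 and 6 follow the simulation study; the seed and sample size are arbitrary.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def weibull_cdf(x, scale, shape):
    return 1.0 - math.exp(-((x / scale) ** shape))

def gumbel_min_cdf(y, location, scale):
    """CDF of the minimum (type I) extreme value distribution."""
    return 1.0 - math.exp(-math.exp((y - location) / scale))

def standardized_log(x, scale, shape):
    """z = shape * ln(x / scale); standard minimum extreme value if x ~ Weibull."""
    return shape * math.log(x / scale)

eta, beta = 30.0, 6.0
x = 25.0
# Same probability on the original axis and on the log axis.
print(weibull_cdf(x, eta, beta), gumbel_min_cdf(math.log(x), math.log(eta), 1.0 / beta))

rng = random.Random(2024)
zs = [standardized_log(rng.weibullvariate(eta, beta), eta, beta) for _ in range(200_000)]
print(statistics.fmean(zs), -EULER_GAMMA)          # sample mean vs -gamma
print(statistics.variance(zs), math.pi ** 2 / 6)   # sample variance vs pi^2/6
```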
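The first five orthonormal polynomials $h_0 \equiv 1, h_1, \ldots, h_4$ are polynomials of degree $0, 1, \ldots, 4$ in the standardized log-observation $\hat{z}_j = \hat{\beta} \ln(x_j / \hat{\eta})$ and are given in [9,16]. The corresponding components are $V_r = n^{-1/2} \sum_{j=1}^{n} h_r(\hat{z}_j)$, and the score statistics of orders one to four for the two-parameter Weibull distribution are $S_k = V^T \hat{\Sigma}^{-1} V$, $k = 1, \ldots, 4$, with $V = (V_1, \ldots, V_k)^T$ and $\hat{\Sigma}$ the estimated asymptotic covariance matrix of $V$. We reject the null hypothesis for large values of the test statistics.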
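The following Python sketch shows one way to construct orthonormal polynomials $h_0, \ldots, h_4$ for the standard minimum extreme value density $f(z) = \exp(z - e^z)$ numerically: raw moments are computed by the trapezoidal rule and the monomials $1, z, \ldots, z^4$ are orthonormalized by Gram-Schmidt. This is a generic construction, assumed rather than copied from [9,16]; the sign convention (positive leading coefficient) may differ from the published polynomials. As a check, it reproduces $h_1(z) = (z + \gamma)\sqrt{6}/\pi$.

```python
import math

def gumbel_min_moments(max_order, step=0.01, lower=-60.0, upper=5.0):
    """Raw moments E[Z^r], r = 0..max_order, of the standard minimum extreme
    value density f(z) = exp(z - e^z), by the trapezoidal rule."""
    count = int(round((upper - lower) / step))
    moments = [0.0] * (max_order + 1)
    for i in range(count + 1):
        z = lower + i * step
        weight = 0.5 if i in (0, count) else 1.0
        density = math.exp(z - math.exp(z)) * weight * step
        power = 1.0
        for r in range(max_order + 1):
            moments[r] += power * density
            power *= z
    return moments

def inner_product(a, b, moments):
    """E[p_a(Z) p_b(Z)] for polynomials given as coefficient lists."""
    return sum(ai * bj * moments[i + j]
               for i, ai in enumerate(a) for j, bj in enumerate(b))

def orthonormal_polynomials(degree, moments):
    """Coefficients (constant term first) of h_0, ..., h_degree, obtained by
    Gram-Schmidt orthonormalization of 1, z, z^2, ... under f."""
    basis = []
    for r in range(degree + 1):
        p = [0.0] * r + [1.0]                    # the monomial z^r
        for q in basis:                          # subtract projections on h_0..h_{r-1}
            c = inner_product(p, q, moments)
            p = [pi - c * (q[i] if i < len(q) else 0.0) for i, pi in enumerate(p)]
        norm = math.sqrt(inner_product(p, p, moments))
        basis.append([pi / norm for pi in p])
    return basis

def evaluate(coeffs, z):
    return sum(c * z**i for i, c in enumerate(coeffs))

def smooth_components(z_values, basis):
    """V_r = n^{-1/2} * sum_j h_r(z_j) for r = 1, ..., len(basis) - 1."""
    root_n = math.sqrt(len(z_values))
    return [sum(evaluate(c, z) for z in z_values) / root_n for c in basis[1:]]

m = gumbel_min_moments(8)          # degree-4 polynomials need moments up to order 8
h = orthonormal_polynomials(4, m)
for r, coeffs in enumerate(h):
    print(f"h_{r}:", [round(c, 5) for c in coeffs])
```

The components $V_r$ would then be computed from the standardized log-observations $\hat{z}_j$ with `smooth_components`; the covariance correction for the estimated parameters is not included in this sketch.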

Empirical Goodness-of-fit Tests
Since we are dealing with complete data, we also employ three standard empirical distribution function (EDF) goodness-of-fit tests: the Anderson-Darling ($A^2$), the Kolmogorov-Smirnov ($D_n$) and the Cramér-von Mises ($W^2$) tests.
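These tests are based on the discrepancy between the empirical distribution function, $F_n$, and the theoretical distribution function, $F$, of the sampled data. The null hypothesis is rejected when the discrepancy is too large, which would suggest that the sampled data do not come from the hypothesized distribution. For the Weibull distribution, we work with the extreme value distribution [17] and therefore apply the empirical cumulative distribution function of $\ln X_i$ instead of $X_i$ [12,16]. The discrepancy between the empirical cumulative distribution function of $\ln X_i$ and the theoretical cumulative distribution function, with parameters estimated by maximum likelihood, is measured through the fitted values $u_{(i)} = 1 - \exp\{-\exp(\hat{z}_{(i)})\}$, where $\hat{z}_{(i)} = \hat{\beta} \ln(X_{(i)} / \hat{\eta})$, $\hat{\eta}$ and $\hat{\beta}$ are the estimated Weibull scale and shape, and $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics. The goodness-of-fit test statistics are defined as follows:

$$D_n = \max_{1 \le i \le n} \max\left\{ \frac{i}{n} - u_{(i)},\ u_{(i)} - \frac{i - 1}{n} \right\},$$

$$W^2 = \sum_{i=1}^{n} \left( u_{(i)} - \frac{2i - 1}{2n} \right)^2 + \frac{1}{12n},$$

$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \ln u_{(i)} + \ln\left(1 - u_{(n+1-i)}\right) \right].$$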
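The three EDF statistics can be computed from the standard computing formulas for $D_n$, $W^2$ and $A^2$ applied to the ordered fitted CDF values $u_{(i)}$. In this Python sketch the Weibull parameters are passed in (in practice they would be the maximum likelihood estimates); the function names are illustrative and are not the EWGoF interface. Because the fitted Weibull CDF at $x$ equals the fitted extreme value CDF at $\ln x$, the values are the same whichever scale is used.

```python
import math
import random

def weibull_cdf(x, scale, shape):
    return 1.0 - math.exp(-((x / scale) ** shape))

def edf_statistics(data, scale, shape):
    """Kolmogorov-Smirnov D_n, Cramer-von Mises W^2 and Anderson-Darling A^2
    of the sample against a Weibull(scale, shape) CDF."""
    u = sorted(weibull_cdf(x, scale, shape) for x in data)
    n = len(u)
    # Index i is 0-based here, so the 1-based i/n becomes (i + 1)/n.
    d_n = max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))
    w2 = sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u)) + 1 / (12 * n)
    a2 = -n - sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
                  for i in range(n)) / n
    return d_n, w2, a2

rng = random.Random(5)
sample = [rng.weibullvariate(30.0, 6.0) for _ in range(100)]
print(edf_statistics(sample, scale=30.0, shape=6.0))
```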

Simulation Studies
Simulations were conducted to compare the critical values and power of the empirical goodness-of-fit tests (Anderson-Darling ($A^2$), Kolmogorov-Smirnov ($D_n$) and Cramér-von Mises ($W^2$)) with those of the smooth test. All computations were performed using the R package EWGoF. We generated samples from the two-parameter Weibull distribution, with scale and shape parameters set at 30 and 6, respectively. The number of Monte Carlo runs in each case was set at 1000. Samples of size $n \in \{5, 20, 50, 100, 500, 1000\}$ were generated and estimates of rejection probabilities computed. We compared the performance of the GOF tests at the 1%, 5% and 10% significance levels. Power was obtained as the percentage of rejections of the null hypothesis. Table 1 shows the power of the smooth tests of order 3 and order 4 for the Weibull distribution against the three common empirical distribution function tests (KS, CVM and AD).
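The Monte Carlo design can be sketched in Python for the Kolmogorov-Smirnov test with estimated parameters. The sketch is not the EWGoF code: it assumes a Newton solution of the Weibull likelihood equations, uses 1000 replicates for the critical value, and takes a log-normal alternative with arbitrarily chosen parameters for the power calculation. It relies on the fact that, with maximum likelihood estimates plugged in, the null distribution of $D_n$ does not depend on the true Weibull parameters, because $\ln X$ belongs to a location-scale family; critical values can therefore be simulated from Weibull(1, 1).

```python
import math
import random
import statistics

def weibull_mle(data, tol=1e-10, max_iter=100):
    """Maximum likelihood (scale, shape) for a complete Weibull sample."""
    logs = [math.log(x) for x in data]
    n, top = len(logs), max(logs)
    t = [v - top for v in logs]                   # log(x / max x) <= 0, avoids overflow
    mean_t = sum(t) / n
    beta = math.pi / (math.sqrt(6.0) * statistics.stdev(t))  # moment-based start
    for _ in range(max_iter):
        w = [math.exp(beta * v) for v in t]
        s0 = sum(w)
        s1 = sum(wi * v for wi, v in zip(w, t)) / s0
        s2 = sum(wi * v * v for wi, v in zip(w, t)) / s0
        g = s1 - 1.0 / beta - mean_t              # profile likelihood equation for shape
        dg = s2 - s1 * s1 + 1.0 / beta ** 2       # its derivative (positive)
        new_beta = beta - g / dg
        if new_beta <= 0.0:                       # keep the iterate positive
            new_beta = beta / 2.0
        converged = abs(new_beta - beta) < tol * beta
        beta = new_beta
        if converged:
            break
    scale = math.exp(top) * (sum(math.exp(beta * v) for v in t) / n) ** (1.0 / beta)
    return scale, beta

def ks_statistic(data, scale, shape):
    """D_n between the EDF and a Weibull(scale, shape) CDF."""
    u = sorted(1.0 - math.exp(-((x / scale) ** shape)) for x in data)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

def fitted_ks(data):
    """D_n with both Weibull parameters estimated by maximum likelihood."""
    scale, shape = weibull_mle(data)
    return ks_statistic(data, scale, shape)

def critical_value(n, alpha, reps, rng):
    # The null distribution is parameter-free, so simulate from Weibull(1, 1).
    stats = sorted(fitted_ks([rng.weibullvariate(1.0, 1.0) for _ in range(n)])
                   for _ in range(reps))
    return stats[math.ceil((1.0 - alpha) * reps) - 1]

def rejection_rate(sampler, n, crit, reps, rng):
    """Share of simulated samples whose fitted D_n exceeds the critical value."""
    hits = sum(fitted_ks([sampler(rng) for _ in range(n)]) > crit for _ in range(reps))
    return hits / reps

rng = random.Random(7)
n = 50
crit = critical_value(n, alpha=0.05, reps=1000, rng=rng)
size = rejection_rate(lambda r: r.weibullvariate(30.0, 6.0), n, crit, 1000, rng)
power = rejection_rate(lambda r: r.lognormvariate(math.log(30.0), 0.2), n, crit, 1000, rng)
print(f"5% critical value of D_n (n = {n}): {crit:.4f}")
print(f"empirical size under Weibull(30, 6): {size:.3f}")
print(f"power against a log-normal alternative: {power:.3f}")
```

The same loop, with $D_n$ replaced by $W^2$, $A^2$ or a smooth statistic, yields the other entries of a power table such as Table 1.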
Results show that the performance of all the tests is strongly linked to the shape of the simulated distribution. The empirical tests are biased when the shape parameter $\beta$ is small (close to 1), whereas when $\beta$ is sufficiently large, they fail to detect the right distribution (Weibull). For smaller samples, the smooth tests are generally more powerful than the empirical GOF tests. The components of the smooth test tend to be unbiased when the two parameters are set at 5 and 35 for large samples (i.e. $n \ge 500$).
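The asymptotic distributions of the EDF statistics are obtained from the empirical process $\xi_n(z) = \sqrt{n}\,\{F_n(z) - z\}$, $0 \le z \le 1$, where $F_n(z)$ is the EDF of the probability-integral-transformed values $z_i$; as $n \to \infty$, $\xi_n(z)$ tends to a zero-mean Gaussian process $\xi(z)$.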
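When the $z_i$ are uniform (parameters known), the variance of the empirical process at a fixed point is $z(1 - z)$ for every $n$, which this Python Monte Carlo sketch illustrates at $z = 0.3$; the sample size, number of replicates and seed are arbitrary. With estimated parameters the limiting Gaussian process has a different covariance, which is why composite-hypothesis critical values differ from the tabulated simple-hypothesis ones.

```python
import math
import random
import statistics

def empirical_process_at(z, n, rng):
    """xi_n(z) = sqrt(n) * (F_n(z) - z) for n independent Uniform(0, 1) values."""
    count = sum(rng.random() <= z for _ in range(n))
    return math.sqrt(n) * (count / n - z)

rng = random.Random(3)
z, n, reps = 0.3, 200, 4000
values = [empirical_process_at(z, n, rng) for _ in range(reps)]
print(f"mean {statistics.fmean(values):.4f}, variance {statistics.variance(values):.4f}, "
      f"z(1 - z) = {z * (1 - z):.4f}")
```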

Data Description
We conducted a retrospective data analysis of all patients who were initiated on ART at two public hospitals in Nairobi, Kenya (Makadara Health Center and Lungalunga Health Center) between October 1, 2011 and December 31, 2014. Although ART services were initiated in Kenya in 2003 across all public hospitals, we specifically extracted data from 2011 onwards because by then all the public systems, processes and structures for defaulter tracing were expected to be working effectively. Our event of interest was time to first LTFU. The clinical setting considered here is the routine Comprehensive Care Center (CCC) in typical government hospitals. Data are collected routinely whenever patients come for a clinical check-up or drug refill. Since time to first LTFU was the event of interest, other exits (i.e. transfer-outs and deaths) were not considered in the analysis. Patients who were actively receiving ART services and did not experience the event were also removed. Only patients who were observed from the time of ART initiation between November 1, 2011 and December 31, 2014 were included in the analysis. The time from ART initiation to first LTFU was measured in months. LTFU was defined as failing to attend a routine clinical appointment within 48 hours of the scheduled appointment date while not being identified as "active on ART", "dead" or "transferred-out". The time to first LTFU was calculated as the interval between the dates of ART initiation and first drop-out, as recorded in the ART database, IQCare. The cohort was stratified by gender (male and female), WHO stage at the time of ART initiation (WHO Stage 1, 2, 3 and 4) and age group (<10 years, 10-14 years, 15-24 years and 25+ years). Data were retrieved from a Health Information System (HIS) called IQCare without patient identifiers. Only the variables of interest were extracted to an Excel spreadsheet, where the data were stored before being analysed in R. Approval was obtained from Pathfinder International.
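The definition of time to first LTFU can be expressed as a small Python function. The record layout (`Appointment` with `scheduled` and `attended` dates) is hypothetical and is not the IQCare schema; the drop-out date is assumed to be the scheduled date of the first missed appointment, "within 48 hours" is approximated as at most two calendar days, and days are converted to months using an assumed average month of 365.25/12 days. Tracing outcomes and the exclusion of deaths and transfers are not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

DAYS_PER_MONTH = 365.25 / 12  # assumed average month length in days

@dataclass
class Appointment:
    """Hypothetical visit record: scheduled date and date actually seen."""
    scheduled: date
    attended: Optional[date]  # None if the patient never attended

def months_to_first_ltfu(initiated: date, visits: Sequence[Appointment],
                         grace_days: int = 2) -> Optional[float]:
    """Months from ART initiation to the first appointment missed by more than
    `grace_days` days (about 48 hours); None if no appointment was missed."""
    for visit in sorted(visits, key=lambda v: v.scheduled):
        missed = (visit.attended is None
                  or (visit.attended - visit.scheduled).days > grace_days)
        if missed:
            return (visit.scheduled - initiated).days / DAYS_PER_MONTH
    return None

history = [Appointment(date(2012, 2, 1), date(2012, 2, 2)),   # 1 day late: retained
           Appointment(date(2012, 3, 1), None)]               # never came back
print(months_to_first_ltfu(date(2012, 1, 1), history))
```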

Motivation for Analysis of HIV Retention Data
Patients receiving ART can experience LTFU, which may result in discontinuation of treatment, drug toxicity, treatment failure due to poor adherence, and drug resistance. This can result in an increased risk of death among up to 40% of ART patients in sub-Saharan Africa [18]. Studies have shown that LTFU has a negative impact on the immunological benefit of ART and increases AIDS-related morbidity, mortality and hospitalizations [18]. Individuals who miss visits in the first year of treatment have a higher mortality rate [2]. Stephen and co-authors [4] showed that retention of patients on ART remains stable after 12 months of ART initiation, with loss to follow-up being the main cause of attrition [3]. Previous studies have also illustrated associations between frequent LTFU and more severe opportunistic illnesses [5]. Analysis of LTFU has also been used in HIV care to monitor and improve programme effectiveness, using patient retention as a measure of quality of care [19].
The main objective in the analysis of LTFU data was to check the retention of patients in care. Programmatically, retention is considered an important determinant of successful long-term ART outcomes. Patients who experience LTFU often enrol in other facilities with different regimen combinations, which is likely to compromise their immune system. Retaining patients in care over the long term allows the provision of long-term Highly Active Antiretroviral Therapy (HAART), tracking of WHO staging and immunosuppression profiles, and evaluation of the emergence of medication toxicities. In resource-limited settings, it is common to find patients dropping out of ART treatment, and because of significant drop-outs, patients who are LTFU may not realize the benefits of ART. More innovation is therefore required for further ART scale-up and to improve retention in care.

Modelling LTFU
The focus is on time to first LTFU; the data used here are primary and have not been used in any previous publication. Although this is a typical Kenyan case, different types of LTFU are expected to reflect the general evolution of HIV programming. LTFU is expected to be a stable event that does not evolve much with time, at least in adults; in young children, however, the event is considered unlikely. Heterogeneity is to be expected, as it is well known that there are various degrees of LTFU. The fact that LTFU is considered a stable event in any HIV programme suggests that, at least in adults, there is no event dependence and no time dependence. The start time is the time of enrolment on ART, after which patients are expected to come for drug refills and routine check-ups. During the observation period, a patient can remain active (i.e. not miss regular appointments), die, be transferred out or be LTFU.

Cohort Description
A total of 4,981 patients were initiated on ART between November 1, 2011 and December 31, 2014 in the two public hospitals. Of these, 854 patients were lost to follow-up and were therefore included in the analysis. The table below shows the patients' status. The median age of those lost to follow-up (n = 854) was 34.2 years (IQR 30.4-38.4), and 59% (n = 509) of them were female. Forty-five percent of patients had advanced/severe immunodeficiency at the start of treatment, and 20% had WHO clinical stage 3 or 4 disease. The mean CD4 count at baseline was 449 (SD 9.3). Characteristics at baseline (ART initiation) are given below.

Graphical Assessment
We obtained probability plots to assess how well candidate statistical distributions [20] describe the time to first LTFU data [21]. Graphically, the Weibull distribution appears closer to the P-P plot reference line than the gamma and log-normal distributions. See Figures 1, 2, 3 and 4 below.
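P-P plot coordinates can be computed as in this Python sketch, which pairs the plotting positions $(i - 0.5)/n$ with a fitted CDF evaluated at the ordered observations; the plotting-position convention and the function names are assumptions, and the example uses simulated data rather than the LTFU data. Only the Weibull and log-normal CDFs are written out, since the Python standard library has no incomplete gamma function (a gamma curve could use, for example, `scipy.stats.gamma.cdf`).

```python
import math
import random

def pp_points(data, cdf):
    """(empirical, theoretical) pairs for a P-P plot: plotting positions
    (i - 0.5)/n against the fitted CDF at the i-th order statistic."""
    xs = sorted(data)
    n = len(xs)
    return [((i + 0.5) / n, cdf(x)) for i, x in enumerate(xs)]

def max_pp_gap(points):
    """Largest vertical distance from the 45-degree reference line."""
    return max(abs(p - q) for p, q in points)

def weibull_cdf(eta, beta):
    return lambda x: 1.0 - math.exp(-((x / eta) ** beta))

def lognormal_cdf(mu, sigma):
    return lambda x: 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

rng = random.Random(9)
sample = [rng.weibullvariate(30.145, 6.786) for _ in range(200)]
points = pp_points(sample, weibull_cdf(30.145, 6.786))
print(f"largest P-P deviation from the reference line: {max_pp_gap(points):.3f}")
```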

Model Fitting
To assess the model fit of the Weibull distribution, we obtained the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). We also fitted the log-normal and gamma distributions to the same data. The AIC and BIC for the Weibull distribution (AIC = 5,276.6, BIC = 5,286.1) are lower than those of the gamma (AIC = 5,777.8, BIC = 5,787.3) and log-normal (AIC = 6,009.4, BIC = 6,018.9) distributions. Therefore, of the three candidates, the Weibull distribution fits the data best.
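All three candidate models have two parameters and are fitted to the same $n = 854$ observations, so $\text{BIC} - \text{AIC} = 2 \ln 854 - 4 \approx 9.5$ for each of them, which matches the reported values. The Python sketch below uses the standard definitions $\text{AIC} = 2k - 2\ell$ and $\text{BIC} = k \ln n - 2\ell$ to recover the maximized log-likelihoods implied by the reported AIC values and to recompute the BIC values.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 * (maximized log-likelihood)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 * (maximized log-likelihood)."""
    return k * math.log(n) - 2 * loglik

n, k = 854, 2  # sample size and number of parameters of each candidate model
reported_aic = {"Weibull": 5276.6, "gamma": 5777.8, "log-normal": 6009.4}
for name, value in reported_aic.items():
    loglik = (2 * k - value) / 2  # maximized log-likelihood implied by the AIC
    print(f"{name}: log-likelihood {loglik:.1f}, AIC {aic(loglik, k):.1f}, "
          f"BIC {bic(loglik, k, n):.1f}")
```

The recomputed BIC values round to 5,286.1, 5,787.3 and 6,018.9, as reported. Because the penalty difference is identical for the three models, AIC and BIC necessarily rank them in the same order here.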

Tests Performance
The smooth test used here is constructed from orthonormal functions rather than quadratic forms [12]. For the smooth test, the score statistics of order 3 and order 4 are given. The hypothesis regarding the distributional form is rejected by each of the three empirical distribution function tests if its test statistic ($D_n$, $W^2$ or $A^2$) exceeds the critical value obtained from tabulated values. For our data, the p-values of all three tests are below the 10% significance level. These tests ($D_n$, $W^2$ and $A^2$) are more powerful when the sample size is not large; in our situation, however, with a sample size of 854, they are misleading. The smooth test, on the other hand, does not reject the hypothesis when considering up to order four, and its p-values are much larger than those of the $D_n$, $W^2$ and $A^2$ tests.

Application Results
To demonstrate the importance of the smooth test of goodness-of-fit in a real-life application, we examined HIV retention data and fitted a two-parameter Weibull distribution to the LTFU data. We assessed the fit using smooth tests of order 3 and 4 and then compared the results with those of the three empirical GOF tests. Other exits from the program (i.e. death, transfer-out and active on ART) were removed. Essentially, we tested the null hypothesis that the Weibull distribution is the underlying distribution of time to first LTFU. The maximum likelihood estimates of the scale and shape parameters under the Weibull model were $\hat{\eta} = 30.145$ and $\hat{\beta} = 6.786$, respectively, and the resulting values of the test statistics were $S_3 = 1.2529$ ($p = 0.308$) and $S_4 = 1.66308$ ($p = 0.409$). Hence the Weibull null hypothesis could not be rejected using the smooth tests, suggesting that the Weibull model adequately describes the duration between the start of ART and first LTFU. In contrast, the three empirical GOF tests rejected the null hypothesis at the 10% significance level: the one-sample Kolmogorov-Smirnov test ($D_n = 0.055232$, $p = 0.01092$), the Cramér-von Mises test ($W^2 = 1.0238$, $p = 7.947 \times 10^{-12}$) and the Anderson-Darling test ($A^2 = 1.9603$, $p = 0.09659$) indicated significant deviations from the null distribution. This suggests that the smooth test is the most reliable of these tests when the sample size is sufficiently large.
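The maximum likelihood fit can be sketched in Python with a Newton iteration on the profile likelihood equation for the shape, $\sum x_i^{\beta} \ln x_i / \sum x_i^{\beta} - 1/\beta - \overline{\ln x} = 0$, followed by $\hat{\eta} = (\sum x_i^{\hat{\beta}}/n)^{1/\hat{\beta}}$. Since the LTFU data are not available here, the example fits a simulated sample of size 854 whose true parameters are set to the reported estimates; it is not a reproduction of the authors' fit, and the seed is arbitrary.

```python
import math
import random
import statistics

def weibull_mle(data, tol=1e-10, max_iter=100):
    """Maximum likelihood (scale, shape) for a complete Weibull sample."""
    logs = [math.log(x) for x in data]
    n, top = len(logs), max(logs)
    t = [v - top for v in logs]                   # log(x / max x) <= 0
    mean_t = sum(t) / n
    beta = math.pi / (math.sqrt(6.0) * statistics.stdev(t))  # moment-based start
    for _ in range(max_iter):
        w = [math.exp(beta * v) for v in t]
        s0 = sum(w)
        s1 = sum(wi * v for wi, v in zip(w, t)) / s0
        s2 = sum(wi * v * v for wi, v in zip(w, t)) / s0
        g = s1 - 1.0 / beta - mean_t              # profile likelihood equation for shape
        dg = s2 - s1 * s1 + 1.0 / beta ** 2       # its derivative (positive)
        new_beta = beta - g / dg
        if new_beta <= 0.0:                       # keep the iterate positive
            new_beta = beta / 2.0
        converged = abs(new_beta - beta) < tol * beta
        beta = new_beta
        if converged:
            break
    scale = math.exp(top) * (sum(math.exp(beta * v) for v in t) / n) ** (1.0 / beta)
    return scale, beta

rng = random.Random(11)
sample = [rng.weibullvariate(30.145, 6.786) for _ in range(854)]
eta_hat, beta_hat = weibull_mle(sample)
print(f"scale = {eta_hat:.3f}, shape = {beta_hat:.3f}")
```

Rescaling by the largest observation keeps $x^{\beta}$ from overflowing for large shape values without changing the estimates.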

DISCUSSION
We have considered an extension of the smooth tests to non-censored data. For the application to non-censored data, we provided the orthonormal structure so that the smooth tests of order k = 3 and k = 4 could be computed. We then evaluated the goodness-of-fit of the two-parameter Weibull distribution fitted to HIV retention data. The empirical GOF tests considered here were powerful when the sample size was small, which is consistent with [9,14,15,22]. Our contribution in this article is distinctive in that we consider LTFU data generated in a routine clinical setting.
Lemeshko [23] investigated the gamma distribution with its parameters chosen so that it is closest to the Weibull distribution. Test power was used to assess both simple and composite hypotheses against this simple alternative. Although he found the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling type nonparametric tests to be most powerful compared with the case in which the estimates are found by minimizing the corresponding statistics, the comparison did not take the smooth tests into consideration.
Sururu [24] also made power comparisons of goodness-of-fit tests in a simulation study, but the smooth test was not included in his assessment. Few authors have incorporated smooth tests when assessing the power of goodness-of-fit tests. In particular, Ledwina et al. [22] showed through simulations that the data-driven version of Neyman's smooth tests performs very well over a wide range of alternatives and is competitive with other data-driven procedures. They also showed that the data-driven smooth tests are consistent against essentially all alternatives [22]. Rayner and Best [15] also showed that Neyman smooth tests for location-scale families are flexible and can be chosen to improve the detection of particular alternatives; these tests were shown to perform well against their competitors. This assessment is also consistent with our simulation results. The smooth test approach fails to reject the null hypothesis when considering up to order four. Its p-value is much larger than those of the EDF tests, and it is therefore more appropriate than the alternatives for large sample sizes. The smooth GOF test also supported the Weibull distribution as the distribution of the data, which should ultimately yield a better estimate of the hazard function for time to LTFU and hence better predicted hazard rates. The Weibull distribution is the best choice according to the density plots and graphs, and this choice is validated by the smooth GOF test. It is important to choose GOF tests carefully in order to make correct inferences about the underlying distribution. We have shown that the smooth test is superior and can be used to analyse time to LTFU data in order to determine the underlying distribution. These results agree with those of [25], who showed by simulation that a test for normality based on the smooth test has much greater power than the generalized $\chi^2$ test and the Kolmogorov-Smirnov test.
Kang [25] also demonstrated that the test performs generally as well as the Shapiro-Wilk, skewness, and kurtosis tests for a wide range of alternatives.
Several studies have shown that LTFU poses challenges to the successful implementation of ART programs [18]. Studies have shown that patients who discontinued ART experienced a rapid increase in viral load and depletion of CD4 cells, putting them at risk of opportunistic infections and early death. Therefore, understanding the underlying pattern and distribution of LTFU is necessary for designing sound interventions that maintain adherence to ART treatment. In this study, the two-parameter Weibull distribution fits the time to first LTFU well. Several authors [3,5,19,26-28] have identified poor defaulter tracing in resource-limited settings as the main reason for rising cases of LTFU. This is likely to compromise positive outcomes of ART in a large-scale HIV care center. Patterns of LTFU are therefore crucial in developing practical programmatic interventions.

CONCLUSION
Loss to follow-up (LTFU) is an important problem both for the care of individuals and for the evaluation of antiretroviral treatment (ART) programmes in low- and middle-income countries. The evaluation of ART programmes has been difficult because many patients are lost to follow-up [29]. Thus, it is important to model LTFU so as to better understand the factors influencing it. Most of these models are based on the two-parameter Weibull distribution (e.g. [29,30]) and are often selected without a prior goodness-of-fit test of this distribution. Our study, however, evaluated the performance of the smooth test against that of the empirical goodness-of-fit tests for the Weibull distribution and validated the results using loss to follow-up data from HIV care.
The smooth goodness-of-fit approach performed better than the empirical GOF tests when fitting a parametric distribution to complete time-to-event data. We described how to fit a two-parameter Weibull distribution to HIV retention data and assess the fit using goodness-of-fit procedures. Our results highlight the need to better understand LTFU among patients initiated on ART. Estimation of nuisance parameters can be performed without changing the test statistics, and since the tests rely on maximum likelihood techniques, they asymptotically meet the conditions of the Neyman-Pearson lemma against any simple alternative hypothesis. Future studies should address fitting hazard functions based on the Weibull distribution to censored data in order to determine the risk of LTFU in HIV care.

Figure 1: Test for Theoretical Distributions. The Weibull distribution is closest to the data.

Figure 2: Test for Theoretical Distributions. Here, the data appear more consistent with the Weibull distribution.

Figure 3: Test for Theoretical Distributions. The P-P plot indicates that the Weibull distribution is the ideal distribution.

Figure 4: Test for Theoretical Distributions. The Q-Q plot shows that the Weibull distribution is much more consistent with the data.

Table 4: Comparison of the AIC and BIC for the Weibull, Gamma and Log-normal Distributions. All Parameter Estimates Were Obtained by the MLE Method
*Mean of the natural logarithm of time to LTFU; **standard deviation of the natural logarithm of time to LTFU.