Maximum Likelihood and Bayesian Estimation of Repeatability Index: Application of Estimating Ratio of Variance Components

: An index of repeatability is constructed to evaluate the relative magnitude of measurement error. This index is constructed as a ratio of two variance components. Estimation of the index is derived under the one-way random effects model. We compare the well-known maximum likelihood estimator to the Bayesian estimation procedure using non-informative prior. Large sample variance of the of the maximum likelihood estimator are obtained using the inverse of Fisher’s information matrix and the delta method. Inference procedure using the. We also construct a test statistic on the equality of two repeatability indices using the Monte Carlo integration and sampling Importance re-sampling method. We illustrate the methodologies on the estimation of the index of repeatability of Gamma-glutamyl-transferase, an enzyme found in many organs all over the human body, with the highest concentrations found in the liver. This enzyme’s level is raised in the blood in most diseases that cause damage to the liver or bile ducts and is considered an essential serum marker for alcohol-related liver disease.


INTRODUCTION
The need for assessment of repeatability of disease predictors and biomarkers is ubiquitous in biomedical research. Such predictors should be well-defined and measured without errors. In practice, measurement error is a problem. Disease prediction is adversely affected if the predictors and other biological markers are not accurately measured, a problem known as "regression dilution bias". For example, in fields of clinical diagnoses, pathologists who may unreliably score tissue specimens for histology, radiologists, who for example score the Magnetic Resonance Image (MRI) scan, and the biochemistry laboratory technicians who wrongly evaluate hemoglobin constituents-analysis are committing analytical errors that affect the accuracy of disease diagnoses.
Apart from rater variability, some measurements are prone to biological variation. A well-known example is blood pressures.
Recently, Al-Eid and Shoukri [1], developed an index of repeatability estimated from the components of variations of the Analysis of Variance (ANOVA) model. This index focuses on scaled within subject variation relative to the between-subject of variation. It is desirable that this ratio be as small as possible to declare measurement repeatability. It should be realized that assessing the accuracy of the measured *Address correspondence to this author at the Department of Epidemiology and Biostatistics, Schulich School of Medicine and Dentistry, London, Ontario, Canada; Tel: +1519-661-2162; E-mail: mmshoukr@uwo.ca outcome by a single measurement will not help evaluate the magnitude of the measurement error, and at least two measurements are needed.
The pressing need to produce error free measurements are more needed than ever due to the emergence of advanced technologies in genomics, medical imaging, drug development, and their uses in disease diagnoses and predictions and identifying normal subjects in population-based studies. For example, construction of the normal range or reference range in clinical medicine relied on a large sample of healthy individuals where by a single measurement is taken from each subject [2][3][4]. Research has shown that the distribution of these measurements is affected by two main sources of variations; the between subjects and the within-subject components of variations [5]. We illustrate, by real-life datasets, the usage of the Maximum Likelihood Estimation (MLE) and Bayesian techniques in the estimation of repeatability of biomarkers that are essential for disease diagnosis and prediction.
The paper has two-fold objectives. Firstly, we compare the two used inferential procedures when the target population parameter is the index of repeatability as defined by Al-Eid and Shoukri [1]. This, to our best knowledge, have not been studied in repeatability literature. Secondly, apply the methodologies to a selected biochemistry data available from hospital registry. We intend to extend our techniques to the many disease biomarkers that are available in our data repository. This paper is structured as follows. In Section 2, we introduce the formal definition of the underlying model and specify the parameters of interest. We then derive the maximum likelihood estimators of the components of variation and that of the corresponding parameter of interest, namely Index of Repeatability (IR) assuming the normality of measurements. We use the Fisher information matrix to derive the variances of the estimated components of variation and then use the delta method [6] to derive the variance and the bias of the maximum likelihood estimator of IR. In Section 3 we address the issue of prior specifications of the model parameters, and derive the posterior estimator of IR. In Section 4 we describe the source of our data, and present the sampling strategy. In Section 5 we present the data analytic procedures and in section 6 we present general discussion.

MAXIMUM LIKELIHOOD ESTIMATION (MLE)
A uniquely defined parameter "Index of Repeatability" is the ratio of the within-subject biological variation to the between subject variation. To formalize the presentation, we assume that a single measurement ! !" from subject ! = 1,2, … ! with ! !! replication has the mathematical representation: (1) Here ! ! represents the sources of between-subjects biological variation, and ! !" represents sources of within-subject variations. The parameter ! is the population mean. Note that the assumption of additivity of components is made to simplify the presentation. But a multiplicative model may be made additive under the logarithmic transformation. It is further assumed that: and that ! ! ⊥ ! !" for all ! !"# !.
We define the "Index of Repeatability as" (IR) is as Therefore, and because the within subject variance components ! ! ! is in the numerator a small value of ! means high repeatability. Under the above model set-up, we can show that: The joint sufficient statistics (! .. , !!", !!") of the model parameters are given by:

= between subjects sums of squares
In terms of the familiar "Analysis of Variance" (ANOVA) set up we have, as shown in Table 1: Taking the natural logarithm of ! !|!, ! ! ! , ! ! ! and differentiating with respect to the model parameters, equating the partial derivatives to zero and solving the resulting equations we get the MLE of the model parameters: The variances of the estimated variance components may be obtained by inverting the Fisher's information matrix. The matrix is the negative of the second partial derivatives of the log-likelihood with respect to the parameters. Summarizing, we get: Using the Delta method [6] we derive the asymptotic variance of the MLE of !
Substituting in equation (4) and simplifying we get the asymptotic variance of the MLE of the parameter !: Using the delta method, we derive an approximation to the bias of the MLE of !. The expression for the bias from [6] is given by: Substituting the relevant quantities in equation (6), we get: In the next section we use the Bayesian methodology to derive the posterior distribution of the proposed index of repeatability.

BAYESIAN ANALYSIS
Much work on Bayesian methods as a scientific tool for statistical inference appeared in several books the most important of which are [7][8][9][10]. The first theoretical presentation of Bayesian methods in the analysis of one-way random effects models was introduced by Hill [11]. The work highlighted the complexity of the problem of estimation of variance components under the one-way ANOVA. One of the main ingredients for conducting valid Bayesian statistical inference is to correctly specify the prior distribution. Therefore, care should be taken in assigning priors in order to construct valid posterior confidence intervals. Spiegelhalter [12] employed several priors the most important of which is Jeffery's prior, as reviewed in [13]. The prior specifications for the variance components and the grand mean require a joint prior which we denote by Specifically, we shall consider priors assuming that ! is independent of ! ! ! !"# ! ! ! such that: In most situations the non-informative prior ! ! = 1 is used. Therefore, the joint posterior distribution is then given by marginal posterior of ! ! ! !"# ! ! ! given as Following [11], we use Jefferey's the prior specifications: (8), the joint posterior distribution of ! ! ! and ! ! ! is thus given by: In the joint posterior distribution given in (9). Let us consider the transformations The Jacobian of the transformation is thus given by: Substituting in equation (9) and integrating η out we get the posterior distribution of !, as shown in equation (10) Lemma: For small values of ! !"#ℎ !ℎ!" 1 + ! ! ≈ 1, the posterior distribution of ! is such that: where, F denotes a random variable that has F-distribution with degrees of freedom ! ! = k-1, and ! ! = !(! − 1).
Solving equation (11) for ! we get: We shall derive the posterior means and variance of ! using direct simulations. This equivalent to Monte-Carlo Integration (MCI) as shown in Gelfand and Smith [14]. We construct 2.5% and 97.5% quantiles on the difference between the IR for males and females using the Monte Carlo simulations and the Sampling Importance Re-sampling method (SIR).

DATA SETS
The biochemistry data of the King Faisal Specialist Hospital and Research Center (KFSHRC) includes results of 28 common tests that were retrieved from January 2014 to June 2016. These include, sodium, potassium, phosphorus, and among others, gamma-glutamyl transferase. The task force charged with the determination of the normal range for these data designed an action plan whose first step is to establish an acceptable validation criterion. The data were filtered to include the information of test name, value, gender, age, and sample type, and then exported to Excel data sheet for statistical analysis. Before a validation parameter was selected, the data manager was directed to include subjects with complete data (no missing test values) that had three consecutive readings before the proposed validation procedure is carried out. The target parameter selected for this study was Gamma-glutamyl-transferase (GGT).
The GGT is an enzyme that is found in many organs all over the body, with the highest concentrations found in the liver. GGT level is raised in the blood in most diseases that cause damage to the liver or bile ducts and is conceded as essential serum marker for alcohol-related liver disease [15].
The GGT's level not only predicts liver disease but also is associated with increasing the risk to numerous conditions and diseases including coronary heart disease, type-II diabetes (T2D) [16], and stroke [17,18].
The literature from different gender and ethnic groups across several populations reliably demonstrate robust predictive influence for the GGT. The first American epidemiologic study to test the GGT level was between 1978 and 1982, with 3,853 participants [16].
In general, all reported studies suggest that serum GGT is a strong predictor of numerous diseases such as diabetes, hypertension, liver damage and certain types of cancer [18]. The GGT as predictor of components of metabolic syndrome and heart diseases was reported in [19].
We divided the GGT data into two groups; male (15 subjects) and female (33 subjects). In order to assess the repeatability of the targeted enzyme (GGT), only subjects who were assessed in three occasions (phases) were included in our investigation. Due to these strict inclusion criteria, in addition to time and cost constrains, the male group was small in size and the two groups were not balanced.

METHODS AND DATA ANALYSIS
The main objective of the data analytics is to estimate the MLE of the IR and the Bayesian methods and draw comparisons. The analyses were performed separately for each group. It is also of interest to test the hypothesis of equality of IR in both groups using Wald's confidence interval when the MLE is used We then use the posterior distribution of the difference between male and female indices to construct the highest posterior interval on the difference between the posterior means of the two indices of repeatability. In the Bayesian analysis we shall estimate the quantiles of the difference using both Monte-Carlo integration and Sampling Importance Re-sampling techniques. Figures 1 and 2 show the line graphs of the three reading for each individual subject in the two groups. As can be seen there is no mush variations among the three readings (small within subject variations) as compared to the between subject variations (large between subject variation). In Figures 3 and 4 we present the boxplots for female and male data to explore the possibility of outliers. There are no outliers in the male data, while the female data has six outlying observations. We decided not to exclude theses outliers from the analyses.

Data Analytics
In Table 2, we present the needed summary statistics for the maximum likelihood and Bayesian analyses. Using equation (6) we note that the bias of the MLE is extremely small, and for all practical purposes is considered zero. In Table 3 we present the MLE of the IR parameter, its standard error, which is the square root of the variance given by equation (5). The upper and lower limits of an 95% confidence interval are given as well.  In Table 4 we present the posterior mean and standard deviations obtained from direct simulations for both groups. The R code is quite simple and is presented for male data in Appendix A The posterior means and standard errors were derived using Monte-Carlo Integration following the methodology in [20] It is of interest do derive inference on the difference between two repeatability indices. We can use Wald's interval to construct 95% confidence interval of the difference parameter: This interval is given by: The variances in equation (13) are given by equation (5).
We can use the Monte Carlo Integration method to construct the Q-2.5 and Q-97.5 limits on the difference using the Sampling Importance Resampling (SIR) as explained in [20]. The results are given in Table 4b. Note that the SIR is useful when the analytic expression of the posterior distribution is not available. Relaxing the assumption that 1 + ! ! ≈ 1 does not necessarily hold, the SIR is a method of choice.

COMMENTS AND DISCUSSION
In this article, we discussed two well-known techniques of inference, maximum likelihood and    Bayesian methods to estimate repeatability as defined in [1]. We demonstrated for the GGT data that the maximum likelihood produces a very small bias even in small samples. However, the precision will be definitely increased when larger samples were made available.  For Bayesian inference, we used Jeffreys' prior because it is well documented in the literature [11,13] and is much easier to derive when compared to other priors. For complex models, like ANOVA with repeated measures, analytical derivation of a reference prior could be very difficult, and indeed problematic [11].
Another issue that warrants further investigation is related to the SIR method. As can be seen from Figure  6, most of the weights in the histogram of the weights are concentrated around zero. This means that in this approach the same small set of values are repeatedly selected. This leads to histogram similar to the one in the bottom of Figure 6. One explanation of this problem may be attributed to the poor choice of the suggested distribution where most values are from the actual posterior density are positioned. We note also, in comparing between groups, we avoided the traditional hypothesis procedure, and focused on confidence interval construction, following the recommendations given in [21].
Finally, this study focused on cases when the underlying distribution of the selected biomarker is normal. In practice the distributions of many biomarkers are far from being normal or even symmetric. In addition, the distributions can be heavy-tailed. Extending the Bayesian approaches to analyze repeatability of biomarkers whose distributions are non-symmetric in the presence of multiple observations for each subject is an open area for research.

CONFLICT OF INTEREST
None declared by both authors.