Estimation of Parent-Sib Correlations for Quantitative Traits Using the Linear Mixed Regression Model: Applications to Arterial Blood Pressures Data Collected From Nuclear Families

Abstract: A fundamental question in quantitative genetics is whether observed variation in the phenotypic values of a particular trait is due to environmental or to biological factors. The proportion of variation attributed to genetic factors is known as the heritability of the trait. Often, this term is used in reference to the resemblance between parents and their offspring. In this context, high heritability implies a strong resemblance between parents and offspring with regard to a specific trait, while low heritability implies a low level of resemblance. While many applications measure the resemblance of offspring to their parents using the mid-parental value of a quantitative trait of interest as an input parameter, others focus on estimating maternal and paternal heritability. In this paper we address the problem of estimating parental heritability using the nuclear family as the unit of analysis. We derive moment and maximum likelihood estimators of parental heritability, and test their equality using the likelihood ratio test and the delta method. We also use Fieller's interval on the ratio of parental heritabilities to address the question of bioequivalence. The methods are illustrated on published arterial blood pressure data collected from nuclear families.


INTRODUCTION
A heritable quantitative trait is a measurable phenotype that depends on the cumulative actions of many genes. These traits can differ among individuals to produce a continuous distribution of phenotypes such as height, weight, blood pressure, and high- and low-density cholesterol levels [1]. Quantitative traits also include molecular phenotypes such as gene expression levels [2]. Heritability analysis has been used for years to evaluate whether a given phenotype is influenced by genetic factors and how strong that influence is relative to nongenetic risk factors. The general belief behind heritability analysis is that individuals who are more genetically related to each other should be more similar to each other for the phenotypes of interest [3].
The variation that exists in a quantitative trait can be divided into genetic and environmental components, and the genetic component can be further subdivided into additive, dominance, and epistatic variances [4]. Estimation of the components of variance for a quantitative trait permits one to evaluate both the degree to which genetics influences the trait and the trait's underlying genetic style [5]. The most relevant applications of heritability studies are those concerned with the heritability of components of Metabolic Syndrome (MS). These studies concluded that the genetic correlations seemed to vary under different conditions [6,7].
*Address correspondence to this author at the Department of Epidemiology and Biostatistics, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario, Canada; Tel: 226-977-8651; E-mail: mmshoukr@uwo.ca, Shoukri.mohamed@gmail.com
There is increasing interest in the parental effect on children's Body Mass Index (BMI) and Blood Pressure (BP) levels, which are leading risk factors for heart disease. Studying the heritability of components of metabolic syndrome has attracted the attention of genetic epidemiologists. A recent UK cohort study found that familial influence on BMI among middle-aged women appeared to be stronger from mothers than from fathers [8]. A Canadian study noted that the prevalence of overweight and obesity among children and adults has risen in Canada, and that parental obesity has been suggested as a risk factor for overweight and obesity in children. That study examined associations between biological parent and child BMI in a nationally representative sample of Canadian children [9], and concluded that biological parent and child BMI were significantly correlated.
While the aim of the above studies was to establish parent-offspring correlation, the issue of heritability has not been adequately addressed. For example, the sampling unit in the Canadian study [9] consisted of one parent and one child only, and ignored other siblings in the same family. Under this restricted sampling design, we cannot estimate the genetic components of the heritable trait of interest.
The fundamental aim of our paper is to establish a statistical model-based approach to estimate heritability as defined in Laird and Lange [10]. Specifically, we will establish formal statistical methodologies to compare paternal and maternal heritability when the sampling unit is a cluster of nuclear families consisting of the parents' scores as well as their offspring's scores for the same measured quantitative traits. The paper is structured as follows. In Section 2, we provide guidelines for the strategy of sampling nuclear families. We derive a formula for the optimal combination of the number of nuclear families and the average sibship size needed to efficiently estimate parental heritability. In Section 3, we set up a regression model and identify the critical parameters that define parental heritability. We also derive moment estimators for heritability and the standard errors of the estimators, and construct confidence intervals on the ratio of parental heritabilities. We use bootstrap methods to examine the distributional characteristics of the ratio estimator. In Section 4, under the assumption of multivariate normality, we estimate the parental heritability and use the likelihood ratio test to evaluate the significance of the difference between maternal and paternal heritability. In Section 5, we illustrate the methodologies on published arterial blood pressure data. In Section 6, we present a general discussion.

NUCLEAR FAMILY SAMPLING STRATEGY: ISSUES OF COST CONSTRAINTS
In the above studies, because of the adopted study designs, it is impossible to account for the within-sibship correlation when estimating the correlation between a parent and a single offspring sampled at random from within the family. It is argued, for example in the work by Kempthorne and Tandon [11], that one way to estimate parental heritability is to use the regression of the offspring trait on the parental trait, when the unit of study is the entire sibship. To estimate the required sample size (the number of nuclear families and the average number of siblings within a family), we start with a model in which offspring scores are regressed on the score of one parent (say, the mother).
Let $Y_{ij}$ denote the score of the $j$th offspring in the $i$th family, and $X_i$ the score of the $i$th parent, where $j = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, k$, $n_i$ is the number of offspring in the $i$th family, and $k$ is the total number of families. We assume that the regression of $Y$ on $X$ is given by: [...] where $E_{ij}$ is the deviation of the $j$th offspring of the $i$th parent. We further assume that [...] The most widely used estimator for $\beta$ is given by [...] From [11] we have [...] To analyze clustered data, one must therefore model both the regression of $Y$ on $X$ and the within-cluster dependence. If the responses are independent of each other, then ordinary least squares can be used, which produces regression estimators that are identical to the maximum likelihood estimators in the case of normally distributed responses.
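As a rough illustration of the regression of offspring scores on a single parent's score, the following minimal sketch computes a pooled least-squares slope in which every offspring contributes one point and the parent's score is repeated once per child. The function name `pooled_slope` and the toy data are hypothetical, and the sketch is not necessarily the estimator given in [11]; it returns a point estimate only and does not account for the within-sibship correlation when judging precision.

```python
def pooled_slope(families):
    """Least-squares slope of offspring score on parent score.

    `families` is a list of (parent_score, [offspring_scores]) pairs.
    Every offspring contributes one (x, y) point, so a parent's score is
    repeated once per child.
    """
    xs, ys = [], []
    for parent_score, offspring_scores in families:
        for y in offspring_scores:
            xs.append(parent_score)
            ys.append(y)
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: offspring scores lie exactly on y = 60 + 0.5 * x,
# so the slope recovered by the function should be 0.5.
families = [
    (120.0, [120.0, 120.0]),
    (130.0, [125.0]),
    (140.0, [130.0, 130.0, 130.0]),
]
slope = pooled_slope(families)
```

Because larger sibships contribute more points, families with many children carry more weight in this slope, which is one reason the within-cluster dependence has to be modelled separately.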
An important question that we need to answer at this stage is: what is the optimal number of sibships needed to estimate $\beta$ efficiently? An important factor that needs to be considered is the cost of sampling nuclear families. To simplify our presentation, we shall assume that the sibship sizes are relatively constant and that we can safely replace $n_i$ by its average, say $n$. In this case the variance of the estimator of $\beta$ reduces to: [...] Shoukri et al. [12] addressed the issue of obtaining the combinations $(n, k)$ that minimize the variance of a specific estimator subject to cost constraints. In their attempt to construct a flexible cost function, they adhered to the general guidelines outlined by Flynn et al. [13]. First, one has to identify approximately the sampling costs and the overhead costs. The sampling cost depends primarily on the size of the sample, and includes data collection costs, travel costs, and management and other staff costs. Overhead costs, on the other hand, remain fixed regardless of sample size and include, for example, the cost of setting up the data collection form. Following Sukhatme et al. [14], it is assumed that the overall cost function is given as: [...] where $c_0$ is the fixed cost, $c_1$ is the cost of recruiting an entire sibship, and $c_2$ is the cost of making a single observation within the sibship. Using the method of Lagrange multipliers [15], the objective function to be minimized, $G$, is given as [...] where the variance of the estimator is given by equation (2.2), and $\lambda$ is the Lagrange multiplier. The necessary conditions for the minimization of $G$ are $\partial G/\partial n = 0$, $\partial G/\partial k = 0$, and $\partial G/\partial \lambda = 0$, with the sufficient condition for $G$ to have a constrained relative minimum given by a theorem in Rao [15].
Differentiating $G$ with respect to $n$, $k$, and $\lambda$, equating to zero, and solving for $n$, we obtain [...] For the sibship size to be meaningful, it is assumed that [...] The interpretation of (2.5) is that we select larger sibships when $R$ is large, that is, when the cost of sampling an individual within a family is lower than the cost of sampling the entire sibship.
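The cost-constrained allocation can be illustrated numerically. The sketch below rests on two assumptions that are not taken from the paper's equations: the variance is taken to be proportional to the familiar design-effect form $[1 + (n - 1)\rho]/(nk)$, where $\rho$ is the within-sibship correlation, and the cost function is taken to be $C = c_0 + c_1 k + c_2 nk$, which matches the verbal description of $c_0$, $c_1$, and $c_2$. Under those assumptions the Lagrangian has the stationary point $n = \sqrt{(c_1/c_2)(1 - \rho)/\rho}$, and a brute-force grid search is used as a check. All function names and cost values are hypothetical.

```python
import math


def variance_shape(n, k, rho):
    """Assumed variance shape: design effect over the total number of offspring."""
    return (1.0 + (n - 1.0) * rho) / (n * k)


def families_for_budget(n, budget, c0, c1, c2):
    """Number of families k that exhausts the assumed budget C = c0 + c1*k + c2*n*k."""
    return (budget - c0) / (c1 + c2 * n)


def closed_form_sibship_size(c1, c2, rho):
    """Stationary point of the Lagrangian under the assumptions stated above."""
    return math.sqrt((c1 / c2) * (1.0 - rho) / rho)


def grid_search_sibship_size(budget, c0, c1, c2, rho, step=0.001, n_max=20.0):
    """Brute-force check: scan n and keep the value with the smallest variance."""
    best_n, best_var = None, float("inf")
    steps = int((n_max - 1.0) / step)
    for s in range(steps + 1):
        n = 1.0 + s * step
        k = families_for_budget(n, budget, c0, c1, c2)
        v = variance_shape(n, k, rho)
        if v < best_var:
            best_n, best_var = n, v
    return best_n


# Hypothetical costs: recruiting a family costs four times one measurement.
n_closed = closed_form_sibship_size(c1=20.0, c2=5.0, rho=0.25)
n_grid = grid_search_sibship_size(budget=5000.0, c0=500.0, c1=20.0, c2=5.0, rho=0.25)
```

In this sketch the optimal sibship size grows with the cost ratio $c_1/c_2$, which agrees with the qualitative interpretation given in the text if $R$ denotes that ratio (an assumption, since the definition of $R$ is not reproduced here). The optimum does not depend on the total budget, which only determines how many families can be recruited.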

REGRESSION MODELS TO ESTIMATE MATERNAL AND PATERNAL HERITABILITY
Let the trait values of the $i$th family, with sibship size $n_i$, be $X_{im}, X_{if}, Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$. The joint distribution of these random variables is characterized by the following parameters: [...] Here $X_{im}$ is the score of the $i$th mother, $X_{if}$ is the score of the $i$th father, and $Y_{ij}$ is the score of the $j$th offspring in the $i$th family. Moreover, we assume [...] for all $j \neq l$. Following Mak and Ng [16] and Shoukri and Ward [17], we assume that model (2.6) has the following representation: [...] where $E_{ij}$ has mean zero and a covariance structure of intraclass correlation, so that [...] Using the definition of familial correlations provided by model (2.8), we can show that: [...] Conversely, equations (2.9-2.12) give: [...] Equations (2.13-2.14) give ensemble estimators of the regression of offspring on their parental values, and equation (2.15) provides the estimator of the error term.
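The intraclass correlation structure among siblings can be illustrated with the standard one-way analysis-of-variance estimator for unbalanced sibships. This is a minimal sketch of a textbook moment estimator, offered only to make the idea concrete; it is not claimed to be the estimator derived in the paper, and the function name is hypothetical.

```python
def anova_intraclass_correlation(sibships):
    """One-way ANOVA estimator of the intraclass (sib-sib) correlation.

    `sibships` is a list of lists; each inner list holds the scores of the
    offspring in one family. Sibship sizes may differ.
    """
    k = len(sibships)
    sizes = [len(s) for s in sibships]
    total = sum(sizes)
    grand_mean = sum(sum(s) for s in sibships) / total
    means = [sum(s) / len(s) for s in sibships]

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for s, m in zip(sibships, means) for y in s)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total - k)

    # Effective sibship size used when the sibships are of unequal size.
    n0 = (total - sum(n * n for n in sizes) / total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1.0) * ms_within)


# Hypothetical toy data: siblings agree perfectly within each family,
# so all variation is between families.
rho_identical = anova_intraclass_correlation([[1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0]])
```

The estimator needs at least two families and at least one family with more than one offspring; otherwise the mean squares are undefined.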

The Genetic Model
To describe the model that has both environmental and genetic components, we assume a base population that is characterized by the presence of additive genetic variance and a random environmental component. To simplify the estimation of heritability, we further assume that the interaction between genes and environment is negligible.
Therefore, the measured trait for the $j$th offspring in the $i$th family is such that [...] Equation (3.1) involves the breeding value of the $i$th mother, the breeding value of the $i$th father, an individual deviation due to genetic segregation, and a random environmental deviation. Furthermore, we assume that: [...] Note that the parental breeding values are not observable but may be estimated from the corresponding phenotype of the respective parent. This can be done under the assumptions: [...] Under the additivity of genetic effects, we can then use equation (3.2) to define the heritability in the narrow sense [10] as: [...] where $h_m^2$ is the maternal heritability and $h_f^2$ is the paternal heritability, as defined in the seminal work of Jacquard [18]. To obtain consistent estimators of $(h_m^2, h_f^2)$, we assume that the parental values are uncorrelated with the offspring displacements.
Inserting (3.2) into (3.1) we get: [...] It can now be easily shown that [...] The first-order approximations of the variances and the covariance of the estimators of the regression parameters are: [...] Testing the equality of the parental heritabilities is equivalent to testing the equality of the regression parameters $\beta_m$ and $\beta_f$. This hypothesis will be tested using the following methods.

The Delta Method
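From [19], the asymptotic variance of the ratio of two random variables is given by: $\mathrm{var}(\hat{\theta}) = \left(v_1 + \hat{\theta}^2 v_2 - 2\hat{\theta} v_{12}\right)/\hat{\beta}_2^2$, where $\hat{\theta} = \hat{\beta}_1/\hat{\beta}_2$ is the ratio of the two estimators, $v_1$ and $v_2$ are their variances, and $v_{12}$ is their covariance. We can therefore construct 95% confidence limits for the ratio so that: Lower 95% confidence limit $= \hat{\theta} - 1.96\sqrt{\mathrm{var}(\hat{\theta})}$ and Upper 95% confidence limit $= \hat{\theta} + 1.96\sqrt{\mathrm{var}(\hat{\theta})}$. It is well known that the point estimator of the ratio parameter is biased. Using the delta method (DM), we have, to the first order of approximation: [...]
From [20], we assume that the rejection region of the hypothesis, with size $\alpha$, is such that: [...]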
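The delta-method interval for a ratio can be sketched in a few lines. The function below uses the standard first-order variance of a ratio of two estimators, given their variances and covariance; the argument names and the numerical inputs are hypothetical and are not taken from the paper's tables.

```python
import math


def delta_ratio_interval(b1, b2, v1, v2, v12, z=1.96):
    """Delta-method variance and confidence limits for the ratio b1 / b2.

    b1, b2 : the two estimates (numerator and denominator)
    v1, v2 : their estimated variances
    v12    : their estimated covariance
    z      : normal quantile (1.96 for a 95% interval)
    """
    ratio = b1 / b2
    variance = (v1 + ratio ** 2 * v2 - 2.0 * ratio * v12) / b2 ** 2
    half_width = z * math.sqrt(variance)
    return ratio, variance, ratio - half_width, ratio + half_width


# Hypothetical inputs, chosen only to exercise the formula.
ratio, variance, lower, upper = delta_ratio_interval(0.4, 0.5, 0.01, 0.01, 0.002)
equality_supported = lower < 1.0 < upper  # does the interval include unity?
```

The interval is symmetric about the estimated ratio by construction, which can be a poor approximation when the sampling distribution of the ratio is skewed.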
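Fieller's interval for a ratio, which the paper applies to the ratio of parental heritabilities, can be sketched as the solution of a quadratic inequality. The sketch below is the textbook form of Fieller's theorem under approximate joint normality of the two estimators, not a reproduction of the paper's own equation; the function name and inputs are hypothetical.

```python
import math


def fieller_interval(b1, b2, v1, v2, v12, z=1.96):
    """Fieller confidence limits for b1 / b2, or None if no bounded interval exists.

    The limits are the roots in theta of
        (b1 - theta * b2)**2 = z**2 * (v1 - 2 * theta * v12 + theta**2 * v2).
    """
    a = b2 ** 2 - z ** 2 * v2          # positive when b2 differs significantly from 0
    b = b1 * b2 - z ** 2 * v12
    c = b1 ** 2 - z ** 2 * v1
    disc = b ** 2 - a * c
    if a <= 0.0 or disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a


# Hypothetical inputs, chosen only to exercise the formula.
limits = fieller_interval(0.4, 0.5, 0.01, 0.01, 0.002)
```

Unlike the delta-method interval, these limits are generally not symmetric about the point estimate, and the function returns `None` when the denominator estimate is too imprecise for a bounded interval.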

MAXIMUM LIKELIHOOD ESTIMATION: THE NORMAL LINEAR MIXED MODEL
Let [...], $i = 1, 2, \ldots, k$, be a random sample of $k$ families. The likelihood function under model (2.2) will be given by [...] The relevant part of the global likelihood function is given by (5.2) and will be used to test the null hypothesis $H_0: \beta_m = \beta_f = \beta$ against all possible alternatives. We denote the maximized likelihood function under the alternative hypothesis by $L_1$, and under the null hypothesis by $L_0$. Note that only one component of the likelihood function is affected by the restriction imposed by the null hypothesis; under the null, this component is written as shown in (5.3): [...] The null hypothesis is rejected when the likelihood ratio statistic $2[\log(L_1) - \log(L_0)]$ exceeds $\chi^2_{1,\alpha}$, the upper cut-off value of a chi-square distribution with one degree of freedom at the $\alpha$ level of significance.
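The decision rule of the likelihood ratio test can be sketched as follows, given the two maximized log-likelihoods. The function name and the log-likelihood values are hypothetical; the p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math

CHI2_1DF_CRITICAL_05 = 3.841  # upper 5% point of chi-square with 1 df


def likelihood_ratio_test(loglik_alt, loglik_null):
    """Return (statistic, p_value, reject_at_5_percent) for a 1-df LRT."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value, statistic > CHI2_1DF_CRITICAL_05


# Hypothetical maximized log-likelihoods under the alternative and the null.
statistic, p_value, reject = likelihood_ratio_test(-100.0, -102.5)
```

Fitting the two models to obtain the log-likelihoods is the substantive step and is not shown here; in practice that would be done with a mixed-model routine.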

DATA ANALYSES
The methodologies presented thus far are quite general. They are applicable to quantitative traits believed to be heritable when the study sample consists of nuclear families that include parents and their biological offspring. We illustrate the methodologies on the data set described below.

Example: Miall and Oldham's Blood Pressures Family Data
The data used for illustration here are obtained from a survey that aimed at assessing the levels of similarity in systolic and diastolic blood pressure among family members living within 25 miles of the Rhondda Fach Valley in South Wales. The data were published by Miall and Oldham [23]. Observations were made on parents and their offspring, with each observation consisting of systolic and diastolic blood pressures measured to the nearest 5 mm Hg. However, among 250 sampled families, only 204 contained information on brothers and sisters who were older than 18 years and lived in the same household. Because of an impossibly low systolic blood pressure (15 mm Hg) recorded for one daughter, another family was omitted, leaving 203 families with data on both parents and their offspring for the analysis. The average number of siblings per family was 4 and the standard deviation was 1.92.
Before the data analysis, we used R packages to produce pairwise correlations for the blood pressure levels of parents and siblings. Figure 2 shows the correlations and the histograms of each of the measured traits. This is quite informative since, in addition to the magnitude and the direction of the correlations, we can graphically detect the level of skewness of the measured traits. All the needed summary statistics are given in Table 1.
In Table 2 we summarize the results of both the delta and Fieller's methods. The delta method did not detect a significant difference between the paternal and the maternal heritability for either type of blood pressure, since the interval includes unity. However, Fieller's method reached a different conclusion for the DBP.
To verify the validity of the above methods (delta and Fieller), we used bootstrap resampling (BTR) to investigate the empirical characteristics of the ratio estimators. From Table 2 we see that the bootstrap method agrees with the other methods for the SBP. For the DBP, however, the bootstrap interval supports the hypothesis of equal parental heritability, in contrast to the conclusion from Fieller's method. According to [21], one should trust the bootstrap results because the method is completely non-parametric and does not require any distributional assumptions for the data. In Figures 3 and 5 we show the histograms of the bootstrap values of the ratios for SBP and DBP. As can be seen, the distributions are quite skewed. This skewness is also seen in Figures 4 and 6, where the empirical quantiles depart markedly from the quantiles of the normal distribution.
Table 3 gives the results of the maximum likelihood estimation and the log-likelihood under the null and the alternative hypotheses. The likelihood-based approach has desirable asymptotic properties: the estimators are asymptotically normally distributed and asymptotically unbiased. For both blood pressure levels, the LRT detected a significant difference between the maternal and the paternal heritability. The residual plots given in Figures 7 and 8 show that there are no outliers and that the residuals are almost normally distributed. The LRT should not be compared directly with the other methods presented. This is because the delta and Fieller's methods test the significance of the departure of the ratio from unity, whereas the LRT does not examine the ratio; rather, it tests the significance of the departure of the difference from zero.
Although the two approaches are conceptually similar, their theoretical justifications are not the same.
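The bootstrap check of the ratio estimator can be sketched as follows. Two assumptions are made here because the resampling details are not given in the text: whole nuclear families are resampled with replacement, and the ratio of interest is taken to be the ratio of the least-squares slopes of offspring score on mother's and father's scores. The function names and the toy data are hypothetical; the toy data are noise-free, so every resample returns the same ratio and the example serves only as a sanity check.

```python
import random


def two_parent_slopes(families):
    """Least-squares slopes of offspring score on mother and father scores.

    `families` is a list of (mother, father, [offspring_scores]) tuples.
    """
    rows = [(m, f, y) for m, f, ys in families for y in ys]
    n = len(rows)
    m_bar = sum(r[0] for r in rows) / n
    f_bar = sum(r[1] for r in rows) / n
    y_bar = sum(r[2] for r in rows) / n
    smm = sum((m - m_bar) ** 2 for m, _, _ in rows)
    sff = sum((f - f_bar) ** 2 for _, f, _ in rows)
    smf = sum((m - m_bar) * (f - f_bar) for m, f, _ in rows)
    smy = sum((m - m_bar) * (y - y_bar) for m, _, y in rows)
    sfy = sum((f - f_bar) * (y - y_bar) for _, f, y in rows)
    det = smm * sff - smf ** 2  # zero if the two parental scores are collinear
    return (sff * smy - smf * sfy) / det, (smm * sfy - smf * smy) / det


def bootstrap_ratio_interval(families, n_boot=1000, seed=2024):
    """Percentile interval for the slope ratio, resampling whole families."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        resample = [rng.choice(families) for _ in families]
        b_mother, b_father = two_parent_slopes(resample)
        ratios.append(b_mother / b_father)
    ratios.sort()
    return ratios[int(0.025 * n_boot)], ratios[int(0.975 * n_boot) - 1]


# Hypothetical noise-free families: offspring = 40 + 0.4 * mother + 0.2 * father,
# so the true slope ratio is 2.
families = []
for i in range(30):
    mother = 110.0 + (7 * i) % 23
    father = 120.0 + (11 * i) % 19
    child = 40.0 + 0.4 * mother + 0.2 * father
    families.append((mother, father, [child] * (1 + i % 3)))
lower, upper = bootstrap_ratio_interval(families, n_boot=200)
```

Resampling families rather than individual offspring keeps each sibship intact, so the within-family correlation is preserved in every bootstrap sample.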

DISCUSSION
A recent study from China [25] found that familial aggregation of overweight (obesity) and of high BP existed in rural areas and in low-income families, respectively, suggesting that both social and familial environments, alongside genetic factors, are important for non-communicable disease (NCD) risk factors. Furthermore, within two generations, offspring with the highest BMI and BP were found to live with parents who both had higher than normal BMI and BP, and strong father-offspring and mother-offspring correlations of BMI existed without substantial differences between them. These findings indicate that both mother and father play important roles in primary prevention strategies, and that family-based interventions against obesity may have great potential. The salient fact is that "Non-communicable diseases (NCDs) are the leading cause of death and ill health and account for seven of ten deaths worldwide". This study is quite important because the sampling units were nuclear families. However, the data analysis was descriptive and did not explicitly model the genetic components of variation.
A recent article in the Lancet [26] outlined the WHO Sustainable Development Goal (SDG) target, which is to reduce premature mortality from non-communicable diseases (NCDs) by a third by 2030 relative to 2015 levels. Among NCDs, heart disease is responsible for the highest risk of premature death in more than half of all countries for women, and in more than three-quarters for men. Elevated blood pressure levels are risk factors for heart disease, and examining the possibility of vertical transmission from parents to their offspring is an issue of public health at the family level. Estimation and inference procedures for the heritability of quantitative traits have been of interest to geneticists, genetic epidemiologists, medical geneticists, and genetic counselors. The questions of interest are: does a quantitative trait cluster within families, and is maternal heritability higher or lower than paternal heritability?
We developed several inferential procedures to address both questions. All the suggested procedures, except the likelihood-based inference, indicate that there are no significant differences in parental heritability for SBP, but that differences exist for the DBP. The methods presented in this paper are quite general and are applicable to quantitative traits collected from nuclear families, and to molecular data as well. As we indicated in the introduction, if a quantitative trait, such as blood pressure level, BMI, or waist-to-hip circumference, is proven heritable, then the population risk of transmission of this trait from parents to their offspring may be reduced by genetic counselling through a premarital screening strategy.