Analysis of Genetic Relationship Among 11 Iranian Ethnic Groups with Bayesian Multidimensional Scaling Using HLA Class II Data

Background: The key feature of Bayesian methods is that they do not depend on the default assumptions required by classical statistics. Because they rely on a high volume of simulation, Bayesian methods can achieve a high degree of accuracy. They are efficient in data mining and in analyzing large volumes of data, and can be updated as new data are entered. Objective: We used Bayesian multidimensional scaling (MDS) to analyze the genetic relationships among 11 Iranian ethnic groups based on HLA class II data. Method: Allele frequencies of three HLA loci from 816 unrelated individuals belonging to 11 Iranian ethnic groups were analyzed by Bayesian MDS using R and WinBUGS software. Results: Like the results of correspondence analysis, used as a prototype of classical MDS analysis, the results of Bayesian MDS showed Arabs from Famur, Balochis, Zoroastrians and Jews to be separate from the other Iranian ethnic groups. The decrease in stress with the Bayesian MDS method compared to the classical method indicated the accuracy of Bayesian MDS for HLA data analyses. Conclusion: This study reports the first application of Bayesian multidimensional scaling to HLA data analysis with Nei's D_A genetic distances. The stress reduction in Bayesian MDS compared to classical MDS showed that the Bayesian approach can improve the accuracy of genetic data analysis.


INTRODUCTION
Bayesian multidimensional scaling (MDS) is a graphical multivariate analysis that is often used for genetic data analysis [1][2][3]. Bayesian theory, first proposed by Thomas Bayes in the eighteenth century, is based on formulating probability distributions to express uncertainty about unknown quantities [4,5]. Bayesian statistics has been widely applied in fields ranging from economics to medical research and genetic studies [6][7][8][9]. The most crucial characteristic of MDS is that it simplifies complex data analysis by reducing the number of dimensions. The key feature of Bayesian methods is that they do not depend on the default assumptions required by classical statistics [5].
Human leukocyte antigen (HLA) genes encode the highly polymorphic molecules responsible for antigen presentation to T lymphocytes. Because of their high variability, HLA molecules are also considered the main problem in transplantation [10]. The frequency of HLA alleles differs widely among populations [11] and genetic distances among populations based on HLA data can be helpful in choosing better donor candidates for transplantation [12].
*Address correspondence to this author at the Department of Biostatistics, Shiraz University of Medical Sciences, Zand St., 71348-45794 Shiraz, Iran; Tel: +98 9351304346; E-mail: maleknias@gmail.com
Correspondence analysis or classical MDS are usually used to explore genetic relationships among populations. Since the Bayesian approach to MDS offers some advantages over similar classical MDS procedures, in this study we used Bayesian MDS to investigate whether this method can improve our genetic analysis compared to classical MDS, which has previously been used to analyze these data.

Statistical Method
Because of its ability to determine relationships among datasets and its special type of geometric representation, MDS is widely used in different branches of science. A specific algorithm for a set of proximities is used to select a particular type of spatial representation, and then modeling is performed [15]. When either the vectors of observations or the distances between stimuli are available, the best type of geometrical representation is classical MDS, also known as metric MDS. Graphical displays are obtained from a set of transformations on the distance matrix D, using eigenvalues and eigenvectors to create a new matrix of fitted distances, D̂ [15][16][17].
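The eigenvalue-based construction described above can be sketched as follows. This is a minimal illustration of textbook classical MDS (double-centring of squared distances followed by an eigen-decomposition), not the code used in this study; the function names `classical_mds` and `pairwise_distances` and the toy data are assumptions introduced for the example.

```python
import numpy as np


def classical_mds(dist, n_dims=2):
    """Classical (metric) MDS of a symmetric distance matrix (illustrative sketch)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Eigen-decomposition of the symmetric matrix B
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the n_dims largest eigenvalues and their eigenvectors
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues caused by rounding
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(coords):
    """Euclidean distances between all rows of a coordinate matrix."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical toy data: four points at the corners of a unit square
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
d_obs = pairwise_distances(square)
coords = classical_mds(d_obs, n_dims=2)
d_fit = pairwise_distances(coords)
```

For distances that are exactly Euclidean in two dimensions, as in this toy case, the fitted distances reproduce the observed ones up to rounding; the recovered coordinates themselves are only determined up to rotation and reflection.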
Different types of distance values have previously been used in MDS analyses [16]. In this study, Nei's genetic distance, D_A, was calculated with the following formula [18]:
\[ D_A = 1 - \frac{1}{r} \sum_{j=1}^{r} \sum_{i=1}^{m_j} \sqrt{x_{ij}\, y_{ij}} \]
where x_ij and y_ij were the frequencies of the ith allele at the jth locus in populations X and Y, m_j was the number of alleles at the jth locus, and r was the number of loci studied.
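As a worked illustration of the D_A definition above, the following sketch computes the distance between two populations from per-locus allele frequencies. The function name `nei_da` and the toy frequencies are hypothetical; they are not taken from the study data.

```python
import math


def nei_da(freqs_x, freqs_y):
    """Nei's D_A between two populations.

    freqs_x, freqs_y: one list per locus, each holding the allele
    frequencies at that locus in the same allele order.
    """
    if len(freqs_x) != len(freqs_y):
        raise ValueError("both populations must cover the same loci")
    r = len(freqs_x)  # number of loci
    total = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        if len(locus_x) != len(locus_y):
            raise ValueError("allele lists must have equal length per locus")
        # Sum of sqrt(x_ij * y_ij) over the alleles at this locus
        total += sum(math.sqrt(x * y) for x, y in zip(locus_x, locus_y))
    return 1.0 - total / r


# Hypothetical toy frequencies for two loci
pop_a = [[0.5, 0.5], [0.2, 0.8]]
pop_b = [[0.5, 0.5], [0.2, 0.8]]
same = nei_da(pop_a, pop_b)             # identical frequencies
disjoint = nei_da([[1.0, 0.0]], [[0.0, 1.0]])  # no shared alleles
```

Identical frequency profiles give a distance of zero, and populations that share no alleles at any locus give the maximum distance of one.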
Goodness-of-fit tests were used to ensure that the number of dimensions was appropriate. In this study the stress value was used to evaluate the goodness-of-fit. The stress value was obtained from the entries of the D and D̂ matrices [2,16,17].
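A minimal sketch of a stress calculation is given here. It assumes one common convention, in which the squared differences between observed and fitted distances are summed over all pairs, divided by the sum of squared observed distances, and square-rooted; the exact normalisation used in this study is an assumption, and the function name `stress` and the toy matrices are hypothetical.

```python
import math


def stress(d_obs, d_fit):
    """Stress between observed and fitted distance matrices (assumed form).

    Assumed definition: sqrt( sum_{i<j} (d_ij - dhat_ij)^2 / sum_{i<j} d_ij^2 ).
    """
    n = len(d_obs)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each pair counted once
            num += (d_obs[i][j] - d_fit[i][j]) ** 2
            den += d_obs[i][j] ** 2
    return math.sqrt(num / den)


# Hypothetical toy matrices for three objects
d_obs = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
d_fit = [[0, 3, 4], [3, 0, 4], [4, 4, 0]]
perfect = stress(d_obs, d_obs)
value = stress(d_obs, d_fit)
```

A perfect fit gives a stress of zero, and larger values indicate a poorer match between the fitted configuration and the observed distances.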
Based on the criteria of Kruskal and Wish, stress <0.025 was considered excellent, 0.025 to <0.05 good, 0.05 to <0.1 fair, and stress ≥0.2 poor [19].
In the Bayesian approach, p(D) is the integrated likelihood [20]. This integral was calculated by Markov chain Monte Carlo simulation with the Metropolis-Hastings algorithm [21]. Heidelberger and Welch criteria were used to diagnose convergence of the chain [22].
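To illustrate the kind of sampler referred to above, the following sketch implements a random-walk Metropolis-Hastings algorithm for a one-dimensional toy target. It is not the Bayesian MDS posterior and not the sampler run in WinBUGS; the standard normal target, the step size, the burn-in length, and all function names are assumptions chosen only to keep the example small.

```python
import math
import random


def metropolis_hastings(log_target, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, target ratio).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)  # a rejected move repeats the current state
    return samples, accepted / n_samples


def log_std_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


samples, acc_rate = metropolis_hastings(
    log_std_normal, start=0.0, n_samples=20000, step=1.0, seed=1
)
kept = samples[2000:]  # discard an assumed burn-in period
post_mean = sum(kept) / len(kept)
```

Only the ratio of target densities enters the acceptance step, which is why the normalising constant, the integrated likelihood in the Bayesian setting, never has to be evaluated directly.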
The posterior density functions of the unknown parameters (X, σ², Λ) were then calculated. The appropriate number of dimensions was obtained with the Multidimensional Scaling Information Criterion (MDSIC).
The optimal number of dimensions was obtained by minimizing MDSICp [2].

RESULTS AND DISCUSSION
Bayesian MDS was used to analyze the genetic relationships among 11 Iranian ethnic groups based on HLA class II data. Table 1 shows the minimum MDSIC detected in different numbers of dimensions for the different HLA loci. The improvements in stress with Bayesian MDS compared to classical MDS are shown in Table 2. Although the optimum dimensions were different for each locus, two-dimensional representation was used for further analysis.
The genetic relationships among 11 Iranian ethnic groups using Bayesian MDS analysis are depicted in Figure 1. The distribution pattern of the ethnic groups differed somewhat depending on which locus was used as the source of genetic information. When the allele frequencies at all three loci were considered, Azeris, Kurds, Parsees, Bakhtiaris and Arabs from Ahvaz clustered together, and Lurs from Luristan and Yasouj were located in a separate cluster. Arabs from Famur, Balochis, Zoroastrians and Jews separated from other ethnic groups as outliers. This might be explained by religious or cultural differences between these groups and other ethnic groups studied here.
Similar results were reported previously based on correspondence analysis [13]. However, that earlier study found that Arabs from Famur, Balochis, Zoroastrians and Jews were well separated from the other ethnic groups as outliers, whereas the remaining ethnic groups were located in a single cluster.
To our knowledge, this is the first study to implement Bayesian MDS for HLA data analysis using D_A genetic distances. Like MDS, correspondence analysis is a multivariate data reduction technique with a graphical output, and it uses the raw data to create a two-dimensional matrix [23]. For HLA data analyses, MDS with D_A and a one-dimensional matrix is generally recommended. We calculated D_A distances among ethnic groups and constructed a one-way matrix, which was then analyzed by MDS with Bayesian methods. As in previous articles [24], the decrease in stress with Bayesian MDS compared to classical MDS showed that the accuracy of MDS can be improved with Bayesian techniques. As shown by our concurrent use of all three loci to calculate D_A, highly polymorphic genes or the simultaneous study of different genetic loci are potentially helpful in enhancing the accuracy of estimates of genetic proximity with this approach.
Bayes' theorem for parameter estimation was obtained by calculating the posterior distribution of θ based on the given matrix D:
\[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} = \frac{\text{likelihood} \times \text{prior}}{\text{integrated likelihood}}, \qquad p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta \]
where θ represented all values of the number of dimensions and D was the matrix of observed distances. The software used in this study, WinBUGS and R, used classical MDS results for the parameters of prior distributions and then estimated the posterior distributions and values with Markov chain Monte Carlo simulation.

Figure 1: Bayesian multidimensional scaling analysis showing bi-dimensional representation of the genetic relationship among 11 Iranian ethnic groups based on allele frequencies on HLA-DQA1 (a), DQB1 (b), DRB1 (c), and all three loci (d).

Table 2: Comparison of Stress in Classical MDS and Bayesian MDS in Optimal Dimensions (Three Dimensions)
Data source | Minimum MDSIC | Stress for CMDS | Stress for BMDS | Improvement rate
CMDS, classic multidimensional scaling; BMDS, Bayesian multidimensional scaling.