Single Nucleotide Polymorphisms Detected and In Silico Analysis of the 5' Flanking Sequence and Exon 1 in the Bubalus bubalis Leptin Gene

Leptin plays a critical role in the regulation of reproductive and immune function in humans and lies at the centre of the complex networks that coordinate changes in nutritional state with many diverse aspects of mammalian biology. In this study, we sequenced the 5' flanking region and exon 1 of the leptin gene in buffalo and detected eight single nucleotide polymorphisms; in silico analysis showed that many of them fall within putative binding sites for transcription factors. Starting from the bovine whole genome shotgun sequence, which contains the complete sequence of the leptin gene, we designed primers for two amplicons covering the 5' flanking region and exon 1 of the leptin gene in 41 unrelated buffaloes. The newly sequenced buffalo fragment was submitted to a profile search for transcription factor binding sites using the MATCH program, focusing on the areas where the single nucleotide polymorphisms had been detected. Our analysis shows that the majority of the identified single nucleotide polymorphisms fall within the core sequence of binding sites for transcription factors that regulate the expression of target genes in many physiological processes within mammalian tissues. Because the leptin gene plays an important role in influencing economic traits in cattle, the newly detected single nucleotide polymorphisms might be used in association studies to assess their potential as genetic markers for selection.


INTRODUCTION
Leptin was identified in 1995 as the product of the obese gene and as a hormonal signal that regulates energy balance in mice [1,2,3]. Its identification uncovered a new endocrine system that regulates body weight through a negative feedback loop: homeostatic control of adipose tissue mass is maintained by modulating the activity of neural circuits that regulate food intake and energy expenditure [4]. In humans, Farooqi and O'Rahilly [5] describe how the identification of mutations in the gene encoding leptin, together with the characterization of the associated clinical phenotype of congenital leptin deficiency (hyperphagia, severe obesity, hypogonadism, and impaired immunity), has defined the role of leptin-responsive pathways in the regulation of eating behaviour, intermediary metabolism, and the onset of puberty. They also demonstrate that leptin signaling plays a critical role in the regulation of reproductive and immune function in humans, which places leptin at the centre of the complex networks that coordinate changes in nutritional state with many diverse aspects of mammalian biology.
The leptin gene is highly conserved across species and is located on chromosome 4q32 in the bovine [6]. Taniguchi [7] isolated a bovine genomic clone that contained about 3 kb of the 5' flanking region upstream of the putative transcription start site and showed that this DNA region contains putative binding sites for known transcription factors, consistent with those of the promoter region in human and mouse. The same authors reported that the gene is composed of three exons spanning around 18.9 kb of the genome. Polymorphisms in the promoter region of the bovine leptin gene have been associated with good milk yield, energy balance, fertility and protein yield [8,9]; with serum leptin concentration, performance traits, feed intake and measures of body fatness [10]; and with perinatal mortality in dairy heifers [11].
*Address correspondence to this author at the Animal Production Research Centre, Via Salaria 31, 00015 Monterotondo Roma, Italy; Tel: +39-0690090215; Fax: +39-069061541; E-mail: francesco.napolitano@entecra.it
In view of the evidence that the leptin gene affects important economic traits in cattle, the present study aimed to sequence the 5' flanking region of this gene in buffalo, to search for single nucleotide polymorphisms (SNP), and to identify putative binding sites for transcription factors (TFBS) through in silico analysis.
Amplicons and sequencing products were purified with Agencourt AMPure XP and Agencourt CleanSEQ, respectively (Agencourt Bioscience Corporation), followed by elution in 50 µL of reagent-grade water. Additional sequencing primers (Table 1, lines 3-6) were necessary because of the size of the first amplicon (1141 bp) and the higher GC content of the second one; three of the sequencing primers (PCR2, PCR3, PCR4) were reported in [9], so the same names were retained in this work. Direct sequencing was then performed using BigDye Terminator v3.1 on an Applied Biosystems 3500 Genetic Analyzer. The sequences were processed using the Sequencing Analysis v 5.3.1 software.

Single Nucleotide Polymorphism Detection and In Silico Analysis
SNP detection was performed by comparing the sequences of the 41 individuals using the Basic Local Alignment Search Tool (NCBI). A chi-square test was used to test for significant deviations from Hardy-Weinberg equilibrium at each SNP position. The MATCH™ program [13] in TRANSFAC® Professional 10.2 (http://www.biobaseinternational.com/) [14] was used to perform the profile search for TFBS on the sequenced buffalo fragment, focusing on the areas where SNP had been detected. The vertebrate binding matrices and the option restricting the search to high-quality matrices were used in the profile selection, and the cut-offs for core and matrix similarity were set to 0.999 and 0.7, respectively; all other options were left unchanged.
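The chi-square test for Hardy-Weinberg equilibrium mentioned above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `hwe_chi_square` and the genotype counts in the usage lines are hypothetical, and the test assumes a biallelic SNP with one degree of freedom and no continuity correction.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes at a
    biallelic SNP. Returns (frequency of allele A, chi-square, P value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped animals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Classes with zero expectation (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return p, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts for 41 animals, NOT taken from Table 2.
    freq_a, chi2, p_value = hwe_chi_square(12, 20, 9)
    print(f"p(A) = {freq_a:.3f}, chi2 = {chi2:.3f}, P = {p_value:.4f}")
```

With only 41 animals some expected genotype classes can be small, in which case an exact test may be preferable to the chi-square approximation.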

RESULTS AND DISCUSSION
The newly obtained buffalo sequence was deposited in NCBI [GenBank: JF681145] [15] and showed 96% homology with the bovine sequence.
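A percentage such as the one reported above is typically derived from a pairwise alignment. The sketch below shows one common convention, assuming the two sequences have already been aligned (gaps written as "-") and that gapped columns are excluded from the denominator; alignment tools may use a different denominator, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored, so the
    denominator is the number of ungapped aligned positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences for illustration only, not buffalo or bovine data.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```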
Eight SNP were detected in the 41 genotyped buffaloes (Table 2). Three SNP (g.83 A>G; g.121 A>G; g.283 A>G) showed full linkage disequilibrium in the analysed sample, with no significant deviation of the genotype frequencies from those expected under Hardy-Weinberg equilibrium; in contrast, the genotype distributions at the remaining SNP were not in agreement with Hardy-Weinberg equilibrium (P from 0.05 to 0.00001). SNP g.83 A>G falls in the core consensus sequence (5'-TGACAG-3') of the Meis1/Hoxa9 transcription factor [16]. Meis1 is a homeodomain transcription factor that is coexpressed with Hoxa9 in most human acute myeloid leukemias and exerts its oncogenic functions through transcriptional activation of target genes such as the tyrosine kinase oncoprotein FLT3 [17].
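The full linkage disequilibrium reported above for three SNP can be checked on unphased genotypes with a simple squared correlation of allele dosages. This is only a sketch under stated assumptions: genotypes are coded as the number of copies of one allele (0, 1 or 2), the genotype-based r² stands in for a haplotype-based estimate, and the function name `genotype_r2` and the dosage vectors are hypothetical, not data from this study.

```python
def genotype_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between allele dosages at two SNP.

    Each list holds one value per animal (0, 1 or 2 copies of an allele).
    A value of 1.0 means the genotypes at the two SNP are fully
    concordant in the sample.
    """
    n = len(dosage_a)
    if n != len(dosage_b) or n < 2:
        raise ValueError("need two dosage lists of equal length (>= 2)")
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("at least one SNP is monomorphic in this sample")
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical dosages for seven animals at two SNP.
    snp_1 = [0, 1, 2, 1, 0, 2, 1]
    snp_2 = [0, 1, 2, 1, 0, 2, 1]
    print(genotype_r2(snp_1, snp_2))
```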
The SNP g.90 A>G falls within two overlapping consensus sequences for putative transcription factor binding sites, located from nt 86 to nt 96 of [GenBank: JF681145]: 5'-CCGGA-3' (c-Ets-1) and 5'-GATCT-3' (GATA-3). The Ets family is one of the largest families of transcription factors, whose members are identified by a highly conserved domain that binds to DNA sites with a central GGA sequence [18]. Ets factors have crucial roles in pituitary gonadotrope and lactotrope biology. Gonadotropes synthesize and secrete the glycoprotein hormones luteinizing hormone [19].
The GATA motif is a common nucleotide sequence found in the transcriptional regulatory regions of numerous genes. In vertebrates, these motifs are bound by one of six factors (GATA-1 to GATA-6) that constitute the GATA family of transcriptional regulatory proteins. GATA-1, -2 and -3 are expressed in hematopoietic cell lineages and are essential for erythroid and megakaryocyte differentiation, the proliferation of hematopoietic stem cells and the development of T lymphocytes [20]. GATA-3 plays a central role in regulating Th1 and Th2 (T-helper) cell differentiation and is known to regulate the genes encoding the signature cytokines of Th2 cells, interleukin-4 (IL-4), IL-5 and IL-13 [21]. Hosoya [22] emphasize that overwhelming evidence supports the hypothesis that GATA-3 plays a central role at multiple stages of T-cell development and is therefore a fundamental contributor to many steps in the overall framework from the birth to the death of T cells. The same transcription factor has been shown to be necessary for mammary development and for the maintenance of luminal cell differentiation in the adult mouse mammary gland [23].
The SNP g.121 A>G falls at the end (5'-AGTCA-3') of the binding site of the Activator Protein-1 factor (AP-1). The 5' flanking region of this site overlaps the DNA binding site for fork head homolog (HFH-1, -3, -8), which is not affected by the point mutation. Forkhead box (FOX) proteins are a family of transcription factors. Originally, they were given vastly different names (such as HFH, FREAC, and fkh), but in 2000 a unified nomenclature was introduced that grouped the FOX proteins into subclasses (FOX A-S) based on sequence conservation [24]. A large number of family members have been discovered, especially in vertebrates, with a central role not only during development but also in the adult organism. The misregulation and/or mutation of FOX genes often induces human genetic diseases ranging from infertility to immunological defects [25]. The transcription factor AP-1, on the other hand, is considered a switch for many signals [26]. Biochemical evidence has shown that AP-1 is not a single transcription factor but a series of related dimeric complexes of Fos and Jun family proteins that regulate the expression of target genes controlling a number of physiological processes within the cell, including cell cycle progression, tissue remodelling and invasion, angiogenesis, cell migration, cell differentiation and apoptosis [27].
The nucleotide sequence from nt 71 to nt 121 [GenBank: JF681145] contains three SNP that fall in overlapping putative binding sites of four transcription factors: c-Ets-1, GATA-3, FOX and AP-1. The overlap of one transcription factor's binding site with another may be common and might result in the formation of protein complexes involving several protein-DNA and protein-protein interactions, enabling the generation of higher order nucleoprotein complexes that enhance transcription [28]. For AP-1 dimers in particular, numerous studies demonstrate the ability of nuclear proteins to bind to each other and to mediate cooperative DNA binding and promoter activation when their respective binding sites are juxtaposed [27,28,29,30]. Two of the SNP detected within this sequence (g.90 A>G and g.121 A>G) deactivate the putative binding sites of three of the four transcription factors (c-Ets-1, GATA-3 and AP-1). Wang [31] reported that the absolute dependence on the binding sites for Ets-1, AP-1 and GATA-3, together with the strong synergy between Ets-1 and AP-1, suggests close cooperative interactions between the three transcription factors in the regulation of IL-5 expression in mouse T cells.
Allele T of the SNP g.959 G>T activates the consensus binding site 5'-AATTA-3' for the S8 homeobox gene, subsequently named Prrx2 in mouse and PRRX2 in human (paired related homeobox gene) [32]. The product of this gene has been shown to be a transcription factor containing a highly conserved homeodomain that enables the protein to bind to specific DNA sequences that contain an ATTA core [33]. S8 has been implicated in human diseases and congenital abnormalities [34].
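The way a single allele can create or abolish a core consensus, as described above for g.959 G>T, can be illustrated with a simple string search. This sketch is a deliberate simplification and not the MATCH algorithm, which scores positional weight matrices against core and matrix similarity cut-offs; here a site counts as present only if the exact core string (or its reverse complement) occurs. The flanking sequences and the function names are hypothetical and are not taken from [GenBank: JF681145].

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_core(window, core):
    """True if the exact core occurs in the window on either strand."""
    window = window.upper()
    core = core.upper()
    return core in window or reverse_complement(core) in window


def allele_effect(left_flank, alleles, right_flank, core):
    """Map each allele to whether the core consensus is present.

    The window is rebuilt for every allele as left flank + allele +
    right flank, so the flanks should cover at least len(core) - 1
    bases on each side of the SNP.
    """
    return {
        allele: has_core(left_flank + allele + right_flank, core)
        for allele in alleles
    }


if __name__ == "__main__":
    # Hypothetical flanks chosen so that allele T completes 5'-AATTA-3'.
    print(allele_effect("GCAAT", ("G", "T"), "AGC", "AATTA"))
```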
The SNP g.1010 A>C falls in the consensus sequence of the binding sites for Tal-1 (T-cell acute lymphocytic leukemia-1; 5'-ggatacAGATGtgaaa-3') and v-myb (myeloblastosis viral oncogene homolog (avian)-like 1; 5'-aaaAACGGa-3'). However, the in silico analysis with the MATCH™ software indicated that the mutated allele does not deactivate these binding sites.
The SNP g.1254 G>A falls in the core consensus sequence (5'-AGCTG-3') for the HEN1 transcription factor, which is activated by the presence of allele A. HEN1 (also known as NSCL1 and NHLH1) was identified in [35] as belonging to a subgroup of bHLH (basic Helix-Loop-Helix) genes that play a role during the development of the mammalian nervous system [36,37,38]. Most genes in the mammalian genome are expressed from both parental alleles. Imprinted genes represent a minority of genes, which are transcribed from only one allele. These genes have been hypothesized to play a major role in the regulation of embryonic growth, to control placental function and to modulate the transport of nutrients from mother to embryo [40,41,42]. Recently, Steinhoff [39] performed a computational expression analysis of human and mouse imprinted genes in a variety of non-cancerous tissues, exploring in particular the role of predicted TFBS in relation to tissue-specific expression. They showed a remarkable association between the tissues with distinct expression profiles (placenta, adrenal gland, ovary and pituitary) and specific TFBS in the promoter regions of imprinted genes. In particular, HEN1 showed a very pronounced up-regulation in ovary, pituitary and adrenal gland. This SNP had already been reported in [43] in four different buffalo breeds, including the Mediterranean, for which a G allele frequency of 0.52 was reported, together with a higher frequency in the Murrah breed (0.69); the less specialized breeds (Carabao and Jafarabadi) showed a very low frequency (0.18 and 0.03, respectively). Because the frequency of the G allele in the sample analysed here was 0.79, it can be inferred that the non-selected dairy breeds, i.e. the triple-purpose animals with higher adaptation to harsh environments, retain a greater capacity to activate some TFBS than the selected dairy breeds.

CONCLUSIONS
It is interesting to note, from the analysis of the genotype frequencies (Table 2), that homozygous animals are extremely rare at the SNP (g.83 A>G; g.121 A>G; g.256 A>G and g.959 G>T) that fall within binding sites for transcription factors that have been linked, in human or mouse, to genetic diseases. Gene expression analysis would be necessary to assess whether the detected mutations affect important economic traits such as reproductive and productive parameters and disease resistance. In the meantime, because the leptin gene plays an important role in influencing economic traits in cattle, the newly detected single nucleotide polymorphisms might be used in association studies to assess their potential as genetic markers for selection.

Table 2: Detected SNP in the Promoter of the Leptin Gene in Buffalo (GenBank: JF681145)
Scatà et al.