Polymorphisms in MyoD1, MyoG, MyF5, MyF6, and MSTN genes in Santa Inês sheep

The objective of this work was to sequence the MyoD1, MyoG, MyF5, MyF6, and MSTN genes and to identify polymorphisms in Santa Inês sheep (Ovis aries). A total of 192 lambs at 240 days of age were evaluated, and the sequences of these genes were compared with the reference sequence of the Ovis aries genome. Genotype and allele frequencies were estimated, and Hardy-Weinberg equilibrium was tested. Fragments of 2,493 bp (MyoD1), 1,836 bp (MyoG), 2,813 bp (MyF5), 1,126 bp (MyF6), and 2,380 bp (MSTN) were obtained, and 160 variants were identified in these sequences. These polymorphisms were distributed as follows: 59 (MyoD1), 24 (MyoG), 63 (MyF5), 4 (MyF6), and 10 (MSTN). One hundred and four were novel polymorphisms: 45 in MyoD1, 2 in MyoG, 56 in MyF5, and 1 in MSTN. Regarding location, 61 were in introns (27 in MyoD1, 16 in MyoG, 5 in MyF5, 3 in MyF6, and 10 in MSTN), 87 in coding regions (22 in MyoD1, 8 in MyoG, 56 in MyF5, and 1 in MyF6), and 12 in the 3'UTR (10 in MyoD1 and 2 in MyF5). Therefore, the MyoD family and MSTN genes have several polymorphisms in Santa Inês sheep, which can be useful for association studies.


Introduction
The MyoD family genes and the MSTN gene have been used in studies to identify polymorphisms associated with growth, carcass, and meat attributes because their transcripts play a vital role in muscle development (Bhuiyan et al., 2009; Han et al., 2013). The MyoD family genes (MyoD1, MyoG, MyF5, and MyF6) are myogenic regulatory factors (Jin et al., 2016). MyF5 and MyoD1 are responsible for the differentiation of myogenic cells into myoblasts and for their proliferation (Vélez et al., 2017), while MyoG and MyF6 are responsible for myocyte fusion as well as for the differentiation and maturation of myofibers.
Variants in the MyoD1 gene were associated with body traits in Stavropol sheep (Trukhachev et al., 2018), while Lôbo et al. (2012) observed an association between MyoD1 gene expression in the Longissimus muscle and carcass yield of Santa Inês, Morada Nova, and Somalis sheep. Additionally, positive correlations between MyoG expression and body and carcass weights were found in Hu sheep (Sun et al., 2010), while MyoG variants were associated with body weight in beef cattle (Bhuiyan et al., 2009) and intramuscular fat in pigs (Stupka et al., 2012). Similarly, some MyF5 variants were associated with meat yield in New Zealand Romney sheep (Wang et al., 2017) and body weight in beef cattle (Bhuiyan et al., 2009), while an association between MyF6 variants and meat dripping loss, daily weight gain, primal cuts, and lean meat yield was reported in pigs (Kapelański et al., 2005; Wyszyńska-Koko et al., 2006). All these studies revealed the importance of polymorphisms in the MyoD family genes for the genetic control of important traits in livestock.
On the other hand, the MSTN gene is a member of the transforming growth factor-beta (TGF-β) family and is a negative regulator of muscle development. This gene inhibits the proliferation and differentiation of muscle progenitors during development, reducing the muscle mass of the animals (McPherron et al., 1997; Crispo et al., 2015). Hu et al. (2013) used RNA interference to inhibit the expression of myostatin in sheep and observed accelerated growth in these animals. Some MSTN variants were also associated with increased muscle mass in Norwegian White Sheep (Boman et al., 2010), crossbred lambs (Hope et al., 2013), and New Zealand sheep (Han et al., 2013). In addition, effects on birth weight in Makoei sheep (Farhadian et al., 2012) and on body weight in Madras Red sheep (Sahu et al., 2017) were also reported. Therefore, knowing the polymorphisms in the MyoD family genes and the MSTN gene, as well as their frequencies, is the first step toward using them in association studies.
The objective of this work was to sequence the MyoD1, MyoG, MyF5, MyF6, and MSTN genes and to identify polymorphisms in Santa Inês sheep.

Materials and Methods
The current study was carried out with the approval of the ethics committee for animal use of the veterinary medicine and animal science school of Universidade Federal da Bahia (UFBA) (protocol number 02/2010). A total of 192 Santa Inês lambs were studied at 240 days of age, of which 106 were born between 2010 and 2012 at the Pedro Arle experimental farm of Embrapa Tabuleiros Costeiros, in the municipality of Frei Paulo, in the state of Sergipe, Brazil, while the other 86 lambs were born in 2014 on the experimental farm of UFBA, in the municipality of São Gonçalo dos Campos, in the state of Bahia, Brazil. The Embrapa herd has been closed since the 1980s, and the 106 lambs from this herd are progeny of seven unrelated sires. The UFBA herd, on the other hand, is open, and no pedigree control was performed for the 86 animals in this group because mating occurred at pasture. In any case, no full-sib animals were used in this study.
Blood samples (5.0 mL) were collected in EDTA-containing vacutainer tubes and refrigerated at 4°C. Leukocyte and DNA extraction were performed at Universidade de São Paulo, in the municipality of Piracicaba, state of São Paulo, Brazil, using the salt precipitation and proteinase K digestion method (Oliveira et al., 2007).
First, magnetic beads were used to purify the amplicons. The beads were homogenized to bind the amplified products, and the samples were then purified with 70% ethanol to remove contaminants. Subsequently, the pellet was diluted and the beads were removed. Considering the size (in base pairs) of the amplified products, the samples were diluted to 2 nM. Samples were quantified using a Qubit fluorometer (Life Technologies, Carlsbad, USA) and diluted to 0.2 ng/μl for library preparation. The Nextera XT DNA sample preparation kit and the Nextera XT index kit (Illumina, San Diego, USA) were used to prepare the library; all steps were performed as recommended by the manufacturer. Sequencing was performed on the MiSeq platform (Illumina, San Diego, USA), using the MiSeq Reagent Kit v2 (500 cycles).
The quality of the reads was verified using the FastQC software (https://dnacore.missouri.edu/PDF/FastQC_Manual.pdf). For the first data filtering, the SeqyClean software (Zhbannikov et al., 2017) was used, adopting a quality parameter of 24 (Phred quality score) for each base and a minimum length of 50 bp. Subsequently, the reads were aligned against the reference sheep genome deposited in the NCBI (version Oar_v4.0), using the Bowtie2 program (Langmead & Salzberg, 2012).
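As a rough illustration of the first filtering step, the sketch below applies the two thresholds mentioned above (a Phred quality of at least 24 per base and a minimum read length of 50 bp) to a single read. It is a simplified stand-in and not the SeqyClean algorithm: it assumes Phred+33 quality encoding and merely trims low-quality bases from both ends of the read; the function names are hypothetical.

```python
MIN_PHRED = 24   # per-base quality threshold reported in the text
MIN_LENGTH = 50  # minimum read length (bp) reported in the text


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_and_filter(sequence, quality_string):
    """Trim bases below MIN_PHRED from both ends; drop reads shorter than MIN_LENGTH.

    Returns the trimmed (sequence, quality) pair, or None if the read is discarded.
    """
    scores = phred_scores(quality_string)
    start, end = 0, len(scores)
    # Advance past low-quality bases at the 5' end
    while start < end and scores[start] < MIN_PHRED:
        start += 1
    # Step back over low-quality bases at the 3' end
    while end > start and scores[end - 1] < MIN_PHRED:
        end -= 1
    if end - start < MIN_LENGTH:
        return None
    return sequence[start:end], quality_string[start:end]


# Toy read: 5 low-quality bases ('#' = Q2) followed by 60 high-quality bases ('I' = Q40)
read_seq = "ACGTA" + "ACGT" * 15
read_qual = "#" * 5 + "I" * 60
kept = trim_and_filter(read_seq, read_qual)
```

In this toy case the five leading bases are trimmed and the remaining 60 bp read is kept; a read that falls below 50 bp after trimming would be discarded.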
Pesq. agropec. bras., Brasília, v.54, e01132, 2019 DOI: 10.1590/S1678-3921.pab2019.v54.01132
The identification of polymorphisms, namely single nucleotide polymorphisms (SNPs) and insertions or deletions of bases (INDELs), was carried out with SAMtools version 1.4 (Li et al., 2009), considering the position of the polymorphisms in the reference sheep genome (version Oar_v4.0). The files were converted from the SAM (Sequence Alignment/Map) format to the BAM (Binary Alignment/Map) format, followed by the removal of PCR duplicates. The sequences were then sorted, and an index of the sorted file was built. Variant calling was carried out using the mpileup option of SAMtools, with a minimum mapping quality against the reference genome of 20 (-q20) and a base quality filter ≥ 40 on the Phred scale (-Q40). Furthermore, the bcftools command was used to convert the file from *.bcf to *.vcf. In this case, more than 99,999 reads for this variant were performed. Finally, the functional annotation of the SNPs and INDELs was performed using the online VEP (Variant Effect Predictor) tool of Ensembl (https://www.ensembl.org/index.html) to identify the locations of the mutations in the different regions of the genome and the likely functional effects of the variants.
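The post-alignment steps described above can be summarized as a sequence of command lines. The Python sketch below only assembles them and does not run anything; the file names and the helper name are hypothetical, and the exact invocations used by the authors are not given in the text. It assumes SAMtools 1.4-style commands, places sorting before duplicate removal because `samtools rmdup` expects coordinate-sorted input, and uses `-g` so that `mpileup` writes BCF.

```python
import shlex


def build_variant_calling_commands(sam_path, reference_fasta, prefix):
    """Return the command lines (as argument lists) for the steps described above.

    All paths are placeholders; nothing is executed here.
    """
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup_bam = f"{prefix}.dedup.bam"
    bcf = f"{prefix}.bcf"
    vcf = f"{prefix}.vcf"
    return [
        # SAM -> BAM conversion
        ["samtools", "view", "-b", "-o", bam, sam_path],
        # Coordinate sort (assumed here to precede duplicate removal)
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Removal of PCR duplicates
        ["samtools", "rmdup", sorted_bam, dedup_bam],
        # Index of the final BAM file
        ["samtools", "index", dedup_bam],
        # Pileup with mapping quality >= 20 (-q) and base quality >= 40 (-Q)
        ["samtools", "mpileup", "-q", "20", "-Q", "40", "-g",
         "-f", reference_fasta, "-o", bcf, dedup_bam],
        # BCF -> VCF conversion
        ["bcftools", "view", "-O", "v", "-o", vcf, bcf],
    ]


commands = build_variant_calling_commands("sample.sam", "Oar_v4.0.fa", "sample")
# Shell-ready strings, e.g. for logging or a dry run
command_lines = [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]
```

Note that `mpileup` with `-g` produces genotype likelihoods; a separate calling step (for example, `bcftools call`) is typically needed to obtain variant calls, but the text does not describe one, so it is left out of this sketch.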
The allelic and genotypic frequencies were estimated for each variant found, and the observed and predicted heterozygosities were compared to test the Hardy-Weinberg equilibrium (HWE). The predicted heterozygosity (PH) was obtained using the equation PH = 2 × (1 − MAF) × MAF, in which MAF is the minor allele frequency. The HWE test was performed with the Haploview software (Barrett et al., 2005), at a significance level of 0.0001.
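To make the calculation concrete, the sketch below estimates allele frequencies, the predicted heterozygosity PH = 2 × (1 − MAF) × MAF, and an HWE p-value from genotype counts. It uses a chi-square goodness-of-fit test with one degree of freedom as a simple stand-in; this is an assumption made for illustration and is not necessarily the test implemented in Haploview. The counts are back-calculated from the genotype frequencies reported in the Results for the MyoG SNP g.196844G>A (85.4% GG, 14.1% GA, and 0.5% AA), assuming 192 genotyped animals, so they are approximate.

```python
import math


def hwe_summary(n_ref_hom, n_het, n_alt_hom):
    """Allele frequencies, heterozygosities, and a chi-square HWE p-value (1 df)."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)   # reference allele frequency
    q = 1.0 - p                              # alternative allele frequency
    maf = min(p, q)
    observed_het = n_het / n
    predicted_het = 2 * (1 - maf) * maf      # PH = 2 x (1 - MAF) x MAF
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"maf": maf, "OH": observed_het, "PH": predicted_het,
            "chi2": chi2, "p_value": p_value}


# Approximate counts for g.196844G>A (assumed n = 192): 164 GG, 27 GA, 1 AA
result = hwe_summary(164, 27, 1)
in_hwe = result["p_value"] > 0.0001  # significance level used in the text
```

With these counts the observed and predicted heterozygosities are nearly identical, so the variant is classified as being in HWE. The chi-square approximation is unreliable when an expected genotype count is very small, as is the case for the rare homozygote here, so an exact test is preferable in practice.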

Results and Discussion
The number of samples amplified was as follows: 173 (MyoD1), 192 (MyoG), 191 (MyF5), 191 (MyF6), and 123 (MSTN). In Table 1
The present study revealed many novel variants (76.3%) in the MyoD1 gene of Santa Inês sheep (Table 2). However, only 32.2% of the polymorphisms were in HWE (p>0.0001). Deviation from HWE can be caused by factors such as population substructure, genotyping error, selection, copy number variation, or inbreeding (Graffelman et al., 2017). Genotyping errors increase the observed heterozygosity (Chen et al., 2017), but in the current study the deviation from HWE in the MyoD1 gene was a consequence of the low observed heterozygosity for many variants in this gene (Table 2). In the specific case of MyoD1 in Santa Inês sheep, it is unlikely that the reduced genotype frequencies are a consequence of sampling, because only two SNPs (g.34302200A>T and g.34301376G>A) had MAF ≤ 1%. According to Chen et al. (2017), deviation from HWE caused by low observed heterozygosity is probably a consequence of natural variability, such as population substructure or common deletion polymorphisms.
Of the 13 non-synonymous mutations found in MyoD1, only two are in HWE (g.34301797T>G and g.34303195G>A). The SNP g.34301797T>G is a novel variant with genotype frequencies of 97.7% (TT), 1.2% (TG), and 1.2% (GG), and it causes an Asp/Ala substitution. On the other hand, the SNP g.34303195G>A is a non-tolerant mutation (SIFT = 0.03), with frequencies of 96.5% (GG), 3.5% (GA), and 0.0% (AA), and it causes a Pro/Leu replacement at amino acid 33. Therefore, these two mutations are almost fixed for the reference allele, and their use in association tests will be difficult.
The SNPs that occurred in the 3'UTR do not modify the protein, but polymorphisms in this region have the potential to change the expression of the genes studied because this region is a target site of miRNAs (Meister & Tuschl, 2004). In this region, the polymorphisms g.34301231G>GGC, g.34301304GA>G, g.34301332C>G, g.34301376G>A, g.34301388A>C, g.34301541A>C, and g.34301571G>T were found to be novel, half of which were not in HWE.
A small sample of the Santa Inês breed was studied here, but the genotype and allele frequencies obtained were very similar to those reported earlier for other sheep breeds. For example, the SNP g.34302967A>G, located in exon 3, was also identified in Stavropol sheep by Trukhachev et al. (2018), who reported frequencies of 17% for the G allele and 0.0% for the GG genotype. These values were very close to those found for Santa Inês in the current study, which were 19.1% (G) and 3.5% (GG).
A long fragment of MyoG (94.7%) was obtained, containing 24 variants (Table 3), of which g.196844G>A and g.198101CGG>CG were novel. The SNP g.196844G>A is in intron 1 and was in HWE. Its allele (92.4% G and 7.6% A) and genotype (85.4% GG, 14.1% GA, and 0.5% AA) frequencies allow its use in association studies. In contrast, the indel g.198101CGG>CG is in intron 2 and was not in HWE, owing to the large difference between the observed (0.5%) and predicted (10.4%) heterozygosities, which restricts its use in association studies with Santa Inês sheep.
Eight SNPs were located in exon 3 of MyoG, four of them being non-synonymous variants (g.198131T>G, g.198149A>T, g.198159C>T, and g.198304C>G). These variants were almost fixed for the reference allele, except the SNP g.198304C>G, which showed allele frequencies of 83.1% (C) and 16.9% (G). However, this SNP was not in HWE, and the difference between the observed (8.9%) and predicted (28.2%) heterozygosities indicates that this locus may be under selection in the Santa Inês breed. The similarity between the genotype frequencies of some SNPs found in the MyoG gene in the present investigation and those previously deposited in the Ensembl database attests to the quality of the sequences obtained for Santa Inês sheep. For example, the SNP g.198159C>T showed frequencies of 97.9% (CC), 2.1% (CT), and 0.0% (TT) in the Santa Inês sample studied here, which are very similar to the values of 98.4% (CC) and 1.6% (CT) reported for the MOOA population of the NextGen project (Ensembl, 2018).
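The gap between observed and predicted heterozygosity can be summarized as a relative heterozygote deficit, 1 − OH/PH. This index is not reported in the study; it is added here only as an illustrative sketch (the function name is hypothetical), using the heterozygosities quoted above for the two MyoG variants that deviated from HWE.

```python
def heterozygote_deficit(observed_het, predicted_het):
    """Relative deficit of heterozygotes: 1 - OH/PH (0 means no deficit)."""
    if predicted_het <= 0:
        raise ValueError("predicted heterozygosity must be positive")
    return 1.0 - observed_het / predicted_het


# Observed and predicted heterozygosities (in %) quoted in the text
variants = {
    "g.198101CGG>CG": (0.5, 10.4),
    "g.198304C>G": (8.9, 28.2),
}
deficits = {name: heterozygote_deficit(oh, ph) for name, (oh, ph) in variants.items()}
```

Both values are well above zero (roughly 0.95 and 0.68), consistent with the marked shortage of heterozygotes described for these two variants.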
A fragment that represents 88.9% of the NCBI reference sequence of MyF5 was sequenced. In this fragment, 56 novel variants were identified (Table 4), all with a higher frequency of the reference allele than of the mutant allele, and with MAF ranging from 0.5% to 5.5%. Therefore, association studies with MyF5 in Santa Inês sheep will be difficult, as a large sample is required to ensure that all genotypes are represented.
Despite the large number of polymorphisms found in MyF5 in Santa Inês sheep, there is no basis for attributing them to sequencing errors, because some polymorphisms presented frequencies very similar to those observed in populations of other sheep breeds. For example, the SNP g.116460689G>T showed allele frequencies of 96.9% (G) and 3.1% (T), which are very similar to those reported for the MOOA population in the NextGen project (99% G and 1.0% T).
Table 3. Allelic and genotypic frequencies of polymorphisms in MyoG in Santa Inês sheep and probability of Hardy-Weinberg Equilibrium (HWE) test to compare observed (OH) and predicted (PH) heterozygotes (1).
Of the 44 polymorphisms located in exons, 20 lead to an amino acid substitution in the protein, and at least three, i.e., g.116460257C>T, g.116460430G>C, and g.116462045C>G, were non-tolerant mutations (SIFT < 0.05). This number of polymorphisms in the exons of the MyF5 gene reveals the importance of sequencing genes in sheep because, according to the Ensembl database, 82 polymorphisms have been identified so far in the MyF5 gene of sheep, of which only five are in exons.
In the present investigation, although 81.1% of the reference sequence of the MyF6 gene in sheep was studied, only four polymorphisms were found in this gene in Santa Inês sheep (Table 5). This small number of polymorphisms was unexpected because, according to the Ensembl database, there are 128 variants in different regions (68 upstream, 49 in the 3'UTR, 7 in introns, and 4 in exons) of the MyF6 gene. Despite the small number of polymorphisms in MyF6, it will be easier to use in association studies in Santa Inês sheep than the MyF5 gene, because the four SNPs found in MyF6 are in HWE and have MAF in the range of 6.1% to 26.3%.
The SNP g.116446029T>C was the only variant found in an exon of the MyF6 gene. It is a synonymous mutation located in exon 3, with genotype frequencies of 1.6% (TT), 11.0% (TC), and 87.4% (CC). These frequencies are similar to those observed for the MOOA population (0.6% TT, 5.6% CT, and 93.8% CC) in the NextGen project (Ensembl database). Thus, the small number of SNPs found in the MyF6 gene in Santa Inês sheep in the current study was probably a characteristic of the breed and not a consequence of sequencing error or of small sample size.
A total of 47.7% of the MSTN gene was sequenced and 11 SNPs were found (Table 5), all in intron 1, one of which, g.118142503T>C, is novel. This SNP showed genotype frequencies of 76.4% (TT), 22.8% (TC), and 0.8% (CC) and is in HWE, with MAF equal to 12.2%, so it can possibly be used in association studies. Indeed, all the SNPs identified in the MSTN gene were in HWE, with MAF ranging from 4.5% to 45.5%, and can therefore possibly be explored in association studies. Polymorphisms in intron 1 of the MSTN gene have also been identified in sheep by Farhadian et al. (2012) and Ibrahim & Hickford (2015), who reported associations of polymorphisms in this region with growth, carcass, and meat attributes.
The frequencies of the SNPs found in the MSTN gene in Santa Inês sheep were similar to those already reported for populations in the NextGen project. For example, the SNP g.118141355G>A showed allele frequencies of 83.7% (A) and 16.3% (G), values very close to those observed in the MOOA (88% A and 12% G) and IROA (83% A and 17% G) populations.
In the current study, long fragments of the MyoD family genes of Santa Inês sheep were sequenced, which allowed the identification of several polymorphisms. Some results were unexpected and highlight the importance of gene sequencing studies. For example, the MyF5 gene has many variants (63), but all are nearly fixed for the reference allele. On the other hand, the MyF6 gene has few variants (4), but all are in HWE, with MAF values that allow their use in association studies. Another interesting result was the high number of variants in the MyoD1 gene (59), of which only one third were in HWE. Additionally, there are reasons to believe that some non-synonymous mutations in the MyoD1 gene are deleterious, because they are non-tolerant and one of their genotypes was not observed. For the MSTN gene, a smaller fragment was obtained than for the MyoD family genes; the gene has many regions of interest, especially some exons and the 3'UTR, that have already been associated with traits of interest in sheep but remain unknown in the Santa Inês breed.
(1) f(+/+), genotype frequency of homozygous for reference allele; f(+/-), genotype frequency of heterozygous; f(-/-), genotype frequency of homozygous for mutant allele; f(+), reference allele frequency; f(-), mutant allele frequency; NCBI (National Center for Biotechnology Information).
Moreover, samples from many animals failed to amplify for MSTN in the current study, which indicates that the primers used here may anneal in a region where polymorphism exists.

Conclusions
1. There are 160 polymorphisms in the MyoD1, MyoG, MyF5, MyF6, and MSTN genes in Santa Inês sheep, of which 104 are novel and not yet described.
2. Some variants in these genes can be used in association studies on economic traits in sheep, especially the novel polymorphisms found in this work.