Candidate mutations used to aid the prediction of genetic merit for female reproductive traits in tropical beef cattle

In this study, we aimed to provide a wet laboratory validation for a set of single nucleotide polymorphisms (SNP), which had been identified as candidate functional variants in silico. Genotyping for candidate SNP was performed in Brahman and Tropical Composite cattle. After quality control, 29 SNP were first investigated individually for their association with female reproductive traits and then used as a panel for genomic predictions. The reproductive traits studied were age at first corpus luteum (AGECL; days), post-partum anoestrus interval (PPAI; days), and a binary trait that described if the cow had ovulated before weaning the first calf or not (PW, 0-1). Single nucleotide polymorphisms in six genes (FOXA2, TRAF4, IRF2, IRF1, BPTF, and CPEB1) were found to be significantly associated with reproduction traits . The genomic prediction method used was BayesR, to accommodate the 29 new SNP and compare their performance with predictions based on 50K genotypes (Illumina SNP chip). When new SNP and PLAG1 mutation rs109231213 were included in the genomic predictions for female reproductive traits their accuracies improved. The best predictions were obtained by combining the new SNP and the 50K SNP using BayesR analysis, with a 4% improvement in accuracy. The proportion of the genetic variance explained by the new SNP together was 0.07 for AGECL, 0.03 for PPAI, and 0.02 for PW. It would be favourable to include these new SNP in future versions of bovine SNP chips to target selection for female reproductive traits. These new SNP are likely to improve genomic predictions for female reproductive traits in tropical beef cattle breeds, with varying degrees of Bos indicus content.

Causative mutations could improve GEBV across breeds (Saatchi et al., 2014).However, causality is difficult to prove, and so, we turn our attention to "functional mutations", such as non-synonymous SNP that alter the peptide, or regulatory SNP that impact gene expression.Gene networks and pathways have been used to select functional mutations that yield more accurate GEBV than high-density SNP chip (Snelling et al., 2013).Selected SNP panels formed by functional mutations may improve the portability of GEBV across breeds and into crossbreds (Snelling et al., 2012;2013).The use of causative mutations yielded an increase in accuracy of 2.5-3.7%, when using sequenced genomes (Meuwissen and Goddard, 2010).When using sequenced genomes, biological information regarding SNP can also improve accuracies (Perez-Enciso, et al., 2015).Single nucleotide polymorphisms panels composed of functional markers could aid across breed predictions and aid adoption in beef industries.
The concept of the present work was to select functional SNP mapped to transcription factors (TF) associated with female reproduction.These TF regulate the expression of a set of genes associated with post-partum anoestrus and pregnancy outcomes (Fortes et al., 2014a).Initially, 140 SNP mapped to TF and not represented on SNP chips were used to predict pregnancy outcomes.Selected TF originated from genome-wide association studies (GWAS; Hawken et al., 2012), microarray data (Fortes et al., 2014b), and RNA sequencing (Canovas et al., 2014).Genotypes for putative functional mutations (n = 140) were imputed in an independent crossbred population of 1,988 cows.These in silico genotypes were used in genomic predictions and accounted for 29% of additive variation in rebreeding outcome (Fortes et al., 2014a).The current study sought to provide a validation for the in silico trial by genotyping selected SNP.
Nonlinear genomic prediction methods that emphasize the discovery and use of functional mutations increased the accuracy of genomic prediction for dairy cattle (Kemper et al., 2015).We hypothesised that combining additional functional mutations and nonlinear methodologies could result in more accurate genomic predictions.
The female cattle populations used in this study were from two breeds: Brahman and Tropical Composites.These populations and measured phenotypes have been previously described (Hawken et al., 2012;Johnston et al., 2009;2010;2013).Brahman heifers are considered pure Bos indicus, while Tropical Composites are a Bos indicus-Bos taurus mixture breed.
We investigated three female reproductive traits measured early in life: age at first corpus luteum (AGECL; days), post-partum anoestrous interval (PPAI; days), and post-partum anoestrous interval with respect to weaning (PW; binary).The two traits relevant for post-partum anoestrous interval were measured after the first breeding season and reflect the outcomes of the first mating season.These heifers were joined in multiple sire mating systems, as two-year olds as per normal Australian beef industry practices.All phenotypes were measured using ultrasound imaging by experienced technicians and veterinarians.Ultrasound was used every other week to observe the first corpus luteum and annotate the age of the heifer on that occurrence (AGECL).After the first parturition, cows were scanned again to observe the first corpus luteum post-partum.The interval in days from parturition to first ovulation postpartum (calculated from the corpus luteum observation) was used to record PPAI and PW.The binary phenotype PW was a record of 1, if the cow ovulated prior to weaning, and 0, if the cow did not (it ovulated only after weaning) as previously described (Zhang et al., 2014).The ability to ovulate before weaning the calf is considered important because it shortens the interval between parturitions and is a signal that the cow has overcome lactational anoestrus.Lactational anoestrus is a major cause of prolonged postpartum anoestrus in Bos indicus herds.
A total of 1,221 Brahman cows had DNA available for genotyping, and all were used in this study.However, only 914 Brahman cows were measured for AGECL.Of these cows, 576 conceived in the first mating opportunity and, therefore, had the post-partum phenotypes PPAI and PW measured.A total of 895 Tropical Composite cows were genotyped for this project.Among the genotyped cows, 798 had AGECL measured, while 665 had PPAI and 668 had PW measurements.The calculation of the "zebu content", or Bos indicus genetics, for both populations has been previously described (Porto-Neto et al., 2014).These are two independent populations that represent different breeds, with Brahman having much higher contents of Bos indicus genetics.The test of association was carried out one SNP at a time and separately per breed first, resulting in P-values and estimated SNP effects for Brahman and Tropical Composite cows.In a second analysis, data from both breeds were combined to estimate the SNP association across breeds.Association analyses were carried using the SNP & Variation Suite software from Golden Helix TM (v7.6.8Win64; Golden Helix, Bozeman, MT, USA http://www.goldenhelix.com).We used mixed models to fit R. Bras. Zootec., 47:e20170226, 2018 fixed effects of contemporary group and random effects of animal, SNP, and residual effects according to the following equation: in which y is the vector of observations (AGECL, PPAI, or PW); β is the vector of fixed effects related to the observations by the incidence matrix X; u a is the vector of random additive genetic effects related to the observations by the incidence matrix Z a ; s represents the SNP effect; and e is the vector of random residual effects.Contemporary groups (or cohorts) were formed by animals born in the same year and managed as a group in one of four research stations.Cohort had a significant effect for all studied traits as previously discussed (Barwick et al., 2009;Johnston et al., 2009).
In this mixed model, we used a genomic relationship matrix (GRM) built from high-density SNP chip genotype data.All animals had high-density SNP genotypes available, either directly genotyped or imputed from lower density genotypes.Cows were genotyped using the Illumina Bovine SNP50 BeadChip version 1.All SNP chips were assayed following the manufacturer's protocols, and repeated samples were included in the genotyping exercise for quality assurance.The BEAD STUDIO software (Illumina, Inc.) was used to determine genotype calls.Imputation was performed using a reference set of 917 animals genotyped with the high-density BovineHD chip.The imputation was performed using BEAGLE (Browning and Browning, 2009).Methods, number of animals used, and accuracy of imputation were described by Bolormaa et al. (2013).
The GRM was generated using the SNP & Variation Suite software from Golden Helix TM , which implements the methods described by VanRaden (2008).Because we used a GRM for all our models and not pedigree matrices, we refer to the estimated heritabilities as "pseudo-heritability".We used 3 GRM in this study, one built for Brahman only, another for Tropical Composites, and a third combining the Brahman and Tropical Composite genotypes in one GRM for across-breed analyses.When combining the two breeds, the method described by VanRaden (2008) estimated allele frequencies for one base population.Fixed effects used in the models considered effects of contemporary group within breed, and when breeds were combined in one analysis, the fixed effect of "breed" was also fitted in the equation.For all analyses, an interaction among cohort, farm of origin, and birth month was fitted as a fixed effect in the model.These fixed effects were deemed significant in previous analyses of this data (Hawken et al., 2012).
Two genomic prediction methods were evaluated: BayesR (a nonlinear method) and GBLUP (genomic best linear unbiased prediction).These methods have been compared in the analyses of dairy cattle data (Erbe et al., 2012;Kemper et al., 2015).In short, BayesR assumes that SNP effects are drawn from a mixture of normal distributions, with increasing variance, allowing moderate to large effects, while GBLUP assumes all SNP effects are derived from the same normal distribution, resulting in very small SNP effects for all SNP.The assumption of GBLUP is that all SNP have small effects, which are normally distributed in a classical infinitesimal model.The mixture of distributions allowed in BayesR accounts for higher variability of SNP effects, which corresponds to the hypothesis that some genes may have no association with the investigated phenotypes and other genes may have a moderate effect.The variance of SNP effects had four possible values: 0, 0.0001σ g 2 , 0.001σ g 2 , and 0.01σ g 2 , in which σ g 2 is the total additive genetic variance.A Gibbs sampling scheme was applied similar to that described by Erbe et al. (2012).The four possible values for SNP variance fit the rationale that the three reproductive traits under investigation are complex traits, with the majority of associated SNP having small effects.This genetic architecture was reported in an earlier GWAS (Hawken et al., 2012).
Accuracies of genome predictions were computed as a correlation between the estimated GEBV and the actual phenotypes divided by the square root of the heritability of the trait.Improvement of accuracies were considered significant if greater than one standard error in absolute terms.To calculate the accuracies of genome predictions, the dataset that included both breeds was randomly split into thirds to serve as "reference" and "validation" data.Then, three-fold cross validation was used to estimate the prediction accuracies.

Results
Four TaqMan ® assays failed to produce useful genotypes.Example results for an efficient assay (Figure 1 A) and a failed assay (Figure 1 B) are provided.Results from the remaining 29 SNP assays were used in subsequent analyses.
In Brahman cows, the call rate for one SNP was very low (18%) and was, therefore, excluded from further analyses.Call rates for included SNP were 94.76% or higher.The lowest minor allele frequency (MAF) had a value of 0.43%, and only eight Brahman animals were heterozygous for this SNP.This near fixation of the major allele prevents the use of this SNP in association and genomic prediction analyses.Both the SNP with the poor call rate and the SNP with the low MAF were excluded from further analyses in Brahman cows.
In Tropical Composite cows, allele frequencies were different from Brahman cows.For example, the MAF observed for a SNP mapped to CUX1 in Brahman cows was 11.63%, while in Tropical Composites, the same minor allele was even less frequent (MAF = 1.54%).For ten SNP, the minor allele in one breed was actually the major allele in the other breed; these were rs464658757 mapped to PPARA, rs136542281 and rs136630122 mapped to IRF1, rs381517561 mapped to LHX3, rs208587911 mapped to FOXA2, rs132677230 in PLXNA2, rs717464147 and rs209560708 located in TRAF4, rs439699867 and rs377982205 in BPTF, and rs457608533 in CDC5L.In Tropical Composites, the lowest MAF was 1.54 %, and the lowest call rate was 83%; and so, we were able to use every SNP in the analyses for this breed.
In Brahman cattle, AGECL had an estimated pseudoheritability of 0.57 with a variance of 3.13×10 −5 and a standard error of 5.59×10 −3 .The proportion of the genetic variance explained by fixed covariates was 0.35.Three SNP were associated with AGECL, two mapped to the gene IRF1 and one mapped to CPEB1 (P<0.05)(Table 1).
The analysis of 576 Brahman cows measured for PPAI resulted in an estimated pseudo-heritability of 0.43 with a variance of 2.40×10 −5 and a standard error of 4.90×10 −3 .The proportion of the genetic variance explained by fixed covariates was 0.20.For PPAI in Brahman, associated SNP (P<0.05) were mapped to the genes IRF1, BPTF, and IRF2 (Table 1).
Not surprisingly, the results for PW are similar to those obtained for PPAI, as these are correlated traits.The pseudo-heritability for PW was 0.56, and the variance was 1.63, while the standard error was 1.27.The proportion of variance explained by fixed covariates was 0.23.Two SNP were significant for PW, both mapped to IRF1 (P<0.05) and explaining between 0.82 and 1.20% of the genetic variance (Table 1).In Brahman, mutations in IRF1 were associated with all studied traits.
In Tropical Composite cattle, AGECL had an estimated pseudo-heritability of 0.37, with a variance of 1.43×10 −5 and a standard error of 3.78×10 −3 .The proportion of the variance explained by fixed covariates was 0.09.The number of cows with phenotype records and genotypes was 798.In Tropical Composite cattle, 665 cows had genotypes and records for PPAI.The estimated pseudo-heritability for PPAI was 0.35 with a variance of 1.20×10 −5 and a standard error of 3.48×10 −3 .The proportion of variance explained by  fixed covariates was 0.09.For PW, the estimated pseudoheritability was 0.30, with a variance of 0.75 and a standard error of 0.87.The proportion of the variance explained by fixed covariates was 0.08.In total, 668 Tropical Composite cows had genotypes and phenotypes for the PW analyses.Association results were significant for IRF1, with two SNP associated to PW (Table 2).Single nucleotide polymorphisms in BPTF and IRF2 were associated with AGECL; BPTF was also significant for PPAI (Table 2).
Association results for each SNP in the analyses that considered the two breeds together are presented for AGECL, PPAI, and PW (Table 3).We present SNP associations with P<0.10.When both breeds were analysed together, AGECL had an estimated pseudo-heritability of 0.59, with a variance of 3.58×10 −5 and a standard error of 5.98×10 −3 .The proportion of the variance explained by fixed covariates was 0.38.
In a previous study, these same cows were genotyped for a SNP mapped to the PLAG1 gene: rs109231213 (Fortes et al., 2013).This PLAG1 SNP was significantly associated with AGECL in the mentioned study.We included the PLAG1 SNP genotype in the genomic prediction analyses.The genomic prediction method BayesR gives better results, especially when the new tested mutations that were mapped to the 17 candidate genes targeted in this study and the PLAG1 SNP are included in the model.The GBLUP models resulted in accuracies that were lower by one standard deviation (data not shown).The proportion of the variance explained by the 29 SNP together was 0.07 for AGECL, 0.03 for PPAI, and 0.02 for PW.The best predictions are from the combined new panel, with the BayesR analysis, with a 4% improvement in accuracy (Table 4).

Discussion
Four TaqMan ® assays failed to produce meaningful results.TaqMan ® is a robust methodology, but it may fail if there are other mutations nearby and/or if the population only has one of the tested alleles.
In a previous study, these same cows were genotyped for a SNP mapped to the PLAG1 gene: rs109231213 (Fortes et al., 2013).The PLAG1 SNP was mapped to a QTL on chromosome 14 that was associated with many traits of economic interest for beef and dairy cattle breeding.The SNP in this QTL was associated with height, weight, serum concentration of IGF1, and age at puberty (Fortes et al., 2013;Juma et al., 2016;Karim et al., 2011;Littlejohn et al., 2012;Nishimura et al., 2012;Pryce et al., 2012;   Saatchi et al., 2014;Utsunomiya et al., 2013).The PLAG1 mutation rs109231213 was included in genomic predictions described in this study.Association analyses were carried in both breeds.In Brahman cattle, AGECL-estimated pseudo-heritability was in agreement with previous estimates of AGECL heritability in this Brahman population (Johnston et al., 2009).The effect predicted for associated SNP was between 11 and 19 days, which means that heifer puberty measured as AGECL could occur earlier in animals carrying the favourable variant.
The pseudo-heritability for PPAI was slightly lower than previous heritability estimates for the same population, which were between 0.51 and 0.52 (Johnston et al., 2010;Zhang et al., 2014).The difference in these heritability estimates could be due to the use of a pedigree relationship matrix in previous studies, while we used a GRM.More animals with phenotypes and pedigree were available, but not all those animals were genotyped.Because of genotype availability, about 100 cows from the previous studies were not included in our genomic analyses.
Single nucleotide polymorphisms with associations to PPAI were mapped to genes SOX5, IRF1, BPTF, and IRF2.The absolute effects of associated SNP were estimated between 13 and 39 days.This means that PPAI could be substantially reduced by selecting for the favourable alleles.The results for PW are similar to those obtained for PPAI, which is expected since these are correlated traits; one measured in days and another is binary simplification of the phenotype to indicate the ability of cows to ovulate prior to weaning.A higher heritability of PW (referred to as PPO), as compared to PPAI, was also observed in previous analyses of this population in which PW heritability was estimated from pedigree-based relationships as 0.62 (Zhang et al., 2014).
In Tropical Composite cattle, AGECL had an estimated pseudo-heritability of 0.37.This result is lower than that published before for the same population (Johnston et al., 2009).Again, herein we used a GRM and fewer animals than the study by Johnston and colleagues, which could explain the difference in results.
From single SNP analyses in Tropical Composites, genes BPTF and IRF1 presented the most significant associations with these early-in-life indicators of female fertility.Gene IRF1 emerges as a candidate for harbouring mutations associated to reproductive traits measured early in life, in both Brahman and Tropical Composite cows.
Gene IRF1 (Interferon regulatory factor 1) is a transcription factor, which, in mammals, activates the expression of the cytokine interferon beta (Miyamoto et al., 1988).IRF1 was subsequently shown to be a transcriptional activator and/or repressor of many target genes (a key regulator).IRF1 regulates expression of target genes by binding to the interferon-stimulated response element.Interferon activity is known to regulate reproductive biology, more specifically early pregnancy stages (Mathew et al., 2016, Wiltbank et al., 2016).IRF1 has also been shown to play a role in regulating post-translational modifications to tumour suppressor protein p53 (Dornan et al., 2004).IRF1 is involved in the regulation of immune response, apoptosis, DNA damage, and tumour suppression; its link to reproductive ability has also been shown (Lim et al., 2016).The genomic prediction method BayesR is a hierarchical method, which allows important DNA variants to be prioritised.The proportion of the variance explained by the 29 SNP together was 0.07 for AGECL, 0.03 for PPAI, and 0.02 for PW.These variances are higher than expected, given the low number of SNP in the tested panel.We tested the accuracy of genomic predictions using the selected SNP alone, the 50K chip SNP alone, and then all SNP together.The best predictions were from the combined new panel, with the BayesR analysis, with a 4% improvement in accuracy.This improvement in accuracy is comparable to the results obtained when BayesR was used in dairy breeds.In a combined dataset of Holstein and Jersey, BayesR also outperformed GBLUP models, and the increase in accuracies ranged from 5 to 15% for different milk traits (Erbe et al., 2012).BayesR improvement over GBLUP models is not a surprising result, as summarised and explained by Su et al. (2014).The significant improvement observed for AGECL might reflect the fact that a major QTL, namely the PLAG1 region, is present for this trait.Bayesian models, such as BayesR, are reported to benefit more traits influenced by QTL of large effect (Legarra et al., 2011).
Simulation studies with full sequence data suggest that selecting variants that are close to (or are known) causative mutations can increase genomic prediction accuracies (Perez-Enciso et al., 2015;van den Berg et al., 2016).Our results agree with these simulation studies.In the foreseeable future, in which genomes from over a thousand bulls will be available (Hayes et al., 2014), understanding biology and prioritizing functional mutations will continue to be a relevant research effort.
In Australia, the current tool for female fertility selection is the use of GBLUP models to predict "days to calving".The accuracy of predicting this female fertility trait might also be improved if the tested SNP were included in a genomic prediction framework, particularly if a BayesR analysis was used.Future work in this area could be promising, especially if additional putative functional SNP are discovered and incorporated in future versions of cattle SNP chips.

Conclusions
Mutations in five genes were found to be significantly associated with reproduction traits.The gene IRF1 had significant SNP associations in both breeds and, therefore, it emerges as a candidate for female reproductive traits measured early in life.
The genomic prediction results support the possibility that the described method to prioritise gene candidates to be incorporated in future SNP chips is worth exploring.The proposed SNP panel with 29 mutations results in reasonable accuracies of genomic prediction for AGECL on its own.However, it works even better when combined with the 50k chip data.The accuracy is improved for all traits when the 29 SNP are used together with 50K chip single nucleotide polymorphisms.The predictions were carried for Brahman and Tropical Composites.This approach is likely to serve all breeds with Bos indicus content.
The fluorescence intensity for the VIC and FAM probes is plotted on the x and y axes.Note how the successful SNP plot has three clear clusters, one for each genotype group in A, while there are no clearly separated clusters, and alleles cannot be discriminated in B.

Figure 1 -
Figure 1 -Example of allelic discrimination plots for successful SNP (A) and failed SNP (B).

Table 3 -
Brahman and TropicalComposites results for SNP associations