Alternative parameterizations of relatedness in whole genome association analysis of pre-weaning traits of Nelore-Angus calves.

Gestation length, birth weight, and weaning weight of F2 Nelore-Angus calves (n = 737) with designed extensive full-sibling and half-sibling relatedness were evaluated for association with 34,957 SNP markers. In analyses of birth weight, random relatedness was modeled three ways: 1) none, 2) random animal, pedigree-based relationship matrix, or 3) random animal, genomic relationship matrix. Detected birth weight-SNP associations were 1,200, 735, and 31 for those parameterizations respectively; each additional model refinement removed associations that apparently were a result of the built-in stratification by relatedness. Subsequent analyses of gestation length and weaning weight modeled genomic relatedness; there were 40 and 26 trait-marker associations detected for those traits, respectively. Birth weight associations were on BTA14 except for a single marker on BTA5. Gestation length associations included 37 SNP on BTA21, 2 on BTA27 and one on BTA3. Weaning weight associations were on BTA14 except for a single marker on BTA10. Twenty-one SNP markers on BTA14 were detected in both birth and weaning weight analyses.


Introduction
Relatedness within animal data may be modeled in multiple ways ranging from incorporation of some aspect of family as a classification variable to more complex modeling of pedigree accomplished through inclusion of the numerator relationship matrix in mixed model equations (Henderson, 1963). The availability of dense single nucleotide polymorphism arrays for research permits the evaluation of genomic relatedness among animals with records based on their marker genotypes. Modeling of genomic relatedness, rather than average relatedness based on pedigree, was considered an important opportunity for genetic prediction (Liu et al., 2002;Visscher et al., 2006;Visscher, 2009), due to the improvement of regression equations through refinement of the covariances in the coefficient matrix that are due to relatedness among animals. Inclusion of genomic relatedness in prediction equations improved estimates of variances, and hence heritability (Thomas, 2005;Visscher et al., 2006;Wolc et al., 2013) and improved accuracies of predicted genetic merit (Hayes and Goddard, 2008). These appear to hinge upon the improved marker-trait associations detected with models using refined relationship models. Genomic relationship matrices incorporated in mixed model association analyses may be a better substitute for pedigree-based relationship in marker-QTL association studies (Yu et al., 2006;Zhang et al., 2006Zhang et al., , 2007Sham et al., 2009). There has not been comparison of association results using genomic relatedness and pedigree-based relatedness. The objectives of this study were to 1) describe the whole genome SNP-birth weight associations and the consequences of modeling relationship matrices derived either from pedigree or genomic information or modeling family structure as a fixed effect (per design of the experimental population), and 2) to describe weaning weight-and gestation length-SNP associations using records and genotypes of F 2 Nelore-Angus calves.
had as many as four project calves. Four of those five F 1 bulls were mated by natural service to 1/2 Bos indicus 1/2 Bos taurus cows to produce 4 half-sibling families. Most of the dams of calves in those half-sibling families were F 1 or F 2 Brahman-Hereford; some were Brahman-Angus and Nelore-Angus. All of those calves (n = 264) were springborn (February to late April) from 2003 through 2007. Calves were identified and weighed as soon as practical after birth, usually within one day. Calves were weaned at an average of 213.8 ± 17.9 d of age. All procedures involving animals were approved by the Texas A&M University Institutional Animal Care and Use Committee with multiple animal use protocols from 2002 through the present.

Genotypes
Isolation of DNA was conducted from blood samples as previously described (Riley et al., 2013). Genotypes for all project animals were obtained from the Infinium BovineSNP50v1 assay (Illumina, Inc., San Diego, CA). There were on average 29,692 informative markers per F 2 family (1 informative marker per 101 kb). Loci were excluded from association analyses if 1) genotypes were missing in 10% or more individuals because phasing programs and other software generally perform poorly with higher levels of missing genotypes, 2) minor allele frequency was less than or equal to 0.05 because very low minor allele frequencies could be SNP that were segregating in founders but not transmitted to the F 1 and F 2 individuals in the context of this experimental design (family structure), or 3) deviation from Hardy-Weinberg equilibrium (p £ 0.05, exact tests according to Wigginton et al., 2005). The assay used is known to have approximately a 0.5% error rate (Hong et al., 2012); 684 Mendelian errors were detected in 181 of the 776 individuals (from 1 to 69 SNP in individuals).
Association analyses included 34,957 markers.

Statistical analyses
Models were built using the MIXED procedures of SAS (SAS Inst., Inc., Cary, NC). Investigated fixed effects included the main and interaction effects of year, season of birth, family, dam age (distinct categories for 2-, 3-, 4-, 5to 10-yr-olds, and older than 10 yr), and calf gender. Regression of weaning weight on calf age in days was investigated. Association analyses were conducted using the Q-K procedures (Yu et al., 2006) of JMP Genomics (SAS Inst., Inc., Cary, NC) with fixed effects identified from the preliminary analyses and regression of trait on genotypic values, which were coded as 0 for the homozygote of the allele with the highest frequency and 2 assigned to the other homozygote. Heterozygotes were assigned values of 1. The false discovery rate was constrained to 0.05 using methodology of Benjamini and Hochberg (1995).
The designed relatedness in this project was evaluated by conducting association analyses with two random structures. The genetic covariance among animals in this project was modeled by constructing 1) the relationship matrix based on pedigree, and 2) the genomic relationship matrix based upon genotypes from the Bovine SNP50 using procedures of JMP Genomics. The pedigree-based relationship matrix was constructed with coancestry coefficients for pairs of animals according to methodology described by Falconer and Mackay (1996). There were 2,419 individuals in the pedigree. Elements of the genomic relationship were calculated as an approximation of identity-by-descent by adjusting the probability of identityby-descent between two individuals with the average probability of identity-by-state between random individuals (Yu et al., 2006): where n indexes genotypes, i and j index individuals with genotypes; IBD ijk is identity-by-descent estimate of the probability that individuals i and j share an allele at marker genotype k that came from the same ancestor at a locus. This probability is estimated without considering pedigree information; X ik = 0, 1, 2, corresponding to genotypes at marker k in animal i; Within a genotype, p (q) is the frequency within each genotype of the nucleotide (A, C, G, or T) lower (higher) in the alphabet. A third analysis was conducted using a completely fixed effects model with no relatedness modeled. For each of those 3 scenarios, additional analyses differed by the inclusion of family as a fixed effect (18 levels: 14 full-sibling and four half-sibling families). These six analyses were conducted on birth weight (n = 737; mean 35.0 kg; SD 5.73 kg). Subsequently, those results were used to select the model for analysis of gestation length (n = 473; mean 281 days; SD 5.39 days) and weaning weight (n = 699; mean 229.7 kg, SD 30.5 kg). All outliers (those outside the mean ± 4 SD) were excluded from analyses. Quantilequantile plots were constructed of ranked test statistics (unadjusted probability values of F ratios for regression of phenotype on coded genotypes) at each SNP (The Wellcome Trust Case Control Consortium, 2007), and the degree of inflation of the median probability value (l) was calculated (Devlin and Roeder, 1999) as assessment of the appropriate removal of the effects of population substructure (Voorman et al., 2011). Values of this parameter l that are closer to 1 are favorable, and those much larger than 1 may indicate the presence of stratification that could result in spurious association of phenotype with genotype.
Locations of the genomic feature closest to each SNP locus associated in final models for each trait were determined using R statistical software and the package Map2NCBI (Hanna and Riley, 2014). Map coordinates re-ported are from the bovine UMD_3.1 build (Zimin et al., 2009).

Results and Discussion
Preliminary analyses indicated that a combination of year and season of birth was an efficient parameterization of these effects (p < 0.001). Cow age was retained in final models for all traits (p < 0.01), and calf gender explained significant birth weight variation. Family was considered important for all traits in these preliminary analyses (p < 0.001).
The first notable effect of refinement of the relatedness modeled was reduction of the number of detected associations (Table 1). Relative to a model that included no relatedness, the number of detected associations was reduced by over one third in analysis that modeled random animal effects with the inclusion of a pedigree-based relationship matrix. The numbers of detections when employing a genomic relationship matrix were drastically reduced. This implies that the failure to model animal as random resulted in that variation inappropriately contained in the marker regression. Smaller values of $ l indicated that additional refinement by employing marker relatedness as opposed to the average relatedness (with exception of parentoffspring, as that relatedness is exact) based on pedigree best accounted for that variation in the analyses (Figures S1, S2, and S3). The variation in pedigree relationships within a population may be much larger than variation within the genomic relationships among pairs with the same pedigree relationship (Hill, 2013).
The inclusion of the designed family fixed effect had pronounced influence on the association results. In the fixed effects model (no inclusion of relationship matrix in equations corresponding to random effects), the effect of family appeared to substitute for random structure and capture variation to the extent that this model resulted in approximately the same numbers of detections (Table 1, Figure 1) as the mixed model with pedigree-based relatedness included. The inclusion of family as a fixed effect resulted in additional associations detected in pedigree-based analyses and genomic-based analyses (Figures 2 and 3) and a slightly larger value for $ l in the pedigree-based analysis. This is likely because family structure is confounded with information provided in the relationship matrix. Also, there must exist non-0 covariances between levels of this effect and the relatedness modeled in these mixed models. These were modeled as 0 in these analyses, and that may also have contributed to the more liberal detections of SNP-birth weight associations.
The majority of SNP loci detected as associated with birth weight in the analyses with a genomic relationship matrix included (28 of 31) were detected in each of the other analyses with different relationship parameterizations ( Figure 4) except for the fixed effect model that did not include family effects. The analysis that modeled no relatedness (nor family as a fixed effect) identified a large number of loci (n = 650) that were not detected otherwise. Zhang et al. (2006) reported high correspondence between relationship matrices constructed using pedigree and 520 Relatedness and SNP-calf trait associations  1 -Number of detected SNP loci associated with birth weight by chromosome from analyses without modeled relatedness with and without the fixed effect of family ("Un" represents SNP that were not assigned to any chromosome). molecular information, but reported that the distribution of p values (H 0 : no association) from a mixed model with relationship matrix based on markers was closer to the expected distribution than those p values produced from mixed models with pedigree-based relationship matrix included or from a fixed effect model. Incorporating markerbased relationships may increase the power of association tests (Yu et al., 2006). The most recent efforts reported that the use of both pedigree and genomic relationship may be the most appropriate for prediction of breeding values (Aguilar et al., 2010;Goddard et al., 2011;Haile-Mariam et al., 2013). The appropriate modeling of relatedness was incorporation of the genomic relationship matrix without including family as a fixed effect.

Birth weight
There were 14, 8, and 9 SNP loci detected as associated with birth weight at the p < 0.001, < 0.01, and < 0.05 levels respectively (Table S1); corresponding numbers in Bos taurus studies were 35, 91, and 339 (Snelling et al., 2010), and 105, 527, and 2,095 (Lu et al., 2013). The majority of detected loci (30 of 31) were from BTA14:16.3 to 28.4 Mb. The proportion of phenotypic variance accounted for by each SNP (singly, that is, not in a multiple regression model) ranged from 0.023 to 0.067. A single locus was detected at BTA5:49.5. Most recently, Lu et al. (2013) reported multiple SNP-birth weight associations on both of these chromosomes. Kneeland et al. (2004) reported QTL for birth weight on BTA6, 19, and 21. Seven associated markers were within the bp limits of the XK, Kell blood group subunit-related family, member 4 (XKR4), and another was upstream > 25 kb of this gene (Table S1). A significant association of a SNP within this gene with birth weight was also detected in Nelore cattle (Utsunomiya et al., 2013); this was one of five total associations detected in that study, as those researchers applied Bonferroni corrections to p-values for the 434,020 loci evaluated. Detected marker-subcutaneous rump fat associations were reported for markers in and around XKR4 in Brahman and Brahman-influenced cattle (Bolormaa et al., 2011;Porto Neto et al., 2012), and this was considered a candidate gene based on marker-trait associations for average daily gain, average daily feed intake, and residual feed intake in Bos taurus cattle (Lindholm-Perry et al., 2012). Chromosome 14 has been consistently identified in association studies as important for growth traits of cattle (e.g., Snelling et al., 2010;Bolormaa et al., 2011;Lu et al., 2013); summaries of detections and genes in that region across time were detailed by Wibowo et al. (2008) and Hawken et al. (2012). Although pleiomorphic adenoma gene 1 (PLAG1) on BTA14 was considered a candidate gene for influencing hip height in cattle (Karim et al., 2011), it was not the closest gene to any SNP with detected association in the present study. The closest gene to the marker with strongest association was proenkephalin (PENK) on BTA14. This gene was identified as one of 13 found within 500 kb of the SNP most significantly associated with birth weight in Nelore cattle (Utsunomiya et al., 2013). Increasing expression of PENK in the bovine endometrium concurrent with the progression of pregnancy may indicate that this gene promotes or contributes to conceptus elongation (Forde et al., 2013). Peters et al. (2012) reported six SNP associated with birth weight in Brangus cattle in the similar region of BTA5. Although it was not the closest feature to the detected locus on BTA5 in the present study, signal transducer and activator of transcription 6 (STAT6) is in the same region; it has been characterized with respect to SNP associations with various cattle traits (Rincón et al., 2009). Hawken et al. (2012) detected associations of SNP with body weight in a broad region of BTA5 and provided a good summary of earlier work with detections on that chromosome. Riley et al. 521 Figure 2 -Number of detected SNP loci associated with birth weight by chromosome from analyses with modeled pedigree relatedness with and without the fixed effect of family ("Un" represents SNP that were not assigned to any chromosome).
Figure 3 -Number of detected SNP loci associated with birth weight by chromosome from analyses with the genomic relationship matrix included with and without the fixed effect of family ("Un" represents SNP that were not assigned to any chromosome).

Gestation length
Almost all (37 of 40) of the markers associated with gestation length were on BTA21 ( Figure 5; Table S2), as well as one marker on BTA3 and two on BTA27. The estimate of the degree of inflation of the median probability value ( $ l) was 0.98 and the quantile-quantile plot of c 2 statistics and p-values is in the supplementary information ( Figure S4). These did not overlap with most markers previously detected as associated with gestation length (Maltecca et al., 2009; in dairy cattle, although Maltecca et al. (2011) identified 6 SNP on BTA3. Some SNP in imprinted domain regions on BTA13 were associated with calving difficulty and gestation length of dairy cows . A similar region was identified on BTA21 (beginning at BTA21:68.1 Mb; Magee et al., 2010) including delta-like 1 homolog DLK1 and deiodinase, iodothyronine, type III (DIO3); this region was associated with various traits in dairy cattle, including gestation length . The closest two associated markers in the present study were approximately 30 Mb upstream of this region. Eighteen associated markers were within the bp boundaries of eight genes and four pseudogenes (Table S2).

Weaning weight
All but one of the 26 detected SNP-weaning weight associations ( Figure 6; Table S3) were on BTA14. The quantile-quantile plot of c 2 statistics and p-values is in the supplementary information ( Figure S4); $ l was 0.99. Most of those (n = 21) were also detected as associated with birth weight; this result supported the conclusions of Snelling et al. (2010), that is, genomic regions responsible for variation in both traits were largely the same and that using genomic information to select for growth in one phase while limiting or minimizing growth in another phase would be difficult. A single locus at BTA10:26.6 Mb was detected as associated with weaning weight. Lu et al. (2013) detected associations with weaning weight on these chromosomes (plus others) in similar regions. Snelling et al. (2010) detected associations of the direct component of weaning weight with SNP on BTA10 and BTA14, but primarily on BTA6; they detected maternal components of weaning weight associated with SNP on the chromosomes detected in the present study. It should be noted that these mentioned represent a small subset of the overall detections from Snelling et al. (2010) that spanned essentially all chro-522 Relatedness and SNP-calf trait associations mosomes. Peters et al. (2012) reported a window of SNP associated with 205-d weight in Brangus heifers on BTA10. Fifteen of the SNP with detected association with weaning weight were located within physical boundaries of eight genes and one pseudogene (Table S3).

Conclusions and Implications
In mixed model association analyses, marker-based relationship appeared to best capture trait variance corresponding to relatedness. In the absence of either a relationship matrix based on marker genotypes or pedigree, the fixed effect of family appeared to assume that role, but to a lesser extent. Detected associations increased when more primitive parameterizations of relatedness (or no relatedness) were employed, apparently because of stratification. Inclusion of family as a fixed effect in addition to either relationship matrix also resulted in increased associations detected, either due to duplicative modeling of relatedness or due to inappropriate assumption of non-zero covariance between fixed effect levels and random effect levels. Detected associations with birth weight and weaning weight were primarily on BTA14 and consistent with other reports. The strongest association for birth weight was for a marker within PENK, which has been identified as expressed in Riley et al. 523  placenta. Detected associations with gestation length were primarily on BTA21.

Supplementary Material
The following online material is available for this article: Table S1 -Single nucleotide polymorphism markers associated with birth weight. Table S2 -Single nucleotide polymorphism markers associated with gestation length. Table S3 -Single nucleotide polymorphism markers associated with weaning weight. Figure S1 -Quantile-quantile plots and 95% confidence bands of observed c 2 values against expected probability values of birth weight-marker association F statistics from models with no relationship matrix included, and the fixed effect family included (upper) or not included (lower). Figure S2-Quantile-quantile plots for models based on average pedigree relatedness. Figure S3-Quantile-quantile plots for models with genomic relationship matrix. Figure S4-Quantile-quantile plots of gestation lengthmarker association and weaning weight-marker association F statistics. This material is available as part of the online article from http://www.scielo.br/gmb. Data Access: Data will be shared with collaborators upon request.

Associate Editor: Dario Grattapaglia
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.