Runs of homozygosity for autozygosity estimation and genomic analysis in production animals

Runs of homozygosity (ROHs) are long stretches of homozygous genomic segments, identifiable by molecular markers, that provide genomic information to characterize populations, infer evolutionary history and demography, estimate levels of consanguinity, and identify selection signatures in production animals. This review surveys studies on the efficiency of ROHs for these purposes. Factors such as genetic drift, natural or artificial selection, founder effect, and effective population size directly influence the size and distribution of ROHs along the genome. Individual genome-wide estimates of consanguinity based on ROHs can be obtained using the F_ROH index, which is generally considered more accurate than indexes based on other types of genomic or genealogical information. High frequencies of specific ROHs in a population can be used to identify selection signatures. Recent studies with ROHs in domestic animals have shown that they allow herds to be characterized reliably and accessibly using genomic information.


Introduction
The domestication of production animals, followed in recent decades by intense directional selection using quantitative methods, has resulted in significant genetic gains in adaptation, type, and production (Randhawa et al., 2016). Consequently, drastic reductions in effective population size were observed (Falconer & Mackay, 1996), as well as production losses due to high consanguinity rates (Reverter et al., 2017) in breeds under intense selection (Scraggs et al., 2014; Zavarez et al., 2015).
The balance between high rates of genetic gain and of diversity loss requires breeding programs to monitor and control inbreeding rates in the studied populations (Peripolli et al., 2017). Traditionally, this control relies on pedigree information, but random errors in collecting, recording, registering, and storing information on genetic relationships may have serious negative consequences and cause unintended increases in autozygosity levels (Curik et al., 2014; Hudson et al., 2015; Zavarez et al., 2015). Therefore, the use of genomic data to complement or correct existing genealogical data may positively impact long-term genetic gains (Hudson et al., 2015; Marras et al., 2015; Zhang et al., 2015b).
The first studies using molecular markers to aid animal breeding date from the end of the 1980s; the results, based on a few markers and high-cost techniques, were inconsistent regarding the efficiency and viability of using genomic data in animal production (Caetano, 2009). However, the combination of DNA microarray manufacturing technologies with molecular methods for genotyping single nucleotide polymorphisms (SNPs) led to the development of panels with thousands of markers, allowing an extensive evaluation of the genome in a fraction of the time and cost required by previous methodologies (Caetano, 2009; Silva et al., 2015).
Defined as long, uninterrupted stretches of the genome with homozygous genotypes (Lencz et al., 2007), runs of homozygosity (ROHs) may be identified through the analysis of high-density SNP panels, preferably with more than 50,000 SNPs. The targeted analysis of data generated with these panels has allowed studies to identify and characterize ROHs in different species (Curik et al., 2014). In addition, secondary analyses of ROHs may be used to: identify and monitor the consanguinity rate in production animals; map and identify recessive deleterious alleles; characterize the population demography, structure, and history; and estimate individual and population genomic relationships and autozygosity (Saura et al., 2015; Peripolli et al., 2017).
This review paper presents information on ROHs, the methods used for their identification, and their different applications in studies that may positively impact animal production, mainly regarding genomic estimates for the characterization of a population in order to monitor autozygosity and its implications.

Runs of homozygosity
The term ROH was first used to denote stretches of 100 or more consecutive SNPs on one chromosome, without heterozygotes or missing calls (Lencz et al., 2007). ROHs were initially identified in human chromosomes (Broman & Weber, 1999) because, at the time, this was the only species with wide genome coverage, a requirement for the correct identification of ROHs (Purfield et al., 2012). Studies on the size and distribution of ROHs in other animal species were only carried out approximately eight years later (Gibson et al., 2006), when the first high-density SNP panels for production animals were developed.
The formation process of ROHs is shown in Figure 1, where individual A represents the common ancestor of parents D and E of individual F. The ROH fragment of individual F, as well as the homologous chromosome of the common ancestor A from which this ROH originated, is shown in green. The other colors indicate chromosome fragments that did not originate from the homologous chromosome of individual A (Gomez-Raya et al., 2015).
The correct identification of ROHs depends on the control of several factors, such as genotyping quality, minimum size of ROHs, and number of allowed heterozygotes, since possible genotyping errors may compromise the estimates (Ferenčaković et al., 2013b). The type of chip used to obtain data also influences the identification of ROHs, since wide genome coverage allows the identification of a greater number of runs. It should be noted that chips with densities greater than 50,000 SNPs are necessary to precisely detect ROHs shorter than 5.0 Mb (Purfield et al., 2012; Zhang et al., 2015a).
High-density SNP panels for production animals were only developed starting in 2008 (Caetano, 2009). Currently, there are high-density genotyping chips for several production species, which have allowed studies to identify ROHs in these species (Curik et al., 2014). The results have shown the potential of this approach to identify genomic regions of interest. The characterization of ROHs in different populations, breeds, or lines of a species is important to obtain information on evolutionary history (Metzger et al., 2015; Sorbolini et al., 2015; Zavarez et al., 2015), demographics (Bosse et al., 2012), or consanguinity in a population (Marras et al., 2015).

Software used to identify ROHs
Currently, the software packages most used to identify ROHs are Plink v1.07 (Purcell et al., 2007) and SNP & Variation Suite (SVS) from Golden Helix (Curik et al., 2014). However, others cited in the literature, such as Beagle and Germline, have also been used for this purpose (Howrigan et al., 2011).
Plink uses a sliding window to identify ROHs as consecutive stretches that contain a minimum number of homozygous SNPs over a pre-specified minimum distance. With this method, the software performs a basic detection of the homozygous stretches identified by the sliding window, and the user only needs to define the "minimum size" parameter of the segments to be identified (Curik et al., 2014). Plink is available for free download for Linux, MS-DOS, and Apple Mac, as well as in C/C++ source code (Plink…, 2017). Being free is the great advantage of this software compared with SVS from Golden Helix, whose cost of over US$ 1,000 per year (Golden Helix, 2017) is justified by its great capacity for data management, user-friendly design, quality of the produced material, and guaranteed support.
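The sliding-window idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Plink's actual implementation: genotypes are coded as the number of copies of a reference allele (0, 1, or 2, so that 1 is heterozygous), and the function name and the parameters `window`, `max_het`, `threshold`, `min_snps`, and `min_length_bp` are hypothetical, with arbitrary default values. A window counts as homozygous when it holds at most `max_het` heterozygotes; a SNP is retained when the share of the windows covering it that are homozygous reaches `threshold`; consecutive retained SNPs form a run.

```python
def sliding_window_roh(genotypes, positions, window=5, max_het=0,
                       threshold=0.5, min_snps=5, min_length_bp=0):
    """Simplified sliding-window ROH scan for one individual and chromosome.

    genotypes: list of 0/1/2 (copies of a reference allele; 1 = heterozygous).
    positions: base-pair position of each SNP, in ascending order.
    Returns a list of (start_bp, end_bp) tuples.
    """
    n = len(genotypes)
    hits = [0] * n    # homozygous windows covering each SNP
    total = [0] * n   # all windows covering each SNP

    # Slide the window one SNP at a time and score every SNP it covers.
    for start in range(n - window + 1):
        n_het = sum(1 for g in genotypes[start:start + window] if g == 1)
        window_ok = n_het <= max_het
        for k in range(start, start + window):
            total[k] += 1
            hits[k] += window_ok

    # A SNP is retained if enough of its windows were homozygous.
    keep = [total[k] > 0 and hits[k] / total[k] >= threshold for k in range(n)]

    # Join consecutive retained SNPs and apply the minimum-size filters.
    segments = []
    k = 0
    while k < n:
        if not keep[k]:
            k += 1
            continue
        j = k
        while j + 1 < n and keep[j + 1]:
            j += 1
        if j - k + 1 >= min_snps and positions[j] - positions[k] >= min_length_bp:
            segments.append((positions[k], positions[j]))
        k = j + 1
    return segments


# Toy data (hypothetical): 12 SNPs spaced 1 kb apart.
toy_genotypes = [0, 2, 2, 0, 0, 2, 1, 0, 2, 1, 2, 0]
toy_positions = [i * 1000 for i in range(12)]
print(sliding_window_roh(toy_genotypes, toy_positions,
                         window=3, threshold=0.5, min_snps=3))  # [(0, 4000)]
```

In the toy data, the sixth SNP is homozygous but is not retained, because most windows covering it also contain the neighboring heterozygote; this illustrates how window size and threshold shape the reported run boundaries.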
The SVS software does not use sliding windows; instead, it considers every homozygous SNP as a possible starting point of a new ROH. Each SNP is classified as homozygous, heterozygous, or missing, and the algorithm returns, for each chromosome and individual, the homozygous stretches whose number of homozygous SNPs exceeds the specified minimum. Then, a second algorithm groups all detected stretches into clusters and lists the stretches shared by at least a minimum number of individuals. This more modern and complex method allows the user to define groups of parameters, such as minimum size of ROHs in base pairs and in number of SNPs, minimum density, maximum gap, and maximum number of allowed heterozygotes and missing calls (Curik et al., 2014).
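A window-free scan of this kind can be illustrated as follows. This is a minimal sketch written for this review, not the SVS algorithm itself: the function and parameter names are hypothetical, genotypes are coded 0/1/2 for the number of copies of a reference allele (1 = heterozygous) and -1 for a missing call, and the density criterion and the second clustering step are left out.

```python
def consecutive_roh(genotypes, positions, min_snps=3, min_length_bp=0,
                    max_het=0, max_missing=0, max_gap_bp=None):
    """Window-free ROH scan for one individual and chromosome.

    Every homozygous SNP is tried as the start of a run, which is extended
    until the heterozygote, missing-call, or gap allowance is exceeded.
    Returns a list of (start_bp, end_bp, n_snps) tuples.
    """
    runs = []
    n = len(genotypes)
    start = 0
    while start < n:
        if genotypes[start] not in (0, 2):   # a run must start on a homozygote
            start += 1
            continue
        het = missing = 0
        last_hom = start                     # runs are trimmed to end on a homozygote
        k = start + 1
        while k < n:
            if max_gap_bp is not None and positions[k] - positions[k - 1] > max_gap_bp:
                break                        # gap between adjacent SNPs too large
            g = genotypes[k]
            if g == 1:
                if het + 1 > max_het:
                    break
                het += 1
            elif g == -1:
                if missing + 1 > max_missing:
                    break
                missing += 1
            else:
                last_hom = k
            k += 1
        n_snps = last_hom - start + 1
        length_bp = positions[last_hom] - positions[start]
        if n_snps >= min_snps and length_bp >= min_length_bp:
            runs.append((positions[start], positions[last_hom], n_snps))
            start = last_hom + 1             # continue after the accepted run
        else:
            start += 1                       # try the next SNP as a starting point
    return runs


# Toy data (hypothetical): 12 SNPs spaced 1 kb apart, one missing call.
toy_genotypes = [0, 2, 2, 1, 0, 0, -1, 2, 1, 1, 2, 2]
toy_positions = [i * 1000 for i in range(12)]
print(consecutive_roh(toy_genotypes, toy_positions))                  # [(0, 2000, 3)]
print(consecutive_roh(toy_genotypes, toy_positions,
                      max_het=1, max_missing=1))                      # [(0, 7000, 8)]
```

The two calls show how strongly the result depends on the allowance for heterozygotes and missing calls: tolerating one of each turns a 3-SNP run into an 8-SNP run on the same data.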

Factors affecting the formation of ROH patterns
The processes of natural and artificial selection may alter genotypic patterns and produce contrasting patterns in populations subjected to distinct selective pressures (Sorbolini et al., 2015). The selection of a small number of superior animals tends to reduce the observed phenotypic variability, besides leading to genome remodeling in production animals, generating ROH patterns (Kim et al., 2013) due to increased homozygosity in genomic regions close to the loci that control traits of interest (Zhang et al., 2015b). Genomic regions that contain loci subjected to artificial selection generally present a greater concentration of ROHs (Metzger et al., 2015).
Demographic factors that lead to genetic drift and to natural or artificial selection pressures may also cause genomic modifications in a species (Ramey et al., 2013). The increase in selection intensity in breeding programs observed in recent decades, together with the use of a small number of animals as breeding stock, has contributed to reduce the effective size of populations of production animals. This contributes to an increase in consanguinity and genetic drift, as well as to a reduction in genetic variability (Peripolli et al., 2017), creating a tendency toward long ROHs in the populations of animals subjected to these conditions.
By definition, ROHs are long and uninterrupted stretches of homozygous genotypes, as previously mentioned. These stretches are generated by inbreeding events (Ferenčaković et al., 2013a) and, therefore, their size varies according to the number of generations elapsed since inbreeding occurred. Consequently, the size of the ROHs in a herd tends to decrease with each generation. Long homozygous stretches, i.e., long ROHs, indicate high consanguinity between individuals (Curik et al., 2014) due to recent inbreeding in the population.
The differences between patterns of ROHs suggest that artificial selection modifies autozygosity in the genome (Metzger et al., 2015; Szmatoła et al., 2016; Peripolli et al., 2017). Selection and/or drift events result in the formation of long ROHs (Pemberton et al., 2012; Howard et al., 2015), which subsequently undergo recombination and mutation, causing inherited ROHs to decrease in size in each successive generation (Curik et al., 2014).
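The shortening of ROHs over generations can be made concrete with a widely used approximation, which is not stated in the text above and is adopted here as an assumption: a segment inherited identical by descent from an ancestor g generations back has an expected length of about 100/(2g) cM, because 2g meioses separate the two copies. The function name is hypothetical, and the conversion from cM to Mb depends on the species-specific recombination rate, so the default of 1 cM per Mb is only a rough placeholder.

```python
def expected_roh_length_cm(generations):
    """Expected length (cM) of an IBD segment from an ancestor
    `generations` back, using the approximation 100 / (2 * g)."""
    if generations <= 0:
        raise ValueError("generations must be a positive number")
    return 100.0 / (2 * generations)


def expected_roh_length_mb(generations, cm_per_mb=1.0):
    """Same expectation in Mb, given an assumed recombination rate
    (cM per Mb); 1.0 is a rough placeholder, not a measured value."""
    return expected_roh_length_cm(generations) / cm_per_mb


# Under these assumptions, inbreeding 5 generations ago leaves runs of
# about 10 cM on average, whereas 50 generations ago leaves about 1 cM.
for g in (5, 10, 50):
    print(g, expected_roh_length_cm(g))
```

Read in reverse, the same relation is what allows ROH length classes to be interpreted as rough indicators of how long ago the underlying inbreeding occurred.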

Characterization of populations through ROHs
The length and frequency of ROHs are used to provide information about the ancestry of an individual or about the structure and history of the population of origin (Howrigan et al., 2011; Purfield et al., 2012). The distribution of ROHs along the genome is not random or uniform, but strongly dependent on local recombination and mutation rates (Bosse et al., 2012), as well as on other evolutionary forces (Ramey et al., 2013). Therefore, the formation and distribution of ROHs throughout the genome result from the combination of genomic variables, such as recombination rate, and signals of recent directional selection (Pemberton et al., 2012).
In production animals, ROHs longer than those found in human populations are expected due to artificial selection and to the reduced effective population size (Curik et al., 2014). Breeds and specialized lines of production animals are usually subjected to intense selection of allele clusters that positively affect production, reproduction, and breed standard characteristics. Therefore, it is common to observe high inbreeding rates and, consequently, ROHs abundantly distributed along the genome and present at high frequencies in these populations (Zavarez et al., 2015; Zhang et al., 2015a).
The analysis of the size, position, and frequency of ROHs throughout the genome may provide information about genomic characteristics, recombination rates, and direction of selection, besides revealing the relationship between distinct populations (Bosse et al., 2012; Metzger et al., 2015). Long ROHs usually indicate intensive selection pressures (Metzger et al., 2015) and recent consanguinity events (Al-Mamun et al., 2015), whereas short ROHs suggest genetic diversity loss due to bottleneck or founder effects (Al-Mamun et al., 2015) or to past consanguinity events in the population (Howrigan et al., 2011). Metzger et al. (2015), while studying ROH patterns in different horse populations, observed variations in the number and frequency of runs between purebred and crossbred animals. According to these authors, there may be a relationship between specific ROH patterns and the history of natural or artificial selection pressures. Saura et al. (2015) reported losses in genetic variability due to selective breeding by identifying the presence or absence of ROHs in the genome of improved swine. Szmatoła et al. (2016) also identified distinct ROH patterns in three bovine populations: native, subjected to conservation processes, and commercial. These results indicate that, once the ROH pattern of a species, breed, or population is known, it is possible to infer the evolutionary history or the demographic information that characterizes it.

Estimation of consanguinity through the analysis of the frequency of ROHs (F_ROH)
Autozygosity occurs when mated individuals with common ancestors transmit chromosome segments identical by descent (IBD) to their progeny. The evaluation of these segments allows calculating the inbreeding coefficient (F), used to estimate the level of autozygosity in an individual with one or more common ancestors. From this coefficient, it is possible to determine the probability of the alleles in a random region of the genome being IBD (Wright, 1922).
Traditionally, pedigree data are used to compose the relationship matrix "A", constructed with the expected proportion of IBD loci (VanRaden, 1992). Its diagonal elements are calculated by the expression:
$$A_{ii} = \sum_{j=1}^{i} L_{ij}^{2} D_{jj}$$
where A_ii is the i-th element of the diagonal of matrix A, which is equivalent to the inbreeding coefficient of the i-th animal plus 1, and L (lower triangular) and D (diagonal) are the factors of the decomposition A = LDL'.
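The relation between the diagonal of A and the inbreeding coefficient can be checked on the pedigree of Figure 1. The sketch below is an illustration under assumptions: it builds A with the classical tabular (recursive) method rather than with the decomposition given above, the function name is hypothetical, and individuals B and C are hypothetical unrelated mates of ancestor A added only to complete the pedigree.

```python
def relationship_matrix(pedigree):
    """Numerator relationship matrix by the tabular method.

    pedigree: list of (animal, sire, dam) tuples, with parents listed
    before their offspring and None for an unknown parent.
    Returns (index, A): index maps animal id -> row/column of A.
    """
    n = len(pedigree)
    index = {}
    A = [[0.0] * n for _ in range(n)]
    for i, (animal, sire, dam) in enumerate(pedigree):
        s = index.get(sire)   # None when the sire is unknown
        d = index.get(dam)    # None when the dam is unknown
        # Relationship with each older animal j: mean of j's relationships
        # with the parents of i (unknown parents contribute zero).
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 + F_i, where F_i is half the relationship between parents.
        f_i = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + f_i
        index[animal] = i
    return index, A


# Figure 1 pedigree; B and C are hypothetical unrelated founders.
pedigree = [
    ("A", None, None), ("B", None, None), ("C", None, None),
    ("D", "A", "B"), ("E", "A", "C"), ("F", "D", "E"),
]
index, A = relationship_matrix(pedigree)
print(A[index["F"]][index["F"]] - 1.0)  # inbreeding coefficient of F: 0.125
```

With D and E as half-sibs through A, the diagonal element for F is 1.125, which gives an inbreeding coefficient of 0.125. This value rests entirely on the assumption that founders are unrelated and non-inbred.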
Genomic information may be used to estimate the autozygosity of a population (McQuillan et al., 2008) in a fast, reliable, and low-cost manner (Silva et al., 2015). In this case, the relationship matrix A, based on pedigree information, is replaced by the genomic relationship matrix "G", constructed from marker panels, usually high-density ones. The inbreeding coefficients are obtained using the proportion of loci identical by state and carry more information than the traditional coefficient (Pértile et al., 2016). Four inbreeding coefficients may be estimated from genomic information: F_UNI, of united gametes; F_HOM, of homozygosity; F_GRM, of the genomic relationship matrix; and F_ROH, of ROHs (Zhang et al., 2015a), described subsequently.
According to Wright (1922), F_UNI may be estimated from the correlation between united gametes, by the formula (Yang et al., 2011):
$$F_{UNI} = \frac{x_i^{2} - (1 + 2p_i)\,x_i + 2p_i^{2}}{2p_i(1 - p_i)}$$
where p_i is the observed fraction of the first allele on locus i, and x_i is the number of copies of the reference allele. The estimate for an individual is the average of this per-locus quantity over all loci, which also applies to the two coefficients below.
F_HOM is estimated based on excessive homozygosity according to Wright (1922), and is obtained by the following formula (Yang et al., 2011):
$$F_{HOM} = \frac{HOM_{O} - HOM_{E}}{1 - HOM_{E}} = 1 - \frac{x_i(2 - x_i)}{2p_i(1 - p_i)}$$
where HOM_O and HOM_E are the observed and expected homozygosity, respectively; p_i is the observed fraction of the first allele on locus i; and x_i is the number of copies of the reference allele.
F_GRM is an estimate of the genetic relationship of an individual with itself, obtained from the main diagonal of the genomic relationship matrix (GRM), using genotyping data from high-density SNP panels (VanRaden, 2008), according to the expression:
$$F_{GRM} = \frac{(x_i - 2p_i)^{2}}{2p_i(1 - p_i)} - 1$$
where p_i is the observed fraction of the first allele on locus i, and x_i is the number of copies of the reference allele.
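The three allele-frequency-based coefficients can be computed side by side for one individual. The sketch below assumes the per-SNP expressions of Yang et al. (2011), written out in the code comments, and averages them over SNPs; the function name, the dictionary keys, and the decision to skip monomorphic SNPs are choices made for this illustration.

```python
def genomic_inbreeding(x, p):
    """Average per-SNP estimates of F_GRM, F_HOM, and F_UNI for one individual.

    x: copies of the reference allele at each SNP (0, 1, or 2).
    p: frequency of the reference allele at each SNP in the population.
    Assumed per-SNP expressions, with h = 2p(1 - p):
      F_GRM = (x - 2p)^2 / h - 1
      F_HOM = 1 - x(2 - x) / h
      F_UNI = (x^2 - (1 + 2p)x + 2p^2) / h
    """
    f_grm = f_hom = f_uni = 0.0
    m = 0
    for xi, pi in zip(x, p):
        h = 2.0 * pi * (1.0 - pi)
        if h == 0.0:          # monomorphic SNP: carries no information, skipped
            continue
        f_grm += (xi - 2.0 * pi) ** 2 / h - 1.0
        f_hom += 1.0 - xi * (2 - xi) / h
        f_uni += (xi ** 2 - (1.0 + 2.0 * pi) * xi + 2.0 * pi ** 2) / h
        m += 1
    if m == 0:
        raise ValueError("no polymorphic SNPs available")
    return {"F_GRM": f_grm / m, "F_HOM": f_hom / m, "F_UNI": f_uni / m}


# Toy example (hypothetical data): four SNPs with allele frequency 0.5,
# three homozygous genotypes and one heterozygous genotype.
print(genomic_inbreeding([0, 2, 1, 2], [0.5, 0.5, 0.5, 0.5]))
# {'F_GRM': 0.5, 'F_HOM': 0.5, 'F_UNI': 0.5}
```

All three expressions depend on the allele frequencies p, which is why these estimates change with the reference population used to compute the frequencies. Under the expressions assumed here, F_UNI is always the mean of F_GRM and F_HOM at each SNP.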
F_ROH is calculated by summing the lengths of the detected ROHs, which may be grouped according to minimum run sizes. This coefficient may be defined as the proportion of the autosomal genome in ROHs in relation to the autosomal genome covered by SNPs (McQuillan et al., 2008), as:
$$F_{ROH} = \frac{\sum L_{ROH}}{L_{auto}}$$
where ∑ L_ROH represents the total length of the ROHs above a minimum specified size (L_ROH) identified in an individual; and L_auto is the total size of the autosomal genome covered by SNPs.
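As a worked example of this ratio, the following minimal sketch computes F_ROH from a list of detected runs. The function name, the 1 Mb default threshold, and the toy segment coordinates and genome length are hypothetical values chosen for illustration.

```python
def f_roh(roh_segments, autosome_length_bp, min_length_bp=1_000_000):
    """F_ROH = (sum of ROH lengths above a minimum size) / L_auto.

    roh_segments: list of (start_bp, end_bp) runs for one individual,
                  pooled over all autosomes.
    autosome_length_bp: autosomal genome length covered by SNPs (L_auto).
    min_length_bp: minimum run size counted in the numerator.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    total = sum(end - start
                for start, end in roh_segments
                if end - start >= min_length_bp)
    return total / autosome_length_bp


# Toy individual: runs of 5 Mb, 0.5 Mb, and 2 Mb on a 100 Mb genome.
segments = [(0, 5_000_000), (10_000_000, 10_500_000), (20_000_000, 22_000_000)]
print(f_roh(segments, 100_000_000))                           # 0.07
print(f_roh(segments, 100_000_000, min_length_bp=4_000_000))  # 0.05
```

Raising the minimum run size restricts the numerator to longer runs, so the same individual yields different F_ROH values depending on the length class considered.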
The autozygosity estimates obtained from the inbreeding coefficient (F), calculated using genomic information, are generally more precise than those obtained from pedigree information (F_PED), since the latter does not take into account inbreeding in the founders of the herd, for which there is no pedigree information (in this case, F=0 is assumed). F_PED also does not consider sampling effects due to selection (Curik et al., 2014; Zhang et al., 2015a). The first studies with domestic animals in which F_ROH was used focused on comparing F_ROH, an estimate based on the length of ROHs, with F_PED, based on pedigree information (Curik et al., 2014; Sölkner et al., 2010). In this comparison, F_ROH was shown to be more efficient than F_PED and was recommended as an interesting alternative to correct pedigree errors (Hudson et al., 2015).
The first research reports using F_ROH estimates for domestic animals were published from 2010 onward. Ferenčaković et al. (2011), considering the good results obtained in studies with ROHs in humans, used data from genotyped cattle and their pedigree to establish a correlation between the estimates of F_ROH and F_PED. The authors found a high correlation (0.68) between both estimates when using complete pedigree information (F_PEDT). Positive correlations between these estimates were also reported by Marras et al. (2015), who found values between 0.66 and 0.70 according to breed, and by Mastrangelo et al. (2016), who obtained even higher correlations, between 0.83 and 0.95, using different dairy cattle breeds.
Inbreeding estimates based on ROHs (F_ROH) are usually higher than those obtained from pedigree. Ferenčaković et al. (2011) found that the advantage of the index originating from ROHs, when calculated for segments >1 Mb (F_ROH > 1 Mb), is that it identifies events of past consanguinity that were not identified by the F_PED estimates. Scraggs et al. (2014), while studying purebred cattle, observed a higher mean for the F_ROH genomic estimate than for F_PED, corroborating the assumption that F_ROH provides additional information about recent consanguinity, compared with F_PED (Gomez-Raya et al., 2015).
Among the different estimates obtained from genomic information (F_UNI, F_HOM, F_GRM, and F_ROH), individual consanguinity based on ROHs (F_ROH) is the most precise (Marras et al., 2015; Zhang et al., 2015b; Gurgul et al., 2016). This is because it is a direct homozygosity measure, calculated from molecular information, and is less susceptible to selection effects and to errors caused by sampling variation during gamete formation (Marras et al., 2015), unlike other genomic estimates, which are influenced by allele frequency. Therefore, the F_ROH index is a more precise estimate than consanguinity estimates obtained from pedigree information or than other genomic estimates; it is thus a viable option for the correction of pedigree errors (Ferenčaković et al., 2011; Marras et al., 2015), and, according to Zhang et al. (2015a), is the most recommended to determine IBD.

ROHs and selection signatures
Selection signatures are a result of genotypic alterations in populations subjected to some form of selection pressure (Ramey et al., 2013; Sorbolini et al., 2015), and are characterized by an increase in allele frequency in one or more genes, or gene clusters, involved in the adaptation of a population to specific conditions (such as resistance to diseases, tolerance to cold or heat, or maternal ability) or in improvement for specific purposes (such as meat, milk, and wool production, among others). Therefore, methods that allow identifying these signatures may lead to the identification of the genes involved in the processes related to the productivity of production animals.
ROHs usually cover genomic regions large enough to contain genes or gene clusters, which may be under selection for generations. Therefore, the identification of ROHs may aid in visualizing and identifying haplotype patterns characteristic of breeds or species, fixed due to selection pressures. Starting in 2007, the first studies identified ROHs common to humans affected by Alzheimer's disease and schizophrenia, aiming to identify genes associated with the development of these diseases (Lencz et al., 2007; Nalls et al., 2009). In the following years, studies with production animals also identified ROHs, but aimed at genes or gene clusters related to the productivity of the population and to the characterization of the animal breed or line (Metzger et al., 2015; Sorbolini et al., 2015; Zavarez et al., 2015).
More recently, research to identify selection signatures, based on the analysis of the frequency of ROHs, has been carried out for several species of production animals. Fuller et al. (2015) identified ROHs related to adaptive characteristics of commercial bee species, observing the effect of environmental temperature on honey production. In cattle, O'Brien et al. (2014), Somavilla et al. (2014), and Zavarez et al. (2015) identified ROHs associated with the adaptive potential and the reproductive and productive characteristics of zebu breeds, whereas Kim et al. (2015a) identified more than 15 regions related to milk production in Jersey herds.
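The frequency-based approach mentioned above can be sketched as follows: for each SNP, count the share of individuals in which the SNP falls inside a ROH, then flag SNPs whose share exceeds a cutoff. This is a minimal illustration, not the procedure of any of the cited studies; the function names are hypothetical, and the 0.5 cutoff is arbitrary (cutoffs are study-specific).

```python
def roh_frequency(snp_positions, roh_by_individual):
    """Share of individuals carrying each SNP inside a ROH.

    snp_positions: base-pair positions of the SNPs on one chromosome.
    roh_by_individual: dict mapping individual id -> list of
                       (start_bp, end_bp) runs on that chromosome.
    """
    n = len(roh_by_individual)
    if n == 0:
        raise ValueError("at least one individual is required")
    freq = []
    for pos in snp_positions:
        carriers = sum(
            1 for runs in roh_by_individual.values()
            if any(start <= pos <= end for start, end in runs)
        )
        freq.append(carriers / n)
    return freq


def candidate_snps(snp_positions, freq, min_freq=0.5):
    """SNP positions whose ROH frequency reaches the (arbitrary) cutoff."""
    return [pos for pos, f in zip(snp_positions, freq) if f >= min_freq]


# Toy population (hypothetical): four individuals, four SNPs.
positions = [100, 200, 300, 400]
population = {
    "a": [(100, 300)],
    "b": [(200, 400)],
    "c": [(250, 350)],
    "d": [],
}
freq = roh_frequency(positions, population)
print(freq)                             # [0.25, 0.5, 0.75, 0.25]
print(candidate_snps(positions, freq))  # [200, 300]
```

Regions formed by adjacent flagged SNPs are the candidates that would then be checked against gene annotations; a high ROH frequency alone does not distinguish selection from drift or low local recombination.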

Concluding remarks
The identification and characterization of ROHs may aid in identifying genomic regions that affect traits of interest for the productivity of commercial herds of production animals. Because the correct identification of ROHs depends on high-density SNP panels, studies with production animals are recent, starting in 2008. However, with the development of DNA chips specific for production species, research with ROHs has grown exponentially. ROHs may be used for the genomic characterization of herds and have been shown to be efficient for obtaining inbreeding estimates via the F_ROH index and for identifying patterns characteristic of genomically studied breeds or species, known as selection signatures. The potential of ROHs for aiding conventional production techniques is immense and, with the exponential increase of available genomic data due to new or developing genotyping technology, ROHs will likely find other applications as well.
[...] Objective: [...] the genome that suffered diversity losses due to consanguinity and determine the relationship between additive effects and ROHs. Result: Several regions are associated with ROH < 4 Mb in the studied population, and there is a correlation between genetic additive effect and ROH < 4 Mb, which is indicative of the influence of geographical location on homozygosity.
Kim et al. (2015b); 54K; Bovine. Objective: Identify the association between ROHs and consanguinity coefficients (F), as well as analyze ROHs identified in populations of dairy cattle. Result: ROHs reflect homozygosis strongly affected by recent artificial selection.
Kim et al. (2015a); 54K; Bovine. Objective: Identify selection signatures in Jersey cattle under selection since 1960 and compare them by the method for the identification of selection signatures via extended haplotypes. Result: The analysis of the estimated ROHs allows to efficiently [...]
[...] Objective: [...] responsible for the loss of genetic variability in reproductive characteristics of improved swine. Result: Genomic estimates based on the presence or absence of ROHs are a viable alternative for the detection of variability losses due to consanguinity.
Zavarez et al. (2015); 777K; Bovine. Objective: Characterize levels of autozygosity, based on ROHs, in a population of Nellore cattle. Result: The analysis of ROHs allows characterizing herds according to their endogamy levels, besides identifying genomic regions with possible selection signatures for the breed.
Zhang et al. [...]

Figure 1. Representation of the formation of runs of homozygosity (ROHs). Individual F presents a ROH (in green) formed by the pairing of homozygous stretches of the homologous chromosome of a common ancestor A.

Table 1. Recent studies with runs of homozygosity (ROHs) for the detection of autozygosity, identification of selection signatures, and characterization of populations of production animals.