Model selection for quantitative trait loci mapping in a full-sib family

Tong, Chunfa; Zhang, Bo; Li, Huogen; Shi, Jisen

doi:10.1590/S1415-47572012005000044

Abstract

Statistical methods for mapping quantitative trait loci (QTLs) in full-sib forest trees, in which the number of alleles and linkage phase can vary from locus to locus, are still not well established. Previous studies assumed that the QTL segregation pattern was fixed throughout the genome in a full-sib family, despite the fact that this pattern can vary among regions of the genome. In this paper, we propose a method for selecting the appropriate model for QTL mapping based on the segregation of different types of markers and QTLs in a full-sib family. The QTL segregation patterns were classified into three types: test cross (1:1 segregation), F2 cross (1:2:1 segregation) and full cross (1:1:1:1 segregation). Akaike's information criterion (AIC), the Bayesian information criterion (BIC) and the Laplace-empirical criterion (LEC) were used to select the most likely QTL segregation pattern. Simulations were used to evaluate the power of these criteria and the precision of parameter estimates. A Windows-based software was developed to run the selected QTL mapping method. A real example is presented to illustrate QTL mapping in forest trees based on an integrated linkage map with various segregation markers. The implications of this method for accurate QTL mapping in outbred species are discussed.

full-sib family; interval mapping; model selection; quantitative trait locus

Chunfa Tong; Bo Zhang; Huogen Li; Jisen Shi

Key Laboratory of Forest Genetics and Biotechnology of the Ministry of Education, Nanjing Forestry University, Nanjing, China

Send correspondence to: Jisen Shi, Key Laboratory of Forest Genetics and Biotechnology of the Ministry of Education, Nanjing Forestry University, 159 Longpan Road, 210037 Nanjing, Jiangsu Province, China. E-mail: jshi@njfu.edu.cn

Introduction

Genetic mapping of quantitative trait loci (QTLs) based on genetic linkage maps is a powerful tool for unraveling the genetic architecture of quantitative trait variation in plants, animals and humans. Since the seminal publication on interval mapping by Lander and Botstein (1989) there has been a tremendous development of statistical methods and algorithms for QTL mapping. To make interval mapping more useful, Zeng (1993, 1994) and Jansen and Stam (1994) independently proposed so-called composite interval mapping in which partial regression analysis is used to separate the effects of multiple linked QTLs. Zeng and collaborators constructed the framework for multiple interval mapping to simultaneously characterize the underlying QTLs (their number, locations, and main and epistatic effects) for a quantitative trait (Kao et al., 1999; Zeng et al., 1999). Xu and colleagues extended interval mapping to map qualitatively inherited traits, such as binary and categorical traits (Xu and Atchley, 1996; Yi and Xu, 2000; Xu et al., 2005). The principle of interval mapping was established for a pedigree, initiated with two inbred lines, such as the F₂, backcross and recombinant inbred lines. For any two inbred lines, there are only two alleles at each locus and in the F₁ hybrids that transmit gametes to the next generation there is a fixed linkage phase between any two loci. These two features of inbred lines greatly facilitate statistical inference about the QTL location and effects.

In practice, it is difficult or impossible to generate inbred lines for outcrossing species such as forest trees because of their high heterozygosity and long generation intervals. For any two heterozygous individuals, the number of alleles per locus can differ from gene to gene, leading to different segregation patterns when the two individuals are crossed. Wu et al. (2002) listed all possible types of marker segregation in a full-sib family derived from two heterozygous lines. For a given heterozygous line, there is uncertainty about the linkage phase between any pair of loci, i.e., diplotype when the two homozygous chromosomes are considered together. Despite these difficulties, various models and methods for linkage analysis in outcrossing species have been developed through the collective efforts of statisticians and geneticists (Grattapaglia and Sederoff, 1994; Maliepaard et al., 1997; Wu et al., 2002). Lu et al. (2004) derived a general framework that covers all these approaches and allows for linkage analysis between any types of markers by simultaneously estimating the recombination fraction, parental diplotype and gene order. More recently, Tong et al. (2010) described a hidden Markov model approach for multilocus linkage analysis and developed a Windows-based software to construct genetic linkage maps with different segregation markers in a full-sib family.

Nevertheless, despite these advances, there has been limited exploration of the modeling and analysis of QTL mapping in outcrossing species. Haley et al. (1994) proposed an approach for mapping outcrossing QTLs in an experimental cross with the F₂ type markers. Although this approach was used to detect QTLs in pigs (Andersson et al., 1994), it did not receive widespread acceptance because of its failure to incorporate the linkage phase of the parents and any type of marker segregation. Lin et al. (2003) subsequently proposed a general statistical model for simultaneously estimating the QTL-marker linkage phase, QTL location and QTL effects in an outcrossed family. Although some key statistical issues of the latter model have been investigated, there has been no systematic modeling of QTL segregation patterns.

In this article, we propose a method for selecting the appropriate model for mapping QTL intervals in a full-sib family derived from two outcrossing parents by considering all possible patterns of QTL segregation, i.e., test cross (1:1 segregation), F₂ cross (1:2:1 segregation) and full cross (1:1:1:1 segregation). The most likely QTL segregation pattern for a sample was chosen based on model selection criteria such as Akaike's information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978) and the Laplace-empirical criterion (LEC; McLachlan and Peel, 2000). The method capitalizes on all types of marker segregation and provides simultaneous estimates of the QTL segregation pattern, QTL location and QTL effects. Simulations were used to investigate the statistical behavior of this QTL mapping approach. Windows-based software was developed to implement the statistical model for QTL mapping in outbred species, and the usefulness of the method was validated by using an outcrossing forest tree as an example.

Materials and Methods

Segregation pattern

Suppose that two outcrossing lines, P₁ and P₂, are crossed to generate a full-sib family. The number of different alleles at an informative marker locus in the two parents may be 2, 3 or 4. Maliepaard et al. (1997) showed that the possible combinations of two parental genotypes at an informative marker locus, i.e., segregation types, were ab × aa, aa × ab, ab × ab, ab × cd, ao × ao, ab × ao or ao × ab, where a, b, c and d denote different alleles at a marker locus and o denotes the null allele, with the two characters to the left of the crossing symbol representing the marker genotype of P₁ and the two characters on the right representing the marker genotype of P₂. The linkage analysis used to estimate recombination and linkage phase inference between any two markers is well-defined (Wu et al., 2002, 2007; Lu et al., 2004; Tong et al., 2010) and allows the construction of an integrated linkage map that can contain any type of segregation markers. Similarly, a QTL may also have up to four alleles and present different segregation types. However, some of the segregation types, such as q₁q₁ × q₁q₂ and q₁q₁ × q₂q₃, cannot be distinguished from each other because of inadequate information about allelic configurations.

The QTL segregation patterns are generally classified into three types: (1) test cross, in which the segregation type is q₁q₁ × q₁q₂ or q₁q₂ × q₁q₁ and generates two genotypes, q₁q₁ and q₁q₂ (1:1 segregation), (2) F₂ cross, in which the segregation type is q₁q₂ × q₁q₂ and generates three genotypes, q₁q₁, q₁q₂ and q₂q₂ (1:2:1 segregation) and (3) full cross, in which the segregation type is q₁q₂ × q₃q₄ and generates four genotypes, q₁q₃, q₁q₄, q₂q₃ and q₂q₄ (1:1:1:1 segregation). Each of these QTL segregation types carries a different degree of information and can be discriminated from the others by using appropriate model selection criteria.
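
The three segregation ratios can be checked by enumerating the four equally likely pairings of parental alleles. The following is a minimal sketch; the function name `offspring_ratio` and the allele labels are illustrative assumptions, not part of FsQtlMap.

```python
from collections import Counter
from itertools import product


def offspring_ratio(parent1, parent2):
    """Count offspring genotypes over the four equally likely allele pairings."""
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort the two alleles so that q2q1 and q1q2 count as one genotype.
        counts["".join(sorted((allele1, allele2)))] += 1
    return dict(counts)


# Hypothetical parental QTL genotypes for the three segregation patterns.
crosses = {
    "test cross": (("q1", "q1"), ("q1", "q2")),
    "F2 cross": (("q1", "q2"), ("q1", "q2")),
    "full cross": (("q1", "q2"), ("q3", "q4")),
}

for name, (p1, p2) in crosses.items():
    print(name, offspring_ratio(p1, p2))
```

Counts of 2:2, 1:2:1 and 1:1:1:1 out of four pairings correspond to the 1:1, 1:2:1 and 1:1:1:1 segregation ratios.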

Conditional probability

Consider two molecular markers and a putative QTL located in the interval between them on a chromosome in a diploid full-sib family. We initially assume that there are four alleles at each marker locus and at the QTL, and denote the combined genotypes of the two parents at the two markers and the QTL by a₁q₁a₂ / b₁q₂b₂ and c₁q₃c₂ / d₁q₄d₂, where the slash separates the two haplotypes of a genotype. If r is the recombination fraction between the markers, r₁ the recombination fraction between marker 1 and the QTL, and r₂ the recombination fraction between the QTL and marker 2, then r = r₁ + r₂ - 2r₁r₂, assuming that there is no interference between the two intervals. The frequencies or probabilities of the combined genotypes in the progeny can be easily derived, as shown in Table 1, in which the elements were multiplied by 4. For the other marker and QTL segregation patterns, the probability of each marker and QTL genotype can be obtained by first merging the rows of Table 1 that share the same marker genotype and then the columns that share the same QTL genotype. Once the probabilities of all the marker and QTL genotypes have been obtained, the conditional probability of a QTL genotype given the combined genotype of the two markers is obtained by dividing the probability of the corresponding marker and QTL genotype by the sum of all the probabilities with the same marker genotype.
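
The calculation described above can be sketched for the fully informative case (four alleles at both markers and at the QTL, with known linkage phase). Under these assumptions the two parents contribute independently, so the conditional probability of a progeny QTL genotype is the product of two per-parent conditional probabilities. The function names and the 0/1 coding of parental haplotypes are illustrative assumptions.

```python
from itertools import product


def gamete_prob(m1, q, m2, r1, r2):
    """Probability of a gamete whose marker-1, QTL and marker-2 alleles come
    from parental haplotype 0 or 1, assuming no interference."""
    prob = 0.5  # either haplotype is equally likely at marker 1
    prob *= r1 if m1 != q else 1.0 - r1  # recombination between marker 1 and QTL
    prob *= r2 if q != m2 else 1.0 - r2  # recombination between QTL and marker 2
    return prob


def qtl_given_markers(m1, m2, r1, r2):
    """Conditional probabilities of the QTL allele origin (0, 1) in one gamete,
    given the origins of the two flanking marker alleles."""
    joint = [gamete_prob(m1, q, m2, r1, r2) for q in (0, 1)]
    total = sum(joint)  # marginal probability of the marker haplotype
    return [p / total for p in joint]


def full_cross_conditional(markers_p1, markers_p2, r1, r2):
    """Conditional probabilities of the four full-cross QTL genotypes,
    indexed by (allele origin in P1, allele origin in P2)."""
    c1 = qtl_given_markers(markers_p1[0], markers_p1[1], r1, r2)
    c2 = qtl_given_markers(markers_p2[0], markers_p2[1], r1, r2)
    return {(i, j): c1[i] * c2[j] for i, j in product((0, 1), repeat=2)}


r1, r2 = 0.1, 0.1
print("r =", r1 + r2 - 2 * r1 * r2)
print(full_cross_conditional((0, 0), (0, 1), r1, r2))
```

Summing `gamete_prob` over the two QTL origins gives (1 - r)/2 for a non-recombinant marker haplotype and r/2 for a recombinant one, in agreement with r = r₁ + r₂ - 2r₁r₂.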

Mixed model

For a given QTL segregation pattern, let J be the number of QTL genotypes (J = 2, 3 or 4). Assume that a quantitative trait is normally distributed with mean µ_j and variance σ² within the jth QTL genotype (j = 1,..., J). The phenotypic value of the ith individual, y_i, then follows a mixture of normal distributions:

$$y_i \sim \sum_{j=1}^{J} p_{j|i}\, N(\mu_j, \sigma^2),$$

where p_j|i is the conditional probability of the jth QTL genotype given the marker genotype of the ith individual.

For a sample of n individuals in the full-sib family, the likelihood of the parameter vector, θ = (µ₁,..., µ_J, σ²), for a specific position on the chromosome, can be written as

$$L(\theta) = \prod_{i=1}^{n} \sum_{j=1}^{J} p_{j|i}\, f(y_i; \mu_j, \sigma^2),$$

where

$$f(y_i; \mu_j, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]$$

is the density function of a normal distribution.
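
As an illustration, the log of this mixture likelihood can be evaluated directly from the phenotypes and the conditional QTL genotype probabilities. This is a minimal sketch; the names `normal_pdf` and `mixture_loglik` and the list-of-lists layout of the conditional probabilities are assumptions made for the example.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def mixture_loglik(y, cond_prob, mu, sigma2):
    """Log-likelihood of the normal mixture at one map position.

    y         : phenotypes of the n individuals
    cond_prob : cond_prob[i][j] is the conditional probability of QTL
                genotype j given the marker genotype of individual i
    mu        : the J genotypic means
    sigma2    : common residual variance
    """
    total = 0.0
    for y_i, p_i in zip(y, cond_prob):
        total += math.log(sum(p * normal_pdf(y_i, m, sigma2) for p, m in zip(p_i, mu)))
    return total
```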

EM algorithm

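Under the full model, the maximum-likelihood estimates of the parameters can be obtained with a form of the expectation-maximization (EM) algorithm (Dempster et al., 1977). For iteration s + 1, assume that we have the estimates of the parameters from iteration s, θ^(s) = (µ₁^(s),..., µ_J^(s), σ²^(s)). In the E-step, we calculate the conditional mean of the complete data log likelihood, which involves calculating the posterior probability of individual i having the jth QTL genotype, as

$$P_{ij}^{(s)} = \frac{p_{j|i}\, f(y_i; \mu_j^{(s)}, \sigma^{2(s)})}{\sum_{k=1}^{J} p_{k|i}\, f(y_i; \mu_k^{(s)}, \sigma^{2(s)})},$$

where f(y; µ, σ²) is the normal density with mean µ and variance σ². In the M-step, we maximize the log likelihood by updating the estimates of µ_j and σ² as

$$\mu_j^{(s+1)} = \frac{\sum_{i=1}^{n} P_{ij}^{(s)} y_i}{\sum_{i=1}^{n} P_{ij}^{(s)}}, \qquad \sigma^{2(s+1)} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{J} P_{ij}^{(s)} \left(y_i - \mu_j^{(s+1)}\right)^2.$$

The EM algorithm is initiated by taking

$$\mu_j^{(0)} = \bar{y}, \qquad \sigma^{2(0)} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

and the E- and M-steps are repeated until the estimates converge, where ȳ is the empirical mean of the observations.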
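
The E- and M-steps can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the FsQtlMap code: the starting values (all means set to the empirical mean, variance set to the empirical variance), the convergence tolerance and the function names are choices made for the example, and every QTL genotype is assumed to have non-zero conditional probability for at least one individual.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def em_mixture(y, cond_prob, max_iter=500, tol=1e-8):
    """Estimate genotypic means and residual variance by EM.

    cond_prob[i][j] is the conditional probability of QTL genotype j for
    individual i at the map position being tested.
    """
    n, n_geno = len(y), len(cond_prob[0])
    y_bar = sum(y) / n
    mu = [y_bar] * n_geno  # assumed starting values
    sigma2 = sum((v - y_bar) ** 2 for v in y) / n
    for _ in range(max_iter):
        # E-step: posterior probability of each QTL genotype per individual.
        post = []
        for y_i, p_i in zip(y, cond_prob):
            w = [p * normal_pdf(y_i, m, sigma2) for p, m in zip(p_i, mu)]
            s = sum(w)
            post.append([v / s for v in w])
        # M-step: posterior-weighted means, then the pooled variance.
        new_mu = []
        for j in range(n_geno):
            weight = sum(row[j] for row in post)
            new_mu.append(sum(row[j] * y_i for row, y_i in zip(post, y)) / weight)
        new_sigma2 = 0.0
        for row, y_i in zip(post, y):
            for j in range(n_geno):
                new_sigma2 += row[j] * (y_i - new_mu[j]) ** 2
        new_sigma2 /= n
        change = max(abs(a - b) for a, b in zip(new_mu + [new_sigma2], mu + [sigma2]))
        mu, sigma2 = new_mu, new_sigma2
        if change < tol:
            break
    return mu, sigma2
```

Although all means start at the same value, the first M-step already separates them because the conditional probabilities differ among individuals.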

Hypothesis testing

The null hypothesis of no QTL segregating at the specific position of the chromosome is

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_J,$$

implying that the distribution of the quantitative phenotype does not depend on the genotype of the putative QTL. The corresponding likelihood function is

$$L(\theta_0) = \prod_{i=1}^{n} f(y_i; \mu_0, \sigma_0^2),$$

where µ₀ and σ₀² are the mean and variance of the overall population, respectively, and θ₀ = (µ₀, σ₀²) is the parameter vector.

Under the null model, the maximum likelihood estimates of the parameters can be directly obtained as

$$\hat{\mu}_0 = \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad \hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

The test statistic for the above hypothesis can be expressed as the log-likelihood ratio of the full model over the null model:

$$LR = 2 \ln \frac{L(\hat{\theta})}{L(\hat{\theta}_0)},$$

where θ̂ and θ̂₀ are the vectors of the maximum likelihood estimates under the full model and null model, respectively. If a high peak of the LR profile exceeds a critical threshold, a QTL that controls the trait is asserted to exist in the corresponding marker interval. Because LR may not be asymptotically distributed as a chi-square distribution, an empirical genome-wide threshold can be determined by permutation tests (Churchill and Doerge, 1994).
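
The test statistic and the permutation threshold can be sketched as follows, assuming LR is twice the difference between the maximized log-likelihoods of the full and null models. The callable `max_lr`, which should return the genome-wide maximum LR for a given phenotype vector, is a hypothetical placeholder for the full interval-mapping scan; the quantile indexing is one simple convention among several.

```python
import math
import random


def null_loglik(y):
    """Maximized log-likelihood of the single-normal null model."""
    n = len(y)
    mu0 = sum(y) / n
    s0 = sum((v - mu0) ** 2 for v in y) / n
    # At the MLE the exponent terms sum to n/2, giving this closed form.
    return -0.5 * n * (math.log(2.0 * math.pi * s0) + 1.0)


def likelihood_ratio(full_loglik, y):
    """LR = 2 * (log-likelihood of full model - log-likelihood of null model)."""
    return 2.0 * (full_loglik - null_loglik(y))


def permutation_threshold(y, max_lr, n_perm=1000, quantile=0.95, seed=0):
    """Empirical threshold from phenotypes shuffled against fixed genotypes."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # breaks any marker-trait association
        stats.append(max_lr(shuffled))
    stats.sort()
    return stats[min(int(quantile * n_perm), n_perm - 1)]
```

Shuffling the phenotypes while keeping the marker data fixed preserves the marker structure but removes any true QTL signal, so the sorted maxima approximate the null distribution of the genome-wide peak.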

Model selection

The purpose of model selection is to identify a model that balances goodness-of-fit to the data against model complexity. Fisher's maximum likelihood cannot be used as a criterion for model selection because, when a simpler model is a subset of a more complicated model, the maximum likelihood of the former can never exceed that of the latter. Akaike's information criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978) are commonly used for model selection. AIC and BIC are defined as

$$AIC = -2 \ln L(\hat{\theta}) + 2d, \qquad BIC = -2 \ln L(\hat{\theta}) + d \ln n,$$

where L(θ̂) is the maximum likelihood, d the number of parameters to be estimated in the model, and n the sample size. AIC is derived in terms of Kullback and Leibler (1951) information for the true model with respect to the fitted model while BIC is based on an integrated likelihood within a Bayesian framework.
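
A short sketch of the two criteria follows. The log-likelihood values in the example are hypothetical and serve only to show how the penalties differ; the parameter counts (J genotypic means plus one variance) follow the models described in this paper.

```python
import math


def aic(loglik, d):
    """Akaike's information criterion for a model with d parameters."""
    return -2.0 * loglik + 2.0 * d


def bic(loglik, d, n):
    """Bayesian information criterion for d parameters and n observations."""
    return -2.0 * loglik + d * math.log(n)


# Hypothetical maximized log-likelihoods for the three QTL models, n = 500.
n = 500
models = {
    "test cross": (-1402.0, 3),  # two means + variance
    "F2 cross": (-1399.5, 4),    # three means + variance
    "full cross": (-1399.0, 5),  # four means + variance
}
for name, (loglik, d) in models.items():
    print(name, round(aic(loglik, d), 2), round(bic(loglik, d, n), 2))
```

Because ln n exceeds 2 for any sample of more than seven individuals, BIC penalizes each additional parameter more heavily than AIC in samples of the size considered here.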

In addition to the above two criteria, the Laplace-empirical criterion (LEC; McLachlan and Peel, 2000) was expected to be a good choice for model selection. LEC not only accounts for the number of parameters and the sample size of a model but also incorporates prior information on the parameters and the information matrix of the log likelihood function. LEC is defined as

$$LEC = -2 \ln L(\hat{\theta}) - 2 \ln \pi(\hat{\theta}) + \ln \left| I_e(\hat{\theta}) \right| - d \ln(2\pi),$$

where π(θ̂) is the prior probability density of the estimated parameters and I_e(θ̂) is the observed information matrix, i.e., the negative Hessian matrix of the log likelihood, both evaluated at the maximum likelihood estimate vector θ̂. We assumed, as did Roberts et al. (1998), that each estimated parameter µ_j (j = 1,..., J) was uniformly distributed over an interval of fixed length, that σ² was uniformly distributed over a bounded interval and that all parameters are independent. Substituting these priors into the definition gives the LEC for our QTL mapping model, in which d = J + 1 and J is the number of QTL genotypes for a given QTL segregation pattern. The Appendix (in Supplementary Material) provides the details of each element of the matrix used to calculate the determinant of I_e(θ̂).
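
The general Laplace-approximation form of the criterion, LEC = -2 ln L(θ̂) - 2 ln π(θ̂) + ln|I_e(θ̂)| - d ln(2π), can be sketched as below. The specific prior intervals used in this paper are not reproduced here, so the log prior density is passed in as an argument; the function names are illustrative assumptions, and the information matrix is assumed to be symmetric positive definite.

```python
import math


def log_det(matrix):
    """Log-determinant of a symmetric positive definite matrix (Cholesky)."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    result = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(s)
                result += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return result


def lec(loglik, log_prior, info_matrix):
    """Laplace-empirical criterion from the maximized log-likelihood, the log
    prior density at the MLE and the observed information matrix."""
    d = len(info_matrix)  # number of estimated parameters
    return -2.0 * loglik - 2.0 * log_prior + log_det(info_matrix) - d * math.log(2.0 * math.pi)
```

Working with the log-determinant through a Cholesky factorization avoids overflow when the information matrix has large entries, which is common for samples of several hundred individuals.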

The approach described above allowed us to choose the model that was most likely to provide the minimum AIC, BIC or LEC among the three QTL segregation patterns for a specific position on a chromosome. The power of AIC, BIC and LEC was assessed through Monte Carlo simulations.

Monte Carlo simulations

To assess the usefulness of the QTL mapping method and model selection in different QTL segregation patterns in a full-sib family, we simulated a 100 cM-long chromosome with six markers evenly spaced along the chromosome. As indicated by Maliepaard et al. (1997), the segregation patterns of the six markers were aa × ab, ab × cd, aa × ab, ab × cd, ab × ab and aa × ab, and the linkage phases between adjacent markers were r, r, r, c × r and c, respectively. One QTL was simulated at position 50 cM and the QTL segregation patterns were assumed to be: (1) q₁q₁ × q₁q₂, (2) q₁q₂ × q₁q₂ or (3) q₁q₂ × q₃q₄, corresponding to the three different QTL segregation patterns.

In the simulation, the effects of the QTL genotypes were set to be µ₁ = 15 and µ₂ = 10 for the test cross segregation pattern, µ₁ = 20, µ₂ = 16 and µ₃ = 10 for the F₂ segregation pattern, and µ₁ = 20, µ₂ = 18, µ₃ = 14 and µ₄ = 10 for the full cross segregation pattern. The heritability of the QTL was set at values of h² = 0.10, 0.15, 0.20, 0.30 and 0.50. The variance of the environmental effect, σ², was therefore determined by the genotypic variance of the assumed QTL, σ_g², and its heritability through the relationship h² = σ_g² / (σ_g² + σ²). For example, in the test cross segregation pattern, if h² = 0.30, then σ² = 14.6 because in this case σ_g² = 6.25. For each case of the simulation, we sampled 500 individuals from a full-sib family with 1000 replicates. Model selection criteria such as LEC, AIC and BIC were used to select the best model among the three competing models in this study and the power of these criteria was calculated based on 1000 replicates. The statistical power for each model was obtained by counting the number of runs out of 1000 replicates in which the model selection was correct and the LR value was greater than an empirical threshold. The threshold of the LR for each model was estimated by an additional 1000 simulations with no QTL segregation. Generally, the 0.95 or 0.99 quantile of the 1000 LR values under the null model was used as the empirical threshold.
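
The residual variance implied by a given heritability can be computed from the genotypic means and the segregation ratio. The sketch below assumes that the genotypic variance is the variance of the genotypic means weighted by the expected genotype frequencies, which reproduces the value of 6.25 quoted above for the test cross; the function names are illustrative.

```python
def genotypic_variance(means, freqs):
    """Variance of genotypic means weighted by genotype frequencies."""
    overall = sum(f * m for f, m in zip(freqs, means))
    return sum(f * (m - overall) ** 2 for f, m in zip(freqs, means))


def residual_variance(means, freqs, h2):
    """Solve h2 = vg / (vg + sigma2) for the residual variance sigma2."""
    vg = genotypic_variance(means, freqs)
    return vg * (1.0 - h2) / h2


# Simulation settings taken from the text above.
settings = {
    "test cross": ([15.0, 10.0], [0.5, 0.5]),
    "F2 cross": ([20.0, 16.0, 10.0], [0.25, 0.5, 0.25]),
    "full cross": ([20.0, 18.0, 14.0, 10.0], [0.25, 0.25, 0.25, 0.25]),
}
for name, (means, freqs) in settings.items():
    print(name, genotypic_variance(means, freqs), round(residual_variance(means, freqs, 0.30), 2))
```

For the test cross with h² = 0.30 this gives 6.25 × 0.70 / 0.30 ≈ 14.58, which rounds to the 14.6 stated in the text.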

Software development

We developed Windows-based software, designated FsQtlMap, to implement the statistical methods for QTL mapping in a full-sib family. FsQtlMap is written in VC++ 6.0 and runs on Microsoft Windows operating systems, including Windows 2000, 2003, XP, Vista and 7. The software assumes that the segregation pattern of a QTL may be test cross (1:1 segregation), F₂ cross (1:2:1 segregation) or full cross (1:1:1:1 segregation) in a full-sib family and uses LEC as the model selection criterion to determine the QTL segregation ratio. A summary of QTL detection and a series of intermediate results are generated and saved in the corresponding files associated with QTL mapping. FsQtlMap uses the free software gnuplot to plot LOD (base-10 logarithm of the odds) profiles along the linkage groups; the plots are generated in enhanced metafile format (EMF) and postscript (PS) format. FsQtlMap also provides a function that runs permutation tests to yield the genome-wide LOD threshold for asserting that a given peak of the profile is a QTL for each of the three QTL models. Further details on the data format and operational procedures are provided in the FsQtlMap manual. The software and its manual can be freely downloaded from http://fgbio.njfu.edu.cn/tong/FsQtlMap/FsQtlMap.htm.
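
Since the software reports LOD profiles whereas the test statistic is expressed as LR, a conversion between the two scales is useful. The sketch below assumes that LR is twice the natural-log likelihood ratio, in which case LOD = LR / (2 ln 10); whether FsQtlMap computes LOD in exactly this way is an assumption.

```python
import math


def lr_to_lod(lr):
    """Convert LR (twice the natural-log likelihood ratio) to a LOD score."""
    return lr / (2.0 * math.log(10.0))


def lod_to_lr(lod):
    """Convert a LOD score back to the LR scale."""
    return lod * 2.0 * math.log(10.0)
```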

A real example

The applicability of our statistical method for mapping QTLs in a full-sib family was demonstrated for a forest tree, specifically an interspecific F₁ hybrid population between Populus deltoides and Populus euramericana in Xuchou, Jiangsu Province, China. Ninety-three genotypes randomly selected from the population were used to construct the genetic linkage map based on molecular markers detected by RAPD, AFLP, ISSR, SSR and SNP analysis (Zhang, 2005). The linkage map contained 19 linkage groups and 314 markers, of which 252 segregated in a 1:1 ratio, 7 in a 1:2:1 ratio and 55 in a 1:1:1:1 ratio. The linkage phases of the two parents between any two adjacent markers on the map were also predicted. Our analysis identified QTLs that affected the root number, an adventitious root trait, in all of the 19 linkage groups in the integrated map of P. deltoides and P. euramericana.

Results

Figure 1 compares the power of LEC, AIC and BIC for selecting the true model among the three candidate models. Figure 1a,b indicates that the power of LEC and BIC for selecting the test cross and F₂ cross QTL segregation models was higher than that of AIC for all heritabilities, whereas Figure 1c shows the opposite, i.e., the power of AIC for selecting the full cross model was higher than that of LEC and BIC. Although BIC showed a slight advantage over LEC for selecting the test cross and F₂ cross models, it had drastically lower power than LEC for selecting the full cross model, especially when the heritability of the QTL was < 0.20. Overall, LEC and BIC had similar power for finding the correct model, probably because both criteria are derived from a Bayesian framework for model selection (McLachlan and Peel, 2000). However, LEC carries more information about the true model than BIC in that it incorporates not only prior information on the parameters but also the negative Hessian matrix of the log likelihood. The results of these simulations suggest that LEC is the first choice for model selection in mapping QTLs in a full-sib family, a conclusion that agrees well with model selection theory.

Table 2 provides detailed results on the estimated QTL position, genotypic effects, heritability, and power of model selection using LEC for the three QTL segregation models. The power of model selection increased as the QTL heritability increased and was generally > 90%, except in the case of h² = 0.10 and 0.15 for the full cross model. The levels of QTL heritability had a strong effect on the precision of the estimates of the QTL position but had a small effect on the estimates of QTL genotypic effects and QTL heritability. A high QTL heritability can yield estimates of the QTL position that tend towards the true value with a small standard deviation. When the QTL heritability was small, as in the case of h² = 0.10, especially for the full cross model, the estimates of QTL position were biased with a standard deviation up to 10.19. The average estimates of QTL genotypic effects and heritability were almost equal to the true values, but the standard deviations decreased as the heritability increased. The precision of the estimates for QTL position, genotypic effects and heritability decreased as the number of parameters in the model increased. The test cross model yielded the most precise estimates of QTL position, genotypic effects and heritability because it had only three parameters (one for residual or environmental variance and two for QTL genotypic effects) while the full cross model yielded less precise estimates with five parameters. This difference can be explained by the fact that the high complexity of the model decreased the precision of the parameter estimates.

The linkage map of P. deltoides and P. euramericana was scanned with the three QTL segregation models using the interval mapping method. Figure 2 shows the profiles of the log likelihood ratios (LR) generated by each model to detect QTLs that control the adventitious root trait. The critical values determined at the 1% significance level by 1000 permutation tests (Doerge and Churchill, 1996) were 14.71, 21.78 and 27.54 for the test cross, F₂ cross and full cross models, respectively. For each position on a linkage map, the LEC was used to determine the most likely QTL segregation pattern. Six high peaks (A-F) that exceeded the thresholds were detected in the LR profiles (Figure 2). However, since peak E in Figure 2a and peak F in Figure 2b occurred at the same position in marker interval CG/CTT_440R~TC/CGT_120 on linkage group 3, there were only five true QTLs.

Table 3 summarizes the procedure for selecting the most likely QTL segregation pattern for the five positions in Figure 2. According to the LEC, peaks A, C and F were selected to be the significant QTL positions because each of them had the lowest value for LEC and a significant value of LR under the same QTL segregation pattern, whereas peaks B and E were not significant QTL positions since they did not have the lowest values for LEC. However, peak F was close to peak C and they had almost the same genotypic effects so that the former may be considered a ghost QTL (Martinez and Curnow, 1992; Doerge, 2002). Overall, therefore, two QTLs, i.e. peaks A and C, were concluded to be the significant QTLs responsible for root number.
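
The decision rule applied to each peak can be sketched as follows: the segregation pattern with the lowest LEC is selected, and the peak is declared a QTL only if the LR under that same pattern exceeds its permutation threshold. The thresholds are the 1% values reported above; the LEC and LR values in the example are hypothetical and do not reproduce Table 3.

```python
# 1% genome-wide LR thresholds reported for the three models.
THRESHOLDS = {"test cross": 14.71, "F2 cross": 21.78, "full cross": 27.54}


def select_pattern(lec_values, lr_values, thresholds=THRESHOLDS):
    """Return the pattern with the lowest LEC and whether its LR is significant."""
    best = min(lec_values, key=lec_values.get)
    return best, lr_values[best] > thresholds[best]


# Hypothetical criterion values at one map position (illustration only).
lec_values = {"test cross": 310.2, "F2 cross": 305.1, "full cross": 312.8}
lr_values = {"test cross": 16.0, "F2 cross": 25.3, "full cross": 26.0}
print(select_pattern(lec_values, lr_values))
```

Under this rule a peak that is significant for one pattern can still be rejected when another pattern has the lowest LEC, which is how peaks B and E were excluded.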

Discussion

Through the efforts of many statistical geneticists over the past two decades, genetic linkage maps can now be constructed using molecular marker data with different segregation types from full-sib families in species such as forest trees, in which inbred lines are almost impossible to obtain through traditional self-mating for many generations (Maliepaard et al., 1997; Wu et al., 2002; Lu et al., 2004; Tong et al., 2010). Two software packages, JoinMap (Van Ooijen, 2006) and FsLinkageMap (Tong et al., 2010), are available for constructing an integrated genetic linkage map with predicted linkage phase between any two adjacent markers. Based on such genetic linkage maps in outbred species, we have now proposed a method for selecting the appropriate model for detecting QTLs by considering three QTL segregation patterns, i.e., test cross (1:1 segregation), F₂ cross (1:2:1 segregation) and full cross (1:1:1:1 segregation). Our method has some advantages in the genetic mapping of complex traits by accounting for the biological characteristics of forest trees.

First, our QTL mapping method with model selection procedures allows one to choose the most likely QTL segregation pattern of the three assumed patterns. Like molecular markers, QTL segregation may show different patterns throughout the genome in an outcrossing species. Hence, it is reasonable to incorporate different QTL segregation modes into a statistical model for QTL mapping in a full-sib family. However, MapQTL (Van Ooijen, 2009), the only available software that can be used to detect QTLs with data from a full-sib family, assumes that the QTL segregation is fixed as ab × cd. This is the case of the full cross pattern in our statistical model. The shortcoming of MapQTL can be illustrated by the real example described above in which QTLs were detected segregating in test cross and F₂ cross patterns. This means that no QTLs would be found if QTL mapping in this example were done with MapQTL.

Second, our QTL mapping method could be done by using genetic linkage maps of outbred species that had been constructed in the past 20 years. For example, in forest trees, many parent-specific linkage maps (Plomion et al., 1995; Wu et al., 2000; Yin et al., 2002; Shepherd et al., 2003; Gan et al., 2003) have been constructed and QTL mapping studies have also been done with the pseudo-test cross strategy first proposed by Grattapaglia and Sederoff (1994). This method has some limitations in QTL mapping in that the linkage phase between adjacent two markers and possible multiple QTL segregation patterns are not considered. The application of our QTL mapping method to these previous data would be expected to yield better results.

Third, the use of LEC as the criterion for identifying QTL segregation patterns is supported not only by the simulation results but also by the quantity itself. Model selection is an important but very difficult problem that has not been completely resolved for mixed models (McLachlan and Peel, 2000). Although AIC and BIC have been extensively applied in many situations, neither consistently selected the correct QTL segregation ratio in our QTL mapping models (Figure 1). Unlike AIC and BIC, LEC contains more information about the model itself. LEC not only contains the number of estimated parameters and the sample size but also the prior probabilities of the estimated parameters and the negative Hessian matrix of the log likelihood. These characteristics indicate that LEC generally has a higher power than AIC and BIC in model selection.

Finally and most importantly, we have developed Windows-based software (FsQtlMap) that allows the immediate implementation of our QTL mapping strategy. Computer packages for QTL mapping, such as MapMaker/QTL (Lincoln and Lander, 1990) and Windows QTL Cartographer (Wang et al., 2010), are well-established and have been extensively used for inbred lines. In contrast, there are no popular statistical tools for QTL mapping in outbred species such as forest trees. Although MapQTL (Van Ooijen, 2009) has been used for QTL mapping in forest trees by some researchers, its application is limited by the assumption that there is only one QTL segregation pattern in a full-sib family. By incorporating the characteristics of outcrossing species, FsQtlMap provides a much more powerful computing tool for QTL mapping.

Our new QTL mapping method was applied to real data and successfully detected two QTLs that affect adventitious roots in Populus. One QTL segregated in an F₂ cross and had much higher heritability. This finding indicates that the rooting capacity of poplars may be controlled by a major gene that can explain ~70% of the phenotypic variance. This conclusion is consistent with that of Han et al. (1994).

Acknowledgments

We thank Rongling Wu and two anonymous reviewers for their constructive suggestions and comments on this manuscript. This work was partially supported through a project funded by the Priority Academic Program Development (PAPD) of the Jiangsu Higher Education Institutions and the National Natural Science Foundation of China (grant no. 30872051).

Received: November 14, 2011

Accepted: April 21, 2012

Associate Editor: Everaldo Gonçalves de Barros

License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary Material

The following online material is available for this article:

- Appendix: Elements of the information matrix of the log-likelihood.

This material is available as part of the online article from http://www.scielo.br/gmb.

Appendix: Elements of the information matrix of the log-likelihood

The elements of the information matrix I_e(θ), denoted by I_e(θ)_hh, are the negative second partial derivatives of the log likelihood function (Eq. (5)) with respect to μ₁, μ₂,..., μ_k, and σ², which can be directly obtained as follows:

References

Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716-723.
Andersson L, Haley CS, Ellegren H, Knott SA, Johansson M, Andersson K, Andersson-Eklund L, Edfors-Lilja I, Fredholm M, Hansson I, et al. (1994) Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science 263:1771-1774.
Churchill GA and Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.
Dempster AP, Laird NM and Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39:1-38.
Doerge RW (2002) Mapping and analysis of quantitative trait loci in experimental populations. Nat Rev Genet 3:43-52.
Doerge RW and Churchill GA (1996) Permutation tests for multiple loci affecting a quantitative character. Genetics 142:285-294.
Gan S, Shi J, Li M, Wu K, Wu J and Bai J (2003) Moderate-density molecular maps of Eucalyptus urophylla S. T. Blake and E. tereticornis Smith genomes based on RAPD markers. Genetica 118:59-67.
Grattapaglia D and Sederoff R (1994) Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: Mapping strategy and RAPD markers. Genetics 137:1121-1137.
Haley CS, Knott SA and Elsen JM (1994) Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207.
Han K, Bradshaw HD, Gordon MP and Han KH (1994) Adventitious root and shoot regeneration in vitro is under major gene control in an F2 family of hybrid poplar (Populus trichocarpa × P. deltoides). Forest Genet 1:139-146.
Jansen RC and Stam P (1994) High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455.
Kao C-H, Zeng Z-B and Teasdale RD (1999) Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216.
Kullback S and Leibler RA (1951) On information and sufficiency. Ann Math Statist 22:79-86.
Lander ES and Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199.
Lin M, Lou XY, Chang M and Wu RL (2003) A general statistical framework for mapping quantitative trait loci in nonmodel systems: Issue for characterizing linkage phases. Genetics 165:901-913.
Lincoln SE and Lander ES (1990) Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL. Technical Report. Whitehead Institute for Biomedical Research, Cambridge, 46 pp.
Lu Q, Cui YH and Wu RL (2004) A multilocus likelihood approach to joint modeling of linkage, parental diplotype and gene order in a full-sib family. BMC Genetics 5:e20.
Maliepaard C, Jansen J and Van Ooijen JW (1997) Linkage analysis in a full-sib family of an outbreeding plant species: Overview and consequences for applications. Genet Res 70:237-250.
Martinez O and Curnow RN (1992) Estimating the locations and the size of the effects of quantitative trait loci using flanking markers. Theor Appl Genet 85:480-488.
McLachlan G and Peel D (2000) Finite Mixture Models. John Wiley & Sons, New York, 419 pp.
Plomion C, O'Malley DM and Durel CE (1995) Genomic analysis in maritime pine (Pinus pinaster). Comparison of two RAPD maps using selfed and open-pollinated seeds of the same individual. Theor Appl Genet 90:1028-1034.
Roberts SJ, Husmeier D, Rezek I and Penny W (1998) Bayesian approaches to Gaussian mixture modeling. IEEE Trans Pattern Anal Mach Intell 20:1133-1142.
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461-464.
Shepherd M, Cross M, Dieters MJ and Herry R (2003) Genetic maps for Pinus elliottii var. elliottii and P. caribaea var. hondurensis using AFLP and microsatellite markers. Theor Appl Genet 106:1409-1419.
Tong CF, Zhang B and Shi JS (2010) A hidden Markov model approach to multilocus linkage analysis in a full-sib family. Tree Genet Genomes 6:651-662.
Van Ooijen JW (2006) JoinMap 4, Software for the calculation of genetic linkage maps in experimental populations. Kyazma BV, Wageningen, Netherlands.
Van Ooijen JW (2009) MapQTL 6, Software for the mapping of quantitative trait loci in experimental populations of diploid species. Kyazma BV, Wageningen, Netherlands.
Wang S, Basten CJ and Zeng Z-B (2010) Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC.
Wu RL, Han YF, Hu JJ, Fang JJ, Li L, Li LM and Zeng Z-B (2000) An integrated genetic map of Populus deltoides based on amplified fragment length polymorphisms. Theor Appl Genet 100:1249-1256.
Wu RL, Ma CX, Painter I and Zeng Z-B (2002) Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theor Popul Biol 61:349-363.
Wu RL, Ma CX and Casella G (2007) Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL. Springer, New York, 365 pp.
Xu S and Atchley WR (1996) Mapping quantitative trait loci for complex binary diseases using line crosses. Genetics 143:1417-1424.
Xu C, Li Z and Xu S (2005) Joint mapping of quantitative trait loci for multiple binary characters. Genetics 169:1045-1059.
Yi N and Xu S (2000) Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155:1391-1403.
Yin TM, Zhang XY, Huang MR, Wang MX, Zhuge Q, Tu SM, Zhu LH and Wu RL (2002) Molecular linkage maps of the Populus genome. Genome 45:541-555.
Zeng Z-B (1993) Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc Natl Acad Sci USA 90:10972-10976.
Zeng Z-B (1994) Precision mapping of quantitative trait loci. Genetics 136:1457-1468.
Zeng Z-B, Kao C-H and Basten CJ (1999) Estimating the genetic architecture of quantitative traits. Genet Res 74:279-289.
gnuplot, http://www.gnuplot.info
FsQtlMap manual, http://fgbio.njfu.edu.cn/tong/FsQtlMap/FsQtlMap.htm
Zhang B (2005) Constructing genetic linkage maps and mapping QTLs affecting important traits in poplar. PhD Dissertation, Nanjing Forestry University, Nanjing, China. http://fgbio.njfu.edu.cn/tong/zhang2005.pdf


Publication Dates

Publication in this collection: 13 July 2012
Date of issue: 2012


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
