Print version ISSN 1415-4757
Genet. Mol. Biol. vol.31 no.1 São Paulo 2008
HUMAN AND MEDICAL GENETICS
Alan E. Stark
Balgowlah NSW, Australia
This paper gives a model of a structured population with respect to an autosomal locus with two alleles. The population reproduces in discrete and non-overlapping generations. The population is assumed to be in equilibrium in that exactly the same distribution of genotypic proportions is reproduced in each generation. The population is subdivided into 'localities' which are characterized by the local gene frequencies. Within each locality the genotypic proportions may depart from Hardy-Weinberg proportions and the same fixation index applies to all localities. The system departs from reality by assuming that the frequency of the first allele follows the beta distribution. However, this enables a convenient way to derive the mating frequencies of parents so that equilibrium is maintained. Wright's F-statistics are applied to characterize the population as a whole. The system is extended to permit an arbitrary level of outbreeding.
Key words: subdivided populations, F-statistics, mating frequencies.
This paper gives a mating system for a structured population with respect to an autosomal locus with two alleles. The population reproduces in discrete and non-overlapping generations and is assumed to be in equilibrium, meaning that the population is exactly reproduced in each generation. The simplest model of an unstructured population is that of Weinberg (1908) and Hardy (1908). Stark (2006) and Li (1988) showed that certain forms of non-random mating produce and sustain Hardy-Weinberg proportions. While there were earlier attempts to add realism by going beyond the original model of Hardy and Weinberg, it appears that Sewall Wright did most to develop models of structured populations. Some of his ideas are summarized in Wright (1965), which gives references to his earlier work. In that paper Wright wrote: "A system was developed for describing the properties of hierarchically subdivided natural populations. Three parameters were proposed in the 1951 paper in terms of a total population (T), subdivisions (S), and individuals (I). F_IT is the correlation between gametes that unite to produce the individuals, relative to the gametes of the total population. F_IS is the average over all subdivisions of the correlation between uniting gametes relative to those of their own subdivision. F_ST is the correlation between random gametes within subdivisions, relative to gametes of the total population."
The principal features described in the quotation above are included here: the total population is divided into sub-populations, or localities, which may be connected in the sense that the two members of a mating pair may come either from different localities or from the same locality. Each locality is characterized by its gene frequencies. Nei (1987, p. 159) and Weir (1996, p. 166) each have a section on fixation indices and so provide some background to the general theme of this paper.
The object here is to develop a compact model so that relations between various parameters can be explored. The most novel feature is that it is possible to give explicit expressions for mating frequencies, an aspect of the whole subject of subdivided populations to which Wright and others have given little attention.
The following sections give the notation used, mating frequencies at local and overall levels, distribution of genotypes from mating, a numerical example and some discussion.
The two alleles of the autosomal locus are denoted by A and B, the frequency of A within a locality by x, and its frequency over the whole population by q. The corresponding frequencies of B are 1 - x and 1 - q, respectively. The frequency of A varies over the whole population according to the beta distribution, so that the probability of finding a locality with gene frequency in a small interval of width dx containing the value x is given by the expression
$$\frac{x^{a-1}(1-x)^{b-1}}{B(a, b)}\,dx, \qquad (1)$$
where B(a, b) is the beta function with parameters a, b given by
$$B(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx,$$
the integral being taken over the interval (0, 1).
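The properties of (1) are given in Kendall and Stuart (1977, p. 35). In particular the first four moments about the origin of (1) are:
$$\mu'_1 = \frac{a}{a+b}, \qquad \mu'_2 = \frac{a(a+1)}{(a+b)(a+b+1)},$$
$$\mu'_3 = \frac{a(a+1)(a+2)}{(a+b)(a+b+1)(a+b+2)}, \qquad \mu'_4 = \frac{a(a+1)(a+2)(a+3)}{(a+b)(a+b+1)(a+b+2)(a+b+3)}.$$
From these the mean frequency of A over the whole population is given by
$$q = \frac{a}{a+b}$$
and the standard deviation of the distribution of frequency by
$$\sigma = \sqrt{\frac{ab}{(a+b)^2(a+b+1)}}.$$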
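The moments of the beta distribution can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original paper; the function name `beta_raw_moment` and the values a = 3, b = 5 are chosen here purely for illustration. It uses the standard product formula for the n-th moment about the origin of a beta(a, b) variable.

```python
from fractions import Fraction
from math import sqrt

def beta_raw_moment(n, a, b):
    """n-th moment about the origin of a beta(a, b) variable.

    Standard result: the product of (a + k)/(a + b + k) for k = 0, ..., n-1.
    """
    moment = Fraction(1)
    for k in range(n):
        moment *= Fraction(a + k) / (a + b + k)
    return moment

# Illustrative parameter values (an assumption of this sketch).
a, b = 3, 5
moments = [beta_raw_moment(n, a, b) for n in range(1, 5)]

q = moments[0]                   # mean frequency of allele A
variance = moments[1] - q ** 2   # second moment minus squared mean
sd = sqrt(variance)              # standard deviation (floating point)

print("first four moments:", moments)
print("mean:", q, "variance:", variance, "sd:", sd)
```

Working with `Fraction` keeps every quantity exact, which makes it easy to compare the results with the closed-form expressions in terms of a and b.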
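Stark (2007) gives a general mating system which can maintain a given departure from the Hardy-Weinberg form. Here a particular case of this is used (see Stark, 1976a, 1976b). If the frequencies of genotypes AA, AB and BB are respectively f_0 = q^2 + Fpq, f_1 = 2pq - 2Fpq, f_2 = p^2 + Fpq (p = 1 - q), then the mating frequencies, denoted by f_ij, are given by
$$f_{ij} = f_i f_j \left(1 + \frac{r\,d_i d_j}{V}\right), \qquad i, j = 0, 1, 2, \qquad (2)$$
where r = 2F/(1 + F), d_0 = -2p, d_1 = q - p, d_2 = 2q and V = 2pq(1 + F). The d_i are the deviations of the number of B alleles carried by each genotype from the population mean 2p, and V is the variance of that number.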
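The behaviour of the mating system can be checked numerically. The sketch below is not from the original paper. It assumes that formula (2) has the form f_ij = f_i f_j (1 + r d_i d_j / V), and that d_0 = -2p, so that the d_i are deviations of the B-allele count from its mean 2p. The function names and the values q = 3/8, F = 2/9 are illustrative only. Under these assumptions the matrix margins equal the parental genotype frequencies, and Mendelian segregation returns the same genotype frequencies among offspring.

```python
from fractions import Fraction

def genotype_freqs(q, F):
    """Frequencies of AA, AB, BB for gene frequency q and fixation index F."""
    p = 1 - q
    return [q * q + F * p * q, 2 * p * q * (1 - F), p * p + F * p * q]

def mating_matrix(q, F):
    """3x3 matrix of mating frequencies (assumed form of formula (2))."""
    p = 1 - q
    f = genotype_freqs(q, F)
    r = 2 * F / (1 + F)
    d = [-2 * p, q - p, 2 * q]   # assumption: d_0 carries a minus sign
    V = 2 * p * q * (1 + F)
    return [[f[i] * f[j] * (1 + r * d[i] * d[j] / V) for j in range(3)]
            for i in range(3)]

def offspring_freqs(m):
    """Genotype frequencies among offspring under Mendelian segregation."""
    aa = m[0][0] + (m[0][1] + m[1][0]) / 2 + m[1][1] / 4
    bb = m[2][2] + (m[1][2] + m[2][1]) / 2 + m[1][1] / 4
    ab = (m[0][2] + m[2][0] + (m[0][1] + m[1][0]) / 2
          + (m[1][2] + m[2][1]) / 2 + m[1][1] / 2)
    return [aa, ab, bb]

# Illustrative values (an assumption of this sketch).
q, F = Fraction(3, 8), Fraction(2, 9)
f = genotype_freqs(q, F)
m = mating_matrix(q, F)

print("parental genotypes: ", f)
print("row sums of matrix: ", [sum(row) for row in m])
print("offspring genotypes:", offspring_freqs(m))
```

Because the arithmetic is exact, equality between parental and offspring frequencies can be tested directly rather than within a tolerance.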
The mating frequencies
It is assumed that the genotypic frequencies within a locality characterized by gene frequency x are:
$$AA:\; x^2 + wx(1-x), \qquad AB:\; 2x(1-x)(1-w), \qquad BB:\; (1-x)^2 + wx(1-x). \qquad (3)$$
Thus w is the within-locality fixation index and corresponds to Wright's F_IS. The symbol w is used here for convenience and, to avoid complications, is assumed to be non-negative. When a pair of mates is chosen from the same locality it is assumed that the frequencies of mating pairs of the various types follow formula (2), with appropriate substitutions, such as w for F, x for q, x^2 + wx(1 - x) for f_0, etc.
On the assumption that the distribution of types within localities follows (3) and that the distribution of gene frequencies follows (1), matings within localities being as given in (2), the overall distribution of types in the whole population can be calculated, and in particular the overall population frequency of A, denoted by q. This will be demonstrated in the next section.
The complete model permits mating between individuals from different localities. In order to maintain the overall population structure the frequencies of pair types will again follow formula (2) but substituting whole population parameters. This will be shown in a later section. A final refinement allows for an arbitrary choice of the proportion of inter-locality matings denoted by c.
Integrated mating frequencies consistent with a uniform within-locality fixation index
The probability of observing a locality with gene frequency close to x was given as (x^(a-1)(1-x)^(b-1)/B(a, b))dx in formula (1). To maintain genotype proportions with fixation index w, as given in (3), apply mating system (2) to parental frequencies (3). The resulting mating frequencies can be integrated using (1) as the weighting factor to obtain the overall mating frequencies in terms of the first four moments of the beta distribution, as follows:
In the above, where male and female are of different type, the reciprocal mating also shares the same frequency. Thus AA x BB and BB x AA have identical frequency. Note that the frequency of AB x AB matings is four times that of AA x BB matings. The resulting matrix of mating frequencies is denoted by L. A numerical example employing this formula in combination with mating frequencies of outbreeding pairs is given in Table 1. The method of calculating tables such as Table 1 is completed in the next section.
Overall genotypic distribution
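The application of the mating frequencies given in the preceding section produces the following overall genotypic distribution among offspring:
$$AA:\; \frac{a(a+1+wb)}{(a+b)(a+b+1)}, \qquad (4)$$
$$AB:\; \frac{2(1-w)ab}{(a+b)(a+b+1)}, \qquad (5)$$
$$BB:\; \frac{b(b+1+wa)}{(a+b)(a+b+1)}. \qquad (6)$$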
The frequency of gene A borne by this distribution is q = a/(a + b), that of B is b/(a + b) and the fixation index is F_IT = (1 + w(a + b))/(1 + a + b). Applying the relation (1 - F_IT) = (1 - w)(1 - F_ST) produces F_ST = 1/(1 + a + b). Thus F_ST is seen to be equal to the variance of the (beta) distribution of gene frequencies over localities, ab/((a + b)^2(a + b + 1)), divided by the product of the overall frequencies of genes A and B, ab/(a + b)^2. An alternative way of calculating F_ST is given in the final section.
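The fixation indices can be recovered from the overall genotypic distribution alone. The following sketch is not part of the original paper. It averages the within-locality frequencies x^2 + wx(1 - x), 2x(1 - x)(1 - w) and (1 - x)^2 + wx(1 - x) over the beta distribution using its first two moments, then obtains F_IT from the deficit of heterozygotes and F_ST from the relation (1 - F_IT) = (1 - w)(1 - F_ST). The function names and the values a = 3, b = 5, w = 1/8 are illustrative, and w < 1 is assumed.

```python
from fractions import Fraction

def overall_genotypes(a, b, w):
    """Average the within-locality genotype frequencies over beta(a, b)."""
    a, b, w = Fraction(a), Fraction(b), Fraction(w)
    m1 = a / (a + b)                             # E[x]
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))   # E[x^2]
    aa = m2 + w * (m1 - m2)                      # E[x^2 + w x(1-x)]
    ab = 2 * (1 - w) * (m1 - m2)                 # E[2 x(1-x)(1-w)]
    bb = (1 - 2 * m1 + m2) + w * (m1 - m2)       # E[(1-x)^2 + w x(1-x)]
    return aa, ab, bb

def f_statistics(a, b, w):
    """Return (q, F_IT, F_ST) computed from the overall genotype frequencies."""
    aa, ab, bb = overall_genotypes(a, b, w)
    q = aa + ab / 2                              # overall frequency of A
    f_it = 1 - ab / (2 * q * (1 - q))            # heterozygote deficit
    f_st = 1 - (1 - f_it) / (1 - Fraction(w))    # from (1-F_IT) = (1-w)(1-F_ST)
    return q, f_it, f_st

# Illustrative parameter values (an assumption of this sketch).
a, b, w = 3, 5, Fraction(1, 8)
genotypes = overall_genotypes(a, b, w)
q, f_it, f_st = f_statistics(a, b, w)
print("genotypes (AA, AB, BB):", genotypes)
print("q:", q, "F_IT:", f_it, "F_ST:", f_st)
```

The computed values can be compared with the closed forms (1 + w(a + b))/(1 + a + b) and 1/(1 + a + b) for any admissible choice of a, b and w.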
The distribution of frequencies given by (4)-(6) can be maintained by outbreeding, that is, by forming couples in which the two partners come from different localities. The frequencies of the various pair types are taken from formula (2) using parental frequencies from (4)-(6), gene frequency q = a/(a + b) and fixation index F_IT = (1 + w(a + b))/(1 + a + b). Denote the matrix of outbreeding mating frequencies by M.
A further refinement is possible by permitting an arbitrary proportion of outbreeding denoted by c. Combining the two forms of mating yields a mating matrix given by C = (1 - c)L + cM. Since both L and M produce offspring following frequencies (4)-(6), their combination C does the same.
Table 1 gives an example of the system with parameters a = 3, b = 5, w = 1/8 and c = 1/2; it shows the matrix of overall mating frequencies C. The overall genotypic proportions are: type AA 37/192; type AB 70/192; type BB 85/192. The overall frequencies of genes A and B are respectively 3/8 and 5/8. Other properties of the population are: the variance of the distribution of the A gene frequency is 5/192, F_IT = 2/9 and F_ST = 1/9.
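The matrices L, M and C for these parameter values can be computed exactly. The sketch below is not the author's computation and makes several assumptions: formula (2) is taken as f_ij = f_i f_j (1 + r d_i d_j / V) with d_0 = -2p, polynomials in x are stored as coefficient lists, and all helper names are hypothetical. Within a locality, f_i d_i = x(1 - x) h_i for linear polynomials h_i, so each mating frequency is the polynomial f_i f_j + k x(1 - x) h_i h_j with k = w/(1 + w)^2, which can be averaged term by term using beta moments.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def beta_mean(poly, a, b):
    """E[poly(x)] for x ~ beta(a, b), built up from successive raw moments."""
    total, moment = Fraction(0), Fraction(1)
    for n, coef in enumerate(poly):
        total += coef * moment
        moment *= Fraction(a + n) / (a + b + n)
    return total

def within_locality_matrix(a, b, w):
    """Matrix L: formula (2) inside each locality, averaged over localities."""
    w = Fraction(w)
    x, y = [Fraction(0), Fraction(1)], [Fraction(1), Fraction(-1)]  # x, 1 - x
    f = [poly_mul(x, [w, 1 - w]),                        # x^2 + w x(1-x)
         [Fraction(0), 2 * (1 - w), -2 * (1 - w)],       # 2 x(1-x)(1-w)
         poly_mul(y, [Fraction(1), -(1 - w)])]           # (1-x)^2 + w x(1-x)
    h = [[-2 * w, -2 * (1 - w)],                         # f_i d_i / (x(1-x))
         [-2 * (1 - w), 4 * (1 - w)],
         [Fraction(2), -2 * (1 - w)]]
    k, xy = w / (1 + w) ** 2, poly_mul(x, y)
    return [[beta_mean(poly_mul(f[i], f[j]), a, b)
             + k * beta_mean(poly_mul(xy, poly_mul(h[i], h[j])), a, b)
             for j in range(3)] for i in range(3)]

def outbreeding_matrix(a, b, w):
    """Matrix M: formula (2) with whole-population q and F_IT."""
    q = Fraction(a) / (a + b)
    F = (1 + Fraction(w) * (a + b)) / (1 + a + b)
    p = 1 - q
    f = [q * q + F * p * q, 2 * p * q * (1 - F), p * p + F * p * q]
    d = [-2 * p, q - p, 2 * q]        # assumption: d_0 carries a minus sign
    r, V = 2 * F / (1 + F), 2 * p * q * (1 + F)
    return [[f[i] * f[j] * (1 + r * d[i] * d[j] / V) for j in range(3)]
            for i in range(3)]

a, b, w, c = 3, 5, Fraction(1, 8), Fraction(1, 2)
L = within_locality_matrix(a, b, w)
M = outbreeding_matrix(a, b, w)
C = [[(1 - c) * L[i][j] + c * M[i][j] for j in range(3)] for i in range(3)]

for name, row in zip(("AA", "AB", "BB"), C):
    print(name, [str(v) for v in row], "row sum:", sum(row))
```

Since L and M share the same margins, any value of c between 0 and 1 gives a matrix C with those margins, which is the sense in which a continuum of mating tables sustains one population structure.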
The quantity F_ST can be derived directly in a way which illustrates the description by Wright quoted above: "F_ST is the correlation between random gametes within subdivisions, relative to gametes of the total population". Assign the gametic values 0 and 1 respectively to alleles A and B, as in Table 2. Calculate the probabilities of drawing a pair of genes independently within a locality characterized by allele frequency x, as in Table 2. Calculate the overall frequencies of the respective gene pairs by integration over distribution (1), using the moments of the beta distribution given above, to yield the integrated values given in Table 3. Calculate the uncorrected sum of products of genic values from Table 3 to obtain b(b+1)/((a+b)(a+b+1)). Correct this by deducting the product of overall mean values, namely b^2/(a+b)^2, to derive the corrected sum of products of genic values (the covariance of the genes in a pair), that is ab/((a+b)^2(a+b+1)). Divide the covariance by the product of the standard deviations of the values of the genes in a pair over the whole population. Since the distributions of genic values are identical, this product is equal to the variance of either, that is ab/(a+b)^2. The result is the correlation between the values of the gene pair, namely F_ST = 1/(1 + a + b), as noted earlier. Thus F_ST is the correlation between the values of a pair of genes drawn from the population that is due just to their sharing membership of the same locality.
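The steps of this derivation can be followed numerically. The sketch below is illustrative rather than taken from the paper; the function name and the values a = 3, b = 5 are assumptions. It codes allele A as 0 and B as 1, forms the integrated probabilities of the four ordered gene pairs from the first two beta moments, and computes the correlation between the two genic values.

```python
from fractions import Fraction

def fst_by_correlation(a, b):
    """Correlation between the genic values of two genes from one locality."""
    a, b = Fraction(a), Fraction(b)
    m1 = a / (a + b)                             # E[x]
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))   # E[x^2]
    # Integrated probabilities of ordered pairs; genic value A = 0, B = 1.
    pairs = {
        (0, 0): m2,                  # E[x^2]
        (0, 1): m1 - m2,             # E[x(1-x)]
        (1, 0): m1 - m2,             # E[(1-x)x]
        (1, 1): 1 - 2 * m1 + m2,     # E[(1-x)^2]
    }
    mean = sum(pr * u for (u, v), pr in pairs.items())
    sum_products = sum(pr * u * v for (u, v), pr in pairs.items())
    covariance = sum_products - mean ** 2
    variance = sum(pr * u * u for (u, v), pr in pairs.items()) - mean ** 2
    return covariance / variance

# Illustrative parameter values (an assumption of this sketch).
a, b = 3, 5
fst = fst_by_correlation(a, b)
print("F_ST by correlation:", fst)
print("closed form 1/(1 + a + b):", Fraction(1, 1 + a + b))
```

Both genes in a pair have the same marginal distribution, so dividing by the variance of one is the same as dividing by the product of the two standard deviations.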
A virtue of the model and analysis given here is that they show explicitly how the various quantities such as F_ST are expressed in terms of the basic parameters a, b, w and c. They show, for example, that F_ST depends only on the parameters of the distribution of gene frequency. They show also that a continuum of overall mating tables defined by c can sustain the same population structure. Weir (1996, p. 166) discusses the use of an estimate of F_ST to examine "population differentiation" in a setting in which estimates of gene frequencies are available from samples of sub-populations. The analysis given here is a pointer as to what an estimate of F_ST should reflect. Weir (1996, p. 167) sounds a note of caution in interpreting a typical estimate of F_ST by a formula which he gives, because it is difficult to assess the significance of sub-population divergence in this way. He recommends a test employing a contingency-table chi-squared statistic rather than F_ST. Nei (1987, p. 163) mentions some approaches to empirical studies of population differentiation. In some populations there are no clear indications of sub-population boundaries. The formula given here for F_IT shows that this quantity is determined by w (that is, F_IS) as well as by a and b. Therefore, when a population is sampled as a single entity, the calculated fixation index reflects both sub-population differentiation and departure from Hardy-Weinberg proportions at a local level.
Hardy GH (1908) Mendelian proportions in a mixed population. Science 28:49-50.
Kendall MG and Stuart AS (1977) The Advanced Theory of Statistics, v 1: Distribution Theory. 4th ed. Charles Griffin & Company Limited, London, 472 pp.
Li CC (1988) Pseudo-random mating populations. In celebration of the 80th anniversary of the Hardy-Weinberg law. Genetics 119:731-737.
Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York, 512 pp.
Stark AE (1976a) Generalisation of the Hardy-Weinberg law. Nature 259:44-44.
Stark AE (1976b) Hardy-Weinberg law: Asymptotic approach to a generalized form. Science 193:1141-1142.
Stark AE (2006) A clarification of the Hardy-Weinberg law. Genetics 174:1695-1697.
Stark AE (2007) On extending the Hardy-Weinberg law. Genet Mol Biol 29:664-666.
Weinberg W (1908) Über den Nachweis der Vererbung beim Menschen. Jahresh Verein f vaterl Naturk Württem 64:368-382. English version: On the demonstration of heredity in Man. In: Boyer SH (ed) Papers on Human Genetics. Prentice-Hall, Englewood Cliffs, pp 4-15.
Weir BS (1996) Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sinauer Associates, Inc. Publishers, Sunderland, 445 pp.
Wright S (1965) The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:395-420.
Send correspondence to:
Alan E. Stark
3/20 Seaview Street, 2093
Balgowlah NSW, Australia
Received: October 26, 2007; Accepted: January 22, 2008.
Associate Editor: Paulo A. Otto