On S.N. Bernstein’s derivation of Mendel’s Law and ‘rediscovery’ of the Hardy-Weinberg distribution

Around 1923 the soon-to-be famous Soviet mathematician and probabilist Sergei N. Bernstein started to construct an axiomatic foundation of a theory of heredity. He began from the premise of stationarity (constancy of type proportions) from the first generation of offspring. This led him to derive the Mendelian coefficients of heredity. It appears that he had no direct influence on the subsequent development of population genetics. A basic assumption of Bernstein was that parents coupled randomly to produce offspring. This paper shows that a simple model of non-random mating, which nevertheless embodies a feature of the Hardy-Weinberg Law, can produce Mendelian coefficients of heredity while maintaining the population distribution. How W. Johannsen’s monograph influenced Bernstein is discussed.


Introduction
The model used here is of a population consisting of three types of female and male individuals denoted by $T_0$, $T_1$ and $T_2$. The population reproduces in discrete and non-overlapping generations and is assumed to be stationary in senses which are to be described.
This paper concerns mainly the first result of Bernstein, which was a proof that Mendel's coefficients of heredity (to be defined in Section 2) are a necessary outcome if one assumes stationarity from the first generation of offspring, random mating, and that the union of two of the types, $T_0$ and $T_2$, always produces offspring of type $T_1$.
S.N. Bernstein (see Seneta (2001) for a biographical sketch) published condensed versions of his work in two short papers (Bernstein 1923a,b). Only the first of these is considered here. The second considers the case when there are an arbitrary number of types of individuals and another (non-Mendelian) form of heredity. Sheynin (2004) provides translations into English of Bernstein's publications in Russian which concern the 1923 papers. The first of these (Bernstein, 1922) gives background to the 1923 papers and refers to the studies on evolution by Charles Darwin (1859), experimental findings by Mendel (1866, 1965) and biometrical work of Francis Galton and Karl Pearson. Bernstein hoped that mathematics could help to unify the various theories of evolution. The second paper (Bernstein, 1924) gives the solution in detail together with other models not considered here. A similar English version of the second paper (Bernstein, 1942) was provided by Emma Lehner of the University of California, Berkeley. Lehner states that the original paper of 1924 appeared in Annales Scientifiques de l'Ukraine, Vol. 1 (1924), pp. 83-114. Lehner's version contains only the first half of the Bernstein (1924) paper. Ballonoff (1974) reproduced Bernstein's two papers of 1923 (in the original French). Later (Bernstein, 1976) he published an English translation of large parts of the 1924 paper which omitted some details of the proofs of theorems (in this he transliterated the author's name as "Bernshtein", but in the references of this paper, in the interests of consistency, it is given as "Bernstein"). This was accompanied by a short introduction in the same periodical (Ballonoff, 1976). The following quotation from Ballonoff (1976) suggests that Ballonoff completely missed the main point of the Bernstein (1923a) paper, despite the point being spelled out in the title of the paper.
"There are two major results of this paper, one already an accepted part of genetics theory, the other yet un-explored! The accepted result is the <<Hardy-Weinberg>> law for the equilibrium of Mendelian genetic systems." The other result to which Ballonoff refers is outside the scope of this paper. As noted above, the main point of Bernstein's research was to establish Mendel's first law. Lyubich (1973), which were inspired by Bernstein's work, culminating in a monograph (Lyubich, 1992).
By the 1920s the importance of the short paper by Hardy (1908) was recognised by geneticists, although the almost simultaneous independent presentation of the same idea, almost in passing, by Weinberg (1908) in a more ambitious paper on the inheritance of twinning in humans had been overlooked till the 1940s. An interesting feature of Bernstein's presentation is that he derives Mendel's first law through Hardy's formula without any reference to Hardy. The only references in Bernstein (1924) were to his own paper of 1922 and to the monograph of Johannsen, given as "3. Auflage" [Third Edition], but with no date. Bernstein's (1922) paper is fairly general, relating mainly to his desire to enhance biology by providing mathematical underpinnings to it. In his English version of Bernstein (1924) Sheynin's references include: "Johannsen, Wm (1926) Elemente der exakten Erblichkeitslehre. 3. Aufl. Jena." Clearly there is a discrepancy in publication years between Bernstein (1924) and Johannsen (1926). The English-language version of Bernstein (1924) given by Sheynin contains the following: "Here, I shall not dwell on those fundamental considerations which convinced me in that, when constructing a mathematical theory of evolution, we ought to base it upon laws of heredity obeying the principle of stationarity. I only note that the Mendelian law, which determines the inheritance of most of the precisely studied elementary traits, satisfies this principle (Johannsen 1926, p. 488). The so-called Mendelian law concerns three classes of individuals, two of them being pure races and the third one, a race of hybrids always born when two individuals belonging to pure races are crossing".
We shall assume that Bernstein had to hand, in essence, Johannsen (1913), that is the 2nd edition [2. Auflage]. Johannsen was familiar with the Hardy-Weinberg distribution of genotype frequencies and gives it in his book (Johannsen, 1913, p. 486).
Our next section gives basic concepts and notation as well as a summary of the Hardy-Weinberg formulae. This is followed by Bernstein's (1924) question and a reference to his proof without the details. We then demonstrate that stasis (that is, constancy of type proportions over all generations, including the zeroth parental proportions) is possible under non-random mating. A model of assortative mating, which is a special case of non-random mating, is given next, together with a numerical example. The final section suggests that Johannsen supplied all the background which Bernstein needed and makes some general comments.

Basic Concepts and Notation and Hardy-Weinberg Equilibrium
The concepts and methods which we use have been described disparagingly by Ernst Mayr as "beanbag genetics". Individuals are completely characterised by type, of which there are three, namely $T_0$, $T_1$, $T_2$. Although they are not emphasized here, genes $G$ and $\bar{G}$ determine type according to the following correspondence: $GG \sim T_0$; $G\bar{G} \sim T_1$; $\bar{G}\bar{G} \sim T_2$. The effectively infinite population is reproduced sexually in discrete and non-overlapping generations. Taking account of gender there are 9 mating combinations as defined by the matrix:

$$\begin{pmatrix} T_0 \times T_0 & T_0 \times T_1 & T_0 \times T_2 \\ T_1 \times T_0 & T_1 \times T_1 & T_1 \times T_2 \\ T_2 \times T_0 & T_2 \times T_1 & T_2 \times T_2 \end{pmatrix} \qquad (1)$$
A 'child', which is one of the three types, arises from each coupling and the aggregate of children forms the new generation, later to become parents in their turn. From his study of peas Mendel established his first law. For each of the above couplings it gives the set of probabilities as to type of child. If we take the outputs, by column, of the 9 couplings, Mendel's first law can be expressed in the form of the following matrix:

$$\hat{M} = \begin{pmatrix} 1 & 1/2 & 0 & 1/2 & 1/4 & 0 & 0 & 0 & 0 \\ 0 & 1/2 & 1 & 1/2 & 1/2 & 1/2 & 1 & 1/2 & 0 \\ 0 & 0 & 0 & 0 & 1/4 & 1/2 & 0 & 1/2 & 1 \end{pmatrix} \qquad (2)$$

where the order of columns is by mating couples:

$$[\,T_0 \times T_0,\ T_0 \times T_1,\ T_0 \times T_2,\ T_1 \times T_0,\ T_1 \times T_1,\ T_1 \times T_2,\ T_2 \times T_0,\ T_2 \times T_1,\ T_2 \times T_2\,] \qquad (3)$$

and the rows are the proportions of offspring in the respective categories $T_0$ ($GG$), $T_1$ ($G\bar{G}$), $T_2$ ($\bar{G}\bar{G}$). $\hat{M}$ shows, for example, that coupling $T_1 \times T_2$ produces offspring in the proportions 0, 1/2, 1/2. We shall call entries of arrays such as $\hat{M}$ coefficients of heredity. The important point to realise about $\hat{M}$ is that it expresses probabilities relating to outcomes of single coupling events.
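As a minimal sketch, not part of the original presentation, the array of Mendelian coefficients can be generated rather than typed in. It assumes the usual segregation rule that a parent of type $T_k$ transmits the second kind of gene with probability k/2; the names `MATINGS`, `mendel_column` and `M_HAT` are illustrative, and the ordering of couples is an assumption of the sketch.

```python
from fractions import Fraction

# Couples T_i x T_j; this ordering is an assumption made for the sketch.
MATINGS = [(i, j) for i in range(3) for j in range(3)]

def mendel_column(i, j):
    """Offspring proportions (T0, T1, T2) from the coupling T_i x T_j."""
    # Assumed segregation rule: a T_k parent transmits the second kind
    # of gene with probability k/2 (0 for T0, 1/2 for T1, 1 for T2).
    p, q = Fraction(i, 2), Fraction(j, 2)
    return ((1 - p) * (1 - q), p * (1 - q) + (1 - p) * q, p * q)

# Rows of the 3 x 9 array: proportions of T0, T1, T2 offspring per couple.
M_HAT = [[mendel_column(i, j)[k] for (i, j) in MATINGS] for k in range(3)]

for row in M_HAT:
    print([str(x) for x in row])
```

Because reciprocal couplings give identical columns, the printed array does not depend on whether the couples are listed row by row or column by column.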
To use the law to make predictions about populations requires a further step, namely a specification of the rule of formation of the aggregate of couples. The most appealing rule is that they are formed randomly. This can be expressed in the form of a matrix of proportions of couples in the order given before. Following Bernstein (1924), we use the symbols $\alpha$, $\gamma$, $\beta$ to denote the frequencies (proportions) of the respective types $T_0$, $T_1$, $T_2$, the same in each gender. Then random mating is given by a vector whose form is, in accordance with the form (3),

$$U = [\alpha^2,\ \alpha\gamma,\ \alpha\beta,\ \gamma\alpha,\ \gamma^2,\ \gamma\beta,\ \beta\alpha,\ \beta\gamma,\ \beta^2]^{\top}.$$

Then the composition of the population, that is the proportions of the three types, following one round of random mating under Mendel's law is given by

$$\hat{T} = \hat{M}U = \left[(\alpha + \tfrac{1}{2}\gamma)^2,\ 2(\alpha + \tfrac{1}{2}\gamma)(\beta + \tfrac{1}{2}\gamma),\ (\beta + \tfrac{1}{2}\gamma)^2\right]^{\top}.$$

The vector $\hat{T}$ is known as the Hardy-Weinberg distribution after the originators Weinberg (1908) and Hardy (1908). This can be expressed more compactly after introducing the following definitions:

$$g = \alpha + \tfrac{1}{2}\gamma, \qquad \bar{g} = \beta + \tfrac{1}{2}\gamma. \qquad (6)$$

The quantities $g$ and $\bar{g}$, which are referred to as gene frequencies, give the proportions of genes of the two kinds in the population since they weight the type frequencies according to the number of genes, $G$ or $\bar{G}$, in each type. Then

$$\hat{T} = [g^2,\ 2g\bar{g},\ \bar{g}^2]^{\top}.$$

The important property of the Hardy-Weinberg formulation follows if we now subject the new array of type frequencies to another round of random mating. The coupling frequencies are given by the vector form:

$$U^* = [g^4,\ 2g^3\bar{g},\ g^2\bar{g}^2,\ 2g^3\bar{g},\ 4g^2\bar{g}^2,\ 2g\bar{g}^3,\ g^2\bar{g}^2,\ 2g\bar{g}^3,\ \bar{g}^4]^{\top}.$$

Since $g + \bar{g} = 1$, the type frequencies among offspring are:

$$\hat{M}U^* = [g^2(g + \bar{g})^2,\ 2g\bar{g}(g + \bar{g})^2,\ \bar{g}^2(g + \bar{g})^2]^{\top} = [g^2,\ 2g\bar{g},\ \bar{g}^2]^{\top},$$

that is, identical to $\hat{T}$. So one round of random mating produces a set of type frequencies which are maintained indefinitely under random mating, often referred to as Hardy-Weinberg equilibrium. Notice that gene frequencies, as given by Eq. (6), of the initial parental generation are maintained.
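The two rounds of random mating described above can be traced numerically. The following is a sketch under stated assumptions: the starting frequencies (1/5, 1/2, 3/10) are an arbitrary illustration, not taken from the paper, and the helper names are hypothetical.

```python
from fractions import Fraction

def mendel_column(i, j):
    """Mendelian offspring proportions (T0, T1, T2) for the coupling T_i x T_j."""
    p, q = Fraction(i, 2), Fraction(j, 2)
    return ((1 - p) * (1 - q), p * (1 - q) + (1 - p) * q, p * q)

def random_mating(freqs):
    """Couple proportions when females and males pair independently."""
    return [[fi * fj for fj in freqs] for fi in freqs]

def offspring(mating):
    """Type proportions among children, given mating[i][j] for T_i x T_j."""
    out = [Fraction(0)] * 3
    for i in range(3):
        for j in range(3):
            for k, coeff in enumerate(mendel_column(i, j)):
                out[k] += mating[i][j] * coeff
    return out

# Illustrative parental frequencies of T0, T1, T2 (not in Hardy-Weinberg form).
parents = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
first = offspring(random_mating(parents))
second = offspring(random_mating(first))

print([str(x) for x in first])
print(first == second)
```

The first generation differs from the parents, while the second reproduces the first, which is the sense of stationarity "from the first generation of offspring" used in the text.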
Notice that the mating vector $U^*$ has the property that the frequency of matings of type $T_1 \times T_1$ is 4 times that of either $T_0 \times T_2$ or $T_2 \times T_0$, since the corresponding array of type frequencies is $\{g^2, 2g\bar{g}, \bar{g}^2\}$. We express this in self-evident notation, to be formalized shortly, as:

$$f_{11} = 4f_{02} = 4f_{20}.$$

Hardy (1908) pointed out that if the initial parental frequencies satisfy the relation

$$\gamma^2 = 4\alpha\beta \qquad (10)$$

then equilibrium frequencies will have been attained under random mating, even after the first round of mating. Ewens (2004, p. 5), as a comment on Eq. (10), Hardy's equation, points out reasons why the Hardy-Weinberg Law is so important. Firstly, a population can be characterised by a single gene frequency rather than a set of genotype frequencies, so it provides economy of description. But much more important is the "stability behaviour", that is: there is no tendency for genetic variability to dissipate. As we stress, stationarity was the bedrock on which Bernstein based his proof. Mayo (2008) concludes the summary of his review article on the Hardy-Weinberg Law with "Its discovery marked the initiation of population genetics".
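Hardy's relation and the 4-to-1 property of the random-mating couples can be checked together. This is a small sketch with hypothetical function names; the two sets of type frequencies are those used in the numerical example later in the paper.

```python
from fractions import Fraction

def in_hardy_weinberg_form(t0, t1, t2):
    """Hardy's relation: (frequency of T1)^2 == 4 * freq(T0) * freq(T2)."""
    return t1 ** 2 == 4 * t0 * t2

def random_mating(freqs):
    """Couple proportions f[i][j] for T_i x T_j under random mating."""
    return [[fi * fj for fj in freqs] for fi in freqs]

hw_types = [Fraction(81, 625), Fraction(288, 625), Fraction(256, 625)]
other_types = [Fraction(13, 125), Fraction(64, 125), Fraction(48, 125)]

for types in (hw_types, other_types):
    f = random_mating(types)
    # Under random mating f11 = 4*f02 holds exactly when Hardy's relation does.
    print(in_hardy_weinberg_form(*types), f[1][1] == 4 * f[0][2])
```

Under random mating the two tests agree because $f_{11}$ is the square of the $T_1$ frequency and $f_{02}$ is the product of the $T_0$ and $T_2$ frequencies; the point of the later sections is that the 4-to-1 property can also hold when mating is not random.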

Bernstein's Question
Bernstein (1924) repeated the steps above leading to the derivation of the Hardy-Weinberg proportions. He then turned the problem around by asking whether, if the population maintains constant proportions of types, namely $\{\alpha, \gamma, \beta\}$, after an initial round of mating, and assuming mating is random, this implies that the heredity coefficients are necessarily those given by $\hat{M}$. He began with a general array of heredity coefficients, denoted by $M$. However, this form is too general to work towards constancy of type proportions from the first generation, and Bernstein modified it to the form (11). That is, he assumed that the mating $T_0 \times T_2$ and the reciprocal mating $T_2 \times T_0$ always produce offspring of type $T_1$, and that the other reciprocal matings produce offspring identically.
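After one round of random mating the population structure is obtained by applying $M$ to the vector of random-mating frequencies. Bernstein (1924) showed that, if this structure is to be maintained under further rounds of random mating, then $M$ must coincide with $\hat{M}$. The argument by which he shows this is most readily seen in Bernstein (1942).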

Stasis Under Non-Random Mating
In this section we show that, with two crucial assumptions, it is possible to derive Mendel's heredity coefficients under non-random mating together with the assumption of stasis, that is, that the proportions of types remain constant from an initial parental generation. We first assume that the array of heredity coefficients has Bernstein's modified form (11). Secondly, we assume that the matings vector is any probability distribution of the form (15), which has the property that the frequency of mating $T_1 \times T_1$ is 4 times that of both $T_0 \times T_2$ and $T_2 \times T_0$. This last echoes the condition (10).
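The role of the 4-to-1 property can be illustrated in the forward direction only: given Mendelian coefficients, a non-random mating matrix with that property reproduces the parental type proportions. The sketch below does not reproduce the derivation of the coefficients themselves. Both mating matrices are hypothetical, chosen for the illustration, and the helper names are assumptions.

```python
from fractions import Fraction

def mendel_column(i, j):
    """Mendelian offspring proportions (T0, T1, T2) for the coupling T_i x T_j."""
    p, q = Fraction(i, 2), Fraction(j, 2)
    return ((1 - p) * (1 - q), p * (1 - q) + (1 - p) * q, p * q)

def offspring(mating):
    """Type proportions among children, given mating[i][j] for T_i x T_j."""
    out = [Fraction(0)] * 3
    for i in range(3):
        for j in range(3):
            for k, coeff in enumerate(mendel_column(i, j)):
                out[k] += mating[i][j] * coeff
    return out

def scaled(rows, denom):
    return [[Fraction(x, denom) for x in row] for row in rows]

# Hypothetical non-random mating matrix with f11 = 4*f02 = 4*f20.
stasis = scaled([[2, 2, 1], [2, 4, 3], [1, 3, 2]], 20)
# Same parental proportions, but f11 = 5/20 breaks the 4-to-1 property.
broken = scaled([[2, 2, 1], [2, 5, 2], [1, 2, 3]], 20)

for mating in (stasis, broken):
    parents = [sum(row) for row in mating]   # type proportions among parents
    print(offspring(mating) == parents)
```

In the first matrix the $T_0 \times T_0$ frequency is 1/10 rather than the 1/16 that random mating would give for a $T_0$ proportion of 1/4, so the stasis it exhibits is not a consequence of random mating.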

Stasis Under Assortative and Random Mating
We now denote the proportions of couples in the various mating combinations by $f_{ij}$ ($i = 0, 1, 2$; $j = 0, 1, 2$), indicating the proportion of the mating $T_i \times T_j$. Stark (1976a,b) gives a mating system which can maintain a given departure from the Hardy-Weinberg form of genotype frequencies. In these citations Mendel's first law was assumed to hold, that is, the heredity coefficients are given by Eq. (2).
In this mating model the proportions of mating couples are given by Eq. (19), in which $f_i$ is the genotypic frequency of $T_i$, $i = 0, 1, 2$.
The terms $d_0$, $d_1$ and $d_2$ are phenotypic values attributed to the respective types. The second is intermediate in value between the other two and separated from each by 1. $V$ is the variance of these values with respect to the distribution of type frequencies, and $m$ is the correlation between mates with respect to their phenotypic values, which are standardised by dividing by their standard deviation.
The mating matrix of Eq. (19) satisfies $f_{11} = 4f_{02}$, since $f_{02} = f_0 (f_1)^2 f_2 / V^2$ and $f_{11}$ is 4 times that expression, so that, by the general result of the previous section on mating matrices of type (15), genotype frequencies are maintained, verifying the earlier results of Stark (1976a, 1976b). We give a numerical example of such a mating matrix which serves to illustrate various features of the model (and which allows for considerable flexibility, as demonstrated here by incorporating a 'taboo' of mating $T_0 \times T_0$):

$$\tilde{C} = \frac{1}{625}\begin{pmatrix} 0 & 26 & 39 \\ 26 & 156 & 138 \\ 39 & 138 & 63 \end{pmatrix}$$

$\tilde{C}$ is symmetric with elements adding to 1 and the middle element is 4 times the upper right hand (and lower left hand) element. Summing rows and columns of $\tilde{C}$ gives the distribution of types in females and males, namely {13/125, 64/125, 48/125}. The important property of $\tilde{C}$ is that, if Mendelian heredity coefficients are applied to it, the offspring distribution is identical to the parental distribution. The offspring then become the next parents and so can continue the population in unchanged form.
By contrast, if random mating is applied to frequencies {13/125, 64/125, 48/125}, the distribution of offspring is {81/625, 288/625, 256/625} and following a further round of random mating, this offspring distribution is reproduced. This is stationarity in the Bernsteinian sense, imitating the conclusion of the Hardy-Weinberg Law. Note that, considering these proportions as genotypic frequencies, gene frequencies remain constant at $g = 9/25$ and $\bar{g} = 16/25$ under both random and assortative mating; that is, stasis in this gene frequency, rather than genotype frequency, sense is achieved under both systems of mating.
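The numerical example can be rebuilt as a sketch under an assumption that the surviving text does not confirm: that the couple proportions take the product form $f_{ij} = f_i f_j (1 + m\, d_i d_j / V)$ with the phenotypic values $d_i$ measured from their mean. The values 0, 1, 2 for the types and the correlation $m = -1/4$, chosen so that the $T_0 \times T_0$ mating is excluded, are likewise assumptions of the illustration.

```python
from fractions import Fraction

def mendel_column(i, j):
    """Mendelian offspring proportions (T0, T1, T2) for the coupling T_i x T_j."""
    p, q = Fraction(i, 2), Fraction(j, 2)
    return ((1 - p) * (1 - q), p * (1 - q) + (1 - p) * q, p * q)

def offspring(mating):
    """Type proportions among children, given mating[i][j] for T_i x T_j."""
    out = [Fraction(0)] * 3
    for i in range(3):
        for j in range(3):
            for k, coeff in enumerate(mendel_column(i, j)):
                out[k] += mating[i][j] * coeff
    return out

f = [Fraction(13, 125), Fraction(64, 125), Fraction(48, 125)]  # from the text
mean = sum(k * fk for k, fk in enumerate(f))       # assumed values 0, 1, 2
d = [k - mean for k in range(3)]                   # centred phenotypic values
V = sum(fk * dk ** 2 for fk, dk in zip(f, d))      # variance of the values
m = Fraction(-1, 4)                                # assumed correlation

# Assumed product form of the assortative mating model.
C = [[f[i] * f[j] * (1 + m * d[i] * d[j] / V) for j in range(3)]
     for i in range(3)]

print([[str(x * 625) for x in row] for row in C])  # entries in units of 1/625
print(C[0][0] == 0, C[1][1] == 4 * C[0][2], offspring(C) == f)
```

Under these assumptions the matrix has the properties stated in the text: it is symmetric, its margins are {13/125, 64/125, 48/125}, the $T_0 \times T_0$ entry vanishes, and Mendelian reproduction returns the parental distribution.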

Discussion
Since the theme of Bernstein (1923a) is the main focus of this paper, it is appropriate to discuss further Bernstein's achievements and limitations, in respect of that paper. We believe that this paper encapsulates Bernstein's most notable contribution to genetics although he wrote much more which he considered important.
In his celebrated and contentious paper Fisher (1936) writes "In 1930, as a result of a study of the development of Darwin's ideas, I pointed out that the modern genetical system, apart from such special features as dominance and linkage, could have been inferred by any abstract thinker in the middle of the nineteenth century if he were led to postulate that inheritance was particulate, that the germinal material was structural, and that the contributions of the two parents were equivalent". Bernstein (1923a) demonstrates that he not only anticipated Fisher's assertion but showed how it could be realised mathematically. Fisher believed that Mendel had a clear view of his own first law during the course of his experiment. In relation to Fisher (1936), Franklin et al. (2008) write "It is our contention that this controversy should end." While Fisher (1936) continues to fascinate, Bernstein (1923a) and his other writings have been largely ignored. His two papers of 1923 and Bernstein (1942) are listed in Felsenstein (1981) and there is a pointer to Holgate (1975) against the key word 'stationarity'. There is no reference to Bernstein in Wright (1969).
The discipline of genetics in the Soviet Union experienced two periods of turbulence and isolation. The first was because of the First World War and the revolution and the second was when Lysenko was given wide powers of control over teaching and research in biology and agriculture. Between these two periods, for reasons related to Communist ideology and politics, there was a resurgence of Lamarckism. In relation to the former period, Dobzhansky (1980) noted that, after a period of about seven years, "Acquaintance with the experimental work of the Morgan school, and with the findings of other geneticists in Europe and in the United States, became possible only in about 1921." Stark and Seneta (2011) describe how A.N. Kolmogorov took a stand against Lysenkoism in 1940 and how the publication of a new edition of Bernstein's monograph on probability was stopped because it included material on Mendelism. A central tenet of Lysenkoism was acceptance of Lamarckism. Both Weismann and Johannsen were included in the list of people who were the targets of Lysenko's vitriol.
Johannsen (1913) was the only source cited by Bernstein (1924), apart from one paper of his own. When introducing Mendel's first law Johannsen (1913, p. 486) gave a table, here reproduced as Table 1, which perhaps motivated Bernstein as to how to approach his proof. In effect the table is a derivation of the Hardy-Weinberg formula through functional iteration. Implicit in the table is the assumption of random mating. It can be said that Johannsen supplied all the information that Bernstein needed for his task. Dunn (1965), Grant (1975) and others pay tribute to Johannsen's important role in the development of genetics. His book was a reliable source of ideas available to Bernstein. But Bernstein had to work in isolation from some important developments in mathematical genetics, such as Fisher (1918), Haldane (1919) and Wright (1921).
In Johannsen (1913) there are many references to Galton and Pearson, as well as to Darwin, but fewer to Morgan. Johannsen, although a great supporter of statistical method in biology, was one of the leaders of a group of biologists opposing the views of Galton, Karl Pearson and Weldon on inheritance (Guttorp and Lindgren, 2009). The conflict with Pearson started with Johannsen's (1903) paper (see Peters, 1959 for a version in English). However Yule (1904), also a member of the English Biometric School, came to Johannsen's defence, calling his results one of the most important contributions to genetics. It may be of some relevance that Bernstein was an admirer, in the times of which we speak, of Karl Pearson's Grammar of Science, in a Russian translation of the second edition of 1900 (Read, 1982, p. 24). Various editions of the Grammar have contained sections on heredity.
Johannsen (1913, p. 711) cites Weinberg (1908, 1909a). Hill (1984, p. 12) notes that Weinberg's paper of 1908 "was a small part of his work in genetics". It is this paper, with Hardy (1908), which is nowadays associated with the discovery of the Hardy-Weinberg Law. Hardy's (1908) paper is cited on p. 704 of Johannsen (1913), although this page is not mentioned in the index, where only "Hardy, 486" occurs. There is no indication in Bernstein (1924) that he had used the Weinberg (1908) reference and no mention of Hardy.

Table 1. From Johannsen (1913, p. 486).†
Type    Initial    First generation    Second generation
AA      $p$        $p^2$               $(p^2 + pq)^2 = p^2(p + q)^2$
Aa      $0$        $2pq$               $2(p^2 + pq)(pq + q^2) = 2pq(p + q)^2$
aa      $q$        $q^2$               $(pq + q^2)^2 = q^2(p + q)^2$
† $p + q = 1$.

Dunn (1965, p. 94) makes a remark which is important in the context of Bernstein's place in genetics in the Soviet Union between the two world wars and beyond. He writes "Likewise, Johannsen, in effect, cleared the air of the fear that acquired characteristics might, after all, be inherited. Weismann's arguments had in the long run been less effective than Johannsen's simple experimental demonstrations, at least with those biologists who wanted to advance the study of heredity. Johannsen's conclusion that acquired modifications were not inherited was backed up a little later by Castle and Phillips (1909), using an argument of quite a different kind."
While Bernstein's model is ingenious and is supported by intricate calculations which are best displayed in Bernstein (1942), he starts out by, in a sense, violating his main postulate, namely stationarity: he requires random mating in the first (and subsequent) generations, which involves a change from the initial population unless it is already in Hardy-Weinberg form. It is the assumption that stationarity is required only from the first generation onwards which causes considerable mathematical difficulty in arriving at Mendel's coefficients of heredity. While this assumption does imitate the Hardy-Weinberg Law in its formulation, it does not seem realistic in considering stability of population proportions starting from an arbitrary time point.
We have, in contrast, shown relatively simply that Mendel's set of heredity coefficients follows necessarily from (11) and (15), the latter embodying the property $f_{11} = 4f_{02}$. This last equation reflects the situation which exists in the Hardy-Weinberg approach after one generation of mating.
Before 1900 there were several kinds of study and much speculation aimed at elucidating the phenomenon of inheritance. Mendel's experimental approach cut through the vague ideas surrounding the question. Essentially, Mendel's was a study of individual hereditary events which could be collated to form the basis of a theory. This enabled many other scientists to frame studies around Mendel's model. The basic one of these was the Hardy-Weinberg model (Weinberg (1908), Hardy (1908)). This is idealistic in that the original formulation was, and current usage is, based on the assumption of random mating. It has been shown by Stark (1980, 2005, 2006a,b, 2007) and Li (1988) that random mating with Mendelian coefficients is a sufficient, but not a necessary, condition for Hardy-Weinberg equilibrium. Failure to appreciate this non-necessity is widespread in the genetics literature. For example Wikipedia (2011) states "Violations from the Hardy-Weinberg assumptions can cause deviations from the expectations... random mating... violations... will not have Hardy-Weinberg proportions." The novelty of Bernstein's approach is that it starts from a view of a population and posits several conditions to derive a model of inheritance for single reproductive events.