Prebiotic Chemical Kinetics Imprint on Positional Codon Usage

More than one million exons belonging to the three domains of life, Eubacteria, Archaea and Eukarya, were analyzed. A significant number of first codons with an excess of identical bases in their first two positions was found in the genomes of Eubacteria and Archaea. This excess is much less frequent in the genomes of Eukarya. It is proposed that this discrepancy depends on the growth kinetics of oligoribotides on the primitive Earth, which has been diminishing with the passage of time.


Introduction
Biological evolution, including its molecular and cellular aspects, is nowadays reasonably well understood from its central law: genetic variation followed by natural selection. In contrast, the initial steps of the transition from chemical (prebiotic) to biological evolution remain an obscure subject, even though they must have been dictated solely by thermodynamics and kinetics. The transition from inorganic to living systems was first proposed in the 1920s by Oparin (1924) 1 and, independently, by Haldane 1 (1928, 1954), but we owe to Miller 2 (1953) the first experimental demonstration of the synthesis of biomolecules, such as amino acids and ribosides, starting from simple molecules such as H2O, NH3, CH4, N2, etc., under conditions resembling those likely on the early Earth. 2 Therefore, according to the biogenetic continuity hypothesis, biomolecules found in present-day biota are directly connectable to the prebiotic molecules.
In more recent studies, [3][4][5] guanidine, uracil and its derivatives were synthesized from simple gases under probable prebiotic conditions. Among the wide range of supporting evidence available in the literature, we note the work of Ferris and co-workers, 6,7 who conducted a series of studies detecting oligoribotide formation in a 5'-phosphorimidazole solution in the presence of a Na+-montmorillonite 6 and synthesized mixtures of oligomers, some of which were 55 monomers long. 7 In a recent experiment, di- and tri-peptides were synthesized from monochiral amino acid solutions containing COS, 8 a compound that was probably also available in prebiotic environments. Such results allow us to assume that the base sequences of contemporary exons, such as those available at the NCBI database, 9 might retain vestiges of their purely chemical past.
1][12][13][14] It is well known that the subsequent discovery of ribozymes 10,[15][16][17] lent further credibility to an "RNA world" scenario. 12 On the other hand, the selective formation of ribose is a rather difficult process and remained a matter of debate, even though a number of studies have shown that activated ribonucleotides polymerize to form RNA. Recently, this missing link was addressed by the work of Powner et al. 18 Their experiments showed that activated pyrimidine ribonucleotides can be formed, bypassing free ribose and the nucleobases, starting from cyanamide, cyanoacetylene, glycolaldehyde, glyceraldehyde and inorganic phosphate, all plausible prebiotic feedstock molecules, under synthesis conditions that are consistent with early-Earth geochemical models.
In this paper, evidence for the continuity hypothesis in the transition from chemical to biological evolution is investigated on the basis of data mining and the kinetics of codon build-up from ribotides or ribotide-like molecules.

A hypothesis for kinetic imprint on the formation of early exons
In solution (e.g., the primeval ocean), ribotide dimerization reactions would consist of condensations between the 3'-OH of one ribotide and the 5'-HPO4 of another (or 2'-OH→5'-HPO4 in the absence of a catalyst). Hence, a dimerization between A and G can be schematically represented as: A + G → 5'HPO4-AG-(OH)3'. The rate of such a condensation reaction is a function of the relative orientation of the two ribotides during a collision. As a first approximation (disregarding small differences in stereochemistry and electron densities of the 3'-hydroxyl and 5'-phosphoryl groups in distinct monoribotides), it should be the same in all cases, except for identical monoribotides, where it should be twice as large. This is because one cannot distinguish the collision 5'HPO4-A_I + A_II-(OH)3' from the collision 5'HPO4-A_II + A_I-(OH)3'; both yield the same di-ribotide. By contrast, the collision A_I + U_II produces the di-ribotide 5'HPO4-AU-(OH)3', whereas the collision U_II + A_I produces the distinct di-ribotide 5'HPO4-UA-(OH)3'. Note that it is not necessary for all ribotide or ribotide-like species to have been present, but only for the concentrations of the monomers to have been roughly equal and continuously replenished.
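The size of the bias implied by this argument can be illustrated with a short calculation. The sketch below is only an illustration under the stated first approximation: four monomers at equal concentrations, every ordered heterodimer formed with the same relative rate, and identical-base dimers formed twice as fast. The function names and the `identical_factor` parameter are hypothetical and are not taken from the original analysis.

```python
from itertools import product

BASES = "ACGU"


def dimer_fractions(identical_factor=2.0):
    """Relative yield of each ordered dimer at equal monomer concentrations.

    identical_factor is the ASSUMED rate enhancement for X + X collisions
    (2.0 in the first-approximation argument; 1.0 gives the random case).
    """
    weights = {
        a + b: (identical_factor if a == b else 1.0)
        for a, b in product(BASES, repeat=2)
    }
    total = sum(weights.values())
    return {dimer: w / total for dimer, w in weights.items()}


def xx_fraction(identical_factor=2.0):
    """Fraction of the dimer population made of identical bases (AA, CC, GG, UU)."""
    fractions = dimer_fractions(identical_factor)
    return sum(f for dimer, f in fractions.items() if dimer[0] == dimer[1])


if __name__ == "__main__":
    print(f"random pairing:       {xx_fraction(1.0):.3f}")
    print(f"doubled XX rate:      {xx_fraction(2.0):.3f}")
```

Under these assumptions the XX share of the initial dimer pool rises from 4/16 to 8/20. Real rates would depend on the stereochemical and electronic differences that the approximation deliberately ignores.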
Admitting the 5'→3' growth without questioning its origins here, further polymerization would produce a chain population with an excess of diribotides with identical bases in the first two positions at their 5'-end (e.g., AA, CC, GG, UU, herein referred to as XX). A consequence of this simple chemical kinetics effect is that, as the polyribotides were beginning to act as RNAs, the first position of such biopolymers was richer in the above-mentioned doublets. If that were the case, a frequency of XX_ codons (f) in the first position of the early genes higher than the random one would be expected, i.e., higher than 16/64 = 0.250, or 16/61 = 0.262 if the three evolution-determined stop codons are excluded (where "_" is A, C, G or U). This ratio should diminish as a result of biological evolution processes, such as genetic drift and natural selection. Although the growth of a polyribotide chain may have occurred by both n+1 and n+n mechanisms, a significant excess of the so-called XX_ codons anywhere in the genome would be less probable. Prebiotic oligoribotide sequences were shorter (up to 55-mer) 7 than current coding sequences, and placement of XX doublets would have to fall into specific positions, i.e., the first and second positions of non-starting exons. Ultimately, 3.7+ billion years of selective pressure and adaptation would be expected to wash out most (or even all) trace excess of randomly placed XX_ codons according to usage needs. A main question is whether a prebiotic kinetic imprint would still be present at the beginning of the genes of the current biota.
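The two random reference values quoted above can be checked by enumerating the codon table directly. This is a minimal sketch assuming the standard genetic code, whose stop codons are UAA, UAG and UGA; the helper names are illustrative only.

```python
from itertools import product

BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # assumption: standard genetic code


def is_xx_codon(codon):
    """True when the first two bases of a codon are identical (XX_)."""
    return len(codon) == 3 and codon[0] == codon[1]


ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOP_CODONS]


def random_xx_frequency(codons):
    """Share of XX_ codons when every codon in the list is equally likely."""
    return sum(is_xx_codon(c) for c in codons) / len(codons)


if __name__ == "__main__":
    print(f"all 64 codons:   {random_xx_frequency(ALL_CODONS):.3f}")
    print(f"61 sense codons: {random_xx_frequency(SENSE_CODONS):.3f}")
```

None of the three stop codons is of the XX_ type, so excluding them shrinks only the denominator, which is why the reference moves from 16/64 to 16/61.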
It is important to note that the synthesis of a polypeptide chain in a living organism is initiated by a particular tRNA, tRNA(Met), in response to a translation initiation codon, almost invariably AUG. 19 Hence, all peptides begin with methionine during synthesis, which is post-translationally cleaved. Some bacterial sequences represent a few exceptions to this rule. On the other hand, the model probed here would have taken place before translation, and it is therefore equally important to take this fact into account.
To answer this question, 1,060,114 complete and nonredundant coding sequences across the three domains of life, 20 available at the NCBI Reference Sequence database (15), were parsed and the first (non-AUG) codon in each exon was identified (sequence details available in supporting online material). The resulting ratio revealed a remarkable trend, shown in Figure 1. The f values for the first position in early organisms (Eubacteria and Archaea) are higher than expected for a random distribution, even if 26.2% is taken as the reference, while this frequency falls near the random index for the more complex genomes of Eukarya (Figure 1a). The same trend is not shared by the overall usage of the XX_ codons, i.e., f for any given position of the analyzed genomes (28.2% for Eubacteria, 27.8% for Archaea and 29.2% for Eukarya) (Figure 1a). This shows that the excess of XX_ codons in the first position in Eubacteria and Archaea is not a consequence of an overall excess of these codons in those domains. In order to verify the non-randomness of the observed bias in the first position, f values were calculated for several other codon positions and normalized to each domain's usage of the XX_ codons. Interestingly, in Eubacteria f progressively diminishes with the distance from the first codon, reaching values near its usage of the XX_ codons (within 5%) only after the tenth position (Figure 1b). No significant excess of these codons was detected for Archaea beyond the first position or, as expected, for Eukarya.
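The positional analysis described above can be sketched as follows. This is an illustrative reconstruction, not the pipeline actually used: it assumes each coding sequence is supplied as a plain string, that a leading AUG is skipped so that position 1 is the first non-AUG codon, and that the normalization divides the positional frequency by the overall XX_ usage of the same sequence set. All function names and the toy sequences are hypothetical.

```python
def split_codons(cds):
    """Split a coding sequence into complete codons (upper case, RNA alphabet)."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codons_after_start(cds, start_codon="AUG"):
    """Codons of a sequence with a leading start codon removed, if present."""
    codons = split_codons(cds)
    if codons and codons[0] == start_codon:
        codons = codons[1:]
    return codons


def is_xx(codon):
    """True when the first two bases of the codon are identical."""
    return codon[0] == codon[1]


def positional_xx_frequency(sequences, position):
    """f at a 1-based codon position, counted after the start codon."""
    hits = total = 0
    for cds in sequences:
        codons = codons_after_start(cds)
        if len(codons) >= position:
            total += 1
            hits += is_xx(codons[position - 1])
    return hits / total if total else float("nan")


def overall_xx_usage(sequences):
    """f over all codons after the start codon, regardless of position."""
    codons = [c for cds in sequences for c in codons_after_start(cds)]
    return sum(is_xx(c) for c in codons) / len(codons) if codons else float("nan")


def normalized_xx_frequency(sequences, position):
    """Positional f divided by the overall XX_ usage of the same sequences."""
    return positional_xx_frequency(sequences, position) / overall_xx_usage(sequences)


if __name__ == "__main__":
    toy = ["AUGAAACGU", "AUGGGCUAC", "ATGCCGAUC", "AUGACGUUU"]  # made-up data
    for pos in (1, 2):
        print(pos, positional_xx_frequency(toy, pos), normalized_xx_frequency(toy, pos))
```

A normalized value near 1 at a given position would indicate no excess over the domain's general XX_ usage, which is the comparison underlying the within-5% criterion mentioned in the text. In this sketch terminal stop codons are counted in the overall usage; whether the original analysis did the same is not stated.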

Conclusions
The current lack of an evolutionary and/or functionally driven explanation for the observed distribution of the XX_ codons in the first position of the genes should not lead to their significance being dismissed. It is also worth noting that this study employs an elementary statistical treatment and is based on data from modern ribosome-translated reading frames, which do not necessarily correspond to the beginning of prebiotic RNA genes. Nevertheless, considering the odds of evolution and the database dependency, it is interesting that the analyzed data fit the above-mentioned hypothesis well. While outlining the transition from the chemical to the biological stage on the prebiotic Earth with a high degree of certainty is hardly possible, this study supports the evidence that the present biota still carries its legacy.