Evolutionary change – patterns and processes

The present review considered: (a) the factors that conditioned the early transition from non-life to life; (b) genome structure and complexity in prokaryotes, eukaryotes, and organelles; (c) comparative human chromosome genomics; and (d) the Brazilian contribution to some of these studies. Understanding the dialectical conflict between freedom and organization is fundamental to giving meaning to the patterns and processes of organic evolution.

It seems that everything started with a marvelous explosion. Although there are divergent views, the universe is believed to have emerged from a single, unbelievably small, dense, and hot region 10 billion or more years ago (Halliwell 1991, Adams and Laughlin 2001). Everything happened in extremely short periods of time and at extremely high temperatures. The corpuscles of matter present at that moment are called quarks [presumably from a sentence attributed to Johann Wolfgang von Goethe (1749-1832), "Poor God, he puts his nose in every quark", quark here meaning the smallest particle of matter]. Six types of quarks have been identified (up, down, charm, strange, bottom, and top), the last one being clearly established only at the beginning of 1995 (Nascimento 1995).
The interpretation of the subsequent chain of events depends on the hypotheses adopted to explain the structure of the universe. Three of them were that the universe: (a) is open, and its expansion is indefinite; (b) is closed, the period of expansion being finite, after which a new contraction, similar to the state that existed before the Big Bang, is postulated to occur; and (c) is flat, a situation intermediate between the other two. At present it is believed that the universe does not contain sufficient energy to be closed, and that it is in permanent expansion (Adams and Laughlin 2001).
The next big transition was the origin of life. Since our solar system, and with it the earth, was formed about 4.5 billion years ago, and there is fossil evidence of life dated to around 3.5 billion years ago, it is clear that life started early in the history of our planet. The facts that ribonucleic acid (RNA) can duplicate without the help of other molecules, and that ribozymes can perform enzyme-like reactions, led to the idea that in the beginning there was an RNA World. Joyce and Orgel (1999) described how this world could have been formed in what they called "The Molecular Biologists' Dream", using extrapolations from prebiotic chemistry and directed RNA evolution.

FRANCISCO M. SALZANO
The sequence of events could be envisaged as follows: (a) nucleoside bases and sugars were formed, either on the earth itself or delivered from outer space; (b) nucleotides were then synthesized and accumulated in pools; (c) a mineral catalyst at the bottom of a pool (for example, montmorillonite) then took part in the formation of long single-stranded polynucleotides, some of which were converted to complementary double strands by template-directed synthesis; (d) on melting, at least one of these double-stranded RNAs would yield a single-stranded ribozyme capable of copying itself and its complement; (e) repeated copying would lead to an exponentially growing population; and (f) the RNA pre-organism would become enclosed by a membrane (Orgel 2004). From the very beginning, natural selection was a key factor in the whole process (De Duve 2005).
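Step (e) of this scenario, the exponential growth of a self-copying RNA population, can be illustrated with a toy calculation. This is a minimal sketch and not a model taken from Orgel (2004): it assumes, hypothetically, that every functional molecule produces one new copy per round of replication, and that only a fraction `fidelity` of the new copies remain functional, that is, able to copy themselves in turn.

```python
def replicator_population(rounds: int, initial: int = 1, fidelity: float = 1.0) -> float:
    """Expected number of functional RNA molecules after `rounds` of copying.

    Hypothetical toy model: each functional molecule makes one copy per round,
    and a fraction `fidelity` of those copies are themselves functional.
    """
    population = float(initial)
    for _ in range(rounds):
        # Existing molecules persist; new functional copies are added to them.
        population += population * fidelity
    return population


# With perfect copying the population doubles each round: 2**10 after 10 rounds.
print(replicator_population(10))  # 1024.0
# Imperfect copying still gives exponential growth, only at a lower rate (1.5x per round).
print(replicator_population(10, fidelity=0.5))
```

In this simplified model growth remains exponential for any positive fidelity; copying errors change only the rate, not the form, of the increase.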
Alternative (protein-first) hypotheses, however, exist. Berger (2003) developed the idea that primitive enzymatic sites were initially formed by abiotic amino acids specifically gathered around different substrates. The information in these proteinoids would then have been transferred to messenger-like RNAs by a mechanism that is the reverse of present-day protein synthesis, and after that to DNA.
Other questions remain: was a hot or a cool environment more appropriate for the origin of life? Bada and Lazcano (2002) favored the second alternative; they also suggested that life may have originated several times. Cavalcanti et al. (2004) considered the origin and evolution of the genetic code. They maintained that an initial version, containing fewer amino acids than the present 20, was modified by the incorporation of new ones after the duplication and divergence of pre-existing synthetases and tRNA molecules.
If evolution is a fact, who was the universal ancestor? Extensive comparisons of genome sequences from widely different modern organisms identified only 60 genes (mostly involved in the translation process) that appear to be universal. The minimum number needed by a self-sufficient organism, however, is an order of magnitude higher (600) (Whitfield 2004). How can this be explained? Woese (1998, 2002) suggested that the universal ancestor was not a discrete entity, but a community of cells (a supramolecular aggregate) that survived and evolved as a unit. Initially both the mutation rate and the level of lateral gene transfer were high. As increasingly complex and precise biological structures evolved, these two processes became less frequent. Modern cells began to evolve with the origin of the translation process. Vertically generated novelty assumed greater importance until a threshold was reached, which Woese (2002) called the Darwinian threshold. After that, the more orthodox processes of mutation and selection identified today became the key evolutionary factors.
Wonderful, but how can all of this reasoning be tested? A large number of researchers are now trying to synthesize artificial cells in the laboratory. Two approaches are being followed. Top-down studies try to create artificial cells by simplifying and genetically reprogramming existing cells with simple genomes, while bottom-up experiments start from the very beginning, using nonliving organic and inorganic materials. At present the second approach is the more widely explored (Rasmussen et al. 2004).
[…] be found in the contributions of Pennacchio and Rubin (2003), Snyder and Gerstein (2003), Ureta-Vidal et al. (2003), Haubold and Wiehe (2004), and Miller et al. (2004). The latter also considered what future developments can be expected in this area. Soltis et al. (2004), on the other hand, stressed that, regardless of the techniques used, adequate taxon sampling is vital for correct evolutionary inferences.
The dialectical dilemma of quantity versus quality pervades the whole history of biological evolution, and has been considered in various ways in relation to genomics in recent years. Regardless of differences in the size and complexity of the genomes of diverse organisms, all DNA has to be broken into small pieces for sequence determination. The first decision to be made is whether a clone-by-clone shotgun (CCS) or a whole-genome shotgun (WGS) method should be employed. The latter (WGS) is simpler than the former, and has been adopted not only for one of the versions of the human genome, but also for the investigation of a wide array of other organisms. Venter et al. (2003) provided an overview of these results. But there are doubts about whether WGS alone can provide a truly reliable, detailed version of a genome. The controversy about the quality (and independence) of the human genome sequences generated by the public consortium and by the Celera company is not new, and can be assessed in the papers of Waterston et al. (2003) and Adams et al. (2003). Schmutz et al. (2004a) and She et al. (2004) considered the question of the quality of the human genome data in detail. The latter authors verified that large (greater than 15 thousand bases, or kb) and highly similar (greater than 97% identity) duplications are not adequately resolved by WGS assembly. They proposed a mixed strategy, using a targeted clone-by-clone approach to resolve duplications.
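The criterion reported by She et al. (2004) can be expressed as a simple filter. The sketch below is illustrative only: the `Duplication` record and the `triage` helper are hypothetical names, and a real assembly pipeline would estimate length and identity from alignments rather than receive them as inputs. Duplications flagged by the filter are those for which the mixed strategy would call for targeted clone-by-clone sequencing.

```python
from dataclasses import dataclass


@dataclass
class Duplication:
    """Hypothetical record for a pair of duplicated genomic segments."""
    name: str
    length_bp: int    # length of the duplicated segment, in bases
    identity: float   # fraction of identical bases between the two copies (0-1)


# Thresholds from She et al. (2004): longer than 15 kb and more than 97% identical.
MIN_LENGTH_BP = 15_000
MIN_IDENTITY = 0.97


def poorly_resolved_by_wgs(dup: Duplication) -> bool:
    """True if the duplication falls in the class that WGS assembly tends not to resolve."""
    return dup.length_bp > MIN_LENGTH_BP and dup.identity > MIN_IDENTITY


def triage(dups: list[Duplication]) -> list[str]:
    """Names of duplications to send to targeted clone-by-clone sequencing."""
    return [d.name for d in dups if poorly_resolved_by_wgs(d)]


example = [
    Duplication("dupA", 40_000, 0.99),  # long and nearly identical: flagged
    Duplication("dupB", 8_000, 0.99),   # too short to cross the length threshold
    Duplication("dupC", 60_000, 0.92),  # long but too divergent
]
print(triage(example))  # ['dupA']
```

Both conditions must hold at once: a long but divergent duplication, or a short near-identical one, can usually be assembled correctly by WGS.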
The fact is that database information is increasing exponentially, and much of these data have not been adequately and comprehensively analyzed. An attempt in this direction was made by Driskell et al. (2004). They considered the phylogenetic potential of about 300 thousand protein sequences stored in Swiss-Prot and GenBank, and stressed that more than 100 thousand species (about 6% of all those described so far) have at least one molecular sequence archived in public databases. Their evaluation involved the identification of clusters of putative protein homologs and the construction of supertrees and supermatrices. A surprising finding is that combining many genes is a robust procedure, and that a relatively large amount of missing data is tolerable. They are optimistic that with these tools the Tree of Life may eventually be identified. Crandall and Buhay (2004), while discussing these results favorably, nevertheless expressed some cautionary views. They also mentioned that we are losing 27 thousand species each year due to environmental degradation.
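The supermatrix approach, and the way missing data enter it, can be made concrete with a small sketch. Assuming per-gene alignments are already available (the dictionary layout and the function names here are hypothetical, not those used by Driskell et al.), the genes are concatenated taxon by taxon, and any taxon lacking a gene is padded with the missing-data symbol "?".

```python
def build_supermatrix(alignments: dict[str, dict[str, str]]) -> dict[str, str]:
    """Concatenate per-gene alignments into one row per taxon.

    `alignments` maps gene name -> {taxon: aligned sequence}; all sequences
    of a given gene are assumed to have the same aligned length.
    Taxa lacking a gene are padded with '?' (missing data).
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    rows: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for gene in sorted(alignments):
        aln = alignments[gene]
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "?" * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def missing_fraction(matrix: dict[str, str]) -> float:
    """Proportion of cells in the supermatrix that are missing data."""
    total = sum(len(row) for row in matrix.values())
    return sum(row.count("?") for row in matrix.values()) / total


# Hypothetical toy data: two protein alignments with partial taxon overlap.
genes = {
    "geneA": {"taxon1": "MKLV", "taxon2": "MKIV"},
    "geneB": {"taxon1": "GS", "taxon3": "GA"},
}
matrix = build_supermatrix(genes)
for taxon, row in matrix.items():
    print(taxon, row)
print(f"missing data: {missing_fraction(matrix):.0%}")
```

Even in this tiny example a third of the matrix is missing, yet every taxon still shares aligned positions with at least one other taxon, which is what allows a tree to be inferred from the combined data.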

GENOME VARIATION IN PROKARYOTES
Prokaryotes are unicellular, and are the most numerous organisms on earth. Species delineation among them is complicated because many of them lack sex; about four thousand species have been described, but some scholars suggest that there should be around four million of them.
Their genetic systems are not as simple as was imagined some time ago. Although most of them have circular genomes, others have linear ones. Also, while the majority have just one large chromosome, several have two or three large replicons, and many possess extrachromosomal DNA segments in addition. Haploidy is not universal; during the exponential growth phase as many as 10 copies of the main chromosome can occur per cell.
Since the 1970s, molecular investigations have indicated that the prokaryotes can be separated into two distinct domains, named Archaea and Bacteria. The main characteristics of the genomes of completely sequenced prokaryotes listed by Saccone and Pesole (2003) and described by others are given in Table I. The numbers of species considered are not high (18 Archaea, 76 Bacteria in the most general comparison), but some first generalizations […] groups (averages of 89% and 87%). The average GC contents were also similar (respectively 44% and 45%).
We are still far from assigning a significant proportion of microbial genes to functional categories, but data compiled by Saccone and Pesole (2003) indicated that some of the most frequent categories were related to energy metabolism, transport and binding proteins, protein synthesis, and cellular processes. Graham et al. (2000) identified 351 clusters of signature proteins that have no recognizable bacterial or eukaryal homologs. They are involved in key energetic systems, cofactor biosynthesis, and other functions. These unique genes, which are present in around 15% of the archaeal genomes, suggest that the Archaea really constitute an anciently diverged major lineage.

VIRUSES
Viruses generally have much smaller genomes than bacteria. For instance, the virus responsible for severe acute respiratory syndrome (SARS), recently identified in China and spread to many countries, has a 30 kb genome (Marra et al. 2003). Poxvirus genomes range from 145 kb to 290 kb in size, and each contains only about 200 genes. McLysaght et al. (2003) studied 20 of these viruses and stressed four characteristics of their genome evolution. First, genome structure is highly conserved among the chordopox (vertebrate-infecting) viruses; second, gene loss and gain varied markedly among the gene families considered; third, many of these acquisitions were due to horizontal transfer events; and fourth, both conservative and positive selection could be identified, undoubtedly related to characteristic features of infection, replication, and virulence. Proteins that may be targeted in the design of drugs for the control of these viruses were also identified.
The genome of the bracovirus (CcBV) of a parasitoid wasp, Cotesia congregata, was found to comprise 568 kb in 30 DNA circles, which together contain 156 coding DNA sequences. In its organization it resembles a eukaryotic genomic region more than a viral one. Introns are present in 69% of CcBV genes, and 42% of the putative coding DNA sequences have no similarity to previously described genes (Espagne et al. 2004).
The largest virus genome described so far is the double-stranded DNA genome of a mimivirus that grows in amoebae. It has 1.18 Mb, with 1,262 putative open reading frames, 10% of which show similarity to proteins of known function. Unexpectedly, genes encoding central protein-translation components, DNA repair pathways, and polysaccharide synthesis enzymes are present, as well as genes coding for six tRNAs. Its structure blurs the recognized frontier between viruses and parasitic cellular organisms with small genomes. Raoult et al. (2004) suggested that these large DNA viruses could have emerged before the establishment of the three domains of life (Archaea, Bacteria, Eukarya).

GENOME VARIATION IN EUKARYOTES
The term eukaryote refers to an enormous range of organisms whose common feature is that their cells have nuclei bounded by a membrane that separates them from the cytoplasm. There are many unicellular eukaryotes, and the question arises whether their genomes are similar to or different from those of the prokaryotes. Table II presents the characteristics of seven completely sequenced unicellular eukaryotes. Comparison with the numbers of Table I indicates that, with the exception of Encephalitozoon cuniculi, they possess much larger genomes, with sizes ranging from 9 Mb to 34 Mb, against averages of 2.2 Mb and 3.0 Mb in Archaea and Bacteria, respectively. Encephalitozoon cuniculi is a special case. This intracellular microsporidian parasite infects a wide range of hosts, from protozoans to humans (in whom it has been identified in opportunistic infections since the emergence of AIDS and of immunosuppressive therapies for organ transplantation). Its genome comprises 11 chromosomes and about two thousand genes. The absence of genes for some biosynthetic pathways and for the tricarboxylic acid cycle indicates that this organism is strongly dependent on its host.
Notes to Table II: sources […] 2004); Xu et al. (2004). The optical map of Leishmania major was reported by Zhou et al. (2004), but apart from this protozoan's total genome size (34.7 Mb) no information on the other variables listed was given. (2) Excluding introns. (3) NA: not available in the sources consulted.
As for the other organisms listed in Table II, two of them, Saccharomyces cerevisiae and Plasmodium falciparum, have been extensively studied, the first due to its economic importance and the second because it is the agent of one of the most severe forms of malaria. The genome of P. falciparum is almost twice as large as that of S. cerevisiae, but it carries about the same number of genes (five to six thousand); the two species therefore differ in gene density and gene length. In addition, GC content is quite different in the two species (38% in S. cerevisiae; 19% in P. falciparum).
Let us now consider in more detail a group of eukaryotic organisms, the plants. Some say that a plant is something green that doesn't move around very much. Using a phylogenetic approach, we can separate the green plants from other photosynthetic organisms and focus on the land plants. These conquered the land some 430 million years ago and subsequently diversified widely. The bryophytes and the ancestor of the vascular plants (tracheophytes) diverged early in land plant evolution. Mosses are bryophytes, and their morphologies and life cycles differ significantly from those of flowering plants. For instance, the gametophyte (haploid) generation is dominant in mosses, while the sporophyte (diploid) generation is dominant in flowering plants.
Taking these differences into consideration, Nishiyama et al. (2003) investigated how the transcriptome (the total product of DNA transcription) of a bryophyte, the moss Physcomitrella patens, differed from that of Arabidopsis thaliana, the first vascular plant to have its genome fully sequenced. A total of 15,883 transcripts were assembled, and at least 66% of the genes in P. patens had homologs in A. thaliana.
There is wide variation in the genome sizes of the vascular plants; among the major crop plants, for instance, they vary from 0.4 gigabases (Gb; one Gb is one billion bases) in rice to 16 Gb in wheat (Messing et al. 2004). Within the vascular plants, the angiosperms are by far the most numerous group, with 250 thousand to 300 thousand species, while the other plants number just around 53 thousand (Eguiarte et al. 2003). Traditionally the angiosperms have been classified into monocots (plants with a single cotyledon and other characteristic features) and dicots (with two cotyledons). Phylogenetic analyses support only the monophyly of the monocots; but although the dicots are apparently nonmonophyletic, a large number of species traditionally placed within this group form well-supported clades (Judd et al. 1999). Be that as it may, only a single species from each of these large subdivisions has had its genome completely sequenced: Arabidopsis thaliana (a weed; dicot) and Oryza sativa (rice; monocot). The Arabidopsis thaliana genome is much smaller (125 Mb) than the Oryza sativa genome (420 Mb in the subspecies indica; 466 Mb in the subspecies japonica). They also differ in GC content (36% in Arabidopsis; 43% to 44% in Oryza) and in number of genes (Arabidopsis: 25 thousand; Oryza: 32-55 thousand). As for homology, 80-85% of the Arabidopsis genes have a rice homolog, but only 49% of rice genes are represented in A. thaliana (Eguiarte et al. 2003, Saccone and Pesole 2003).
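Part of the asymmetry between the two homology figures may simply reflect the different gene numbers: when the larger gene set contains several genes matching the same gene in the smaller set, the fraction of genes with a homolog tends to be lower on the larger side. The toy example below, with hypothetical species, gene identifiers, and homolog pairs, only illustrates that arithmetic; it is not a reanalysis of the Arabidopsis or rice data.

```python
def fraction_with_homolog(genes: list[str], pairs: list[tuple[str, str]], side: int) -> float:
    """Fraction of `genes` that occur at position `side` (0 or 1) of some homolog pair."""
    matched = {pair[side] for pair in pairs}
    return len(set(genes) & matched) / len(genes)


# Hypothetical toy genomes: species X has 4 genes, species Y has 8.
genes_x = ["x1", "x2", "x3", "x4"]
genes_y = [f"y{i}" for i in range(1, 9)]
# Homolog pairs (X gene, Y gene); x3 matches two duplicated Y genes.
pairs = [("x1", "y1"), ("x2", "y2"), ("x3", "y3"), ("x3", "y4")]

print(fraction_with_homolog(genes_x, pairs, side=0))  # 0.75 of X genes have a Y homolog
print(fraction_with_homolog(genes_y, pairs, side=1))  # 0.5 of Y genes have an X homolog
```

The same set of homolog pairs yields different percentages depending on which genome is taken as the reference.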
Zea mays, or corn, is one of the most important crops. Its genome has 2.3 Gb, which would make it difficult to sequence entirely. However, Messing et al. (2004) generated 307 Mb of its sequence, estimating that repeat sequences make up 58% and genic regions 7.5% of the genome. They emphasized two striking aspects. First, although the ancestor of maize arose by tetraploidization, fewer than half of the genes appear to be present in two orthologous copies, suggesting significant gene loss in the diploidization process. Second, the remaining gene number has increased dramatically due to tandemly amplified gene families.
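The maize figures can be put in perspective with some simple arithmetic. The sketch below assumes, as a simplification not made explicit in the text, that the proportions estimated from the 307 Mb sample (58% repeats, 7.5% genic regions) apply uniformly to the whole 2.3 Gb genome; `extrapolate_mb` is a hypothetical helper.

```python
GENOME_MB = 2300.0      # maize genome size (2.3 Gb)
SAMPLED_MB = 307.0      # sequence generated by Messing et al. (2004)
REPEAT_FRACTION = 0.58  # estimated share of repeat sequences
GENIC_FRACTION = 0.075  # estimated share of genic regions


def extrapolate_mb(fraction: float, genome_mb: float = GENOME_MB) -> float:
    """Mb of the whole genome in a category, assuming the sample is representative."""
    return fraction * genome_mb


coverage = SAMPLED_MB / GENOME_MB
print(f"share of genome sampled: {coverage:.1%}")
print(f"repeats, extrapolated:   {extrapolate_mb(REPEAT_FRACTION):.0f} Mb")
print(f"genic, extrapolated:     {extrapolate_mb(GENIC_FRACTION):.1f} Mb")
```

Under this assumption the sample covers roughly 13% of the genome, and the whole genome would contain about 1.3 Gb of repeats against only about 170 Mb of genic sequence.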
Information about the genomes of 11 completely sequenced representatives of two other kingdoms of life (Fungi and Animalia) is presented in Table III. The method of study varied among the investigations, but the tendency seems to be to combine the speed of the whole-genome shotgun with the precision of additional clone-by-clone investigations. The only fungus species listed (Neurospora crassa) has a much smaller genome (38.6 Mb) and estimated number of genes (10.1 thousand), as well as a more compact genetic system (an average of two introns per gene), than the other organisms. But even within the Animalia the range of values, especially for genome size, is wide (97.0 Mb in Caenorhabditis elegans; 2.9 Gb in Homo sapiens, a value about 30× higher). The interval for number of genes is much narrower (13.6 thousand in Drosophila melanogaster; 31.1 thousand in Fugu rubripes, about a 2× difference), while the number of introns per gene varies from three (D. melanogaster) to nine (Rattus norvegicus). Notice the much larger genome size of Bombyx mori (428.7 Mb) compared with that of Drosophila melanogaster (180.0 Mb), a 2.4× difference. Larger genes were also found in B. mori, due to the insertion of transposable elements, an event that occurred in relatively recent times. The estimated number of genes is also 1.4× higher in B. mori (18.5 thousand) than in D. melanogaster (13.6 thousand).
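The fold differences quoted above follow directly from the listed values. A minimal check, using only the figures given in the text (the dictionary names and the `fold_difference` helper are illustrative):

```python
# Figures quoted in the text: genome sizes in Mb, estimated gene numbers in thousands.
GENOME_SIZE_MB = {
    "Caenorhabditis elegans": 97.0,
    "Drosophila melanogaster": 180.0,
    "Bombyx mori": 428.7,
    "Homo sapiens": 2900.0,  # 2.9 Gb
}
GENES_THOUSANDS = {
    "Drosophila melanogaster": 13.6,
    "Bombyx mori": 18.5,
    "Fugu rubripes": 31.1,
}


def fold_difference(table: dict[str, float], larger: str, smaller: str) -> float:
    """How many times the value for `larger` exceeds the value for `smaller`."""
    return table[larger] / table[smaller]


comparisons = [
    (GENOME_SIZE_MB, "Homo sapiens", "Caenorhabditis elegans"),
    (GENOME_SIZE_MB, "Bombyx mori", "Drosophila melanogaster"),
    (GENES_THOUSANDS, "Bombyx mori", "Drosophila melanogaster"),
    (GENES_THOUSANDS, "Fugu rubripes", "Drosophila melanogaster"),
]
for table, a, b in comparisons:
    print(f"{a} / {b}: {fold_difference(table, a, b):.1f}x")
```

The computed ratios (29.9, 2.4, 1.4, and 2.3) agree with the rounded figures in the text; the gene-number ratio of Fugu to Drosophila is in fact slightly above 2.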
Other characteristics of these and related organisms deserve mention. Neurospora crassa, for instance, presents the widest array of genome defense mechanisms observed in any eukaryotic organism, including a unique mechanism named repeat-induced point mutation. This process efficiently detects and mutates both copies of a duplicated sequence, leading to the methylation and silencing of repetitive DNA. The organism thus protects itself against selfish or mobile DNA, but at a price: the mechanism prevents innovation through gene duplication, and the result is a genome with an unusually low proportion of closely related genes.
This mechanism does not exist in Ciona intestinalis, and therefore it did not prevent the development of Ciona's unique genes for making cellulose. Since these genes have never been observed in other animals, they may appear to have come out of nowhere; they were most probably acquired through horizontal gene transfer from a bacterium (Pennisi 2002).
An Acad Bras Cienc (2005) 77 (4)
What is the origin of the central nervous system? To investigate this question, Mineta et al. (2003) determined the nucleotide sequences of expressed sequence tags (ESTs) from the head portion of the planarian Dugesia japonica, a basal bilaterian animal. Out of 3,101 nonredundant EST clones, they found 116 that had significant similarity to known genes related to the nervous system, and 110 of these (95%) were shared with H. sapiens, D. melanogaster, and C. elegans, indicating considerable conservation. Even more interesting, 35 of them (30%) had homologous sequences in A. thaliana and S. cerevisiae, which do not have a nervous system! Therefore, nervous system-related genes greatly predated the origin of the nervous system.
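The percentages in the planarian study are all computed relative to the 116 nervous system-related clones, not to the 3,101 ESTs. A short calculation with the counts given in the text makes the denominators explicit (variable names are illustrative):

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


nonredundant_ests = 3101
nervous_system_like = 116
shared_with_animals = 110       # shared with H. sapiens, D. melanogaster and C. elegans
shared_with_plant_yeast = 35    # homologs also in A. thaliana and S. cerevisiae

print(f"{percent(nervous_system_like, nonredundant_ests):.1f}% of the ESTs are nervous system-related")
print(f"{percent(shared_with_animals, nervous_system_like):.0f}% of those are shared with the three animals")
print(f"{percent(shared_with_plant_yeast, nervous_system_like):.0f}% of those also occur in plant and yeast")
```

This gives about 3.7%, 95%, and 30%; the last two match the figures reported by Mineta et al. (2003).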
Other recent genome studies involving domestic animals include a 1-Mb resolution radiation hybrid map of the canine genome (Guyon et al. 2003); and a detailed physical map of the horse Y chromosome (Raudsepp et al. 2004).
Are there marked differences in the rates of change along different eukaryotic lineages? Table IV presents some information on this point. As shown there, the rate of substitutions and the number of bases involved in the events are about twice as high in the rodent lineage that gave rise to M. musculus and R. norvegicus as in the human lineage. Regardless of lineage, deletion events are two to three times as common as insertion events, and the number of bases involved is three to four times higher in deletions than in insertions.

GENOME VARIATION IN ORGANELLES
Mitochondria are cytoplasmic organelles that originated from a symbiosis involving bacteria, which led to the formation of the primitive eukaryotic cell. They are mainly involved in oxidative phosphorylation, and in fact the number of mitochondria per cell is closely correlated with energy requirements. Their structure also varies widely during the cell cycle. No fewer than 292 mitochondrial genomes (mtDNAs) had been completely sequenced by around 2003, and their sizes (ranges and averages) are listed in Table V according to taxonomic category. The largest genomes are found in plants (average of 207 kb), and the smallest among Protists (38 kb). In Animalia the variability is not extensive. Despite the extreme values of 13.5 kb in Taenia crassiceps, a platyhelminth, and 22.7 kb in Venerupis philippinarum, a mollusk, the averages in the several phyla generally lie around 16 kb, with no clear connection with phylogeny. The value for Homo sapiens is 16.6 kb.
The mitochondrial genetic code shows several differences from the universal genetic code, and these differences vary from phylum to phylum. Protist mtDNAs generally resemble plant mtDNAs rather than animal or fungal ones. Despite the wide variation in size (19.4-100.3 kb), the informational content of the mitochondrial genome in fungi is quite constant. Impressive diversity occurs in plants, whose mitochondrial genome may consist of a single large circular molecule (the master chromosome) as well as of a heterogeneous population of linear, circular, and more complex structures. Animal mtDNAs, on the other hand, are characterized by a compact arrangement, constancy of gene content, and a single noncoding region about 1 kb long (Saccone and Pesole 2003).
Plastids are cellular organelles that have also arisen through symbiotic processes, specifically involving a primitive eukaryote and a cyanobacterium-like ancestor. Among plastids the chloroplast is undoubtedly the best studied, since this organelle is the central site of photosynthesis. It has a generally flat, lens-shaped structure and is bounded by a double membrane. Much less information is available for the plastid genome than for mitochondria, but the complete genome sizes of 21 species are given in Table VI.
Everything in life has its price. In organic evolution the price paid to allow more diversification was death. Raff and Kaufman (1983) asserted "By the account of the Old Testament book of Genesis, death was the price of knowledge; more prosaically, it was the price of multicellularity". The most primitive metazoans probably had just two cell types, somatic and germinal. The latter retained most of the properties of their basically immortal unicellular ancestors. The somatic cells, however, showed adhesiveness and could perform only a finite number of cell divisions. The interaction between these two types, determined by a balance between growth factors and cell death (apoptosis), led to organisms of different sizes and shapes. Multicellularity has arisen many times in eukaryotic evolution, with independent origins in the lineages leading to animals, green plants, fungi, cellular slime moulds, and several other taxa. Three features are shared by them: division of labor, differentiation, and cell communication (Brooke and Holland 2003).
A curious organism with properties intermediate between the unicellular and multicellular conditions was described by Keim et al. (2004a, b). It is a spherical, magnetotactic, multicellular prokaryote whose components align themselves along magnetic fields, swim, and divide by binary fission as a unit. Its life cycle is thus completely multicellular, and it differs from other multicellular organisms in that its life cycle does not include a unicellular stage.
How is this reflected in genome complexity? Lynch and Conery (2003) argued that there is an inverse relationship between population density per unit of area and average individual body mass within a species, and also an inverse relationship between organism size and the product of the effective population size (Ne, the number of individuals that effectively matters in evolution) and the mutation rate (u), that is, Ne × u. Therefore, the enormous long-term effective population sizes of prokaryotes would impose a strong barrier to the evolution of complex genomes and morphologies, since in such populations selection efficiently removes the slightly deleterious additions (such as introns and mobile elements) through which genomes grow.
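The argument turns on the efficacy of selection relative to random genetic drift. As a rough rule of population genetics, selection on a mutation with selective coefficient s is effective when Ne × s is well above 1 and largely ineffective when it is well below 1. The sketch below uses 1 as a simplified cutoff and purely hypothetical values of Ne, u, and s (none of them taken from Lynch and Conery 2003) to show why a slightly deleterious addition, such as an extra intron, would be purged in a very large population but could persist in a small one.

```python
def population_mutation_parameter(ne: float, u: float) -> float:
    """The composite parameter Ne x u (effective population size times mutation rate)."""
    return ne * u


def selection_effective(ne: float, s: float) -> bool:
    """Rough rule of thumb: selection outweighs drift when Ne * s > 1 (a simplified cutoff)."""
    return ne * s > 1.0


# Purely hypothetical values, chosen only to illustrate the contrast.
U = 1e-9               # mutation rate per site per generation (assumed)
S_EXTRA_INTRON = 1e-6  # fitness cost of one extra intron (assumed)
for label, ne in [("large population", 1e8), ("small population", 1e4)]:
    print(label,
          "| Ne*u =", population_mutation_parameter(ne, U),
          "| extra intron purged by selection:", selection_effective(ne, S_EXTRA_INTRON))
```

With these assumed numbers, Ne × s is 100 in the large population and 0.01 in the small one, so the same slightly costly insertion is eliminated in the first case and left to drift in the second.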
[…] redundancy, and the possibility of evolutionary plasticity.
On the other hand, there are factors that can reduce genome size, especially deletions of different sizes. But quantity is not enough. The way new material was introduced, and its content, certainly matter (quality). It is clear, for instance, that multicellular organisms would need an expansion of the regulatory domain families involved in extracellular adhesion or signal transduction, as well as of DNA-binding transcription factors. In addition to the invention and proliferation of specific protein domains, new architectures created by the juxtaposition of previously existing domains would also be needed (Saccone and Pesole 2003). Other factors are intron number and size. The average number of introns per gene in most multicellular species is between four and seven, while in unicellular eukaryotes it is less than two. In fact, there seems to be a threshold genome size of about 10 Mb below which introns are very rare. Intron length is also important; it is related to the amount of recombination, a relationship probably modulated by selection (Lynch and Conery 2003).
Other aspects have been considered by Wolfe and Li (2003), such as: (a) the origins of new genes (de novo formation, mosaic); (b) asymmetric directional mutation pressure; and (c) effects of genome location on mutation rates.Certainly structure matters, as is indicated by the fact that adjacent genes are co-regulated more often than is expected by chance, and in comparative genomics extensive conservation of gene order occurs.Vinogradov (2004), on the other hand, discussed especially the problem of variation in the amount of noncoding DNA in terms of multilevel selection, mutation bias, and negative and/or positive feedbacks to genome size changes.

COMPARATIVE HUMAN CHROMOSOME GENOMICS
The draft sequence of the human genome was presented with much fanfare in February 2001 (International Human Genome Sequencing Consortium 2001, Venter et al. 2001), but detailed analysis of its contents is continuing quietly in the laboratories. A series of papers is appearing with information on each chromosome of our genome, and four of the main characteristics of the most completely studied chromosomes are given in Table VII. Chromosome 19 is notable, since it presents the highest values in three of the characteristics compared, namely GC content (48%), repeat content (55.7%), and gene density (26.0 genes per Mb). It is also medically important owing to the presence of genes involved in familial hypercholesterolemia and insulin-resistant diabetes (Grimwood et al. 2004). Chromosome 13, the largest acrocentric of the set, is notable, conversely, for its low GC percentage (38.5%) and low gene density (6.5 genes per Mb). In fact, it contains a central region of 38 Mb in which the gene density drops to only 3.1 genes per Mb. This chromosome is also important because it carries several genes involved in cancer (breast cancer, retinoblastoma, B-cell chronic lymphocytic leukemia) (Dunham et al. 2004).
Table VII note: NA, not available in the sources consulted.
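Gene densities translate directly into expected gene counts. Using only the figures quoted for chromosomes 13 and 19, a quick calculation (a sketch; `genes_in_region` is a hypothetical helper) gives roughly 118 genes in the 38 Mb gene-poor region of chromosome 13, and shows that the overall gene density of chromosome 19 is four times that of chromosome 13.

```python
def genes_in_region(length_mb: float, genes_per_mb: float) -> float:
    """Expected number of genes in a region of given length and average gene density."""
    return length_mb * genes_per_mb


# Gene-poor central region of chromosome 13: 38 Mb at 3.1 genes per Mb.
print(round(genes_in_region(38, 3.1)))  # 118
# Whole-chromosome densities: chromosome 19 (26.0 per Mb) versus chromosome 13 (6.5 per Mb).
print(26.0 / 6.5)  # 4.0
```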
Chromosome 22, like chromosome 19, shows a high percentage of GC (47.8%) and a high gene density (16.3 genes per Mb). This chromosome and chromosome 21 were analyzed closely by Chen et al. (2002), who found that: (a) more than 25% of gene structures contain untranslated exons; and (b) the terminal exon and the initial intron tend to be the largest in their categories.
Relatively low and high percentages of repeats have been found in chromosomes 21 (40.1%) and 16 (47.8%), respectively. The former is involved in the etiology of Down syndrome and of one form of Alzheimer's disease, and has been extensively studied by Stylianos E. Antonarakis and his group in Geneva. For instance, Dermitzakis et al. (2002) found 2,262 potentially functional non-genic conserved blocks on this chromosome. As for chromosome 16, it has many segmental duplications, which are particularly clustered along its short arm (Martin et al. 2004).
Chromosome 5 harbors the important protocadherin and interleukin gene families, as well as the duplicated region associated with various forms of spinal muscular atrophy (Schmutz et al. 2004b). Chromosome 6 has the largest transfer RNA gene cluster in the whole genome, and its major histocompatibility complex contains HLA-B, the most polymorphic of our loci (Mungall et al. 2003). Chromosome 7, the seat of the cystic fibrosis gene, exhibits an unusual amount of segmentally duplicated sequences (8.2%), which might have led to a series of evolutionarily important chromosome rearrangements; 440 breakpoints associated with disease have also been identified on it (Hillier et al. 2003, Scherer et al. 2003, Müller et al. 2004).
As for the remaining chromosomes listed in Table VII, chromosome 14 has a heterochromatic short arm that contains essentially ribosomal RNA genes, while its euchromatic long arm contains most, if not all, of the protein-coding genes. Two loci of special importance for the immune system (the alpha/delta T-cell receptor and the immunoglobulin heavy chain), as well as more than 60 disease genes, have been localized on it (Heilig et al. 2003). Chromosome 20 generally has intermediate figures for the characteristics listed in Table VII. It is best known for harboring the genes that cause Creutzfeldt-Jakob disease and severe combined immunodeficiency (Deloukas et al. 2001).
Not listed in Table VII, but also extensively characterized, is the Y chromosome. Its male-specific region comprises 95% of its length and is a mosaic of heterochromatic and euchromatic sequences. The latter have been classified into three classes: X-transposed, X-degenerate, and ampliconic. Together these classes contain all 156 known transcription units, which include 78 protein-coding genes that collectively encode 27 distinct proteins. The most prominent features of the ampliconic region are eight massive palindromes, at least six of which contain testis genes (Skaletsky et al. 2003).

BRAZILIAN CONTRIBUTION TO GENOMICS
Genome research in Brazil began years before the release of the draft sequence of the human genome. It all started in Belo Horizonte in 1992, as a collaborative effort between Sergio D.J. Pena, from the Federal University of Minas Gerais, and Andrew J. Simpson, at the time at the René Rachou Research Center of the Oswaldo Cruz Foundation. They decided to obtain expressed-sequence tags (ESTs), short sequences that uniquely identify expressed genes, of Schistosoma mansoni, the trematode worm responsible for schistosomiasis, an endemic parasitic disease that affects about 12 million Brazilians, especially in rural areas. In this endeavor they received initial logistic and financial support from J. Craig Venter and his Institute for Genomic Research in the USA (Pena 1996).
This project continues to be developed through consortia coordinated in both Belo Horizonte and São Paulo. The real turning point for Brazilian genetics in general, however, came with the complete sequencing of Xylella fastidiosa (using a combination of clone-by-clone shotgun and whole-genome shotgun approaches). This microorganism is responsible for citrus variegated chlorosis, a serious disease of orange trees. When Andrew J. Simpson moved from Belo Horizonte to the Ludwig Institute for Cancer Research in São Paulo, an opportunity arose to develop a project, enthusiastically endorsed by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo), for sequencing the genome of this bacterium. The results were published as the cover article of the 13 July 2000 issue of Nature and received very favorable comments.
The success of the Xylella project (which later led to the sequencing of a US strain; Van Sluys et al. 2003) stimulated several other genome investigations. Those for which information is available are listed in Table VIII. All of them concentrated: (a) on organisms or subjects of major economic or medical interest (agricultural pests, organisms that could produce substances for industrial use, important crops, agents or vectors of diseases, and cancer in humans); and (b) on the functional fraction of the genomes, through the EST approach.
As can be seen in Table VIII, all kingdoms of life are being considered, namely 11 species of Bacteria (C. violaceum, G. diazotrophicus, H. seropedicae, L. xyli, L. interrogans, M. hyopneumoniae, M. synoviae, R. tropici, X. axonopodis, X. campestris, X. fastidiosa), two of Protozoa (L. chagasi, T. cruzi), two of Fungi (C. perniciosa, P. brasiliensis), four of Plants (C. arabica, E. grandis, P. cupana, S. officinalis) and five of Animalia (S. mansoni, L. vannamei, A. aegypti, B. taurus, H. sapiens). Also, all of them are being developed through consortia of laboratories from different institutions, an intelligent way to optimize personnel and equipment resources that also gives less developed centers the opportunity to join more sophisticated units in important investigations.
Other results involved the identification of short interrupted palindromes (sequences that read the same, 5' to 3', on both complementary strands) in the extragenic DNA of three bacterial genomes (Vasconcelos et al. 2000); and a System for Automated  (Almeida et al. 2004).
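The palindrome definition above amounts to saying that a sequence equals its own reverse complement; an interrupted palindrome adds a spacer between the two inverted arms. The Python sketch below scans a sequence for such arms. The arm length and spacer range are hypothetical user-chosen parameters, and this naive scan only illustrates the concept; it is not the procedure used by Vasconcelos et al. (2000).

```python
# Watson-Crick complement for unambiguous bases; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if seq reads the same 5' to 3' on both strands (e.g. GAATTC)."""
    return seq.upper() == reverse_complement(seq)


def find_interrupted_palindromes(seq: str, arm_len: int, min_gap: int, max_gap: int):
    """Return (start, gap) pairs where an arm of length arm_len is followed,
    after a spacer of min_gap..max_gap bases, by its own reverse complement."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm_len - min_gap + 1):
        left = seq[start:start + arm_len]
        target = reverse_complement(left)
        for gap in range(min_gap, max_gap + 1):
            right_start = start + arm_len + gap
            right = seq[right_start:right_start + arm_len]
            if len(right) < arm_len:  # ran off the end of the sequence
                break
            if right == target:
                hits.append((start, gap))
    return hits


if __name__ == "__main__":
    print(is_palindrome("GAATTC"))
    # Hypothetical example: arm ACGTT, spacer GGG, inverted arm AACGT.
    print(find_interrupted_palindromes("ACGTTGGGAACGT", arm_len=5, min_gap=2, max_gap=4))
```

Setting min_gap to 0 recovers ordinary, uninterrupted palindromes as a special case.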

STUDIES OF THE PORTO ALEGRE GROUP
Our research group is working on different aspects of molecular evolutionary change, and information about the most recent results is provided in Tables IX and X. The plant studies have two foci of inquiry: (a) a particularly interesting group of substances, the pathogenesis-related proteins (PRs); and (b) relationships among taxa in two contrasting genera, one (Passiflora) highly variable, the other (Petunia) showing a restricted degree of speciation.
PRs are produced by plants in response to pathological or related situations and provide a good model for testing mechanisms of positive selection. We (Scherer et al. 2005) found evidence for the action of this type of selection at specific sites in five of the 13 PR families investigated. The challenge now is to establish relationships between structure and function, and a start was made by Thompson et al. (2005), who modeled four PR-5 proteins.
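Site-specific positive selection of the kind mentioned above is commonly assessed, in codon-based analyses, through the ratio of nonsynonymous to synonymous substitution rates. The relation below is given only as a general reminder of that criterion; the specific models and tests applied by Scherer et al. (2005) are not described in this section. Here d_N is the number of nonsynonymous substitutions per nonsynonymous site and d_S the number of synonymous substitutions per synonymous site.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega > 1 & \text{positive (diversifying) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega < 1 & \text{negative (purifying) selection}
\end{cases}
```

In site-based approaches, ω is allowed to vary among codons, so that a minority of sites with ω > 1 can be detected even when the protein as a whole is under purifying selection.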
After a first overall phylogenetic analysis of 61 species of Passiflora (Muschner et al. 2003), our group has been focusing on particular cases. One of them is the possible differentiation of P. elegans from P. actinia under the influence of changes that occurred in the Atlantic forest of southern Brazil (Lorenz-Lemke et al. 2005); the other is documenting in detail how P. alata is invading unoccupied areas of this region (Koehler-Santos et al. 2005a, b). As for Petunia, although the 11 species studied do not show marked molecular differences, they can be separated into two complexes whose representatives live at high and low altitudes, respectively. Petunia, together with its closely allied genus Calibrachoa, probably diverged from other clades about 25 million years ago (Kulcheski et al. 2005).
The studies on human populations are listed in Table X. A particularly interesting genomic region occurs on chromosome 19; it codes for the low-density lipoprotein receptor (LDLR) and contains a 3' untranslated region (3' UTR) with multiple copies of the Alu family of repetitive DNA. Its upstream portion showed the highest mutation rate estimated thus far for an autosomal locus in humans (0.632% per million years), possibly due to the action of selection (Fagundes et al. 2005). The results suggest a single origin for the first colonizers of the American continent (Heller et al. 2004), as was also observed in another data set involving L1 and Alu insertions (Mateus Pereira et al. 2005).
Other findings: (a) mtDNA haplogroup X, present among Native North American Indians, was not found in 991 individuals from 25 South American Indian groups, indicating that although it is present in the north, it is probably rare or absent in the south (Dornelles et al. 2005); (b) spatial gradients of APOE alleles are compatible with a directional demographic expansion that occurred in northeastern Asia and much of the New World (Demarchi et al. 2005); and (c) data from 404 STRs and 17 two- to nine-site haplotypes indicate that, although the early colonization of the Americas may have caused some loss of genetic variability, the range of differences found among five Native American populations was twice as large as that found between the most variable Amerindian population (Maya) and a control African (Yoruba) sample (Salzano and Callegari-Jacques 2005a).

UNDERLYING PRINCIPLES
Genomic evolution can be viewed as just one part of the universe's evolution, made possible by the origin of life. The whole process can be envisaged as a permanent struggle between the dialectical agents of change and order. Disordered chaos prevailed at the beginning of life, but the building of structures and the channeling of processes soon set limits to variation. Natural selection was the primary agent driving the evolution of a genetic region or organism in one direction or the other, through mechanisms of positive (favoring novelty) or negative (protecting the status quo) selection. Mutation itself was influenced by selection, which determined rates of change and ruled out certain options.
The structure of life involved the assembly of genes into chromosomes, the longitudinal differentiation of coding and non-coding (regulatory) regions along them, and the establishment of the gene order best suited to protein coding. In relation to the latter, the division into domains provided a degree of flexibility that could not have been obtained otherwise.
Early in the process, the fusion of different kinds of organisms made major transitions possible, changes that opened completely new avenues for the exploration of life. Mitochondria and plastids originated in this way, and they now modulate the whole process of energy production and use.
Small or large? Simple or structured? The choice was largely determined by population size, mode of reproduction, and the immediate physical and biotic environment. The result is the marvelous assemblage of life forms seen in the world today.
The importance of directly studying the genetic material on the scale reviewed here cannot be overemphasized. We are starting to understand whole genomes, their struc-

TABLE III Selected information about the genomes of completely sequenced multicellular eukaryotes. 1
2 Abbreviations: Mb: megabases; NA: not available in the sources consulted; Gb: gigabases; WGS: whole-genome shotgun; CCS: clone-by-clone shotgun.

TABLE IV Midpoint values and their variation in the branches of a phylogenetic tree connecting humans, the common rodent line, mice, and rats. 1
economically important) show little variation (average: 136.5 kb). Notable is the small size (70.0 kb) of Epifagus virginiana, a euasterid.

UNICELLULARITY, MULTICELLULARITY, AND GENOME COMPLEXITY

TABLE VII Characteristics of ten of the most completely studied chromosomes of the human genome. 1
1 Source: Dunham et al. (2004); Martin et al. (2004); Schmutz et al. (2004b).

TABLE IX Recent investigations (papers published or in press) on plant molecular evolution by the Porto Alegre group.
An Acad Bras Cienc (2005) 77 (4)EVOLUTIONARY CHANGES IN PLANTS AND HUMANS