Searching for convergent evolution in manganese superoxide dismutase using hydrophobic cluster analysis.

There are numerous examples of convergent evolution in nature. Major ecological adaptations such as flight, loss of limbs in vertebrates, pesticide resistance and adaptation to a parasitic way of life have all evolved more than once, as seen by their analogous functions in separate taxa. But what about protein evolution? Does the environment influence the intracellular processes carried out by enzymes and other functional proteins strongly enough for similar functional roles to evolve separately in different organisms? Manganese superoxide dismutase (MnSOD) is a manganese-dependent metalloenzyme that plays a crucial role in protecting cells from oxidative stress by eliminating reactive (superoxide) oxygen species. It is a ubiquitous housekeeping enzyme found in nearly all organisms. In this study we compare phylogenies based on MnSOD protein sequences to those based on scores from Hydrophobic Cluster Analysis (HCA). We calculated HCA similarity values for each pair of taxa to obtain a pair-wise distance matrix. A UPGMA tree based on the HCA distance matrix and a common tree based on the primary protein sequence of MnSOD were constructed. Differences between these two trees within animals, Enterobacteriaceae, Planctomycetes and cyanobacteria are presented and cited as possible examples of convergence. We note that several residue changes result in changes in hydrophobicity at positions that apparently are under positive selection.


Introduction
Superoxide dismutases are a group of metalloenzymes that are present in the vast majority of organisms studied thus far. They are responsible for the dismutation of reactive superoxide (an oxygen molecule with an extra electron, $\mathrm{O_2^{-}}$) to molecular oxygen and hydrogen peroxide as follows: $2\,\mathrm{O_2^{-}} + 2\,\mathrm{H_2O} \rightarrow \mathrm{O_2} + \mathrm{H_2O_2} + 2\,\mathrm{OH^{-}}$ (Fridovich, 1995). In concert with catalase, the peroxide is further broken down to $\mathrm{H_2O}$ and $\mathrm{O_2}$ (Belinky et al., 2002). There are three separate superoxide dismutases based on the metal cofactor associated with the enzyme: iron SODs (FeSODs) found in prokaryotes, protists and plants, copper/zinc SODs (Cu/ZnSODs) found in bacteria and the cytosol of eukaryotes, and manganese SODs (MnSODs) found in prokaryotes and the mitochondrial matrix of eukaryotes (Natvig et al., 1996). MnSODs are usually found as homodimers in bacteria or homotetramers in eukaryotes (Natvig et al., 1996). Each MnSOD monomer contains two domains (C- and N-terminal) connected by a loop. The C-terminal domain consists of a three-stranded antiparallel beta-sheet and four helices, while the N-terminal one consists of two helices folded into an antiparallel hairpin, along with a left-handed twist (Guan et al., 1998). It has been shown that SOD acts against oxidative stress and that SOD mutants exhibit additional requirements for methionine and lysine, may have shortened life spans, and are oxygen intolerant (Fridovich, 1995; Alscher et al., 2002). Because of their crucial role in preventing damage by superoxide ions, MnSODs have been found in nearly all species reported to date.
Hydrophobic Cluster Analysis (HCA) was developed for structural comparisons between proteins based on primary sequence information, in cases where an X-ray crystal structure was difficult if not impossible to obtain (Gaboriaud et al., 1987). It has been pointed out that the sequence variation possible with the 20 standard amino acids is much greater than the number of observed variations in protein secondary and tertiary structure. That is to say, common 2-D and 3-D folding patterns can consist of very different amino acid sequences (Lemesle-Varloot et al., 1990). There is, for example, an 80% similarity based on HCA in the hydrophobic regions between the human hemoglobin α-chain and lupin leghemoglobin, while there is only a 15% sequence identity (Gaboriaud et al., 1987). A previous study applied HCA to the analysis of a tandem duplication of superoxide dismutase from the microsporidium Nosema bombycis to look for structural differences between these isozymes (Xiang et al., 2010). The goal was to understand whether this duplication represents the evolution of an SOD with different functional characteristics, or simply a duplication of the same protein to produce more SOD for the same purpose.
The aim of the present study was to compare phylogenies based on a combination of analyses (maximum likelihood, maximum parsimony, neighbor joining and Bayesian) of amino acid sequence data from a wide variety of organisms (48) with one based on values obtained through HCA, in order to look for structural and possible functional convergence within these proteins.

Hydrophobic cluster analysis
The software developed for Hydrophobic Cluster Analysis creates a two-dimensional plot of the protein based on the amino acid sequence. The HCA plots of each MnSOD protein were drawn using the Drawhca program (Woodcock et al., 1992). The sequence is drawn as it would lie on a cylinder, following a classical α-helix with 3.6 amino acids per turn. After five turns, residues i and i + 18 occupy the same position around the cylinder. The cylinder is then cut along its axis and unrolled. Because some adjacent residues would be widely separated by the cut, the representation is duplicated, which makes it easier to follow and allows the environment of each amino acid residue to be visualized. The HCA plot is then defined by encircling the adjacent hydrophobic residues (tryptophans, tyrosines, methionines, phenylalanines, isoleucines, leucines, valines) and marking the prolines (asterisk) and glycines (closed rhomb) as presenting loops and the cysteines as involving disulphide bonds (Gaboriaud et al., 1987).
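The cylinder geometry described above can be illustrated with a short Python sketch. It is not the Drawhca program; it only assumes that 3.6 residues per turn corresponds to a rotation of 100 degrees per residue, and the function names are hypothetical.

```python
# Minimal sketch of the helical-net geometry, NOT the Drawhca implementation.
# Assumption: 3.6 residues per turn means 360 / 3.6 = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100


def helix_position(i):
    """Return (completed turns, angle in degrees) for residue index i >= 0."""
    total_rotation = DEGREES_PER_RESIDUE * i
    return total_rotation // 360, total_rotation % 360


def same_angular_position(i, j):
    """True if residues i and j sit at the same angle around the cylinder."""
    return helix_position(i)[1] == helix_position(j)[1]


if __name__ == "__main__":
    print(helix_position(18))            # (5, 0): five full turns
    print(same_angular_position(3, 21))  # True: residues i and i + 18
```

Integer arithmetic is used so that the "five turns, 18 residues" periodicity is exact rather than subject to floating-point rounding.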
We then developed a numerical score for each pair of amino acid sequences to compare their structural similarity. To do this, Perl scripts were written to align the sequences (FASTA format) based on their correspondence in the HCA plot and to obtain the CR numbers by counting corresponding hydrophobic residues between each pair of sequences. HCA similarity scores were then calculated via the following formula: HCA similarity score (%) = (2CR × 100)/(RC1 + RC2), where RC1 (RC2) is the number of hydrophobic residues in protein 1 (protein 2), and CR is the number of hydrophobic residues that are in correspondence between the two sequences (Gaboriaud et al., 1987).
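As an illustration of the scoring formula, the following Python sketch computes the score for two sequences that have already been aligned. It is not the Perl code used in the study. It assumes that "in correspondence" means an alignment column in which both residues are hydrophobic, that gaps are written as "-", and it uses the seven hydrophobic residues listed for the HCA plot; the function name and the example sequences are hypothetical.

```python
# Minimal sketch of the HCA similarity score, NOT the authors' Perl scripts.
# Assumption: CR counts aligned columns where both residues are hydrophobic.
HYDROPHOBIC = frozenset("WYMFILV")  # Trp, Tyr, Met, Phe, Ile, Leu, Val


def hca_similarity(aligned1, aligned2):
    """Return (2 * CR * 100) / (RC1 + RC2) for two pre-aligned sequences."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    seq1, seq2 = aligned1.upper(), aligned2.upper()
    rc1 = sum(res in HYDROPHOBIC for res in seq1)  # hydrophobic residues, protein 1
    rc2 = sum(res in HYDROPHOBIC for res in seq2)  # hydrophobic residues, protein 2
    cr = sum(a in HYDROPHOBIC and b in HYDROPHOBIC for a, b in zip(seq1, seq2))
    if rc1 + rc2 == 0:
        raise ValueError("no hydrophobic residues in either sequence")
    return 2 * cr * 100 / (rc1 + rc2)


if __name__ == "__main__":
    # Made-up toy sequences: RC1 = 3, RC2 = 4, CR = 3
    print(hca_similarity("MKVLA-G", "MRILAWG"))
```

With the toy sequences above, RC1 = 3, RC2 = 4 and CR = 3, so the score is 600/7, roughly 85.7%.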

Constructing and comparing trees
Based on the HCA similarity scores between each pair of MnSOD proteins, we obtained a distance matrix (distance = 1 - HCA similarity percentage), which was then used to construct an HCA tree using the UPGMA method in PAUP (Swofford, 2002). We also used the nearest neighbor interchange (NNI) search algorithm in PAUP to build a Maximum Parsimony tree from the MnSOD protein sequences, after their amino acid sequences were aligned using CLUSTAL X (Thompson et al., 1997). Based on this multiple sequence alignment, a Neighbor Joining tree was reconstructed using the Poisson model in MEGA 4 (Tamura et al., 2007). The WAG matrix (Whelan and Goldman, 2001) with a gamma value of 1.092, which ProtTest (Abascal et al., 2005) identified as the most suitable substitution model, was used to construct a Maximum Likelihood tree with PhyML (Guindon and Gascuel, 2003). All bootstrap values were obtained using 500 replicates. A phylogram based on Bayesian analysis was also constructed with PhyloBayes 3.3f (Lartillot et al., 2009), using the WAG model with a default gamma distribution and a saving frequency of 100 generations. Because of their similar topologies, the four phylogenetic trees (MP, NJ, ML and Bayesian) were combined into one common tree using the MacClade software package (Maddison and Maddison, 2000).
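The step from similarity scores to a UPGMA topology can be sketched in Python as follows. This is a textbook UPGMA (size-weighted average linkage) written for illustration, not the PAUP implementation. It assumes that the similarity percentage is converted to a fraction before being subtracted from 1, it returns only the topology without branch lengths, and the taxon labels and scores in the example are made up.

```python
# Minimal sketch of distance conversion plus UPGMA clustering, NOT PAUP.
# Assumption: distance = 1 - (HCA similarity in percent) / 100.


def similarity_to_distance(similarity_percent):
    return 1.0 - similarity_percent / 100.0


def upgma(distances):
    """Cluster taxa by UPGMA and return a Newick-style topology string.

    distances maps (name_a, name_b) tuples to distances and must contain
    every unordered pair of at least two distinct taxa exactly once.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {name: 1 for pair in d for name in pair}  # cluster label -> leaf count
    while len(sizes) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of current clusters
        merged = "(%s,%s)" % (a, b)
        updated = {}
        for other in sizes:
            if other in (a, b):
                continue
            # Average of the two old distances, weighted by cluster size
            updated[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        for pair, value in d.items():
            if a not in pair and b not in pair:
                updated[pair] = value  # distances between untouched clusters
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        d = updated
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    scores = {("A", "B"): 90.0, ("A", "C"): 60.0, ("B", "C"): 50.0}  # made-up values
    print(upgma({pair: similarity_to_distance(s) for pair, s in scores.items()}))
```

For the made-up scores above, A and B are joined first and C attaches to that pair, giving the topology ((A,B),C);.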
Visual comparisons between the HCA tree and the common tree were made using the TreeJuxtaposer program (Munzner et al., 2003). This program can be used to browse many trees and compare them side by side, examining topological differences by controlling the BCN (Best Corresponding Node) score. The lowest BCN threshold was chosen to check the differences between each pair of trees.

Selective test
In order to find out whether the observed hydrophobic cluster convergences are under natural selection, DnaSP v5 (Rozas et al., 2003) was used to calculate the ratios of nonsynonymous (Ka) to synonymous (Ks) substitution rates for pairs of genes. Selection acting on individual residues within these genes was also analyzed using the site models (M1a, M2a, M7, and M8) of CODEML in PAML (Yang, 1997). Posterior probabilities were calculated using the Bayes empirical Bayes (BEB) algorithm.

Figure 1 is a phylogenetic comparison of MnSOD based on Maximum Parsimony, Neighbor Joining, Maximum Likelihood, and Bayesian analysis (see Supplementary Material Figures S1-S4 for individual trees) with one based on the pair-wise comparison using hydrophobic cluster analysis of the same protein. The common tree presented in Figure 1 shows relatively high bootstrap values for most of the branch points and high agreement among the analyses performed. Phylogenetic relationships based on the amino acid sequence of this protein (common tree) correspond well with analyses using other genes (Ciccarelli et al., 2006). All three analyses separated the taxa into the three accepted domains of life: Eukarya, Archaea and Bacteria. The eukaryotes are separated into three common kingdoms (Fungi, Animalia, and Plantae). The fungi show a dichotomy into cytosolic and mitochondrial forms of their MnSODs, as previously reported (Frealle et al., 2006). The fungal taxa presented are all Ascomycetes and conform to the relationships presented in earlier work constructing a 6-gene phylogeny (James et al., 2006). Interestingly, the HCA tree conforms better to the 6-gene tree in one minor respect, with Candida albicans and Debaryomyces hansenii being sister taxa (Kurtzman and Robnett, 1998).

Tree comparison
While we did not see differences between the two trees at the higher levels of domain and kingdom, except perhaps for the Archaea (Figure 1), we did see differences from the phylum to the genus level. The red branches shown in Figure 1, especially the five groups of Table 1, highlight significant differences between the common tree and the tree created from HCA values.

Table 1 - Differences between sequence identity scores and HCA similarity scores for MnSOD genes within four groups. Non-synonymous (Ka) and synonymous (Ks) substitution rates for pairwise MnSOD comparisons are also shown. n.a., not available.

Animal relationships
The relationships among the Metazoa, as seen in the common tree, show a close correlation with long-standing conventional wisdom and with previous phylogenies based on other molecular data. We can clearly see from the comparisons of Figure 1 and the scores of Table 1 and Figure 2.
The vertical red box of Figure 2A shows the similarities among the vertebrate proteins. Figure 2B [...] Figure 3B shows the changes in amino acid residues that result in these two differences. In addition, Figure 3B shows a greater degree of amino acid similarity between E. coli and T. ptyseos (88%) than between E. coli and Y. enterocolitica (85%) or between T. ptyseos and Y. enterocolitica (82%) (Table 1). Clearly, Y. enterocolitica and E. coli have more structural similarity based on HCA [...]
The two boxes of Figure 4A, enclosed in red, show regions of similarity between the archaeon C. symbiosum and the Proteobacteria, while C. N. maritimus shows a different contour. Figure 4B shows the amino acid residues (in red) that correspond to these differences, with the hydrophobic residues indicated in blue. We infer from these results that the MnSOD of the archaeon C. symbiosum may function in a manner more similar to the MnSODs of the Planctomycetes and beta-Proteobacteria than to that of the other archaeon, C. N. maritimus.

Blue-green algae
The relationships among the cyanobacteria show high bootstrap values in the common tree and significant differences from the HCA tree. In the common tree, Crocosphaera watsonii is the sister taxon to the remaining cyanobacteria, but based on HCA, C. watsonii is the sister group to Leptolyngbya boryana and Anabaena variabilis (Figure 1). We see evidence for convergence when the amino acid substitutions shown in Figure 5B are compared with the hydrophobic cluster analysis of Figure 5A. Crocosphaera watsonii shows the same contour, outlined by the red boxes of Figure 5A, as A. variabilis and L. boryana, while the amino acid sequences found in these contours (Figure 5B) show only 46% and 47% sequence similarity, respectively (Table 1). In addition, Trichodesmium erythraeum, a taxon more closely related to A. variabilis and L. boryana than C. watsonii is, shows a different contour in Figure 5A than the other three blue-green algae.

Selection analysis
A selection analysis revealing Ka values (nonsynonymous nucleotide changes) lower than Ks values (synonymous nucleotide changes), i.e. Ka/Ks < 1, indicates that nucleotide mutations resulting in amino acid changes are less frequent than silent changes, implying that the respective gene is under constraint against change at the protein level. This is referred to as purifying selection. In contrast, Ka values larger than Ks values (Ka/Ks > 1) imply that the gene is under positive selection, and that a change in amino acid composition from a previous state, at that position, is driven by evolutionary processes. Although some Ks values were not available, all of the obtained Ka/Ks ratios were less than 1 (Table 1), indicating purifying selection for all MnSOD genes in our analysis. Nevertheless, as shown in Table 2, several positively selected residues were detected using PAML. These changes correspond to the HCA plots and sequence alignment maps shown in Figures 2-5, where colored residues show variations in hydrophobicity. They are changes at alanine at position 140 of H. sapiens, valine at position 71 and leucine at position 134 of E. coli, and serine at position 112 and lysine at positions 123 and 213 of C. symbiosum. These residues appear to be under positive selection based on the PAML analysis (boxed residues in Table 2). This selection may contribute to amino acid mutations that result in changes in protein structure and function.
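The interpretation of the Ka/Ks ratio given above can be written as a small decision rule. The Python sketch below is only an illustration and is not part of DnaSP or PAML. The function name is hypothetical, the input values in the example are made up, and treating a ratio of exactly 1 as neutral evolution is the conventional reading rather than something stated in this study.

```python
# Minimal sketch of the Ka/Ks decision rule, NOT DnaSP or PAML output parsing.
# Assumption: missing or zero Ks values are reported as "n.a.", as in Table 1.


def classify_selection(ka, ks):
    """Classify a gene pair from its Ka and Ks substitution rates."""
    if ka is None or ks is None or ks == 0:
        return "n.a."          # ratio cannot be computed
    ratio = ka / ks
    if ratio < 1:
        return "purifying"     # amino acid changes are selected against
    if ratio > 1:
        return "positive"      # amino acid changes are favored
    return "neutral"           # conventional reading of Ka/Ks = 1 (assumption)


if __name__ == "__main__":
    # Made-up values for illustration only
    for ka, ks in [(0.05, 0.50), (0.60, 0.30), (0.10, None)]:
        print(ka, ks, classify_selection(ka, ks))
```

A gene-wide ratio below 1 does not exclude positive selection at individual sites, which is why the site models of CODEML are applied separately.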

Discussion
The HCA tree for MnSODs provides a visual means of looking for taxa in which these proteins may have deviated structurally from their phylogenetic position. Hydrophobic cluster analysis allows us to observe convergent evolution not only by looking for reversions to the same amino acid but also for reversions to the same HCA contour. If this method can be automated, one could search nucleotide and protein databases for similarities in protein structure among unrelated organisms. The question is not whether these differences between phylogeny and HCA are minor or major changes, but whether they represent functional convergence at all. We believe that we may have observed some structural convergence at the molecular level in the blue-green algae, the Archaea, the Enterobacteriaceae and the animals, but no functional correlation can be made at this time. The relationship between these contours and the structure and function of the proteins needs to be verified using X-ray crystal analysis, enzyme kinetics, etc. Since a large amount of superoxide is produced during photosynthesis, one possible study would be to test levels of photoinhibition in various blue-green algae, which may have adapted to different exposures to light, against HCA.
From the standpoint of phylogenetic analysis, HCA only provides a distance matrix, without parameters or statistics to check for accuracy or reproducibility. At the same time, trees based on HCA for the MnSOD proteins follow those based on evolutionary history closely enough that one must wonder about the reason(s) for the noticeable deviations from this relationship.

Figure 1 - Phylogenetic comparison of MnSOD based on Maximum Parsimony, Neighbor Joining, Maximum Likelihood, and Bayesian analysis (see Supplementary Material Figures S1-S4 for individual trees) with one based on the pair-wise comparison using hydrophobic cluster analysis of the same protein.

Figure 2 - Comparison of the HCA structure and amino acid sequence of four vertebrate MnSODs. Note the similarity in HCA between fish (D. rerio) and man (H. sapiens). (A) HCA plot of MnSOD. The vertical red box shows a region of significant differences based on HCA. Amino acid changes resulting in changes in the HCA plot are in red (hydrophilic) or blue (hydrophobic). The phylogram to the left indicates the likely evolutionary relationship of the taxa in order to compare with changes in hydrophobicity. (B) Amino acid sequence alignment of MnSOD. The black background shows regions where two or more amino acids are totally identical, while the gray background indicates changes resulting in similar amino acids. The positions with a red (hydrophilic) or blue (hydrophobic) background show changes in hydrophobicity. The red horizontal box encloses a region corresponding to the region enclosed by the vertical box in A. The green boxes enclose the positively selected residues detected by PAML in Table 2.
Figure 3A is an HCA plot of three members of the Proteobacteria in the family Enterobacteriaceae which show differences between the two phylogenies seen in Fig-

Figure 3 - Comparison of the HCA structure and amino acid sequence of Enterobacteriaceae MnSODs. Note the similarity in HCA between E. coli and Y. enterocolitica. (A) HCA plot of MnSOD. The vertical red boxes show regions of significant differences based on HCA. Amino acid changes resulting in changes in the HCA plot are in red (hydrophilic) or blue (hydrophobic). The phylogram to the left indicates the likely evolutionary relationship of the taxa in order to compare with changes in hydrophobicity. (B) Amino acid sequence alignment of MnSODs. The black background shows regions where two or more amino acids are totally identical, while the gray background indicates changes resulting in similar amino acids. The positions with a red (hydrophilic) or blue (hydrophobic) background show changes in hydrophobicity. The red horizontal boxes enclose regions corresponding to the regions enclosed by the vertical boxes in A. The green boxes enclose the positively selected residues detected by PAML in Table 2.

Table 2 - Positive selection tests for putative convergent MnSOD genes using site models of PAML. Residues in bold have posterior probabilities > 0.95; only values > 0.5 are presented to indicate positively selected sites. Underlined residues correspond to the colored ones in the HCA plots and amino acid alignment figures. (A) Animals (H. sapiens, G. gallus, X. laevis, and D. rerio) in Figure 2; (B) Enterobacteriaceae (E. coli, T. ptyseos, and Y. enterocolitica) in Figure 3; (C) Archaea and relatives (C. symbiosum, C. N. maritimus, D. acidovorans, C. testosteroni, M. flagellatus, B. marina, and P. maris) in Figure 4; (D) Cyanobacteria (A. variabilis, L. boryana, T. erythraeum, and C. watsonii) in Figure 5.