Modeling and analysis of soybean (Glycine max. L) Cu/Zn, Mn and Fe superoxide dismutases

Superoxide dismutase (SOD, EC 1.15.1.1) is an important metal-containing antioxidant enzyme that provides the first line of defense against toxic superoxide radicals by catalyzing their dismutation to oxygen and hydrogen peroxide. SOD is classified into four metalloprotein isoforms, namely, Cu/Zn SOD, Mn SOD, Ni SOD and Fe SOD. The structural models of soybean SOD isoforms have not yet been solved. In this study, we describe structural models for soybean Cu/Zn SOD, Mn SOD and Fe SOD and provide insights into the molecular function of this metal-binding enzyme in improving tolerance to oxidative stress in plants.


Introduction
Crop plants are frequently exposed to a variety of abiotic, biotic and xenobiotic stresses that cause injury, limit their growth and adversely affect their productivity. The most common result of such stress is the production of toxic reactive oxygen species (ROS). Increased levels of ROS such as superoxide (O2•−), hydrogen peroxide (H2O2) and the hydroxyl radical (•OH) cause irreparable damage to cellular components such as DNA, proteins and lipids, so additional defense mechanisms are required (Blokhina et al., 2003). Plant responses to ROS toxicity involve the coordinated action of enzymatic and non-enzymatic antioxidant defense systems (Pallavi and Rama Shankar, 2005; Pallavi et al., 2012). Among enzymatic defenses, superoxide dismutase (SOD, EC 1.15.1.1) is the most important enzyme because of its distinct ability to neutralize superoxide anions by dismutating them into O2 and H2O2. SOD is synthesized by all aerobic organisms and also by some air-tolerant and obligate anaerobic organisms (Fink and Scandalios, 2002).
SODs are metalloproteins that occur in four isoforms: copper zinc SOD (Cu/Zn SOD), manganese SOD (Mn SOD), nickel SOD (Ni SOD) and iron SOD (Fe SOD), all of which are highly stable because of their β-barrel structure and low content of α-helices (Renu and Sabarinath, 2004). Almost all eukaryotic organisms synthesize Mn and Cu/Zn SOD whereas Fe SOD is exclusive to plants (Kwang-Hyun et al., 2006). Ni SOD was recently reported in Streptomyces griseus and S. coelicolor (Garcia-Hernández et al., 2002). In the Arabidopsis thaliana genome, genes for three Cu/Zn SODs (cytosolic, chloroplast and peroxisomal), one Mn SOD and three Fe SODs (Fe SOD1, 2 and 3) have been reported, with Fe SOD2 and 3 being chloroplastic. Chloroplast Cu/Zn SOD is located in the thylakoid membranes and Fe SOD2 and 3 are located in the stroma (Myouga et al., 2008).
Soybean (Glycine max. L) is an important legume plant that is widely cultivated for its protein and oil and is considered a miracle crop. Although the role of SOD as an antioxidant enzyme under stress conditions has been studied in different varieties of soybean (Chaitanya et al., 2009; Hassan et al., 2011), the genes that control the expression of these isoforms have not been identified. In this study, we undertook a molecular, structural and phylogenetic analysis of soybean SOD isoforms (Cu/Zn, Mn and Fe SOD) based on homology modeling using A. thaliana SODs. chloroplasts (ATG12520), cytosol (AT1G08830) and peroxisomes (AT5G18100), were used as the query sequences in BLAST searches to identify the corresponding genes in soybean. These nucleotide sequences were assessed for homology at the protein level and used for phylogenetic analysis.

Phylogenetic analysis of sequences
All sequence alignments were done using ClustalW (Aiyar, 2000). Phylogenetic trees were plotted using MEGA 4.0 software with the UPGMA method (Tamura et al., 2007). Soybean genes that separated along with Arabidopsis genes were pooled and their amino acid pattern was analyzed by constructing pretty boxes using Boxshade.
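The UPGMA step can be illustrated with a minimal sketch. This is not the MEGA implementation: the function name `upgma`, the dictionary-based distance matrix and the toy distances in the usage note are assumptions made only for illustration. The sketch repeatedly merges the two closest clusters, places the new node at half their distance, and recomputes distances as size-weighted averages.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster taxa with UPGMA (average linkage) and return a Newick string.

    labels: list of taxon names (hypothetical identifiers in this sketch).
    dist:   dict mapping frozenset({a, b}) -> pairwise distance.
    """
    # Each active cluster maps to (newick_text, number_of_leaves, node_height).
    clusters = {name: (name, 1, 0.0) for name in labels}
    d = dict(dist)  # work on a copy so the caller's matrix is untouched

    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        pair_distance = d[frozenset((a, b))]
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)

        # UPGMA trees are ultrametric: the new node sits at half the distance.
        height = pair_distance / 2.0
        merged = f"({na}:{height - ha:.4f},{nb}:{height - hb:.4f})"
        key = f"{a}+{b}"

        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances.
        for other in clusters:
            d[frozenset((key, other))] = (
                sa * d[frozenset((a, other))] + sb * d[frozenset((b, other))]
            ) / (sa + sb)

        clusters[key] = (merged, sa + sb, height)

    return next(iter(clusters.values()))[0] + ";"
```

As a usage example, calling `upgma(["A", "B", "C"], toy)` with made-up distances `toy = {frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 6.0, frozenset(("B", "C")): 6.0}` joins A and B first and then attaches C. The branch lengths reported in a real analysis depend on the distance model chosen in the software.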

Secondary structure prediction of SOD proteins
The secondary structures of the soybean SOD isoenzymes were predicted using the PSIPRED online server based on the retrieved sequences. PSIPRED incorporates two feed-forward neural networks that analyze the data generated as an output from PSI-BLAST (position-specific iterated BLAST) (Altschul et al., 1997). Validation of the procedure and performance using PSIPRED yielded an average Q3 score of 76.5%.
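For readers unfamiliar with the Q3 measure, the following minimal sketch shows how a three-state accuracy can be computed. The function name `q3_score` and the example strings are hypothetical; the sketch assumes that predicted and observed secondary structures are given as equal-length strings over the states H (helix), E (strand) and C (coil).

```python
def q3_score(predicted, observed):
    """Return the percentage of residues whose 3-state label matches.

    predicted, observed: equal-length strings over H (helix),
    E (strand) and C (coil). Both inputs here are illustrative.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For instance, `q3_score("HHHEECCC", "HHHEEECC")` compares eight residues, seven of which agree. An average Q3 such as the one quoted above is obtained by averaging this value over a benchmark set of proteins.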

3D structure
The 3D structure models for soybean and Arabidopsis SODs were developed using 3D LigandSite. This software was used to predict the exact binding sites of metal ions in the amino acid sequences. Dompred software was used to predict the domains and their boundaries for a given protein sequence.

Quaternary structure prediction
The quaternary structures of SOD proteins in soybean and Arabidopsis were predicted using the protein interfaces and surfaces tool PISA. Assemblies that could form crystals were determined by identifying the sets that represented the solutions indicated in the headings of the appropriate table (see Results). The highest values in the assemblies were considered to be the most appropriate. The MM size indicated the number of macromolecular monomeric units in that particular assembly and corresponded to an oligomeric or multimeric state. A formula was obtained to indicate the chemical composition of the assembly and denoted the number of different monomeric units. The stability of an assembly, i.e., its tendency to dissociate in solution, was also determined. The solvation free energy (ΔGint) was calculated as the difference in solvation energies of the isolated and assembly structures and indicated the free energy gain (kcal/mol) during the formation of an assembly. The free energy of dissociation (ΔGdiss) represented the free energy difference between the associated and dissociated states. Assemblies with ΔGdiss > 0 were considered thermodynamically stable because a positive value means that external energy must be supplied to dissociate the assembly.
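The stability criterion described above can be sketched as a simple filter. The `Assembly` record, its field names and the function `stable_assemblies` are hypothetical and do not correspond to the PISA output format; they only illustrate selecting assemblies with a positive free energy of dissociation and ranking them.

```python
from dataclasses import dataclass


@dataclass
class Assembly:
    """Hypothetical record for one predicted assembly."""
    name: str
    mm_size: int     # number of macromolecular monomeric units
    dg_int: float    # solvation free energy gain on formation, kcal/mol
    dg_diss: float   # free energy of dissociation, kcal/mol


def stable_assemblies(assemblies):
    """Keep assemblies with dg_diss > 0 and list the most stable first."""
    stable = [a for a in assemblies if a.dg_diss > 0]
    return sorted(stable, key=lambda a: a.dg_diss, reverse=True)
```

Given a list of `Assembly` records with made-up energies, `stable_assemblies` drops every entry whose dissociation energy is zero or negative and orders the rest from the largest to the smallest ΔGdiss.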

Model evaluation
The dihedral angles φ vs. ψ of amino acid residues in the protein structures were visualized and analyzed with Ramachandran plots (Ramachandran et al., 1963). The evaluation of models predicted in silico is essential in order to avoid errors resulting from trivial and non-trivial mistakes. To avoid ambiguities and to improve accuracy, the predicted SOD models were evaluated using the ProSA and VADAR web servers. For a specific PDB structure, ProSA calculates the overall quality score and validates a low resolution structure for approximate models using C-alpha atoms of the input structure. The output provides a z-score for the model that indicates the overall model quality; this value was determined from the plot during prediction.
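To make the quantities plotted in a Ramachandran plot concrete, the sketch below computes a torsion angle from four atomic positions. It is a generic geometric calculation, not the VADAR implementation, and the function name `dihedral` and the tuple-based coordinates are assumptions. For a residue i, φ is the torsion defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral(p0, p1, p2, p3):
    """Return the torsion angle (degrees, -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applying `dihedral` to the backbone atoms of every residue of a model yields the (φ, ψ) pairs that are then assigned to favored, allowed or outlier regions.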

Results and Discussion
ROS produced by plants are eliminated by antioxidant defense systems that enhance the tolerance of plants to environmental stress (Min-Lang et al., 2012). In view of the increasing interest in the molecular modeling of the various isoforms of SOD, in this study we investigated the structure of soybean SOD isoforms and examined their phylogenetic relationships.

Phylogenetic analysis
The availability of information from various genome-sequencing projects, cDNA libraries and EST libraries offers the possibility of complementing investigations of gene function in vivo with parallel phylogenetic analyses of multigene families to address their evolution within and across species (Vincentz et al., 2003). Within families, the protein structure and catalytic residues that determine the substrate specificity are generally conserved. Bioinformatics tools are thus useful for the functional analysis of related proteins (Henrissat et al., 2001). However, many sequence-based families are polyspecific, i.e., they include genes that encode proteins with different functions. This reflects gene duplication and evolutionary divergence, with the acquisition of new protein functions (Emanuele et al., 2004). In the present study, the phylogenetic relationships of soybean SOD genes were evaluated with respect to Arabidopsis SOD genes by using the Maximum Composite Likelihood (MCL) approach implemented in MEGA (Tamura et al., 2007).
Phylogenetic analysis of the soybean and Arabidopsis open reading frames (ORFs) provided information on the evolutionary ancestry of all the SOD groups. This analysis showed that SODs segregated into two major clusters, with cytosolic, chloroplast and peroxisomal Cu/Zn SODs in one cluster and Mn SOD and Fe SOD in another. In this tree, soybean SOD TC332577 segregated with chloroplast Cu/Zn SOD, TC287018 with peroxisomal Cu/Zn SOD, TC282951 with the genes of Arabidopsis cytosolic Cu/Zn SOD, TC278165 with Mn SOD and TC278336 with Fe SOD (Figure 1). Arabidopsis and soybean SODs were grouped with the same branch lengths, i.e., 0.0870 for Mn SOD, 0.1257 for Fe SOD, 0.0748 for chloroplast Cu/Zn SOD, and 0.0912 and 0.1675 for cytosolic and peroxisomal clusters, respectively. This homology in grouping reflected the strong similarities in the gene patterns of these two plants.
However, there was a subtle difference in the branch lengths of the major groups. Cu/Zn SOD segregated into a major group whereas the peroxisomal enzyme of both plants grouped together with a branch length of 0.1675. Chloroplast and cytosolic enzymes grouped together with branch lengths of 0.1465 and 0.1301, respectively; they were joined to the peroxisomal enzymes via branch lengths of 0.0714 and 0.0177, respectively. The difference between these branch lengths was < 0.075. Similarly, Mn SOD and Fe SOD grouped with branch lengths of 0.4680 and 0.4294, respectively; these two major groups were linked by a branch length difference of 0.32. The UPGMA tree showed that the difference between two branch lengths at each cluster was ≤ 0.5 and in some cases almost zero. This result shows that these SODs are closely related to each other and that the sequences retrieved were accurate. The UPGMA tree showed that the SOD genes identified here are important and deserve further investigation.

Boxshade analysis
The comparison of homologous protein sequences is the most effective means of identifying common active sites or binding domains. Comparative studies of protein sequences allow the functional relationships among proteins to be determined and are particularly important for homology searches and threading methods in structure prediction. The alignment of multiple protein sequences is a powerful tool for grouping proteins into families and allows subsequent analysis of evolutionary issues (Balasubramanian et al., 2012). In the present study, the pattern of conserved amino acids in the soybean and Arabidopsis SOD protein sequences was studied using the Boxshade server, which split the sequences into two clusters with Cu/Zn SOD forming one cluster and Mn SOD and Fe SOD forming the second cluster (Figure 2). Fe and Mn SODs showed high similarities in sequence and structure. Rice Fe and Mn SODs also share high homology in their amino acid sequences. Mn SOD is the only form of SOD that is essential for the survival of aerobic life and plants. Mn SODs share 65% sequence similarity with each other (Youxiong et al., 2012). The degree of homology was also high among Cu/Zn SOD genes compared to that of Fe and Mn SODs. Our results indicated that the protein sequences from Cu/Zn SOD of chloroplasts, cytosol and peroxisomes had a greater number of conserved amino acid sequences than Mn and Fe SODs. The subcellular and phylogenetic distribution of SODs showed that all three SOD isoforms co-exist only in plants (Bowler et al., 1994). Comparative sequence analysis of the three SOD isoforms suggests that Fe SODs and Mn SODs are more efficient than Cu/Zn SODs, and that Fe and Mn SODs most probably arose from common ancestral enzymes, whereas Cu/Zn SODs evolved separately in eukaryotes (Smith and Doolittle, 1992).

Secondary structure analysis
The secondary structure predictions for soybean and Arabidopsis Cu/Zn, Fe and Mn SOD proteins showed that Mn SODs had a long chain length consisting of α-helices and β-strands (Figure 3). Helices were absent in chloroplast, cytosolic and peroxisomal Cu/Zn SODs of soybean and Arabidopsis and their secondary structures were identical (Table 1). Soybean and Arabidopsis SOD proteins had a similar number of domains but their locations differed. The binding sites of the SOD proteins also differed in both plants (Table 2). The heterogen counts in the SOD genes of both plants were similar with respect to the type of heterogens present in SOD proteins (Table 3).

3D structure analysis
Proteins are complex chemical entities with a large number of variable atoms and a convoluted topology that make their description complicated (Ingale and Chikhale, 2010). The 'indescribable nature' of proteins also makes the quality of an experimentally determined protein structure very difficult to assess. The rapid increase in the number of genomes being sequenced and in the number of genes being deposited in databases means there is a need to identify the protein functions involved in protein interactions that form the basis of defining protein groups. In this study, three-dimensional models of soybean and Arabidopsis Cu/Zn, Mn and Fe SOD proteins were predicted using the software 3D LigandSite. The resulting models displayed excellent global and local stereochemical properties (Figure 4). Blue colored residues were predicted to be part of the binding site. Residue conservation was calculated using the Jensen-Shannon divergence score (Capra and Singh, 2008). The ligands that formed the cluster were used to predict the metal ions shown in the space-filling format (Wass et al., 2010). There was a marked distinction between the metal binding sites and normal sites without space filling that enabled us to locate the coding regions exactly. The structural symmetry of the Cu/Zn SOD groups was identical.
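The idea behind a Jensen-Shannon divergence conservation score can be sketched as follows. This is a simplified illustration, not the scoring code used by 3D LigandSite or by Capra and Singh: it assumes a uniform background distribution, a small pseudocount, base-2 logarithms, and no sequence weighting or gap penalty. The function name `js_divergence` and its parameters are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def js_divergence(column, background=None, pseudocount=1e-6):
    """Score one alignment column; higher values mean stronger conservation.

    column:     string of residues observed at one alignment position.
    background: dict of background frequencies (assumed uniform if omitted).
    """
    # Ignore gaps and any non-standard symbols.
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        return 0.0
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    # Column distribution with a pseudocount so no probability is zero.
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    p = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

    # Mixture of the column and background distributions.
    m = {aa: 0.5 * (p[aa] + background[aa]) for aa in AMINO_ACIDS}

    def kl(a, b):
        # Kullback-Leibler divergence in bits.
        return sum(a[aa] * math.log2(a[aa] / b[aa])
                   for aa in AMINO_ACIDS if a[aa] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(background, m)
```

With this sketch, a fully conserved column such as `"HHHHHH"` scores higher than a mixed column such as `"ACDEFG"`, which is the behavior expected of residues coordinating a metal ion.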
Despite similarities in the secondary structures of cytosolic and peroxisomal Cu/Zn SOD there were many differences between the corresponding genes in both plants. The amino acid patterns were identical in Cu/Zn SOD compared to Mn and Fe SOD. The Fe SOD structure contained more helices, as indicated by the quality index, with more omega aberrations. Cu/Zn SODs showed more homology compared to other models with no helices in their structures. Model evaluation revealed the accuracy of the predicted models and suggested possible errors that were trivial when compared to the overall quality of the proposed structure. Quaternary structural analysis showed that all of the structures are thermodynamically stable, with dissociation energies > 0.

Model evaluation
The Z-score value, a measure of model quality that predicts the total energy of the structure (Wiederstein and Sippl, 2007), was predicted for soybean and Arabidopsis SODs using the PROSA server (Figure 5). The Z-score values for chloroplast, cytosolic and peroxisomal Cu/Zn SODs were -6.33, -7.05 and -6.93, respectively; the corresponding values for Mn SOD and Fe SOD were -7.59 and -7.92, respectively.
The stereochemical quality and exactness of the predicted soybean and Arabidopsis SOD proteins were analyzed using residue-by-residue geometry and the overall geometry of the protein structures was analyzed with Ramachandran plots (Ramachandran et al., 1963). These plots help visualize the dihedral angles φ vs. ψ of amino acid residues in proteins. In a polypeptide, the main chain N-Cα and Cα-C bonds are relatively free to rotate. These rotations are represented by the torsion angles φ and ψ, respectively. Figure 6 shows the quantitative evaluation of soybean and Arabidopsis protein structures done using Ramachandran plots provided by the VADAR web server. Soybean and Arabidopsis Mn SODs shared > 90% structural similarity with an equal number of residues in favored, allowed and outlier regions. There was little variation in the general and proline residue numbers in the αR, αL and β regions of both structures. The prominence of residues in the αR and β regions suggested that the structure was rigid with more right helices. Soybean chloroplast Cu/Zn SOD showed higher quality compared to the Arabidopsis protein, with fewer residues in the outlier region and more residues in the allowed region. The absence of residues in the αL region in both structures was an interesting feature of this protein and indicated that the structure had no left helices. Chloroplast Cu/Zn SOD of both plants had 86% of residues in the favored region (this value was lower than in previously reported SOD structures). More residues were observed in the allowed region of soybean (9%) and Arabidopsis (8.5%) SOD. There were more outliers in the Arabidopsis structure (5%) than in soybean (4%). Cytosolic Cu/Zn SODs from both plants had no residues in their outlier regions. These soybean and Arabidopsis SODs also had an equal number of residues in the favored and allowed regions (95% and 4.5%, respectively).
In both structures, three proline residues were distributed in the αR and β regions. There was a subtle difference in the sparsely populated αL region, where Arabidopsis had one general residue and one proline while in soybean both of the residues were generally in the core region. All of the Cu/Zn structures had fewer residues in the αR region compared to the β region. The Fe SOD and Mn SOD structures had good clustering of residues and a greater number of helices. The structural quality of chloroplast Cu/Zn SOD of both plants was lower than in the remaining structures. In all of the structures, the αL region was much less populated, i.e., few or no residues.
Figure 5 - ProSA-web z-score chimeric protein plot. The z-score indicates overall model quality. The ProSA-web z-scores of all protein chains in PDB were determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length. The plot shows results with a z-score ≤ 10. The z-score for SOD is highlighted as a large dot. The value is within the range of native conformations.
The soybean Fe SOD structure had 95% of its residues in expected regions and a negligible proportion (1.3%) in the outlier region; the corresponding values for Arabidopsis were 93% and 2.4%, respectively. However, Arabidopsis had 5% of its residues in allowed regions whereas soybean had 3.6%. Soybean peroxisomal Cu/Zn SOD had < 89% of its residues in the favored region. The number of residues in allowed and outlier regions was also high, indicating structural aberrations, whereas in Arabidopsis no residues were observed in the outlier region and ~94% of residues were in the favored region, indicating the quality of the structure. Similar modeling and Ramachandran plot analyses to those described here have been used in the structural and functional analysis of spinach antioxidant proteins and the models were evaluated by computational tools (Sahay and Shakya, 2010).
Figure 6 - Validation of SOD structures using Ramachandran plots. The Ramachandran plots revealed that > 90% of SOD amino acid residues from the modeled Arabidopsis structure were incorporated in the favored regions of the plot.

Conclusion
Proteins are ubiquitous molecules that are involved in numerous crucial functions in organisms. Proteins accomplish their functions by positioning specific amino acids at target sites. Knowledge of the structural arrangement of amino acids is very important for understanding the molecular mechanisms by which proteins perform their functions. SOD is an important antioxidant enzyme that provides the first line of defense against ROS toxicity. The accurate and reliable molecular structural analysis of SOD isoenzymes is important for understanding their function in response to oxidative stress. In this study, structural models of soybean Cu/Zn, Mn and Fe SOD were analyzed and compared with those for Arabidopsis. These analyses provided insights into the molecular function of SOD isoenzymes with respect to their interactions with different cellular organelles. Further studies are in progress to understand the possible SOD gene interactions that may improve our understanding of the role of SODs in minimizing ROS toxicity.