Base excision repair in sugarcane

DNA damage can be induced by a large number of physical and chemical agents from the environment as well as by compounds produced by cellular metabolism. This type of damage can interfere with cellular processes such as replication and transcription, resulting in cell death and/or mutations. The low frequency of mutagenesis in cells is due to the presence of enzymatic pathways which repair damaged DNA. Several DNA repair genes (mainly from bacteria, yeasts and mammals) have been cloned and their products characterized. The high conservation, especially in eukaryotes, of the majority of genes related to DNA repair argues for their importance in the maintenance of life on earth. In plants, our understanding of DNA repair pathways is still very poor: the first plant repair genes were cloned only in 1997 and the mechanisms of action of their products have not yet been characterized. The objective of our data mining work was to identify genes related to the base excision repair (BER) pathway that are present in the database of the Sugarcane Expressed Sequence Tag (SUCEST) Project. This search was performed with the tblastn program. We identified sugarcane clusters homologous to the majority of the BER proteins used in the analysis, and a high degree of conservation was observed. The best results were obtained with BER proteins from Arabidopsis thaliana. For some sugarcane BER genes, the presence of more than one form of mRNA is possible, as shown by the occurrence of more than one homologous EST cluster.


INTRODUCTION
DNA from all living organisms is able to react with a large number of physical and chemical agents as well as with chemical compounds produced by cellular metabolism. The DNA damage induced by these different agents can interfere with cellular processes such as replication and transcription, resulting in cell death and/or mutations. The low frequency of mutation normally seen in cells is due to the presence of enzymatic pathways which repair DNA damage and avoid mutation. The fact that DNA repair pathways are involved in protecting the integrity of genetic information explains the great interest in the study of these pathways. In general, cells deficient in DNA repair mechanisms are more susceptible to mutagenesis, which can predispose the organisms that carry these cells to diseases such as cancer.
DNA repair enzymes are divided into five groups: i) damage reversion repair (DRR), in which a single enzyme is sufficient to directly revert the damage; ii) mismatch repair (MMR), in which the enzymes involved correct mismatched base pairs, principally mismatches introduced by replication errors; iii) recombination repair or double strand break (DSB) repair, mediated by enzymes involved in homologous recombination, which uses the information on the undamaged sister chromatid or homologous regions to bypass lesions, including DNA strand breaks; iv) nucleotide excision repair (NER), which involves enzymes able to remove several types of DNA damage, especially damage that induces large distortions of the double helix. This pathway involves the excision of an oligonucleotide containing the lesion and subsequent repair by DNA synthesis; v) base excision repair (BER), which involves the action of glycosylases that cleave the glycosylic bond between the specific damaged base and the deoxyribose, with subsequent incision by an AP endonuclease (apurinic/apyrimidinic endonuclease) at the resulting abasic site. A single nucleotide or a few nucleotides are then synthesized using the undamaged strand as a template. A review of DNA repair pathways can be found in Eisen and Hanawalt (1999).
In plants, our knowledge of DNA repair pathways is still very poor. The first plant repair genes were only cloned in 1997 and the mechanisms of action of their products have yet to be characterized. In general, the few plant genes known to be involved in DNA repair are from Arabidopsis thaliana, and have shown homology with genes related to different repair pathways in other eukaryotes. For example, the AtXPB gene (the letters At indicate A. thaliana) is homologous to the human XPB gene, which is involved in DNA repair and transcription (Ribeiro et al., 1998). The A. thaliana thi1 gene is homologous to the yeast thi4 gene and seems to be involved in thiamine biosynthesis and mitochondrial DNA damage tolerance (Machado et al., 1996; Machado et al., 1997). The AtMMH gene is homologous to the E. coli mutM gene, which encodes a glycosylase that recognizes some types of oxidative deoxyguanosine damage (Ohtsubo et al., 1998). The A. thaliana UVR3 gene shows homology with 6-4 photolyase genes from Drosophila melanogaster and Xenopus laevis, which act in the repair of 6-4 photoproducts (Nakajima et al., 1998). Two other A. thaliana genes, homologous to the human XPF gene (encoding an endonuclease involved in NER and recombination) and the MutS gene (involved in MMR) respectively, have also been described (Gallego et al., 2000; Ade et al., 1999).
Genome projects such as the Sugarcane Expressed Sequence Tag (SUCEST) project make it possible to identify plant DNA repair genes rapidly. In fact, the high conservation observed between DNA repair genes in eukaryotes has facilitated the identification of genes in different organisms. In the data mining work described in this paper our main interest was to investigate the occurrence of sugarcane BER genes. This DNA repair pathway is mainly involved in the correction of base damage that causes relatively minor distortions to the helical DNA structure, such as deaminated, oxidized, alkylated or even absent bases (for reviews see Krokan et al., 2000; Memisoglu and Samson, 2000; Eisen and Hanawalt, 1999). In general, the inactivation of BER genes leads to a spontaneous mutator phenotype, and their relationships with carcinogenesis and aging have been investigated in several organisms (Boiteux and Radicella, 2000; Boiteux and Radicella, 1999; Kvaloy et al., 2001).
Our search for sugarcane BER genes was performed using the tblastn program (blast = basic local alignment search tool). The sequences of BER proteins were obtained directly from GenBank and used as probes to identify homologous nucleotide sequences in the SUCEST database. A high degree of similarity was observed for the majority of the investigated proteins, indicating the high conservation of BER genes.

MATERIAL AND METHODS
The sequences of BER proteins from bacteria, yeast, humans and plants were obtained directly from GenBank (http://www.ncbi.nlm.nih.gov). These protein sequences were used to identify homologous nucleotide sequences in the SUCEST database (http://www.fapesp.br). Initially, the analyses were performed with the tblastn program. The nucleotide sequences of the clusters identified in the SUCEST database were then submitted to GenBank and analyzed with the blastx program to confirm the homology with BER proteins. For some BER proteins more than one homologous cluster was identified; in these cases, the blastn program was used to compare the nucleotide sequences of the clusters, with the objective of identifying possible alleles or gene duplications.
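As a rough illustration of the first search step described above, the following sketch assembles a tblastn command line. It assumes the command-line flags of the present-day NCBI BLAST+ suite, which is not necessarily the interface used for the original analysis; the file names, the database name `sucest_clusters` and the default e-value cutoff are hypothetical.

```python
import shlex


def tblastn_command(query_fasta, db_name, out_path, evalue=1e-5):
    """Build a BLAST+ tblastn command line.

    tblastn compares protein queries against a nucleotide database
    translated in all six reading frames. All argument values used
    below are hypothetical placeholders, not the original settings.
    """
    return [
        "tblastn",
        "-query", query_fasta,    # FASTA file with the BER protein probes
        "-db", db_name,           # nucleotide database of EST clusters
        "-evalue", str(evalue),   # reporting threshold (assumed value)
        "-outfmt", "6",           # tab-separated tabular output
        "-out", out_path,
    ]


cmd = tblastn_command("ber_proteins.fasta", "sucest_clusters", "ber_vs_sucest.tsv")
print(shlex.join(cmd))
```

The confirmation step with blastx and the cluster-to-cluster comparison with blastn could be scripted in the same way by changing the program name and swapping the query and database roles.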

RESULTS
We extracted from the GenBank database the sequences of several BER proteins, mainly glycosylases and AP endonucleases, from Arabidopsis thaliana (P), Escherichia coli (B), Saccharomyces cerevisiae (Y), Schizosaccharomyces pombe (Y*) and Homo sapiens (H). These protein sequences were used to screen clusters and reads in the SUCEST database, and the results of this screening are given in Table I. As expected, the best e-values were obtained with BER proteins from A. thaliana, and high conservation (e-value ≤ e-15) was observed for the majority of the proteins investigated. These results were confirmed when the nucleotide sequences of the clusters were submitted to GenBank and analyzed with the blastx program, from which we obtained similar e-values for all the clusters identified in the SUCEST database (data not shown). No homology for endonuclease IV, UVDE and TDG was found in sugarcane, nor has any been found in A. thaliana and other plants, raising the possibility that these proteins are not present in the plant kingdom.
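The screening logic described above (keep hits at or below an e-value threshold and report the best cluster for each probe protein) can be sketched as follows. This is a minimal illustration that assumes the standard 12-column tabular output of BLAST+ (`-outfmt 6`); the rows in the example are invented for demonstration and are not values from Table I.

```python
import csv
import io

# Column order of the standard BLAST+ tabular format (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-15):
    """Return {query: (subject, evalue)} for the lowest e-value hit per query,
    ignoring hits whose e-value exceeds max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or evalue < best[query][1]:
            best[query] = (hit["sseqid"], evalue)
    return best


# Hypothetical example rows (not data from the paper).
rows = [
    ["AtUNG", "cluster_A", "60.0", "200", "80", "0", "1", "200", "1", "600", "3e-19", "95.0"],
    ["AtUNG", "cluster_B", "40.0", "100", "60", "0", "1", "100", "1", "300", "2e-06", "50.0"],
    ["EcNfo", "cluster_C", "30.0", "80", "56", "0", "1", "80", "1", "240", "0.5", "25.0"],
]
example = "\n".join("\t".join(r) for r in rows)
hits = best_hits(example)
print(hits)
```

In this invented example only the first row passes the threshold, so the second probe is reported as having no homologous cluster, which mirrors how absent homologues (e.g. endonuclease IV) would appear in such a screen.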
The number of reads that constitute each cluster and their occurrence in the different cDNA libraries used in this EST genome project are presented in Table II. These results are important because they imply that these genes are expressed in the tissues from which the libraries were constructed. Some BER proteins (e.g. the MAG homologue) are widely distributed while others (e.g. the UNG homologue) have a restricted distribution.
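The kind of summary reported in Table II (reads per cluster and the number of libraries in which they occur) could be tallied along these lines. The cluster names, library labels and the cutoff separating a "wide" from a "restricted" distribution are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def library_profile(read_libraries, wide_cutoff=3):
    """Summarize one cluster from the list of libraries its reads came from.

    wide_cutoff is an arbitrary, assumed threshold: clusters seen in at
    least this many libraries are labeled "wide".
    """
    counts = Counter(read_libraries)
    n_libraries = len(counts)
    return {
        "reads": sum(counts.values()),
        "libraries": n_libraries,
        "distribution": "wide" if n_libraries >= wide_cutoff else "restricted",
    }


# Hypothetical read-to-library assignments (not data from Table II).
clusters = {
    "MAG_like_cluster": ["leaf", "root", "flower", "root", "stem"],
    "UNG_like_cluster": ["flower"],
}
profiles = {name: library_profile(libs) for name, libs in clusters.items()}
for name, profile in profiles.items():
    print(name, profile)
```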
Table III shows the results obtained when the nucleotide sequences of clusters homologous to the same BER protein were compared using the blastn program. Some clusters presented high similarity between their nucleotide sequences. For example, for the AtMMH1 clusters SCPRRT3027A01.g and SCSGAM2105C07.g we obtained an e-value of 0.0, suggesting the probable occurrence of distinct alleles or alternative splicing products. However, other clusters showed no nucleotide similarity even though their alignments with the BER protein overlapped, suggesting the presence of different genes, as observed for the ARP homologues.
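The reasoning applied to the blastn comparisons can be expressed as a simple rule: a very low e-value between two clusters points to alleles or splice variants of one gene, while a poor e-value points to different genes. The sketch below applies such a rule to the three pairwise e-values quoted in the text; the cutoff of 1e-100 is an assumption made for illustration and is not a threshold stated in the paper.

```python
def interpret_pair(evalue, similar_cutoff=1e-100):
    """Classify a blastn comparison between two clusters.

    similar_cutoff is an assumed, illustrative threshold.
    """
    if evalue <= similar_cutoff:
        return "possible alleles or splice variants"
    return "probably different genes"


# Pairwise e-values quoted in the text for the MutM homologous clusters.
pairs = {
    ("SCPRRT3027A01.g", "SCSGAM2105C07.g"): 0.0,
    ("SCCCLR2C01B12.g", "SCEQRT2094G03.g"): 1e-169,
    ("SCCCLR2C01B12.g", "SCSGAM2105C07.g"): 4e-08,
}
calls = {pair: interpret_pair(e) for pair, e in pairs.items()}
for pair, call in calls.items():
    print(pair, call)
```

A rule of this kind only groups clusters; it cannot by itself distinguish alleles from alternative splicing products, which is why functional analysis is still needed.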

DISCUSSION
Glycosylases are enzymes that recognize specific damaged bases. There are at least seven different known glycosylases, each specialized in recognizing specific substrates, that initiate the BER pathway. Some glycosylases also act as 3' AP endonucleases (or β-lyases) (Memisoglu and Samson, 2000). Uracil DNA glycosylase (UNG or UDG) is the main enzyme involved in uracil excision. In DNA, uracil results from cytosine deamination, which gives rise to guanine:uracil (G:U) mismatches that can lead to guanine:cytosine to adenine:thymine (G:C to A:T) transition mutations during replication. Deoxyuridine monophosphate (dUMP) can also be incorporated instead of deoxythymidine monophosphate (dTMP), which may alter the binding of transcription factors to their targets (Verri et al., 1990). UNG homologues have been characterized in many bacterial and eukaryotic species and these proteins have strikingly similar structures and functions (Pearl, 2000). In our work, we identified two distinct clusters (Table I) with high similarity to UNG proteins only. One cluster has 726 bp and aligns best with the N-terminal region of the protein, whereas the other cluster has 804 bp and aligns best with the UNG C-terminal region. Of the UNG genes described in the literature, only mammals have two UNG versions, one nuclear (UNG2) and one mitochondrial (UNG1), which are differentially expressed in tissues and induced during cell division (Slupphaug et al., 1991; Nilsen et al., 1997; Nilsen et al., 2000). The two forms of human UNG differ in their N-terminal regions, UNG2 having a specific amino acid sequence which is not present in UNG1 (Otterlei et al., 1998). The smaller sugarcane cluster shows a higher similarity with UNG2 (e-value 3e-19) than with UNG1 (e-value 5e-18), and the larger cluster has the same e-value (1e-20) for both UNG forms. Each cluster has only one read, and both reads are from the same flower cDNA library (Table II). The sequence of a single A. thaliana UNG gene was obtained through its genome project. It thus seems that sugarcane may be a good model to study UNG activity in plants.
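The route from an unrepaired cytosine deamination to a G:C to A:T transition, mentioned at the start of this paragraph, can be followed in a toy model. The sketch below is a deliberate simplification: strands are short strings, the synthesized strand is written in the same left-to-right order as its template (no reversal), and uracil is assumed to direct the incorporation of adenine, as thymine does. The example sequence is arbitrary.

```python
# Base-pairing rules; U pairs with A, as T does.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def replicate(template):
    """Return the strand synthesized opposite the template
    (written position by position, without reversing)."""
    return "".join(PAIR[base] for base in template)


top = "ATCG"                          # arbitrary example strand; C at index 2 pairs with G
damaged = top[:2] + "U" + top[3:]     # cytosine deaminated to uracil, giving a G:U mismatch
new_bottom = replicate(damaged)       # first replication: A is inserted opposite U
new_top = replicate(new_bottom)       # second replication: T now pairs with that A

print(top, "->", new_top)
```

After two rounds of replication without repair, the original C at the damaged position has become T, so the G:C pair is replaced by an A:T pair in that lineage; excision of the uracil by UNG before replication prevents this outcome.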
The formamidopyrimidine DNA glycosylase (FPG or MutM) plays a fundamental role in the repair of oxidative DNA damage by recognizing oxidized purines such as 7,8-dihydro-8-oxo-deoxyguanosine (8-oxo-dG) and imidazole ring-opened purines (FAPY-G). Deficiency of this BER enzyme in bacteria causes a mutator phenotype with an increase in G:C to T:A transversions, due to the ability of 8-oxo-dG to pair with 2-deoxyadenosine (Boiteux and Radicella, 1999). The MutM proteins have been characterized in many bacterial strains and are highly conserved. In eukaryotes, the only known mutM homologous gene has been found in A. thaliana (AtMMH). This gene encodes two types of mRNA (AtMMH1 and AtMMH2), formed by alternative splicing of exon 8 (Ohtsubo et al., 1998). In the SUCEST database we found four clusters presenting high similarity with MutM proteins (Table I). The best e-values were observed with AtMMH1. The clusters SCCCLR2C01B12.g and SCSGAM2105C07.g showed a similar alignment with the MutM proteins, but when their nucleotide sequences were compared using the blastn program (Table III), a good e-value was not observed (4e-08), which suggests the probable occurrence of two different genes. In contrast, we obtained a high similarity between the clusters SCCCLR2C01B12.g and SCEQRT2094G03.g (e-value = e-169). The same was true of the clusters SCPRRT3027A01.g and SCSGAM2105C07.g (e-value = 0.0). From these results we cannot conclude whether sugarcane has a splicing pattern similar to that found in A. thaliana or whether these clusters represent alleles. It does seem, however, that sugarcane may have two different expressed mutM homologues.
An 8-oxo-dG glycosylase (OGG1) that is functionally analogous to MutM has been characterized in yeasts and mammals (Boiteux and Radicella, 1999). In A. thaliana, a nucleotide sequence homologous to the OGG1 gene was found (by the A. thaliana genome project), although functional assays were not performed. We identified a cluster presenting high similarity with OGG1 in the SUCEST database (Table I). Taken together, these data suggest that both MutM and OGG1 are present in plants. Eisen and Hanawalt (1999) have proposed that OGG1 arose in the common ancestor of eukaryotes and that the MutM found in plants is probably derived from the chloroplast genome. Our results also show a wide distribution of MutM in different cDNA libraries and a high number of reads (Table II), while OGG1 occurred in only two cDNA libraries, with only two reads in one cluster (Table II). In the human genome, seven OGG1 isoforms produced by alternative splicing have been observed, and the expression of this gene is independent of the cell cycle and is not tissue-specific (Nishioka et al., 1999; Bouziane et al., 2000; Radicella et al., 1997). In sugarcane, it is possible that the MutM protein is ubiquitous while expression of the OGG1 gene is restricted to certain tissues by specific control mechanisms.
Endonuclease III (NTH) is one of the most important enzymes in the repair of oxidative damage in pyrimidines, its major substrate being thymine glycol. NTH has been characterized in several organisms and shows a high degree of conservation (Cadet et al., 2000). After screening the SUCEST database with NTH proteins, we identified one cluster with high similarity (Table I). We also observed a wide distribution in the cDNA libraries (Table II). These data are similar to those obtained for the NTH of A. thaliana, which is expressed in different plant tissues (Rodan-Arjona et al., 2000). Plants seem to have different expression patterns for this gene when compared with other eukaryotes. In human cells, the expression of NTH is cell-cycle dependent (Bouziane et al., 2000) and this protein has both a mitochondrial targeting sequence and a nuclear localization signal (Takao et al., 1998). Two NTH homologous genes have been found in yeasts (NTG1 and NTG2). The NTG1 gene is inducible by exposing cells to DNA-damaging agents and its product is present in the nucleus and mitochondria. The NTG2 gene is constitutively expressed and its product is present only in the nucleus (Alseth et al., 1999). It seems that sugarcane has a single NTH homologue locus.
MutY glycosylase is another important enzyme for the repair of oxidative DNA damage, removing adenine when it is paired with 8-oxo-dG, deoxyguanosine or deoxycytidine. Deficiency in this enzyme leads to an increase in G:C to T:A transversions (Boiteux and Radicella, 1999). Although a MutY-like nucleotide sequence from A. thaliana is registered at GenBank, in our work no cluster showing good similarity with MutY proteins was found among the 250,000 EST sequences contained in the SUCEST database. This enzyme has been described in several organisms and its absence in sugarcane is unexpected, but a very low level of MutY expression could explain the absence of its cDNA from the SUCEST database.
The enzyme 3-methyladenine DNA glycosylase (MAG) recognizes and removes a wide variety of alkylated bases, including 3-methyladenine, 7-methyladenine, 3-methylguanine and 7-methylguanine. MAG genes have been cloned from a variety of bacteria and eukaryotes and can be divided into three gene families, all of which are represented in the A. thaliana genome. Sugarcane presents six clusters with homology to the TagI homologue from A. thaliana (MAG2 in Table I), the reads of which are widely distributed in the cDNA libraries (Table II). The blastn analysis showed an e-value of 0.0 for the clusters SCACLB1046E02.g and SCCCRZ3003C09.g (Table III), which suggests that they may be alleles. Although all the clusters show overlapping alignments with the MAG2 protein, no nucleotide sequence similarity was obtained in the remaining comparisons, suggesting the occurrence of five different genes. One cluster homologous to the AlkA-like protein from A. thaliana (MAG3 in Table I) was also found in sugarcane, but its reads had a limited distribution (Table II). Unlike A. thaliana, sugarcane showed no MPG homologue (MAG1 in Table I). The presence of multiple copies of MAG genes and their expression in several tissues suggest that plants are frequently exposed to alkylating agents. Functional assays in A. thaliana demonstrated that its MAG protein, homologous to MPG from mammals, is preferentially expressed in meristematic tissues, developing embryo and endosperm, and organ primordia. This pattern of expression is consistent with a requirement for expression in rapidly dividing tissues. Furthermore, high levels of MAG have also been observed in growing leaves, which undergo a relatively low rate of cell division, suggesting that MAG is required not only for DNA replication but also for cell growth (Shi et al., 1997).
Apurinic/apyrimidinic endonucleases cleave the DNA backbone at sites where bases are missing. Two distinct families are known. One includes the XTH protein (exonuclease III) of E. coli, the RRP1 protein of D. melanogaster, the HAP1/APE1 protein of mammals, the APN2 protein of yeast and the ARP protein of A. thaliana. The other family includes the NFO protein (endonuclease IV) of E. coli and the APN1 protein of yeast. Some other proteins can act as AP endonucleases, but they are usually classified as glycosylases, as is the case for MutM and NTH, which do not function as AP endonucleases on their own. In eukaryotes, the exonuclease III family has multifunctional roles, acting not only as AP endonuclease but also as a reduction-oxidation (redox) mediator, maintaining transcription factors (e.g. Fos and Jun) in an active reduced state (Memisoglu and Samson, 2000; Babiychuk et al., 1994; Evans et al., 2000). In the SUCEST database, we found four clusters homologous to proteins of the exonuclease III family, distributed in different cDNA libraries (Tables I and II). These were confirmed as exonuclease III family proteins when analyzed at GenBank. In addition, these clusters show overlapping alignments with the ARP protein, but strong divergence when compared with each other (Table III). These data suggest that sugarcane may have different forms of exonuclease III-like proteins, a finding consistent with what has been found in other organisms (Hadi and Wilson, 2000; Bennett, 1999).
Sequences homologous to the endonuclease IV family were not found in sugarcane (Table I), nor have they been described in A. thaliana. In general, members of the exonuclease III family are found in most species, whereas members of the endonuclease IV family have a more limited distribution. It has been postulated (Eisen and Hanawalt, 1999) that these gene families are ancient and that the absence of either gene from a particular species is likely to be due to gene loss. However, all species encode at least a homologue of one of the two AP endonucleases, so the loss of one of these is tolerated, but not the loss of both (Eisen and Hanawalt, 1999).
In conclusion, our results suggest a high degree of conservation for most of the BER genes in sugarcane. Furthermore, sugarcane seems to have different forms of some genes, such as UNG, MutM, MAG and ARP, which could be the result of gene duplication, alternative splicing or differential promoter utilization. The answers to these questions may be obtained through functional analysis. The mechanisms of DNA repair in plants are as yet poorly understood: very few functional assays have been described for plants, and no work has been published on genes such as OGG1. Thus, genome projects like the SUCEST and A. thaliana projects can provide good models for investigating DNA repair in plants.
supported by the Secretaria da Indústria, do Comércio, da Ciência e da Tecnologia do Estado do Rio Grande do Norte.

Table I - Similarity obtained between BER proteins and clusters from the SUCEST database.
a-P = protein from Arabidopsis thaliana; B = protein from Escherichia coli; Y = protein from Saccharomyces cerevisiae; Y* = protein from Schizosaccharomyces pombe; H = protein from Homo sapiens.

Table II - Occurrence of the probable sugarcane BER proteins in the SUCEST cDNA libraries.

Table III - Analysis of the BER SUCEST clusters with the blastn program.