Sugarcane phytocystatins: Identification, classification and expression pattern analysis

The cystatins, tightly binding but reversible inhibitors of cysteine proteinases, constitute a superfamily of evolutionarily related proteins. They have been subdivided into three families: the cystatin family, whose members contain two disulfide bonds; the stefin family, whose members lack disulfide bonds; and the kininogen family, composed of large glycoproteins containing three repeats similar to those found in the cystatin family. Members of the cystatin superfamily occurring in plants are currently known as phytocystatins, defined as proteins lacking disulfide bonds but possessing a conserved N-terminal amino acid sequence (L-A-R-[FY]-A-[VI]-X(3)-N). We used the protein sequences of seven phytocystatins from the Arabidopsis thaliana genome project to search the SUgarCane EST project (SUCEST) database and identified 25 possible sugarcane phytocystatins. Phylogenetic analysis allowed us to cluster these phytocystatins into four distinct groups: (i) those with the characteristic N-terminal consensus; (ii) those with the same consensus plus a long C-terminal extension; (iii) those that lack the consensus but contain the highly conserved QxVxG motif found in all members of the superfamily; and (iv) those that lack both the consensus and the QxVxG motif.


INTRODUCTION
The cystatin superfamily is composed of tight-binding, reversible, competitive inhibitors of cysteine proteinases (Nicklin and Barret, 1984) called cystatins, which exhibit similarities in their amino acid sequences and functions (Barret et al., 1986). The cystatins have been classified into three distinct families: the cystatin family, consisting of small proteins with two disulfide bonds; the stefin family of small proteins (~12 kDa) lacking disulfide bonds; and the kininogen family of large glycoproteins (60-120 kDa) containing three repeats similar to those found in the cystatin family (Barret et al., 1986). The activity of cystatins has been explained by the presence of three contact points with the target proteinase. Studies with human cystatin C indicate that the first point of interaction involves the N-terminal glycyl-containing segment, which is substrate-like and accommodates the S1 and S2 enzyme sub-sites (Abrahamson et al., 1987). The other points of interaction occur on hairpin loops formed by antiparallel sheets. The first loop contains the QxVxG segment (a highly conserved sequence present in the cystatin superfamily), which stabilizes the complex by providing an area of extended contact with the proteinase binding site (Bode et al., 1988). The second loop corresponds to a tryptophan-containing segment that also interacts with the proteinase binding site (Nycander and Bjork, 1990).
Plant cystatins present structural peculiarities, genomic arrangements and intrinsic diversity which, when compared to animal cystatins, justify the creation of a new family, the phytocystatins (Kondo et al., 1991; Margis et al., 1998), within the cystatin superfamily. This is supported by the presence of an N-terminal helix consensus sequence (...-[HYFQ]-N) found only in phytocystatins, by phylogenetic tree analysis, which clusters them in a single branch statistically distinct from the other families, and by their diversity in regard to the position and organization of their introns (Margis et al., 1998).
It is widely assumed that phytocystatins perform a defensive role in plants because of their effects on exogenous proteinases such as those produced by insects and nematodes (Liang et al., 1991; Zhao et al., 1996). The involvement of phytocystatins in plant defense is supported by the observation that transgenic plants expressing them show enhanced resistance to phytophagous organisms (Leple et al., 1995; Urwin et al., 1995), as well as by the fact that they can be induced in tomato and soybean by wounding or methyl-jasmonate (Bolter, 1993; Botella et al., 1996). In seeds, phytocystatin mRNAs show an expression pattern similar to that of major seed storage proteins (Abe et al., 1987; Abe et al., 1992), and this, together with their ability to inhibit endogenous cysteine proteinases, has led to the proposal that they are involved in the regulation of protein turnover during seed development. More recently, the involvement of phytocystatins as modulators in programmed cell death has been reported (Solomon et al., 1999).
In the study presented in this paper, as part of the SUgarCane EST (SUCEST) program to characterize sugarcane expressed sequence tags (ESTs), we have identified sugarcane phytocystatins and described their structural features, localization and putative functions.
The Multiple Alignment Program (MAP) was used to compute a multiple global alignment of sequences using the pairwise method. Its alignment algorithm computes the best overlapping alignment between two sequences without penalizing terminal gaps. This program, designed in a space-efficient manner, allows long sequences to be aligned (Huang, 1994) and was used to align the proteins deduced from the SUCEST clusters with the A. thaliana phytocystatins.
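The idea of an overlapping alignment that does not penalize terminal gaps can be illustrated with a minimal sketch. This is not the MAP implementation: the function name, the linear gap penalty and the match/mismatch scores are assumptions chosen only for illustration, and MAP uses its own scoring scheme and a space-efficient algorithm.

```python
def overlap_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score of an overlapping alignment of a and b.

    Terminal gaps are free; internal gaps cost `gap` per position.
    Scores are illustrative assumptions, not MAP's parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at zero: leading terminal gaps are free.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trailing terminal gaps are free: the alignment may end anywhere
    # in the last row or the last column.
    return max(max(score[-1]), max(row[-1] for row in score))


# The suffix "CGT" of the first sequence overlaps the prefix of the second.
print(overlap_alignment_score("AAACGT", "CGTTT"))
```

Because the unaligned ends cost nothing, a short EST-derived fragment can be placed against a longer protein without being penalized for the residues it does not cover.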
Phylogenetic analysis was performed using the Molecular Evolutionary Genetics Analysis (MEGA) software, version 2.0 (Kumar et al., 2000). The pairwise deletion option was used for the amino acid gaps in the sugarcane phytocystatin multiple alignment, and the phylogenetic tree was obtained by neighbor-joining analysis using the p-distance method. For tree construction, the confidence levels assigned at the various nodes were determined after 5000 replications using the Interior Branch test (Sitnikova et al., 1995). The expression pattern tree was constructed using the unweighted pair group method with averages (UPGMA), with the number of site differences as the distance measure and the pairwise deletion option, both also available in the MEGA software.
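As an illustration of the distance measure, the p-distance with pairwise deletion is simply the proportion of differing residues among the aligned positions where neither sequence has a gap. The sketch below assumes gaps are written as "-" and uses a hypothetical function name; it is not the MEGA code.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites, ignoring positions gapped in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for x, y in zip(seq1, seq2):
        if x == gap or y == gap:
            continue  # pairwise deletion: skip this site for this pair only
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return differences / compared


# Toy aligned fragments (not real phytocystatin sequences).
print(p_distance("ACD-F", "ACE-F"))  # one difference over four compared sites
```

With pairwise deletion a gapped column is dropped only for the pairs it affects, so fragmentary sequences still contribute all of their ungapped sites to every comparison.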

Description of SUCEST cDNA libraries
All sugarcane sequences used in this work, corresponding to sequenced reads and cluster consensi (Phrap option in the database), were obtained from the SUCEST project (http://sucest.lad.dcc.unicamp.br/en/) and derived from cDNA libraries specific for different sugarcane tissues, organs or growth conditions, as follows: apical meristem from mature (AM1) and immature (AM2) plants; 1 cm (FL1) and 5 cm (FL3) flower base; 50 cm (FL4), 20 cm (FL5) and 10 cm (FL8) flower stem; lateral buds (LB1 and LB2); large (LR1) and small (LR2) leaf-root insert libraries; etiolated leaves (LV1); the grouped data of two non-redundant libraries (NRn); the grouped data of three root libraries (RTn); the grouped data of three leaf-root transition zone libraries (RZn); stem bark (SB1); the grouped data of two seed libraries of different insert sizes (SDn); the grouped data of two stem libraries from the first and fourth internodes (STn); libraries derived from calli submitted to a 4-37 °C temperature change and three (CL3), four (CL4), six (CL6) and eight (CL7) hours of a light/dark cycle; and plants infected with the bacteria Gluconacetobacter diazotrophicus (AD1) and Herbaspirillum rubrisubalbicans (HR1).

Identification of SUCEST phytocystatins
The sequences of seven Arabidopsis thaliana phytocystatins and one typical phytocystatin-II from Oryza sativa were used to find homologous SUCEST clusters with a real potential for presenting cysteine proteinase inhibitory activity. The overall analysis, using the tBLASTn algorithm with an e-10 cut-off value, allowed the identification of 25 different consensus clusters that separated into four distinct groups (Table I, Figures 1 and 2).

Relationship of sugarcane phytocystatins
The phylogenetic tree of sugarcane phytocystatin clusters shows that they can be classified into four groups: I) those with the characteristic L-A-R-[FY]-A-[VI]-X(3)-N consensus forming part of the N-terminal helix; II) those with the same consensus plus a long C-terminal extension; III) those that lack the consensus but contain the highly conserved QxVxG motif found in all members of the superfamily; and IV) those without the conserved QxVxG motif (Figures 1 and 2).
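The four group definitions can be expressed as a simple motif scan. The sketch below is only a rule-of-thumb illustration: in this work the groups were derived from the phylogenetic tree, not from such rules, and the function name, the toy sequences and in particular the length threshold used to call a "long C-terminal extension" are assumptions.

```python
import re

# L-A-R-[FY]-A-[VI]-X(3)-N, written as a regular expression.
N_TERMINAL_CONSENSUS = re.compile(r"LAR[FY]A[VI].{3}N")
# QxVxG, where x is any residue.
QXVXG_MOTIF = re.compile(r"Q.V.G")


def classify_phytocystatin(protein, long_form_min_length=150):
    """Assign a group label (I to IV) from the motifs described in the text.

    `long_form_min_length` is a hypothetical cut-off standing in for
    "has a long C-terminal extension"; the source gives no such number.
    """
    has_consensus = N_TERMINAL_CONSENSUS.search(protein) is not None
    has_qxvxg = QXVXG_MOTIF.search(protein) is not None
    if has_consensus:
        return "II" if len(protein) >= long_form_min_length else "I"
    if has_qxvxg:
        return "III"
    return "IV"


# Toy sequences, invented for illustration only.
short_form = "MLARFAVEEHNKQVVAG"
print(classify_phytocystatin(short_form))
print(classify_phytocystatin(short_form + "A" * 200))
print(classify_phytocystatin("MKQVVAGTT"))
print(classify_phytocystatin("MKTTT"))
```

A scan of this kind can flag candidate group membership quickly, but it cannot replace the alignment and tree: the text notes, for example, that cluster SC20 falls in group IV although it retains the QxVxG motif.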
Group I, which includes rice phytocystatin-II, Ath1 and Ath4, has the typical phytocystatin structural organization in the helix and at turns I and II (Margis et al., 1998). The absence of the L-A-R-[FY]-A-[VI]-X(3)-N consensus normally present in the helix has important implications for the maintenance of the three-dimensional organization of these inhibitors because it may introduce conformational changes in members of this group, including a possible change of their target proteinases. The C-terminal extension (of as yet unknown function) present in the members of this group has also been found in other phytocystatins (Margis et al., 1998). Members of group II present the same features but with an extended C-terminal end.
Group III is composed of Ath2, Ath3, Ath5 and sugarcane clusters SC14 to SC19, the principal characteristic of this group being the presence of an extended sequence just after the region corresponding to the N-terminal helix. The genomic analysis of several phytocystatins has demonstrated that this region corresponds to a splicing junction for phytocystatins that present introns (Margis et al., 1998).
The presence of this extra sequence may be due to differential mRNA processing of SC14 and SC15. The real implications of this sequence for the inhibitory activity of phytocystatins are still unknown because no comparative biochemical studies have been carried out with Aths1, Ath5 or any other phytocystatin presenting this structural peculiarity. Group IV, except for cluster SC20, is composed of clusters that do not present the classical QxVxG signature motif at turn I. The detection of these clusters in several SUCEST libraries indicates that they are transcribed, but the conservation of their biological function may be questioned, since this motif is strictly related to the binding of these inhibitors to the active site of the cysteine proteinases and is indispensable for inhibition.

Analysis of sugarcane phytocystatin expression pattern
A preliminary analysis of sugarcane phytocystatin expression patterns was made by direct correlation of the read frequency of each cluster in the different SUCEST cDNA libraries. The expression pattern analysis of the 25 phytocystatins (Table I and Figure 3) revealed that the best represented clusters were SC01, SC09, SC10, SC11, SC13, SC20 and SC21, with eight to 21 reads present in almost all libraries. Of the four sugarcane phytocystatin groups, group III was the least represented, with only 19 reads distributed in nine different libraries. Curiously, group III clusters have a high presence in calli, with all clusters presenting at least one read in this library.
Phytocystatins were originally identified in rice and maize seeds (Kondo et al., 1991). Their direct involvement in storage protein mobilization and oryzain regulation has been described. Here, we have found clusters of all four phytocystatin groups in the seed library, but cluster SC01 was clearly the most abundant and is probably the sugarcane equivalent of rice cystatins I and II, which are involved in the control of protein mobilization during germination.
In order to identify any correlation between the expression pattern of the clusters in the libraries and the group to which they belong, a dendrogram was produced from the data presented in Table I. A matrix of libraries versus the presence/absence of clusters was generated, and this matrix was used to produce the dendrogram of sugarcane phytocystatin expression patterns (Figure 3). A clear clustering of expression pattern was observed for group II members (SC07, SC09, SC10, SC11 and SC13) and for three group IV clusters (SC20, SC22 and SC25).
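The dendrogram construction can be sketched in a few lines: each cluster is represented by a presence/absence profile across libraries, the distance between two clusters is the number of libraries in which they differ, and UPGMA repeatedly joins the closest pair. The profiles and cluster names below are invented for illustration and are not taken from Table I; the function names are likewise hypothetical, and MEGA's own implementation was used for the actual analysis.

```python
def site_differences(p, q):
    """Number of libraries in which two presence/absence profiles differ."""
    return sum(1 for x, y in zip(p, q) if x != y)


def upgma(names, profiles):
    """Return a nested-tuple tree built by UPGMA on site-difference distances."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {
        (i, j): site_differences(profiles[i], profiles[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        # Closest pair; ties are broken by the pair of ids for determinism.
        a, b = min(dist, key=lambda pair: (dist[pair], pair))
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted mean keeps every original member equally weighted.
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del dist[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical profiles: 1 = at least one read in that library, 0 = none.
names = ["cluster_A", "cluster_B", "cluster_C"]
profiles = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
print(upgma(names, profiles))
```

Using presence/absence rather than raw read counts makes the comparison insensitive to library size, at the cost of ignoring how strongly a cluster is represented in each library.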
In the present work we have identified 25 different phytocystatin-related clusters in the SUCEST data bank. The analysis of these SUCEST sequences and their expression patterns shows that sugarcane possesses phytocystatins with new structural features and with a differentiated expression pattern related not only to seeds but also to other organs and tissues.

Figure 1 - Alignment of sugarcane phytocystatin clusters with seven A. thaliana phytocystatins (Ath1 to Ath7) and rice phytocystatin-II (OS 02). Alignment was performed using the MAP algorithm. Phytocystatins were divided into four major groups according to different sequence characteristics. Regions corresponding to the putative phytocystatin N-terminal alpha-helix and to beta-turns I and II are emphasized.

Figure 2 - Phytocystatin phylogenetic tree. The tree was constructed using MEGA software with the following parameters: p-distance, neighbor-joining, pairwise deletion and internal branch test with 5000 replications. The analysis was based on the MAP alignment of 25 sugarcane clusters, seven A. thaliana phytocystatins (Ath1 to Ath7) and rice phytocystatin-II (OS 02). Four major groups with different phytocystatin forms were identified. The values obtained from the internal branch replication (80%) are indicated at the main branch points.

Figure 3 - Dendrogram of sugarcane phytocystatin expression pattern. The tree was constructed using the unweighted pair group method with averages (UPGMA), with the number of site differences as the distance measure and pairwise deletion. MEGA software was used for the analysis.

Table I - Frequency of sugarcane phytocystatin-related reads and clusters in sugarcane cDNA libraries.