Cloning and characterization of endo-β-1,4-glucanase genes in the common wheat line three pistils

In this work, we report the cloning and characterization of endo-β-1,4-glucanase (EGase) genes (TaEG) in the common wheat line three pistils. Three TaEG homoeologous genes (TaEG-4A, TaEG-4B and TaEG-4D) were isolated and found to be located on chromosomes 4AL, 4BS and 4DS, respectively. The three genes showed high conservation of their coding nucleotide sequences and 3' untranslated regions. The putative TaEG protein had a molecular mass of 69 kDa, a theoretical pI of 9.39 and an N-terminal transmembrane domain spanning amino acids 74–96 that anchored the protein to the membrane. The genomic sequences of TaEG-4A, TaEG-4B and TaEG-4D contained six exons and five introns. All of the introns, except for intron IV, varied in length and sequence composition. Phylogenetic analysis revealed that TaEG was most closely related to rice (Oryza sativa) OsGLU1. The TaEG transcript levels increased significantly during the subsidiary pistil primordium differentiation phase (spike size ∼7–10 mm) in Chuanmai 28 TP (CM28TP). These data provide a basis for future research into the function of TaEG and offer insights into the molecular mechanism of the three pistils mutation in wheat.


Introduction
Endo-β-1,4-glucanases (EGases) are enzymes that hydrolyze polysaccharides containing a 1,4-β-glucan backbone and are produced by prokaryotes and eukaryotes (Henrissat et al., 1989; Beguin, 1990; Watanabe et al., 1998; Byrne et al., 1999; Rosso et al., 1999). These enzymes have also been classified as cellulases because of their central role in cellulose degradation and their potential to modify cellulose-containing materials (Bisaria and Mishra, 1989; Beguin, 1990; Beguin and Aubert, 1994). Sequence analysis has shown that all cloned plant EGases can be classified into two subfamilies (Nicol et al., 1998). One subfamily consists of soluble secreted EGases: some, such as avocado Cel1, pepper CX1 and tomato Cel2, are expressed specifically during fruit ripening (Christoffersen et al., 1984; Lashbrook et al., 1994; Ferrarese et al., 1995), while others, such as bean BAC1, soybean SAC1 and elder JET1, are associated with abscission (Tucker et al., 1988; Kemmerer and Tucker, 1994; Taylor et al., 1994). The other subfamily consists of membrane-anchored proteins located in the plasma membrane that contribute to normal cellulose formation (Brummell et al., 1997a; Nicol et al., 1998; Del Campillo, 1999; Molhoj et al., 2001). This subfamily includes the Arabidopsis KOR gene and the O. sativa OsGLU1 gene, which are involved in cell elongation (Nicol et al., 1998; Lane et al., 2001; Zhou et al., 2006).
The common wheat (Triticum aestivum L.) line three pistils (TP), which was selected by Peng (2003) from the "trigrain" wheat cultivar, is a valuable mutant for improving wheat yield. Since the TP mutant has normal spike morphology but produces three pistils per floret, it has potential for increased grain number per spike. In a previous study, an expressed sequence tag (EST) was identified in the TP mutant of wheat using an annealing control primer (ACP) system (Yang et al., 2011). Blast searches of this sequence against the GenBank database revealed that it was homologous to the EGase gene. To our knowledge, the role of EGase genes in wheat development has not yet been reported.
As part of an investigation into the function of the wheat EGase gene (TaEG) in pistil development, in this work we cloned, characterized and phylogenetically analyzed the TaEG gene in the common wheat line three pistils. We also examined the expression patterns of this gene at different developmental stages of young spikes.

DNA and RNA isolation
Genomic DNA was isolated from young spikes of CM28TP and CM28 and fresh leaves of the NT and Dt lines, as described by Porebski et al. (1997). The DNA was dissolved in Tris-EDTA (TE) buffer and stored at -20°C. Total RNA was isolated from young spikes as described by Manickavelu et al. (2007). The RNA was dissolved in RNase-free double distilled water (ddH2O) and stored at -70°C. The quality of the DNA and RNA was confirmed by agarose gel electrophoresis and the nucleic acid concentrations were determined spectrophotometrically based on the 260/280 nm absorbance ratio.

Cloning and chromosomal mapping of TaEG
A TaEG EST (DETP-3) was initially identified in the three pistils mutant of wheat using an annealing control primer (ACP) system (Yang et al., 2011). Blast search analysis indicated that a full-length cDNA (GenBank ID: AK373303) in barley shared perfect identity with the TaEG EST. The PCR primer pair TaEG-1 was designed with Primer Premier 5.0 based on the cDNA of AK373303 (Table 1). The cDNA and genomic DNA sequences were amplified using the same primers (TaEG-1). PCR amplification was done in a thermocycler (My-Cycle, Bio-Rad, San Diego, CA, USA) in a volume of 50 µL containing 100 ng of genomic DNA or reverse transcriptase reaction product (see below for details), 100 µM of each dNTP, 1.5 mM Mg2+, 1 U of Taq DNA polymerase, 0.4 µM of each primer and 1× PCR buffer. The PCR cycling conditions included pre-denaturation at 94°C for 5 min followed by 35 cycles of 94°C for 30 s, 60°C for 30 s and 72°C for 4 min, and a final extension at 72°C for 10 min. The amplified products were visualized by gel electrophoresis in 1% agarose gels and then documented with a Gel Doc 2000™ system (Bio-Rad). The target DNA bands were recovered and purified from the gels using QIAquick Gel Extraction kits (QIAGEN, Shanghai, China). The purified PCR products were cloned into the pMD-19T vector according to the manufacturer's instructions (TaKaRa, Dalian, China). Transformants were plated on LB agar containing ampicillin. Clones with inserts were identified using blue/white colony selection. Positive clones were then screened and sequenced by Taihe Biotechnology Co. Ltd. (Beijing, China).
The chromosomal location of TaEG was mapped using the genomic DNA of 36 Dt lines and 5 NT lines (see above). The authenticity of the Dt lines and NT lines was confirmed with SSR markers before use. The gene-specific primers for TaEG-IN3 (Table 1) were designed based on an intron that shows clear sequence variation among the A, B and D genomes (Figure 1).

Molecular characterization of TaEG
The sequence data were analyzed with GenScan software. The open reading frame (ORF) of the cDNA sequence was searched using ORF finder software. A computation tool from the Swiss-Prot/TrEMBL entries was used to calculate the isoelectric point (pI) and molecular mass (Mr) of TaEG. The deduced TaEG sequence was investigated for the presence of signal peptide cleavage sites and transmembrane helices using software available through Yang et al. 401

Multiple sequence alignment and phylogenetic analysis
The deduced amino acid sequence of TaEG was aligned with the EGase genes reported for other species using the ClustalW program (Thompson et al., 1994). Neighbor-joining trees of the EGase genes were generated with MEGA software version 5.0 (Saitou and Nei, 1987; Tamura et al., 2007).
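Pairwise identity values of the kind reported for aligned sequences can be computed directly from an alignment. The following minimal Python sketch shows one way to do this; the function name and the toy sequences are hypothetical, and the choice to ignore gapped columns is an assumption, since identity can be defined with other denominators and the text does not state which definition was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned fragments (hypothetical, not TaEG or OsGLU1 sequence)
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLAK-R"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Here 8 of the 9 ungapped columns match. Counting gaps as mismatches instead would lower the value, so the definition should be stated alongside any reported percentage.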

Real-time PCR
Total RNA was isolated from spikes of the near-isogenic line CM28TP and its recurrent parent, CM28, at various developmental stages (2-5, 5-7, 7-10, 10-13, 13-15, 15-17 and 17-20 mm in length). The cDNA was synthesized using a PrimeScript Perfect real-time RT reagent kit (TaKaRa, Dalian, China). The primers for TaEG-2 (Table 1) were designed using Primer Express 2.0 software to amplify 101-bp fragments of the TaEG gene. Real-time assays were done with SYBR green dye (TaKaRa) using a Bio-Rad CFX96 real-time PCR platform. All of the samples were analyzed in triplicate and the fold change in RNA transcripts was calculated by the 2^(-ΔΔCt) method (Livak and Schmittgen, 2001) with the wheat housekeeping genes ubiquitin (DQ086482) and actin (AB181911) as internal controls (Hama et al., 2004; Yamada et al., 2009).
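As an illustration of the 2^(-ΔΔCt) calculation (Livak and Schmittgen, 2001), the following minimal Python sketch computes a fold change from triplicate Ct values. The Ct values and the function name are hypothetical and are not data from this study. The sketch uses a single reference-gene Ct series per sample; how the two internal controls were combined is not described in the text.

```python
from statistics import mean

def fold_change(target_test, ref_test, target_ctrl, ref_ctrl):
    """Relative expression of a target gene by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values.
    """
    d_ct_test = mean(target_test) - mean(ref_test)  # dCt in the test sample
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)  # dCt in the control sample
    dd_ct = d_ct_test - d_ct_ctrl
    return 2.0 ** (-dd_ct)

# Hypothetical triplicate Ct values (not measured data)
taeg_cm28tp = [24.0, 24.1, 23.9]  # target gene, test line
ref_cm28tp = [18.0, 18.1, 17.9]   # reference gene, test line
taeg_cm28 = [26.5, 26.4, 26.6]    # target gene, control line
ref_cm28 = [18.0, 17.9, 18.1]     # reference gene, control line

fc = fold_change(taeg_cm28tp, ref_cm28tp, taeg_cm28, ref_cm28)
print(f"fold change = {fc:.2f}")
```

With these made-up values ΔΔCt is -2.5, so the target transcript is about 2^2.5 times more abundant in the test sample than in the control.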

Results
Cloning and chromosomal location of genomic and cDNA sequences
Cloning followed by PCR product analysis resulted in the identification and isolation of three genomic sequences (Figure 2). Further analysis showed that the longest PCR product (3625 bp) was located on the long arm of chromosome 4A, the intermediate product (3574 bp) on the short arm of chromosome 4D and the shortest product (3510 bp) on the short arm of chromosome 4B. The three homoeologous genes were tentatively designated as TaEG-4A, TaEG-4D and TaEG-4B, respectively. The sequences were deposited in GenBank under accession numbers KC521526, KC521527 and KC521528.
The cDNA from young spikes of CM28TP amplified with the TaEG-1 primer pair yielded a fragment of ~2.1 kb. The PCR product was cloned into the pMD-19T vector and 15 clones were sequenced. Three distinct sequences of 2061 bp, 2084 bp and 2087 bp were identified (accession numbers: AC521523, AC521524 and AC521525, respectively), indicating that the band of ~2.1 kb included comigrating cDNAs of different homoeologous genes. The three homoeologous genes shared 100% identity in their coding and 3' untranslated regions. The lengths of the 5' untranslated region were 103 bp, 126 bp and 129 bp for TaEG-4A, TaEG-4B and TaEG-4D, respectively. The open reading frame (ORF) of TaEG was 1869 bp and coded for a deduced protein of 622 amino acids. Primary structure analysis using Swiss-Prot/TrEMBL revealed that the molecular mass of the putative TaEG protein was 69 kDa with a theoretical pI of 9.39. The putative TaEG protein lacked signal peptide cleavage sites but had a transmembrane domain spanning amino acids 74-96. BLAST searches showed that TaEG belonged to glycosyl hydrolase family 9 (Henrissat et al., 1989). Sequence alignment suggested that TaEG had high homology with other plant EGase proteins, all of which contained four Cys residues and two putative glycosyl hydrolase active sites, namely, 'SY-VG-G-YP-VHHR' and 'F-DVR-N-NYTE-TLAGAN' (Figure 3).
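The reported lengths can be cross-checked with simple arithmetic. The Python sketch below (variable names are illustrative) confirms that a 1869-bp ORF corresponds to 622 residues plus a stop codon, and that subtracting each 5' untranslated region from its cDNA leaves the same length in all three sequences, as expected if the coding and 3' untranslated regions are identical. The implied 3' untranslated length is derived here rather than reported in the text, and assumes that each cloned cDNA consists only of 5' untranslated sequence, ORF and 3' untranslated sequence.

```python
ORF_BP = 1869  # open reading frame length reported in the text

# cDNA and 5' UTR lengths (bp) reported for each homoeolog
cdna_bp = {"TaEG-4A": 2061, "TaEG-4B": 2084, "TaEG-4D": 2087}
utr5_bp = {"TaEG-4A": 103, "TaEG-4B": 126, "TaEG-4D": 129}

# Number of codons minus one stop codon gives the protein length
protein_aa = ORF_BP // 3 - 1

# Sequence remaining after the 5' UTR (ORF + 3' UTR) for each gene
downstream_bp = {gene: cdna_bp[gene] - utr5_bp[gene] for gene in cdna_bp}

# Derived, not reported: implied length of the cloned 3' UTR
utr3_bp = {gene: downstream_bp[gene] - ORF_BP for gene in cdna_bp}

print(protein_aa)
print(downstream_bp)
print(utr3_bp)
```

The three downstream lengths agree, so the length differences among the cDNAs are fully accounted for by the 5' untranslated regions.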

Structure of the TaEG genomic sequences
The TaEG-4A, TaEG-4B and TaEG-4D genomic sequences were 3625 bp, 3510 bp and 3574 bp long, respectively; their alignment and comparison with the corresponding cDNA sequences revealed that each of the three homoeologous genes consisted of six exons and five introns (Figure 4). All five introns followed the GT-AG rule but, except for intron IV, varied in length and sequence composition. Intron III was 905 bp, 792 bp and 850 bp long in TaEG-4A, TaEG-4B and TaEG-4D, respectively, making it the longest of the five introns. Intron IV was 85 bp long in all three genes and was the shortest of the five introns. Intron I was 395 bp, 369 bp and 376 bp long in TaEG-4A, TaEG-4B and TaEG-4D, respectively. Intron II showed two lengths (88 bp and 86 bp) among TaEG-4A, TaEG-4B and TaEG-4D, whereas intron V had three lengths, i.e., 91 bp, 92 bp and 90 bp.
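Because the genomic and cDNA fragments were amplified with the same primer pair, each genomic length should equal the corresponding cDNA length plus the five intron lengths. The text does not state which gene carries which intron II and intron V length, so the Python sketch below searches the reported values for combinations that satisfy this identity. The assignment it finds is an inference from the reported lengths under that assumption, not a result stated by the authors; the cDNA lengths are paired with genes through their 5' untranslated region lengths.

```python
from itertools import product

genomic_bp = {"TaEG-4A": 3625, "TaEG-4B": 3510, "TaEG-4D": 3574}
cdna_bp = {"TaEG-4A": 2061, "TaEG-4B": 2084, "TaEG-4D": 2087}

# Intron lengths (bp) stated per gene in the text
intron1 = {"TaEG-4A": 395, "TaEG-4B": 369, "TaEG-4D": 376}
intron3 = {"TaEG-4A": 905, "TaEG-4B": 792, "TaEG-4D": 850}
INTRON4 = 85  # identical in all three genes

# Lengths reported without a per-gene assignment
intron2_options = (88, 86)
intron5_options = (91, 92, 90)

def consistent_pairs(gene):
    """Return (intron II, intron V) pairs that make the lengths add up."""
    known = cdna_bp[gene] + intron1[gene] + intron3[gene] + INTRON4
    remainder = genomic_bp[gene] - known  # bp left for introns II and V
    return [(i2, i5)
            for i2, i5 in product(intron2_options, intron5_options)
            if i2 + i5 == remainder]

assignment = {gene: consistent_pairs(gene) for gene in genomic_bp}
for gene, pairs in assignment.items():
    print(gene, pairs)
```

Under these assumptions each gene admits exactly one combination: intron II would be 88, 88 and 86 bp and intron V 91, 92 and 90 bp in TaEG-4A, TaEG-4B and TaEG-4D, respectively.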

Phylogenetic analysis of TaEG and plant EGases
To construct the phylogenetic tree, EGases from A. thaliana (AtCel1 and KOR), O. sativa (OsGLU1), S. lycopersicum (TomCel2, TomCel3 and TomCel8), P. nigrum (CcCel3), N. tabacum (NtCel2 and NtCel8), P. communis (PcEG2 and Cel1), M. indica (MiCel1), B. napus (Cel16) and G. barbadense (Cel) were analyzed. Figure 5 shows the tree obtained with the neighbor-joining method using MEGA software and 1000 bootstrap replicates. Previous analysis identified two types of EGase in plants, namely, secretory EGase and transmembrane EGase, with the major difference between them being the presence of a signal peptide in the former and a transmembrane domain in the latter (Brummell et al., 1997a). In the present study, these EGases formed two well-resolved clades (clades 1 and 2). Clade 1 contained eight members whose deduced proteins possessed a signal peptide, e.g., TomCel2, NtCel2 and PcEG2. Further inspection of clade 1 revealed the presence of two subclades (subclades 1 and 2), with the major difference between them being the presence (subclade 1) or absence (subclade 2) of a cellulose binding domain (CBD) (Zhou et al., 2004). Clade 2 contained seven members whose deduced proteins had a transmembrane domain, e.g., OsGLU1, Cel16 and KOR. TaEG was located in clade 2 and showed the greatest similarity (83%) to O. sativa OsGLU1. The phylogenetic analysis therefore also indicated that TaEG was a membrane-anchored protein.

TaEG gene expression in developing spikes
Real-time PCR was used to study the pattern of TaEG expression at different developmental stages of young spikes. As shown in Figure 6, the transcript levels in young spikes of CM28TP at the 2-5, 5-7, 10-13, 13-15, 15-17 and 17-20 mm stages were similar to those observed in CM28. However, the transcript levels in young spikes of CM28TP at the 7-10 mm stage were 5.6-fold higher than in CM28 spikes at the same stage.

Discussion
An EST sequence (DETP-3) previously identified in the three pistils mutant of wheat using an annealing control primer (ACP) system was shown to be homologous to the EGase gene (Yang et al., 2011). The EST was therefore tentatively designated as a T. aestivum endoglucanase gene (TaEG). In the present study, cDNA cloning and genome sequencing of TaEG from wheat showed that there are three homoeologous TaEG genes in wheat, namely, TaEG-4A, TaEG-4B and TaEG-4D, located on chromosomes 4AL, 4BS and 4DS, respectively. The three homoeologous genes showed high conservation of the nucleotide coding sequences and 3' untranslated regions, with identities up to 100%.
Figure 4 - Schematic representation of the TaEG gene.
Endo-β-1,4-glucanases have been classified into two subfamilies (Nicol et al., 1998). One subfamily consists of soluble secreted enzymes that contain an N-terminal signal peptide and an EGase domain; these enzymes are located in the periplasm, where they function as cell wall-softening enzymes that modify the cell wall. The other EGase subfamily consists of membrane-anchored proteins that have a predicted N-terminal transmembrane anchor motif (Brummell et al., 1997a). TaEG, like KOR and TomCel3, belongs to the latter subfamily and contains a transmembrane domain located between amino acids 74 and 96. The precise function of membrane-anchored EGases is currently unclear, but these enzymes are located in the plasma membrane and participate in cellulose metabolism in the inner layers of the cell wall (Brummell et al., 1997a; Nicol et al., 1998; Del Campillo, 1999; Molhoj et al., 2001). Some membrane-anchored EGases are involved in cell elongation. In tomato vegetative tissues the highest levels of TomCel3 mRNA were found in the most actively growing cells, in the expanding zones of young hypocotyls and in young expanding leaves (Brummell et al., 1997b).
The Arabidopsis KOR gene was identified from the dwarf mutant KORRIGAN and found to be involved in normal wall assembly and cell elongation (Nicol et al., 1998). In G. barbadense the Cel gene is involved in cotton fiber elongation (Zhu et al., 2011). Our phylogenetic analysis revealed that the TaEG genes clustered most closely with O. sativa OsGLU1. Furthermore, alignment analysis demonstrated that TaEG shared more identity with OsGLU1 (83%) than with other membrane-anchored EGases. These findings suggest that TaEG is an ortholog of O. sativa OsGLU1 and may have similar biological roles. The O. sativa OsGLU1 gene was identified from the dwarf mutant glu and found to affect cell elongation and cellulose and pectin content (Zhou et al., 2006).
Expression analysis showed that the highest levels of TaEG mRNA occurred at the 7-10 mm stage of CM28TP spikes. CM28TP has two more pistils per floret than CM28, and young spikes at the 7-10 mm stage are in the primordium differentiation phase of the subsidiary pistils (Yang et al., 2011). Based on these observations, we suggest that TaEG may contribute to pistil development and could be functionally important in defining the morphology of the three pistils mutant. However, this suggestion requires confirmation through additional experiments.