Research Article CitEST

To gain a better understanding of citrus, 33 cDNA libraries were constructed from different citrus species and genera. Total RNA was extracted from fruits, leaves, flowers, bark, seeds and roots of plants subjected or not to different biotic and abiotic stresses (pathogens and drought), and at several developmental stages. To identify putative promoter sequences, as well as molecular markers that could be useful for breeding programs, one shotgun library was prepared from sweet orange (Citrus sinensis var. Olimpia). In addition, EST libraries were constructed for a citrus pathogen, the oomycete Phytophthora parasitica, in either virulent or avirulent form. A total of 286,559 cDNA clones from citrus were sequenced from their 5’ end, generating 242,790 valid reads. A total of 9,504 sequences were produced from the shotgun library, and the valid reads were assembled using CAP3, yielding 1,131 contigs and 4,083 singletons. A total of 19,200 cDNA clones from P. parasitica were sequenced, resulting in 16,400 valid reads. The set of ESTs generated in this project is, to our knowledge, the largest citrus sequence database in the world.


Introduction
Citrus is an important crop worldwide, with an annual production estimated at over 105 million tons in the period 2000-2004 (FAO, 2005). Brazil is one of the main citrus fruit producing countries, together with the Mediterranean countries, the United States and China. More than two thirds of global citrus fruit production comes from these countries. Processing accounts for approximately one third of citrus fruit production, of which more than 80 percent goes to frozen concentrated orange juice. São Paulo State in Brazil and Florida State in the USA are the two main orange juice producers. Brazil exports approximately 99 percent of its production, while 90 percent of Florida's production is consumed domestically and only 10 percent is exported.
Cultivated citrus species are susceptible to various pathogens including bacteria, fungi, nematodes, viruses and viroids; these are responsible for severe losses worldwide (Deng et al., 2001). Citrus trees are usually grown as a combination of a productive scion variety bud-grafted onto a rootstock variety adapted to soil and environmental conditions (Forment et al., 2005). Citrus breeding is an expensive, long-term process, and citrus breeders have always faced many problems due to the complex genetic background of this crop.
The Citrus genome size (n) is approximately 367,000,000 bp, making it too expensive to be completely sequenced. On the other hand, expressed sequence tags (ESTs) are, today, a fast and inexpensive way of identifying new genes, obtaining data on gene expression and regulation, and reconstructing metabolic maps based on genome data. ESTs of many important crops have been generated, including sugarcane (Vettore et al., 2001, 2003), Populus (Sterky et al., 2004) and Vitis species (da Silva et al., 2005). For citrus, a genomic shotgun library was also constructed in this work, but the focus was on the cDNA libraries, which were constructed from different tissues of citrus species and genera, either in the presence or in the absence of a pathogen or abiotic stress.
Pathogens are also a focus of CitEST. One of the major concerns of the citrus industry is the phytosanitary problem caused by the different pathogens that attack the crop. Two of these pathogens, Xylella fastidiosa and Xanthomonas axonopodis pv. citri, have had their genomes completely sequenced (Simpson et al., 2000; da Silva et al., 2002). In CitEST, EST libraries were constructed for the oomycete Phytophthora parasitica in either virulent or avirulent form.
In this paper, we report the construction and general data obtained from the CitEST libraries.

Biological material
The citrus species and genera as well as all the tissue sources used to construct the cDNA libraries are listed in Table 1. The material was collected at the Centro APTA Citros Sylvio Moreira - IAC (Cordeiropolis, SP, Brazil).

RNA isolation and cDNA library construction
Total RNA was extracted from 1 g of tissue using the TRIzol Reagent, according to the instructions of the manufacturer (Invitrogen). Due to their high carbohydrate content, total RNA was extracted from seeds according to modifications of the procedure described by Naito et al. (1988). P. parasitica was obtained by filtration of liquid culture and total RNA was extracted from 1 g of the filtrate.
Poly A+ RNA was isolated from 0.5 mg of total RNA using the mRNA Isolation System (Promega Corporation, Madison, WI). cDNA libraries were constructed with the SuperScript Plasmid System with Gateway Technology for cDNA Synthesis and Cloning kit (Invitrogen). Complementary DNA was synthesized from mRNA using a primer consisting of an oligo(dT) sequence with a NotI restriction site. SalI adapters were ligated to the blunt-ended cDNA fragments, followed by NotI digestion. The cDNA fragments were size fractionated in Sephacryl S cDNA Size Fractionation Columns (Invitrogen) and cloned into the NotI-SalI restriction site of the pSPORT 1 vector. The ligated cDNA fragments were transformed into E. coli DH5α cells by the ice-cold RbCl/CaCl2 solution method (Hanahan, 1983). White colonies were inoculated in 200 μL of liquid Circle Grow medium (Molecular Biology Certified Bacterial Growth Media, QBiogene - Bio 101 Systems - USA) containing 8% (v/v) glycerol and 100 μg/mL of ampicillin, in 96-well microtiter plates, incubated overnight at 37 °C and stored at -80 °C.
The citrus EST collections were catalogued by two characters indicating the genus and species, followed by two numbers indicating the variety, a character and a number indicating the tissue source, and three numbers indicating the conditions. As an example, TS27-C2-300 designates the library from bark tissue of Citrus sunki grown under greenhouse conditions.
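The naming scheme described above can be illustrated with a short parsing sketch. This is only an illustration under stated assumptions: the field names (species_code, variety, tissue, condition) and the function parse_library_id are hypothetical, and the layout is inferred from the textual description and from the identifiers cited in this paper (for instance TS27-C2-300); it is not part of the CitEST pipeline.

```python
import re
from typing import NamedTuple


class LibraryId(NamedTuple):
    """Fields of a CitEST library identifier (field names are hypothetical)."""
    species_code: str  # two characters: genus and species
    variety: str       # two digits: variety
    tissue: str        # one character and one digit: tissue source
    condition: str     # three digits: growth or stress condition


# Assumed layout, inferred from the text: AA00-A0-000
_LIBRARY_ID = re.compile(r"^([A-Z]{2})(\d{2})-([A-Z]\d)-(\d{3})$")


def parse_library_id(identifier: str) -> LibraryId:
    """Split an identifier such as 'TS27-C2-300' into its four fields."""
    match = _LIBRARY_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised library identifier: {identifier!r}")
    return LibraryId(*match.groups())


if __name__ == "__main__":
    print(parse_library_id("TS27-C2-300"))
```

The sketch deliberately does not translate codes into species, tissue or condition names, since the complete code table is given only in Table 1.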

Shotgun library preparation
For the shotgun library, ten grams of young leaves were collected from Pera Olimpia sweet orange grown in greenhouse and used for total DNA preparation according to Dellaporta et al. (1983). The DNA was purified in a cesium chloride gradient. Twenty micrograms of DNA were partially digested with 0.1 U of Sau3AI (Invitrogen) for 10 min at room temperature. The DNA was separated in a 0.8% agarose gel, and the fragments of 1.5 to 3 kb were isolated with the GFX PCR DNA and Gel Band Purification Kit (GE Healthcare) and cloned into pGEM3Z opened at the BamHI restriction site.

Plasmid DNA minipreparation and sequencing
Plasmid DNA was extracted by the boiling method (Marra et al., 1999). The sizes of the cloned fragments from the EST libraries were evaluated by digesting the plasmid DNA with PvuII. The products were visualized by electrophoresis in 1.2% agarose gels. The sequencing reactions were prepared according to the instructions of the manufacturer (Applied Biosystems) for the DNA sequencing kit Big Dye Terminator cycle sequencing ready reaction v3 and v3.1. Sequencing was done in ABI 3700 and 3730 DNA Analyzers (Applied Biosystems).

Sequence data analysis
The methods used to submit, process and analyze the ESTs are described elsewhere in this issue (Reis et al., in this issue).

Results and Discussion
A total of 33 cDNA libraries were constructed from different tissues of citrus species and genera at different developmental stages (fruits), or under biotic or abiotic stresses (pathogens and drought), as listed in Table 1. A total of 16, 9, 1, 1, 2 and 4 libraries were prepared from leaves, fruits, flowers (mixture of four developmental stages), seeds (from fruits at different developmental stages), roots and bark, respectively. Two cDNA libraries from mycelia of P. parasitica and one shotgun library from leaves of C. sinensis var. Pera Olimpia were also prepared. The fragment size of the cDNA clones was evaluated for each library and ranged from 1 to 2 kb, depending on the library (average fragment size was 1.5 kb). Vettore et al. (2001) reported an average fragment size of 1,250 bp for clones of sugarcane cDNA libraries.
A total of 13 C. sinensis var. Pera cDNA libraries were constructed, mainly because this is the most important cultivar in Brazil. Leaves from plants infected with X. fastidiosa, CTV (Citrus tristeza virus), CiLV-C (Citrus leprosis virus cytoplasmic type) and P. parasitica were used as RNA sources to evaluate the genes induced or repressed by these pathogens. Two cDNA libraries from Rangpur lime (C. limonia) were constructed, because this Citrus species was the most commonly used as rootstock in Brazil. To compare gene expression in different tissues, libraries were constructed using tissues of the same plant.
cDNA libraries from peel of C. sinensis and C. reticulata fruits with diameters of 1, 2.5, 5, 7, 8 and 9 cm and of 1, 2.5 and 5 cm, respectively, were constructed with the purpose of identifying differentially expressed genes in each stage. Factors associated with productivity and fruit quality are strongly dependent on fruit development. Understanding the molecular mechanisms by which citrus plants regulate this complex process represents a unique opportunity to improve our knowledge of the physiological mechanisms determining fruit setting, fruit size, organic acid accumulation, carbon flow, peel color and morphology. A database containing nonredundant sequences from all cDNA libraries was constructed and used to perform comparisons among all libraries and evaluate individual gene expression levels.
The processing of ESTs is a fundamental step to obtain high quality sequence reads from raw sequencer trace data (Kunne et al., 2005). Table 2 presents the general data obtained from each CitEST library. A total of 286,559 cDNA clones from citrus were sequenced from their 5' end, generating 242,790 valid reads. A total of 19,200 cDNA clones from P. parasitica were sequenced, generating 16,400 valid reads. The success index is the percentage of reads with more than 150 bp with Phred quality above 20. Table 2 presents the percentage of efficiency per library (number of valid reads / number of submitted reads x 100). The minimum efficiency was observed in TS27-C2-300 and PT11-C2-301, from bark tissue of C. sunki and P. trifoliata, respectively. The maximum efficiency was observed in CS00-C1-100, the library from leaves collected from trees grown in greenhouse. In this case, the leaves used to construct the library were from new flushes of branches, and therefore more suitable for RNA extraction. The sequences are available at: biotecnologia.centrodecitricultura.br.
Table 3 presents a summary of CitEST data. The average size of reads was 847.48 bp. Forment et al. (2005) reported an average sequence length of 500 nucleotides for a citrus EST collection obtained from 25 cDNA libraries. The average EST length from 26 libraries constructed from different sugarcane tissues was 750 bp (Vettore et al., 2001). The number of clusters per species is also presented in Table 3 and is expressed as (# Contigs + # Singlets). The average efficiency of all libraries was 84.73%. Redundancy is calculated as number of clusters / number of reads, expressed as a percentage. The high values in redundancy obtained for some species are related to the high number of reads sequenced for these particular species, since the libraries are being exhausted.
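The validity criterion and the efficiency formula given above can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline described by Reis et al.: the function names are hypothetical, and the criterion is read here as "more than 150 bases whose Phred score is strictly above 20", which is one possible interpretation of the text. The read totals are the ones reported in this paper.

```python
from typing import Sequence

# Thresholds taken from the text; the strict ">" comparisons are an assumption.
MIN_HIGH_QUALITY_BASES = 150
MIN_PHRED = 20


def is_valid_read(phred_scores: Sequence[int]) -> bool:
    """Return True if more than 150 bases have a Phred score above 20."""
    high_quality = sum(1 for score in phred_scores if score > MIN_PHRED)
    return high_quality > MIN_HIGH_QUALITY_BASES


def efficiency(valid_reads: int, submitted_reads: int) -> float:
    """Efficiency = number of valid reads / number of submitted reads x 100."""
    if submitted_reads <= 0:
        raise ValueError("submitted_reads must be positive")
    return 100.0 * valid_reads / submitted_reads


if __name__ == "__main__":
    # Totals reported in the text for citrus and for P. parasitica.
    print(f"citrus:        {efficiency(242_790, 286_559):.2f}%")
    print(f"P. parasitica: {efficiency(16_400, 19_200):.2f}%")
```

With the citrus totals, the formula gives 84.73%, which agrees with the average efficiency reported for all libraries.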
The efficiency of the genomic library (shotgun) of sweet orange was 78.2%, based on the same quality analysis used for the ESTs. The objective of constructing this library was to identify putative promoter sequences as well as molecular markers that could be useful for breeding programs. We produced a total of 9,504 sequences for this library and the valid reads were assembled using CAP3. In this procedure, we obtained 1,131 contigs and 4,083 singletons. These clusters were analyzed for chloroplast or mitochondrial contamination using the Blast tool to search for similarity against the chloroplast and mitochondrial protein sequences from the Swiss-Prot knowledgebase. This analysis showed that 107 contigs and 203 singletons were, in fact, contaminant sequences from either chloroplast or mitochondria. These encompass a total of 1,652 sequences, which corresponds to 17.38% of the total.
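The shotgun library figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names and the helper percent are hypothetical, and the cluster-level share is a derived quantity that is not stated in the paper.

```python
# Counts reported in the text for the sweet orange shotgun library.
TOTAL_SEQUENCES = 9_504
CONTIGS, SINGLETONS = 1_131, 4_083
CONTAMINANT_CONTIGS, CONTAMINANT_SINGLETONS = 107, 203
CONTAMINANT_SEQUENCES = 1_652


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Clusters are counted as contigs plus singletons, as in Table 3.
clusters = CONTIGS + SINGLETONS
contaminant_clusters = CONTAMINANT_CONTIGS + CONTAMINANT_SINGLETONS

# Share of all shotgun sequences that fall in organellar clusters.
sequence_share = percent(CONTAMINANT_SEQUENCES, TOTAL_SEQUENCES)
# Share of clusters flagged as organellar (derived here, not given in the text).
cluster_share = percent(contaminant_clusters, clusters)

if __name__ == "__main__":
    print(f"clusters: {clusters}, flagged as organellar: {contaminant_clusters}")
    print(f"contaminant sequences: {sequence_share:.2f}% of all sequences")
    print(f"contaminant clusters:  {cluster_share:.2f}% of all clusters")
```

The sequence-level share reproduces the 17.38% reported above. The contaminant sequences represent a much larger fraction of sequences than of clusters, which suggests that the organellar contigs are comparatively deep.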
In reports from other sequencing projects in Brazil, a total of 237,954 sugarcane ESTs (Vettore et al., 2003) and 123,889 Eucalyptus ESTs (Carrer, 2005) were generated. Forment et al. (2005) obtained 22,635 high quality citrus ESTs from 25 citrus libraries covering different tissues, developmental stages and stress conditions. In the CitEST project, a total of 242,790 citrus ESTs were generated. This collection is, to our knowledge, the largest citrus sequence database in the world.
P. parasitica isolates IAC-01/95 G-40 and IAC-01/95 II, avirulent and virulent forms, respectively, were obtained from the collection of the Phytopathological Laboratory of the Centro APTA Citros Sylvio Moreira - IAC.

Table 2 - General data obtained from each CitEST library.