A genetic framework for flowering-time pathways in Citrus spp

Floral transition is one of the most drastic changes occurring during the life cycle of a plant. The shoot apical meristem switches from the production of leaves with associated secondary shoot meristems to the production of flower meristems. This transition is abrupt and generally irreversible, suggesting it is regulated by a robust gene regulatory network capable of driving sharp transitions. The moment at which this transition occurs is precisely determined by environmental and endogenous signals. A large number of genes acting within these pathways have been cloned in model herbaceous plants such as Arabidopsis thaliana. In this paper, we report the results of our search in the Citrus expressed sequence tag (CitEST) database for expressed sequence tags (ESTs) showing sequence homology with known elements of flowering-time pathways. We have searched all sequence clusters in the CitEST database and identified more than one hundred Citrus spp sequences that encode putative conserved elements of the autonomous, vernalization, photoperiod response and gibberellic acid-controlled flowering-time pathways. Additionally, we have characterized in silico putative Citrus spp homologs of the Arabidopsis CONSTANS family of transcription factors.


Introduction
When grown from seeds, Citrus seedlings progress through a developmental ontogeny typical for woody perennials, eventually producing a moderately sized tree. After a juvenile period, typically lasting several years, Citrus trees enter the adult phase in which they are capable of continuously producing flowers in addition to vegetative shoots (Krajewski and Rabe, 1995). Flowers can potentially be produced throughout the year, but in most oranges and mandarins grown in temperate environments, the majority of flowers are produced during the spring flush. Thousands of flowers are usually produced on established trees, but only a relatively small proportion develops into fruit. In some varieties, pollination, fertilization and seed development are required for fruit set, while in others, parthenocarpic fruit development can occur. In some cases this is stimulated by pollination (Koltunow et al., 2000).
For a given Citrus species and/or variety, the number of fruit on an individual tree is negatively correlated with final fruit size. Consequently, the tendency for Citrus to exhibit a biennial bearing pattern of different flowering intensities has a significant impact on fruit size at harvest. In "on" years a relatively large number of flowers are produced (and thus many small fruits), while in "off" years relatively few flowers are formed, giving fewer but bigger fruits (Garcia-Luis et al., 1992; Garcia-Luis and Kanduser, 1995; Garcia-Luis et al., 1995). Because of this effect, trees of a particular variety within a geographical area tend to become synchronized in their biennial bearing pattern. While this simplifies management to some extent, it greatly exacerbates the overproduction of small fruit in "on" years. Thus, understanding the molecular regulation of the flowering process is crucial for controlling fruit production in Citrus.
The rapid advances made in understanding Arabidopsis flowering have allowed researchers to begin similar investigations in perennial crops. This knowledge is greatly accelerating flowering research in perennial trees because, at least in a general sense, the same genes appear to be involved in flower initiation, flower formation, and fruit development in all of the important flowering plants. Using the DNA sequence of flowering genes from model plants as a starting point, flowering genes have been successfully isolated from several agriculturally important tree crops, including apple (Yao et al., 1999; Sung et al., 1999; Sung et al., 2000; Kotoda et al., 2000), Citrus (Pillitteri et al., 2004), grape (Boss et al., 2001; Boss et al., 2002), and Eucalyptus (Kyozuka et al., 1997; Southerton et al., 1998; Dornelas et al., 2004; Dornelas and Rodriguez, 2005).
Here we concentrated on the characterization of genes involved in the pathways that lead to the transition from vegetative to reproductive development in Citrus species. With this goal, we used the Arabidopsis sequences of the key proteins of the different developmental pathways that regulate flowering time as bait to search the Citrus expressed sequence tag database (CitEST) for sequences showing homology with known elements of flowering-time pathways. Additionally, we have undertaken an extensive in silico characterization of the putative Citrus homologs of the CONSTANS gene family, which, in Arabidopsis, mediates the cross-talk between the circadian clock and the genes controlling reproductive meristem identity. We have identified Citrus sequences that encode putative conserved elements of the vernalization, photoperiod response, autonomous and gibberellic acid-controlled flowering-time pathways. We expect that our results will contribute to further studies describing how these pathways function in controlling the induction of flowering and thus the biennial fruit bearing pattern in Citrus.

Searching for Citrus EST homologs of Arabidopsis flowering-time genes
The overall goal of this study was to retrieve from the CitEST data set Citrus spp homologs of all genes described as being involved in the control of flowering time, according to the processes shown in Figure 1. To achieve this, data mining in the CitEST database was carried out using published plant gene sequences as bait, as well as keyword searches in the CitEST home page (http://citest.centrodecitricultura.br/). Plant gene sequences used as bait were retrieved from public gene databases (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) using their corresponding accession numbers or by keyword-oriented searches (Mouradov et al., 2002; Izawa et al., 2003). Protein (deduced amino acid) sequences from the retrieved bait sequences were compared to Citrus spp clustered EST sequences using a combination of different BLAST algorithms (Altschul et al., 1997), with the BLOSUM62 scoring matrix and a threshold of e < 10^-10 for positive hits. The composition of each individual candidate cluster, in terms of the donor cDNA library and the number of sequence reads, was checked to assess its potential expression pattern.
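The thresholding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how BLAST output was parsed, so the sketch assumes the common 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score), and the bait and cluster names in the example rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # threshold for positive hits stated in the text


def positive_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (bait, cluster, e-value) tuples whose e-value is below the cutoff.

    Assumes 12-column tabular BLAST output, in which column 1 is the query
    (bait), column 2 the subject (cluster) and column 11 the e-value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        bait, cluster, evalue = row[0], row[1], float(row[10])
        if evalue < cutoff:
            hits.append((bait, cluster, evalue))
    return hits


# Hypothetical rows, for illustration only (not data from the paper).
EXAMPLE = "\n".join([
    "AtCO\tcluster_A\t62.1\t210\t80\t2\t1\t210\t5\t214\t3e-45\t170",
    "AtCO\tcluster_B\t31.0\t90\t62\t3\t10\t99\t1\t88\t0.002\t38.1",
    "AtFLC\tcluster_C\t55.0\t120\t54\t1\t1\t120\t3\t122\t1e-10\t60.2",
])

for bait, cluster, evalue in positive_hits(EXAMPLE):
    print(bait, cluster, evalue)
```

The comparison is strict (`<`), matching the "e < 10^-10" wording, so a hit at exactly 1e-10 is not retained in this sketch.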
For the results presented in Table 1, we obtained e-values using the BLASTp algorithm (Altschul et al., 1997) as described above. The identity and the similarity were calculated at the amino acid level, relative to the corresponding Arabidopsis putative homolog, within the extension of the successful sequence alignment produced by their pair-wise comparison.
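As a rough illustration of how identity and similarity over the aligned extension can be computed, the sketch below counts identical and similar residue pairs in a gapped pairwise alignment. It is not the authors' procedure: BLAST counts a pair as "positive" when its BLOSUM62 score is greater than zero, whereas this sketch substitutes a small, assumed set of conservative residue groups to stay self-contained. The two example sequences are hypothetical.

```python
# Assumed conservative-substitution groups (a simplification; BLAST itself
# uses positive BLOSUM62 scores to decide which pairs count as similar).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def identity_similarity(aligned_query, aligned_subject):
    """Return (% identity, % similarity, extension) for a gapped alignment.

    Both strings must have the same length; '-' marks a gap. Percentages are
    relative to the alignment extension, gap columns included.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    extension = len(aligned_query)
    if extension == 0:
        raise ValueError("empty alignment")
    identical = similar = 0
    for a, b in zip(aligned_query, aligned_subject):
        if a == "-" or b == "-":
            continue  # gap columns count toward the extension only
        if a == b:
            identical += 1
            similar += 1  # identical residues are also counted as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    return 100 * identical / extension, 100 * similar / extension, extension


# Hypothetical aligned fragments, for illustration only.
ident, sim, ext = identity_similarity("MKL-SDE", "MRLASDQ")
print(f"ID = {ident:.1f}%, SIM = {sim:.1f}%, ext = {ext}")
```

Dividing by the full extension, gaps included, is one possible reading of "within the extension of the successful sequence alignment"; dividing by ungapped columns only would give slightly higher values.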

In silico characterization of the Citrus homologs belonging to the CONSTANS gene family
The Arabidopsis CONSTANS (CO) gene family encodes putative transcription factors defined by two conserved domains (Putterill et al., 1995; Griffiths et al., 2003). The first is a zinc finger region near the amino terminus that resembles B-boxes, which regulate protein-protein interactions in several animal transcription factors (Putterill et al., 1995). The second is a region of 43 amino acids near the carboxy terminus termed the CCT (CO, CO-like, TOC1) domain (Robson et al., 2001). We identified Citrus homologs of the Arabidopsis CO gene family by using the Arabidopsis sequences as bait and the BLAST algorithms (Altschul et al., 1997) as described above. Only comparisons that produced an e-value better than e-50 were considered highly significant. Where the e-values obtained were between e-50 and e-5, all reads identified were re-clustered using the CAP3 algorithm from the BioEdit software (Hall, 1999). The new cluster consensus sequences were re-submitted to BLAST, and frequently better e-values were obtained. We analyzed these using the CDD algorithm (Marchler-Bauer et al., 2005) to identify conserved domains in the deduced protein sequences.
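The two-tier e-value rule described above can be written as a small decision function. This is a minimal sketch: the function name and return labels are hypothetical, and the handling of values lying exactly on a boundary is an assumption, since the text does not specify it.

```python
HIGHLY_SIGNIFICANT = 1e-50  # "better than e-50" in the text
LOWER_BOUND = 1e-5          # weakest e-value still sent to re-clustering


def triage(evalue):
    """Classify a BLAST hit against a CO-family bait by its e-value.

    Boundary handling (strict '<' at 1e-50, inclusive at 1e-5) is assumed.
    """
    if evalue < HIGHLY_SIGNIFICANT:
        return "highly significant"
    if evalue <= LOWER_BOUND:
        return "re-cluster reads and re-submit consensus to BLAST"
    return "not considered"


# Illustrative values only.
for e in (1e-60, 1e-20, 0.1):
    print(e, "->", triage(e))
```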

Comparative and phylogenetic analysis of CONSTANS gene family homologs
To examine the relationships between the Citrus CO-like genes and their putative Arabidopsis homologs in more detail, their nucleotide and predicted peptide sequences were used to determine genetic distances and to construct phylogenetic trees. Because the middle regions of the genes were the most divergent, they could not be aligned with confidence. Therefore, neighbor-joining (NJ) and maximum parsimony (MP) trees were constructed using B-box sequences (and CCT domain sequences when available), following the alignments obtained using the CLUSTALX software (Thompson et al., 1994). The alignments were corrected by hand where necessary. Phylogenetic trees were ob- [...]
Table 1 footnotes: [...] (Altschul et al., 1997). d ID = identity; SIM = similarity; both based on the amino acid sequence, relative to the putative Arabidopsis homologs. e ext = extension of the successful alignment, including any insertion/deletion events.

Results
Identifying Citrus ESTs related to flowering-time pathway genes
Genetic analyses in model plants such as Arabidopsis identified a whole set of flowering-time genes that were subsequently assigned to four major genetic pathways according to their response to a period of cold (vernalization) or to day length (photoperiod) (Simpson et al., 1999; Araki, 2001; Mouradov et al., 2002; Simpson and Dean, 2002; Bastow and Dean, 2003; Amasino, 2004; Boss et al., 2004). The field of flowering time has thus been organized around these four pathways, with the photoperiod and vernalization pathways mediating the response to environmental cues and the autonomous and the gibberellin (GA) pathways acting largely independently of these external signals (Figure 1). Based on the systematic search in the CitEST database using Arabidopsis sequences as bait, we have identified 109 Citrus spp. EST clusters representing putative Citrus spp homologs of flowering-time genes. Some of these genes are required for the day length response, and some encode regulatory proteins specifically involved in the control of flowering, while others encode components of light signal transduction pathways or are involved in circadian clock function. A representation of the relationships among these processes is shown in Figure 1 and the putative homologs of the key players in Citrus spp are presented in Table 1. The role of each of these elements in the flowering-time pathways and their implications for the understanding of Citrus spp flowering processes are presented in the Discussion section.
Two genes play a prominent role at the "bottom" of the flowering promotion cascades: CONSTANS (CO) and FLOWERING LOCUS C (FLC). The FLC gene is the point of convergence of the autonomous and vernalization pathways (Figure 1). Ultimately, and in part through CO and FLC, the flowering signals lead to the induction of a set of genes called floral meristem identity (FMI) genes, which are responsible for the fate change of the meristems emerging on the flanks of the shoot apex (Long and Barton, 2000). This group of genes includes the LEAFY (LFY) gene, expressed in early floral stages and responsible for their floral fate (Lohmann and Weigel, 2002). We could not find any putative homolog of LFY in the CitEST database, but Citrus homologs of this gene have already been identified (Pena et al., 2001), indicating that flowering-time sequences are underrepresented in the CitEST dataset.
The CO gene is probably the most downstream actor specific to the photoperiod pathway (Figure 1), and both light and the internal clock precisely regulate CO protein accumulation (Valverde et al., 2004). Because of their importance to the regulation of flowering time, the CO-like sequences found in the CitEST database were studied in greater detail, and these results are presented in a separate section below.

Elements of the Citrus CONSTANS-like gene family
We have identified a total of 244 Citrus spp EST sequences showing significant (e-value lower than e-10) similarity to the Arabidopsis CO-like (COL) genes, by means of a combination of BLAST algorithms and keyword searches in the CitEST database (Table 2). When submitted to the CAP3 algorithm, these sequences were initially organized into 75 clusters. With further comparison of their deduced amino acid sequences, the number of valid clusters was reduced to 27.
Based on previous studies on Eucalyptus (Dornelas and Rodriguez, 2005) and sugarcane (Dornelas and Rodriguez, 2006) COL proteins, we concluded that this gene family evolves rapidly, particularly in the middle regions (see also Lagercrantz and Axelsson, 2000). Thus our analysis focused on the B-box sequences only and we excluded putative homologs to the related Arabidopsis STO (SALT TOLERANCE) gene. STO-like genes have B-boxes but no CCT domain. Additionally, we excluded the related ZIM gene from our analysis, which contains an additional ZIM motif. This short motif is found in a variety of plant transcription factors that contain GATA domains and its conserved amino acids form the pattern TIFF/YXG (Lagercrantz and Axelsson, 2000; Griffiths et al., 2003). We thus restricted our analysis to Citrus spp sequences showing the conserved B-box and CCT domains, according to the definition of the COL family provided by Griffiths et al. (2003). These assumptions explain the reduced number of true putative Citrus spp homologs of COL members shown in Table 2.
Table 2 footnotes: a When using the BLASTp algorithm (Altschul et al., 1997) and considering an e-value of e-10. All Arabidopsis CO-like proteins were used as alternative bait sequences. b Number of clusters formed by the given number of ESTs when using the CAP3 assembling algorithm (Huang and Madan, 1999). c Number of clusters, after eliminating redundancy and after parsimony analysis.
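The inclusion and exclusion criteria for the COL family described above can be expressed as a simple filter. The following is a sketch under assumptions, not the authors' pipeline: the domain labels ("B-box", "CCT") stand in for whatever a conserved-domain search reports, the function name is hypothetical, and the ZIM motif TIFF/YXG is read here as T-I-F-(F or Y)-any residue-G. Note that a B-box without a CCT domain may also simply reflect an incomplete EST-derived sequence.

```python
import re

# TIFF/YXG interpreted as: T, I, F, then F or Y, then any residue, then G.
ZIM_MOTIF = re.compile(r"TIF[FY].G")


def classify(domains, protein_seq=""):
    """Assign a deduced protein to a category based on its domains.

    domains: set of domain names found in the deduced protein (assumed labels).
    protein_seq: deduced amino acid sequence, scanned for the ZIM motif.
    """
    if ZIM_MOTIF.search(protein_seq):
        return "ZIM-like (excluded)"
    if "B-box" in domains and "CCT" in domains:
        return "COL candidate"
    if "B-box" in domains:
        return "B-box only (STO-like or incomplete sequence)"
    return "unclassified"


# Hypothetical inputs, for illustration only.
print(classify({"B-box", "CCT"}, "MAAA"))
print(classify({"B-box"}))
print(classify({"CCT"}, "MKTIFYAGQ"))
```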
As most of the CCT domain sequences are not available for the Citrus spp COL proteins, we produced alignments of the predicted peptides of the conserved B-box region for all Arabidopsis AtCO and AtCOL proteins and their putative Citrus spp homologs (Figure 2A).
Variation within the B-box domain suggested that the CO-like genes could be further subdivided. To examine the relationship between the putative Citrus spp COL homologs and their Arabidopsis counterparts in more detail, the sequence alignment shown in Figure 2A was used to determine genetic distances and to construct phylogenetic trees. Neighbor-joining (Figure 2B) and maximum parsimony trees (data not shown) were constructed, giving similar results. The proteins were consistently grouped into three principal clades (Figure 2B). These three groups were identified previously and are thought to have evolved prior to the divergence of monocots and dicots (Griffiths et al., 2003; Dornelas and Rodriguez, 2005; 2006). Group III genes comprised Arabidopsis and Citrus spp proteins with two zinc finger domains, the second of which had diverged from the CO-type B-box. Group II genes comprised Arabidopsis and Citrus spp proteins with a single B-box. Group I comprised the most CO-like genes and included the putative Citrus spp CO orthologs. Sequence comparisons showed that the clusters CS00-C1-100-086-A06, CL06-C4-501-017-G07, CG32-C1-003-018-D09, CA26-C1-002-061-D07 and LT33-C1-003-096-C01 presented significant similarity (e-value lower than e-10) to CO (Table 1), but only CS00-C1-100-086-A06, CG32-C1-003-018-D09, and PT11-C1-901-085-G05 had complete B-box sequences, and thus only these were considered for the phylogenetic analysis. All three of these Citrus spp sequences were consistently maintained in the same cluster together with AtCO (Figure 2B).
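To make the "genetic distances" step more concrete, the sketch below derives a pairwise distance matrix from an alignment; a matrix of this kind is the input to neighbor-joining. The distance measure used in the paper is not stated, so the uncorrected p-distance (proportion of differing sites among ungapped columns) is an assumption here, and the sequence names and residues are hypothetical toy data rather than real B-box sequences.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every pair in the alignment."""
    names = sorted(alignment)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        dist[(a, b)] = dist[(b, a)] = d  # the matrix is symmetric
    return dist


# Toy alignment with hypothetical names, for illustration only.
toy = {"seq1": "CDACKS", "seq2": "CDVCKS", "seq3": "CE-CRS"}
for pair, d in sorted(distance_matrix(toy).items()):
    print(pair, round(d, 3))
```

A tree-building step would then repeatedly join the pair of taxa that minimizes the neighbor-joining criterion on this matrix; that part is left out of the sketch.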

Discussion
The flowering pathway regulated by gibberellins
Because of the importance of crop load, methods for reducing the extent of biennial bearing in Citrus have been investigated for use in commercial production. Winter sprays with gibberellic acid (GA) are one management tool that can be used to regulate flowering and minimize the effect of biennial bearing. In Citrus, as in many other perennial crops, GA application during bud development can inhibit flower production (Monselise and Halevy, 1964; Guardiola et al., 1982; Lord and Eckard, 1987), and in the following spring lead to a greater proportion of single terminal flowers on leafy shoots, which tend to produce the larger fruits. On the other hand, in many annual plants such as Arabidopsis, GA has a promoting effect on flowering. Thus, either GA has contrasting roles in the flowering of different species, or abnormally high GA levels in woody perennials such as Citrus, but not in annuals such as Arabidopsis, prevent normal flower formation, presumably by disrupting essential developmental events.
The Arabidopsis ga1 biosynthetic mutant flowers extremely late (sometimes never) in short days (SD) (Blazquez et al., 1998; Wilson et al., 1992). GA acts, at least in part, by upregulating the LEAFY (LFY) gene. LFY expression is dramatically reduced in the ga1 mutant in short days, and constitutive expression of LFY is sufficient to rescue the late flowering of this mutant (Blazquez et al., 1998). A cis-element has been found in the LFY promoter that is required for its response to GA but not for LFY induction by photoperiod, indicating that the two different pathways are integrated at the level of the LFY promoter (Blazquez and Weigel, 2000). GA is also involved in inducing SOC1 expression (Moon et al., 2003) and possibly that of the FLOWERING LOCUS T (FT) gene. We have found putative Citrus homologs of SOC1 and FT, but no clear homolog of LFY was found within the CitEST database. Nevertheless, it is clear that the Citrus genome contains orthologs of LFY (Pena et al., 2001; Pillitteri et al., 2004). Accordingly, overexpressing the Arabidopsis LFY sequence in transgenic Citrus plants dramatically altered the flowering behavior, and the transgenic plants flowered in a few months rather than several years (Pena et al., 2001).

Autonomous and vernalization pathways
Plants require not only external (environmental) factors but also internal (developmental) factors to promote flowering. Although the Arabidopsis thaliana ecotypes used in the laboratory flower early, many ecotypes flower very late or require a cold treatment (vernalization). The FRIGIDA (FRI) gene is responsible for the differences in flowering time among Arabidopsis ecotypes, as all known early-flowering ecotypes have mutations in the FRI gene (Johanson et al., 2000). FRI codes for a protein of 619 amino acids that has coiled-coil domains in two positions (Johanson et al., 2000). No putative homolog could be assigned to FRI among the Citrus spp. EST clusters. The FRI protein is a positive regulator of the FLOWERING LOCUS C (FLC) gene, which is a repressor of flowering (Michaels and Amasino, 1999). The FLC gene encodes a MADS-box protein (Michaels and Amasino, 1999; Peacock and Dennis, 1999). Despite the fact that no FRI homolog could be found among Citrus ESTs, we found putative homologs of FLC in six Citrus species (Table 1). Additionally, no sequence was found within the CitEST data set that would code for the other elements of the vernalization pathway: VRN1 and VRN2 (Chandler et al., 1996) or for the VIP1-7 genes. VRN2 has a repressive role over the expression of FLC and codes for a protein with homology to Polycomb group (PcG) proteins (Sheldon et al., 2000). VIP4 was cloned and encodes another PcG protein (Zhang and van Nocker, 2002), and is a repressor of the FLC gene as well. These results indicate that the autonomous branch of the vernalization pathway may be present in Citrus, but that the connection with cold-sensing may have been lost during evolution. One strong argument in favor of this speculation is that the elements of the vernalization pathway have not been found in any tropical plant for which genomic resources are available, including rice, for which the genome is completely sequenced (Izawa et al., 2003), Eucalyptus (Dornelas and Rodriguez, 2005) and sugarcane (Dornelas and Rodriguez, 2006).
Figure 2 - [...] (Robson et al., 2001; Griffiths et al., 2003). Amino acid colors are the defaults of the CLUSTAL software. B. Phylogenetic analysis of CO-like genes. A Neighbour-Joining tree was built based on the B-box domain alignment shown in A. The Citrus deduced protein names are given in colored boxes. Genetic distances are shown at the given scale. Bootstrap values from 1,000 replicates were used to assess the robustness of the trees. Only bootstrap values above 75% are shown. The domain structure of each protein is also shown to the right of its name. B1 and B2 are CO-like B-boxes (white rectangles) or derived zinc finger domains (solid rectangles). CCT is the conserved CCT carboxy-terminus domain (Robson et al., 2001). The dotted lines represent incomplete sequences. Arabidopsis MIPS codes are as follows: AtCO (At5g15840); AtCOL1 (At5g15850); AtCOL2 (At3g02380); AtCOL3 (At2g24790); AtCOL4 (At5g24930); AtCOL5 (At5g57660); AtCOL6 (At1g68520); AtCOL7 (At1g73870); AtCOL8 (At1g49130); AtCOL9 (At3g07650); AtCOL10 (AB023039); AtCOL11 (At4g15250); AtCOL12 (At3g21880); AtCOL13 (At2g47890); AtCOL14 (At2g33500); AtCOL15 (At1g28050); AtCOL16 (At1g25440).

Light-dependent pathway and the role of CONSTANS-like proteins
Red light is perceived by phytochrome proteins, which are encoded by the PHYA through PHYE genes in Arabidopsis (Reed et al., 1993; Briggs et al., 2001; Ohto et al., 2001). We found putative Citrus spp homologs of PHYA, PHYB and PHYC but, similar to what was observed for other woody species (Dornelas and Rodriguez, 2005), we were not able to find sequences with significant similarity to Arabidopsis PHYD and PHYE within the CitEST data set (Table 1).
Blue light is perceived by cryptochrome proteins, which are encoded by CRY1 and CRY2 in Arabidopsis (Ahmad and Cashmore, 1993; Lin et al., 1998). We found a putative homolog of CRY2 only among C. reticulata sequences, but CRY1 homologs could be found in five different Citrus species (Table 1). The Arabidopsis cryptochrome gene CRY1 functions cooperatively with the CRY2 gene to repress the function of CO and GIGANTEA (GI) (Mockler et al., 1999).
The functions of the genes LHY, CCA1, ELF3, and TOC1 are related to the circadian clock that processes the light signals and converts them into periodic information (Hicks et al., 2001; Doyle et al., 2002). The processed signal is transmitted to the GI gene, whose product activates the CO gene (Suarez-Lopez et al., 2001). Putative Citrus spp homologs of all these circadian clock elements were found (Table 1), suggesting that the molecular elements of the circadian clock may be conserved among herbaceous and woody plants, despite their divergent reproductive behavior. This has also been observed for other woody species such as Eucalyptus (Dornelas and Rodriguez, 2005). These results thus indicate that the observed differences in reproductive development between herbaceous and woody plants are likely to be the product of different interactions among clock elements rather than differences in the clock components themselves.
We have paid special attention to the characterization of the putative Citrus spp homologs of the Arabidopsis CO-like family members. The CO and CO-like genes encode nuclear zinc finger-containing proteins, suggesting potential transcription factor function, but the precise mechanism of CO action is not yet understood (Parcy, 2005). In particular, CO has not been shown to bind DNA and is, therefore, assumed to be tethered to regulatory sequences through interaction with other transcription factors (Hepworth et al., 2002). Recently, evidence has accumulated indicating that CCAAT binding factors can mediate interactions between CONSTANS-like proteins and DNA (Ben-Naim et al., 2006). The members of the CO family are highly conserved and can be found among diverse angiosperm species and even in Physcomitrella (Zobell et al., 2005), suggesting that the function of these proteins in controlling reproductive development may be conserved as well.
The precise analysis of the CO expression pattern has recently led to new and exciting questions regarding the CO mode of action (Takada and Goto, 2003; An et al., 2004). Indeed, the photoperiodic signal was known to be perceived in leaves and somehow transmitted to the apex by the unknown florigen signal (Zeevaart, 1976; Bernier et al., 1993; Colasanti and Sundaresan, 2000). The discovery that CO is expressed in the vascular system of the leaves (in the phloem companion cells) and induces FT in this tissue suggests that the florigen signal is downstream of, or at the same level as, CO (Takada and Goto, 2003; An et al., 2004). Expression of CO from different promoters showed that CO triggers early flowering when expressed in the leaf phloem but not in the apex (An et al., 2004; Ayre and Turgeon, 2004). These experiments strongly suggested that CO acts from the leaves and that the florigen is downstream of CO. Accordingly, all Citrus spp. contigs that showed significant similarity to CO (Table 1; Figure 2) are formed exclusively by leaf-derived ESTs (with the exception of a C. limonia EST, CL06-C4-501-017-G07, which is derived from root tissues). As opposed to CO, its target gene FT can trigger early flowering when expressed either from the leaves or from the apex, suggesting either that FT itself is the florigen or that FT can induce florigen synthesis both from the leaves and from the apex. Knowing that CO acts from the leaves to induce FT also raises many questions about the induction of SOC1 and LFY. In Arabidopsis, both LFY and SOC1 expression increase at the apex during the floral transition (SOC1 in the apex itself and LFY in the flower anlagen). To date, the function of other CO-like family members is largely unknown. Nevertheless, there is evidence that COL proteins may directly interact with CO to provide the correct control of flowering time mediated by light (Martin et al., 2004). It will be interesting to assess the expression patterns of the different Citrus CO-like family members to see if their transcription correlates with the transition to the reproductive phase.

Conclusions and Perspectives
There are physical, chemical, and biological signals that contain information for the onset of flowering. The four known pathways that respond to these signals have been characterized in Arabidopsis and some herbaceous model plants. The genetic framework of these pathways in these model plants can now be assessed by molecularly cloning each member. This task is generally much more difficult and time-consuming in woody plants due to their extended life cycles. Here we present the initial construction of a genetic framework containing the molecular elements which putatively control the flowering pathways in seven different Citrus species. Precise characterization of the in situ expression patterns of all these putative Citrus spp flowering-time genes will be important to understanding their roles in the flowering process, opening the way for the manipulation of their expression patterns in the future. The function of these elements can now be tested in heterologous systems, such as Arabidopsis, via transgenic approaches. We believe our results will be a valuable resource for future research on the control of flowering and of biennial fruit bearing patterns in Citrus.

Figure 1 - Overview of the relationships among the elements involved in the flowering-time pathways in the model plant Arabidopsis thaliana (after Mouradov et al., 2002 and Izawa et al., 2003). The data underlying the model and the corresponding homologs in Citrus are presented in Table 1 and in the text. For abbreviations and gene names see Table 1.

Table 1 - Citrus ESTs that share homology to flowering-time genes of Arabidopsis.

Table 2 - Citrus putative homologs to the CONSTANS-like genes of Arabidopsis.