Expressed sequence tag analysis of khat (Catha edulis) provides a putative molecular biochemical basis for the biosynthesis of phenylpropylamino alkaloids

Khat (Catha edulis Forsk.) is a flowering perennial shrub cultivated for its neurostimulant properties resulting mainly from the occurrence of (S)-cathinone in young leaves. The biosynthesis of (S)-cathinone and the related phenylpropylamino alkaloids (1S,2S)-cathine and (1R,2S)-norephedrine is not well characterized in plants. We prepared a cDNA library from young khat leaves and sequenced 4,896 random clones, generating an expressed sequence tag (EST) library of 3,293 unigenes. Putative functions were assigned to > 98% of the ESTs, providing a key resource for gene discovery. Candidates potentially involved at various stages of phenylpropylamino alkaloid biosynthesis from L-phenylalanine to (1S,2S)-cathine were identified.

Khat (Catha edulis), a perennial flowering shrub native to East Africa and the Arabian Peninsula, has long been cultivated for its neurostimulant properties. Evidence suggests that chewing young khat leaves as a social activity dates back at least a thousand years (Klein et al., 2009) and might even predate the use of coffee (Balint et al., 2009). It was not until 1975 that United Nations laboratories identified the phenylpropylamino alkaloid (S)-cathinone as the compound largely responsible for the mild euphoric and anorexic properties of khat (United Nations, 1975). Studies of the short- and long-term dangers of chewing khat are generally inconclusive, and the physical harm and dependence caused by the plant remain controversial (Mateen and Cascino, 2010). For example, while some studies have linked khat consumption with genotoxic effects in humans (Kassie et al., 2001), others have highlighted khat as a potential source of anti-cancer agents (Bredholt et al., 2009). Recent evidence has linked khat use with impaired memory and cognitive flexibility (Colzato et al., 2011). While khat chewing is a longstanding tradition in parts of East Africa and in the Middle East, possession of khat is illegal in Canada, the United States and parts of the European Union. Fresh khat is a scheduled drug under the controlled substances legislation in Canada and the United States, yet it may be imported with proper licensing in Australia, and use of the plant is unregulated in the United Kingdom, the Netherlands (Klein et al., 2009) and Israel (Krizevski et al., 2008). A sizable portion of the seven metric tons of licit khat, which is classified as a vegetable in the United Kingdom and therefore not subject to tax, is estimated to travel through Heathrow Airport each week destined for black-market distribution in North America (Klein et al., 2009). Due to controversial and inconsistent domestic policies, and fast-growing communities of East African immigrants (Gebissa, 2010), khat has become a subject of international concern.
Although this pathway has been partially characterized at the biochemical level, no biosynthetic genes involved in the conversion of trans-cinnamic acid to phenylpropylamino alkaloids have been isolated. To establish a functional genomics platform aimed at gene discovery, we built an EST library from biosynthetically active khat leaves. It was recently shown that the pathway intermediates 1-phenylpropane-1,2-dione and (S)-cathinone, and the end products (1S,2S)-cathine and (1R,2S)-norephedrine, accumulate mainly in young leaves and flowers of khat, with lesser quantities in young stems (Krizevski et al., 2007, 2008). In contrast, mature leaves lack (S)-cathinone and accumulate only (1S,2S)-cathine and (1R,2S)-norephedrine, suggesting that phenylpropylamino alkaloid biosynthetic gene expression is highest in young tissues. For this reason, young khat leaves were selected for EST analysis.
Khat shrubs (Catha edulis Forsk.) were grown in open field conditions using commercial cultivation practices, including drip irrigation and fertilization, at the Newe Ya'ar Research Center in Northern Israel. Young khat leaves approximately 1-3 cm in length were harvested from five-year-old plants during daylight hours in November 2006. Total RNA was isolated using an RNeasy Midi kit (Qiagen) and poly(A)+ RNA was selected using a Dynal Dynabeads kit (Invitrogen). The poly(A)+ RNA was converted to cDNA using the ZAP cDNA synthesis kit (Stratagene), and the resulting clones were unidirectionally inserted into the EcoRI and XhoI sites of the phage vector λUni-ZAPII XR and packaged with Gigapack III Gold packaging extract (Stratagene). Primary libraries were converted into plasmids by in vivo excision, and Escherichia coli colonies were randomly transferred to 96-well microtiter plates for automated plasmid preparation using the TempliPhi Template Amplification kit (GE Healthcare Life Sciences). Twenty randomly chosen plasmid clones were digested with the EcoRI and XhoI restriction enzymes and analyzed by agarose gel electrophoresis to check the insertion rate and average insert length. Sequencing of cDNA inserts was performed using an ABI Prism Big Dye terminator sequencing kit (Applied Biosystems) and an AB 3730 genetic analyzer (Applied Biosystems).
A total of 4,896 clones were randomly selected from the C. edulis library and submitted for unidirectional sequencing from the 5' end using an M13 primer. DNA sequencer traces were interpreted, and vector and low-quality sequences were eliminated, using PHRED (Ewing et al., 1998) and LUCY (Chou and Holmes, 2001), resulting in 4,723 high-quality expressed sequence tag (EST) sequences (96.5%). The ESTs were submitted to GenBank and assigned accession numbers JG723448 through JG728170. Cluster analysis and contig assembly were performed using STACKPACK (Miller et al., 1999), resulting in 3,293 unigenes (Supplemental Table S1). Sequence comparisons were done using the BLAST algorithm (Altschul et al., 1990) with the public sequence databases TAIR Proteins v.8 and UniProt Plants v.14.5. BLAST analysis yielded matches for the majority of unigenes, with only 21 and 56 ESTs finding "no hit" when compared to the TAIR Proteins (< 1%) and UniProt (1.7%) databases, respectively (Table S1). These results compare favorably with similar EST-based gene expression studies. For example, an EST-based study of gene expression in flax (Linum usitatissimum) seed that used similar homology-search parameter cutoffs (e.g., E-value of e-6) revealed a match between only 76.4% of flax unigenes and Arabidopsis proteins (Venglat et al., 2011). In another example, analysis of 5,023 unigenes derived from Madagascar periwinkle (Catharanthus roseus) yielded a "no hit" rate of 14.2% against GenBank entries, although different annotation parameters were used (Murata et al., 2006).
Khat unigenes that showed significant homology (E-value < e-10) to known proteins of UniProt Plants were selected for Gene Ontology (GO) annotation and mapping to the TAIR database, which is updated on a regular basis (Berardini et al., 2004). GO annotation analysis assigned a functional category to 2,839 (88%) of the unigenes possessing hits against public databases (Supplemental Table S2). However, to better reflect khat transcripts putatively involved in specialized metabolism, including phenylpropylamino alkaloid biosynthesis, which is a category not included in GO annotations, the Arabidopsis-based ontology results were manually verified and reclassified into 8 categories (Figure 3). Although a large proportion of the khat library (33%) appears dedicated to primary metabolism, nearly 5% of the ESTs encoded proteins putatively involved in specialized metabolism. This category includes candidates for enzymes shown in Figure 1, and those putatively involved in flavonoid and terpenoid biosynthesis.
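The percentages quoted in this paragraph can be re-derived from the reported counts. The short Python sketch below does so under two assumptions that are not stated explicitly in the text: that the "no hit" percentages use the 3,293 unigenes as the denominator, and that the 88% GO annotation rate is taken relative to the unigenes with a UniProt hit. The variable names are illustrative only.

```python
# Counts as reported in the text.
clones_sequenced = 4896
high_quality_ests = 4723
unigenes = 3293
no_hit_tair = 21
no_hit_uniprot = 56
go_annotated = 2839


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of sequenced clones that survived quality trimming.
high_quality_rate = pct(high_quality_ests, clones_sequenced)

# "No hit" rates, ASSUMING the unigene set is the denominator.
no_hit_tair_rate = pct(no_hit_tair, unigenes)
no_hit_uniprot_rate = pct(no_hit_uniprot, unigenes)

# GO annotation rate, ASSUMING the denominator is unigenes with a UniProt hit.
unigenes_with_uniprot_hit = unigenes - no_hit_uniprot
go_rate = pct(go_annotated, unigenes_with_uniprot_hit)

print(high_quality_rate, no_hit_tair_rate, no_hit_uniprot_rate, go_rate)
```

Under these assumptions the computed values agree with the reported 96.5%, < 1%, 1.7% and approximately 88%.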
The recent discoveries of biosynthetic genes involved in benzoic acid metabolism facilitated a tBLASTn-based search of the khat EST library for homologues, all of which were detected except for Arabidopsis thaliana aldehyde oxidase-4 (AtAAO4), which catalyzes the dehydrogenation of benzaldehyde to benzoic acid (Ibdah et al., 2009) (Figure 1, Table 1). However, an EST with extensive similarity (E-value = e-129) to Antirrhinum majus benzaldehyde dehydrogenase (AmBALDH) (Long et al., 2009) was identified, suggesting that benzoic acid biosynthesis in khat is more similar to the pathway in snapdragon petals than to that in Arabidopsis seed, since AmBALDH and AtAAO4 likely catalyze the same reaction in non-β-oxidative benzoic acid metabolism. Highly homologous ESTs were also identified using At4CL1 and PhKAT1 as queries, suggesting that a β-oxidative, CoA-dependent pathway leading to benzoyl-CoA production might also occur in khat leaves (Table 1). An alternative pathway operative in lactic acid bacteria circumvents the PAL-catalyzed production of trans-cinnamic acid. In this case, phenylpyruvate, a transamination product of Phe, serves as a precursor to benzaldehyde (Nierop-Groot and de Bont, 1999). An Arabidopsis transaminase producing phenylpyruvate from Phe was recently characterized (Prabhu and Hudson, 2010). However, no close homologues were found among khat ESTs.
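A homology search of this kind is often post-processed by keeping, for each query protein, the library sequence with the lowest E-value. The Python sketch below illustrates that step on tab-separated BLAST output in the standard 12-column tabular layout, where the E-value is the eleventh column. It is not the authors' pipeline: the rows, the query and EST identifiers, and the function name `best_hits` are hypothetical, and the e-10 cutoff is borrowed from the annotation step because no cutoff is stated for this search.

```python
import csv
import io

# HYPOTHETICAL hits in 12-column tabular BLAST layout:
# query, subject, %identity, length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, E-value, bit score.
ROWS = [
    ("query_A", "est_001", "82.1", "300", "50", "2", "1", "300", "10", "909", "1e-129", "460"),
    ("query_A", "est_002", "40.0", "120", "70", "3", "5", "124", "30", "389", "2e-05", "48.1"),
    ("query_B", "est_003", "75.3", "250", "60", "1", "1", "250", "1", "750", "3e-90", "330"),
    ("query_B", "est_004", "77.0", "260", "58", "1", "1", "260", "4", "783", "1e-95", "348"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)


def best_hits(tabular_text, max_evalue=1e-10):
    """Map each query to its (subject, E-value) pair with the lowest E-value.

    Hits whose E-value is not strictly below max_evalue are discarded.
    The default cutoff is an assumption taken from the annotation step.
    """
    best = {}
    for fields in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


hits = best_hits(SAMPLE)
for query, (subject, evalue) in sorted(hits.items()):
    print(query, subject, evalue)
```

Keeping only the single best hit is a simplification; retaining several hits per query may be preferable when a query belongs to a large gene family.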
Beyond benzoic acid biosynthesis, the reactions leading from the formation of 1-phenylpropane-1,2-dione to (1S,2S)-cathine and (1R,2S)-norephedrine are not well understood. The recruitment of a ThDP-dependent enzyme for the carboligation of pyruvate with a benzoyl derivative has been proposed (Grue-Sørensen and Spenser, 1989), although the involvement of such an enzyme in ephedrine alkaloid biosynthesis has not been demonstrated. Two ThDP-dependent enzymes isolated from microbes, acetohydroxyacid synthase (AHAS) and pyruvate decarboxylase (PDC), catalyze the conversion of benzaldehyde to (R)-phenylacetylcarbinol (Figure 2), an intermediate in the semi-synthetic production of ephedrine alkaloids (Engel et al., 2003; Meyer et al., 2011). In addition to (R)-phenylacetylcarbinol, mutant Zymomonas mobilis PDCs catalyze the formation of (S)-phenylacetylcarbinol, along with the (R) and (S) forms of 2-hydroxypropiophenone (Pohl et al., 1998) (Figure 2). The possibility that khat possesses a PDC enzyme with similar catalytic flexibility must also be considered. Enzymatic and molecular characterization of this carboligation step will be necessary to unequivocally establish the biosynthetic precursors of (S)-cathinone.
The potential involvement of benzoyl-CoA as a precursor to (S)-cathinone has been suggested (Grue-Sørensen and Spenser, 1988, 1989). A ThDP-dependent enzyme could catalyze a carboligation reaction between the benzoyl moiety of benzoyl-CoA and the C2-C3 component of pyruvate (Supplemental Figure S1). Similar to reaction schemes proposed for ThDP-dependent enzymes such as AHAS (Jaña et al., 2010; Engel et al., 2003) and PDC, the decarboxylation of pyruvate yields a hydroxyethyl-thiamine diphosphate anion/enamine intermediate that would attack the carbonyl carbon of benzoyl-CoA to initiate condensation, release of a CoASH leaving group and the formation of 1-phenylpropane-1,2-dione. A similar, but not identical, reaction mechanism involving benzoic acid in lieu of benzoyl-CoA is also possible, whereby acid-catalyzed protonation at the carbonyl oxygen would permit nucleophilic attack by the anion/enamine intermediate.
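The proposed benzoyl-CoA route can be summarized as two formal steps and a net reaction. The scheme below is only a restatement of the mechanism described in this paragraph, not an experimentally established stoichiometry; protons are omitted and "hydroxyethyl-ThDP" stands for the anion/enamine intermediate.

```latex
\begin{align*}
\text{pyruvate} + \text{ThDP}
  &\longrightarrow \text{hydroxyethyl-ThDP} + \mathrm{CO_2} \\
\text{hydroxyethyl-ThDP} + \text{benzoyl-CoA}
  &\longrightarrow \text{1-phenylpropane-1,2-dione} + \text{CoASH} + \text{ThDP} \\[4pt]
\text{net:}\quad \text{pyruvate} + \text{benzoyl-CoA}
  &\longrightarrow \text{1-phenylpropane-1,2-dione} + \mathrm{CO_2} + \text{CoASH}
\end{align*}
```

ThDP appears on both sides of the two partial reactions and therefore cancels in the net equation, consistent with its role as a cofactor.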
Searching the khat EST library revealed three candidate sequences with homology to ThDP-dependent enzymes putatively involved in the formation of 1-phenylpropane-1,2-dione (Table 1). Unigenes 017_C06-044 and 034_C01-011 were annotated as AHAS, reflecting their close homology with characterized plant AHAS enzymes (76%
such as peppermint (Mentha x piperita) (Davis et al., 2005), opium poppy (Papaver somniferum) and black henbane (Hyoscyamus niger) (Ziegler and Facchini, 2008). In each case, reduction to a specific stereoisomer occurs via an enzyme belonging to the short chain dehydrogenase/reductase (SDR) protein family. Interestingly, a bacterial SDR protein was found to reduce N-methylated (S)-cathinone to (1S,2S)-pseudoephedrine, but not to (1R,2S)-ephedrine (Kataoka et al., 2006, 2008), which supports the hypothesis that two distinct SDR enzymes are involved in the formation of (1S,2S)-cathine and (1R,2S)-norephedrine, respectively (Krizevski et al., 2010).
Despite its long history, khat has recently become a controversial plant and is regulated, along with its phenylpropylamino alkaloid constituents, as a controlled substance in many Western countries. In contrast, several phenylpropylamino alkaloids are widely available and have a variety of health applications. The biosynthesis of phenylpropylamino alkaloids in khat begins with L-phenylalanine and requires 8-10 steps to yield (1S,2S)-cathine and its diastereomer (1R,2S)-norephedrine. Although some steps in benzoic acid metabolism have been recently resolved in Arabidopsis and other plants, the biochemistry of phenylpropylamino alkaloid metabolism in khat remains largely uncharacterized. The annotated EST library provides a snapshot of the khat young leaf transcriptome and establishes a valuable resource for phenylpropylamino alkaloid biosynthetic gene discovery. Candidate cDNAs encoding enzymes that putatively catalyze each step of the pathway were identified, which provides a genomics platform essential for their future characterization.

Hagel et al.
Figure 1 - Proposed biosynthetic routes leading from L-phenylalanine to phenylpropylamino alkaloids in khat. A CoA-independent, non-β-oxidative pathway of L-phenylalanine side chain-shortening is shown in blue, whereas a CoA-dependent, β-oxidative route is shown in purple. Red arrows indicate an alternative CoA-dependent, non-β-oxidative route suggested to occur in some plants (Abd El-Mawla and Beerhues, 2002; Boatright et al., 2004). Either benzoic acid or benzoyl-CoA undergoes condensation with pyruvate, a reaction putatively catalyzed by a ThDP-dependent enzyme. 1-Phenylpropane-1,2-dione undergoes transamination to yield (S)-cathinone, which is reduced to (1S,2S)-cathine and (1R,2S)-norephedrine. Activity has been detected for enzymes highlighted in yellow, and corresponding genes are available for enzymes highlighted in green. Enzymes highlighted in red have not been isolated, although EST analysis revealed numerous potential candidates (Table 1). Catha edulis ESTs putatively involved in this pathway have been identified for many steps, and candidates are listed in Table 1. Abbreviations: PAL, phenylalanine ammonia lyase; CoA, Coenzyme A; ThDP, thiamine diphosphate; NAD(H), nicotinamide adenine dinucleotide; NADP(H), nicotinamide adenine dinucleotide phosphate.

Figure 2 - Carboligation products of benzaldehyde and pyruvate formed by ThDP-dependent AHAS and PDC enzymes in microbes. (R)-Phenylacetylcarbinol is formed by AHAS in Escherichia coli and PDCs in certain yeast and bacteria. Mutation at a single amino acid in Zymomonas mobilis PDC enhanced production of (S)-phenylacetylcarbinol and resulted in the formation of both (R)- and (S)-2-hydroxypropiophenone. Although no evidence is presently available, one or more of these reaction products could be an intermediate in the formation of 1-phenylpropane-1,2-dione in khat.

Figure 3 - Functional categorization of expressed sequence tags (ESTs) from a Catha edulis leaf-derived cDNA library. Assignments were made based on homology to proteins of known function, as evidenced by tBLASTn search results using the TAIR Proteins and UniProt Plants databases. ESTs with homology to uncharacterized, putative, or hypothetical proteins (i.e., unknown function) comprised 4.4% of the total population.