Identification of novel and conserved microRNAs in Coffea canephora and Coffea arabica

As microRNAs (miRNAs) are important regulators of many biological processes, a series of small RNAomes from plants have been produced in the last decade. However, miRNA data from several groups of plants are still lacking, including some economically important crops. Here microRNAs from Coffea canephora leaves were profiled and 58 unique sequences belonging to 33 families were found, including two novel microRNAs that have never been described before in plants. Some of the microRNA sequences were also identified in Coffea arabica that, together with C. canephora, correspond to the two major sources of coffee production in the world. The targets of almost all miRNAs were also predicted on coffee expressed sequences. This is the first report of novel miRNAs in the genus Coffea, and also the first in the plant order Gentianales. The data obtained establishes the basis for the understanding of the complex miRNA-target network on those two important crops.

MiRNAs can control basic aspects of development, as well as the molecular responses to different types of stresses (de Lima et al., 2012). In plants, genes coding for miRNAs are generally 100-400 nt long and can be located in either the exons or introns of protein coding genes or in intergenic regions (Bartel, 2005). Mature miRNAs are initially generated from hairpin-like precursors as dsRNA duplexes. One of the strands (the guide strand) is loaded into RNA silencing complexes called RISC, while the other strand (the passenger or star strand) is usually degraded (Baumberger and Baulcombe, 2005;Qi et al., 2005). The RISC complex, containing Argonaute proteins (AGO), is then directed to RNAs having similarity with the embedded guide sequence. Depending on the Argonaute effector protein present in the complex, targets can be repressed either by RNA degradation or by translation inhibition (Huntzinger and Izaurralde, 2011).
Although some miRNAs are known to be conserved throughout the plant kingdom, the advent of massively parallel DNA sequencing methods allowed the identification of a vast number of non-conserved genes, even in closely related plants (Rajagopalan et al., 2006;Fahlgren et al., 2007;Ma et al., 2010). To date, there are about ten thousand mature miRNA sequences of green plants (Viridiplantae) deposited in the miRBase database (miRBase) (Griffiths-Jones, 2004). However, these sequences are not evenly distributed among the taxa. For example, information from economically important plants, including the ones belonging to the genus Coffea, is almost entirely lacking.
to sequence and characterize transcripts from these two species (Lin et al., 2005;Mondego et al., 2011;Combes et al., 2013). However, only few conserved miRNAs have been described in C. arabica so far (Rebijith et al., 2013;Akter et al., 2014). In this work we deep-sequenced total sRNAs from C. canephora leaves and found conserved and novel miRNA genes belonging to 33 families, including two that have never been observed in other plants before.

Plant material and deep sequencing
Leaves of C. canephora (conilon cultivar) were harvested at an experimental field from the Federal University of Viçosa, Minas Gerais State, Brazil. Total RNAs were extracted using the Plant RNA Reagent (Invitrogen, cat 12322-012) and were sent as ethanol precipitates to be sequenced at Fasteris Life Science Co. (Geneva, Switzerland). The small RNA library was prepared according to a modified Illumina protocol previously described (Silva et al., 2011) and sequenced using the HiSeq2000 platform. The raw data obtained from the sequenced library was deposited at the NCBIs Gene Expression Omnibus (GEO) database under the accession number GSE46617.

Data processing and filtering
Adaptor sequences were trimmed from the generated data using custom scripts. After the removal of low-quality reads and reads smaller than 16 nt and bigger than 26 nt, the high-quality raw sequences were used as queries in local BLASTN (Altschul, et al., 1997) searches against known cellular non-coding RNAs (rRNA, tRNA, snoRNA, mtRNA and cpRNA). Filtering was done using the following sequences or databases: complete chloroplast DNA from C. arabica (NC_008535); complete mitochondrial DNA from Nicotiana tabacum (NC_006581), Boea hygrometrica (NC_016741) and Mimulus guttatus (NC_018041); tRNA from A. thaliana, Populus trichocarpa and Medicago truncatula ; rRNA from Asclepias syriaca and C. arabica ; and snoRNA from all plant species available. The sRNAs matching with the refereed sequences without mismatches and gaps were discarded and the remaining unique sequences were used to search for conserved miRNAs.

Identification of conserved and novel miRNAs
MiRNAs were identified by three independent strategies: i) BLAST searches (Altschul et al., 1997) against the sequences deposited in the miRBase database (release 20) (Griffiths-Jones, 2004); ii) mapping sRNAs onto C. arabica and C. canephora contigs using SOAP2 software  and iii) using the plant miRDeep tool (Yang and Li, 2011). For BLAST searches, the filtered set of unique sRNAs (all high quality reads from 16 to 26 nt, without rRNAs, tRNAs, snoRNAs, mtRNAs and cpRNAs) were used in local BLASTN searches against all plant mature sequences retrieved from the miRBase database. Only sequences that fully matched known genes from the database, without gaps and with at least 10 reads, were further processed. The remaining sequences were considered as unknown sequences. For the SOAP2 analysis, the full set of redundant reads was matched against C. arabica and C. canephora contigs/ESTs (Expressed Sequence Tags) retrieved from the Brazilian Coffee Genome Project (Mondego et al., 2011) or from a C. canephora RNA-seq database (Combes et al., 2013). The SOAP2 output was filtered with an in-house filter tool (FilterPrecursor) in order to identify candidate sequences as miRNA precursors using a mapping pattern of one or two blocks of aligned small RNAs with perfect matches (Kulcheski et al., 2011). The filtering was done with the following default parameters: minimum number of mapped reads in the candidate precursors: 10; maximum offset allowed for a single read: 5; maximum percentage of reads mapped out of columns: 25; maximum number of columns in the mapping profile: 2. Parameters used for the miRDeep analysis were: length of best perfect match: 28; type of output: 2(traditional BLAST output); Identity percentage cut-off [Real]: 0 (perfect match); maximum number of hits: 10. The selected candidate precursors were manually inspected using the Tablet software (Milne et al., 2010) to visualize the presence of the mapping pattern. The secondary structures of candidate sequences were checked with the RNA Folding/annotation tool from the UEA sRNA toolkit (Moxon et al., 2008), using default parameters. The following criteria were used to define a good miRNA candidate: no more than four unpaired nucleotides between the putative mature and star sequences, of which no more than three nucleotides were consecutive and no more than three nucleotides were without a corresponding unpaired nucleotide in the near complementary sequence within the hairpin structure (Meyers et al., 2008). Only contigs matching those rules and with at least 10 reads in the putative miRNA region were considered as miRNA precursors. Candidates were then used as queries for BLASTN searchers against plant miRBase sequences. Reads having full matches without gaps with miRBase sequences were considered as conserved miRNAs. Sequences with no matches in the database were considered as novel miRNAs and the ones having nonperfect matches were considered as variants of known miRNAs.

Prediction of miRNA targets
The prediction of the putative target genes for conserved and novel miRNAs was done with the psRNATarget software (Dai and Zhao, 2011). The search was done against the C. canephora and C. arabica contigs retrieved from the Brazilian Coffee Genome Project or against the C. canephora RNA-Seq data (Combes et al., 2013), with the following parameters: maximum expectation value: 3; multiplicity of target sites: 2; and nucleotide range of central mismatch for translational inhibition: 9-11. Candidate sequences were annotated based on BLASTN (Altschul et al., 1997) and PFam searches (Punta et al., 2012). Gene ontology terms were obtained by using the GO slimmer tool from the AmiGO toolkit (Carbon et al., 2009), using default parameters.

Digital expression analysis
The expression of target miRNAs was computed in the different libraries of the Brazilian Coffee Genome Project. The frequency of reads for each miRNA contig in each library was computed and then normalized by the number of reads in the library. The values obtained were then analyzed with the Cluster and Tree View programs (Eisen et al., 1998). Aggregation was made by hierarchical clustering, based on Spearman Rank correlation matrix. Digital blot matrix was ordered according to similarities in the patterns of gene expression and displayed as an array, where the normalized number of reads for each EST-contig in each specific library is represented in gray scale.

C. canephora sRNA library
About eight million high quality reads ranging from 16 to 26 nt were obtained in the Illumina sequencing of sRNAs from C. canephora leaves (Table S1). Most of the redundant and unique reads identified were 24 nt and 21nt long, respectively ( Figure S1). This pattern has already been observed in several other plant deep-sequencing libraries and is probably due to the high abundance of heterochromatic siRNAs and microRNAs, which are 24 nt and 21nt long, respectively (Nobuta et al., 2008;Lelandais-Briere et al., 2009;Wei et al., 2009;Klevebring et al., 2009;Romanel et al., 2012).
The identified sRNAs were divided into six categories: small nucleolar RNAs (snoRNAs), transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), mitochondrial RNAs (mtRNAs), chloroplastidial RNAs (cpRNAs) and miRNAs (Table S1). Together, about 49.7% of all redundant sRNAs matched to snoRNAs, tRNAs, rRNAs, mtRNAs or cpRNAs, but about 46.6% of all reads could not be assigned to any of the six categories (Table S1). An interesting prominent peak of 19 nt was observed in tRNA-derived reads ( Figure 1). It has recently been reported that some of tRNA-derived sequences of this size can be found in complexes with Argonaute proteins and therefore may not be merely degradation products (Loss-Morais et al., 2013). Sequences belonging to miRNAs represented about 3.6% of the total redundant reads in the library. Those sequences were found by searching against plant miRNA sequences deposited in the miRBase database release 20 (Griffiths-Jones, 2004) and by aligning all reads against expressed contigs from C. arabica and C. canephora or against RNA-Seq data from C. canephora (Combes et al., 2013) (see Methods for details).
As expected, most of the C. canephora miRNA reads were 21 nt long, but sequences of 20, 22, 23 and 24 nt were also found ( Figure 1). Since there are no coffee miRNA sequences in the current release of the miRBase database, all miRNAs identified here are new for the refereed species. MiRNAs from all size classes were then separated into three categories: 1) miRNAs that are conserved in other plant species and whose sequences are identical to sequences deposited in the miRBase (47 miRNAs, in Table 1); 2) variants of known miRNAs (nine miRNAs, in Table 2 and 3) novel coffee miRNAs whose sequences are not related to any of the known families of plant miRNAs deposited in miRBase (two miRNAs, in Table 2).

Identification of conserved miRNAs in C. canephora and C. arabica
For the identification of conserved miRNAs, C. canephora high quality reads from 16 to 26 nt in size that did not match to snoRNAs, tRNAs, rRNAs, mtRNAs or cpRNAs were used as queries in local BLASTN searches against plant miRNA genes deposited in the miRBase database. Only results having full matches, without gaps and with at least 10 reads were further analyzed. A total of 250,992 reads, representing 47 unique miRNAs and belonging to 24 families, matched the plant miRBase sequences under those strict rules (Table 1). The number of miRNA mature sequences found on each family varied significantly. Most of the families (14 out of 24) were represented by only one mature sequence (Table 1). In general, families with multiple miRNA sequences had one dominant form, followed by lower expressed sequences. This is the case, for example, for the family miR159, where four mature sequences were found, with the dominant miR159a having about 44 thousand reads, while the other three members together having less than two hundred reads (Table 1). In the family miR166, however, a total of eight miRNA se-Identification of coffee miRNAs 673 Figure 1 -Abundance of the different classes of sequences found in the C. canephora sRNA library. Based on BLAST searchers, reads ranging from 16 to 26 nt were divided into six categories (indicated by colors). Numbers in the x-axis indicate their respective size in nucleotides. quences were identified, with two of them having high read abundance (Table 1). The precursor sequences for five of the conserved miRNAs could be found by mapping the C. canephora reads to C. canephora and C. arabica available expressed contigs (Table 1, Figure 2 and Figure 3) (Mondego et al., 2011). The precursor from miR396b, however, was the only one retrieved from C. canephora contigs (Table 1, Figure 2), while the other four (miR167b, miR172b, miR390b and miR398a) were found in C. arabica-derived sequences (Table 1, Figure 3). The star sequences for all of them, except miR167b were also found by this analysis (Figure 2 and Figure 3), increasing the confidence of our data. The identification of precursor sequences in C. arabica based on C. canephora reads also indicates that these genes are likely conserved between the two species.
Identification of non-conserved miRNAs in C. canephora and C. arabica Since the coffee genome sequence is still publicly unavailable, the identification of non-conserved miRNAs was done by separately mapping all redundant sRNA reads from C. canephora against contigs from C. canephora and C. arabica (Mondego et al., 2011) or against C. canephora RNA-seq data (Combes et al., 2013) using the SOAP2 software  or the miRDeep-P tool (Yang and Li, 2011). After filtering results with Perl scripts (see Methods), about 250 contigs from C. canephora and 350 contigs from C. arabica were considered as candidate precursor sequences. All contigs were then visually inspected through the Tablet software and their structures computed by a RNA hairpin folding and annotation tool (Moxon et al., 2008). Only structures following strict pairing rules were then considered as miRNA precursors. By using this strategy, eleven non-conserved miRNAs were found (Table 2). From those, nine are variants of known plant miRNA families (Table 2). Those sequences were annotated according to the miRBase best hit (Table S2). Two other genes, however, are totally unrelated to known families of miRNAs and are therefore members of two new putative families in plants (Table 2).
From the seven families of miRNAs where new members were found, six (miR159, miR393, miR479, miR5225, miR8558 and miR8697) have only one member Identification of coffee miRNAs 675    each, while family miR482 have three sequences (Table 2). Interestingly, all members of the miR482 family identified here seem to be highly expressed in C. canephora leaves ( Table 2). The precursor sequences from only four genes (miR479, miR482a, miR482c, miR5225) were found in C. canephora contigs (Table 2, Figure 2). All the others, except miR479, had their precursors found among C. arabica contigs (Table 2, Figure 3). Therefore, the miRNAs miR482a and miR482c were the only cases, in this study, where the precursor sequences were found on both species. The star sequence could also be detected for all but miR159e and miR482b, reinforcing the idea that the identified genes are processed by the canonical DCL pathway (Figure 2 and Figure 3). The novel precursor miRNAs were found in two Coffea canephora expressed sequences (Table 2 and Figure 3). Since those genes are still not deposited in public databases, they were temporarily named as miR-Seq-1 and miR-Seq-2. Their putative miRNA star sequence could also be retrieved from the sequenced reads and their precursor sequences fit the requirements for being considered as true miRNA genes (Figure 3).

Isoforms of miRNAs found in C. canephora sRNAs
MiRNA isoforms, also known as iso-miRNAs, are a group of diverse sequences derived from a single precursor gene. They are frequently observed and are likely originated from imprecise DCL processing or post-transcriptional editing processes (Morin et al., 2008). Isoforms, however, may be loaded into Argonaute complexes and therefore exert their silencing activities (Fernandez-Valverde et al., 2010;Wang H et al., 2011). In total, 17 iso-miRNAs, derived from 11 genes, were found (Table 3). It is unlikely that the observed iso-miRNAs are sequencing artifacts, since all bases mapped to coffee transcripts have Q quality scores higher than 30 (99.9% of accuracy) (data not shown). Most precursors have only one isoform, but up to five isoforms coming from a single gene were observed for miR482c. The two miR-seq-1 isoforms showed the highest degree of sequence diversity when compared to their reference miRNA (Table 3). The expression of most iso-miRNAs, however, was significantly lower than their reference variant (Table 3). For example, miR8558 is about six hundred times more expressed than its detected isoform (Table 3). Although miRNAs miR396b, miR482a and miR482b also follow this rule, their respective isoforms have a high read abundance, indicating that they might be systematically produced in C. canephora cells (Table 3).

Targets of miRNAs
All expressed contigs from both C. canephora and C. arabica were used to search the putative targets of the identified miRNAs through the web-based psRNATarget tool (Dai and Zhao, 2011). In total, 339 and 149 probable targets were identified in C. arabica and C. canephora, re-spectively. From these, 442 sequences were predicted to be targeted by conserved miRNAs (Table S3) and 46 by the non-conserved ones (Table 4). All the identified targets were categorized into Gene Ontology (GO) terms to evaluate their putative functions ( Figure S2). GO terms covering a broad range of biological processes were obtained, demonstrating the putative importance of coffee miRNAs in controlling several physiological aspects. GO terms related to regulation of transcriptional, development and cell differentiation, however, were by far the most enriched terms observed ( Figure S2). Several coffee sequences similar to known miRNA targets were found, including genes involved in different aspects of development (Table S3). This includes genes associated with vegetative phase change, like the Squamosa promoter binding protein (SBP/SBL)-like and Apetala-2, targets of miRNAs 156 and 172 (Wang JW et al., 2011), respectively, and cell proliferation-related genes, like NACcontaining genes, target of miR164 (Mallory et al., 2004), and WRC domain-containing Growth regulating factors, target of miR396 (Rodriguez et al., 2010). Genes involved with root and leaf formation were also identified, including Auxin Responsive Factors, targeted by miR160 (Wang et al., 2005), and Homeodomain-containing genes, targeted by the miRNA165/166 family during the establishment of leaf polarity (Rhoades et al., 2002). Biotic and abiotic stress-related genes were also observed among the putative coffee miRNA targets. For instance, the identified target of miR395, the APT sulfurylase, is known to be involved in sulfate homeostasis during sulfur starvation (Jones- Rhoades and Bartel, 2004). Another interesting example is the Plastocyanin domain-containing Copper binding protein, whose gene is regulated by miR398, which can play a role during abiotic and biotic stresses (Sunkar and Zhu, 2004).
The three members of the family miR482, together with the related sequence miR8558, accounted for almost half of the 46 predicted targets of non-conserved miRNAs ( Table 4). The putative miR482-targeted genes included kinase proteins (Casein-,Calcium-and Malectin-like kinases), calcium binding proteins, ATPases, glutamine aminotransferases, disease resistance and DNA repair genes, among others. The non-conserved miRNAs miR5225 and miR-Seq-1 had only one predicted target in C. canephora and C. arabica, respectively. Curiously, both miRNAs targets are associated with the ubiquitin proteasome system ( Table 4). The miRNA miR8697 was predicted to target 14 different genes that were annotated into five groups: Nucleoside diphosphate kinase Group I (NDPk_I)-like, Drought induced 19 protein (Di19), NADH dehydrogenase, NBS resistance gene and autophagy-related genes (Table 4). Four putative targets were identified for the novel miRNA miR-Seq-2. These targets, however, are all related to Pyruvate dehydrogenase E1 component subunit or secretory peroxidases (Table 4)  the glycolysis metabolic pathway and response to oxidative stress, respectively. The digital expression of the predicted miRNA targets was also computed among C. canephora and C. arabica EST-based contigs (Mondego et al., 2011) (Figure S3). The predicted targets were in general depleted in leaf-derived libraries (LF1 for C. canephora and LV4, LV5, LV8, LV9 and RM1 for C. arabica) ( Figure S3). This is the case for example for miRNAs miR165/166, miR172a and miR398a, where the high abundance of reads detected in C. canephora leaves by deep-sequencing (Table 1) is well correlated with the low accumulation of their targets in the contig-based digital analysis ( Figure S3). In accordance, some of the putative miR482-targeted genes, including Calcium kinase (Contig5317), targeted by miR482a, Malectin-like receptor kinase protein (Contig3591), targeted by miR482b and alpha-crystallin-Hsps_p23-like (Contig2315), targeted by miR482c and most of the miR8697 targets are depleted in leaf-derived tissues (Figure S3).

Discussion
In this study we used a deep-sequencing approach to identify and classify miRNAs in C. canephora and C. arabica. Out of approximately 9 million reads, we have identified about 280 thousand reads corresponding to miRNAs. These sequences represented 58 unique mature miRNAs that were divided into 33 families, including two that, as far as we know, have never been described in the literature before (with provisory names miR-Seq-1 and miR-Seq-2).
Our searching pipeline involved three independent strategies: i) BLAST searches against the miRBase database, for the identification of the conserved genes, ii) alignments of the sRNA reads to contigs/ESTs from C. canephora and C. arabica through the SOAP2 software and iii) search for novel miRNAs with the plant miRDeep tool (Yang and Li, 2011). The results are robust, since very stringent rules were used in all analyses. For example, only sequences having full matches, no gaps and at least 10 reads were retrieved from the BLAST searches. Some miRNAs were probably missed by using these rules, since nucleotide polymorphisms are observed even in closely related species , which might explain the reduced amount of conserved miRNAs observed in Coffea compared to other plants.
The data obtained from the SOAP2 software was also filtered with stringent rules. Only contigs or ESTs having a mapping pattern similar to what would be expected for a canonical miRNA locus (i.e. having one or two similar blocks with piled up reads) and having at least 10 reads were initially selected. Then, all candidate precursor sequences were checked by RNA folding programs for identifying bona fide miRNA transcripts. Since precursor miRNAs are readily degraded by the RNA silencing machinery, such se-quences are not frequently observed on EST sequencing efforts. However, some miRNAs have already been found by this strategy (Frazier and Zhang, 2011;Guzman et al., 2012). The precursor sequences from 16 of the 58 miRNAs identified could be predicted in coffee expressed sequences by this strategy, including five conserved genes, nine variants of known miRNAs and the two new putative families of miRNAs (Figures 2 and 3). Although the sRNAs were extracted and sequenced from C. canephora leaves, most of the precursors were found in C. arabica sequences. The discrepancy might be explained by the fact that the total number of available sequences from C. arabica was about two times higher than the ones from C. canephora (35,153 vs 18,007). From the 16 precursors, 10 were only found in C. arabica, four only in C. canephora and two on both species. In some cases the miRNA* sequences could not be observed, probably due to their low accumulation or imprecise DCL processing. However, the probable miRNA* sequence, with the 2-nt overhang characteristic of the DCL activity, could be found in the majority of the cases. The two novel miRNAs, miR-Seq-1 and miR-Seq-2, were found by the plant miRDeep tool in C. arabica EST-based contigs and RNA-seq data, respectively, providing extra support for the results.
The targets of almost all identified miRNAs could be predicted in coffee expressed sequences (Tables S3 and 4). Several known targets of miRNAs involved in different aspects of plant development and stress response were retrieved by this analysis. As observed for other plants (Shivaprasad et al., 2012), the new members of the family miR482 and the related gene miR8558 identified here have a wide range of presumed targets, including nucleotide binding site-leucine-rich repeat (NBS-LRR) plant innate immune receptors (Table 4). This miRNA has also been associated with the biogenesis of trans acting siRNAs (tasiRNAs) in other plants, a class of 21-nt long secondary siRNAs that are able to regulate the expression of several targets (Allen et al., 2005;Yoshikawa et al., 2005). TasiRNAs are made by the mutual action of a miRNA and the silencing amplification machinery on non-coding Trans-acting siRNA transcripts (TAS). One of the three coffee members of this family identified in this study (miR482c), and the related sequence miR8558, are 22 nt long (Table 2), the miRNA size usually associated in the biogenesis of tasiRNAs (Cuperus et al., 2010;Manavella et al., 2012). Furthermore, as also observed for other members of the family, the miR482 genes found here have several isoforms and are highly and promiscuously expressed in different types of tissues (Shivaprasad et al., 2012).
Some putative targets of non-conserved miRNAs, including the novel ones, could also be found. The miRNAs miR5225 and miR-Seq-1, for example, were predicted to target genes involved in the process of protein ubiquitination (Table 4). miR8697 has a broad-spectrum of predicted targets, including Nucleoside diphosphate kinase Group I (NDPk_I)-like, NADH dehydrogenase, Ubiquitin domain of GABA-receptor-associated protein and Drought induced 19 protein (Di19) ( Table 4). NDPKs catalyze the transfer of phosphate from nucleoside triphosphates to nucleoside diphosphates. There are four isoforms of NDPKs annotated in the genome of the model plant Arabidopsis thaliana. Apart from their role in basal metabolism, some NDPK isoforms have also been associated with intracellular signaling and heat-stress responses (Hasunuma et al., 2003). The Ubiquitin domain of GABA-receptorassociated protein observed in one of the miR8697 targets belongs to a large and conserved family of proteins involved in membrane trafficking and autophagy. Autophagy has historically been attributed to the control of basal cellular functions, but can also be activated as a response against certain types of stresses (Liu and Bassham, 2012). Finally, the zinc-binding protein Di19 is known to be up-regulated in leaves and roots of A. thaliana plants under progressive drought stress (Gosti et al., 1995). The protein functions as a transcriptional factor, inducing the expression of some pathogenesis-related proteins that can buffer the drought effects (Liu et al., 2013). As most of the miRNA targets predicted, Di19 was not observed in leaf-derived libraries of the EST-based digital expression analysis ( Figure S3). This correlation should be taken with caution, since the tissues used for making the EST-libraries were not taken from plants in the same conditions or developmental stages as the ones used for deep-sequencing. However, the general trend supports our target-discovery strategy. The understanding of how, where and when miRNAs interact with other genes will provide useful insights into coffee physiology, expanding both basic and applied knowledge about these economically important plants.

Supplementary Material
The following online material is available for this article: Table S1 -Categories of sRNA sequences, ranging from 16 to 26 nt, found in the library of C. canephora leaves. Table S2 -BLAST results of variants of known miRNAs against the the miRBase database. Table S3 -Targets of conserved microRNAs found in C. canephora and C. arabica. Figure S1 -Size distribution of the total number of sRNA reads from C. canephora leaves. Figure S2 -Gene Ontology terms of miRNA targets identified on C. canephora contig/EST libraries. Figure S3 -Electronic northern blot of predicted miRNA targets on C. canephora and C. arabica contig/EST libraries. This material is available as part of the online article from .

Note Added in Proof
While this manuscript was in press, a paper describing the C. canephora genome sequence came out and identified 92 conserved miRNAs by comparing with miRBase sequences (Denoeud et al., 2014). Of those, 38 miRNAs belonging to 19 families were also found in our pipeline based on expressed sequences.

Associate Editor: Juan Lucas Argueso Almeida
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.