The chloroplast genome of Rosa rugosa × Rosa sertata (Rosaceae): genome structure and comparative analysis

Abstract Rosa rugosa × Rosa sertata, a member of the family Rosaceae, is one of the native oil-bearing roses of China. Most research has focused on its essential oil components and medicinal value, and few studies have addressed its chloroplast genome. In this study, the whole chloroplast genome of R. rugosa × R. sertata was sequenced, analyzed, and compared with those of other Rosa species. The chloroplast genome of R. rugosa × R. sertata is circular and 157,120 bp in length. The large single copy (LSC) and small single copy (SSC) regions are 86,173 bp and 18,743 bp in size, respectively, and each inverted repeat (IR) is 26,102 bp in size. The GC content of the whole genome is 37.96%, while those of the LSC, SSC, and IR regions are 35.20%, 31.18%, and 42.73%, respectively. There are 130 genes annotated in this chloroplast genome, including 84 protein-coding genes, 37 tRNA genes, 8 rRNA genes, and 1 pseudogene. Phylogenetic analysis of 19 species revealed that R. rugosa × R. sertata belongs to Sect. Cinnamomeae. Overall, this study provides genomic resources for R. rugosa × R. sertata that will be beneficial for species identification and biological research.

Rosa, a typical genus of the Rosaceae family, is widely distributed across the northern hemisphere (Rehder, 1949; Christenhusz et al., 2017). Traditionally, it has had a wide range of uses, including food, decoration, medicine, perfumery, and ecological conservation (Wang et al., 2012; Cheng et al., 2016; Patel, 2017). Rosa rugosa × Rosa sertata, belonging to the genus Rosa of Rosaceae, is an economically important species in China. It is commonly called the Kushui rose because it is mainly planted in Kushui, Lanzhou city, Gansu Province, for the production of dried flower bud tea, jam, rose essential oil, hydrosol, and other products. Among these, essential oil extraction products have appeared on the international market since the late 1980s. Studies on the composition of its essential oil show that it has a high alcohol content, including citronellol, geraniol and farnesol (Son and Lee, 2012; Wu Y et al., 2020). Notably, citronellol accounts for nearly half of the relative content. Moreover, the decoction made from its flower tea has been shown to have anti-cancer and antioxidant activity (Liu et al., 2018). A new polysaccharide, which can be used as a safe immune regulator in medicine or functional food, was isolated from the waste produced during the processing of its essential oil (Wu M et al., 2019). The evolutionary origin of most roses remains elusive, and the species in this study is no exception, although it is recorded as a natural hybrid under the corresponding Latin name. Most puzzling is that there is no direct genetic evidence of its hybrid origin.
The chloroplast, one of the organelles of plants, plays an important role in maintaining life on earth by converting solar energy into carbohydrates through photosynthesis and releasing oxygen (Daniell et al., 2016). Accordingly, various genes essential for carbon fixation and metabolite synthesis are found in the chloroplast genome. Typical chloroplast genomes range from 120 to 170 kb in size and generally encode 120 to 130 genes. They are usually composed of four parts, namely a large single copy (LSC) region, a small single copy (SSC) region, and a pair of inverted repeat (IR) regions that separate the first two parts (Bendich, 2004). Chloroplast genomes are valuable for phylogeny and population genetics because of their conserved gene structure and base composition and their ability to resolve relationships at lower taxonomic levels (Wicke et al., 2011). In recent years, progress in next-generation sequencing technology has provided researchers with faster and cheaper methods to obtain chloroplast genome information. In this study, the complete chloroplast genome of R. rugosa × R. sertata was obtained for the first time by high-throughput sequencing and compared with those of other species within the genus Rosa.
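As a quick consistency check on the quadripartite structure described above, the following minimal Python sketch sums the region sizes reported in the abstract (LSC, SSC, and two IR copies) and shows one simple way to compute GC content from a sequence string. The function names and the toy sequence are illustrative assumptions and are not part of the analysis pipeline used in this study.

```python
# Region lengths (bp) as reported for the R. rugosa x R. sertata plastome.
LSC_BP = 86_173
SSC_BP = 18_743
IR_BP = 26_102  # length of ONE inverted repeat copy


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def gc_percent(seq: str) -> float:
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


total = plastome_length(LSC_BP, SSC_BP, IR_BP)
print(total)                     # 157120, matching the reported genome size
print(gc_percent("ATGCGCNNAT"))  # 50.0 for this toy sequence (N is ignored)
```

The same `gc_percent` helper could be applied separately to the LSC, SSC, and IR subsequences once their coordinates are known.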
Healthy, mature leaves of R. rugosa × R. sertata were collected at Gansu Agricultural University (36°09′N, 103°70′E, Lanzhou, Gansu, China), preserved in liquid nitrogen, and then stored in an ultra-low temperature freezer until DNA extraction. Total genomic DNA was extracted from the sampled leaves using a Plant Genomic DNA kit (TIANGEN, Beijing, China) following the manufacturer's instructions. The isolated genomic DNA was used to prepare high-throughput DNA sequencing libraries with an Illumina V3 kit (Vazyme, catalog number ND607), and library products corresponding to 300-350 bp were enriched, quantified, and sequenced on a NovaSeq 6000 sequencer in 150 bp paired-end mode. Sequencing generated 17,902,347 paired-end raw reads. The sequencing data were first filtered with Trimmomatic (version 0.36): low-quality reads were discarded and reads contaminated with adaptor sequences were trimmed. Chloroplast-like reads were then extracted from the clean reads, with R. acicularis (GenBank accession no. MK714016.1) as the reference sequence, by alignment to a database built by Genepioneer Biotechnologies (Nanjing, China) using Bowtie2 v2.2.4 (Langmead and Salzberg, 2012) and SPAdes v3.10.1 (Bankevich et al., 2012). The chloroplast-like reads were subsequently assembled with NOVOPlasty (Dierckxsens et al., 2017). The assembled chloroplast sequence was annotated by two methods. First, the CDS, rRNA, and tRNA genes were predicted with Prodigal v2.6.3 (Hyatt et al., 2010), HMMER v3.1b2 (Prakash et al., 2017), and Aragorn v1.2.38 (Laslett and Canback, 2004), respectively. Second, BLAST v2.6 (Johnson et al., 2008) was used to compare the gene sequences of the assembly with those of the reference species.
To produce the final annotation, the two sets of results were manually checked to remove redundant predictions and to determine the boundaries of multi-exon genes. A circular map of the R. rugosa × R. sertata plastid genome was generated using the Chloroplot program (Zheng et al., 2020).
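The read-filtering step in the workflow above can be illustrated with a simplified sketch. The Python code below discards reads whose mean Phred quality or length falls below a threshold. This is a plain mean-quality filter, not Trimmomatic's own algorithm, and the thresholds (`min_mean_q`, `min_len`), the Phred+33 encoding, and the toy reads are assumptions chosen only for illustration; the settings actually used in this study are not stated.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (header, bases, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, bases, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        bases = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed
        quals = next(it).rstrip("\n")
        yield header, bases, quals


def mean_phred(quals: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not quals:
        return 0.0
    return sum(ord(c) - offset for c in quals) / len(quals)


def filter_reads(reads: Iterable[Read], min_mean_q: float = 20.0,
                 min_len: int = 36) -> Iterator[Read]:
    """Keep reads that are long enough and have sufficient mean quality."""
    for header, bases, quals in reads:
        if len(bases) >= min_len and mean_phred(quals) >= min_mean_q:
            yield header, bases, quals


# Toy records: read1 passes, read2 has low quality, read3 is too short.
toy = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTACGT", "+", "!!!!!!!!",
    "@read3", "ACG", "+", "III",
]
kept = list(filter_reads(parse_fastq(toy), min_mean_q=20.0, min_len=5))
print([header for header, _, _ in kept])  # ['@read1']
```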
Gene flow between species and genetic diversity within a species are often assessed by comparing chloroplast sequences. To determine differences among the chloroplast genome sequences of R. rugosa, R. odorata var. gigantea, R. multiflora, R. luciae, R. canina and R. rugosa × R. sertata, sequence identity was calculated for these species using the online program mVISTA with R. chinensis cultivar Old Blush as the reference (Figure S2, Table S4). Consistent with other studies, the region of greatest divergence is the LSC, in which the noncoding regions are more divergent than the coding regions. The chloroplast genome of R. rugosa × R. sertata is closest to that of R. rugosa, and significant variation between them was found in intergenic regions such as psbM-trnD. These higher-resolution loci could be valuable for species identification.
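The idea behind a sequence identity plot can be sketched as a sliding-window comparison of two aligned sequences. The Python example below is a minimal illustration under stated assumptions: the sequences are already aligned to equal length, gaps are written as `-`, and the window and step sizes are arbitrary. It is not a reimplementation of mVISTA, and the toy sequences are invented.

```python
from typing import List, Tuple


def window_identity(ref: str, query: str, window: int = 100,
                    step: int = 100) -> List[Tuple[int, float]]:
    """Percent identity per window for two pre-aligned, equal-length sequences.

    Returns (window_start, percent_identity) pairs. Columns where both
    sequences carry a gap are skipped; a gap against a base is a mismatch.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ref, query = ref.upper(), query.upper()
    results = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = zip(ref[start:start + window], query[start:start + window])
        compared = [(a, b) for a, b in pairs if not (a == "-" and b == "-")]
        matches = sum(1 for a, b in compared if a == b)
        identity = 100.0 * matches / len(compared) if compared else 0.0
        results.append((start, identity))
    return results


# Toy alignment with a single mismatch in the first window.
profile = window_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5)
print(profile)  # [(0, 80.0), (5, 100.0)]
```

Windows with low identity in such a profile correspond to the divergent regions that appear as dips in an identity plot.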
Over long-term evolution, changes at the borders of the IR regions play a critical role. In our study, the genetic architecture of seven Rosa genomes was mapped at the junctions of the IR, LSC, and SSC regions using IRscope (Figure S3). Gene location and gene order were relatively conserved in Rosa. In R. canina, R. odorata, R. rugosa, R. chinensis, and R. rugosa × R. sertata, the coding region of ycf1 was at the SSC/IRa border and spanned the SSC and IRa regions, while in R. luciae and R. multiflora, it was at the SSC/IRb border and spanned the SSC and IRb regions. It is noteworthy that in R. rugosa and R. rugosa × R. sertata, the ycf1 pseudogene was located in IRb, while in R. luciae and R. multiflora, it was located in IRa. The mutation region of the ycf1 pseudogene at the IRa/SSC or IRb/SSC junction was 1106-1111 bp.
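To make the notion of a gene spanning a region junction concrete, the following Python sketch derives junction coordinates from the region lengths reported for R. rugosa × R. sertata and tests whether a gene interval crosses any of them. The linear region order LSC-IRb-SSC-IRa is assumed as a common convention, and the gene coordinates in the example are hypothetical placeholders, not the annotated position of ycf1.

```python
from typing import Dict, List

LSC_BP, SSC_BP, IR_BP = 86_173, 18_743, 26_102


def junctions(lsc: int, ir: int, ssc: int) -> Dict[str, int]:
    """1-based coordinate of the last base before each junction.

    Assumes the linear order LSC - IRb - SSC - IRa.
    """
    jlb = lsc        # last base of LSC
    jsb = jlb + ir   # last base of IRb
    jsa = jsb + ssc  # last base of SSC
    jla = jsa + ir   # last base of IRa (wraps back to LSC on the circle)
    return {"LSC/IRb": jlb, "IRb/SSC": jsb, "SSC/IRa": jsa, "IRa/LSC": jla}


def spanned_junctions(gene_start: int, gene_end: int,
                      juncs: Dict[str, int]) -> List[str]:
    """Names of junctions crossed by a gene given in linear 1-based coordinates.

    A junction is crossed when the gene has bases on both sides of it.
    Genes wrapping around the circular origin are not handled here.
    """
    return [name for name, pos in juncs.items() if gene_start <= pos < gene_end]


juncs = junctions(LSC_BP, IR_BP, SSC_BP)
print(juncs)
# {'LSC/IRb': 86173, 'IRb/SSC': 112275, 'SSC/IRa': 131018, 'IRa/LSC': 157120}

# Hypothetical gene interval straddling the SSC/IRa junction.
print(spanned_junctions(130_000, 132_000, juncs))  # ['SSC/IRa']
```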
The phylogenetic analysis was performed on complete chloroplast genome sequences from 19 taxa, including 18 Rosa species and one outgroup (Vitis vinifera, MN561034.1), all of which were downloaded from the NCBI database except that of R. rugosa × R. sertata. The sequences from these 19 species were aligned with MAFFT v7.455 (Katoh and Standley, 2013) and trimmed with trimAl (Capella-Gutiérrez et al., 2009). A maximum likelihood (ML) analysis was performed with IQ-TREE (Nguyen et al., 2015), with a bootstrap test of 1000 replicates. The resulting phylogeny was visualized in MEGA v7.0 (Kumar et al., 2016) (Figure 2). Chloroplast genomes play a significant role in understanding the evolutionary relationships and history of plant species (Jansen et al., 2007). Here, as expected, 14 species from the genus Rosa formed a monophyletic clade composed of seven branches, which were consistent with the seven subgroups obtained by morphological classification. R. rugosa × R. sertata was most closely related to R. rugosa, with a bootstrap support value of 100%. They belong to Sect. Cinnamomeae. The availability of a complete R. rugosa × R. sertata chloroplast genome sequence will provide useful information for phylogenetic studies within Rosa.
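The bootstrap test mentioned above rests on resampling alignment columns with replacement. The Python sketch below illustrates only that resampling step on a toy alignment. It shows the standard nonparametric bootstrap idea and is not the procedure implemented inside IQ-TREE; the taxon names, sequences, and random seed are invented for illustration. In a real analysis, a tree would be inferred from each replicate and the support for a clade would be the fraction of replicate trees that contain it.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are applied to every taxon so that
    # each resampled column stays a genuine column of the alignment.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Toy alignment with hypothetical taxon labels.
toy = {
    "taxon_A": "ACGTACGTAA",
    "taxon_B": "ACGTACGTAC",
    "taxon_C": "ACGAACGTTC",
}
rng = random.Random(42)  # fixed seed for reproducibility
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), len(replicates[0]["taxon_A"]))  # 1000 10
```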
Overall, the complete chloroplast genome of R. rugosa × R. sertata, an endemic oil-bearing rose of China, was reported and analyzed for the first time. The quadripartite structure, genome size, GC content, and gene order of the plastid genome of R. rugosa × R. sertata were shown to be similar to those of other Rosa species. A total of 37 long repeat sequences and 260 SSRs were detected in this plastid genome. In addition, the phylogenetic relationships reconstructed among 19 species showed R. rugosa × R. sertata to be closely related to R. rugosa. These results, together with the comparison with the whole chloroplast genomes of other Rosa species, provide valuable information and will offer insight into the development of DNA markers suitable for identifying species within this genus.

Supplementary Material
The following online material is available for this article: Figure S1 - Analysis of long repeat sequences and simple sequence repeats (SSRs) in the R. rugosa × R. sertata chloroplast genome. Figure S2 - Sequence identity plot of 6 Rosa chloroplast genomes by mVISTA. Figure S3