Comparative genomics reveals diverse capsular polysaccharide synthesis gene clusters in emerging Raoultella planticola

Raoultella planticola is an emerging zoonotic pathogen associated with rare but life-threatening cases of bacteremia, biliary tract infections, and urinary tract infections. Moreover, increasing antimicrobial resistance in the organism poses a potential threat to public health. Despite its importance as a human pathogen, the genome of R. planticola remains largely unexplored and little is known about its virulence factors. Although lipopolysaccharides have been detected in R. planticola and implicated in virulence in earlier studies, their genetic background is unknown. Here, we report the complete genome and comparative analysis of the multidrug-resistant clinical isolate R. planticola GODA. The complete genome of R. planticola GODA was sequenced using single-molecule real-time DNA sequencing. Comparative genomic analysis reveals distinct capsular polysaccharide synthesis gene clusters in R. planticola GODA. In addition, we found blaTEM-57 and multiple transporters related to multidrug resistance. The availability of genomic data for this emerging zoonotic pathogen in open databases, in tandem with our comparative study, provides a better understanding of R. planticola and a basis for future work.

Raoultella species are facultatively anaerobic Gram-negative bacilli found in plants, wood, soil, water, and wildlife. (1) The genus contains four species: Raoultella planticola, (2) Raoultella electrica, (3) Raoultella ornithinolytica, (4) and Raoultella terrigena. (5) R. planticola is the most common human pathogen in the genus, causing biliary tract infections and urinary tract infections. (6) The vast majority of patients infected with R. planticola are immunocompromised individuals, such as organ transplant recipients and those with malignancy or diabetes mellitus. (7) Recently, reports of severe cases presenting with bacteremia and sepsis have increased. (6) Moreover, increasingly resistant strains of R. planticola have emerged and are responsible for the majority of health-care-associated infections. (1,8) One study also showed that the organism can survive in a range of hospital environments by developing resistance to disinfectants. (9) Genetic analysis is essential for successfully addressing emerging infectious diseases. (10,11) Although a few R. planticola genome sequences are available, the genomic background of its pathogenesis and resistance is largely unknown. Here, we sequenced and reconstructed the complete circular genome of R. planticola strain GODA and performed genome-wide comparisons to identify putative virulence and resistance determinants.
R. planticola GODA was grown in Luria-Bertani broth overnight at 37 °C. The overnight culture (1 to 5 × 10^8 CFU/mL) was pelleted and resuspended in PBS. Genomic DNA was extracted with the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany), following the manufacturer's instructions. DNA was sheared to 10 kb using the g-TUBE™ (Covaris). The sheared DNA was treated with DNA damage repair mix, followed by end repair and ligation of SMRT adapters using the PacBio SMRTbell Template Prep Kit (Pacific Biosciences, Menlo Park, CA, United States). Whole-genome sequencing was performed on the PacBio sequencing platform (Pacific Biosciences). Three single-molecule real-time (SMRT) cells were run on the PacBio RS II sequencer with a 120-minute movie time per SMRT cell. SMRT Analysis portal version 2.1 was used for read filtering and adapter trimming with default parameters, and 1.2 Gb of post-filtered data (approximately 214× coverage) with an average read length of 6 kb were used for subsequent assembly.
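As a quick check of the reported depth, the coverage figure follows from dividing the post-filtered yield by the genome length. The sketch below assumes the approximate values given in the text (1.2 Gb of reads, a chromosome of about 5.6 Mbp) and ignores the plasmids; the function name `sequencing_depth` is ours, for illustration only.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Approximate values taken from the text; plasmids are not included.
POST_FILTER_BASES = 1.2e9   # ~1.2 Gb of post-filtered reads
GENOME_SIZE = 5.6e6         # ~5.6 Mbp circular chromosome

depth = sequencing_depth(POST_FILTER_BASES, GENOME_SIZE)
print(f"approximate depth: {depth:.0f}x")
```

With these rounded inputs the estimate agrees with the roughly 214× coverage stated above.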
The post-filtered reads were assembled de novo with Canu (v1.4) and circularized with Circlator, yielding a complete circular genome (~5.6 Mbp). Three additional plasmids were also reconstructed. The guanine-cytosine (GC) content of the GODA genome was 55.4%, similar to that of related strains. Protein-coding genes in the genome and plasmids were annotated using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP). Functional classification of annotated genes was carried out with RPS-BLAST v. 2.2.15 in conjunction with the COGs (Clusters of Orthologous Groups of proteins) database. A total of 5,461 genes were identified, including 25 rRNA genes and 83 tRNA genes (Table I).
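The GC content reported above is the percentage of G and C among the unambiguous bases of the assembly. A minimal sketch of that calculation follows; it is not the tool used in this study, and the helper name `gc_content` and the toy sequence are illustrative assumptions.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence for illustration only; a real run would read the
# assembled chromosome from a FASTA file.
toy = "ATGCGCGTTAGC"
print(f"GC content: {gc_content(toy):.1f}%")
```

Ambiguous bases such as N are excluded from the denominator so that gaps in an assembly do not bias the estimate.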
We further constructed a pan-genome dataset using the whole-genome sequence of GODA and seven publicly available whole-genome sequences of R. planticola strains (Table I). We considered a gene to be strain-specific if it was present in only one strain and absent from all others. Genes shared by all strains were considered pan-genomic core genes. Fig. 1 shows orthologous genes shared among strains and depicts the position and color-coded function of the R. planticola GODA-specific genes. The numbers of orthologous and strain-specific genes are shown in the Venn diagram (Fig. 2A). As presented in the figure, the pan-genome of R. planticola comprised 4,382 core genes shared across all strains, whereas 147 genes were specific to R. planticola GODA. Functional analysis of the GODA-specific genes revealed that, apart from hypothetical proteins, the largest share of these genes is involved in replication and repair, followed by cell wall/membrane/envelope biogenesis (Fig. 2B). The Average Nucleotide Identity (ANI) was calculated using a modified algorithm (12) and revealed that R. planticola GODA is closely related to ATCC 33531, FDAARGOS_64, and CHB at the nucleotide level (ANI > 98%) (Fig. 3).
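The core and strain-specific definitions used here reduce to set operations on a gene presence table. The sketch below assumes that orthologous groups have already been assigned; the group identifiers (`OG1`, `OG2`, ...) and the toy table are hypothetical and do not reproduce the data behind Fig. 2A.

```python
from typing import Dict, Set


def core_genes(pan: Dict[str, Set[str]]) -> Set[str]:
    """Ortholog groups present in every strain."""
    if not pan:
        return set()
    return set.intersection(*pan.values())


def strain_specific(pan: Dict[str, Set[str]], strain: str) -> Set[str]:
    """Ortholog groups present in `strain` and absent from all other strains."""
    others = set().union(*(genes for name, genes in pan.items() if name != strain))
    return pan[strain] - others


# Hypothetical presence table: strain name -> ortholog group identifiers.
pan = {
    "GODA": {"OG1", "OG2", "OG3", "OG7"},
    "ATCC 33531": {"OG1", "OG2", "OG4"},
    "CHB": {"OG1", "OG2", "OG3", "OG5"},
}

print("core:", sorted(core_genes(pan)))
print("GODA-specific:", sorted(strain_specific(pan, "GODA")))
```

Note that a group shared by GODA and only some other strains (here `OG3`) is neither core nor strain-specific under these definitions.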
Virulence genes in the GODA genome were identified using the Virulence Factor Database (VFDB). Identified virulence genes that were also GODA-specific were considered putative GODA-specific virulence factors.
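This selection amounts to intersecting two gene lists: the VFDB hits and the GODA-specific genes. A minimal sketch, assuming both lists use the same gene identifiers; the identifiers below are placeholders rather than real locus tags.

```python
from typing import Iterable, List


def putative_specific_virulence(vfdb_hits: Iterable[str],
                                specific_genes: Iterable[str]) -> List[str]:
    """Genes that are both database virulence hits and strain-specific."""
    return sorted(set(vfdb_hits) & set(specific_genes))


# Placeholder identifiers for illustration only.
vfdb_hits = ["gene_a", "gene_c", "gene_d"]
goda_specific = ["gene_b", "gene_c", "gene_d", "gene_e"]

candidates = putative_specific_virulence(vfdb_hits, goda_specific)
print(candidates)
```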
The polysaccharide capsule is considered a major virulence factor of R. planticola (formerly named Klebsiella planticola). (13) A previous study in Klebsiella spp. suggests that wzx is a common component of the capsular polysaccharide biosynthesis pathway. (14) Our comparative genomic analysis also revealed the presence of the wzx flippase gene in the GODA genome, whereas it was absent from the environmental strains. Further investigation of its upstream and downstream genes revealed the entire capsular polysaccharide synthesis (cps) gene cluster (Fig. 4). Our findings provide the first description of the genetic background of the cps gene clusters in R. planticola.
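One way to delimit such a cluster from an annotated genome is to take the genes lying between conserved flanking markers, here galF at the 5' end and ugd at the 3' end as described for Klebsiella cps loci, and then test for wzx. The sketch below works on an ordered list of gene names; the two gene orders are simplified toy examples, not the arrangements shown in Fig. 4, and the helper `locus_between` is hypothetical.

```python
from typing import List, Optional


def locus_between(gene_order: List[str], left: str, right: str) -> Optional[List[str]]:
    """Genes from `left` through `right` (inclusive), in genome order.

    Returns None if either marker is missing or `right` does not
    occur after `left`.
    """
    try:
        start = gene_order.index(left)
        end = gene_order.index(right, start + 1)
    except ValueError:
        return None
    return gene_order[start:end + 1]


# Toy gene orders for illustration only (not taken from Fig. 4).
clinical = ["yegH", "galF", "wzi", "wza", "wzb", "wzc", "wzx", "gnd", "ugd", "hisI"]
environmental = ["yegH", "galF", "wzi", "wza", "wzb", "wzc", "gnd", "ugd", "hisI"]

for label, order in (("clinical", clinical), ("environmental", environmental)):
    cps = locus_between(order, "galF", "ugd")
    has_wzx = cps is not None and "wzx" in cps
    print(label, cps, "wzx present:", has_wzx)
```

The sketch ignores strand and assumes each marker occurs once; a real analysis would work from annotation coordinates rather than bare gene names.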
We further compared the cps clusters of environmental and clinical isolates and two distantly related Klebsiella strains (Fig. 4). Three highly conserved genes, galF, gnd, and ugd, were preserved across all strains analyzed, whereas the gene composition between them was often variable. A similar pattern has been noted in Klebsiella strains. (15) The inter-species variability (R. planticola vs. Klebsiella strains) was higher than the intra-species variability. The cps structures of the two clinical isolates, GODA and FDAARGOS_64, were highly similar, implying that both strains may express identical virulence factors. While wzx is commonly found in Klebsiella spp., it was absent from all environmental isolates of R. planticola in this study. (14) Genetic context analysis of the cps gene cluster of GODA showed that wzx was located between a gene encoding UTP--glucose-1-phosphate uridylyltransferase and a gene encoding 6-phosphogluconate dehydrogenase. A similar observation has been made in several cps gene clusters of Klebsiella spp. (14) Capsular polysaccharide is a major virulence factor of Klebsiella spp., and the genetic structures of the cps gene cluster in Klebsiella spp. have been well studied. (16) Generally, galF at the 5' end of the cps region and gnd and ugd at the 3' end are highly conserved among different Klebsiella strains. The same context was identified in GODA. We also predicted genes encoding proteins necessary for capsular polysaccharide translocation and processing at the cell surface (wza, wzb, wzc, and wzi) and genes encoding glycosyltransferases.
The resistome of GODA was annotated using the Resistance Gene Identifier from the Comprehensive Antibiotic Resistance Database (CARD) (17) and the IMG database. (18)
GODA carried blaTEM-57 (Table II), an extended-spectrum β-lactamase gene conferring resistance to β-lactam antibiotics such as penicillins and cephalosporins. (19) GODA was also equipped with a number of efflux systems. It contains homologs of efflux pumps from the multidrug and toxic compound extrusion (MATE) family (mdtK), the resistance-nodulation-cell division (RND) family (mdtABC, oqxAB, acrAB), the ATP (adenosine triphosphate)-binding cassette (ABC) superfamily (yojI, msbA), and the major facilitator superfamily (MFS) (emrAB, mdtL, rosAB). These multidrug-resistance efflux pumps, alone or in combination with the extended-spectrum β-lactamase, could result in resistance to multiple classes of antibiotics. (20,21)
Our results describe the capsular polysaccharide synthesis gene clusters in various strains of R. planticola and advance our understanding of the relationship between these gene regions. Moreover, these findings may be useful for the further development of genotyping in this organism. In addition, the genome-wide prediction of multiple efflux systems and the comparative in silico study provide novel insights into the genome of GODA and lay the foundation for future experimental studies.
Data availability - This genome project has been deposited at NCBI/GenBank (BioProject PRJNA375797) and includes the raw read data, assembly, and annotation. The assembly is available under accession CP019899; the version described in this paper is version CP019899.