Metagenomic and metatranscriptomic analysis of the microbial community structure and metabolic potential of fermented soybean in Yunnan Province

Traditional fermented soybean contains a variety of microflora. To obtain a comprehensive and accurate understanding of the microbial community structure and metabolic potential of fermented soybean, comparative metagenomics and metatranscriptomics were performed on a sample of fermented soybean from Yunnan Province. Metagenomic DNA and metatranscriptomic RNA were sequenced using an Illumina HiSeq™ 2500, which yielded 92,192,276 metagenomic reads with an average read length of 150 bp and 38,798,262 metatranscriptomic paired-end sequences with an average length of 151 bp. The results show that Providencia stuartii was the most abundant species and that the genes of carbohydrates (2296, 13.30%), protein metabolism (1530, 8.86%), and amino acids and amino acid derivatives (1423, 8.24%) were dominant. According to the reads per kilobase of transcript per million mapped reads (RPKM) values, the expression levels of genes belonging to amino acid transport and metabolism were the highest, followed by energy production and conversion and by carbohydrate transport and metabolism. The metabolic pathways were primarily associated with carbohydrates, proteins and amino acids, which might be in accordance with the high levels of proteins and other nutrients in soybeans. Overall, these findings provide insights into the community structure and metabolic potential of the fermented soybean microbiome.


Introduction
Fermented soybean is widely distributed throughout Yunnan Province as well as most other regions of China. In Yunnan Province, fermented soybean is produced by spontaneous fermentation, and no microorganisms are added during the process. Until recently, most fermented soybean in China has been produced in traditional family workshops. In Yunnan Province, the traditional processing method is as follows: first, the soybeans are washed and soaked in water for 24 h, then boiled for 1-2 h. After draining, the soybeans are tightly packed in a small bamboo basket lined with bamboo or banana leaves, and the baskets are covered with leaves of the soybean plant to maintain an ambient temperature. In addition, about 12-15% (w/w) salt is added, and spices (such as sugar, Chinese prickly ash, fresh hot pepper paste, or dry hot pepper powder) are added after fermenting for 3 to 5 days. Finally, the mixture is packed in a tank for about one month. Spontaneous fermentation without starter cultures or sterilization leads to the growth of various microorganisms during soybean preparation.
According to the types of microorganisms present during fermentation, fermented soybean is divided into two classes: bacterial-fermented soybean and mold-fermented soybean. The fermented soybean in Yunnan Province is primarily bacterial-fermented (Liu et al., 2012). The microorganisms in fermented soybean can produce several types of enzymes (Sanjukta & Rai, 2016; Wang et al., 2013). These enzymes convert the contents of fermented soybean into complex metabolic products, which contribute to its color and aroma and increase the content of vitamins, essential amino acids, etc. The quality of fermented soybean therefore depends on the microorganisms involved in its fermentation, and studying the microbial community structure and gene expression of fermented soybean is both necessary and meaningful.
The initial approach to studying the microbial community structure of fermented soybean was the culture-based method. Species isolated by culture account for only 1-5% of the estimated total (Amann et al., 1995). For some bacteria, culture-based approaches are also limited by poor culture reproducibility. More recent culture-independent methods based on 16S rRNA gene sequencing have made the microbial community structure easier to study (Devi et al., 2015; Matsui et al., 2013; Połka et al., 2015), but these methods are limited in taxonomic resolution and are subject to primer biases (Polz & Cavanaugh, 1998) and other systematic biases in PCR (Carraway & Marinus, 1993). Thus, the microbial community profile has not been accurately and comprehensively studied using these methods. By directly analyzing the total DNA of the microbial community, metagenomics can reveal the species, number and proportion of different microbes in the community to the greatest extent, as well as the metabolic potential of all microorganisms in the community. Although the bias of the sequencing method cannot be eliminated, metagenomic approaches based on direct random sequencing of environmental DNA are free of the PCR biases that affect gene-targeted PCR-based methods. In recent years, the application of metagenomics to fermented foods such as kimchi and kefir grains has increased (Jung et al., 2011; Nalbantoglu et al., 2014). Metagenomic approaches greatly extend the information on gene composition and metabolism but are limited in the recognition of gene activity, function, etc. This drawback can be overcome by metatranscriptomics, which can be applied to examine the gene function and metabolism of the active microorganisms and is a promising approach for gaining unbiased insights into the functionality of a microbial community (Van Hijum et al., 2013).
For example, analysis of lactic acid bacteria (LAB) gene expression during kimchi fermentation contributed to knowledge of the active populations and of gene expression in the LAB community responsible for this important fermentation process (Jung et al., 2013).
Here, we studied the microbial metagenome and metatranscriptome of fermented soybean collected from Dehong Prefecture in Yunnan Province using an Illumina HiSeq™ 2500. The aim of the present study was to investigate the microbial community structure and metabolic pathways of fermented soybean and to analyze the metabolic processes of the active microbes.

Sample collection
Dehong Prefecture in Yunnan Province, China, was chosen for sampling on the basis of previous research (data not shown) and its distinctive human-geographical environment; moreover, the microbial community structure of fermented soybean from different places is very similar. Sampling was repeated at the main vegetable market in the city, and five parallel samples, all hand-made traditional fermented soybeans, were collected in October 2014. The soybean samples had been spontaneously fermented at ambient temperature for about 1 month by local residents and did not contain any additives such as hot pepper, salt, etc. The fermented soybean samples were stored at -80 °C until total DNA and RNA extraction.

Total DNA extraction, determination and purification
Total DNA and RNA were extracted on the same day, and the five parallel samples were evenly mixed and ground for DNA and RNA extraction. DNA was extracted from 0.5 g of the samples in a 1.5 mL centrifuge tube by a CTAB-based method as previously described (Schmidt et al., 1991) with slight modifications. Briefly, the samples were resuspended in 450 µL of a lysis buffer (0.1 mol/L Tris-HCl, 0.1 mol/L EDTA, and 0.75 mol/L sucrose), treated with lysozyme (1 mg/mL) and incubated in a water bath at 37 °C for 30 min. Subsequently, the samples were treated with SDS (1%) and CTAB (1%), followed by treatment with phenol:chloroform:isoamyl alcohol (25:24:1, v/v) and chloroform:isoamyl alcohol (24:1, v/v) to remove impurities. Following treatment with ethanol, the samples were resuspended in 50 µL of TE buffer (0.01 mol/L Tris-HCl, pH 8.0, and 0.001 mol/L EDTA, pH 8.0). The DNA samples were stored at -20 °C until further analysis. Prior to sequencing, agarose gel electrophoresis was conducted to confirm the presence of the target band. Subsequently, the DNA samples were purified using the DNeasy Plant Mini Kit. The purity and yield of the DNA samples were determined using a Thermo Scientific NanoDrop 1000 Spectrophotometer and a Qubit fluorometer with a DNA dye [PicoGreen (Sigma, America)] to determine whether the samples were adequate and sufficient for sequencing and downstream reactions.

Metagenome sequencing, assembly and annotation
DNA was sequenced using a HiSeq 2500 sequencing platform (Illumina, America) according to the manufacturer's protocol. Paired-end sequencing technology was used and the read length was 300 bp. The obtained sequences were filtered. Quality control was performed using the following criteria: (1) short reads ligated with adapter sequences were considered adapter contamination and removed; and (2) reads containing more than 50 bases with quality values lower than 2 were removed. The filtered clean data were assembled into contigs by MetaVelvet (Namiki et al., 2012) with a k-mer size of 81. The contigs were then screened to identify those longer than 200 bp. Gene prediction was conducted using MetaGeneAnnotator. The obtained genes were searched by BLAST against the SEED database (Overbeek et al., 2005). Metagenomic analysis was conducted using Metagenome Rapid Annotation with Subsystem Technology (MG-RAST) (Meyer et al., 2008; Glass et al., 2010). To analyze species diversity, the reads passing the MG-RAST quality filters were matched to the SILVA Small Subunit (SSU) rRNA database (Pruesse et al., 2007) and to the functional and organism database M5NR (M5 non-redundant protein database) (Wilke et al., 2012). To explore the metabolic potential of the fermented soybean microbiome, these data were compared with SEED subsystems using a maximum e-value of 1e-5, a minimum identity of 80%, and a minimum alignment length of 15 aa for protein and 15 bp (the default setting of MG-RAST) for RNA databases.
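The two read-level quality-control criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the in-memory read representation, the Phred+33 quality encoding and the example adapter sequence are all assumptions made for illustration, and adapter detection is reduced to an exact substring match.

```python
from typing import Iterable, List, Tuple

# A read is (identifier, sequence, quality string); Phred+33 encoding is assumed.
Read = Tuple[str, str, str]


def count_low_quality(qual: str, threshold: int = 2, offset: int = 33) -> int:
    """Count bases whose Phred quality value is strictly lower than `threshold`."""
    return sum(1 for ch in qual if ord(ch) - offset < threshold)


def passes_qc(read: Read, adapter: str, max_low_bases: int = 50) -> bool:
    """Apply criterion (1) adapter contamination and criterion (2) low-quality bases."""
    _, seq, qual = read
    if adapter and adapter in seq:  # criterion (1): read carries adapter sequence
        return False
    # criterion (2): remove reads with MORE than `max_low_bases` low-quality bases
    return count_low_quality(qual) <= max_low_bases


def filter_reads(reads: Iterable[Read], adapter: str) -> List[Read]:
    """Return only the reads that satisfy both criteria."""
    return [r for r in reads if passes_qc(r, adapter)]


# Toy data; the adapter string is a hypothetical example, not taken from the study.
example_adapter = "AGATCGGAAGAGC"
toy_reads = [
    ("clean", "ACGT" * 25, "I" * 100),
    ("adapter", "ACGT" * 5 + example_adapter + "A" * 67, "I" * 100),
    ("low_quality", "A" * 100, "!" * 51 + "I" * 49),
    ("borderline", "A" * 100, "!" * 50 + "I" * 50),
]
kept_ids = [r[0] for r in filter_reads(toy_reads, example_adapter)]
```

In this toy example the read with exactly 50 low-quality bases is kept, because the criterion as stated removes only reads with more than 50 such bases.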

RNA extraction
A pretreatment step was required prior to total RNA extraction. The samples were mixed with an appropriate amount of sterile water, followed by incubation in a shaking incubator at 180 rpm and ambient temperature for 2 h. The eluent was centrifuged at 5000 rpm, and the sediment was collected for the extraction of total RNA. An RNA extraction kit (BioTeke, China) was used to isolate total RNA from fermented soybean according to the manufacturer's instructions. Finally, the RNA was dissolved in 50 μL of DEPC-H2O, and 1 μL of RNase inhibitor (40 U/μL) was added to the RNA solution. The quality of the total RNA was tested using a 2100 Bioanalyzer. The RNA Integrity Number (RIN) value was 6.1. To reduce DNA contamination in the RNA extract, the sample was treated with DNase I (Takara, Japan) for 30 min at 37 °C. The DNase I-treated RNA was used as a PCR template to amplify the 16S rRNA gene, and the products were analyzed on a 1% (w/v) agarose gel to confirm the absence of DNA contamination. Subsequently, rRNA was removed from the RNA sample using the Ribo-Zero™ rRNA Removal Kit (Bacteria) (Epicentre, America) according to the manufacturer's instructions.

cDNA library construction and Illumina sequencing
cDNA was synthesized from 100 ng of purified mRNA using the SuperScript™ Double-Stranded cDNA Synthesis Kit (Invitrogen, America) according to the manufacturer's instructions. To ensure the quality of the constructed library, it was assessed three times: by quantification using a Qubit fluorometer, by 2% agarose gel electrophoresis, and by additional quantification using a high-sensitivity DNA chip. The concentration of cDNA was determined with a NanoDrop 3300 (Thermo Fisher, USA) with PicoGreen reagent. A total of 10 ng of cDNA library was needed, and cluster generation was performed using the TruSeq PE Cluster Kit (Illumina, America). Subsequently, the clustered cDNA was bidirectionally sequenced on an Illumina HiSeq™ 2500 platform.

Metatranscriptomic bioinformatics
The raw reads obtained after sequencing were stored in FASTQ format. To obtain clean reads and guarantee the quality of the downstream analysis, the raw reads were filtered using the FASTX-Toolkit. Expression levels were calculated as reads per kilobase of transcript per million mapped reads (RPKM) values (Ammar et al., 2012), following the standard definition given in Formula 1:

$$\mathrm{RPKM} = \frac{10^{9} \times C}{N \times L} \tag{1}$$

where C is the number of reads mapped to a gene, N is the total number of mapped reads in the sample, and L is the length of the gene in base pairs.
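As a worked illustration of the RPKM measure, the short Python sketch below scales a gene's read count by gene length in kilobases and by the number of mapped reads in millions, which is the standard definition of RPKM. The function name and the example numbers are hypothetical and are not taken from this study.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    Equivalent to read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 mapped reads, 1,000 bp long, in a library of 10 million mapped reads.
example_value = rpkm(read_count=500, gene_length_bp=1_000, total_mapped_reads=10_000_000)
```

Because both scaling factors are applied, a gene that is twice as long and collects twice as many reads receives the same RPKM value, which is what makes values comparable across genes of different lengths.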

Nucleotide sequence accession numbers
The metagenomic and metatranscriptomic sequences from this study are available in the Sequence Read Archive (SRA) under the accession number SRP083828.

The information of metagenomics and metatranscriptomics sequences
The DNA and RNA of fermented soybean were sequenced on an Illumina HiSeq™ 2500 platform. For the metagenomic sequences, 92,192,276 raw reads were generated, totalling 13.83 Gb with an average read length of 150 bp. For the metatranscriptomic sequences, 38,798,262 paired-end sequences were generated, totalling 11,717,075,124 bp with an average read length of 151 bp.
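The figures above are mutually consistent, as the following arithmetic check in Python shows. It uses only the numbers reported in this section; the variable names are illustrative, and the only assumption is that each paired-end sequence consists of two reads of equal length.

```python
# Metagenome: reads x average read length should give the reported data volume.
metagenome_reads = 92_192_276
metagenome_read_length_bp = 150
metagenome_bases = metagenome_reads * metagenome_read_length_bp
metagenome_gb = metagenome_bases / 1e9  # about 13.83 Gb

# Metatranscriptome: total bases / number of pairs gives bases per pair;
# halving that gives the average length of a single read.
transcriptome_pairs = 38_798_262
transcriptome_bases = 11_717_075_124
bases_per_pair = transcriptome_bases / transcriptome_pairs
mean_read_length_bp = bases_per_pair / 2  # 151 bp per read
```

The metagenome volume works out to 13,828,841,400 bp, and the metatranscriptome total equals 38,798,262 pairs of two 151 bp reads exactly.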
For the metagenomic data, the short Illumina reads were assembled into longer contigs that could be analyzed and annotated by standard methods. A total of 40,644 contigs were obtained. The average length of the contigs was 1,264 bp, and the average guanine-cytosine content (GC content) was 41.93%. Gene prediction on the assembled contigs yielded 80,514 genes.
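Summary statistics of this kind (number of contigs, average length, GC content) can be computed from an assembly as in the Python sketch below. This is an illustration, not the authors' script: the function name and toy sequences are hypothetical, the 200 bp length cut-off follows the assembly description, and GC content is computed as a pooled percentage over all retained bases, which is an assumption because the text does not say how the average was taken.

```python
from typing import Dict, List


def contig_stats(contigs: List[str], min_length: int = 200) -> Dict[str, float]:
    """Summarise contigs longer than `min_length` bp."""
    kept = [c.upper() for c in contigs if len(c) > min_length]
    if not kept:
        raise ValueError("no contigs longer than min_length")
    total_bases = sum(len(c) for c in kept)
    gc_bases = sum(c.count("G") + c.count("C") for c in kept)
    return {
        "n_contigs": len(kept),
        "mean_length": total_bases / len(kept),
        "gc_percent": 100.0 * gc_bases / total_bases,  # pooled over all bases
    }


# Toy assembly: two 300 bp contigs (one all G/C, one all A/T) and one 40 bp contig
# that falls below the length cut-off and is ignored.
toy_contigs = ["GC" * 150, "AT" * 150, "ACGT" * 10]
stats = contig_stats(toy_contigs)
```

A mean of per-contig GC percentages would weight short and long contigs equally and can give a slightly different figure from the pooled value used here.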

The composition of the microbial communities in the fermented soybean
At the domain level, most of the contigs of the fermented soybean belonged to bacteria (95.22%, 64437), and a few belonged to eukaryotes (0.28%, 189) and viruses (0.58%, 392). In addition, classification information for the remaining contigs (3.92%, 2653) could not be determined. At the phylum level, the most abundant phylum was Proteobacteria, with 24,539 contigs. The second most abundant phylum was Firmicutes, containing 19,583 contigs.
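The domain-level percentages follow directly from the reported counts, as this short Python check illustrates. The dictionary keys and variable names are illustrative; the counts are those given above, and their sum (67,671) is the denominator that the reported percentages imply.

```python
# Counts per domain as reported in the text.
domain_counts = {
    "Bacteria": 64_437,
    "Eukaryota": 189,
    "Viruses": 392,
    "Unclassified": 2_653,
}

total = sum(domain_counts.values())

# Share of each domain, in percent, rounded to two decimals as in the text.
domain_percent = {
    name: round(100.0 * count / total, 2) for name, count in domain_counts.items()
}
```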

The metatranscriptomic analysis of the fermented soybean microbiome
To compare gene expression levels among clusters of orthologous groups (COG) function clusters, we calculated the RPKM values of all predicted genes, and the genes with a total RPKM value > 5 were subsequently classified into the COG function clusters. The RPKM value for amino acid transport and metabolism was the highest, followed by energy production and conversion, and carbohydrate transport and metabolism (Figure 3). The RPKM values of the genes related to proteases and to sugar/maltose fermentation were 395.76 and 221.90, respectively.
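The classification step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: the gene records and RPKM values are invented toy data, the threshold is applied per gene (the text does not specify whether "total" refers to a gene or a cluster), and the single-letter codes are the standard COG categories E (amino acid transport and metabolism), C (energy production and conversion) and G (carbohydrate transport and metabolism).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A gene record is (gene identifier, COG category letter, RPKM value).
GeneRecord = Tuple[str, str, float]


def rpkm_by_category(
    genes: Iterable[GeneRecord], min_rpkm: float = 5.0
) -> List[Tuple[str, float]]:
    """Sum RPKM per COG category for genes above `min_rpkm`, highest total first."""
    totals: Dict[str, float] = defaultdict(float)
    for _gene_id, category, value in genes:
        if value > min_rpkm:  # keep only genes with RPKM > 5
            totals[category] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented toy records, for illustration only.
toy_genes = [
    ("gene1", "E", 120.0),
    ("gene2", "C", 80.0),
    ("gene3", "G", 60.0),
    ("gene4", "E", 3.0),   # below the threshold, ignored
    ("gene5", "C", 10.0),
]
ranking = rpkm_by_category(toy_genes)
```

With these toy values the categories rank E, then C, then G, which mirrors the order reported for the fermented soybean sample but says nothing about the actual magnitudes.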

Discussion
Combining metagenomic and metatranscriptomic approaches provides an overview of community taxonomy together with the encoded and expressed functional diversity of a community. Such combined approaches have been applied to environmental, human and animal samples to characterize the microbial and viral communities present in an ecosystem, thereby elucidating metabolic capabilities and native taxa, for example in acid mine drainage systems (Chen et al., 2015b), cystic fibrosis patients (Lim et al., 2013) and the hindgut paunch microbiota of wood- and dung-feeding higher termites (He et al., 2013). Metagenomics has also been applied to various fermented foods, such as Korean kimchi (Jung et al., 2011), puer tea (Lyu et al., 2013), Shaoxing rice wine (Xie et al., 2013), etc. However, the combined application of metagenomics and metatranscriptomics to study the microbial community structure and metabolic characteristics of fermented foods is lacking. Fermented foods have clearly dominant floras, and their microorganisms generally exhibit high activity. In the present study, we applied metagenomics and metatranscriptomics to investigate the microbial community structure and gene expression of fermented soybean, a traditional fermented food in China.
Although the metagenomic and metatranscriptomic data sets were large, the sequencing quality was good. Thus, the results for the microbial community structure and gene expression were considered reliable. A small number of microorganisms such as fungi and viruses were detected, suggesting that the sequencing coverage was relatively high and that the results give a comprehensive general picture of the microbial communities of fermented soybean.
Providencia, a genus of the Enterobacteriaceae, is an uncommon pathogen; however, its infection rate has increased, so the genus has drawn increasing research attention. Relatively common species in the genus include Providencia rettgeri and Providencia stuartii. Providencia rettgeri has been detected in foods and can increase to large numbers during food production, transport and sale; its presence can contaminate foods and induce food poisoning. An increasing number of studies have shown that Providencia rettgeri, Providencia stuartii, Escherichia coli and Enterobacter cloacae can produce β-lactamases and extended-spectrum β-lactamases (Ferjani et al., 2015; Mahrouki et al., 2015; Osawa et al., 2015). β-Lactam antibiotics are the most widely used antibiotics. In recent years, with the development and widespread use of antibiotic medicines, bacterial drug resistance has become increasingly serious. In these bacteria, drug resistance arises because they can produce β-lactamases and extended-spectrum β-lactamases. Several studies have shown that the types of bacteria described above could participate in fermentation and that these species might be derived from the environment. Notably, many fermentation strains are present at low abundance in fermented soybean, such as Bacillus subtilis, Bacillus thermoamylovorans and Bacillus licheniformis and members of the genera Lactobacillus and Lactococcus. A previous study revealed that fungi, yeasts, Bacillus and lactic acid bacteria are key players in soybean fermentation, and that a higher-quality fermented soybean product is obtained when identified probiotic microorganisms participate in the fermentation (Chen et al., 2015a).
It has been speculated that fermented soybean is contaminated with pathogenic bacteria (Providencia rettgeri, Providencia stuartii, Escherichia coli and Enterobacter cloacae) during storage and that these pathogenic bacteria increase to numbers large enough to inhibit the growth of Bacillus and lactic acid bacteria. Bacillus subtilis is widespread in the natural environment and can produce proteases and esterases that degrade the protein and fat in soybeans, effectively promoting the absorption of soybean nutrients by the human body (Kada et al., 2013; Li et al., 2014). Bacillus subtilis can also produce higher levels of isoflavone aglycones, which might enhance health benefits over traditional fermented natto (Wei et al., 2008). Bacillus amyloliquefaciens produces the broad-spectrum antifungal protein baciamin, which is one of the few bacterial antifungal proteins reported to date (Wong et al., 2008).
Unexpectedly, lactic acid bacteria were not the predominant flora, suggesting that the fermented soybean was contaminated with pathogenic bacteria during storage. Nevertheless, carbohydrate categories, including the metabolism of mono-, di-, and oligosaccharides and fermentation, were key categories for the fermented soybean microbiome, and the RPKM value of carbohydrate transport and metabolism was high. Soybeans are rich in protein, so genes participating in protein metabolism, which degrade the protein in soybeans, were relatively abundant, and the RPKM value of amino acid transport and metabolism was the highest. Among the many gene categories in the fermented soybean metagenome, the carbohydrate category, which was primarily focused on carbohydrate metabolism, accounted for 13.30% of all matches (Figure 2A). The fractions of kimchi microbiome and puer tea reads in the carbohydrate category were 14.49% (Jung et al., 2011) and 12.89% (Lyu et al., 2013), respectively. The proportion of genes in the carbohydrate category was thus generally higher than in microbiomes derived from the termite gut (12.89%), acid mine drainage biofilm (12.17%), and the Sargasso Sea (12.23%). However, the metagenomic read fractions of monosaccharides (15.76%), di- and oligosaccharides (11.28%), and fermentation (7.72%) within the carbohydrate category in the fermented soybean microbiome were lower than those in the microbiome derived from kimchi (28.70%, 28.38%, 14.38%), which might be an effect of the pathogenic bacteria. Kimchi fermentation is predominantly accomplished by heterofermentative lactic acid bacteria, so its fermentation metabolism subcategory was primarily associated with various lactate fermentations (65.93%) (Jung et al., 2011). In the present study, the dominant microorganisms were replaced by pathogenic bacteria, so genes of lactate fermentation made up a far smaller share (9.66%) of the fermentation metabolism subcategory (Figure 2D).
Instead, the genes were primarily associated with acetone-butanol-ethanol synthesis (24.03%), mixed acid fermentation (20.17%) and acetyl-CoA fermentation to butyrate (18.24%), which might be closely associated with soybean fermentation. In addition, the most predominant subsystem of cofactors, vitamins, prosthetic groups and pigments was folate and pterines metabolism (57.13%). Folate is an essential nutrient, particularly for pregnant women and children. Two strains of Lactobacillus sakei and a strain of Lactobacillus plantarum have been confirmed to produce high levels of extracellular folate (Masuda et al., 2012).

Conclusion
In conclusion, the present results revealed the microbial community structure and metabolic potential of fermented soybean in Yunnan Province. Providencia stuartii, rather than the usual LAB, was the predominant species, possibly because the fermented soybean had spoiled or been contaminated during storage. Nevertheless, gene expression for amino acid transport and metabolism was the highest, indicating that protein and amino acid metabolism was still the most active.