Screening and analysis on the differentially expression genes between diploid and autotetraploid watermelon by using of digital gene expression profile

Synthetic polyploids are key breeding materials for watermelon. Compared with diploid watermelon, the tetraploid watermelon often exhibit wide phenotypic differences and differential gene expression. Digital gene expression (DGE) profile technique was performed in this study to present gene expression patterns in an autotetraploid and its progenitor diploid watermelon, and deferentially expressed genes (DEGs) related to the abiotic and biotic stress were also addressed. Altogether, 4,985 DEGs were obtained in the autotetraploid against its progenitor diploid, and 66.02% DEGs is up-regulated. GO analysis shows that these DEGs mainly distributed in ‘metabolic process’, ‘cell’ and ‘catalytic activity’. KEGG analysis revealed that these DEGs mainly cover ‘metabolic pathways’, ‘secondary metabolites’ and ‘ribosome’. Moreover, 134 tolerance related DEGs were identified which cover osmotic adjustment substance, protective enzymes/protein, signaling proteins and pathogenesis-related proteins. This study present the differential expression of stress related genes and global gene expression patterns at background level in autotetraploid watermelons. These new evidences could supplement the molecular theoretical basis for the better resistance after the genome doubling in the gourd family.


Introduction
In animal and plant kingdom, many creatures have undergone polyploidization during evolution (Comai, 2005;Benya et al., 2017).Recent evidence indicate that 15% and 31% of angiosperm and fern speciation respectively have experience this event characterized by double or multiply complete sets of chromosomes (Wood et al., 2009).Naturally, polyploidization can be occurred inside a single species or via interspecific hybridization and which is named as autopolyploid or allopolyploid (del Pozo and Ramirez-Parra, 2014).Resynthesize genome duplication of plant cells can be traced back to 1904 by treatment with chloral hydrate and other narcotics (Blakeslee and Avery, 1937).Since then, various crop polyploidys have been created to explore novel phenotypes and improve economic traits such as in cotton (Wendel and Cronn, 2003), wheat (Dubcovsky and Dvorak, 2007), rice (Song et al., 2014) and so on.These successful examples have triggered on great interests to reveal the evolution mechanism between tetraploid and diploid progenitor (Otto and Whitton, 2000;Chantret et al., 2005;Saminathan et al., 2015).
Since plant often show novel phenotypes after polyploidization of chromosome which are not present in their diploid progenitors (Ramsey and Schemske, 1998), then resynthesize polyploidization provide a new way for screening resistant germ-plasma for breeding (Carputo et al., 2000;Vining et al., 2015).Up to now, various polyploidy with drought tolerance (del Pozo and Ramirez-Parra, 2014), cold resistance (Deng et al., 2012), salt tolerance (Chao et al., 2013), pathogen resistance (Oswald and Nuismer, 2007) have been reported, and these phenomena always accompanied by up expression of resistance gene (Wu et al., 2013), protective enzymes (Caverzan et al., 2012), or resistance related small molecules (Willett and Burton, 2002).However, these traits are not always consistent with gene dosage effects since there are more complex mechanism behind it, such as mutation buffering, allelic diversity and heterozygosis, and sub-functionalization of duplicated genes (Osborn et al., 2003;Comai, 2005;Dubcovsky and Dvorak, 2007).
The plant polyploidization could effects on its development and resistance to abiotic stresses, and this effects have been studied through the differential gene expression patterns (Chen and Ni, 2006;Doyle et al., 2008).Gene expression changes could elaborate the molecular basis of polyploid evolutionary event, which have been exampled in Arabidopsis (Wang et al., 2006), watermelon (Saminathan et al., 2015) and Brassica (Gaeta et al., 2009).In watermelon, RNA sequencing (RNA-seq) revealed that 5,362 (80.63%) differentially expressed genes (DEGs) were up-regulated in autotetraploid, and which involved in arginine biosynthesis, chlorophyllide synthesis, GDP mannose biosynthesis and so on.Moreover, alternative splicing events between diploid and autotetraploid were also found different in spatial expression (Saminathan et al., 2015).
Since autotetraploid watermenlon can be routinely induced by colchicines (Suying et al., 1993), and since these autotetraploids always show attractive resistance when challenged by stress stimuli, questions about the gene expression patterns under normal condition between different autopolyploids and their diploids in watermelon are still waiting exploring.Based on the genome sequenced data (Guo et al., 2013), digital gene expression (DGE) profile technique has become a convenient and economic way for it.Here, we present a gene expression pattern between an autotetraploid and its progenitor diploid watermelon, and DEGs related to the tolerance were also discussed.

Materials preparation and RNA isolation
The diploidy and autotetraploidy watermelon (voucher FR-32-1B-2n and FR-32-1B-4n) used in the experiment are kept in Institute of Tropical Crop Genetic Resources (ITCGR), Chinese Academy of Tropical Agricultural Sciences (CATAS).The 3~5 matured leaves were harvested between 15~20 node in the fruit-setting stage watermelons which were cultivated in greenhouse.The harvested leaves were transferred to liquid nitrogen immediately for quick-frozen then grated and mixed into two pools per sample, and kept at -80°C refrigerate for following experiments.
Total RNA was extracted and its quantity and quality was examined according to Qiao et al. (2014).

Illumina RNA-sequencing and data processing
To investigating the expression of genes in diploidy and autotetraploidy watermelons, two cDNA libraries, constructed from diploidy and autotetraploidy watermelon, named as 2n and 4n libraries, were sequenced by an Illumina technology.Specific operations are as follows, firstly, the mRNA was enriched, and then be enzymatic hydrolysis into short sequences.Secondly, taking these short sequences as moldboard, the cDNA and the complementary strand was synthesized.Finally, the double strand cDNA were used for setting up the sample libraries for PCR amplification.Its quality was tested with Agilent 2100 Bioanaylzer and quantity was tested with ABI StepOnePlus Real-Time PCR System.All things being in readiness, the library sequencing tasks were perform using Illumina HiSeq 2000.Millions of raw reads were generated, and then removing the low quality reads to obtain clean tags.

Reads mapping and Differentially Expressed Genes (DEGs) selection
To assume the gene expression characteristics in the diploidy and autotetraploidy watermelons, all clean tags were mapped into watermelon genome database (Cucurbit Genomics Database, 2018) by using Bowtie2 (Langmead et al., 2009) and BWA (Li and Durbin, 2009).Gene expression level was quantified by software RSEM (Li and Dewey, 2011), and the gene expression is represented by the fragments per kilobase per million (FPKM) value.Gene expression level is judged according to the criteria as following: the FPKM threshold 0.1 was set to determine whether the gene was expressed; FPKM in 0.1 to 10 indicates low level; FPKM in 10 to 1,000 indicates medium level, and FPKM > 1,000 indicates high level.
To find differentially expressed genes (DEGs) and perform further function analysis on the diploidy and autotetraploidy watermelons, DEGs were screened by using Possion distribution method (Benjamini and Yekutieli, 2011), namely genes had p-value ≤ 0.05 with false discovery rate (FDR) ≤ 0.001 and estimated absolute |log2 ratio| ≥ 1 in the comparisons of 4n and 2n were selected.

DEGs GO Annotation and KEGG Pathway Enrichment
To identify and analysis the function and pathway the DEGs involved from the macro-level, The WEGO software (Ye et al., 2006) and the KEGG database was adopted.Taking corrected p-value ≤ 0.05 or Q-value ≤ 0.05 to judge whether a function term or a pathway is significant.This analysis could recognize the main biological functions and pathway of DEGs (Kanehisa et al., 2008).

Analysis of Digital Gene Expression (DGE) profile
According to the sequencing results, in total, approximately 19.4 and 20.6 million raw reads were obtained, and removing low quality reads, a total of 18.6 and 20.1 million clean reads were successfully generated in 2n and 4n libraries, respectively (shown in Table 1).Furthermore, these tags were mapped to the watermelon reference genome to get its annotation.In our study, 13,206,108 (70.92%) and 13,683,906 (68.19%) total mapped reads were favorably acquired in 2n and 4n libraries, respectively (Table 1).In which 11,442,762 (61.45%) and 11,743,881 (58.52%) clean reads were perfectly matched to the reference genomes, respectively.Further analysis revealed that 12,652,595 clean reads (67.95%) in the 2n library and 13,260,965 clean reads (66.0%) in the 4n library were uniquely mapped (Table 1).
The sequencing depth in the two libraries was analyzed, to ensure our sequence data is sufficient for the DEGs coverage.The number of identified genes is increased followed with the amount of sequenced reads increasing.While, when the amount of sequenced reads reach 150×100 k, the growth curve of identified genes tend to be flatten (show in Figure 1), this result means that the number of identified genes is enough for the next step analysis.

Global analysis of gene expression
The normalcy of the whole DGE profile data was assessed by the dispersion of clean tags expression (Table 2).Obviously, in the diploidy and autotetraploidy  watermelon, the dispersion of the distinct clean tags expression between the 2n and 4n libraries showed seriously similar tendencies that a majority of genes are in a low or medium expression level, nevertheless, a few genes unexpressed or highly expressed, which imply that the whole DGE data between 2n and 4n libraries are normally distributed.Actually, some researches also revealed that a majority genes were low expressed (Wei et al., 2013;Lu et al., 2012;'t Hoen et al., 2008).

Different Expression Genes (DEGs) analysis
Altogether, there are 4,985 DEGs between the diploidy and autotetraploidy watermelons (Table S1 in Supplementary Materials).The fold changes of DEG expression by using of log 2 ratio to represent, interesting, the majority (95%) of these DEGs varied in log 2 ratio changes are lower than 5 but higher than 1, while only a few (5%) of them exhibited log 2 ratio higher than 5 or lower than 1 in the 2n and 4n libraries (Table S1 in Supplementary Materials), these results conform to the normal distribution of gene expression.
In the Figure 2, blue imply up-regulation genes in 4n library and orange imply up-regulation gene in 2n library.So compared with 4n library, there were only 33.98% DEGs up regulated genes (orange dots) in 2n library, that means 66.02% DEGs up regulated (blue dots) in 4n library (Figure 2).

Functional annotation of DEGs
A wide variety of regulatory and metabolic processes related to altered gene's expression throughout the life of the plant (DeRisi et al., 1997).Based on the allocated Gene ontology (GO) terms, the DEGs in 2n and 4n libraries were classified into several categories using WEGO software.All DEGs were categorized into 3 categories, namely biological process (19 subcategories), cellular component (12 subcategories) and molecular function (11 subcategories) (Figure 3).The 'metabolic process', 'cellular process' and 'single-organism process' subcategories were most prominent in the biological process group.In the cellular component group, 'cell', 'cell part' and 'organelle' were the outstanding terms.For molecular function group, the major subcategories were 'catalytic activity' and 'binding' (Figure 3).
A pathway analysis was carried out to appraise the metabolic network of the DEG's expression changes after chromosome doubling in the watermelons.The front 20 significantly concentrated pathways of the DEGs are presented in Figure 4.By comparing 2n with 4n library, the pathway analysis results of DEGs indicate that the top 3 pathway of DEGs were 'metabolic pathways', 'biosynthesis of secondary metabolites' and 'ribosome' (Figure 4).
In the present research, candidate genes with known roles in stress resistance were selected on the basis of DGEs between the 2n and 4n.134 candidate genes were identified with 34 up regulated genes in 2n library and 100 up regulated genes in 4n library (Table S2 in Supplementary Materials).According to gene's function, these genes can be classified into (1) osmotic adjustment substance related genes (13 genes), which are mainly closely involved the synthesis and metabolism of carbohydrate, proline and betaine; (2) protective enzymes/protein related genes (39 genes), including SOD、CAT、GSTs, etc.; (3) signaling proteins related genes (71 genes), including transcription factor dominated by WRKY (25 genes), hormone-related and protein kinase (46 genes); (4) pathogen interaction related genes (11 genes) (Table 3).
In view of metabolism, these genes cover six pathways, including glutathione metabolism (12 genes), peroxisome (13 genes), starch and sucrose metabolism (7 genes), plant hormone signal transduction (11 genes), plant-pathogen interaction (71 genes) and proline metabolism (5 genes).Coincidentally, 11 genes in plant hormone signal transduction pathway also belong to plant-pathogen interaction pathway.In addition, there are 26 genes belong to two or more than two metabolic pathways which imply these genes have more critical roles in the stress tolerance.Table 3. Differentially expressed genes related to stress resistance between diploid (2n) and tetraploid (4n) of watermelon.

The global gene transcription changes in diploid and tetraploid watermelon
Nowadays, DEGs provide a powerful tool to analysis the gene expression in a global view, this greatly facilitate breeders to decipher the secrets of autopolyploid.Among these secrets, whether various watermelon autopolyploids share some common gene expression patterns are still waiting to be revealed.
In the present study, two DEG profiles of autopolyploid watermelons were profiled to identify genes which are deferentially expressed.More than 18 million clean tags per library was reached, including more than 12 million unique mapped tags (Table 1), suggesting that the database selected is relatively completed.The global gene transcription approximately significantly altered 27.5% due to the chromosome doubling in diploid watermelon, and 66.02% in these DEGs up-regulated in autotetroploid watermelon.

General gene expression characters in the autopolyploid watermelon
Autopolyploidy can greatly alter the cytological process (Sharma and Gohil, 2013), biochemical process (PalII et al., 2015), genetic process (Lloyd and Bomblies, 2016) and physiological process (del Pozo and Ramirez-Parra, 2015) in various organisms.The autotetraploid watermelon used in our study results in larger petals and pedicel than diploid, and pollen grain of autotetraploid is larger with the increasing number of abnormal pollen grains, as a result, the seeds of autotetraploid are less than diploid (Feng, 2012).
GO analysis of DEGs in our DGE profile revealed that in cellular component, 'cell', 'organelle' and 'cell part' are dominant, which may contribute to the above mentioned morphological changes.Besides, higher level of secondary metabolites accumulation was also found in autopolyploidy (Lavania et al., 2012), such as alkaloids, terpene, scopolamine, etc (Madani et al., 2015;Dehghan et al., 2012;Lavania et al., 2012).Here, in our tetraploid, massive up-regulated genes are also fell into 'biosynthesis of secondary metabolites' pathway.

Gene expression divergence caused by tetrapolydization is involved with stress response
Evolutionarily, polyploidization has occurred in many plants naturally and in turn some of which may be with better change to be survival against extreme environments (Murray, 1995).Under harsh environment, the plants with duplicated genome always show a high level of expression of the tolerant related genes which contributed to their adaptability (Bertrand et al., 2015;Tamayo-Ordoñez et al., 2016).However, whether the polyploidized plant has to keep higher level of gene expression even under suitable condition or whether only some specific pathways are taking the surveillance job are still not well documented.New clues in this study showed that there are 134 defense related genes which were deferentially expressed in the tetraploid and the diploid (show in Table 3).These genes are mainly involved in signal transduction, osmotic adjustment, ROS related protective enzymes and disease resistance genes.
Efficiency of transform extracellular stimuli into intracellular physiological signals and transduction speed of intracellular signals play vital roles in stress tolerance.And these signals can be various substance, including sugars, transcription factors (TFs), protein kinases and calmodulin, (Kunz et al., 2014;Latz et al., 2013;Nakashima et al., 2012;Yoshida et al., 2014).In this research, 52.99% tolerance related DEGs encoding signaling substance, such as WRKY TFs, protein kinases and calmodulin, and 71.83% of them up-regulated in 4n library.A recent report revealed that allopolyploid Coffeaarabica is with higher content of sucrose than its two diploid parental species under cold temperature, whereas few differences were found at related gene expression level, supports that allopolyploid is more efficient in signaling (Bertrand et al., 2015).
Besides powerful signaling, polyploid plants can also up-regulate resistance genes involved in biosynthesis such as proline, betaine, ROS related protective enzymes and pathogenesis-related proteins (del Pozo and Ramirez-Parra, 2014;Bertrand et al., 2015;Tamayo-Ordoñez et al., 2016).Here, in our non-stressed 4n library, DEGs of proline synthesis, GB synthesis, protective enzymes and pathogenesis-related protein are strongly up-regulated.

Conclusion
A total of 18,783 DEGs were identified from diploidy (2n) and autotetraploidy (4n) watermelon.In which, 4,985 genes expressed significantly different and 3,291 DEGs were up-regulated in the autotetraploidy.GO analysis show that these DEGs mainly distributed in 'metabolic process', 'cell' and 'catalytic activity'.KEGG analysis revealed that these DEGs mainly cover 'metabolic pathways', 'secondary metabolites' and 'ribosome'.And 134 genes closely related to the stress resistance including osmotic adjustment substance, protective enzymes/protein, signaling proteins and pathogenesis-related protein in these significant DEGs.
These results suggest that there already exist a lot of differentially expressed genes in our diploidy and autotetraploidy watermelon under non-stress condition.And these DEGs can be related to various plant metabolic regulations, which contribute to the adaption ability and tolerances in autotetraploidy.These results supplement the molecular regulation system in autopolyploidy watermelon and also provides evidences of the basis of the resistance in autopolyploids.

Figure 1 .
Figure 1.Curve of sequencing saturation.X-axis shows the number of clean reads, units is 100 k --extreme value is currently the volume of sequencing.Y-axis shows the ratio of identified gene number to number of total gene reported in database.2n and 4n represent diploidy watermelon and autotetraploidy watermelon, respectively.

Figure 2 .
Figure2.Scatter plots of all expressed genes in 2n or 4n library.X-axis means log2 value of gene expression in 4n library, and Y-axis means log2 value of gene expression in 2n library.Blue imply down regulated genes, orange means up regulated genes in 2n library, and brown means non-regulation gene whether in 2n or in 4n library.If a gene expressed just in one sample, its expression value in another sample will be replaced by the minimum value of all expressed genes in control and case samples.Screening threshold is on top legend.

Figure 3 .
Figure3.GO functional classification on DEGs for 4n/2n.X axis means number of DEGs (the number is presented by its square root value).Y axis represents GO terms.All GO terms are grouped in to three ontologies: blue is for biological process, green is for cellular component and red is for molecular function.

Figure 4 .
Figure 4. Statistics of pathway enrichment of DEGs in 4n/2n.Rich Factor Refers to the ratio of DGEs and total genes located in this pathway.The larger the rich factor, the greater the degree of DEGs enrichment.Q-value is P-value after correction by a multiple hypothesis test ranging from 0~1, the closer to zero, indicating that the more significant enrichment of this pathway.

Table 1 .
Alignment statistics of clean reads align to reference gene.

Table 2 .
Number of genes at different FPKM interval after significance level correction.