Genes highly overexpressed in salt-stressed young oil palm (Elaeis guineensis) plants

RNA-seq is a technique based on the large-scale sequencing of transcript-derived cDNAs on next-generation sequencing platforms and is widely used today to characterize an organism’s transcriptome. The analysis of RNA-seq data allows the identification of genes differentially expressed in a given condition, such as salt stress. This study aimed to search for and characterize genes from the African oil palm (Elaeis guineensis Jacq.) that are highly up-regulated during salt stress, with a long-term goal of gene promoter prospection and validation. Apical leaves from control (electrical conductivity of ~2 dS m-1) and salt-stressed (~40 dS m-1) young oil palm plants, collected at 5 and 12 days after the beginning of the stress, were subjected to total RNA extraction, with three plants (replicates) per treatment. The complete genome of E. guineensis, available at the National Center for Biotechnology Information (BioProject PRJNA192219), was used as the reference genome. The differential expression analysis led to the selection of seven genes for further characterization, whose expression increased 37- to 84-fold under salt stress. The strategy used in this study enabled the selection of these seven salt-responsive genes, some of which code for proteins already reported as responsible for salinity tolerance in other plant species through over-expression or knockout.


Introduction
Soil salinity is a problem present in more than 100 countries spread over almost all continents; approximately 20% of the agricultural land worldwide has saline and/or sodic soils, and between 25 and 30% of the irrigated land area is affected by salt and essentially commercially unproductive (Shahid et al., 2018).
Salinity is often seen as a problem for the agricultural sector, prompting actions aimed at prevention or remediation in the affected areas; however, in the scope of biosaline agriculture, this problem is seen as an opportunity for the production of food, fibers, and bioenergy, as well as for the recovery of degraded areas and the use of marginal ones (Joshi et al., 2020; Tıpırdamaz et al., 2020; Duarte & Caçador, 2021). In this context, cultivation systems for saline environments are developed, using the ability of some plants to grow under saline conditions in combination with the use of saline soils and water resources, and better soil and water management (Ventura et al., 2015; Duarte & Caçador, 2021).
The transcriptome is the complete set of RNA produced under specific circumstances or in a cell, tissue, organ, or an entire organism, in a given moment of its development. Several RNA types are present in the transcriptome, such as mRNA, tRNA, rRNA, snRNA, snoRNA, miRNA, lncRNA, and pseudogenes (Wang et al., 2017). In recent years, due to the technological advances and cost reduction achieved with the RNA-seq technique, we have witnessed an explosion in the amount of transcriptome data generated and made public (Lowe et al., 2017).
The current study builds upon the study by Vieira et al. (2020), who evaluated the morphophysiological responses and ionic imbalance in the substrate, roots, and leaves of young African oil palm plants (Elaeis guineensis Jacq.) under different NaCl doses following a rigorous substrate salinization protocol. The present study carried out a comprehensive large-scale transcriptome analysis aimed at prospecting and annotating salt-responsive genes in the genome of the African oil palm that were highly up-regulated during salt stress, with a long-term goal of gene promoter prospection and validation.

Material and Methods
All oil palm plants used in the study were regenerated from embryogenic calluses (AM33 genotype). Young oil palm plants at the growth stage known as "bifid saplings" were subjected to different doses of NaCl in March 2018 and maintained under these conditions for 12 days in a completely randomized experimental design at a greenhouse in Brasília, DF, Brazil (15.732° S, 47.900° W, 1,030 meters of altitude) (Vieira et al., 2020).
Based on the morphophysiological responses of the young oil palm plants to salinity stress, the apical leaves from three stressed plants (~40 dS m-1 of electrical conductivity) were collected at 5 and 12 days after the stress onset (DAT), together with the apical leaves from three control plants at 12 days (~2 dS m-1), immediately frozen in liquid nitrogen and stored at -80 °C until total RNA extraction.
Total RNA samples were subjected to RNA-Seq using an Illumina HiSeq platform at the GenOne Company (Rio de Janeiro, RJ, Brazil), using the paired-end strategy. The fastq files with the 150-nt-long sequences were deposited in the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (BioProject PRJNA573093, BioSample SAMN12799239).
All the RNA-seq analyses were run on the OmicsBox bioinformatics platform (OmicsBox - Bioinformatics Made Easy, BioBam Bioinformatics, March 3, 2019, https://www.biobam.com/omicsbox), version 1.2.4, transcriptomics module, using the adjusted Reference Based Transcriptomics Workflow. Only high-quality sequences with a minimum average quality of Q30 and a minimum length of 75 nt underwent mapping against the African oil palm reference genome (Singh et al., 2013), with files downloaded from the National Center for Biotechnology Information (BioProject PRJNA192219; BioSample SAMN02981535) in October 2020, using the software STAR (Dobin et al., 2013).
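The read-level filter described above (mean quality of at least Q30 and length of at least 75 nt) can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding, and the function names and example reads are hypothetical; it is not the OmicsBox implementation.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(sequence: str, quality: str,
                  min_mean_q: float = 30.0, min_length: int = 75) -> bool:
    """Keep a read only if it is long enough and its mean quality is >= Q30."""
    if len(sequence) < min_length or not quality:
        return False
    return mean_phred(quality) >= min_mean_q


# Illustrative reads (made up, not taken from the study)
good = ("A" * 150, "I" * 150)   # 'I' encodes Q40 under Phred+33
short = ("A" * 50, "I" * 50)    # shorter than 75 nt
noisy = ("A" * 150, "5" * 150)  # '5' encodes Q20 under Phred+33
print([passes_filter(seq, qual) for seq, qual in (good, short, noisy)])
```

For paired-end data, a filter of this kind would typically be applied to both mates, keeping a pair only when both reads pass; whether the workflow used here does so is not stated in the text.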
The nine BAM files generated during the mapping stage were used to generate the count table (BAM + GFF) with the HTSeq software version 0.9.0 (Anders et al., 2015), followed by pairwise differential expression analysis with the software package edgeR version 3.28.0 (Robinson et al., 2010).
The false discovery rate (FDR) and Log2(Fold Change) were used as criteria to select the genes responsive to salinity stress: FDR ≤ 0.01 and Log2(Fold Change) ≥ 5.0. FDR is a statistical approach used in multiple hypothesis testing to correct for multiple comparisons, and it is typically used in high-throughput experiments to correct for random events that falsely appear significant (Benjamini & Hochberg, 1995). Differentially expressed genes (DEGs) at 5 and 12 days after imposing the salinity stress (DAT) underwent further characterization based on the annotated African oil palm genome available at NCBI (Singh et al., 2013).
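The selection rule above can be sketched in a few lines of Python. The sketch assumes that the FDR values come from the Benjamini-Hochberg procedure cited in the text; the function names and the example records are hypothetical and do not reproduce the edgeR output of this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_up_regulated(genes, fdr_max=0.01, log2fc_min=5.0):
    """Keep gene ids with FDR <= fdr_max and log2 fold change >= log2fc_min."""
    return [g["id"] for g in genes
            if g["fdr"] <= fdr_max and g["log2fc"] >= log2fc_min]


# Made-up records for illustration only
records = [
    {"id": "gene_a", "fdr": 0.001, "log2fc": 5.2},  # passes both criteria
    {"id": "gene_b", "fdr": 0.020, "log2fc": 6.0},  # fails the FDR cutoff
    {"id": "gene_c", "fdr": 0.005, "log2fc": 4.9},  # fails the fold-change cutoff
]
print(select_up_regulated(records))
```

Because the fold-change criterion is one-sided, strongly down-regulated genes are excluded by design, which matches the stated interest in up-regulated genes.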

Results and Discussion
A total of 212,241,278 pairs of high-quality sequences underwent mapping against the African oil palm genome at an average of 23,582,364 per sample. Over 99% of these sequences mapped to the African oil palm reference genome, showing no contamination by RNA from other organisms. Finally, on average, approximately 74% of the sequence pairs mapped just once to genes in the African oil palm genome (Table 1). The African oil palm reference genome has 29,567 genomic features of type 'gene'; however, 4,213 of these features had no aligned reads detected in any of the samples generated in this study (data not shown).
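The figures in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers reported in the text, assuming nine samples (one per BAM file); the variable names are illustrative.

```python
total_pairs = 212_241_278   # high-quality read pairs submitted to mapping
n_samples = 9               # assumed: one sample per BAM file
mean_pairs = total_pairs / n_samples

annotated_genes = 29_567    # genomic features of type 'gene'
genes_without_reads = 4_213 # features with no aligned reads in any sample
genes_with_reads = annotated_genes - genes_without_reads

print(round(mean_pairs))    # average read pairs per sample
print(genes_with_reads)     # genes with at least one aligned read
```

The second value, 25,354, is the number of genes with aligned reads in at least one sample.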
When comparing the control against the stressed plants at 5 DAT, the pairwise differential expression analysis revealed that, out of the 25,354 genes from the E. guineensis genome (Singh et al., 2013) with aligned reads, 35 genes were differentially expressed using the following selection criteria: FDR ≤ 0.01 and Log2(Fold Change) ≥ 5. When comparing the control against the stressed plants at 12 DAT, the pairwise differential expression analysis revealed 11 DEGs. Seven genes were present in both groups of DEGs (Figure 1). The small number of DEGs revealed in this study is a direct consequence of the high Log2(Fold Change) threshold used, under which only DEGs with a fold change of 32 times or more were selected. Another factor that contributed to this small number of DEGs is the FDR threshold used in this study, i.e., ≤ 0.01. FDR offers more confidence than an unadjusted p-value, and results with an FDR below 0.05 are usually considered significant (Benjamini & Hochberg, 1995).
The reason for selecting such values for Log2(Fold Change) and FDR in this study was the intention to select salt-responsive genes with large increases in expression level for future studies on gene promoter prospection and validation. The database of salt-responsive genes generated in the present study is a reservoir of data that can be used to address several questions regarding the response of oil palm plants to salt stress. One question, for instance, is whether the salt stress strategy employed by Vieira et al. (2020) exposed the plants to osmotic stress at 5 DAT and to ionic stress at 12 DAT.
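Two pieces of arithmetic underlie this paragraph: the conversion of the Log2(Fold Change) threshold into a linear fold change, and the overlap between the two DEG lists. A minimal Python sketch follows; the function names are hypothetical, and the counts are those reported in the text.

```python
def fold_from_log2(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


def venn_counts(n_first: int, n_second: int, n_shared: int) -> dict:
    """Partition two overlapping lists into exclusive and shared counts."""
    return {
        "only_first": n_first - n_shared,
        "only_second": n_second - n_shared,
        "shared": n_shared,
        "union": n_first + n_second - n_shared,
    }


print(fold_from_log2(5.0))      # threshold expressed as a linear fold change
print(venn_counts(35, 11, 7))   # DEGs at 5 DAT, at 12 DAT, and in both
```

With 35 DEGs at 5 DAT, 11 at 12 DAT, and 7 in common, 28 genes are exclusive to 5 DAT, 4 are exclusive to 12 DAT, and 39 distinct genes pass the criteria at one or both time points.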
These seven DEGs underwent structural and functional annotation. Just one of the seven DEGs was in a scaffold not yet linked to any chromosome in the African oil palm genome. These DEGs had between one and 12 exons, and all are protein-coding. Two of them were on the sense strand, and the remaining five on the antisense strand. These seven genes coded for a thiosulfate sulfurtransferase, a non-specific lipid transfer protein, an acidic endochitinase, a glycine-rich cell wall structural protein, a transcription factor, a probable vacuolar amino acid transporter, and a carbonic anhydrase (Table 2).
LOC105055832 is a homolog of the probable vacuolar amino acid transporter YPQ1 gene in the African oil palm genome. Transport proteins from three major families, ATF (amino acid transporter family), APC (amino acid-polyamine-choline transporter family), and UMAMIT (usually multiple acids move in and out transporter family), mediate the transport and allocation of amino acids within plants (Yang et al., 2020).
The ATF subfamily has different groups of proteins: AAPs (amino acid permeases), LHTs (lysine and histidine transporters), GATs (γ-aminobutyric acid transporters), ProTs (proline transporters), AUXs (indole-3-acetic acid transporters), aromatic and neutral amino acid transporters, and amino acid transporter-like proteins (Yao et al., 2020).
Table 2. Gene ontology classification of the genes highly overexpressed in the apical leaf of young oil palm plants under salinity stress, after 12 days of treatment. Genes were classified into biological process, molecular function, and cellular component at the second level of gene ontology classification. * Unplaced Scaffold ID; ** Chromosome number
Several different types of amino acid transporters transport proline, an important compatible solute known to play a role in protecting plants from abiotic stresses, such as salinity, drought, and freezing, and altered expression of some of these transporters can enhance plant tolerance to salt and drought stresses (Yao et al., 2020).
LOC105055156 is a homolog of the thiosulfate sulfurtransferase 16, chloroplastic, gene in the African oil palm genome. Sulfurtransferases/rhodaneses (Str) comprise a large and complex group of enzymes that catalyze the transfer of sulfur from thiosulfate to cyanide, leading to the formation of thiocyanate and sulfite (Papenbrock et al., 2011). Hatzfeld & Saito (2000) were the first to report the existence of rhodaneses in plants, and Mao et al. (2011) showed that STR1/STR2 sulfurtransferases play a role in plant embryo/seed development, highlighting the important biological function that this group of ubiquitous enzymes may have in plants. More recently, Moseler et al. (2019) classified this protein family into nine clusters depending on their primary sequence and domain arrangement, based on the expansion of knowledge regarding STRs in higher photosynthetic organisms.
LOC105045973 is a homolog of a non-specific lipid transfer protein 1 gene in the African oil palm genome. Plant non-specific lipid transfer proteins (LTPs) are small soluble proteins that facilitate the transfer of fatty acids, phospholipids, glycolipids, and steroids between membranes. LTPs have roles in seed storage, lipid mobilization, cuticle synthesis, somatic embryogenesis, pollen tube adhesion, and plant tolerance to abiotic stress (drought, cold, and salt stress) and biotic stress (bacterial and fungal pathogens) (Gangadhar et al., 2016). Over-expression of nsLTP genes led to improved tolerance to salt, drought, and cold stress in Arabidopsis and potato (Guo et al., 2013; Zou et al., 2013; Gangadhar et al., 2016).
LOC105047536 is a homolog of an acidic endochitinase gene in the African oil palm genome. Chitinases are glycosyl hydrolases that catalyze the degradation of chitin, one of the most abundant biopolymers in nature, and genes encoding proteins with homology to chitinases have been identified from a wide range of organisms, including organisms that do not contain chitin, such as bacteria, viruses, animals and plants (Kwon et al., 2007).
Endogenous genes encoding proteins belonging to the plant chitinase families are involved in abiotic stress responses, and the overexpression of chitinase genes in plants, whether fungal- or plant-derived, increases tolerance to abiotic stresses, such as salinity and heavy metals (Kwon et al., 2007; Brotman et al., 2012).
LOC105058460 is a homolog of the transcription factor bHLH94 gene in the African oil palm genome. The basic helix-loop-helix (bHLH) transcription factors are a large gene family in the plant genome involved in regulating plant responses to abiotic stresses, including salt stress. Recent studies have shown that the overexpression of genes of this type increased the tolerance to drought and salinity stress in tomato and Arabidopsis plants (Waseem et al., 2019; Qiu et al., 2020).
LOC105039727 is a homolog of a glycine-rich cell wall structural protein 2 gene in the African oil palm genome. In plants, glycine-rich proteins (GRPs) are characterized by the presence of semi-repetitive glycine-rich motifs and classified into five distinct groups -Class I, II, III, IV, and V (Mangeon et al., 2010). Czolpinska & Rurek (2018) postulate that GRPs could play a promising role in agriculture through plant genetic engineering in the coming years because some studies have shown that their overexpression can confer tolerance to cold, drought, and salinity stresses.
LOC105031978 is a homolog of an alpha carbonic anhydrase 7 gene in the African oil palm genome. Plants contain three evolutionarily distinct carbonic anhydrase (CA) families (αCAs, βCAs, and γCAs) and multiple genes encoding all three types of CAs. CAs are zinc metalloenzymes found in numerous plant tissues and different cellular locations that catalyze the interconversion of CO2 and HCO3- (DiMario et al., 2017).
Transgenic Arabidopsis plants overexpressing a βCA gene from rice showed improved salt tolerance (Yu et al., 2007). αCAs are the largest CA gene family in most plants, but they are also the least studied, most likely because they are not highly abundant in leaves and roots.
The expression values of the seven genes highly up-regulated at both 5 and 12 DAT showed a fold change between 37 and 84 times in the salt-stressed plants at 12 DAT in comparison to the control plants. The expression levels of all seven genes were higher at 5 DAT than at 12 DAT (Figure 2).
If the goal of a transcriptomics study, such as the present one, is also to select candidate salt-responsive genes for future gene promoter prospection and validation, fold change alone is not a sufficient criterion; the expression value of the gene in the control plant must also be taken into consideration. All seven genes had expression values close to zero in the control plants (data not shown).
Based on the analysis of the expression values at 12 DAT, these seven genes were divided into two groups: those with an expression value lower than 100, and those higher than 100 (Table 3). Taken together, these results put the three genes in the latter group as candidate genes for future studies aiming at the prospection and validation of salt-responsive promoter sequences.
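The reasoning in the two paragraphs above, that fold change should be read together with absolute expression, can be sketched in Python. All values and gene names below are hypothetical (the values of Table 3 are not reproduced here), the pseudocount is an assumption added to avoid division by zero, and a value of exactly 100 is assigned to the higher group by assumption.

```python
def fold_change(stressed: float, control: float, pseudocount: float = 1.0) -> float:
    """Ratio of stressed to control expression, stabilized by a pseudocount."""
    return (stressed + pseudocount) / (control + pseudocount)


def split_by_expression(expression: dict, threshold: float = 100.0):
    """Split genes into (below threshold, at or above threshold) lists."""
    low = sorted(g for g, v in expression.items() if v < threshold)
    high = sorted(g for g, v in expression.items() if v >= threshold)
    return low, high


# Hypothetical expression values at 12 DAT (stressed, control)
values = {"gene_a": (60.0, 0.5), "gene_b": (900.0, 20.0)}

for gene, (stressed, control) in values.items():
    print(gene, round(fold_change(stressed, control), 1))

stressed_only = {gene: pair[0] for gene, pair in values.items()}
print(split_by_expression(stressed_only))
```

In this made-up example both genes show a similar fold change (about 41 and 43 times), yet only gene_b reaches a high absolute expression under stress, which is the kind of distinction the threshold of 100 is meant to capture.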
As pointed out above, five out of these seven salt-responsive genes code for proteins with studies showing that their overexpression increased the tolerance to salinity stress in several plant species (Kwon et al., 2007; Yu et al., 2007; Brotman et al., 2012; Guo et al., 2013; Zou et al., 2013; Gangadhar et al., 2016; Czolpinska & Rurek, 2018; Waseem et al., 2019; Qiu et al., 2020). Three of these five genes showed an expression value lower than 100, and two higher than 100 at 12 DAT (Table 3). Taken together, these results put these five genes as candidate genes for future studies aiming at the generation of oil palm plants tolerant to salt stress, via a constitutive overexpression strategy.

Conclusions
1. The prospection and annotation strategy applied allowed the selection of seven salt-responsive genes from oil palm plants showing increased expression levels as a result of stress with approximately 40 dS m-1.
2. Three out of those seven salt-responsive genes are candidate genes for future studies aiming at the prospection and validation of salt-responsive promoter sequences.
3. Five out of those seven salt-responsive genes are candidate genes for future studies aiming at the generation of oil palm plants tolerant to salt stress, via a strategy of heterologous constitutive overexpression.

Table 3. Statistics from the differential expression analysis of the genes highly overexpressed in the apical leaf of young oil palm plants under salinity stress, after 5 and 12 days of treatment, in comparison to the control plants