The DNAJ gene family in yerba mate (Ilex paraguariensis): genome-wide identification, structural characterization, orthology based classification and expression analysis

Abstract Dry leaves and twigs of yerba mate are widely infusion-consumed in southern Southamerica. Endemic and adapted to the Atlantic Forest, its extensive full-sun monoculture links to diverse biotic (pest, pathogens) and abiotic stresses (solar radiation, drought), impacting its productivity, ecology and socioeconomic niche. We focused in comprehensively characterize the DNAJ gene family in yerba mate to predict its possible roles on development and diverse stress responses to further assist crop manage. Our results suggest that yerba mate DNAJ proteins account 140 diverse members of six structural types displaying potential variable roles in protein homeostasis control. We were able to classify them into 51 distinct orthology groups, in agreement to Arabidopsis, and performed translational genomics of function, localization, expression and stress responsiveness data. Genome mapping and expression analysis indicated that yerba mate DNAJ genes differ in expression, nucleotide composition, length and exon-intron structure. Intronless or few introns genes -linked to rapid stress response- accounted 85 DNAJs. Promoters of DNAJ genes harbored a 73.2% of cis-acting regulatory elements involved in response to diverse stresses, hormones and light, simultaneously. We hypothesize that yerba mate DNAJs assist to plant survival during multiple stresses linked to current dominant agroecosystem but promote its growth under shade.


Introduction
Dry leaves and twigs of the yerba mate or erva mate tree Ilex paraguariensis A. St.-Hil. (Aquifoliaceae) are normally consumed as an infusion called "mate" or "chimarrão" in Argentina, Brazil, Paraguay and Uruguay (Pereira Croge et al. 2021).The industry of yerba mate deeply influences the economy of the producer regions of Argentina, Brazil and Paraguay, which harbor about 160,000, 85,000 and 35,000 ha of this crop, respectively (Gortari et al. 2020), amounting a production of more than 900,000 tons in 2016 (Pereira Croge et al. 2021).
The yerba mate tree is native and adapted to the Atlantic Forest biome environment, however the full-sun monoculture practice of this crop is extensive (Montagnini et al. 2011).In this context, yerba mate is negatively affected by diverse insect and acari pests, in addition to fungi and viral pathogens (Sosa et al. 2011;Debat et al. 2014a;Rybak et al. 2014;Bejerman et al. 2017Bejerman et al. , 2020;;Bergottini et al. 2017).The most important abiotic stress that affects yerba mate in this extensive agricultural system is the direct solar radiation that causes higher evapotranspiration demand and drought stress that ultimately reduces the plantation productivity (Gortari et al. 2019(Gortari et al. , 2020)).To cope with those stressful situations and gain yield, diverse breeding programs were carried out in yerba mate which eventually obtained cultivars adapted to diverse environmental conditions (Belingheri & Prat Kricun 1997;Prat Kricun 2010;Stein et al. 2014).In addition, traditional mechanical harvesting of yerba mate leaves and twigs cause lesions that reduce its yield and mean life (Kurtz et al. 2014).Soil compaction may cause an anaerobic environment that affects the root-associated microbiome to this crop tree also (Bergottini et al. 2017).In this sense, model agronomical practices were implemented to maintain plants and soil health (Burtnik et al. 2006;Barbaro 2017;Zelada Cardozo & Gonzalez Villalba 2019).However, the multiple stress scenario that affect current yerba mate plantations is expected to be negatively enhanced by the global climate changes (Roeber et al. 2021) leading to major socioeconomic problems associated to an unsustainable system (Montagnini et al. 2011;MECON 2018).An alternative agroforestry approach, such as the return of yerba mate to its adapted forest conditions, is now considered (Dos Santos 2009;Montagnini et al. 2011;Marques et al. 2019;Salas et al. 2019;Pereira Croge et al. 2021), together with the recent study of associated microbes -some of them growth promoters-to the sustainable cultivation of this crop (Bergottini et al. 2017;Laczeski et al. 2020).As main outcomes of those studies in yerba mate, shade conditions decreased stress, influenced nutrient richness at the leaves and flavour, and increased diversity of the root-associated microbiome.
Plant stress is a state in which plants are exposed to unfavourable environmental or biological conditions that lead to increasing demands made upon it and consequently affecting growth, development and crop yields (Mosa et al. 2017).Plant response to stress involves the triggering of mechanisms to sense the stressful signal to enable an optimal growth response.As part of this response, specific transcription factors (TFs) bind to cis-acting regulatory elements (CAREs) at the upstream region of stress responsive genes (Verma et al. 2016).Considering the multiple biotic and abiotic stressful factors that could affect plant survival, development and growth, a growing amount of evidence is emerging to highlight the omnipresence of a crosstalk between the diverse response pathways from different stressor signals.In this sense, phytohormone-mediated regulation of stress response is well documented.In general, abscisic acid (ABA) mediates the response to drought, cold, heat and wounding stress, while salicylic acid (SA), jasmonic acid (JA) and ethylene (ET) mediate the response to biotic stress caused by pathogen infection and pests (Bari & Jones 2009;Shi et al. 2010;Verma et al. 2016).The additional crosstalk between those hormones with auxins and gibberellins (GA) allows plants to cope with stressful situations and sustained growth (Verma et al. 2016).In addition, light influences gene expression and all aspects of plant growth and development, and could act as a stressor and take part in the crosstalk with abiotic and biotic pathway stress responses (Petrillo et al. 2014;Nawkar et al. 2017;Roeber et al. 2021).In this sense, environments with reduced light improve drought stress tolerance and the defense response to biotic invaders that affect plant growth (Roeber et al. 2021).As a significant outcome, light regulates the unfolded protein response to diverse stresses that affect protein folding, in which a set of molecular chaperones is expressed to alleviate stress (Nawkar et al. 2017).
As sessile organisms, plants depend on a complex machinery of specialized proteins to respond to stressful environmental conditions, but also to break through normal development and growth (Vierling 1991;Rajan & D'Silva 2009).Heat shock proteins (HSPs) take part of this machinery as molecular chaperones responsible for protein folding, assembly, translocation and degradation, under different stress conditions and normal cellular processes (Park & Seo 2015).The main plant chaperone families are named according to their approximate molecular weights, such as HSP70 (DNAK), J-protein/HSP40 (DNAJ), HSP60, HSP90, HSP100, and small HSP (sHSP) families (Fragkostefanakis et al. 2015).HSP70 chaperones have a major role in protein quality control and require nucleotide exchange factor proteins and the DNAJ proteins acting as cochaperones for their functions.
The DNAJ proteins recognize unfolded substrates and deliver them to HSP70, stimulating its ATPase activity, which in turn induces conformational changes of HSP70 that stabilizes its interaction with the substrate (Pulido & Leister 2018).Moreover, there is increasing evidence of some HSP70-independent functions of the DNAJ proteins (Rajan & D'Silva 2009;Finka et al. 2011;Pulido & Leister 2018).Members of the DNAJ family possess one or more of the following domains: the DNAJ domain, responsible for binding of to the ATPase domain of HSP70, the Zinc finger domain, the C terminal domain, and the DNAJ-like domain (Pulido & Leister 2018).According to the arrangement of these domains, DNAJ proteins can be classified in the following structural types (A to F): A) contains the DNAJ, Zinc finger and C terminal domains; B) contains the DNAJ and C terminal domains; C) contains only the DNAJ domain; D) contains a DNAJ-like domain; E) contains the Zinc finger domain; and F) contains the C terminal domain (Rajan & D'Silva 2009;Finka et al. 2011;Pulido & Leister 2018).DNAJ types D, E and F are considered HSP70independent chaperones and their evolution from canonical DNAJ types A, B and C was proposed (Pulido & Leister 2018).
Currently, the repertoire of DNAJ genes has been catalogued in diverse plants, showing high number and diversity of functions (Sarkar et al. 2013;Pulido & Leister 2018;Verma et al. 2019).Despite the importance of this stress related genes, the collection of DNAJs in yerba mate and Ilex as a whole remains uncharacterized to date.
To sum up, considering the constitutive stressful environmental conditions that yerba mate plantations are exposed to, an integral knowledge on the genetic basis of stress responses will be helpful for future considerations on the management of this crop.In this sense, the main goals of our study were to comprehensively characterize the DNAJ gene family in yerba mate and to predict possible roles of this family on development and diverse stress responses.For this purpose, we carried out a genomewide identification and structural characterization, proposed an orthology based classification, and performed expression analysis of the DNAJ gene family in this important crop tree.

Identification of yerba mate DNAJs
Trinity-assembled transcriptome sequences from leaves at different stages of a mature tree of Ilex paraguariensis grown in monoculture under full sun radiation and other conventional cultivation conditions, deposited at the National Center for Biotechnology Information (NCBI, <https://ncbi.nlm.nih.gov>)under BioProject PRJNA251985, Accession GFHV00000000 (Aguilera et al. 2018), were used to build a local search database on the software Geneious 9.1.8(Biomatters Ltd.).The plant of I. paraguariensis used here comprises the cultivar 16318 (CA 538 INTA) of the Instituto Nacional de Semillas (INASE, <https://gestion.inase.gob.ar/consultaGestion/>).It is an elite material with about 2000 planted hectares, showing higher yield of green leaves (17,950 kg/hectares) and lower susceptibility to drought.

Structural characterization of yerba mate DNAJs
To further characterize accurate nucleotide (nt) composition and full-length transcript sequences of yerba mate, those detected were submitted to an iterative fine tuning back mapping step involving the 72M next generation sequencing (NGS) paired end reads (101 nt) of yerba mate of the mentioned BioProject PRJNA251985, SRA SRP043293 (Debat et al. 2014b) using the Geneious mapper (five to 10 rounds) at default values (Chapter 10, Geneious 9.0; <https:// assets.geneious.com/documentation/geneious/GeneiousPrimeManual.pdf>) but a word length of 28 and 1% of maximum mismatch per read.Output curated refined and extended transcript sequences were supported by reads contiguity among overlapping reads, pairwise % identity, mean coverage of bases and Q20 values criteria.Coding sequences at transcripts were determined via the open reading frame (ORF) finder tool, subsequently translated and submitted to HMM searches at Pfam via the InterProScan tool to identify and annotate those DNAJ and related domains on proteins.Sequences lacking Pfam DNAJ annotations in the previous step were further compared to Superfamily database via HMM searches as described above, and their domains were properly annotated and checked via in-house BlastP HMM searches (cut off e-01).In addition, sequences lacking Superfamily annotations in the previous step were submitted to in-house BlastP HMM searches (cut off e-01) of DNAJ domains and properly annotated.Putative Yerba mate DNAJs were further classified in structural types (A to F) according to schemes of Pulido & Leister (2018).Subsequently, annotated yerba mate DNAJ proteins were submitted to best BlastP hits searches (cut off e-05) at NCBI Reference Proteins Database taxa sections Viridiplantae and A. thaliana, and I. paraguariensis BioProject PRJNA315513, Accession GEWR00000000 proteome (Fay et al. 2018), and posterior alignments with fulllength protein of retrieved hits via Mafft v7.308 at default values.Validation of the full-length of coding sequences and the domain organization of the annotated yerba mate DNAJ proteins based in homology and orthology criteria, where applicable, considering best hits e-value, pairwise % identity and query coverage, added to global protein alignment features, pairwise % positive BLSM62 and domain organization correspondence.In addition, proteome NCBI-GEWR00000000 was also submitted to DNAJ searches and subsequent steps as described above to further characterize possible hits distinctively present among both yerba mate accessions.Protein sequences of identified yerba mate DNAJ genes were analyzed with the Expasy Compute pI/Mw tool (Gasteiger et al. 2005, <https://web.expasy.org/compute_pi/>) to obtain molecular weights and theoretical isoelectric points.Further, coding sequences of yerba mate DNAJ proteins were examined for polymorphisms, microsatellites -via Phobos 3.12and splice variants in Geneious.

Orthology based classification of yerba mate DNAJs
Phylogenetic clustering analysis of yerba mate DNAJs were conducted at Ensembl Plants (Bolser et al. 2016) by translating fully curated orthology group information of A. thaliana (<https://plants.ensembl.org/Arabidopsis_thaliana/Info/Index>) onto the appropriate yerba mate ortholog.Phylogenetic trees of yerba mate and A. thaliana related protein sequences were constructed via multiple alignments -Mafft v7.308 at default values-submitted to Neighbour Joining (NJ) clustering using a p-distance substitution model and 1,000 bootstrap replicates.Additional translation of information between A. thaliana and yerba mate DNAJs such as gene functional description and gene ontology (GO), average tissue expression status of genes, and abiotic stress (ABS) response status of genes were conducted at TAIR (Huala et al. 2001, <https

Expression analysis in yerba mate DNAJs
The resulting mappings of aligned reads generated to curate the consensus DNAJs identified in yerba mate were employed to calculate transcript expression levels, using coding sequences as reference in Geneious 9.1.8,measured as fragment per kilo base of exon model per million mapped reads (FPKM) values.A clustered heat map of those log2-transformed FPKM values was obtained using Heatmapper (<http://www.heatmapper.ca/expression/>).

Structural characterization of yerba mate DNAJs
Curated nucleotide composition and potential full-length transcripts of the 140 identified yerba mate DNAJ protein hits were obtained throughout an iterative back mapping of sequence reads strategy (Suppl.File 2).Subsequent downstream analysis were supported by the attained overlapping reads, a mean 99.7% pairwise identity, a mean 84.2 X coverage and a mean 99.0%Q20 value associated to the polished consensus sequences (Suppl.Tab. 3 and Suppl.File 2 available at <http://dx.doi.org/10.17632/294x7524bn.1>).Transcripts length ranged from 401 to 8,707 nt (Suppl.Tabs 3 and 4 available at <http://dx.doi.org/10.17632/294x7524bn.1>),mean 1,667 nt, and their largest recognized ORFs varied from 348 to 7,764 nt in length (Tab. 1 and Suppl.Tab. 5 available at <http://dx.doi.org/10.17632/294x7524bn.1>),mean 1,221.4nt.Overall, 140 translated ORF sequences (Suppl.Tab.6 available at <http:// dx.doi.org/10.17632/294x7524bn.1>) were then submitted to Pfam to identify and annotate DNAJ and related domains on proteins by which 90 yerba mate DNAJs were confirmed through their highly     To validate the full-length nature of coding sequences and the domain organization of the 140 annotated DNAJ proteins of yerba mate, best BlastP hits were retrieved from NCBI Reference Proteins Database considering Viridiplantae and A. thaliana taxa, and I. paraguariensis NCBI-GEWR00000000 (Suppl.Tab. 8).Subsequently, protein alignments were performed for each annotated DNAJ protein of yerba mate and the full-length protein of retrieved hits, those of Viridiplantae and A. thaliana harboring database annotated features (Suppl.Tab. 9 and Suppl.File 3 available at <http://dx.doi.org/10.17632/294x7524bn.1>).Identification of true homolog and ortholog proteins of yerba mate DNAJs was achieved based in hits E-value (median e-160), pairwise % identity (mean 79.0%) and query coverage (mean 89.7%), added to global protein alignment features, pairwise % positive BLSM62 (mean 79.2%) and equivalent domain organization (Tab. 1, Suppl.Tabs.8 and 9, and Suppl.File 3 available at <http://dx.doi.org/10.17632/294x7524bn.1>).
Finally, splice variants at coding sequences were identified via transcripts alignments and/ or mapping of sequence reads at four yerba mate DNAJs of types C (2) and E (2), source of different length protein sequences at the same locus (Suppl.File 6 available at <http://dx.doi.org/10.17632/294x7524bn.1>).

Orthology based classification of yerba mate DNAJs
A subsequent phylogenetic clustering of yerba mate DNAJs was conducted at Ensembl Plants by considering well-recognized orthogroups of A. thaliana, that is orthologs and paralogs contained at a particular orthology group, and translating this information onto the corresponding yerba mate ortholog protein (Tab.1).Additional translation of information between A. thaliana and ortholog yerba mate DNAJs such as gene functional description, gene ontology (GO), average tissue expression and abiotic stress (ABS) response status of genes were conducted at TAIR, NCBI AceView and Arabidopsis eFP (Suppl.Tab. 12 available at <http://dx.doi.org/10.17632/294x7524bn.1>).Overall Arabidopsis orthologs to yerba mate DNAJs are well-recognized as DNAJ members according to their TAIR functional description with the exception of 14 loci encoding structural types D and E DNAJs, characterized as such by other approaches (Suppl.Tab. 12 available at <http://dx.doi.org/10.17632/294x7524bn.1>).In addition, with the exception of 12 loci not evaluated to date, total Arabidopsis orthologs to yerba mate DNAJs showed a positive fold change value in gene expression during abiotic stress assays, 70 of them upregulated and constituting ABS responsive genes (Suppl.Tab. 12 available at <http://dx.doi.org/10.17632/294x7524bn.1>).
According to the presented phylogenetic approach, a total of 140 yerba mate DNAJs were classified into 51 distinct and coherent orthogroups (Tab.1).Nomenclature of DNAJs employed here allows to recognize the orthogroup (1-51), the structural type (A-F) and the related number (1-140) of the protein, in that order.Orthogroups 1 to 18 and 32 hold one or more paralog members which comprehend 108 of total yerba mate DNAJs (Tab.1).Rooted phylogenetic trees based on multiple alignments of full length DNAJ proteins of yerba mate and A. thaliana for each of those orthogroups were further built, which retrieved the same protein pair relationships -highly supported-than the ones presented at Table 1, and showed equally supported associations among yerba mate paralogs (Figs.1-4).DNAJ domains of each one-member orthogroup harboring structural types C and D of yerba mate DNAJs were further aligned altogether with their corresponding A. thaliana orthologs and the resultant tree revealed well-supported protein pair relationships, the same than presented at Table 1 (Fig. 3h).The same strategy and results were obtained for one-member orthogroups of type E yerba mate DNAJs carrying Zinc finger domains (Fig. 4f).
Orthogroup 1 is the second largest with 17 DNAJ members of C ( 14) and D (3) structural types (Tab. 1 and Fig. 1a).Eight DNAJ paralogs of varying lengths (239 to 563 AA) harbor sole DNAJ domains at different protein positions (1C3, 1C33, 1C35, 1C87, 1C88, 1C89, 1D91 and 1D104).Particularly, DNAJs with marked similar domain organization were found to cluster together such as those harboring additional Fer4_13 domain (1C24, 1C72 and 1C34) and those with extra DUF3444 domain (1C67, 1C68, 1C74, 1C75 and 1C102), respectively, denoting a common ancestor for each subgroup of 1.Protein 1C79 is particular in having an arrange of two DUF3444 domains which accounts for being one of the largest DNAJs in yerba mate (1,072 AA).Orthogroup 2 is the third largest and contains 15 DNAJ members of types B (9), C (5) and F (1) (Tab. 1, Fig. 1b).Type C DNAJ proteins clustered together, including those with additional DUF1977 domain (2C31, 2C38, 2C51 and 2C105) and the largest 2C60.Protein 2F94 constitutes the unique type F DNAJ in yerba mate, which has a non-functional DNAJ domain lacking the HPD motif but a conserved C terminal domain.Type B DNAJs (2B7, 2B9, 2B10, 2B22 2B26, 2B39, 2B42, 2B61 and 2B81) limited to orthogroup 2 and exhibited similar lengths (mean 333 AA) and domain organization but arranged into three different subgroups.Orthogroup 3 of DNAJs is the largest in yerba mate with 18 members of types A (11) and C (7) (Tab. 1, Fig. 2a).Overall C-type members have varying protein lengths (138 to 535 AA) but grouped together (3C17, 3C27, 3C28, 3C29, 3C48, 3C71 and 3C97).Type A DNAJs are exclusively found in orthogroup 3 of yerba mate and all hold the Zinc finger domain embedded into the C terminal domain (3A1, 3A11, 3A14, 3A19, 3A44, 3A46, 3A54, 3A58, 3A66, 3A69 and 3A84).In addition, those A-type DNAJs exhibited similar lengths and domain organization but grouped into three distinct clusters of paralogs.Orthogroup 4 consists in four C-type DNAJ members (4C21, 4C45, 4C57 and 4C64) of similar lengths (331 to 397 AA) which particularly harbor an extra DNAJ-X domain (Tab.1; Fig. 2b).Orthogroup 5 is formed by three structural type C DNAJs, small in size (151 to 166 AA) and with central DNAJ domains (5C5, 5C23 and 5C30), while orthogroup 7 comprises two C-type members (7C32 and 7C49) with about410 AA and DNAJ domains near the N-terminus of the protein (Tab.1; Figs.2c,e, respectively).Orthogroup 6 is the fourth largest in yerba mate with 12 type C protein members of extreme varying lengths (14-fold; 182 to 2587 AA), all but 6C13 and 6C40 harboring additional domains than DNAJ and displaying distinct domain organizations (Tab.1; Fig. 2d).Protein 6C12 holds two TPR-like domains at central region while 6C63 ports a terminal DUF3395 domain and 6C50, by far the largest DNAJ in yerba mate with 2,587 AA, presents a central GYF_2 domain.Further, the most similar domain organizations of DNAJs also grouped together such as those that include an additional jiv90 domain (6C6, 6C56 and 6C76) and those with two extra and large TPR-like domains (6C68 and 6C78), respectively.Apart from the conserved DNAJ domains, proteins  Figure 4 -a-f.Phylogenetic tree of yerba mate and Arabidopsis thaliana ortholog type E DNAJ proteins (left), clustered heat map of the expression profile of yerba mate DNAJs (middle) connected to the domain organization of those proteins (right).Full-length protein sequences were aligned via Mafft, submitted to NJ clustering using p-distance, 1,000 bootstrap replicates (support values denoted at the tree) and rooted with the group consensus sequence (c).log2-transformed FPKM values of the expression of yerba mate DNAJs are represented left to the heat map.AA, aminoacides -a-e.Orthogroups 9, 10, 11, 16 and 18, respectively; f.One-member orthogroups of structural type E. (13C59, 13C62 and 13C86), 15 (15C4 and 15C70) and 32 (32C2 and 32C15) are defined by a small size (mean 175 AA) and a single central DNAJ domain (Tab. 1;Figs. 3c,e and g,respectively).

Genomic mapping of yerba mate DNAJs
A total of 140 curated yerba mate DNAJ transcript sequences were mapped against the 32,521 genomic scaffolds of I. paraguariensis NCBI-GCA_905181385.1.In this sense, all the DNAJ transcripts were found to have a genomic counterpart: 129 were unambiguously mapped to a single genomic region while three mapped to highly similar genomic regions in two or more scaffolds in addition to eight which different parts mapped to different scaffolds, mostly at the ends of those genomic sequences (Suppl.Tab.14; Suppl.File 7 available at <http://dx.doi.org/10.17632/294x7524bn.1>).Particularly, large scaffolds 4, 6, 38, 61, 188 and 275 carry two or more DNAJ genes (Suppl.Tab.14; Suppl.File 7 available at <http://dx.doi.org/10.17632/294x7524bn.1>).DNAJ genes ranged from 0.4 (18E139) to 46.6 (1C34) kbp in length, with 94 genes under the mean of 8.4 kbp (Suppl.Tab. 14 and Suppl.File 7).In addition, the exon-intron structure of DNAJ genes of yerba mate was found to be deeply variable, Rodriguésia 74: e00492022.2023 ranging from none (13 genes) to 21 introns (6C50), with 85 genes under the mean of 5 introns (Suppl.Tab.14; Suppl.File 7 available at <http://dx.doi.org/10.17632/294x7524bn.1>).A minor positive linear association was found among gene size and intron number at yerba mate DNAJs according to the R square (R 2 ) coefficient value (0.39), however the association was strong (R 2 =0.99) among gene size and intron fraction at those DNAJ genes.In addition, a negligible positive linear associations (R 2 < 0.01) were found among intron number and expression level, added to intron fraction compared to expression level at yerba mate DNAJs genes.
Finally, promoter sequences 1.5 kbp in length upstream of the putative translation start site of each yerba mate DNAJ gene were identified and annotated at the corresponding genomic scaffolds (Suppl.Files 7 and 8) and submitted to searches for regulatory elements.As a main result, all the 140 DNAJ gene loci were predicted to have CAREs at their promoter regions (Suppl.Tab. 15 and Suppl.File 9).According to their functions those 117 different CAREs could be further grouped into six major categories such as 1-stress response (24), 2-hormone response (20), 3-light response (31), 4-cellular development ( 13), 5-core cis-element (4) and 6-unknown function ( 25), with a mean of 26.6 different CAREs at each yerba mate DNAJ gene promoter (Suppl.Tab.16 available at <http:// dx.doi.org/10.17632/294x7524bn.1>).Stress response accounts for a 34.8% of the total CAREs at promoter of yerba mate DNAJs followed by light response (20.7%), hormone response (17.7%), core cis (11.5%), cellular development (9.4%) and unknown function (5.9%) elements.Drought stress response CAREs such as MYB, MYC, MBS, DRE and their related sequences were present at 138 DNAJ genes.MBS is also involved in high salt and low temperature stresses, and always linked to other drought CAREs was found at 56 DNAJ genes.LTR and different HSE sequences, involved in low temperature and heat stress responsiveness were found at 44 and 17 yerba mate DNAJ genes, respectively.In total, 82 DNAJs were revealed as low temperature responsive genes via MBS and/ or LTR CAREs.In addition, 96 DNAJs harboring ARE comprised anaerobic responsive genes.Defense stress responsive CAREs such as as-1, W box, WRE3, WUN-motif, TC-rich repeats, CCAATbox, box S and AT-rich sequence, involved in the protection of plants from attacks by diverse pests and pathogens and/or wounding by herbivores and environmental mechanical stresses, were largely present at 133 yerba mate DNAJ genes.It is worth to mention here our findings of those sequences at yerba mate transcriptomes (GFHV00000000, GEWR00000000) that ultimately belong to aphids, bacteria from aphids, psyllids and fungi commonly linked to this crop (Suppl.Tab. 2 available at <http:// dx.doi.org/10.17632/294x7524bn.1>).Light responsive CAREs were found at 139 DNAJ genes, and major elements included Box 4 (112), G-Box (102), GT1-motif (73), , GATAmotif (53) and  among others.Of the 20 types of hormone responsive CAREs, those that involve ABA such as ABRE and related sequences (ABRE2, ABRE3a, ABRE4, AT ~ABRE) were major, ocurring at 94 yerba mate DNAJ genes.Cis elements CGTCA-motif and TGACG-motif involved in the JA responsiveness were found at 84 genes, while the ET responsive element ERE harbor 80 genes, and SA responsiveness CAREs such as TCA-element and SARE occurred at 49 yerba mate DNAJ genes.GA responsive elements P-box, GARE-motif and TATC-box were found at 70 genes, while auxin responsive elements such as AuxRR-core, AuxRE, TGA-box and TGA-element, were present at 43 DNAJ genes.

Genome wide identification and structural characterization of yerba mate DNAJs
The DNAJ gene family is characterized by highly conserved protein members among diverse organisms (Sarkar et al. 2013;Pulido & Leister 2018;Verma et al. 2019) such that true comparisons, inferences and translational genomics of annotations and functions are possible.In this context, DNAJs of I. paraguariensis could be identified according to conserved domains defining HSP40 proteins (DNAJ, DNAJ central, DNAJ C terminal, DNAJ-X) via a search strategy based on HMM profile of those Pfam domains (Mistry et al. 2021), or by linking shared features with confirmed DNAJ proteins of Arabidopsis (Pulido & Leister 2018;Zhang et al. 2018) directly through a reference-based search approach.Hence, in agreement to those identified and further characterized NGS-derived sequences, the DNAJ gene family in yerba mate comprises at least 140 distinct members.Similar genome wide analysis of the DNAJ family were accomplished in representative taxa of such diverse range as monocots, rosids and asterids which found 104 to 115 genes in rice (Sarkar et al. 2013;Luo et al. 2019), 89 to 120 in Arabidopsis (Miernyk 2001;Rajan & D'Silva 2009;Finka et al. 2011;Zhang et al. 2018) and 76 in chili pepper (Fan et al. 2017), respectively, always from structural types A to D proteins.
Further, in yerba mate the central HPD motif at the DNAJ domain is consistently present at the 96 DNAJ proteins of A, B or C types, classified here according to the stringent criteria of Rajan & D'Silva (2009) on the occurrence of that motif.These 96 DNAJs that could interact with HSP70 chaperones mainly via HPD conserved motif are also classified as HSP70 dependent proteins, according to the criteria of Pulido & Leister (2018).On the contrary, those DNAJ proteins of yerba mate with DNAJ domains that lack the central HPD motif are recognized as HSP70 independent DNAJ-related ones, such as the 16 type D proteins, which still may support the folding of substrates according to those authors.
In their work, Pulido & Leister (2018) also recognized novel HSP70 independent DNAJ-related proteins in Arabidopsis, including conventional structural type D proteins harboring DNAJ-like domain and others.Regarding that study, novel types E and F DNAJs harbor a single Zinc finger or C terminal domains, lacking the function of protein quality control of the DNAJ domain but being still important for substrate binding, which may have evolved from types A or B DNAJs, respectively.In view of this last update, Arabidopsis currently reaches 157 DNAJ genes and holds the most extensive classification of DNAJs in plants regarding types A to F (Pulido & Leister 2018;Zhang et al. 2018).In agreement to this comprehensive classification, that we followed throughout this work, we recognized and characterized additional HSP70 independent DNAJ-related proteins in yerba mate beyond type D. Those novel DNAJs of yerba mate of type E (27) harbor mainly a single Zinc finger domain while the single type F protein at orthogroup 2 carries a C terminal domain and a less conserved DNAJ-like domain lacking the HPD tripeptide.Hence and according to their structures, overall 140 yerba mate DNAJ proteins could be classified into types A to F (11 A, 9 B, 76 C, 16 D, 27 E, 1 F) comparable to Arabidopsis (Pulido & Leister 2018;Zhang et al. 2018), displaying potential variable roles in the control of protein homeostasis.

Orthology based classification and expression analysis of yerba mate DNAJs
Yerba mate DNAJ proteins showed medium to high global similarity and equivalent domain organization to corresponding Arabidopsis orthologs despite the 125 million years of the split of Asterids and Rosids (Guyot et al. 2012) to which these species belong to, respectively.Total Arabidopsis orthologs to yerba mate DNAJs are well-recognized DNAJ members according to their gene functional description complemented to GO data at TAIR (Huala et al. 2001) and by the detailed contributions of diverse authors (Miernyk 2001;Rajan & D'Silva 2009;Finka et al. 2011;Pulido & Leister 2018;Zhang et al. 2018).The subsequent phylogenetic clustering approach considering those total 140 yerba mate DNAJs and ortholog proteins of Arabidopsis properly grouped at Ensembl Plants (Bolser et al. 2016) showed 51 distinct and coherent DNAJ orthogroups in yerba mate which shed light on the evolution of structural types and domains organization added to paralog relationships in compound DNAJ clades.As a whole, this approach contributed to the understanding of the DNAJ proteins scenario in yerba mate, and is comparable in part to those of Pulido & Leister (2018) and Zhang et al. (2018) for Arabidopsis which still wait for a unified orthology based classification of DNAJs.
Additionally, the widespread correspondence between yerba mate and Arabidopsis ortholog DNAJ proteins allowed a precise interspecific translation of information between the crop tree and the model species, such as prediction of subcellular localization of DNAJ proteins and the ABS responsive status of DNAJ genes.At this regard, according to GO data at TAIR (Huala et al. 2001) and Zhang et al. (2018), Arabidopsis DNAJ proteins orthologs to yerba mate can function in different compartments, i.e. the cytosol, nucleus, organelles.Those DNAJs with similar localization may come from different clades and are involved in similar cellular events but with different roles (Zhang et al. 2018).Considering abiotic stress assays including cold, drought, genotoxic, heat, osmotic, oxidative, salt, UV-B and wounding treatments, those 128 Arabidopsis orthologs to yerba mate DNAJs with ABS data showed a positive fold change value in gene expression, and in particular 70 of those DNAJs were upregulated and further considered as ABS responsive genes (Kilian et al. 2007).
Rodriguésia 74: e00492022.2023 As a main outcome, expression profiles based on RNA-seq showed that overall 140 DNAJs were expressed in leaves of yerba mate.In addition, differences in gene expression were revealed, with 61 yerba mate DNAJs above the mean, and highest expression levels associated to 15 DNAJs of nine diverse clades (1,3,5,10,11,13,18,39,47) and structural types (A, C, D, E).Arabidopsis DNAJs orthologs to those of yerba mate with data on average tissue expression, such as AT2G42750:1C34, AT5G22060:3A19:3A58:3A69, AT2G17880:5C23:5C30, AT1G64500:10E131, AT3G47650:11E128, AT4G13830:13C59, AT5G43260:18E139 and AT5G02160:39E124, also showed high expression levels, while AT1G56300:3C17 and AT3G14200:3C48 are well expressed ABS genes, compared to moderately and low expressed DNAJs in this model species (Thierry-Mieg & Thierry-Mieg 2006;Kilian et al. 2007).Considering the 19 multiple DNAJ clades of yerba mate, mean gene expression levels above the overall mean were found at orthogroups 3 (types A or C), 5, 7 and 14 (type C), and 9, 10, 11 and 18 (type E) while most type B and the single F DNAJs exclusives of orthogroup 2 are the less expressed.In addition, within orthogroup differences at gene expression were also found at all compound DNAJ clades of yerba mate, even at related paralogs of the same structural type and similar domain organization such as those of the largest orthogroups 1, 2 and 3, which are expected to have redundant functions as occurs in Arabidopsis (Zhang et al. 2018).The availability of additional RNA-seq datasets of yerba mate derived from diverse environmental and stress conditions could provide further insights into the diversity and expression of DNAJs in this crop and complement the foundational baseline landscape of yerba mate DNAJs reported here.

Genomic mapping of yerba mate DNAJs
As a whole, 140 yerba mate DNAJ genes showed a great variability in nucleotide composition, length and their exon-intron structure comparable to other plant species such as Capsicum annuum, A. thaliana and Oryza sativa (Fan et al. 2017;Zhang et al. 2018;Luo et al. 2019).Close DNAJ paralogs of yerba mate that hold similar protein domain organization were found to share a comparable gene structure, such as 1C67:1C68:1C74:1C75:1C102, but the contrary also occurred, i.e. 2C31:2C38:2C51:2C105, as reported in Arabidopsis (Zhang et al. 2018).On the other hand, increased expression levels of yerba mate DNAJ genes were somewhat accompanied to increased intron number or fraction at those genes, in contrast to strong associations reported in Arabidopsis and rice considering a wide gene approach (Ren et al. 2006).Size of introncontaining yerba mate DNAJ genes strongly associated to the intron fraction of that gene.Further, the percentage of intronless DNAJ genes in yerba mate (13; 9.9%) was the lowest if compared to 32.9% in chili pepper (Fan et al. 2017), 22.2% in thale cress (Zhang et al. 2018) and 20.0% in rice (Luo et al. 2019).To stand out, yerba mate DNAJ genes with few introns (1-3) accounted for 45 members, and 85 genes were under the overall mean of five introns (0-5).This is important to the DNAJ genes of yerba mate since intronless or fewer intron genes were found to be rapidly regulated to respond timely to various stresses (Jeffares et al. 2008).
The CAREs are non-coding DNA sequences in gene promoters that control the transcription of their accompanying genes, and have been reported elsewhere, inclusive at DNAJ genes (Hernandez-Garcia & Finer 2014;Fan et al. 2017;Huang et al. 2021).Promoter regions of total 140 yerba mate DNAJs harbor diverse CAREs involved in the response to stresses (drought, high salt, low temperature, heat, anaerobic, defense), hormones (ABA, JA, ET, SA, GA, auxins) and light simultaneously, constituting the 73.2% of total cis elements in this family, while another 20.9% of those CAREs with known function are related to transcription efficiency or growth.These results are consistent with an scenario were crops require to adapt to such adverse situations and grow, by evolving crosstalking mechanisms among hormones, ligth and diverse stresses (Verma et al. 2016;Nawkar et al. 2017;Roeber et al. 2021).As a result of responding to diverse environmental and endogenous factors, yerba mate DNAJs may be involved in a variety of stressful and physiological processes and in coordinating plant growth under adverse conditions, as reported previously in other eukaryotes (Finka et al. 2011).As a whole, our study suggests that the 140 DNAJs characterized of yerba mate are potentially responsive genes to cope with the well-known major stresses that persistently affect this crop in the context of an universal full-Rodriguésia 74: e00492022.2023 sun monoculture practice, such as those abiotic (Rakocevic et al. 2008;Gortari et al. 2019Gortari et al. , 2020;;Salas et al. 2019), biotic (Sosa et al. 2011;Rybak et al. 2014;Bejerman et al. 2017;Bergottini et al. 2017) and mechanical (Kurtz et al. 2014) shocks.We hypothesize that the large repertoire of DNAJ genes and these cis-acting regulatory elements are concomitant to the survival of the yerba mate plant during those multiple stresses.While in shade conditions DNAJs may be involved primarily in promoting growth in this crop tree, we hypothesize that DNAJs assist to yerba mate survival during multiple stresses associated to the current dominant agroecosystem.Recently, Salas et al. (2019) measured the behaviour of yerba mate plantlets under a light gradient varying from fullsun culture to conditions of Atlantic Forest.This experiment offered the basis to test our hypothesis, considering mature trees.In sum, our results introduce for the first time an exhaustive analysis of yerba mate DNAJs and provide functional and evolutionary insights into this important family of plant chaperones.

Figure 1 Figure 3
Figure 1 -a-b.Phylogenetic tree of yerba mate and Arabidopsis thaliana ortholog DNAJ proteins (left), clustered heat map of the expression profile of yerba mate DNAJs (middle) connected to the domain organization of those proteins (right).Full-length protein sequences were aligned via Mafft, submitted to NJ clustering using p-distance, 1,000 bootstrap replicates (support values denoted at the tree) and rooted with the group consensus sequence.log2-transformed FPKM values of the expression of yerba mate DNAJs are represented left to the heat map.AA, aminoacides -a.Orthogroup 1; b.Orthogroup 2.