Genome-wide identification and characterization of R2R3-MYB genes in Medicago truncatula

Abstract MYB is a large family of plant transcription factors. Its function has been identified in several plants, while there are few reports in Medicago truncatula. In this study, we used RNA-seq data to analyze and identify R2R3-MYB genes in the genome of Medicago truncatula. Phylogenetic analysis classified 150 MtMYB genes into 21 subfamilies with homologs. Out of the 150 MtMYB genes, 139 were distributed among 8 chromosomes, with tandem duplications (TD) and segment duplications (SD). Microarray data were used for functional analysis of the MtMYB genes during growth and developmental processes providing evidence for a role in tissues differentiation, seed development processes, and especially the nodulation process. Furthermore, we investigated the expression of MtMYB genes in response to abiotic stresses using RNA-seq data, which confirmed the critical roles in signal transduction and regulation processes under abiotic stress. We used quantitative real-time PCR (qRT-PCR) to validate expression profiles. The expression pattern of M. truncatula MYB genes under different abiotic stress conditions suggest that some may play a major role in cross-talk among different signal transduction pathways in response to abiotic stresses. Our study will serve as a foundation for future research into the molecular function of M. truncatula R2R3-MYB genes.


Introduction
Myeloblastosis (MYB) genes are one of the largest transcription factors families in the plant kingdom (Romero et al., 1998;Rosinski and Atchley, 1998). MYB typically contain an MYB-binding domain at the N-terminus, composed of 1-4 imperfect repeats, with approximately 51-52 amino acid residues of incomplete conserved peptides encoding three a-helices (Kanei-Ishii et al., 1990;Lipsick, 1996;Stracke et al., 2001). These three a-helices form a helix-turn-helix (HTH) structure and fold with three relatively conserved tryptophan residues, separated by 18-19 amino acid residues of regular arrangement, and further participate in the formation of hydrophobic interactions (Ogata et al., 1995). The structure of the MYB domain reveals that the HTH interacts with the major groove of DNA (Ogata et al., 1994). MYB-containing genes have a diverse number of MYB proteins containing incomplete repeats and are divided into four categories, 1R-MYB, R2R3-MYB, R1R2R3-MYB, and 4R-MYB (Jin and Martin, 1999).
In the MYB gene family, R2R3-MYB is the largest category in plants and yeast (Martin and Pazares, 1997;Jin and Martin, 1999). To date, 126 R2R3-MYB genes have been identified in Arabidopsis (Stracke et al., 2001), 157 in maize (Du et al., 2012a), 244 in soybean (Du et al., 2012b), 205 in Gossypium raimondii (He et al., 2016), and 166 in cassava (Liao et al., 2016). R2R3-MYB transcription factors are involved with abiotic stress response, reactive oxygen species signaling pathways, secondary metabolism, and hormone signaling pathways. AtMYB41 from Arabidopsis is induced in response to high salinity, drought, cold, and abscisic acid (Lippold et al., 2009). AtMYB102 are rapidly induced by osmotic stress and abscisic acid (ABA) treatment (Denekamp and Smeekens, 2003). OsMYB55 is induced in rice by high temperature and over-expression of OsMYB55 resulted in improved plant growth under high temperature (Elkereamy et al., 2012). Meanwhile, OsMYB91 plays a role in plant growth regulation and salt stress tolerance in rice (Zhu et al., 2015). TaMYB4 of wheat is induced by salicylic acid, ethylene, abscisic acid, and methyl jasmonate, demonstrating a role of TaMYB4 in response to biotic stress (Al-Attala et al., 2014). GbMYB5 from Gossypium barbadense is positively involved in response to drought stress during plant development (Chen et al., 2015). GbMYBFL, involving in flavonoid biosynthesis, is most closely related to R2R3-MYB and displays high similarity to MYB from other plants (Zhang et al., 2017).
Medicago truncatula is a model plant for genetic research of legume plants (Young and Udvardi, 2009), as it has a small genome, high genetic transformation efficiency, self-pollination, nitrogen fixing, along with high biological diversity. Crop improvement of legume breeding has become an important subject (Lee et al., 2017). MtERF and MtMAPKKK gene families of M. truncatula are valuable for characterizing molecular function to improve stress tolerance in plants Shu et al.. 2016). However, the R2R3-MYB family is poorly identified on a genomic level in M. truncatula.
In this study, we performed a genome-wide analysis of the R2R3-MYB gene family in M. truncatula, including multiple alignment analysis, phylogenetic analysis, chromosome localization, gene duplication analysis, and expression profiling. We used quantitative real-time reverse transcription (qRT-PCR) experiments to compare and analyze five stress treatments (ABA, cold, freezing, drought, and salt) with a control treatment. Our study will serve as a foundation for future research into the molecular function of M. truncatula R2R3-MYB genes.

Materials and Methods
Plant growth and treatment M. truncatula (cv. JemalongA17) seeds were grown in a growth chamber (Conviron E15), between 18ºC (night) and 24ºC (day), with humidity ranging from 60% to 80%, and a 14/10 h light-dark period (light, 06:00-20:00). The seedlings were irrigated with half-strength Hoagland solution once every other day. At 8 weeks, M. truncatula were randomly divided into six groups for stress treatments. Plants treated with cold stress (B group) and freezing stress (C group) were transferred to chambers set at 4 ºC and -8 ºC, respectively. Plants treated with drought stress (D group) and salt stress (E group) were respectively treated with 300 mM mannitol and 200 mM NaCl solutions. Plants in the ABA (F group) treatment group were sprayed with 100 mM ABA solution. The control (untreated, A group) and treated (B-F groups) seedlings were harvested 3 h after treatment. For each group, five randomly chosen whole seedlings were pooled to form a biological replicate. All plant samples were frozen in liquid nitrogen and stored at -80 ºC until use.
Phylogenetic analysis of the M. truncatula R2R3-MYB gene family Phylogenetic analysis was performed using MEGA (Version 5.0), and constructed using neighbor-joining (NJ) methods. The NJ method used the following parameters: Poisson correction, pair-wise deletion, and 1000 bootstrap analysis for statistical reliability (Tamura et al., 2011).

Chromosomal location and gene duplication of R2R3-MYB gene family
The location of R2R3-MYB genes were mapped to different chromosomes using the Circos software (http://circos.ca/) (Krzywinski et al., 2009). If two genes with similarities of more than 85% were separated by four or fewer gene loci, they were identified as tandem duplications (TD). Otherwise, they were classified as segmental duplications (SD). The duplications with R2R3-MYB genes were identified by plant genome duplications (PGDD, http://chibba.agtec.uga.edu/duplication/) (Lee et al., 2012, and duplicated genes between different chromosomes or loci were linked in the diagrams.
Expression analysis of R2R3-MYB genes in plant growth and development using high throughput data We studied the expression analysis of R2R3-MYB genes during growth and development and under different abiotic stress treatments. Gene expression data were downloaded from the Medicago truncatula Gene Expression Atlas (MtGEA) Project (MtGEA, http://mtgea.noble.org/v3/) (Benedito et al., 2008). Genome-wide transcriptome data from M. truncatula in different tissues during development were downloaded from the NCBI short read archive database (SRA database) (http://www.ncbi.nlm.nih.gov).

Expression analysis of R2R3-MYB genes under different abiotic stress treatment conditions
Under six different abiotic stress treatments, MtMYB gene expressional values were evaluated using the TopHat (Trapnell et al., 2009) and Cufflinks (Trapnell et al., 2012) software. The data were analyzed, clustered, and displayed using the ggplot2 of R software (Version 3.1.0), as our previous research described .
Quantitative reverse transcription PCR (qRT-PCR) analysis of R2R3-MYB genes expressed in M. truncatula Ten R2R3-MYB genes were selected and quantitative primers were designed based on genes sequences, while the ACTIN and GAPDH genes served as reference genes (see Table S1). Total RNA was extracted from M. truncatula grown under the six conditions (control, ABA, drought, salt, cold, and freezing) using a total RNA kit (Tiangen, Beijing, China) according to the manufacturer's instructions. The RNA extracts were reverse transcribed to cDNA, using a PrimeScript RT reagent Kit (Toyobo, Shanghai, China). qRT-PCR was performed using the Light-Cycler ® 96 System (Roche, Rotkreuz, Switzerland) with SYBR Premix Ex TaqTM II (Toyobo, Shanghai, China). The experiments were repeated for three biological replicates and the PCR conditions were set as follows: 95 ºC for 2 min, 40 cycles of 95 ºC for 30 s, 55 ºC for 30 s, and 72 º for 1 min. The fold change value was calculated using the expression abundances, which based on the 2 -DDCT method.

Identification and classification of R2R3-MYB genes in M. truncatula
To identify R2R3-MYB genes in M. truncatula, Arabidopsis R2R3-MYB gene sequences were used as queries with e-value set as 1E-3. The gene name, gene locus, chromosome location, amino acid sequence, introns, family groups, and isoelectric point (pI) are described in Table 1. Out of the 150 MtMYB genes, 139 genes were distributed across 8 chromosomes of M. truncatula. The remaining 11 MtMYB genes were MtMYB5,6,7,13,14,16,34,35,36,37,and 52. The length of these 139 R2R3-MYB ranged from 194 to 1514 amino acids, with an intron distribution of 0-12, and the isoelectric point were distributed from 4.77 to 9.93.

Phylogenetic analysis of the R2R3-MYB genes in M. truncatula
To investigate the evolutionary relationships of R2R3-MYB genes, we performed multiple sequence alignment and phylogenetic analysis. MtMYB transcription factors were queried against AtMYB transcription factors using BLASTP, and we constructed phylogenetic trees of the MtMYB transcription factor family using ClustalW2 and MEGA. MtMYB genes were divided into 21 subfamilies ( Figure 1) consistent with the distribution of AtMYB gene family members. The subfamily 1 has six members, subfamily 6 has 14 members, subfamily 13 has seven members, subfamily 14 has ten members, and subfamily 18 has eleven members. This is similar to the distribution of the Arabidopsis subfamily members. R2R3-MYB transcription factors appeared to be conserved in plants as the evolution of M. truncatula is present in the evolution of Arabidopsis.

Chromosomal location and gene duplication of R2R3-MYB gene family
To determine the evolution and expansion of MYB genes, we used Circos software to construct the distribution of MYB genes across chromosomes ( Figure 2). Out of the 150 R2R3-MYB genes, 139 were distributed across 8 chromosomes, mainly chromosome 1, 4, 5, and 7 (MtChr1, MtChr4, MtChr5 and MtChr7, respectively). There were 28 R2R3-MYB genes located on MtChr5 alone. The fewest number of genes were found on MtChr6 with ten R2R3-MYB genes. Using sequence alignment, mainly through gene duplication, 128 out of 139 genes were duplicated and divided into two categories. There were 68 segment duplications (SD) caused by the amplification of R2R3-MYB transcription factor members on different chromosomes and 60 tandem duplications (TD) resulting from the generation of R2R3-MYB transcription factor gene clusters. The SD and TD genes were mainly found on MtChr4, MtChr5, MtChr6, MtChr7, and MtChr8, while other chromosomes only contained SD genes. The regions containing TD genes were hot regions of gene distribution. These duplications may have led to the expansion of the MtMYB gene family in the M. truncatula genome.

Expression analysis of R2R3-MYB genes in growth and development
The expression information of 71 R2R3-MYB transcription factors were extracted and analyzed by cluster analysis (Figure 3 and Table S2) using the annotation information of microarray data based on MtGEA. We clustered the 71 R2R3-MYB transcription factors into four groups (A-D). Group A was mainly concentrated in the roots and nodules of M. truncatula. There was only low expression in other tissues and during developmental processes. Group B was mainly expressed in the development of seed. Group C genes were expressed in various organ tissues. These results indicated that they were both involved in tissues construction process of M. truncatula. Finally, Group D genes were expressed at a low level in various tissues and during developmental processes.
The transcriptome sequencing data of M. truncatula were downloaded from the NCBI SRA database, and the gene expression level was obtained using TopHat2 and Cufflinks analyses. We extracted 67 R2R3-MYB transcription factors (Figure 4 and Table S3). Multiple genes with high expression in the roots and nodules were classified into Group E, while Group F contained numbers of paralog genes, which were mainly up-regulated in the flower, the carps, and inflorescence.
Expression analysis of R2R3-MYB genes under different abiotic stress treatment conditions RNA-seq was used to investigate the expression of R2R3-MYB transcription factors in response to abiotic stress of M. truncatula. Out of the 150 MtMYB genes, 64 were expressed in response to abiotic stress (cold, freezing, drought, salt and ABA) ( Figure 5 and Table S4). In the control treatment, multiple genes were down-regulated or showed no change in expression, yet few genes were upregulated. We compared genes regulated in response to Analysis of R2R3-MYB in Medicago 613 abiotic stress to the control treatment and found genes highly up-regulated in response to cold and freezing stresses (50/64, 78.1%). These genes interacted with their paralog genes showing co-expression. In response to drought, high salt, and ABA stresses, multiple genes showed opposing expression and co-expression of the pairs of paralog genes. Significantly, most of these genes showed only low expression or no expression. In particular, some genes were down-regulated and co-expressed in response to ABA stress. It was suggested that the expression of these transcription factors are induced by ABA hormones, which regulate the downstream response genes and affect the response of plants to drought stress and salt stress.

qRT-PCR validation of R2R3-MYB genes expression
To validate the reliability of the RNA-seq data under abiotic stress in M. truncatula, we selected 10 R2R3-MYB transcription factors (MtMYB010, MtMYB011, MtMYB012, MtMYB013, MtMYB014, MtMYB054, MtMYB090, MtMYB100, MtMYB108, and MtMYB116) from M. truncatula for qRT-PCR validation. The expression levels of the RNA-seq data and qRT-PCR gene expression analysis resulted in a correlation coefficient of 0.85 ( Figure 6). These results suggest the RNA-seq data are highly reliable.

Discussion
To date, R2R3-MYB transcription factors have been identified in plants, including Arabidopsis (126) (Yanhui et al., 2006), maize (157) (Du et al., 2012a), rice (102) (Yanhui et al., 2006), Populus (192) (Wilkins et al., 2009), and cassava (166) (Liao et al., 2016). The M. truncatula genome has been sequenced, yet the R2R3-MYB transcription factors have not been researched. This study performed genome-wide analysis of the R2R3-MYB transcription factors in M. truncatula. We identified 150 R2R3-MYB genes in M. truncatula and compared these with other plants. The R2R3-MYB genes of M. truncatula are similar to their Arabidopsis, maize, and cassava counterparts. The results indicated that R2R3-MYB transcription factors are highly conserved in plants. Cannon et al. (2004) have shown gene duplication was closely related to plant evolution and played an important role in the gene amplification. Gene duplication analysis showed the same subfamily members were located on different chromosomes, such as the 21 subfamily gene members. These genes were distributed across all chromosomes and occurred as TD and SD. These results were consistent with the results of the phylogenetic tree.
Furthermore, distribution of other subfamily members was confirmed based on clustering in the phylogenetic tree. These results suggest that the R2R3-MYB genes in M. truncatula are highly conserved as a result of gene duplication. The duplication pattern of R2R3-MYB genes, are consistent with the expression of R2R3-MYB genes in tomato (Zhao et al., 2014) and Populus (Wilkins et al., 2009). Tandem gene duplication was a major driver of gene expansion in M. truncatula.
MYB genes play a role in plant development and stress tolerance (Martin and Pazares, 1997). We have shown that R2R3-MYB genes were typically expressed in the roots, nodules, seedpods, and flowers. Compared with soybean and Arabidopsis, differential expression was observed during flower development, root formation, and seed development for R2R3-MYB genes (Du et al., 2012b). Nodulation is the result of a symbiosis between legumes and rhizobial bacteria in soil. Libault et al. (2009) have reported a gene named Control of Nodule Development (CND), encoding an MYB transcription factor gene. When the CND gene is silenced, nodulation is reduced (Libault et al., 2009). These results indicate that the MYB 618 Li et al. transcription factors may play a major role in regulation of legume-specific nodulation.
MYB transcription factor genes have been investigated as regulators for plant responses (Ambawat et al., 2013). Their results show that MYB genes are involved in response to various abiotic stresses in higher plants. Herein, we found that MYB genes were not only down-regulated, but in some cases up-regulated in response to drought and Analysis of R2R3-MYB in Medicago 619 The expressional values of 71 R2R3-MYB genes were retrieved from MtGEA, and they were normalized and used as input, red represents high expressional levels, while blue represents low expressional level.   salt stress. A positive response of MYB genes following drought stress, salt stress, and ABA-induced stress has been observed in Arabidopsis (AtMYB002, AtMYB060/AtMYB094, AtMYB044, and AtMYB096) (Cominelli et al., 2005;Jung et al., 2008;Seo et al., 2009;Urao and Shinozaki, 1993), Boea crassifolia (BcMYB1) (Chen et al., 2005), and Saccharum officinarum (ScMYBAS1) (Prabu and Prasad, 2012). However, MtMYB genes were up-regulated in response to cold and freezing stresses, while the opposite is observed in Arabidopsis. Interestingly, our result agrees with the expression profile of cotton in response to drought and salt stress (He et al., 2016). Both cotton and M. truncatula are diploid and their duplication can be divided into TD and SD. The expression pattern of M. truncatula MYB genes under different abiotic stress conditions suggest that some may play a major role in cross-talk among different signal transduction pathways in response to abiotic stresses.

Conclusions
In summary, we have identified 150 MYB genes in M. truncatula, which were classified into 21 subfamilies based on phylogenetic analysis. Meanwhile, their expression profiles were investigated using microarray and RNA-seq. The results revealed regulatory roles in plant growth and tissue development, and especially nodule development. In addition, we explored the role of the MYB genes in response to abiotic stresses. Our results suggested MYB transcription factors broadly participate in abiotic stress response of M. truncatula, whose function can be carefully explored in future. Table S2 -Expression data of R2R3-MYB transcription factors in various tissues and development processes. Table S3 -Expression data of R2R3-MYB transcription factors in various tissues. Table S4 -Expression data of R2R3-MYB transcription factors response to abiotic stresses.

Associate Editor: Hong Luo
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License (type CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original article is properly cited.
Analysis of R2R3-MYB in Medicago 623