Positive selection, molecular recombination structure and phylogenetic reconstruction of members of the family Tombusviridae: Implication in virus taxonomy

A detailed study of putative recombination events and their frequency across the whole genome of the currently known members of the family Tombusviridae, comprising 79 accessions retrieved from the international databases, was carried out using the RECCO and RDP version 3.31β algorithms. The first program detected potential recombination sites in seven out of eight virus genera (Aureusvirus, Avenavirus, Carmovirus, Dianthovirus, Necrovirus, Panicovirus, and Tombusvirus); the second program gave the same results except for genus Dianthovirus. Both methods failed to detect recombination breakpoints in the genome of members of genus Machlomovirus. Furthermore, based on Fisher’s Exact Test of Neutrality, positive selection exerted on protein-coding genes was detected in 17 accession pairs involving 15 different lineages. With the exception of the genera Machlomovirus and Panicovirus and the unclassified Tombusviridae, all the other genera and the unassigned Tombusviridae included representatives under positive selection. The evolutionary history of all members of the family Tombusviridae showed that they segregated into eight distinct groups corresponding to the eight genera that constitute this family. The inferred phylogeny reshuffled the classification currently adopted by the International Committee on Taxonomy of Viruses. A reclassification is proposed.


Introduction
RNA recombination is one of the major factors responsible for the generation of new RNA viruses and retroviruses. The biological mechanisms of recombination differ across organisms, but in broad terms recombination results in the creation of mosaic sequences in which the evolutionary history at each site may be different. Recombination, defined as the exchange of genetic information between two nucleotide sequences, is an important process that influences biological evolution at many different levels. It explains a considerable amount of the genetic diversity in natural populations and, in general, genes located in regions of the genome with low levels of recombination have low levels of polymorphism (Posada and Crandall, 2001). Recombination reshuffles existing variation and even creates new variants. It has been shown that RNA recombination enables the exchange of genetic material not only between the same or similar viruses but also between distinctly different viruses (Worobey and Holmes, 1999). Sometimes, it also permits crossovers between viral and host RNA (Greene and Allison, 1994; Aaziz and Tepfer, 1999; Baroth et al., 2000; Nagai et al., 2003). Taking into account the structure of viral genomic molecules and the location of crossover sites, three basic types of RNA recombination have been distinguished: homologous, aberrant homologous and non-homologous (Lai, 1992; Alejska et al., 2001). The first two occur between two identical or similar RNAs (or between molecules displaying local homology), while the last involves two different molecules. Most of the collected data suggest that RNA recombinants are formed according to a copy-choice model (Alejska et al., 2001): a viral replication complex starts nascent RNA strand synthesis on one template, called the RNA donor, and then switches to another template, called the RNA acceptor.
Accordingly, two main factors are thought to affect RNA recombination: the structure of the recombining molecules and the ability of the viral replicase to switch templates. Through generations, viral populations evolve under various selective forces acting on different regions and sites that display different functional constraints. A stringent and robust criterion for detecting adaptive evolution in a protein-coding gene is an accelerated nonsynonymous (dN, amino acid replacing) rate relative to the synonymous (dS, silent) rate of substitution, with the rate ratio ω = dN/dS > 1. As silent mutations do not change the amino acid whereas replacement mutations do, the difference in their fixation rates provides a measure of selective pressure on the protein.
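The dN/dS criterion described above can be illustrated with a minimal sketch. The function name, the tolerance and the example rates are hypothetical and are not taken from this study; in practice a ratio above 1 must also be statistically significant before positive selection is inferred.

```python
def classify_selection(d_n: float, d_s: float, tol: float = 1e-9) -> str:
    """Classify selective pressure from nonsynonymous (dN) and synonymous (dS) rates.

    Illustrative only: a real analysis also tests whether the departure
    from a ratio of 1 is statistically significant.
    """
    if d_s <= 0:
        raise ValueError("dS must be positive for the ratio dN/dS to be defined")
    omega = d_n / d_s
    if omega > 1 + tol:
        return "positive"   # replacement changes fixed faster than silent ones
    if omega < 1 - tol:
        return "purifying"  # replacement changes removed by selection
    return "neutral"


# Hypothetical rates, chosen only to show the three outcomes.
examples = [(0.30, 0.10), (0.05, 0.20), (0.10, 0.10)]
labels = [classify_selection(d_n, d_s) for d_n, d_s in examples]
```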
Amongst positive-strand plant RNA viruses, the family Tombusviridae encompasses several viruses with an important economic impact. According to the 8th ICTV (International Committee on Taxonomy of Viruses) report (Fauquet et al., 2005), the family Tombusviridae includes the following genera: Tombusvirus, Carmovirus, Necrovirus, Dianthovirus, Machlomovirus, Avenavirus, Aureusvirus and Panicovirus. According to the Baltimore classification, the viruses in this family are classified as Type IV viruses, and are part of the luteovirus supergroup (Habili and Symons, 1989). The RNA is contained in an icosahedral (T = 3) capsid, composed of 180 units of a single coat protein 27-42 kDa in size; the virion measures 28-35 nm in diameter, and is not enveloped. All Tombusviridae have a positive-sense, single-stranded linear genome, which is monopartite except in dianthoviruses, whose genome is bipartite. The genome is approximately 4-5.4 kb in length, depending on the genus. The 3' terminus is not polyadenylated. The 5' terminus is capped only in Carnation mottle carmovirus, Red clover necrotic mosaic dianthovirus and Maize chlorotic mottle machlomovirus. The genome encodes 4-6 ORFs. The polymerase ORF contains an amber stop codon that is the site of a readthrough event within ORF 1 (except in dianthoviruses, where a frameshift is used instead), producing two products necessary for replication. There is no helicase encoded by the virus.
The replication process of members of the family Tombusviridae comprises the following steps: (i) the virus penetrates into the host cell, (ii) the viral genomic RNA is uncoated and released into the cytoplasm, (iii) the viral RNA is translated to produce the two proteins necessary for RNA synthesis (replication and transcription), (iv) a negative-sense complementary ssRNA is synthesized using the genomic RNA as a template, (v) a new genomic RNA is synthesized using the negative-sense RNA as a template, (vi) the RNA-dependent RNA polymerase (RdRp) recognizes internal subgenomic promoters on the negative-sense RNA to transcribe the 3' co-terminal subgenomic RNAs that will generate the capsid and movement proteins, (vii) new virus particles are formed (White and Nagy, 2004).
The main objective of this work was to determine and characterize virus evolution mechanisms of the Tombusviridae based on the occurrence of putative recombination events and positive selection in their full-length genome. This was achieved by the analysis of 79 accessions obtained from GenBank. As a result, we propose a reclassification according to their predicted evolutionary history.

Material and Methods
The sequences of the entire genome of 79 accessions cataloged in GenBank were used in this study (Table 1).
The nucleotide sequences were aligned using the programs CLUSTALW 2.0.9 and CLUSTALX 2.0.9 (Larkin et al., 2007) with the default configuration. Their phylogenetic relationships were determined with the Maximum Likelihood (ML) algorithm incorporated in the MEGA version 5 program (Tamura et al., 2011) under the substitution models proposed by Jukes and Cantor (1969) (JC), Hasegawa et al. (1985) (HKY85), and Tamura and Nei (1993) (TN93). Bootstrap analyses with 500 replicates were performed to assess the robustness of the branches.
Using the MEGA4.1b program (Kumar et al., 2008), positive selection was inferred by the counting method described by Nei and Gojobori (1986) and, later on, by Suzuki and Gojobori (1999). According to this method, the phylogenetic tree of the sequences analyzed was used. For the parsimony method, the total numbers of synonymous (cS) and nonsynonymous (cN) substitutions as well as the average numbers of synonymous (sS) and nonsynonymous (sN) sites per codon over the phylogenetic tree were computed for each codon site according to the maximum parsimony principle (Fitch, 1971; Hartigan, 1973). The null hypothesis of selective neutrality (rS = rN or ω = 1) was tested for each site by computing the probability (p) of obtaining the observed or more biased values for cS and cN, which were assumed to follow a binomial distribution with the probabilities of occurrence of synonymous and nonsynonymous substitutions given by sS/(sS + sN) and sN/(sS + sN), respectively. Positive selection is inferred when p < 0.05 and cN/sN > cS/sS (Suzuki, 2006).
Potential recombination events between diverged nucleotide sequences were explored using two programs: RDP v3.31β (Martin et al., 2005b) and RECCO (Maydt and Lengauer, 2006). RDP incorporates several published recombination detection methods into a single suite of tools: RDP (Martin and Rybicki, 2000), GENECONV (Padidam et al., 1999), BOOTSCAN (Martin et al., 2005a), MAXCHI (Smith, 1992), CHIMAERA (Posada and Crandall, 2001), SISCAN (Gibbs et al., 2000), and 3SEQ (Boni et al., 2007). In all cases, default parameters were used. Only events predicted by more than half of the methods were considered significant. The RECCO algorithm, described by Maydt and Lengauer (2006) as a fast, simple and sensitive method for detecting recombination in a set of sequences and locating putative recombination breakpoints, is based on cost minimization.
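The site-wise neutrality test of the counting method described above can be sketched as follows. This is an illustration under stated assumptions, not the MEGA implementation: the function names are hypothetical, integer substitution counts are assumed, and the "observed or more biased" probability is taken as the one-sided binomial tail towards nonsynonymous changes.

```python
from math import comb


def site_neutrality_p(c_s: int, c_n: int, s_s: float, s_n: float) -> float:
    """Probability of observing c_n or more nonsynonymous changes under neutrality.

    c_s, c_n: synonymous / nonsynonymous substitution counts at the codon site.
    s_s, s_n: average numbers of synonymous / nonsynonymous sites at that codon.
    Assumption: one-sided tail, X ~ Binomial(c_s + c_n, s_n / (s_s + s_n)).
    """
    n = c_s + c_n
    p_n = s_n / (s_s + s_n)
    return sum(comb(n, k) * p_n**k * (1 - p_n) ** (n - k) for k in range(c_n, n + 1))


def is_positively_selected(c_s: int, c_n: int, s_s: float, s_n: float,
                           alpha: float = 0.05) -> bool:
    """Apply the two conditions from the text: p < 0.05 and cN/sN > cS/sS."""
    if c_s + c_n == 0:
        return False  # no substitutions, nothing to test
    p = site_neutrality_p(c_s, c_n, s_s, s_n)
    # cN/sN > cS/sS, cross-multiplied so that zero site counts cannot divide by zero
    return p < alpha and c_n * s_s > c_s * s_n
```

With hypothetical values of 10 nonsynonymous and 0 synonymous changes at a codon with sS = 1 and sN = 2, the tail probability is (2/3)^10, which is below 0.05, so the site would be flagged.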
The RECCO method has only two tunable parameters, the recombination cost and the mutation cost. In practice the only parameter considered is α, representing the cost of mutation relative to recombination. When α changes from 0 to 1, the cost of mutation, weighted by α, increases, and the cost of recombination, weighted by 1 - α, decreases. In other words, parameter α controls the trade-off between mutation and recombination.
Table 1 (excerpt): virus (abbreviation), accession number
Red clover necrotic mosaic virus RNA 1 (RCNMV-RNA 1), NC_003756
Red clover necrotic mosaic virus RNA 2 (RCNMV-RNA 2), NC_003775
Red clover necrotic mosaic virus RNA 1/Can (RCNMV-RNA 1/Can), AB034916
Red clover necrotic mosaic virus RNA 2/Can (RCNMV-RNA 2/Can), AB034917
Sweet clover necrotic mosaic virus RNA 1/59 (SCNMV-RNA 1/59), NC_003806
Sweet clover necrotic mosaic virus RNA 2/59 (SCNMV-RNA 2/59), NC_003807
Sweet clover necrotic mosaic virus RNA 2/38 (SCNMV-RNA 2/38), S46027
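The role of the single RECCO parameter, the relative weight given to mutation versus recombination, can be illustrated with a toy calculation. This sketch does not reproduce RECCO's actual optimization; the linear cost, the function names and the event counts are assumptions made only to show how the weighting shifts the preferred explanation.

```python
def weighted_cost(n_mutations: int, n_recombinations: int, alpha: float) -> float:
    """Toy cost: mutations weighted by alpha, recombinations by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * n_mutations + (1.0 - alpha) * n_recombinations


def cheaper_explanation(mutation_only: tuple, with_recombination: tuple,
                        alpha: float) -> str:
    """Compare two hypothetical explanations given as (n_mutations, n_recombinations)."""
    cost_mut = weighted_cost(*mutation_only, alpha)
    cost_rec = weighted_cost(*with_recombination, alpha)
    return "recombination" if cost_rec < cost_mut else "mutation"


# Hypothetical sequence: 12 mutations alone, or 2 mutations plus 1 recombination.
low_alpha = cheaper_explanation((12, 0), (2, 1), alpha=0.05)
high_alpha = cheaper_explanation((12, 0), (2, 1), alpha=0.5)
```

When mutations are cheap (small weight), the mutation-only explanation wins; as the weight grows, the explanation that invokes a recombination event becomes the cheaper one.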

Recombination events during Tombusviridae evolution
Examination of the RECCO output regarding the occurrence of recombination events in the complete genome of members of the family Tombusviridae revealed that three out of five aureusviruses were putative recombinants (PoLV.Pigeonpea, JCSMV.Iran, MaWLMV.USA). In contrast, CLSV (unknown isolate) and CLSV.Canada did not show any recombinant signal (Table 2). Within the genus Aureusvirus, the most frequently recombining virus was PoLV.Pigeonpea (33 putative recombination sites), whereas only 28 possible recombination signals were detected in the genomes of JCSMV.Iran and MaWLMV.USA. Similarly, the only representative of the genus Avenavirus (OCSV) was a potential recombinant with 175 putative sites. The RDP package confirmed these results for both genera. Among the carmoviruses, 14 out of 30 members were possible recombinants. According to RECCO, the most frequently recombining virus was JINRSV with 134 putative events, while MeNSV.Nagasaki [...] (Table 4). Regarding the members of genus Tombusvirus, the two methods agreed that 80% of the analyzed accessions were putative recombinants. While CBLV had the highest number of putative recombination signals (67 sites), TBSV.Cherry had only two recombination sites. Furthermore, it is noteworthy that the two representatives of genus Machlomovirus (MCMoV and MCMoV.Nebraska) were not recombinants according to either of the two methods used in this study. With regard to recombination frequency in the genome of the Tombusviridae, in two of the three recombinant aureusviruses (JCSMV.Iran and MaWLMV.USA) the breakpoint length was in most cases a single residue. In contrast, the breakpoint length of most putative recombination sites of PoLV.Pigeonpea was between three and 37 nucleotides (Table 2). Likewise, most recombination sites of the single representative of genus Avenavirus (OCSV) had a breakpoint length of a single residue.
In about 50% of the members of the genus Carmovirus, most of the detected recombination sites had a length of a single residue. In the remaining members, the breakpoint interval exceeded three residues, reaching as many as 82 residues (MeNSV.NH). In 62% of the investigated dianthoviruses, the breakpoint length exceeded three nucleotides, reaching 100 residues (CarRSV.RNA 1) (Table 3). In the necroviruses, the breakpoint interval distribution was similar, i.e., 50% of the breakpoints consisted of a single residue, while the remaining breakpoints were between three and 77 nucleotides. For the sole member of the genus Panicovirus (PMV), most of the recombination sites had a breakpoint length of a single residue (45) (Table 3). As for the tombusviruses, 75% showed a breakpoint length exceeding three residues, up to 161 nucleotides (AMoCV.Bari) (Table 4).

Nucleotide sequence analysis
Maximum composite likelihood estimates of the nucleotide substitution pattern were made using the MEGA4.1b program. The results for Tombusviridae showed that the rates of the different transitional substitutions varied from 3.18 to 14.61, and those of transversional substitutions varied from 6.6 to 8.57. The nucleotide frequencies were: 0.269 (A), 0.258 (T/U), 0.207 (C), and 0.266 (G). The transition/transversion rate ratios were k1 = 1.705 (purines) and k2 = 0.482 (pyrimidines). The overall transition/transversion bias was R = 0.547, where R = [A*G*k1 + T*C*k2]/[(A+G)*(T+C)]. There were a total of 1218 positions in the final dataset. In all these analyses, the codon positions included were first + second + third + noncoding. All positions containing gaps and missing data were excluded from the dataset (complete deletion option).
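The formula for the overall transition/transversion bias can be evaluated directly, as in the short sketch below. The function name is hypothetical. Because the inputs are the rounded values printed above, the computed result may differ from the reported R.

```python
def transition_transversion_bias(freq: dict, k1: float, k2: float) -> float:
    """Overall bias R = [A*G*k1 + T*C*k2] / [(A+G)*(T+C)].

    freq maps each nucleotide (A, T, C, G) to its frequency;
    k1 and k2 are the purine and pyrimidine transition/transversion rate ratios.
    """
    a, t, c, g = freq["A"], freq["T"], freq["C"], freq["G"]
    return (a * g * k1 + t * c * k2) / ((a + g) * (t + c))


# Rounded frequencies and rate ratios as printed in the text.
frequencies = {"A": 0.269, "T": 0.258, "C": 0.207, "G": 0.266}
r_value = transition_transversion_bias(frequencies, k1=1.705, k2=0.482)
```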
The MEGA4.1b program also incorporates Tajima's Neutrality Test. The purpose of this test is to identify sequences that do not fit the neutral theory model at equilibrium between mutation and genetic drift. Tajima's test compares a standardized measure of the total number of segregating sites (the polymorphic DNA sites) in the sampled DNA with the average number of mutations between pairs in the sample. Tajima's D was determined (D = 5.280926).
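The comparison underlying Tajima's test can be sketched as follows, using the standard formulas for the D statistic. The function name and the toy alignment are illustrative and are not data from this study; gap handling and any MEGA-specific options are not modelled.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs: list) -> float:
    """Tajima's D from equal-length aligned sequences (no gap handling)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed for a non-zero variance")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        raise ValueError("no segregating sites, D is undefined")

    # pi: average number of differences between pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimates, scaled by its standard deviation
    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))


# Toy alignment: four sequences, two polymorphic sites.
toy_d = tajimas_d(["AA", "AA", "AT", "TT"])
```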

Positive selection
652 Boulila
Table 3 - Determination of inferred putative recombination events and their frequency along the sequences of necroviruses, one panicovirus and dianthoviruses. For algorithm RDP v3.31β, only events supported by more than half of the different methods are reported. Nucleotide numbering corresponds to the aligned sequences. Abbreviations: NRS, number of recombination sites; GIRE, genomic interval of recombination events (the span of sequences in the viral genome where recombination events were predicted). Columns: Virus.isolate; Recombination determined by RECCO; Recombination determined by RDP v3.31β.
The high genetic stability of viruses can be attributed to negative or purifying selection to maintain the functional integrity of the viral genome. The degree of negative selection in genes, or the degree of functional constraint for the maintenance of the encoded protein sequence, can be estimated, as mentioned above, by the ratio between the nucleotide diversities in nonsynonymous and synonymous positions (dN/dS). For most coding genes the dN/dS ratio is < 1, which is consistent with negative selection against protein change. In contrast, a dN/dS ratio > 1 may be an indication that adaptive or positive selection is driving gene divergence. In this study, pairwise comparisons of all screened accessions showed that none of the members of the genera Machlomovirus and Panicovirus, nor the unclassified Tombusviridae, showed evidence of positive selection (Table 5). It is worth pointing out that, in the viruses with a segmented genome, positive selection was detected only in RNA 2, suggesting that reassortment events probably occurred. All these results were obtained by testing neutrality in sequence pairs with Fisher's Exact Test. The probability of rejecting the null hypothesis of strict neutrality (dN = dS) in favor of positive selection was determined for each sequence pair. Values of p less than 0.05 were considered significant at the 5% level. The variance of the difference (dN - dS) was computed using the bootstrap method (500 replicates). All analyses were made using the Nei-Gojobori method incorporated in the MEGA program. All positions containing gaps and missing data were excluded from the dataset (complete deletion option).
The final dataset comprised a total of 234 positions.
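The pairwise neutrality test can be illustrated with a one-sided Fisher's exact test on a 2 x 2 table of changed and unchanged nonsynonymous and synonymous sites. This is a sketch under assumptions, not the MEGA implementation: the table layout, the use of integer counts and the function name are illustrative choices.

```python
from math import comb


def fisher_positive_selection_p(n_diff: int, s_diff: int,
                                n_sites: int, s_sites: int) -> float:
    """One-sided Fisher's exact test for an excess of nonsynonymous differences.

    Assumed 2 x 2 table (rows: nonsynonymous, synonymous; columns: changed, unchanged):
        [n_diff, n_sites - n_diff]
        [s_diff, s_sites - s_diff]
    Returns P(at least n_diff nonsynonymous changes) with all margins fixed.
    """
    if not (0 <= n_diff <= n_sites and 0 <= s_diff <= s_sites):
        raise ValueError("differences cannot exceed the number of sites")
    changed = n_diff + s_diff
    total = n_sites + s_sites
    upper = min(n_sites, changed)
    tail = sum(comb(n_sites, k) * comb(s_sites, changed - k)
               for k in range(n_diff, upper + 1))
    return tail / comb(total, changed)


# Hypothetical counts for one sequence pair.
p_value = fisher_positive_selection_p(n_diff=3, s_diff=1, n_sites=4, s_sites=4)
significant = p_value < 0.05
```

Under this layout, a pair would be reported as being under positive selection only when the p value falls below 0.05, matching the threshold used in the text.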

Phylogenetic relationships
The phylogenetic relationships among members of the family Tombusviridae, based on the sequences of their complete genome, were inferred using a Maximum Likelihood algorithm under three models of substitution (JC, HKY85, TN93). The topologies of the constructed trees were identical. The inferred phylogeny showed that each taxonomical genus in the family Tombusviridae constituted a homogeneous group clearly distinct from the others. However, the results obtained in this study revealed a few differences in virus species composition within each taxonomical genus compared to the current classification adopted by the ICTV. In fact, three viruses considered by the ICTV as unassigned (PLPV.PV-0193) and unclassified Tombusviridae (NLVCV.Alaska, PCRPV.GR 57) showed a close phylogenetic relationship to known members of the genus Carmovirus. Moreover, the viruses belonging to this genus were divided into two distinct subgroups. The first subgroup comprised TCV, CCFV, JINRV, HCRSV, PLPV, PCRPV, NLVCV, SCV, AFBV, PFBV, CPMoV, SYMoMV and CarMoV, and the second subgroup encompassed MeNSV and PSNV. Furthermore, it was proposed that genus Necrovirus should be constituted by two distinct subgroups named tentative subgroup I (BBSV, TNV.D, LWSV) and tentative subgroup II (OMMV, TNV.A, OLV-1) (Figure 1). It should be noted that here OMMV is an integral part of subgroup I rather than an unclassified Necrovirus. In contrast, genus Aureusvirus encompassed members that evolved in a homogeneous manner: CLSV, PoLV, MaWLMV, and JCSMV. Similarly, the following members of genus Tombusvirus also formed a coherent ensemble: MaNSV, CBLV, LNV.L, LNV.Zantedeschia, PeLV, CNV, CymRSV, AMoCV, TBSV.Statice, TBSV.Nipplefruit, TBSV.Pepper, TBSV.Cherry, GALV, PNSV, and CarIRSV. Their evolutionary history reshuffled the existing classification adopted by the ICTV since 2009.
In fact, according to this classification, MaNSV was considered as an unassigned Tombusviridae, whereas LNV and PNSV were included in the unclassified Tombusvirus group. Concerning genus Dianthovirus which clearly was not monophyletic, the clustering pattern showed two distinct clades representing their RNAs 1 and 2, as illustrated in Figure 1. Originally, RVX was considered as an unclassified virus within genus Dianthovirus.

Discussion
This study predicted putative recombination events in the genome of several members of the family Tombusviridae and showed that tombusviruses and carmoviruses are highly recombinant compared to viruses of the other genera. For this purpose, two methods were chosen (RECCO and RDP v3.31β) because they are appropriate for analysing the mosaic structure of viruses, as reported in previous works (Boulila, 2009; 2010). In this study, using the RECCO algorithm, it was demonstrated that the viruses belonging to the following genera contained putative recombination signals in their genome: Aureusvirus, Avenavirus, Carmovirus, Dianthovirus, Necrovirus, Panicovirus, and Tombusvirus. These results were in good agreement with those obtained by the RDP package except for members of genus Dianthovirus. By both methods, the two representatives of genus Machlomovirus (MCMoV, MCMoV.Nebraska) were found to be nonrecombinant. As revealed by RECCO, the most frequently recombining viruses were OCSV, RVX.RNA 1, JINRSV, and PMV, with 175, 166, 134, and 108 putative recombination sites, respectively. All of these recombination signals consisted of a single residue. MeNSV.Nagasaki, MeNSV.NK, RCNMV.RNA 2, TNV-A.C, and TBSV.Cherry (2 sites), MeNSV.MNSV-Al, CarRSV-RNA 2, and TNV-A.FM1B (3 sites), MeNSV.Kochi, BBSV-Val25.Iran, and TNV-D.Hungarian (4 sites), CarMoV, MeNSV.MNSV-264, and MeNSV.NH (5 sites), RCNMV.RNA 1.Can, TBSV.nipplefruit, and TBSV.pepper (7 sites), and RCNMV.RNA 1 (8 sites) showed the lowest frequency of recombination breakpoints. In contrast, most of these breakpoints had an interval exceeding three nucleotides. Furthermore, this study showed that recombination may occur between viruses belonging to different genera. For example, recombination between Oat chlorotic stunt avenavirus (OCSV) and Melon necrotic spot carmovirus (MeNSV) may have given rise to Pothos latent aureusvirus (PoLV).
Similarly, OCSV itself may result from a recombination between Turnip crinkle carmovirus (TCV) and Maize white line mosaic aureusvirus (MaWLMV) (Table 2). Seemingly, these viruses could each contain parts of the others' sequences, particularly in the coat protein-encoding gene. Such an event has been extensively studied for Cucumber necrosis tombusvirus (CNV) and Melon necrotic spot carmovirus (MeNSV) (Riviere and Rochon, 1990).
On the other hand, investigation of the selective pressure acting on the protein-coding genes of these viruses led to the identification of positive selection in 17 accession pairs involving 15 different lineages. It is worth mentioning that several viruses (JCSMV.Iran, OCSV, CarRSV-RNA 2, RCNMV-RNA 2.Can, BBSV.Val25.Iran, GALV.nipplefruit, TBSV.Statice, PNSV, and PLPV.PV.0193) evolved under both mechanisms, recombination and positive selection, between which synergism might occur. Such a synergism between recombination and natural selection may have played a major role in Darwinian molecular evolution.
The evolutionary history of the Tombusviridae showed that the 79 accessions split into eight clearly separated clusters representing the eight genera of the family. From the present phylogenetic study, at least two taxonomic implications can be drawn: (i) three viruses currently considered by the ICTV as one unassigned Tombusviridae (PLPV.PV-0193) and two unclassified Tombusviridae (NLVCV.Alaska and PCRPV.GR 57) should all be included in genus Carmovirus; (ii) in addition to the viruses belonging to genus Carmovirus, which formed two separate subgroups, the members of genera Necrovirus and Dianthovirus evolved separately and divided into two distinct subgroups, as shown in Figure 1. In contrast, the members of genera Aureusvirus and Tombusvirus each formed a single ensemble. Evolutionary relationships among viruses provide a reliable basis for classification. As stated by Stuart et al. (2004), who reported similar results regarding the genetic divergence of components of genus Necrovirus, the comparison of complete genomes is a more balanced approach that should provide a more precise scheme of relatedness. On the other hand, it should be pointed out that, in genus Dianthovirus, the genetic divergence between RNAs 1 and 2 is correlated with the final products synthesized and their use by the virus to survive. For example, RNA silencing is a small RNA-guided, sequence-specific gene inactivation mechanism in eukaryotes that is involved in different biological phenomena (e.g. development, heterochromatin formation and defense against molecular parasites such as viruses). Many viruses express suppressors to counteract RNA-silencing-mediated antiviral defenses. These RNA silencing suppressors have been identified in the following genera: Aureusvirus, Carmovirus, Tombusvirus, and Dianthovirus (Voinnet et al., 1999; Qu et al., 2003; Mérai et al., 2005; Takeda et al., 2005).
Dianthovirus uses a unique strategy to suppress RNA silencing. The dianthoviral suppressor consists of multiple components including P27, P88 (encoded by two ORFs in RNA 1) and viral RNA (Takeda et al., 2005). Moreover, sequence variability of the coat protein-coding gene (RNA 1) may be linked to the interaction between this structural protein and the host and vector, which themselves show major diversity among dianthoviruses. In contrast, the ORF in RNA 2 encodes the movement protein. All these factors can influence the divergence between the two RNAs.
Finally, to the author's best knowledge, this is the largest study in the literature so far on recombination potentially occurring in the entire genome of all currently known members of the family Tombusviridae as well as positive selection operating on protein expression and their phylogenetic reconstruction. In addition, a reclassification based on their predicted evolutionary history, is proposed.