DendroSSR: SSRs and sequence alignment as tools for building phylogeny trees

Abstract This study introduces a new method for constructing phylogenetic trees that combines Simple Sequence Repeats (SSRs) with sequence alignments. Its purpose is to present the DendroSSR program and to demonstrate, through a case study of diverse Aspergillus species, how the program resolves complicated species relationships in phylogenetic trees. DendroSSR works in several phases: detecting SSRs, computing SSR similarities, aligning sequences, building a distance matrix from the SSR similarities and the sequence alignments, performing hierarchical clustering, and presenting the result as a dendrogram. Sequence alignments alone sometimes do not provide enough information to resolve complicated species relationships, and such relationships can be especially hard to distinguish when conserved sequences are studied, which may lead to an incomplete representation of evolutionary relationships. DendroSSR addresses these limitations by adding SSR similarity across sequences to the traditional alignment distance when the distance matrix is built. SSRs are widely scattered across the genomes of species and exhibit great variation, so they may supplement the information gathered from sequence alignments with additional information on genetic variation and evolutionary relationships.
DendroSSR analysis may therefore be considered a complementary or secondary strategy for building phylogenetic trees in circumstances where traditional phylogenetic analysis fails to clarify complex phylogenetic relationships among the species under examination.

The DendroSSR program adds value to traditional phylogenetic analysis by incorporating SSR similarities, which may help resolve complex species relationships on phylogenetic trees when traditional analyses based on sequence alignments alone fail. DendroSSR implements a systematic approach comprising SSR identification, computation of SSR similarity, sequence alignment, generation of a distance matrix based on SSR similarity and alignment distances, hierarchical clustering, and dendrogram visualization (Kofler et al., 2007; Python Software Foundation, 2021; McKinney, 2017; Needleman and Wunsch, 1970; Real and Vargas, 1996; Cock et al., 2009; Virtanen et al., 2020; Hunter, 2007; Ward Junior, 1963). This strategy offers a useful alternative to standard tree-building methods, which depend primarily on sequence alignments to generate phylogenetic trees.
This study presents a detailed case study in which several Aspergillus species are used to demonstrate the capabilities of DendroSSR in resolving complex species relationships in phylogenetic studies. The case study highlights the ability of the software to resolve ambiguous regions of phylogenetic trees and to reveal the evolutionary relationships among taxa.
Therefore, the goal of this study is to introduce DendroSSR, a tool that uses a new approach to build phylogenetic trees, and to demonstrate it in a case study of several Aspergillus species.

Materials and Methods
This section describes the methodology and workflow of DendroSSR (Flowchart 1), which comprises sequence data input, SSR identification, calculation of SSR similarity, sequence alignment, distance matrix computation, hierarchical clustering, and dendrogram visualization. We applied the DendroSSR program to analyze the phylogenetic relationships among 12 Aspergillus species, and the results obtained with DendroSSR were then compared with those obtained from a traditional sequence alignment analysis.

Introduction
Phylogenetic trees are used in many subfields of biological research, including taxonomy, molecular biology, and ecology (Townsend et al., 2012). These trees are essential for understanding the ancestral relationships among species of different genera. Computational phylogenetic approaches are used to build phylogenetic trees from large numbers of DNA sequences. Distance-matrix approaches such as neighbor-joining or UPGMA (Unweighted Pair Group Method with Arithmetic Mean), which use multiple sequence alignments to build a distance matrix, are the easiest to use but do not invoke an evolutionary model (Felsenstein, 2004). A number of sequence alignment tools, such as ClustalW, also build phylogenetic trees by applying these simpler distance-based techniques (Felsenstein, 2004). Maximum parsimony is another simple method for estimating evolutionary trees, but it implies an implicit model of evolution. More sophisticated approaches estimate evolutionary trees using the maximum likelihood criterion, often within a Bayesian framework (Felsenstein, 2004).
Although phylogenetic trees built from mapped genes or genomic sequences of various species can yield valuable information about evolution, such studies are imperfect and need to be improved. The trees they produce are not always correct, and they do not always show how the groups they include have changed over time. As with any scientific result, they can be falsified by further research (for example, by gathering more data or by analyzing existing data with better tools). The data on which they depend may also be unclear: analyses can be confounded by genetic recombination (Townsend et al., 2012), horizontal gene transfer (Arenas and Posada, 2010), hybridization among species that were not nearest neighbors on the tree before the hybridization event, convergent evolution, and conserved sequences (Woese, 2002).
Conventional computational methods for generating phylogenetic trees depend primarily on sequence alignments to estimate genetic distances and to infer evolutionary relationships among diverse organisms. However, these methods have limitations that arise from factors such as genetic recombination, horizontal gene transfer, hybridization, convergent evolution, and conserved sequences (Townsend et al., 2012; Arenas and Posada, 2010; Woese, 2002; Parhi et al., 2019; Felsenstein, 2004).
This study presents the software program DendroSSR (2023) as a tool for constructing phylogenetic trees. DendroSSR relies on both sequence alignments and Simple Sequence Repeats (SSRs). SSRs have proven to be of significant value in the study of genetic variation and evolutionary processes (Ellegren, 2004), owing to their high variability and their abundance (Townsend et al., 2012; Arenas and Posada, 2010; Woese, 2002; Parhi et al., 2019; Felsenstein, 2004; Geneious, 2022). DendroSSR offers a new way to infer phylogenetic relationships among species, in contrast to the most common software, such as MEGA or PAUP*, which mainly uses sequence alignments only (Kumar et al., 2018; Swofford, 2002).
Brazilian Journal of Biology, 2023, vol. 83, e275386
[...] confusing. In addition, these 12 sequences demonstrate how well DendroSSR handles DNA sequence variation and species evolution. These 12 sequences are used to illustrate how DendroSSR compares with UPGMA.

UPGMA tree construction
The sequences were then aligned using the default settings of the Geneious program (Geneious, 2022). The UPGMA method in Geneious was used to create a phylogenetic tree of the Aspergillus species (Geneious, 2022). This tree was compared with the tree generated by the DendroSSR program.

Data input of DendroSSR
DendroSSR accepts sequence data only in FASTA format. Through the graphical user interface, users load sequence files, and the software then reads and processes the sequences in each file together with the labels associated with them. For this study, we collected ITS sequences of 12 Aspergillus species in FASTA format and loaded them into the program.
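The input step can be illustrated with a minimal sketch. It is not the DendroSSR source code: the function name `parse_fasta`, the choice of the first header token as the label, and the example records are assumptions made for illustration only.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {label: sequence} dictionary."""
    records: Dict[str, str] = {}
    label = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the label
            # (an assumption; a placeholder is used if the header is empty).
            parts = line[1:].split()
            label = parts[0] if parts else f"seq{len(records) + 1}"
            records[label] = ""
        elif label is not None:
            # Sequence line: append to the current record in upper case.
            records[label] += line.upper()
    return records


# Hypothetical two-record input, not data from the study.
example = """>sp_A hypothetical ITS region
ACGTACGT
acgt
>sp_B
TTTTGGGG
""".splitlines()

records = parse_fasta(example)
```

Keeping the labels as dictionary keys means that they can later be reused as leaf names on the dendrogram.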

SSR identification
DendroSSR uses regular expressions to detect the SSRs present in each sequence. The identified SSRs are stored in a list together with their start positions, end positions, and lengths (Kofler et al., 2007; Python Software Foundation, 2021; McKinney, 2017). In this study, we identified the SSRs of each of the 12 Aspergillus species using the DendroSSR software.
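A regular-expression search of this kind might look like the sketch below. The exact pattern used by DendroSSR is not given in the text, so the thresholds here (motifs of 1 to 6 bp repeated at least three times) and the names `SSR_PATTERN` and `find_ssrs` are assumptions.

```python
import re
from typing import List, Tuple

# Assumed definition of an SSR: a motif of 1-6 bases followed by at least
# two further copies of itself (three or more tandem copies in total).
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def find_ssrs(sequence: str) -> List[Tuple[str, int, int, int]]:
    """Return (repeat_text, start, end, length) for each SSR in the sequence.

    Positions are 0-based and the end position is exclusive.
    """
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        start, end = match.start(), match.end()
        hits.append((match.group(0), start, end, end - start))
    return hits


# Hypothetical sequence containing an (AT)4 repeat and an (A)4 repeat.
ssrs = find_ssrs("CCGATATATATGCAAAAT")
```

The lazy quantifier in the motif group makes the search prefer the shortest repeating unit, so a run such as ATATATAT is reported with the motif AT and not ATAT.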

SSR similarity
To determine the degree of similarity between two sequences on the basis of their SSRs, the software first computes the intersection of the sets of SSRs contained in the two sequences. The SSR similarity is then computed as the Jaccard index, which is the size of the intersection divided by the size of the union of the two SSR sets. We calculated the SSR similarity for all pairs of Aspergillus species (Needleman and Wunsch, 1970; Real and Vargas, 1996; Python Software Foundation, 2021).
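The Jaccard index can be sketched in a few lines. The function name `ssr_jaccard`, the use of the repeat strings themselves as set elements, and the convention of returning 0.0 when neither sequence contains an SSR are assumptions; the text does not specify them.

```python
from typing import Iterable


def ssr_jaccard(ssrs_a: Iterable[str], ssrs_b: Iterable[str]) -> float:
    """Jaccard index of two SSR collections: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(ssrs_a), set(ssrs_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two sequences without SSRs share no SSR signal.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical SSR sets: one shared repeat out of four distinct repeats.
similarity = ssr_jaccard(["ATATAT", "AAAA", "GCGCGC"], ["ATATAT", "CCCC"])
```

In this example the two sets share one repeat out of four distinct repeats, so the similarity is 1/4.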

Sequence alignment
Using the Biopython pairwise2 module, the DendroSSR program computes an alignment score for each pair of sequences being compared. We performed global sequence alignments for all pairs of Aspergillus species (Needleman and Wunsch, 1970; Real and Vargas, 1996; Cock et al., 2009; Python Software Foundation, 2021).
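To show what a global alignment score is, the sketch below writes out the Needleman-Wunsch recurrence in plain Python. It does not call the Biopython module that DendroSSR is described as using. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function name `global_alignment_score` are assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a.
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


# Hypothetical sequences: six matches and one gap.
score = global_alignment_score("GATTACA", "GATACA")
```

Only two rows of the dynamic-programming table are kept because the score, unlike the alignment itself, does not require a traceback.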

Distance matrix
DendroSSR computes a distance matrix for all sequence pairs based on a weighted average of the SSR distance (1 - SSR similarity) and the traditional alignment distance.
For the 12 Aspergillus species, we first calculated the SSR similarity between each pair of species, as described in the SSR similarity section. The DendroSSR program then computed the alignment distance for each pair of species from the global sequence alignment performed with the Needleman-Wunsch algorithm (Python Software Foundation, 2021; McKinney, 2017; Needleman and Wunsch, 1970; Real and Vargas, 1996; Cock et al., 2009). Once both the SSR similarities and the traditional alignment distances had been calculated, the program created a distance matrix for the analyzed sequences. As a demonstration, we built a distance matrix based on both the calculated SSR similarities and the traditional alignment distances for the Aspergillus species sequences (Python Software Foundation, 2021; McKinney, 2017; Needleman and Wunsch, 1970; Real and Vargas, 1996; Cock et al., 2009).
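The weighted average can be sketched as follows. The weight is not stated in the text, so the default `ssr_weight=0.5` is an assumption, as is the requirement that the alignment distances have already been scaled to the range 0 to 1. The function name and the three-taxon input values are hypothetical.

```python
from typing import List, Sequence


def combined_distance_matrix(ssr_similarity: Sequence[Sequence[float]],
                             alignment_distance: Sequence[Sequence[float]],
                             ssr_weight: float = 0.5) -> List[List[float]]:
    """Weighted average of SSR distance (1 - similarity) and alignment distance."""
    if not 0.0 <= ssr_weight <= 1.0:
        raise ValueError("ssr_weight must lie between 0 and 1")
    n = len(ssr_similarity)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ssr_distance = 1.0 - ssr_similarity[i][j]
            d = (ssr_weight * ssr_distance
                 + (1.0 - ssr_weight) * alignment_distance[i][j])
            matrix[i][j] = matrix[j][i] = d  # keep the matrix symmetric
    return matrix


# Hypothetical values for three taxa, not results from the study.
ssr_sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]]
aln_dist = [[0.0, 0.2, 0.6], [0.2, 0.0, 0.4], [0.6, 0.4, 0.0]]
combined = combined_distance_matrix(ssr_sim, aln_dist)
```

With equal weights, the first pair has an SSR distance of 0.5 and an alignment distance of 0.2, giving a combined distance of 0.35.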

Hierarchical clustering
The hierarchical clustering routines of the SciPy library for Python are used with the Ward method to convert the distance matrix into a linkage matrix. The linkage matrix is then used to construct a dendrogram, and the matplotlib library is used to display the tree (Virtanen et al., 2020; Hunter, 2007). To construct the phylogenetic tree, we applied hierarchical clustering to the dataset of ITS sequences of Aspergillus species (Ward Junior, 1963; Python Software Foundation, 2021).
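The clustering step might be sketched with SciPy as below. The four-taxon distance matrix and the labels are hypothetical, and the exact calls made inside DendroSSR are not given in the text, so this shows only one plausible use of `scipy.cluster.hierarchy`.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix for four taxa.
labels = ["sp_A", "sp_B", "sp_C", "sp_D"]
dist = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dist)

# Ward linkage: each row of `link` records one merge
# (cluster index, cluster index, merge height, size of the new cluster).
link = linkage(condensed, method="ward")

# no_plot=True returns the tree layout without drawing it; leaving the
# option out draws the dendrogram with matplotlib instead.
tree = dendrogram(link, labels=labels, no_plot=True)
leaf_order = tree["ivl"]
```

The SciPy documentation notes that Ward linkage is strictly defined only for Euclidean distances, so applying it to a combined SSR and alignment distance is best read as a heuristic.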

Dendrogram visualization
The resulting high-resolution tree is displayed using the matplotlib library, and users can save it as an image file on a personal computer (Virtanen et al., 2020; Hunter, 2007; Python Software Foundation, 2021). The DendroSSR tree of the Aspergillus species sequences was generated and saved.

Comparison with traditional sequence alignment analysis
To evaluate the performance of DendroSSR, we compared its results with those obtained from a traditional sequence alignment analysis using the UPGMA method on the same dataset of 12 Aspergillus species.

The DendroSSR phylogenetic tree
The DendroSSR program, based on sequence alignments and SSRs (Simple Sequence Repeats), was used for analysis and to build a phylogenetic tree (Figure 2). The results grouped the Aspergillus species as follows: Group 1: (A.

Comparison of the DendroSSR tree with the UPGMA tree
The comparison of the DendroSSR tree (Figure 2) with the UPGMA tree (Figure 1) is summarized as follows. Both trees place A. fumigatus and A. fischeri in the same group. Unlike the UPGMA tree, the DendroSSR tree classifies A. steynii and A. campestris as sister species. This suggests that the DendroSSR method portrays the connections between these species more faithfully.
The greater resolution of DendroSSR over the UPGMA approach is shown by its ability to depict more clearly the relationships among A. terreus, A. oryzae, and A. flavus. With the addition of SSR information, DendroSSR may offer more resolution for resolving complex relationships across species.
Both trees agree on the classification of A. niger, A. luchuensis, and A. costaricensis; their relationships are therefore consistent, and the species are closely related. The UPGMA approach identifies genetic similarities and common evolutionary history, and the DendroSSR tree maintains this information, demonstrating its reliability and consistency.
Interesting new clustering: A. nidulans and A. clavatus are grouped together in the DendroSSR tree, although they were separated in the UPGMA tree. This indicates that DendroSSR may be better able to uncover unexpected relationships between species.
Consistent representation of the outgroup: The Pythium outgroup is represented consistently in both trees, providing a stable baseline against which to evaluate the divergence of the Aspergillus species. Accurate interpretation of species relationships relies on a stable, consistently represented outgroup.
These contrasts show that the DendroSSR approach is preferable to the UPGMA method for some studies because it provides a more comprehensive, precise, and consistent representation of species relationships.

Discussion
The present investigation employed DendroSSR to analyze a dataset of DNA sequences obtained from various Aspergillus species. The efficacy of the DendroSSR program was demonstrated by its successful execution of SSR identification, SSR distance matrix computation, and alignment distance matrix generation. DendroSSR was then used to carry out hierarchical clustering, producing a dendrogram that visually represents the clustering of the sequences (Kofler et al., 2007; Python Software Foundation, 2021; McKinney, 2017; Needleman and Wunsch, 1970; Real and Vargas, 1996; Cock et al., 2009; Virtanen et al., 2020; Hunter, 2007; Ward Junior, 1963). By using both SSR content and alignment distances, the dendrogram gave a thorough depiction of the phylogenetic relationships among the sequences and thus among the Aspergillus species. In this study, the DendroSSR analysis (Figure 2) improved the resolution for certain Aspergillus species relative to the UPGMA method (Figure 1). The grouping of certain Aspergillus species was refined, resulting in a clearer grouping of A. terreus, A. oryzae, and A. flavus. In addition, A. steynii is now grouped more definitively with A. campestris, indicating a more precise relationship between these species. Furthermore, A. nidulans and A. clavatus are placed in one cluster, which contrasts with the UPGMA dendrogram.
The enhanced resolution observed in the DendroSSR analysis can be ascribed to the use of both sequence alignments and SSRs, which together can give more resolution of the phylogenetic associations among species. However, the differences in the underlying methodologies used to construct the trees may also account for the changes in tree topology between the UPGMA and DendroSSR trees (Townsend et al., 2012; Arenas and Posada, 2010; Woese, 2002; Parhi et al., 2019; Felsenstein, 2004; Geneious, 2022). Phylogenetic trees based on DNA and protein sequences are considered the most reliable depictions of evolutionary relationships. To enhance our understanding of phylogenetic relationships among species, methods of phylogenetic tree analysis need to be advanced by incorporating more informative characters. DendroSSR uses informative characters based on both traditional sequence alignment and SSR similarity between species to arrive at a tree topology that resolves complex species relationships left unresolved by trees based on traditional sequence alignment alone.
In this study, we used 12 species from the genus Aspergillus. Species of this genus may have undergone rapid diversification within a short period of time, which makes it difficult to detect genetic or morphological differences among them. The difficulty may be greater when the method used to construct the tree is not optimal, particularly if specific assumptions are not valid for the given species, leading to unclear phylogenetic relationships among species on the tree (Townsend et al., 2012; Arenas and Posada, 2010; Woese, 2002; Parhi et al., 2019; Felsenstein, 2004). This was the case with the phylogenetic analysis of the Aspergillus species in this study (Figure 1). To improve the understanding of the interrelationships among A. steynii, A. nidulans, and other Aspergillus species groups, it is advisable to explore other phylogenetic methodologies in addition to a traditional phylogenetic analysis. DendroSSR presents a distinct methodology for the phylogenetic analysis of DNA sequences, one that places significant emphasis on SSR content in addition to traditional sequence alignments (Kofler et al., 2007; Python Software Foundation, 2021; McKinney, 2017; Needleman and Wunsch, 1970; Real and Vargas, 1996; Cock et al., 2009; Virtanen et al., 2020; Hunter, 2007; Ward Junior, 1963). When highly divergent sequences are compared, alignment-based methods may not provide meaningful results (Townsend et al., 2012; Arenas and Posada, 2010; Woese, 2002; Parhi et al., 2019; Felsenstein, 2004; Geneious, 2022). Including SSRs in phylogenetic analysis allows the study of how these repeat elements affect genome evolution and yields a tree that shows inter-species relationships. The Python-based DendroSSR program builds phylogenetic trees from both SSRs and sequence alignments (Geneious, 2022; Kofler et al., 2007; Python Software Foundation, 2021; McKinney, 2017; Needleman and Wunsch, 1970; Real and Vargas, 1996; Cock et al., 2009; Virtanen et al., 2020; Hunter, 2007; Ward Junior, 1963). This approach uses SSRs and sequence alignments to show the evolutionary connections among species-related sequences. DendroSSR is particularly useful when standard phylogenetic analysis cannot distinguish species relationships on the tree. Thus, DendroSSR analysis may be used to build phylogenetic trees for species when standard methods fail to clarify their complicated relationships.

Figure 1. Phylogenetic tree of ITS sequences of Aspergillus species constructed using the UPGMA method (traditional sequence alignment approach).

Figure 2. Phylogenetic Analysis: DendroSSR-generated Tree of ITS sequences of Aspergillus Species with Integrated SSRs and Sequence Alignments.