Phylogenetic inference applied to germplasm bank characterization and interspecific breeding in passionfruit

Abstract: Interspecific crosses between the more than 520 Passiflora species may or may not be compatible. The sequences ITS, matK, psbA-trnH, rbcL and trnL-F were used to confirm species identity and to estimate the genetic distances among 48 Passiflora accessions of a germplasm bank. Twenty species were used for crosses within and between distinct Passiflora subgenera. The phylogenetic resolution based on ITS data was superior to that of any single chloroplast marker, recommending it as a DNA barcode for this genus. However, the tree topology based on Bayesian phylogenetic analysis using the combination of the four chloroplast markers possessed greater support for subclades within the subgenus Passiflora, which contains more species of interest for breeders. We could not identify a clear cutoff value of pairwise Kimura two-parameter (K2P) distances to predict success or failure of crosses. Rather, the clustering of species pairs together within the same or closely related phylogenetic subclades predicted the probability of wide cross compatibility more reliably.


INTRODUCTION
Passiflora is the largest genus of the Passifloraceae family, with about 520 described species in tropical and warm temperate regions (MacDougal and Feuillet 2004). Brazil, where about 150 Passiflora species are found, is one of the main centers of diversity of the genus, with 88 endemic species (Cerqueira-Silva et al. 2016, Mezzonato-Pires et al. 2018). Although about 70 species are edible, production chains were only developed for Passiflora edulis Sims. (sour passionfruit) and P. alata Curtis (sweet passionfruit) in Brazil, which is the world's largest producer and consumer of passionfruit (Cerqueira-Silva et al. 2018).
Wide crossing is considered the most important source of genetic variation for breeding in Passiflora. Disease resistance, ornamental value, self-compatibility, fruit quality and insensitivity to photoperiod are some of the current target traits of Passiflora breeding (Lira Júnior et al. 2014, Ocampo et al. 2016, Cerqueira-Silva et al. 2018). Chances of breeding success depend on the correct identification and classification of species and on basic knowledge about chromosome number and breeding behavior (Hansen et al. 2006). Several biological barriers could play a role in hybrid incompatibility. Usually, genetic distance estimates are used as a first approach to draw conclusions about potential compatibility (Chapman and Burke 2007, Muñoz-Sanz et al. 2020). Knowledge about the intra- and inter-specific genetic variability and phylogenetic relationships between cultivated Passiflora species and their wild relatives may be useful to increase the chances of making wide crosses that produce viable seed and to maintain the molecular polymorphism narrowed by extensive selection (Costa et al. 2012). Several molecular phylogenetic studies have sought to clarify taxonomic issues in Passiflora (Feuillet and MacDougal 2003, Muschner et al. 2003, Hansen et al. 2006, Muschner et al. 2013, Ramaiya et al. 2014, Grisi et al. 2019), yet uncertainties remain, which may be the result of the confoundingly high intraspecific variability observed in Passiflora species (Mader et al. 2010).
The Passiflora breeding program of Embrapa Cerrados has exploited hybridizations between wild and cultivated species. Germplasm is maintained at the Active Germplasm Bank Flor da Paixão, in Planaltina-DF, Brazil, one of the largest living collections of Passiflora species. To date, some taxonomic uncertainties still persist, as well as questions on the genetic relatedness and phylogeny of different species of the germplasm included in breeding programs. In this study, DNA sequences of nuclear ribosomal ITS and several chloroplast genes were obtained from a selection of Passiflora species of interest to breeders. The new Passiflora sequences as well as others already available in NCBI GenBank were subjected to phylogenetic analysis to clarify taxonomic questions about these accessions and place them in their phylogenetic context. Thereafter, to help identify potential hybrid combinations for breeding, the correlation between phylogenetic distance and cross-compatibility between Passiflora species was analyzed.

MATERIAL AND METHODS
For the analysis, 48 Passiflora accessions of the Active Germplasm Bank 'Flor da Paixão' of Embrapa Cerrados were selected (Table 1), representing 43 species that are the current focus of breeding efforts to improve sour, sweet and ornamental passionfruit. To complement the analysis, we used selected reference sequences from recent phylogenetic studies of Passiflora species contained in GenBank, which also served to confirm our previous species identifications through DNA barcodes.
Genomic DNA was extracted from silica-dried leaf tissue using a CTAB-based protocol (Inglis et al. 2018a) and the nuclear ribosomal internal transcribed spacers (ITS) and four chloroplast regions, trnH-psbA, trnL-trnF, rbcL and matK were sequenced. The procedures and assembly of data matrices were essentially as described previously (Inglis et al. 2018b). A combined matrix of the four chloroplast regions was also prepared, in which alignment gaps indicated unavailable accession data. The ITS of numerous Passiflora accessions was initially unsuitable for Sanger sequencing due to the presence of a strong poly-G-poly-C hairpin loop. However, the quality of the ITS sequence data was high after substituting the dGTP in the original PCR dNTP mix with 7-deaza dGTP (Dierick et al. 1993).
Pairwise Kimura two-parameter (K2P) distances were calculated from a concatenated chloroplast + ITS matrix in MEGA 7 (Kumar et al. 2016), ignoring ambiguous positions. Maximum parsimony statistics were generated after a tree search in PAUP* (v4.0b10; Swofford 2003). Alignment gaps were treated as missing data and heuristic searches comprised 1000 repeats of five cycles of random taxon addition, holding one tree per cycle, with ACCTRAN character-state optimization and TBR branch swapping. A Bayesian phylogenetic hypothesis based on the ITS matrix and concatenated chloroplast matrix was obtained using reversible-jump MCMC in MrBayes 3.2.2 (Ronquist et al. 2012). Priors were allowed to vary independently for each data partition and one cold and three heated MCMC chains were run in parallel for 10 million generations and sampled every 1000 generations. This runtime was sufficient for the convergence diagnostic, the mean standard deviation of split frequencies, to drop below 0.01 in all repeated runs. The first 25% of the trees were discarded (burn-in period) prior to calculation of 50% majority rule consensus trees. Concordance factors were calculated between the ITS and concatenated chloroplast matrices under the maximum likelihood (ML) criterion in IQ-TREE (v. 2.1.12; Minh et al. 2020).
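As an illustration of the distance measure named above, the following minimal Python sketch computes a K2P distance for one pair of aligned sequences using the standard formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. It is not the MEGA implementation; the function name `k2p_distance` and the rule of skipping every site where either sequence is not A, C, G or T (one possible reading of "ignoring ambiguous positions") are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
UNAMBIGUOUS = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption for this sketch: any site where either sequence holds a gap
    or an ambiguity code is skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            continue  # gap or ambiguous base: not comparable
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites
    q = transversions / sites
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

Applying such a function to every pair of accessions in the concatenated matrix would yield the kind of pairwise distance table that is compared with cross compatibility later in the text.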
For the interspecific hybridizations, 20 Passiflora species, consisting of two cultivated (P. edulis and P. alata) and 18 wild species collected in Brazil, were analyzed. A total of 35 interspecific crosses among the 20 species were attempted as paired combinations (Table 3). Pollination was attempted by direct use of freshly collected pollen on newly opened flowers. The interspecific crosses were classified as compatible or incompatible, according to the resulting fruit and seed set. Compatibility was declared if at least one viable plant resulted from a cross, after seed germination and full plant growth were confirmed in the greenhouse. Incompatibility was declared if an interspecific cross failed to produce viable plants after extensive and repeated pollination attempts under controlled greenhouse conditions.

RESULTS AND DISCUSSION
Sequence data representing the entire amplicons of all five markers were successfully generated for all 48 Passiflora accessions selected from the Germplasm Bank (Table 1). Newly sequenced accessions are indicated by the prefix P** in the trees. In the cases of matK and trnH-psbA, many are new sequences of the represented Passiflora species and in the case of ITS, many are now complete ITS1-5.8S-ITS2 records. The phylogenetic analysis based on ITS sequences (Figure 1) and combined chloroplast DNA sequences (matK, rbcL, trnH-psbA, trnL-F) (Figure 2) allowed the separation of the Embrapa passionfruit accessions into four subgenera known as Passiflora (L.), Decaloba (DC.) Rchb., Astrophea (DC.) Mast. and Deidamioides (Harms) Killip. These results agree with earlier phylogenetic analyses (Muschner et al. 2003, Krosnick et al. 2013), as well as with the acknowledged division of the genus into two groups correlated with flower size. The small-flowered group contains species assigned to the subgenera Astrophea, Decaloba and Deidamioides, and the large-flowered group contains species of the subgenus Passiflora. Some species with small flowers (Passiflora bahiensis, Passiflora malacophylla and Passiflora laurifolia), not analyzed by Muschner et al. (2003), were also found in the latter group. From a horticultural point of view, Passiflora is the most important subgenus, to which a large number of species with great variability in flower morphology, size and color are assigned (Cerqueira-Silva et al. 2018), which is also the reason for the greater representation of this group in this study.
Aside from Bayesian inference (BI) to infer phylogenetic relationships, the performance of each sequenced locus was evaluated under the maximum parsimony (MP) criterion. Results of the five individual markers and a combination of the four chloroplast markers are given in Table 2. The tree inferred by chloroplast marker matK had the highest Consistency Index (CI), but this locus was represented by the fewest Passiflora species in GenBank. Among the markers of Passiflora species with extensive representation in GenBank, chloroplast marker rbcL had the lowest CI of them all, as well as the lowest phylogenetic resolution, as indicated by the low Normalized Consensus Fork Index (NCFI). Combining the chloroplast data into a single concatenated matrix improved the Consensus Fork Index (CFI) compared to the component matrices, but did not raise the CI. Of all markers used, the CI of ITS was the lowest, while the NCFI was higher than all but matK and the combined chloroplast DNA sequences. However, ITS outperformed all individual chloroplast loci and combined chloroplast data in terms of greater tree length, phylogenetic resolution and better pairwise discriminatory power. Nevertheless, the support for subclades within the resolved subgenera of the combined chloroplast tree was superior, particularly in subgenus Passiflora (Figures 1 and 2). The ITS1 portion of the full ITS region has been proposed as a DNA barcode for Passiflora (Giudicelli et al. 2015), in view of the good discriminatory potential across a large representative species sample and the correct identification rate of 68.64%, which can be further improved by the inclusion of ITS2. However, the significant levels of intraspecific variation reported raise concerns with regard to the sole use of ITS sequences for Passiflora species identification (Mader et al. 2010). 
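For readers unfamiliar with the Consistency Index discussed above, the sketch below shows the usual ensemble definition: the sum of the minimum number of steps each character could require divided by the sum of the steps actually observed on the tree. The function name `consistency_index` and the step counts in the usage note are hypothetical illustrations; they are not values from Table 2 and this is not the PAUP* code.

```python
def consistency_index(min_steps, observed_steps):
    """Ensemble consistency index: sum(min steps) / sum(observed steps).

    min_steps[i] is the fewest changes character i could need on any tree
    (number of states minus one); observed_steps[i] is the number of
    changes it requires on the tree being evaluated.
    """
    if len(min_steps) != len(observed_steps):
        raise ValueError("one minimum and one observed count per character")
    total_observed = sum(observed_steps)
    if total_observed == 0:
        raise ValueError("tree length is zero; CI is undefined")
    return sum(min_steps) / total_observed


# Hypothetical example: three binary characters (minimum one step each)
# that need 1, 2 and 4 steps on the tree, so CI = 3 / 7.
example_ci = consistency_index([1, 1, 1], [1, 2, 4])
```

A CI of 1 means no homoplasy, so a slowly evolving marker can combine a high CI with poor resolution, whereas a fast-evolving marker can resolve more nodes at the cost of a lower CI.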
The large number of Parsimony Informative Characters (PICs) in the ITS matrix (Table 2) could also indicate a risk of substitution saturation of this marker, as previously reported for Passiflora (Muschner et al. 2003). The difficulty in obtaining high-quality ITS sequence data in Passiflora, requiring nonstandard techniques to overcome a strong intramolecular secondary structure, may impair the wide adoption of the marker and increase the likelihood of miscalled bases, contributing to species misidentification. Apart from intraspecific variation, differences in data quality are also possibly responsible for several small variances in clustering between our new sequences and those of earlier studies in certain species. This is exemplified by the variation in the ITS tree of P. rhamnifolia and P. haematostigma accessions. Variance in the chloroplast tree is likely to also result from differences in representation among the four markers in the combined matrix. The only solution to the latter problem is to assess each gene tree separately. Despite differences in gene representation between ITS and the four chloroplast markers, we obtained a gene concordance factor (gCF) of 68.6% averaged over all nodes of the ITS and combined chloroplast trees under the ML criterion (Minh et al. 2020), representing a good agreement. The mean site concordance factor (sCF) was 36.4% and the mean ultrafast bootstrap support (1000 pseudoreplicates) was 85.9%. Due to the greater support for internal nodes, we used the combined chloroplast marker tree for interpretation of the hybridization data.
Several taxonomic questions related to the identity of the studied germplasm bank accessions were successfully resolved in this study. Among these accessions, a typical yellow-skinned sour passionfruit (P48/CPAC-MJ-M-01) clustered together with a purple-skinned accession (P46/CPAC-MJ-21-03) and another (P47/CPAC-MJ-M-23), with an unusual variation found in P. edulis: the presence of four stigmas. The ITS tree, however, distinguished the purple-skinned accession P46/CPAC-MJ-21-03 from the other studied P. edulis accessions. This is promising with a view to further investigations, with the inclusion of a larger sample of the various sub-categories of P. edulis, P. edulis f. flavicarpa and wild P. edulis Sims accessions. Significant morphological variation was detected between accessions classified as Passiflora amethystina (P. amethystina "verdadeiro", P. amethystina "SP" and P. amethystina "rui"). An analysis of these accessions based on ITS and chloroplast DNA sequences (Figures 1 and 2) showed that P. amethystina "verdadeiro" and P. amethystina "SP" are closely related, while P. amethystina "rui" is somewhat distinct. In this context, a new analysis of the taxonomy of P. amethystina "rui" is warranted, considering its morphological differences from other accessions assigned to the species.
After repeated manual pollination attempts under controlled greenhouse conditions, 24 interspecific hybridizations were considered compatible and 11 incompatible (Table 3). Based on their placement in the combined chloroplast Bayesian tree (Figure 2), all crosses of this study between species of the subgenera Astrophea and Passiflora were incompatible, although not all possible taxon combinations were exhaustively tested. Within the same subgenus, however, many crosses were successful, although compatibility varied according to each subclade.
For instance, all crosses between species of the same subclade within subgenus Decaloba (Figure 2, Table 3) produced viable seeds (e.g. P. sanguinolenta x P. citrina; P. sanguinolenta x P. capsularis). The same was observed for subgenus Passiflora, where all five crosses within a subclade were compatible (e.g. subclade 1a: P. laurifolia x P. nitida; subclade 1b: P. sidifolia x P. actinia). No clear cutoff value of K2P distance to predict the success or failure of Passiflora interspecific crosses could be established (Table 3). An extreme example is the short distance between the incompatible P. edulis and P. actinia and, at the other extreme, compatibility between the distantly related P. setacea and P. amethystina. Viable plants were also obtained from hybridizations between species of different subclades (Figure 2, Table 3), e.g. 1a x 1b (P. galbana x P. actinia; P. coccinea x P. actinia) and 1a x 1c (P. edulis x P. setacea; P. coccinea x P. setacea; P. setacea x P. edulis). Crosses between species of the subclades 1d and 1a produced no fruits or viable seeds (e.g. P. serratodigitata x P. edulis; P. serratodigitata x P. coccinea). Finally, the results of crosses between species of subclades 1 and 2 were mixed, indicating compatibility of seven combinations (P. galbana x P. alata; P. galbana x P. edulis; P. quadrangularis x P. alata; P. edulis x P. caerulea; P. mucronata x P. alata; P. setacea x P. alata; P. setacea x P. amethystina) and no viable seed set of four others (P. edulis x P. tenuifila; P. caerulea x P. edulis; P. amethystina x P. edulis; P. serratodigitata x P. alata). Elsewhere, successful crosses displaying normal meiotic behavior have been achieved between P. coccinea and P. hatschbachii (Souza et al. 2020).
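The subclade-based reading of the crossing results can be made explicit with a small tally. The Python sketch below groups crosses by unordered subclade pair and counts how many were compatible. It includes only the example crosses for which the text above names a subclade pair and an outcome (it is not the full Table 3), and the names `CROSSES` and `tally_by_subclade_pair` are hypothetical.

```python
from collections import defaultdict

# (cross, subclade label 1, subclade label 2, compatible?)
# Only crosses whose subclade pair is stated explicitly in the text;
# labels follow the text, not an independent reading of Figure 2.
CROSSES = [
    ("P. laurifolia x P. nitida", "1a", "1a", True),
    ("P. sidifolia x P. actinia", "1b", "1b", True),
    ("P. galbana x P. actinia", "1a", "1b", True),
    ("P. coccinea x P. actinia", "1a", "1b", True),
    ("P. edulis x P. setacea", "1a", "1c", True),
    ("P. coccinea x P. setacea", "1a", "1c", True),
    ("P. setacea x P. edulis", "1a", "1c", True),
    ("P. serratodigitata x P. edulis", "1d", "1a", False),
    ("P. serratodigitata x P. coccinea", "1d", "1a", False),
]


def tally_by_subclade_pair(crosses):
    """Return {(clade_a, clade_b): (n_compatible, n_attempted)}.

    The pair is sorted so that cross direction does not split a group.
    """
    counts = defaultdict(lambda: [0, 0])
    for _name, clade_a, clade_b, compatible in crosses:
        key = tuple(sorted((clade_a, clade_b)))
        counts[key][1] += 1
        if compatible:
            counts[key][0] += 1
    return {pair: tuple(values) for pair, values in counts.items()}


summary = tally_by_subclade_pair(CROSSES)
```

Because the pair is sorted, reciprocal crosses fall into the same group; a breeder interested in directional effects would keep the direction in the key instead.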
Some crosses were only successful in one direction: for example, P. edulis x P. caerulea was compatible, whereas the reciprocal cross P. caerulea x P. edulis was not. On the other hand, compatibility was confirmed for both P. edulis x P. setacea and the reciprocal (Table 3). Interspecific crosses were likely to be compatible up to an average genetic distance threshold of 0.01065, but unlikely to be successful if the genetic distance exceeded 0.01385. Species with intermediate genetic distance were identified that could serve as candidates for future bridge-cross projects with currently available fertile hybrids to motivate breeders to overcome barriers to wide crosses in this genus. In our Passiflora hybridization experiments, lower K2P distance values were somewhat, but rather inconsistently, predictive of compatibility (Table 3). All crosses with K2P distances of 0.0163 and below were compatible and the largest compatible distance was 0.0247 (P. setacea x P. amethystina). Notwithstanding the crudeness of distance-based methods for phylogenetic inference compared to cladistic methods, some of the observed inconsistencies in compatibility indicate that more complex and specific biological mechanisms, rather than merely the phylogenetic distance between the crossed species, are possibly involved in many of the interactions.
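Whether a set of crosses admits a clean distance cutoff can be checked mechanically: a cutoff exists only if every compatible cross has a smaller distance than every incompatible one. The Python sketch below illustrates that test. The function name `separating_cutoff` is hypothetical, and in the usage example only the two compatible distances (0.0163 and 0.0247) come from the text; the incompatible value of 0.0200 is an invented placeholder, not a value from Table 3.

```python
def separating_cutoff(records):
    """Look for a distance interval that cleanly separates outcomes.

    records: iterable of (k2p_distance, compatible) pairs.
    Returns (largest compatible distance, smallest incompatible distance)
    if all compatible crosses lie below all incompatible ones, else None.
    """
    records = list(records)
    compatible = [d for d, ok in records if ok]
    incompatible = [d for d, ok in records if not ok]
    if not compatible or not incompatible:
        raise ValueError("need at least one cross of each outcome")
    largest_ok = max(compatible)
    smallest_bad = min(incompatible)
    if largest_ok < smallest_bad:
        return (largest_ok, smallest_bad)
    return None  # outcomes overlap: no single cutoff works


# 0.0163 and 0.0247 are compatible distances quoted in the text;
# 0.0200 is a made-up incompatible distance used only for illustration.
overlapping = [(0.0163, True), (0.0247, True), (0.0200, False)]
no_cutoff = separating_cutoff(overlapping)
```

With overlapping outcomes the function returns `None`, which is the situation described for the Passiflora crosses and the reason subclade membership is used as the predictor instead.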
The factors affecting compatibility of interspecific hybridization are manifold and have a fundamental influence on speciation (Jiggins 2019). Some mechanisms occur prior to fertilization, preventing pollen penetration of the ovule. Postfertilization mechanisms may include mitotic incompatibility, endosperm degeneration or differences in chromosome number (Hansen et al. 2006). In this study, incompatible crosses were detected between species with different chromosome numbers (P. mansoi x P. caerulea, P. haematostigma x P. edulis, P. mansoi x P. edulis). However, eight other crosses between species with the same chromosome number (2n=18) were also incompatible (Table 3). The use of interspecific hybrids between two species to facilitate gene introgression with a third species that would otherwise be incompatible (bridge cross) should also be explored in more detail in interspecific Passiflora breeding (Ocampo et al. 2016).
In Brazil, Passiflora species other than P. edulis and P. alata are being cultivated and used locally for fruit consumption or pharmacological and ornamental purposes. These species include, for instance, P. cincinnata, P. nitida, P. quadrangularis, and P. setacea, which are promising candidates for intraspecific domestication and breeding, as well as for introgression of important traits into commercial passionfruit species. The interspecific crosses involving P. nitida (P. laurifolia x P. nitida), P. quadrangularis (P. quadrangularis x P. alata) and P. setacea (P. coccinea x P. setacea; P. edulis x P. setacea; P. setacea x P. alata; P. setacea x P. amethystina) described here were all compatible.
Our data suggest that phylogenetic inference could be exploited, to a certain extent, to predict the compatibility of interspecific crosses of passionfruit for breeding. This is particularly important for this genus with more than 500 species, where interspecific crosses are a common breeding technique. In breeding programs, DNA barcoding to confirm species identity and phylogenetic placement could be used as a first proxy for the selection of interspecific crosses.