Genetic diversity and nonparametric statistics to identify possible ISSR marker association with fiber quality of pineapple

Due to the increasing search for renewable resources, plant fibers have become an alternative for creating new products, and studies have demonstrated the potential of pineapple fibers in composites. The objective of this work was to evaluate genetic diversity and to test for associations between ISSR (Inter Simple Sequence Repeat) bands and the quality of pineapple fibers for use in cement composites in civil construction. The study analyzed the genetic variability of 11 pineapple genotypes, as well as the possible association of 131 bands from 16 ISSR markers with fiber quality characteristics. Eleven bands were selected based on their high correlations (0.64578* to 0.72457*) with three fiber quality variables. Of these, two bands were purified, sequenced, and compared by BLAST against sequences in GenBank at NCBI. These markers can be used in marker-assisted selection to genetically improve the quality of pineapple fiber. Bands that returned no hits in the NCBI BLAST search can be deposited as new sequences in GenBank. Therefore, the SCAR markers, once validated, can be useful in pineapple breeding programs worldwide through marker-assisted selection for fiber resistance, which could support the development of more promising genotypes for industrial use and contribute to the sustainability of this new production sector.


INTRODUCTION
Pineapples belong to the family Bromeliaceae, which comprises approximately 76 genera and 3,553 species (Butcher and Gouda 2018). The pineapple is one of the most consumed fruits worldwide, with a global production of 25.8 million tons and a cultivated area of 1.04 million hectares in 2016 (Faostat 2018). In recent years, there has been growing interest in materials from renewable and sustainable resources, including plant fibers used instead of synthetic fibers to reinforce composites.
In civil construction, the use of plant fibers in cement matrices is notable as an excellent alternative because the fibers minimize the fragility of the matrix (Aziz et al. 2005, Brandt 2008). Fibers from some taxa of Ananas are already in use, such as 'Curauá' (Ananas comosus var. erectifolius), but there is still a large number of pineapple cultivars which have never been evaluated, especially as a reinforcement in composite matrices.
Due to growing demand, 'Curauá' hybrids developed for ornamental purposes (Souza et al. 2014) have become potential candidates for fiber quality studies. These hybrids have shown promising results for use in polymeric composites for automotive products (Sena Neto et al. 2013, 2015). However, little has been done on using these fibers in cement matrices, which could be the basis of more sustainable civil construction products.
The pineapple germplasm collection at Embrapa, located in Cruz das Almas, Bahia, with more than 700 conserved accessions of Ananas spp., is a potential source for these studies. However, phenotypic characterization for fiber quality is extremely laborious and expensive, which makes characterizing a large number of accessions unfeasible.
Thus, given the difficulty of obtaining phenotypic data for fiber quality in pineapple, nonparametric tests become an interesting approach for identifying associations between marker fragments and phenotypic variables. Such associations could be used in marker assisted selection (MAS) in pineapple breeding programs that aim to develop pineapples with fiber quality suitable for use in composites.
Molecular biology techniques can accelerate genetic breeding programs through molecular markers that detect genetic variability (or genetic polymorphisms) in DNA sequences (Arif et al. 2010). ISSR (Inter Simple Sequence Repeat) markers are based on PCR (polymerase chain reaction), do not require prior knowledge of DNA sequences of the target species, and produce reproducible, highly polymorphic fragments at a low cost. These markers are broadly used in genetic diversity and variability studies (Gottardi et al. 2001, Barth et al. 2002, González et al. 2002).
Therefore, the objectives of the present study were to evaluate the genetic diversity among pineapple varieties already under cultivation and hybrids and, given the phenotypic characterization, to identify possible associations between ISSR marker bands and fiber quality variables for use in cement matrices using nonparametric statistics. Once validated, these markers can be used in MAS to accelerate pineapple breeding for fiber quality for the civil construction industry, given that phenotypic evaluation is extremely difficult, costly and time-consuming.

GENETIC MATERIAL
This work evaluated pineapple varieties already under cultivation and hybrids with cultivation potential (close to being released as varieties), since one of the premises for using the fibers in cement matrices is production volume. The eleven cultivars and hybrids and their backgrounds are detailed in Table I. The 'Curauá' variety was used as a reference due to its excellent fiber quality properties (Leão et al. 2009). The cultivated varieties and the hybrids were obtained from field crosses and mother plant nurseries at Embrapa, located in Cruz das Almas, Bahia.

DNA EXTRACTION
The DNA was extracted in the Advanced Biology Center (NBA, Núcleo de Biologia Avançada) at Embrapa using the CTAB (cetyltrimethylammonium bromide) method proposed by Doyle and Doyle (1990).
Young leaves of the 11 pineapple genotypes were collected for DNA extraction. Briefly, approximately 300 mg of leaf tissue was macerated in liquid nitrogen and transferred to 2.0 mL microtubes. Afterwards, an extraction buffer was added (1.  [v/v] β-mercaptoethanol), which was preheated to 65 °C, and the tubes were homogenized for 5 min. Subsequently, the samples were incubated in a water bath at 65 °C for 45 min and then homogenized for 10 min.
Afterwards, chloroform:isoamyl alcohol (24:1) was added to the samples, which were then centrifuged at 10,000 × g for 10 min (step conducted twice), followed by the addition of ice-cold isopropyl alcohol to the supernatant. The material was placed in a freezer (-20 °C) for 24 h and then centrifuged at 10,000 × g for 10 min. Following this, the DNA was washed with 70% ethanol and the pellet resuspended in TE (10 mM Tris-HCl, pH 8.0, 1 mM EDTA) plus ribonuclease (10 mg mL⁻¹ RNase), placed in an oven at 37 °C, and 3.0 M sodium acetate was added. The material was then centrifuged for 20 s at 3,000 × g. Ice-cold absolute ethanol was added to the supernatant and the samples were centrifuged at 10,000 × g. The DNA was washed two more times with 70% ethanol and resuspended in nuclease-free water. Finally, the extracted DNA was stored at -20 °C. The quality and quantity of the DNA were evaluated by comparing the samples in a 1% agarose gel against markers of known molecular weight.

DNA AMPLIFICATION USING ISSR MARKERS
The DNA was amplified using 16 ISSR primers (Table II). The PCR reactions were prepared in a final volume of 15 µL with the following reagents: 1x buffer, 2.5 mM MgCl₂, 0.2 mM dNTPs, 0.2 µM of each primer, 1 U of Taq polymerase (LBM), and 25 ng of genomic DNA. The amplifications were carried out in an Applied Biosystems thermocycler (model Veriti® 96-Well) using the following program: an initial denaturation step at 94 °C for 3 min, followed by 45 cycles of denaturation at 94 °C for 45 s, primer annealing at 48 °C for 45 s, and extension at 72 °C for 1 min. This was followed by a final extension at 72 °C for 7 min and a hold at 14 °C. The amplified products were separated by electrophoresis in a 2.5% agarose gel at 90 V in 0.5X TBE buffer and stained with ethidium bromide (0.5 µg mL⁻¹). The amplified fragments were viewed under UV light and photographed using a photodocumentation system. Fragment sizes were determined based on a 1 kb molecular weight marker (Qiagen).
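As a quick check on the cycling program described above, the nominal run time can be tallied from the step durations. The sketch below is only illustrative: it ignores ramp times between temperatures, which depend on the instrument, and the names used are hypothetical.

```python
# Nominal duration of the ISSR cycling program (ramp times ignored).
INITIAL_DENATURATION_S = 3 * 60   # 94 °C for 3 min
CYCLES = 45
DENATURATION_S = 45               # 94 °C
ANNEALING_S = 45                  # 48 °C
EXTENSION_S = 60                  # 72 °C
FINAL_EXTENSION_S = 7 * 60        # 72 °C for 7 min


def program_seconds():
    """Sum the hold times of every step, excluding the final 14 °C hold."""
    per_cycle = DENATURATION_S + ANNEALING_S + EXTENSION_S
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


total = program_seconds()
print(f"{total} s = {total / 60:.1f} min")
```

Under these assumptions the program holds for roughly two hours before the final 14 °C hold.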

MOLECULAR DATA ANALYSIS
Fragments from the ISSR markers, for both the diversity and association studies, were scored for absence (0) and presence (1) of bands. The electrophoretic profile of primer ISSR-30 for the 11 pineapple genotypes evaluated is shown in Figure 1.
Based on the (0) and (1) data, a genetic dissimilarity matrix of the genotypes was calculated as the complement of the Jaccard coefficient (1 - c, where c is the Jaccard index). Cluster analysis was conducted using the UPGMA (Unweighted Pair-Group Method with Arithmetic Mean) method and the qlcMatrix, ade4 and NbClust packages of the R software (Development Core Team 2016), with the threshold defined according to the parameters described by Mingoti (2005).
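The dissimilarity described above can be illustrated with a minimal sketch. It is written in Python rather than the R packages used in the study, and the genotype labels and band scores are hypothetical. Treating two profiles with no bands at all as identical is an assumption made here only to avoid division by zero.

```python
def jaccard_dissimilarity(a, b):
    """Complement of the Jaccard index for two 0/1 band profiles.

    Joint absences (0, 0) are ignored: only bands present in at least one
    of the two genotypes enter the denominator.
    """
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # assumption: two empty profiles are treated as identical
    return 1.0 - shared / either


def dissimilarity_matrix(profiles):
    """Return {(genotype_i, genotype_j): dissimilarity} for all pairs."""
    names = list(profiles)
    return {(p, q): jaccard_dissimilarity(profiles[p], profiles[q])
            for p in names for q in names}


# Hypothetical band scores (1 = band present, 0 = band absent).
profiles = {
    "G1": [1, 1, 0, 1, 0, 0],
    "G2": [1, 0, 0, 1, 1, 0],
    "G3": [0, 0, 1, 0, 1, 1],
}
matrix = dissimilarity_matrix(profiles)
for (p, q), d in matrix.items():
    if p < q:
        print(p, q, round(d, 3))
```

A matrix of this kind is the input that a UPGMA routine would cluster.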
Cluster validation was carried out by calculating the cophenetic correlation coefficient (CCC) (Sokal and Rohlf 1962).
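To make the validation step concrete, the sketch below runs a small UPGMA clustering and computes the CCC as the Pearson correlation between the original distances and the cophenetic distances (the dendrogram height at which two genotypes are first joined). It is an illustrative Python implementation, not the R routine used in the study, and the four-point distance matrix is hypothetical.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(names, dist):
    """UPGMA on {frozenset({a, b}): distance}; returns cophenetic distances."""
    clusters = [frozenset([n]) for n in names]
    # Distances between current clusters, keyed by an unordered cluster pair.
    d = {frozenset([frozenset([a]), frozenset([b])]): dist[frozenset([a, b])]
         for a, b in combinations(names, 2)}
    coph = {}
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset([c1, c2])]
        for a in c1:
            for b in c2:
                coph[frozenset([a, b])] = height  # first height joining a and b
        merged = c1 | c2
        rest = [c for c in clusters if c not in (c1, c2)]
        for c in rest:
            # Size-weighted update equals the plain mean over all leaf pairs.
            d[frozenset([merged, c])] = (
                len(c1) * d[frozenset([c1, c])] + len(c2) * d[frozenset([c2, c])]
            ) / (len(c1) + len(c2))
        clusters = rest + [merged]
    return coph


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(names, dist):
    coph = upgma_cophenetic(names, dist)
    pairs = [frozenset(p) for p in combinations(names, 2)]
    return pearson([dist[p] for p in pairs], [coph[p] for p in pairs])


# Hypothetical distances among four genotypes.
names = ["A", "B", "C", "D"]
raw = {("A", "B"): 0.2, ("C", "D"): 0.4, ("A", "C"): 0.5,
       ("A", "D"): 0.9, ("B", "C"): 0.7, ("B", "D"): 0.8}
dist = {frozenset(k): v for k, v in raw.items()}
ccc = cophenetic_correlation(names, dist)
print(round(ccc, 3))
```

A CCC close to 1 means the dendrogram distorts the original distances very little, which is the sense in which the coefficient is used to validate the groups.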

ANALYSIS OF PHENOTYPIC CHARACTERISTICS
Fiber quality data were obtained from previous tests (Silva 2016). The main variables measured were the mechanical properties of resistance to traction (RES) and elasticity modulus (ELS), together with water absorption (ABS), crystallinity index (IC), fiber length (COM) and fiber diameter (DIM).
For water absorption, the fibers were cut into 30 mm lengths and approximately 1 g of fiber was weighed. These fibers were immersed in containers of water and weighed; the first reading was taken after 3 h of immersion and subsequent readings every 24 h for 5 consecutive days, according to Lopes et al. (2010). Four replicates were used for each variety and the absorption was calculated. The fiber traction assay was carried out with 15 replicates to obtain the RES and ELS values.
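The text does not state the formula used to calculate absorption. A common gravimetric definition, assumed here purely for illustration, expresses the mass gained as a percentage of the dry mass; the replicate masses below are hypothetical.

```python
def water_absorption_percent(dry_mass_g, wet_mass_g):
    """Mass gain relative to dry mass, in percent (assumed definition)."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    return (wet_mass_g - dry_mass_g) / dry_mass_g * 100.0


def mean_absorption(replicates):
    """Average absorption over (dry_mass_g, wet_mass_g) replicate pairs."""
    values = [water_absorption_percent(dry, wet) for dry, wet in replicates]
    return sum(values) / len(values)


# Hypothetical readings for four replicates of one variety at one time point.
replicates = [(1.00, 2.10), (1.02, 2.14), (0.98, 2.06), (1.01, 2.12)]
print(round(mean_absorption(replicates), 1))
```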
A combined analysis of the molecular (qualitative) and phenotypic (quantitative) data was also carried out to determine the genetic variability based on the Gower (1971) algorithm; groups were validated according to the CCC (Sokal and Rohlf 1962). The qualitative data were analyzed using the cluster, NbClust, ade4, agricolae and factoextra packages, and the Gower algorithm was run using the StatMatch, cluster, ade4, NbClust, ClustOfVar and qlcMatrix packages, in the R software (Development Core Team 2016). The Principal Component Analysis (PCA) of the phenotypic data was calculated using the FactoMineR package in the R software (Development Core Team 2016).
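The idea behind combining qualitative and quantitative data with the Gower coefficient can be sketched as follows. This is a simplified Python illustration, not the R implementation used in the study. It assumes equal weights for all variables, scores each band by simple matching (0 if the two genotypes agree, 1 otherwise) and scales each quantitative difference by the range of that variable; the data are hypothetical.

```python
def gower_matrix(binary, quantitative):
    """Gower dissimilarity for mixed data.

    binary:       {genotype: [0/1 band scores]}
    quantitative: {genotype: [trait values]}
    """
    names = list(binary)
    n_traits = len(quantitative[names[0]])
    # Range of each quantitative trait across all genotypes.
    ranges = []
    for k in range(n_traits):
        column = [quantitative[g][k] for g in names]
        ranges.append(max(column) - min(column))

    def distance(p, q):
        parts = [0.0 if a == b else 1.0 for a, b in zip(binary[p], binary[q])]
        for k, r in enumerate(ranges):
            diff = abs(quantitative[p][k] - quantitative[q][k])
            parts.append(diff / r if r > 0 else 0.0)
        return sum(parts) / len(parts)  # equal weight for every variable

    return {(p, q): distance(p, q) for p in names for q in names}


# Hypothetical band scores and two hypothetical fiber traits per genotype.
binary = {"G1": [1, 0, 1], "G2": [1, 1, 0], "G3": [0, 1, 0]}
quantitative = {"G1": [100.0, 10.0], "G2": [150.0, 20.0], "G3": [200.0, 30.0]}
gower = gower_matrix(binary, quantitative)
print(round(gower[("G1", "G2")], 2), round(gower[("G2", "G3")], 2))
```

Because every variable is rescaled to the interval from 0 to 1 before averaging, bands and traits measured in different units contribute on a comparable scale.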

NONPARAMETRIC ANALYSIS
Since both genotypic and phenotypic data were available, the possibility of finding a correlation between a band and a phenotypic variable, although remote, was tested. The correlation analysis was carried out for 3 fiber quality variables (RES, ELS and ABS) and 131 bands from 16 ISSR molecular markers. Spearman's correlation and the nonparametric Kruskal-Wallis test were obtained with the statistical software SAS (SAS Institute 2010) using the commands proc corr spearman and proc npar1way and anova, respectively. Nonparametric tests were applied because they do not require normally distributed data and were therefore the most suitable for our data.
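For readers without access to SAS, the two statistics can be sketched in plain Python for a single band and a single trait. This is an illustration under stated assumptions, not the procedure run in the study: the band scores and trait values are hypothetical, ties receive average ranks, and the p-value helper applies the chi-square approximation for the two-group case only (band present versus absent, 1 degree of freedom).

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def kruskal_wallis_h(groups, values):
    """Kruskal-Wallis H with tie correction; groups are labels per value."""
    ranks = average_ranks(values)
    n = len(values)
    rank_sums, counts, ties = {}, {}, {}
    for g, r in zip(groups, ranks):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    for v in values:
        ties[v] = ties.get(v, 0) + 1
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / counts[g] for g in rank_sums) - 3 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction


def p_value_two_groups(h):
    """Chi-square (1 d.f.) upper-tail probability; valid for two groups only."""
    return erfc(sqrt(h / 2.0))


# Hypothetical data: band presence (1) or absence (0) and a trait value.
band = [1, 1, 1, 0, 0, 0, 1, 0]
trait = [10, 12, 11, 5, 6, 4, 9, 7]
rho = spearman_rho(band, trait)
h = kruskal_wallis_h(band, trait)
print(round(rho, 4), round(h, 4), round(p_value_two_groups(h), 4))
```

With a binary band score, the two statistics carry the same information about the association, which is why they tend to flag the same bands.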

SEQUENCING BANDS TO DESIGN SCAR MARKERS
After the correlation analysis of the data using nonparametric tests, some relatively high correlations were found between bands and the phenotypic characteristics of interest for fiber quality for use in cement composites in the civil construction industry. However, although all 11 bands were excised from the agarose gel, only 2 yielded a single band after PCR. The single bands were purified using the BIONEER Invitrogen Accuprep gel purification kit (http://us.bioneer.com/Protocol/AccuPrep%20gel%20Purification%20Kit.pdf) and sequenced by capillary electrophoresis on an ABI3730 sequencer, with the POP7 polymer and BigDye v3.1, by the Myleus Biotechnology company (www.myleus.com).

An Acad Bras Cienc (2019) 91(3) e20180749
After obtaining the FASTA files with the contigs from the sequencing, the SCAR primers were designed using the Primer3Plus software (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi).

RESULTS AND DISCUSSION
Studies using plant fibers to reinforce composites have been conducted using different plants, such as sisal, jute, coconut, soybeans, and bananas (Mohanty et al. 2000, Mishra et al. 2004, Liu et al. 2005, Cao et al. 2006, Romanzini et al. 2013, Zhu et al. 2013).

GENETIC DIVERSITY BASED ON THE MOLECULAR DATA
The 16 ISSR primers tested on the 11 genotypes generated 131 polymorphic bands. The number of bands per primer varied from 2 (ISSR 03) to 18 (ISSR 53), with an average of 8 polymorphic bands per primer.
The dendrogram, generated by the Jaccard index using the genetic dissimilarity matrix of the genotypes, contained 2 groups (Fig. 2). Vanijajiva (2012) evaluated the diversity of 15 pineapple accessions in Thailand, using 4 ISSR markers, and obtained 27 polymorphic bands and a genetic distance ranging from 0.32 to 0.97. These studies reveal the existence of genetic variability to be explored among pineapple varieties using ISSR markers.

GENETIC DIVERSITY BASED ON PHENOTYPIC CHARACTERISTICS
Measuring phenotypic characteristics for fiber quality in pineapple is one of the most laborious and time-consuming efforts. Generally, it is unfeasible to evaluate more than 6 characteristics in 10 genotypes in one year. The data used to evaluate fiber quality are listed in Table III. The genetic distance values ranged from 0.23, between 'BRS Vitória' and 'BRS Ajubá', to 3.06, between FIB-GOR and 'Curauá'. In principle, the genotypes that have 'Curauá' as one of their parents should be considered good candidates for having fiber characteristics of interest to the cement composite and civil construction industry.

PCA AND GOWER ALGORITHM
Since phenotypic data were available for 6 fiber quality characteristics, PCA was carried out to identify the most important variables, that is, those contributing most to the phenotypic variation among the genotypes (Fig. 4). In our work, the PCA showed that PC1 and PC2 together account for approximately 68% of the phenotypic variation, with contributions of 49.14 and 19.29%, respectively (Fig. 3). The variables that contributed most were ELS, IC, RES and DIM, contributing 29.11, 20.14, 19.69 and 19.51%, respectively. The variables RES, IC and ELS showed high and positive correlations and are directly related to fiber quality, all contributing to the desired mechanical properties. ABS, DIM and COM contributed with negative correlations, confirming that when the fiber absorbs more water it undergoes dimensional variation, which hinders the contact of the fiber with the matrix; a fiber of good quality must therefore have low ABS (Fig. 3). The COM variable contributed least to the phenotypic variation among the genotypes (4.3%) and was therefore excluded from the Gower algorithm analysis.
The UPGMA grouping method was based on the Gower algorithm (Gower 1971) (Fig. 5). The number of groups was calculated by the pseudo-t² index, proposed by Duda and Hart (1973), in the "NbClust" package (Charrad et al. 2013) of the R software program (R Development Core Team 2016). The genetic distance varied from 0.16 to 0.44. Dissimilarity was lowest between 'BRS Anauê' and FIB-POT, probably because they originated from the same cross. Genetic dissimilarity was highest between 'BRS Imperial' and 'BRS Boyrá', as well as between 'BRS Imperial' and FIB-GOR. Souza et al. (2017) conducted a combined group analysis using the UPGMA method and the Gower algorithm with thermal and mechanical data for fiber quality of pineapple for use in the automobile industry, which resulted in 4 groups with genetic distance values that varied from 0.14 to 0.50.
In our study, the combined group analysis (Fig. 5) separated the genotypes better than the individual analyses for each type of variable: molecular (Fig. 1) and phenotypic (Fig. 2). In previous studies, cluster analyses have helped identify promising genotypes for use in genetic improvement programs by providing information about the potential of existing variability (Souza et al. 2012).
The cophenetic correlation coefficient values were 0.86, 0.84 and 0.85 for the data evaluated using the Jaccard index, mean Euclidean distance, and Gower distance (Gower 1971), respectively. According to Sokal and Rohlf (1962), values above 0.80 indicate a good fit between the distance matrices and derivatives of the graphic distances, giving credibility to the groups formed using the UPGMA method. The results of our work are in agreement with those of Souza et al. (2017) and Sena Neto et al. (2013, 2015), who found significant values for thermal and mechanical properties of fibers from different pineapple genotypes, which demonstrates the possibility of using these fibers to reinforce composites.

ANALYSIS VIA NONPARAMETRIC TESTS
Nonparametric tests, although limited, are used when the data are not normally distributed. They were therefore chosen to extract more information from the molecular and phenotypic data already obtained, especially given the difficulty of obtaining phenotypic data for fiber quality for a large number of pineapple genotypes.
In association studies, it is important to evaluate many individuals of a population of crosses and to carefully measure phenotypic characteristics of interest. For pineapples, however, measuring phenotypic characteristics of fiber quality is unfeasible due to the difficulty and high cost of the tests that make it impossible to evaluate numerous individuals at one single time. Therefore, this possible association was analyzed using nonparametric methods, Spearman's correlation and the Kruskal-Wallis test, as successfully done by Souza et al. (2017).
In our study, we did not expect to obtain any correlations between markers and phenotypic characteristics of interest, mainly due to the quantitative nature of the variables for fiber quality. However, interesting results were found that can be explored to improve the crop, with the goal of identifying promising materials that can be used in composites in the civil construction industry.
For the evaluation of the association between the 131 polymorphic bands and 3 phenotypic variables, 11 high (positive and negative) and significant correlations, which varied from -0.72457* to 0.74903** for fiber quality (Table IV), were found.
One characteristic of plant fibers is their capacity to absorb large amounts of water, which results in high dimensional variation. This is one of the main phenomena that can influence the behavior of fibers in cement composites, because natural variation in humidity can reduce the diameter of the fibers, causing them to lose contact with the matrix to which they adhere. For this variable, 4 bands from different ISSR markers showed correlations that varied from -0.64578* to 0.72457*. Resistance to traction is the maximum tension of the fiber and is a determinant variable for measuring high performance as a mechanical reinforcement. For this variable, band M10 of the ISSR 24 marker, M8 of the ISSR 53 marker, M7 of the ISSR 84 marker, and M3 of the ISSR 95 marker had correlation values from -0.63960* to 0.74903**.
The elasticity modulus was associated with 3 bands, M4 of the ISSR 81 marker, M5 of the ISSR 53 marker and M13 of the ISSR 84 marker, which had correlation values of 0.72175**, -0.6931* and -0.72457*, respectively. The highest correlation found between a band and a characteristic of interest for fiber quality was between band M3 of marker ISSR 95 and resistance to traction, with a correlation value of 0.74903**. Two other high correlations were found for the ISSR 84 marker, band M7 with resistance to traction and band M23 with elasticity modulus, which both had correlation values of -0.72457*.
Figure 5 - Dendrogram of the 11 pineapple genotypes based on the Gower algorithm (Gower 1971): combined molecular and phenotypic data for fiber quality. The UPGMA method was used to define the groups and cutoff points were defined according to the pseudo-t² parameters (Duda and Hart 1973).
Souza et al. (2017) conducted the first work on the use of nonparametric methods to evaluate fiber quality of pineapple and obtained results similar to those of the present study. That study selected 11 bands based on the correlation of 17 ISSR molecular markers with four fiber quality variables and obtained high correlations (0.63434* to 0.76169**). According to the same authors, nonparametric tests are a viable alternative for this type of evaluation, since pineapple fiber has superior mechanical properties that should be explored as a reinforcement of polymeric (Sena Neto et al. 2015) and cement composites.
In our work, the correlation between bands of DNA markers and phenotypic characteristics was evaluated based on three fiber quality variables (RES, ELS, and ABS) and ISSR markers. The three variables evaluated are of fundamental importance to fiber quality studies and to the evaluation of fibers for use in cement composites. Fibers for this purpose need to have low ABS and high RES and ELS. Two highly correlated bands were sequenced (M6 ISSR 72 and M13 ISSR 84; 4 -PE x SC-73; 10-'FIB-GOR') and, from this, 5 SCAR (Sequence Characterized Amplified Region) primers were designed for each ISSR marker, which will be validated in accessions of the pineapple collection at Embrapa (Table V).
The contigs were compared by BLAST against the NCBI database (BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch) but returned no hits (i.e., there were no similar sequences in this database). However, these sequences could be exploited in the future through validation of the primers. What is interesting about this study is the combination of markers that can be used, which gives more credibility when choosing plants that provide suitable materials for use in cement composites. Furthermore, the more bands that are sequenced and validated, the higher the credibility of the MAS.
The fact that the sequenced bands returned "no hits" in the BLAST (Basic Local Alignment Search Tool) search opens new opportunities for validation, because they can be deposited as new sequences in the NCBI GenBank if they are associated with fiber quality for the purpose proposed in this work. This work will therefore contribute markers that can be used in MAS of pineapples for cement composites in the civil construction industry. The SCAR markers, once validated, can be useful in pineapple breeding programs worldwide through MAS for fiber quality, which could support the development of more promising genotypes for industrial use and contribute to the sustainability of this new production sector.