Genetic diversity of coconut germplasm native to South Kalimantan, Indonesia: a molecular study

Coconut (Cocos nucifera L.) is one of the most important tree crops in the world, especially in the tropics. This study aimed to determine the genetic diversity of coconut germplasm native to South Kalimantan, Indonesia, using the rbcL marker. Nine coconut samples, being eight natively collected from this region and one as an outgroup (obtained from the GenBank database), were used in the study. According to the rbcL marker, the coconut germplasm native to South Kalimantan has a relatively high diversity, with a nucleotide diversity (π) of 0.51. The level of diversity is strongly correlated with the mutation present in the observed region, rbcL. The phylogenetic analysis showed that the coconut germplasm has a unique relationship, where the ‘Dalam’ cultivar is the closest to three other dwarf coconuts, i.e., ‘Genjah Kuning 1’, ‘Genjah Kuning 3’ and ‘Wulung’.


INTRODUCTION
Coconut (Cocos nucifera L.), commonly known as the "tree of life" in many nations of the tropics, is one of the most important tree crops in the world (Huang et al. 2013, McKeon et al. 2016). It has currently become a promising export commodity for several countries, e.g., Indonesia and several other Southeast Asian countries (McKeon et al. 2016). In 2020, Indonesia, the Philippines and India were the top three coconut producers worldwide, accounting for 75 % of the global production. Specifically, Indonesia is the largest coconut-producing country, with 17.13 metric tons, or 30 % of the total world production, followed by the Philippines with 14.77 metric tons (26.4 %) and India with 14.68 metric tons (17.0 %) (Burton 2021).
Nowadays, the major coconut growing countries focus on breeding superior varieties with high yield, high oil content, disease resistance and tolerance to biotic stresses (Batugal et al. 2009). Hence, to support a coconut-breeding program, it is urgent to characterize the genetic diversity of this germplasm. In general, morphological markers, such as seed germination time, fruit component analysis, floral biology, pollination behavior and foliar traits, are commonly used to determine the genetic diversity of coconut (Maurice et al. 2015, Zhang et al. 2021). Biochemical parameters, like foliar polyphenols, proteins and isozymes, have also been applied in many studies of coconut diversity (Rajesh et al. 2015). However, these two types of markers show many drawbacks, e.g., being limited in number, showing modest levels of polymorphism and low heritability, and being influenced by the developmental stages of the plant and by varied environmental factors (Rajesh et al. 2014, Rajesh et al. 2015, Le et al. 2020). Alternatively, molecular markers offer an attractive option, if compared to morphological and biochemical ones, for characterizing the genetic diversity of coconut. Several molecular markers, such as random amplified polymorphic DNA (RAPD) (Rajesh et al. 2014), simple sequence repeat (SSR) (Loiola et al. 2016, Geethanjali et al. 2018, Mahayu & Taryono 2019) and start codon targeted polymorphism (SCoT) (Rajesh et al. 2015), as well as amplified fragment length polymorphism (AFLP), restriction fragment length polymorphism (RFLP), inverse sequence-tagged repeat (ISTR) and inter simple sequence repeat (ISSR), have been employed to understand the diversity of this germplasm (Arunachalam 2012).
Further, DNA barcoding is a widely used, valuable and effective tool that enables rapid and accurate identification of plant germplasm (Li et al. 2014), including coconut (Le et al. 2020). This technique uses a short DNA sequence from a standard and agreed-upon position in the plastid genome (Li et al. 2014). According to Jeanson et al. (2011), rbcL is one of the selected DNA barcoding markers used for this purpose. While rbcL has certain limitations, like evolving slowly and showing the lowest divergence, this marker also has certain advantages. For example, it has high universality, is easy to amplify and has discriminatory power at the family and genus levels (Li et al. 2014, Kang et al. 2017). Presently, rbcL has been widely used for phylogenetic analysis within families and subclasses of angiosperms and even among different groups of seed plants (Kang et al. 2017).
This study aimed to determine the genetic diversity of coconut germplasm native to South Kalimantan, Indonesia, using the rbcL marker.

MATERIAL AND METHODS
Nine coconut samples (Table 1) were used in this study: eight native samples collected from four locations (regencies) of South Kalimantan, Indonesia (Figure 1), and one outgroup obtained from the GenBank database (NCBI 2021). Most samples were prepared molecularly at the University of Lambung Mangkurat, Indonesia, from May to September 2021.
Sample preparation included several activities, i.e., DNA extraction, quantification, amplification, electrophoresis, purification and sequencing. The DNA was extracted from young coconut leaf samples, using a commercial DNA extraction kit (Geneaid Biotech Ltd., Taiwan). The DNA quantification was carried out with a NanoVue UV-VIS spectrophotometer (GE Healthcare, UK), and the DNA amplification used the rbcL primers, i.e., rbcL-F (5'-ATGTCACCACAAACAGAGACTAAAGC-3') and rbcL-R (5'-GTAAAATCAAGTCCACCRCG-3') (Gholave et al. 2017).
Data analysis began with reconstructing the consensus sequence of the target, rbcL, using the MEGA-X software (Kumar et al. 2018). In this stage, all rbcL sequences were checked, refined and assembled manually. After that, multiple alignments were performed for all rbcL sequences with Clustal Omega (Sievers et al. 2020). Next, the phylogenetic analysis of the aligned sequences was performed using the maximum likelihood, neighbor joining and maximum parsimony methods. The internal nodes of all phylogenetic trees were evaluated by bootstrap analysis, with 1,000 replicates (Lemey et al. 2009). Finally, the genetic diversity and divergences were determined using Kimura 2-Parameter (K2) distances (Kumar et al. 2018).
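To illustrate the distance measure named above, the following is a minimal Python sketch of the Kimura 2-parameter distance between two aligned sequences. It is not the MEGA-X implementation used in the study; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper: sites with gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)); math.log raises ValueError
    # if the sequences are too divergent for the model to apply
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two sequences that differ by a single transition in ten sites, the function returns about 0.112, slightly above the raw proportion of differences (0.1), because the model corrects for multiple substitutions at the same site.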

RESULTS AND DISCUSSION
Based on the rbcL sequence, the coconut germplasm native to South Kalimantan, Indonesia, showed a relatively high diversity, with a nucleotide diversity (π) of 0.51 (Table 2). According to Huang et al. (2016), this level of genetic diversity is often affected by several factors, such as the breeding system, seed dormancy and dispersal mechanism, and geographic variation and range. Life span and other life-history traits, natural selection and the history of populations are other factors that affect the genetic diversity level. Environmental factors are also often responsible for the patterns of diversity observed at small spatial scales (Huang et al. 2016).
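As an illustration of how a nucleotide diversity value is obtained, the sketch below computes π as the average proportion of differing sites over all pairs of aligned sequences. It is a simplified example, not the procedure used in the study: the function name `nucleotide_diversity` is hypothetical, and the input is assumed to be a gap-free alignment of equal-length sequences.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average per-site pairwise difference (pi) for aligned sequences.

    Assumes a gap-free alignment; every position is compared as-is.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    # proportion of differing sites for each pair of sequences
    pair_diffs = [
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in combinations(seqs, 2)
    ]
    return sum(pair_diffs) / len(pair_diffs)


# Toy alignment (not study data): three sequences of four sites
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

In the toy alignment the three pairs differ at 1, 2 and 1 of 4 sites, so π is (0.25 + 0.5 + 0.25) / 3, i.e., one third.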
According to Lloyd et al. (2016), a high level of genetic diversity in a population has a significant impact on both conservation and breeding programs. Conceptually, it is the main factor in forming a baseline population for natural selection and the evolutionary process (Govindaraj et al. 2015). Indeed, it is an essential aspect of the evolutionary trajectory and a precondition for future adaptive changes. Hence, present-day populations need a high genetic diversity to adapt rapidly (Lloyd et al. 2016).
However, despite the current high genetic diversity, the rapid rate of evolutionary change can exceed this species' adaptation rate. Consequently, future studies are needed to understand how the loss of genetic diversity will affect the ability of future generations to continue to cope with environmental changes (Lloyd et al. 2016). Hence, understanding genetic diversity is essential for increasing the effectiveness and efficiency of conservation programs, especially for endangered species. This is because some aspects of conservation biology, such as loss of genetic diversity, are only dealt with in detailed population genetic studies (Luan et al. 2006).
For plant breeding, the level of genetic diversity becomes even more important in the context of climate change (Govindaraj et al. 2015). Plant breeders use this diversity in developing new and improved cultivars with desirable traits, including tolerance to various biotic and abiotic stresses (Swarup et al. 2021). It is also essential in generating some agricultural phenomena, like transgressive segregation and heterosis. Diverse hybrid lines are necessary for defect correction of commercial cultivars and the development of new ones. Thus, the identification of cultivars, broadening of diversity and their subsequent utilization are the main goals of future crop improvement or breeding programs (Bhandari et al. 2017).
Further, at the molecular level, a high level of genetic diversity is strongly correlated with the mutations that occur in the observed region. In this study, the number of polymorphic sites (S) was 577 (Table 2). According to Frankham et al. (2004), genetic diversity and mutation are interrelated and cannot be separated from each other. Indeed, mutation is the main factor for the emergence of diversity (Govindaraj et al. 2015). In other words, the higher the mutation rate, the higher the resulting genetic diversity.
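The count of polymorphic (segregating) sites, S, can be illustrated with a short Python sketch. This is only an example of the general idea, not the routine used to produce Table 2; the function name `segregating_sites` and the rule of ignoring gaps and ambiguous bases are assumptions.

```python
def segregating_sites(seqs):
    """Return the 0-based alignment columns that carry more than one base.

    Hypothetical helper: gaps and ambiguity codes are ignored when
    deciding whether a column is polymorphic.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    sites = []
    for i, column in enumerate(zip(*(s.upper() for s in seqs))):
        bases = set(column) & valid  # distinct unambiguous bases in this column
        if len(bases) > 1:
            sites.append(i)
    return sites


# Toy alignment (not study data): only the last column is polymorphic
toy = ["ACGT", "ACGA", "AC-A"]
print(len(segregating_sites(toy)))  # S for the toy alignment
```

The third column of the toy alignment contains only G and a gap, so it is not counted; S is therefore 1.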
In general, the most common mutations in rbcL sequences are substitutions, including transitions and transversions. Insertions/deletions are sometimes present in the region and reveal a conservative nucleotide substitution rate (Clegg 1993).
In this study, the transition/transversion bias value was 0.66 (Table 2). According to Stoltzfus & Norris (2016), the observed transition/transversion ratio is a complex function of the degree of sequence divergence. Generally, transitions are present more often than transversions in most sequences (Aloqalaa et al. 2019). Hence, this bias is common in molecular evolution (Stoltzfus & Norris 2016). Such substitutions sometimes cause changes in gene expression or in the biochemical properties of protein products (Keller et al. 2007).
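For readers unfamiliar with the distinction, the sketch below counts raw transitions (A↔G, C↔T) and transversions (all other base changes) over every pair of aligned sequences. It is a simple tally under assumed conventions, with a hypothetical function name `count_ts_tv`; the bias value reported in Table 2 may have been estimated under a substitution model, so the raw ratio from such a tally need not match it.

```python
from itertools import combinations

# Unordered base pairs that constitute a transition
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def count_ts_tv(seqs):
    """Tally transitions and transversions over all sequence pairs.

    Assumes aligned sequences; gaps and ambiguity codes are skipped.
    """
    ts = tv = 0
    for s1, s2 in combinations(seqs, 2):
        for a, b in zip(s1.upper(), s2.upper()):
            if a == b or a not in "ACGT" or b not in "ACGT":
                continue
            if frozenset((a, b)) in TRANSITIONS:
                ts += 1
            else:
                tv += 1
    return ts, tv


# Toy alignment (not study data)
ts, tv = count_ts_tv(["AC", "GC", "TC"])
print(ts, tv, ts / tv if tv else float("inf"))
```

In the toy alignment the A/G pair is a transition, while A/T and G/T are transversions, giving a raw ratio of 0.5.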
The Tajima neutrality test (Table 2) indicates that the native coconuts studied were a population that had undergone balancing selection and lacked rare alleles, because the rbcL sequence had D > 0. Conceptually, the purpose of Tajima's test is to identify regions that do not fit the neutral theory model at equilibrium between mutation and genetic drift. In other words, Tajima's D test distinguishes between a DNA sequence evolving randomly ("neutrally") and one evolving under a non-random process, including directional or balancing selection, demographic expansion or contraction, genetic hitchhiking, or introgression. However, to prove this, further studies that are more accurate and comprehensive are necessary (Korneliussen et al. 2013).
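The statistic itself can be sketched in a few lines of Python from three quantities: the number of sequences (n), the number of segregating sites (S) and the mean number of pairwise differences (k, counted per sequence rather than per site). The function below follows the standard formula from Tajima (1989); its name `tajimas_d` is hypothetical, and it is an illustration rather than the MEGA-X routine behind Table 2.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and
    mean pairwise differences k (per sequence, not per site)."""
    if n < 4 or s < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # difference between the two estimators of theta, scaled by its
    # estimated standard deviation
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Textbook-style example values (not study data)
print(round(tajimas_d(10, 16, 3.888889), 3))
```

A positive D arises when k exceeds S/a1, i.e., when intermediate-frequency variants are more common than expected under neutrality; a negative D indicates an excess of rare variants.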
The phylogenetic analysis showed that the coconut germplasm native to South Kalimantan has a unique relationship. In general, this germplasm was divided into three clades, based on the maximum likelihood, neighbor joining and maximum parsimony approaches (Figures 2-4). For the maximum likelihood and maximum parsimony (Figures 2 and 4), the 'Dalam' cultivar (which belongs to the tall type of coconut) was grouped into Clade 1, with 'Genjah Kuning 1', 'Genjah Kuning 3' and 'Genjah Wulung'. Meanwhile, 'Genjah Salak 1' and 'Genjah Salak 3' were grouped into Clade 2, and Clade 3 consisted of two cultivars, i.e., 'Genjah Salak 2' and 'Genjah Kuning 2'. In the neighbor joining tree (Figure 3), the sample of voucher 2153 from Puerto Rico (an outgroup) was clustered with (or had the closest relationship to) 'Genjah Kuning 2'.
Interestingly, according to Figure 2, the native coconuts from this region are grouped into paraphyletic and monophyletic groups. In this case, Clades 1 and 2 are monophyletic, whereas Clade 3 is paraphyletic. Initially, all clades have the same common ancestor, with a characteristic C base in the studied region. However, specifically in Clade 3, this base changed to T. The bootstrap analysis confirmed that the grouping of coconut germplasm did not change in general, because the bootstrap values shown on the phylograms were more than 50 %, except for the phylogenetic tree generated by neighbor joining, especially between the local cultivars and the outgroup.
In brief, information on phylogenetic relationships is necessary for researchers, conservationists and breeders (Flint-Garcia 2013). All use the information in predicting the genetic diversity of the offspring when individuals cross or hybridize (Acquaah 2007). In addition, this information may be applied to inferring species and their evolutionary history and helping to analyze the gene flow, genetic differentiation and species delimitation (Fernández-García 2017).

CONCLUSIONS
1. Following the rbcL marker, the coconut germplasm native to South Kalimantan, Indonesia, has a relatively high diversity, shown by a nucleotide diversity (π) of 0.51. The level of diversity is strongly correlated with the mutation present in the observed region, rbcL;
2. The phylogenetic analysis showed that the coconut germplasm native to this region has a unique relationship. In this case, the 'Dalam' cultivar has the closest relationship with three other dwarf coconuts, i.e., 'Genjah Kuning 1', 'Genjah Kuning 3' and 'Wulung'.