Exploring Parthenium weed biotypes by chloroplast DNA barcode analysis

Background: Parthenium weed (Parthenium hysterophorus L.) is an invasive weed that has spread across vast regions of Pakistan within only a decade or two, threatening the crop fields of this agrarian country. Native to central South America and the Gulf of Mexico, Parthenium hysterophorus L. has become a weed of global significance owing to its alarming invasions and profuse spread in almost all parts of the world. Its invasion is probably due to contamination of imported grain with its seeds. Objective: During comprehensive sampling in Pakistan and Australia, parthenium weed accessions from different geographical regions were observed to exhibit several distinct morphological features. This study therefore uses a plastid DNA barcode (psbA-trnH) to evaluate the extent of nucleotide sequence variation among the sampled parthenium weed accessions. Methods: Genetic diversity was evaluated by sequencing the amplified products, and the data were subjected to phylogenetic analysis in Molecular Evolutionary Genetics Analysis (MEGA; version 6.06) software. Results: The Maximum Likelihood tree showed two main clades with three subdivisions, indicating increased heterogeneity. The sequence-based marker revealed 12 haplotypes among the P. hysterophorus populations (with two parsimony informative sites), 10 indels and a few SNPs (single nucleotide polymorphisms).


INTRODUCTION
Parthenium hysterophorus L., commonly named parthenium weed, has been described as the most serious weed of the 21st century; owing to its invasive nature, it has attained major weed status in a relatively short time in most tropical and subtropical parts of the world. Parthenium hysterophorus is notorious for its detrimental effects on humans, causing allergic rhinitis, hay fever, severe asthma and contact dermatitis. Native to Central America, it has now invaded eastern and southern Africa, the southern USA, South Asia, the South China region, Vietnam, Australia and the South Pacific (Navie et al., 1996a; Tamado and Milberg, 2000). In Pakistan, the extent of its spread was demonstrated by an eye-opening survey of the local flora of Islamabad, the federal capital of Pakistan, and its neighboring city Rawalpindi by Shabbir and Bajwa (2007), who reported Cannabis sativa, another exotic invasive species, as the dominant weed, along with Parthenium hysterophorus L., Senna occidentalis L., Malvastrum coromandelianum (L.) Garcke and Lantana camara L. Just two years later, Parthenium hysterophorus had achieved major weed status, and at present no location in Islamabad is free from parthenium invasion (Hassan and Amin, 2009). Although various control measures have been practiced globally for the last two decades, successful management of this weed has not yet been achieved. In the search for an eco-friendly, comprehensive and effective management option in Pakistan, genetic analysis of the weed using molecular markers can provide extremely useful information. The evolution of genetic diversity in invasive weeds is a major contributing factor in their successful invasion (Prentis et al., 2008).
Furthermore, knowledge of invasion histories is extremely important for selecting better management options for exotic weeds in the invaded areas (Prentis et al., 2010). In view of the importance of genetic analysis, a preceding study by Jabeen et al. (2015), based on ISSR fingerprinting markers, revealed high genetic diversity within and among parthenium weed populations and increased invasive potential, and related the high genetic diversity to intraspecific hybridization.
The Parthenium hysterophorus infestation in Pakistan shows a high degree of morphological variation under varying environmental conditions in different geographical regions. Similar observations have been made in Australia, where two biotypes of parthenium weed are known to be present as a result of two separate introductions from the USA. These biotypes are morphologically distinct: plants of the central Queensland biotype are more aggressive, taller and produce larger diaspores, whereas plants of the second biotype are limited in their spread (Navie et al., 1996b). Such morphological variation may reflect changes at the DNA level. The present study was carried out to elucidate this variation at the molecular level and is novel in Pakistan in its use of DNA barcodes to identify different biotypes/haplotypes of Parthenium hysterophorus. The use of short, standard DNA regions, or DNA barcodes, is a reliable tool for plant identification, with many advantages over conventional morphology-based taxonomy (Okuyama and Kato, 2009; Costion et al., 2011; Yang et al., 2012; Phukuntsi et al., 2016). DNA barcodes are usually slow-evolving regions that are used to address basal angiosperm relationships at high resolution and have been applied successfully in plant invasion biology (Comtet et al., 2015). They can give insight into biotypes/haplotypes, and hence into the number of introductions of the weed (multiple introductions), as revealed by differences in nucleotide sequences. The use of DNA barcodes can overcome several limitations in detecting morphologically cryptic plant species that show a high degree of phenotypic plasticity, in which one genotype may produce diverse phenotypes in response to a changing environment (Valentini et al., 2008).
Earlier workers recommended the ITS sequence as a useful barcode because of its higher discriminatory power than other barcodes, but amplifying and sequencing the whole ITS region is problematic in various species (Lee et al., 2011). The ITS2 region, however, has gained less attention because of the presence of multiple copies in the genome, which usually show high levels of differentiation. Moreover, ITS sequences exhibit greater inter- and intraspecific distances than intrageneric distances, which may lead to inaccurate results (Yamaguchi et al., 2006). More recent work has proposed using the whole plastid genome as a barcode for plant identification, in place of single- or two-locus barcodes (Nock et al., 2011; Yang et al., 2014). However, this approach still awaits acceptance because of high sequencing costs and reported limitations in delimiting species boundaries. Other recent proposals for overcoming the problems associated with DNA barcodes and increasing discrimination among plants include target enrichment and genome skimming (Hollingsworth et al., 2016). The present study used the psbA-trnH intergenic region as a DNA barcode to evaluate parthenium weed samples of two previously identified biotypes from Australia and samples collected from Pakistan.

Sampling
Ten different samples from two discrete populations of Parthenium in Australia (Toogoolawah and Clermont) were obtained in June 2011 from the School of Land and Food Sciences, University of Queensland, Brisbane, Australia. In Pakistan, parthenium weed was sampled in Khyber Pakhtun Khwa (KPK) and Punjab provinces. In KPK, twenty different parthenium weed samples were taken from the Peshawar valley (including Mardan, Charsadda, Swabi, Attock and Peshawar) in February-March 2012; the remaining samples were collected in March-May 2012 from wide eco-geographical areas in Punjab Province. A survey of the southern Punjab region was carried out in May-June 2012, but no parthenium weed populations could be collected because growth of the weed was sparse and reduced at that time of year. A detailed list of sample sizes and sampling sites is given in Table 1. In addition, the latitude, longitude (in degrees), elevation, mean annual temperature and mean annual rainfall of each sampling site were determined.
For each population, ten leaf samples were collected, either dried in silica gel or fresh. Fresh leaves were stored in zip-lock bags, held in a cooler containing ice and then stored in a freezer at -80 °C until used for DNA extraction. Where fresh leaves were unavailable (Australian samples), weed seed samples were collected instead.

Morphological observations of the weed during sampling
During sample collection, various morphological features were noted. Parthenium resembles introduced weeds of the same family; for example, its leaves resemble those of Artemisia sp., and the whole plant resembles Chrysanthemum in its early (rosette) stage. For these reasons, sampling at the different locations was done with care, and P. hysterophorus was identified according to the plant characteristics described by Tiwari et al. (2005). Photographs were taken with a digital camera. The two growth phases of P. hysterophorus were also noted, along with plant height, pinnately or bipinnately dissected leaves, length of peduncles, flowering potential (number of flowers) and seed production potential.
Purified and amplified DNA fragments were sent to the Advanced Biosciences International (ABI) sequencing facility, Malaysia for sequencing. The sequences were retrieved from the company through electronic mail.

Sequence assembly, alignments and phylogenetic tree
The sequences obtained were first analyzed with Chromas LITE (version 2.1). BioEdit (version 7.2.5) (http://www.mbio.ncsu.edu/bioedit/bioedit.html) and the Contig Assembly Program were used for detailed sequence analysis and to obtain consensus sequences. For phylogenetic analysis of psbA-trnH, multiple sequence alignment was performed using CLUSTALW as implemented in MEGA 6.0.
The phylogenetic tree was constructed by the maximum likelihood method, and bootstrap resampling (100 replications) was used to measure the reliability of individual nodes in the tree. Phylogenetic inferences were then made and conclusions drawn.
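The resampling step behind bootstrap support can be sketched as follows. This is only an illustration of the general idea (sampling alignment columns with replacement to build pseudo-replicate alignments), not the MEGA implementation; the function names, the accession labels and the toy sequences are hypothetical. In a full analysis a tree would be inferred from each pseudo-replicate, and node support would be the share of replicate trees that recover a given clade.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return n_replicates pseudo-replicate alignments of the same size as the input."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment; these are not sequences from the study.
toy_alignment = {
    "acc1": "ACGTACGT",
    "acc2": "ACGTACGA",
    "acc3": "ACG-ACGA",
}
replicates = bootstrap_replicates(toy_alignment, n_replicates=100)
print(len(replicates), "pseudo-replicates of length", len(replicates[0]["acc1"]))
```

Whole columns are resampled, so each pseudo-replicate keeps the site patterns of the original alignment but changes how often each pattern occurs.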

RESULTS AND DISCUSSION
During the surveys, the Parthenium plant was generally found with a strong taproot (Figure 1F), an erect branched stem and alternately arranged, pinnately and bipinnately dissected leaves. Parthenium weed showed two growth phases during its life cycle. The first was the vegetative/juvenile stage, in which large, green, pinnate leaves formed a rosette-like arrangement on the ground surface and flowering was absent (Figure 1A). The larger lower leaves formed a carpet that did not allow any other vegetation to grow beneath them, so that the weed ultimately occupied an area of its own. The second was the adult stage, which developed from the juvenile stage as an erect, profusely branched, octangular and hairy stem (Figure 1B). The stem was woody in mature plants. It was grooved along its length and reached a height of 2 meters or more in soils with adequate water, usually during the monsoon, when Khyber Pakhtun Khwa (KPK) and northern Punjab receive their maximum rainfall of the year. Each capitulum was whitish to cream in color, pentangular and about 4-5 mm in diameter, and was produced at the top of the plant on fork-shaped peduncles that arose from the leaves (Figure 1C and D). The heterogamous capitula consisted of 4-5 pistillate ray florets and more than 20 staminate central disc florets. Each ray floret produced a single basal seed. The seeds were minute, light in weight and wedge shaped, and each flower usually produced 4-5 seeds about 1.5-2 mm in length with white scales. A microscopic view of the structure of the parthenium seed is presented in Figure 1E.
During sampling, the parthenium samples showed morphological differences. The samples collected from KPK and northern Punjab had longer stems with larger numbers of leaves, branches and flowers. The flowers were produced on elongated peduncles with bracts (Figure 2B) that protruded from the plant (Figure 2A), favoring amphiphilous pollination. Because the number of flowers was greater, these plants were hypothesized to produce more seeds, which would contribute to the high invasion potential and larger infestations (Figure 2C) observed all along the Peshawar valley and in northern and central Punjab Province.
In contrast, at Sahiwal and Okara, regions south of Lahore division (towards southern Punjab), another biotype of Parthenium weed with a distinct morphology was found. In these areas, the Parthenium hysterophorus infestation was less invasive than that of the aforementioned biotype. These plants were not as tall; they had shorter stems and therefore fewer fleshy, green, alternately arranged broad leaves, with inflorescences produced on short peduncles with reduced or no bracts (Figure 3A and B), possibly capable only of self-pollination. Only a few capitula/flowers were found, in the central top portion of the plant; hence, these plants produced fewer seeds and had a longer generation time. This biotype was confined to the specified area and thus exhibited a weaker potential to spread. Farmers in the area were asked which crops were cultivated there, and imported Australian wheat was found to be grown in that region.
The former biotype, which occurred in almost all regions of north-eastern Pakistan, possesses increased invasive potential and exhibited distinct morphological features that not only enhance its growth but also contribute substantially to its greater competitive ability relative to the shorter biotype. Taller stems, more leaves and branches and, in particular, augmented production of capitula on long peduncles provide good opportunities for efficient amphiphilous pollination and faster growth rates. The faster growth rates correspond to increased capitula production and, consequently, to the generation of large numbers of viable, vigorous seeds through increased intraspecific hybridization and cross-pollination. This more aggressive biotype can be related to the Clermont biotype of the Australian parthenium population described by Navie et al. (1996b) and Hanif et al. (2012).
The second biotype, which is invasive but less so than the former, can be linked to the Toogoolawah Australian parthenium populations (Navie et al., 1996b; Hanif et al., 2012). It exhibited peculiar morphological features that render it less competitive and confined to specific regions of Okara and Sahiwal. These plants have shorter stems and fewer branches and leaves, with a few capitula produced on short peduncles only in the top portion of the plant. These structures favor self-pollination, in contrast to the cross-pollination and high production of viable seeds in the aforementioned biotype.
These differing morphological features of the two biotypes strongly motivated a molecular genetic analysis of the two putative biotypes in Pakistan. Molecular characterization is also of great importance because DNA sequence data can readily be used to compare organisms and to trace their evolution. In the present study, the collected Parthenium hysterophorus samples were characterized at the molecular level together with Australian parthenium samples.

Molecular analysis: DNA barcoding
DNA extraction and PCR amplification of psbA-trnH chloroplast DNA: The DNA extracted from the parthenium leaf samples is shown in Figure 4. PCR with specific psbA-trnH barcode primers on the extracted DNA of the 95 samples yielded an amplification band of about 500 base pairs (Figure 4). After being run on an agarose gel, the purified PCR product was sent to the sequencing facility in Malaysia.

Sequencing and sequence phylogenetic analysis of the amplified fragments
After three weeks, the nucleotide sequences were received by electronic mail and were subsequently viewed and analyzed in Chromas LITE. Consensus sequences were produced with CAP3 (Contig Assembly Program), a free online program. The consensus sequences were converted to FASTA format and saved in a plain-text (Notepad) file. Multiple alignment of all the contigs was then carried out with CLUSTALW in MEGA (version 6.06). Barcode differentiation was visualized using the character-based Maximum Likelihood (ML) tree method, bootstrapped with 100 replicates. The ML tree is shown in Figure 5. The ML analysis yielded two main clades, the first of which was divided into two, giving three subdivisions in total. There was a high level of genetic heterogeneity in the first clade. In total, 12 haplotypes of P. hysterophorus were obtained after editing the sequence data. The Parthenium hysterophorus voucher NL0609 was also included in the alignments. Two parsimony informative sites and 14 indel polymorphisms were found. Overall, increased heterogeneity was found in the collected P. hysterophorus samples. The sequences can be provided on request. Collection of the geographically distinct Parthenium populations took more than a year, and the sampling sites were documented using GPS, recording their latitudes, longitudes and altitudes.
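The summary statistics reported above (haplotypes, parsimony informative sites and indels) can be illustrated with a minimal sketch on a toy alignment. The assumptions are ours, not the authors': the function names are hypothetical, a haplotype is taken to be a distinct aligned sequence with gaps counted as characters, a site is parsimony informative when at least two nucleotides each occur in at least two sequences, and indels are approximated by counting gapped columns (a run of adjacent gapped columns may represent a single indel event). MEGA may count these quantities differently.

```python
from collections import Counter


def haplotypes(alignment):
    """Group accession names by identical aligned sequence."""
    groups = {}
    for name, seq in alignment.items():
        groups.setdefault(seq, []).append(name)
    return groups


def parsimony_informative_sites(alignment):
    """Return columns where at least two nucleotides each occur in two or more sequences."""
    seqs = list(alignment.values())
    sites = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs if s[i] in "ACGT")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i)
    return sites


def gap_columns(alignment):
    """Return columns in which at least one sequence carries a gap."""
    seqs = list(alignment.values())
    return [i for i in range(len(seqs[0])) if any(s[i] == "-" for s in seqs)]


# Hypothetical toy alignment; these are not sequences from the study.
toy = {
    "acc1": "ACGTACGTAA",
    "acc2": "ACGTACGTAA",
    "acc3": "ACCTAC-TAA",
    "acc4": "ACCTAC-TAT",
}
print(len(haplotypes(toy)), parsimony_informative_sites(toy), gap_columns(toy))
```

In the toy data, column 2 is informative because G and C each occur twice, whereas the single T in the last column is a singleton and is not.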
Non-coding sequences of the chloroplast genome are a principal source of data for population genetics and molecular systematic studies of plants; however, comparatively little is known about the points of variation among the diverse non-coding regions of the chloroplast genome. Many researchers have focused on combinations of nuclear ITS sequences and different protein-coding regions of the plastid genome as potential plant barcodes (Wang et al., 2010; Srirama et al., 2010).
In the present study, the psbA-trnH non-coding intergenic spacer was used at a low taxonomic level, i.e. for intraspecific differentiation, in preliminary experiments with the weed; sequencing the region in 95 accessions of Parthenium weed revealed the variation efficiently. We did not use the universal barcode rbcL+matK recommended by the CBOL Plant Working Group (Peter et al., 2009) because some studies have shown that these markers, in combination or singly, have low discriminatory power (less than 80%) in taxonomically complex groups such as angiosperms (Zhu et al., 2014). Hao et al. (2012) ranked the psbA-trnH chloroplast non-coding spacer as the third most efficient region, after the psbD-trnT and psbJ-petA plastid spacers, for discriminating plant taxa, owing to its high degree of divergence. The psbA-trnH region has proved efficient in determining heterogeneity at the nucleotide level and appears to show greater intraspecific differences, making it a popular candidate for population studies. It has also proved superior for inferring the mean interspecific distance in plants and for examining the barcoding gap. In a similar study, Kress and Erikson (2007) regarded the psbA-trnH spacer as the most efficient single-locus plant barcode because of its high level of genetic variability. Tang et al. (2016) used the psbA-trnH intergenic region together with four other barcodes (matK, rbcL, ITS and ITS2) to identify the Chinese medicinal herb genus Uncaria (Gouteng); their study declared psbA-trnH, along with ITS2, to be the most applicable barcode region. Pang et al. (2012) evaluated the psbA-trnH intergenic spacer in a meta-analysis and documented frequent intra-population inversions in the spacer, which could lead to overestimation of diversity among species. The region is short, about 500 bp in length, although its length ranges from 119 to 1,000 base pairs in angiosperms (Cowan et al., 2006).
Our results showed 12 haplotypes with two parsimony informative sites and 14 indel polymorphisms. Although psbA-trnH is more polymorphic than matK or rbcL, its use as a core barcode is limited for several reasons. The relatively short length of the spacer, the extensive prevalence of inversions and insertions within species, and sequencing errors due to long polystructures make it problematic. In addition, because of frequent indels it can reveal excessive polymorphism (Dong et al., 2015). For population genetic structure analysis of this kind, the use of further gene regions as barcodes would be advantageous. However, additional regions were not used in the present study owing to limited time and funding.
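The caution about inversions in the psbA-trnH spacer can be made concrete with a small screening sketch. It was not part of the present study and is only one simple way to check, for two ungapped sequences of equal length, whether all of their differences are explained by a single inverted (reverse-complemented) segment rather than by independent substitutions. The function names, the `min_len` threshold and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inversion_span(ref, query, min_len=3):
    """Return (start, end) if one inversion explains every difference, else None."""
    if len(ref) != len(query):
        raise ValueError("sequences must have the same length")
    diffs = [i for i, (a, b) in enumerate(zip(ref, query)) if a != b]
    if not diffs:
        return None  # identical sequences
    start, end = diffs[0], diffs[-1] + 1
    if end - start < min_len:
        return None  # too short to tell an inversion from a point substitution
    if query[start:end] == reverse_complement(ref[start:end]):
        return (start, end)
    return None


# Hypothetical toy sequences; the query carries an inversion of positions 3-8.
ref = "AAGGCTTCAGTT"
query = ref[:3] + reverse_complement(ref[3:9]) + ref[9:]
print(inversion_span(ref, query))
```

A pair flagged in this way differs at several aligned positions but by only one mutational event, so counting each mismatch separately would overstate the divergence between the two sequences.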
In the present study, a phylogenetic tree was constructed from the final consensus sequence data by the Maximum Likelihood method using MEGA 6.0 software (Tamura et al., 2011). In Figure 5, two clades of genetically diverse populations comprising 12 haplotypes were found, with 2 parsimony informative sites in the sequence alignments in MEGA 6.06. Ten indels and a few SNPs (single nucleotide polymorphisms) were observed. The analysis suggests that the weed has been introduced into the country on multiple occasions. All the sampled accessions, from both Australia and Pakistan, were included in the ML tree. There are two main divisions, the first of which is further divided into two, giving three subdivisions in total. The two main divisions may correspond to the two biotypes of parthenium weed known from Australia, while the further split of the first division reflects the high level of heterogeneity in this group, which included most samples of the first biotype. The Australian samples were placed in the second group, together with the weed samples collected from Okara and the other southern Punjab samples. There are several possible reasons for the high genetic heterogeneity in the sampled parthenium weed accessions. Multiple introductions of the weed into Pakistan plausibly formed the basis for genetically diverse populations, which were then able to establish and spread in the country at an alarming rate. Alternatively, a single introduction of a genetically diverse founder population may have caused the increased genetic diversity in the P. hysterophorus populations.

CONCLUSIONS
The results of our study suggest that Parthenium hysterophorus entered the country through multiple introductions and has now become established in vast agricultural areas, which is an alarming situation.

CONTRIBUTIONS
TA: designed the project, provided financial assistance and gave scientific assistance in performing trials and in the write-up. RJ: PhD scholar; performed all necessary experiments. SA: co-supervisor; provided research facilities and guidance. WA: helped with the write-up.

ACKNOWLEDGEMENT
We are highly obliged to Dr. Wajiha Irum for improving the manuscript and Dr. Peter Prentis for providing his useful suggestions during the work. This work was supported by Higher Education Commission Pakistan as a split PhD program in Australia and Pakistan.