Evidence of heterozygosity and recombinant alleles in single cysts of Giardia duodenalis

Giardia duodenalis is divided into eight assemblages (named A to H). Isolates of assemblage A are divided into four sub-assemblages (AI, AII, AIII and AIV). While isolates of sub-assemblage AII are almost exclusively detected in human hosts, isolates of assemblage B are encountered in a multitude of animal hosts and humans. Here, we isolated single cysts of G. duodenalis from a human stool sample and found that one of them had overlaps of assemblage AII and B alleles and an unexpectedly high number of variants of the beta-giardin (Bg) and GLORF-C4 (OrfC4) alleles. In addition, one of the Bg alleles of that cyst had a fragment of sub-assemblage AII interspersed with fragments of assemblage B, thus indicating that this allele may be a recombinant between sequences A and B. Our results are unprecedented and put a check on the statement that different assemblages of G. duodenalis represent species with different host specificities.


Introduction
Giardia duodenalis (synonyms: G. intestinalis and G. lamblia) is an enteric parasite that affects a wide variety of domestic animals, wild animals and humans.Giardiasis is one of the most common causes of protozoal diarrhea worldwide and can occur in both temperate and tropical regions (RYAN & CACCIÒ, 2013).
In developed countries, the prevalence of giardiasis in humans ranges from 0.4 to 7.5%, but in developing countries this figure may be much higher and up to 30% of the population may be affected (FENG & XIAO, 2011).The symptoms of giardiasis vary widely, from asymptomatic infection to acute or chronic diarrhea accompanied by dehydration, nausea, vomiting and weight loss (CACCIÒ & RYAN, 2008).
Giardia duodenalis is a species complex and, although the isolates from the various hosts have no detectable morphological differences, remarkable variation exists at molecular level.Based on molecular characterization, the species is divided into at least eight groups, named assemblages A to H (CACCIÒ & RYAN, 2008).Giardia duodenalis can be typed by a number of PCR assays, including those targeted towards beta-giardin (Bg), glutamate dehydrogenase (Gdh), triose phosphate isomerase (Tpi) and GLORF-C4 (OrfC4) (MONIS et al., 1999;CACCIÒ et al., 2002;SULAIMAN et al., 2003;ALMEIDA et al., 2010).A list of the currently available markers for Giardia typing is given elsewhere (RYAN & CACCIÒ, 2013).
The assemblages A and B of Giardia duodenalis are the only ones able to infect humans and other mammals, and these are therefore considered to be potentially zoonotic (LALLE et al., 2005;SPRONG et al., 2009).Isolates of assemblage A can be divided into four "sub-assemblages" (AI, AII, AIII and AIV).While isolates of sub-assemblage AII are almost exclusively detected in human hosts, isolates of assemblage B are encountered in a multitude of animal hosts and humans (reviewed in RYAN & CACCIÒ, 2013).
However, the division of the Giardia duodenalis species complex into genetically distinct assemblages has been contested, as indicated by demonstration of allelic sequence divergence within isolates, coupled with identification of and intra-and inter-assemblage recombination (LASEK-NESSELQUIST et al., 2009).
Allelic sequence heterozygosity (ASH) in populations of assemblage B cysts has been widely reported, but ASH in assemblage A cysts much less so (GELANEW et al., 2007;LEBBAD et al., 2008).The presence of genetically different cysts in fecal samples and the finding of different levels of ASH in the genome of Giardia duodenalis assemblages A and B have been puzzling epidemiologists and complicating the understanding of the molecular epidemiology of giardiasis (MORRISON et al., 2007;LEBBAD et al., 2008;FRANZÉN et al., 2009;SPRONG et al., 2009).Unlike other organisms, Giardia duodenalis cannot be routinely cultured, and thus ASH, genetic recombination and mixed infections can be more readily differentiated through molecular analysis of a single Giardia cyst (ANKARKLEV et al., 2012).
Here, we focused on studying a single stool sample that was originally diagnosed as Giardia duodenalis sub-assemblage AII in order to investigate heterogeneous sequences.We isolated cysts of Giardia duodenalis by using the micromanipulation technique.Each cyst was identified by using multilocus analysis for detection genetic recombination and/or ASH.

Materials and Methods
A human stool sample containing Giardia cysts was subjected to the technique of centrifugation-floatation in sucrose (modified Sheather technique) (SHEATHER, 1923), as described elsewhere (SOUZA et al., 2007).Approximately 2 g of fecal material was mixed into 11 ml sucrose solution (d = 1.203 g/cm 3 ) and the homogenate was filtered thorough sterile gauze.The filtered solution was centrifuged in 15 ml conical tubes at 2000 x G for 10 minutes.Floating cysts were recovered from the surface and were individualized by using a micromanipulation technique (AGUIAR et al., 2015).Each cyst was recovered in a 200 µL PCR-tube and DNA extraction and the first PCR round (multiplex-PCR) was performed in the same tube.Four pairs of primers were used simultaneously for amplification of fragments of the genes Gdh, Tpi, OrfC4 and Bg.The PCR mix added to each microtube was (containing 11 µL of TE and digested cyst): 29.2 µL of ultrapure water; 1 µL of dNTP (10 mM of each nucleotide); 5 µL of 10X PCR buffer (Platinum  Taq Polymerase, Invitrogen); 1 µL of each primer (sense and antisense) (10 pmol/µL); 1.5 µL of MgCl2 (50 mM); and 0.3 µL Taq DNA Polymerase (5 U/µL) (Platinum  Taq Polymerase, Invitrogen).Following this, four nested PCRs were performed using samples of 5 µL of the product obtained from multiplex PCR.The entire protocol of multilocus amplification of genomic DNA was exactly as that described previously (AGUIAR et al., 2015).
The PCR products were observed through electrophoresis on 2% agarose gel.Bands were excised from the gel and purified by using GE Healthcare kits (Illustra, GFX PCR DNA and GEL band purification kit), in accordance with the manufacturer's instructions.The products were sequenced bi-directionally four times using the ABI PRISM BigDye  Terminator v3.1 kit (Ready Reaction Cycle Sequencing, Applied Biosystems).Sequencing products were analyzed in an automated sequencer (3500 Genetic Analyzer, Applied Biosystems).Sequences were assembled and the contig was formed using the phred base-calling and phrap assembly tools, which are available in the Codoncode aligner suite, v.4.2.1.(Codoncode Corp. Dedham, MA, USA).The edited sequences were aligned by means of Clustal W, which is available in the BioEdit Sequence Alignment Editor suite (HALL, 1999), based on similar and homologous sequences available in GenBank.
In some cases, after excision from agarose gel, nested PCR products were cloned using the InsTAclone TM PCR cloning kit (Fermentas) and Escherichia coli strain JM109 (Promega) competent cells, in accordance with the manufacturer's protocol.The positive clones were sequenced using M13 vector-specific primers as described above.
Non-redundant nucleotide sequences were submitted to GenBank, under the accession numbers KR046095-KR046124.
Phylogenetic trees were reconstructed using the neighbor-joining method (SAITOU & NEI, 1987).The evolutionary distances were computed using the maximum composite likelihood method (TAMURA et al., 2004).The percentage of replicate trees in which the associated taxa clustered together was estimated by means of a bootstrap test (1000 replicates) (FELSENSTEIN, 1985).Evolutionary analyses were conducted in MEGA6 (TAMURA et al., 2013).

Sequence analysis on Gdh
The Gdh sequences from cysts C1, C4, C6, C7, C8, C10, C11, C12, C+1 and C+2 were all identical to each other and identical to Giardia duodenalis isolate JH (EF685688) (sub-assemblage AII).No polymorphic site was found in these sequences.The Gdh Heterozygosity in single Giardia cyst sequence from cyst C3 was also assigned to sub-assemblage AII and was identical to the above sequences except for the presence of a single polymorphic site (Y) at position 832 of the gene (Figure 1A).Cyst C2 was identified as assemblage B, but several polymorphic sites along its sequence were detected, at the following positions in the gene: 486, 546, 582, 612, 723, 807, 834, 895 and 933 (Figure 1B).

Sequence analysis on Bg
The Bg genetic sequences from cysts C1, C3, C4, C6, C7, C8, C10, C11, C12, C+1 and C+2 were all identical to each other and identical to Giardia duodenalis isolate HC29 (KF922983) (assemblage A).No polymorphic site was found in those sequences.The Bg chromatograms obtained from cyst C2 showed several polymorphic sites compatible with overlap between sequences A and B (Figure 1C).The nested PCR products from cyst C2 were cloned and 14 alleles were revealed.Twelve alleles consisted of assemblage B, one allele was assemblage A and one allele had an assemblage A fragment interspersed with sequences of assemblage B, thus indicating that this allele may have resulted from recombination between sequences A and B. The Bg alignment between alleles found in cyst 2 is shown in Figure 2. The phylogenetic relationships between the Bg alleles from all the cysts investigated in this work and homologous sequences available in GenBank are shown in Figure 3.

Sequence analysis on orfc4
The OrfC4 genetic sequences from cysts C1, C3, C4, C6, C7, C8, C10, C11, C12, C+1 and C+2 were all identical to each other.The GenBank sequence closest to these is the sequence of Giardia duodenalis isolate WB (XM_001704865) (sub-assemblage AI).No polymorphic sites were found in any of these sequences.The OrfC4 chromatograms obtained from cyst C2 were difficult to interpret, but were compatible with occurrence of simultaneous amplification of two sequences that were very different to each other (data not shown).The nested PCR products of cyst C2 were cloned and nine alleles were revealed.Five alleles consisted of assemblage B, and four alleles were assemblage A. The phylogenetic relationships between the OrfC4 alleles from all the cysts investigated in this work and homologous sequences available in GenBank are shown in Figure 4.

Discussion
Sequencing Giardia duodenalis genes directly from PCR amplified products reveals occurrences of polymorphic sites in cysts in stool samples from patients, and in cloned culture isolates and single cysts and/or trophozoites isolated through micromanipulation (WIELINGA et al., 2011;ANKARKLEV et al., 2012).Detection of double peaks in chromatograms from PCR products amplified from single cysts provides plausible evidence that allelic sequence heterozygosity does occur in Giardia duodenalis (ANKARKLEV et al., 2012).However, it is not possible to know how many alleles exist in a PCR mixture from which chromatograms present double peaks.In addition, considering amplicons containing two or more alleles, it is possible that not all nucleotide differences between them are detected in chromatograms, because the different alleles may be present at different prevalences in the population of strands (ANKARKLEV et al., 2012).For this reason, we aimed in this study to clone PCR products that presented chromatograms with heterogeneous sites.
Among the cysts analyzed, only one of them (cyst C2) had sequences relating to Giardia duodenalis assemblage B. All other cysts were classified as sub-assemblage AII.Neither of the samples containing more than one cyst (C+1 and C+2) showed any shared sites between assemblages A and B, and both of them were assigned to sub-assemblage AII, which probably indicates that the frequency of assemblage B cysts in the original stool sample was very low.These results demonstrate that a specimen classified as sub-assemblage AII may actually contain other assemblages that are underrepresented.
Considering the single cysts classified as sub-assemblage AII, all chromatograms presented monomorphic sites, with the only exception of the Gdh sequence from cyst C3, in which only one polymorphic site was detected.This result contrasts with the results obtained with cyst C2.Although the chromatogram from the Tpi sequence of cyst C2 did not have any double peaks, allelic sequence heterozygosity was detected at all other loci.
Several heterogeneous sites were found in Gdh sequences from cyst C2, but most of them were not differences between assemblages A and B. Therefore, the Gdh chromatogram from cyst C2 was probably formed by superposition of different alleles of assemblage B. Unfortunately it was not possible to clone the Gdh products of cyst C2.
In contrast, the OrfC4 and Bg sequences from cyst C2 showed overlapping of assemblage A and B alleles.In both cases, the sequences obtained after cloning PCR products revealed occurrences of an unexpectedly high number of allele variants of both genes.They also provided evidence for the existence of simultaneous occurrence of assemblage A and B sequences in a single cyst.
Although axenic assemblage B isolates had previously been shown to possess type A alleles for Gdh, Tpi and other genes, which would be indicative of gene transfer events from assemblage A-B individuals (LASEK- NESSELQUIST et al., 2009), the results presented here are unprecedented for single cysts.
The possibility that this event might be due to cross-contamination during the course of nucleic acid amplification cannot be completely ruled out.It is also possible that more than one cyst was aspirated together with cyst C2, although this possibility is remote because the cyst was aspirated from a drop containing only one cyst per microscope field.After aspiration, the cyst was placed in the dish again and the droplet was reexamined.The cyst was then aspirated again and was transferred to a microtube.Furthermore, it is very unlikely that multiple alleles of assemblage B were from more than one cyst, because the sampled population of cysts was largely formed by assemblage AII cysts.Aspiration of more than one assemblage B cyst in a population in which the ratio between assemblages A and B was at least 10: 1 would be a very unlikely event.
Assemblage B Bg fragments flanking assemblage A Bg fragment, as observed in one of the clones of cyst C2 (C2_BG_CLONE8), provides strong evidence that alleles of different assemblages actually coexist in the same cyst.No technical artifact could convincingly explain this event, which probably results from crossover recombination between strands of assemblages A and B. It is noteworthy that there are no sequences similar to the C2_BG_CLONE8 sequence in GenBank.
The simultaneous occurrence of assemblage A and B alleles in a single cyst strongly suggests that genetic exchange occurs between these assemblages.This puts a check on the statement that different assemblages represent distinct species (at least as regards the distinction between AII and B) (FRANZÉN et al., 2009).
The surprisingly high number of alleles of Bg and OrfC4 in a single cyst is not supported by current knowledge on the biology of this agent.During encystation, the DNA of Giardia duodenalis is replicated, resulting in a cyst with a ploidy of 16N in four nuclei (BERNANDER et al., 2001).Such a high number of alleles of Bg and OrfC4 (and probably gdh) detected in a single cyst may be a consequence of replication errors of DNA polymerase from Giardia duodenalis assemblage B, or a consequence of any other strategy used by the parasite towards increasing genetic diversity.Although rather speculative, this statement should be considered in further studies in order to explain why genetic heterogeneity identified in assemblage B isolates is such a common event.are alleles sequenced from cyst C1.The evolutionary history was inferred using the neighbor-joining method.The optimal tree with the sum of branch length = 0.90220161 is shown.The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches.The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances that were used to infer the phylogenetic tree.The evolutionary distances were computed using the maximum composite likelihood method and are in the units of the number of base substitutions per site.The analysis involved 18 nucleotide sequences.The codon positions included were 1 st + 2 nd + 3 rd + noncoding.All positions containing gaps and missing data were eliminated.There were a total of 343 positions in the final dataset.Evolutionary analyses were conducted in MEGA6.
DNA polymerases used to amplify targets during PCR may misincorporate nucleotides at different rates.The relatively low-fidelity Taq DNA polymerases are routinely used for the differentiation between PCR-yielded products even though it incorporates mutations in a small number of strands.Such a small number of mutated strands will be diluted within a high amount of correct copies and the mutations would not be detected by sequence analysis.However, when one clones the PCR products, such mutants would be promptly noticed.McInerney et al. (2014) compared error rate during PCR by DNA polymerases and found that Taq DNA polymerase incorporated 99 mutations in a total of 135,000 base pairs sequenced, which correspond to 1 nucleotide misincorporation to each ~1,400 base pairs sequenced.Error rate of Taq DNA polymerase is heavily dependent on the analytical and methodological differences among studies (KUNKEL & BEBENEK, 2000) and values typically reported ranges between 1-20 x 10 -5 (1 to 20 artifactual mutations each 10,000 base pairs sequenced).In our experiment, a low PCR-generated error was attributed, because all the cloned sequences were shorter than 500bp.In addition, the Gdh PCR products from cyst C11 were cloned and the five obtained clones were 100% identical to each other (data not shown), which strongly indicates that the fidelity of Taq DNA polymerase had low impact on generating artifactual mutations in the conditions of the experiment.
From the results presented here and from those presented elsewhere (LASEK-NESSELQUIST et al., 2009;ANKARKLEV et al., 2012), an interesting hypothesis might emerge.Giardia duodenalis does not infect only through clonally expanded numbers of individuals.Rather, a genetically heterogeneous population is composed of an infecting population with individuals that are highly specialized and adapted to the host and individuals that may act as a genetic reservoir and which are present in much smaller numbers.A minority of the individuals may be responsible for the spillover capacity of the organism, in which perpetuation of the genetic material is ensured through gene exchanges with the numerous host-adapted individuals.For example, an animal The evolutionary history was inferred using the neighbor-joining method.The optimal tree with the sum of branch length = 0.90220161 is shown.The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches.The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances that were used to infer the phylogenetic tree.The evolutionary distances were computed using the maximum composite likelihood method and are in the units of the number of base substitutions per site.The analysis involved 14 nucleotide sequences.The codon positions included were 1 st + 2 nd + 3 rd + noncoding.All positions containing gaps and missing data were eliminated.There were a total of 393 positions in the final dataset.Evolutionary analyses were conducted in MEGA6.
challenged with cyst C1 might not become infected at all.On the other hand, it might become infected by cyst C2 and then transmit cyst C2 back to humans.Once cyst C2 enters the human intestinal tract again, it would start a new infection by expanding the numbers of cells containing AII genetic material.Thus, studies focusing on populations of host-adapted cysts would be very useful for confirming whether only clonally expanded numbers of individuals form them or whether they are actually formed through other genetically distinct individuals that are less represented but always present.The highly numerous adapted individuals would be responsible for the infection itself while the less numerous unadapted individuals would be responsible for genetic exchanges and perpetuation of the organism and its diversity, in a cooperative population.

Figure 1 .
Figure 1.Chromatograms obtained after sequencing PCR products from single cysts.Sequences are unedited raw data and the nucleotide numbers on chromatograms do not follow the nucleotide numbering of genes.In each panel, sequences at the top were derived from the forward primer and sequences at the bottom were derived from the reverse primer.(A) Gdh sequences from cyst C3; (B) Gdh sequences from cyst C2; (C) Bg sequences from cyst C2.

Figure 2 .
Figure 2. Alignment of PCR-derived sequences from Bg.The numbers on the ruler do not follow the nucleotide numbering of the gene.Sequences starting with C2 are alleles cloned from cyst C2.Sequence starting with C1 are alleles sequenced from cyst C1.Data from GenBank: isolate GS is assigned to Giardia duodenalis assemblage B (FJ560593); and isolates JH and WB are assigned to Giardia duodenalis assemblage A (FJ560592 and XM001705373, respectively).Assemblage A sequences differed from assemblage B sequences at several sites (fixed mutations).Between sites 104 and 188, the C2_BG_CLONE8 sequence shares mutations with assemblage A. Between sites 1 and 104 and between sites 188 and 365, the C2_BG_CLONE8 sequence shares mutations with assemblage B. Dots (.) represent identity.

Figure 3 .
Figure 3. Evolutionary relationships of Bg sequences.Sequences starting with C2 are alleles cloned from cyst C2.Sequences starting with C1are alleles sequenced from cyst C1.The evolutionary history was inferred using the neighbor-joining method.The optimal tree with the sum of branch length = 0.90220161 is shown.The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches.The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances that were used to infer the phylogenetic tree.The evolutionary distances were computed using the maximum composite likelihood method and are in the units of the number of base substitutions per site.The analysis involved 18 nucleotide sequences.The codon positions included were 1 st + 2 nd + 3 rd + noncoding.All positions containing gaps and missing data were eliminated.There were a total of 343 positions in the final dataset.Evolutionary analyses were conducted in MEGA6.

Figure 4 .
Figure 4. Evolutionary relationships of OrfC4 sequences.Sequences starting with C2 are alleles cloned from cyst C2.Sequences starting with C1 are alleles sequenced from cyst C1.The evolutionary history was inferred using the neighbor-joining method.The optimal tree with the sum of branch length = 0.90220161 is shown.The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches.The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances that were used to infer the phylogenetic tree.The evolutionary distances were computed using the maximum composite likelihood method and are in the units of the number of base substitutions per site.The analysis involved 14 nucleotide sequences.The codon positions included were 1 st + 2 nd + 3 rd + noncoding.All positions containing gaps and missing data were eliminated.There were a total of 393 positions in the final dataset.Evolutionary analyses were conducted in MEGA6.