The influence of primer choice on archaeal phylogenetic analyses based on 16S rRNA gene PCR

Abstract Polymerase chain reaction (PCR) assays targeting 16S rRNA genes followed by DNA sequencing are still important tools to characterize microbial communities present in environmental samples. However, despite the crescent number of deposited archaeal DNA sequences in databases, until now we do not have a clear picture of the effectiveness and specificity of the universal primers widely used to describe archaeal communities from different natural habitats. Therefore, in this study, we compared the phylogenetic profile obtained when Cerrado lake sediment DNA samples were submitted to 16S rDNA PCR employing three Archaea-specific primer sets commonly used. Our findings reveal that specificity of primers differed depending on the source of the analyzed DNA. Furthermore, archaeal communities revealed by each primer pair varied greatly, indicating that 16S rRNA gene primer choice affects the community profile obtained, with differences in both taxon detection and operational taxonomic unit (OTU) estimates.


Introduction
It has been extensively documented that the vast majority of existing microbial diversity consists of yet uncultured organisms (Rappé and Giovannoni, 2003;Lloyd et al., 2018).To access this diversity, molecular biology techniques have become a valuable asset to explore microbial communities in environmental samples.Several methods have been employed throughout the years, such as denaturing gradient gel electrophoresis (DGGE), restriction fragment length polymorphism (RFLP), clone libraries of polymerase chain reaction (PCR) amplicons, quantitative PCR and, more recently, metagenome assembly using next generation sequencing (Streit and Schmitz, 2004;Garrido-Cardenas and Manzano-Agugliaro, 2017).As a result, our knowledge on the diversity and metabolic potential of yet uncultured groups has greatly improved in the last years, with the description of several novel taxa and the proposal of entire phyla with no cultured representatives, greatly affecting both bacterial and archaeal phylogeny (Hug et al., 2016).
Amplified DNA was visualized on 1% agarose gels electrophoresis stained with ethidium bromide (0.5 μg/mL).Amplicons were purified using GeneJET PCR Purification kit (Thermo Scientific), cloned into pGEM-T Easy ® (Promega) vector, according to manufacturer's instructions, and transformed into Escherichia coli DH5α competent cells by heat shock treatment.Plasmidial DNA of the recombinant clones was extracted by phenol-chloroform-isoamyl alcohol at 25:24:1 (vol/vol/vol) and sequenced by Sanger method at Macrogen Inc. (Korea).

DNA Sequences analyses and primers coverage in silico analysis
The quality of the DNA sequences obtained was checked with Phred algorithm (http://asparagin.cenargen.embrapa.br/phph/) and only those with quality superior to 20 in more than 400 nucleotides were considered for further analyses.Chimeric sequences were identified by UCHIME 2, through the NCBI platform.Taxonomic classification of the sequences was performed with the latest releases of Greengenes (13_8) (DeSantis et al., 2006) and SILVA (v132) (Quast et al., 2013) taxonomical databases, both using Mothur v.1.24.1 (Schloss et al., 2009), and with the Ribosomal Data Project (RDP) tool on https://rdp.cme.msu.edu/classifier/classifier.jsp.Only identity thresholds of 90% or higher were considered.
Multiple alignments of 16S rRNA gene sequences amplified with different primers sets were performed separately with Clustal X v. 2.1 (Larkin et al., 2007) and gap columns generated were manually filtered.Mothur was used to calculate richness and diversity indexes, coverage estimations, as well as unique and shared OTUs at 97% identity for the construction of Venn diagrams.Shared and exclusive OTUs were estimated based on analysis of overlapping region of the sequences amplified by the different primers.
In silico evaluation of the primer pairs was performed using the online tool TestPrime1.0 (Klindworth et al., 2013) and the non-redundant SILVA database (SSU r138) (Quast et al., 2013).Aiming a more realistic simulation of recent studies (Muyzer et al., 1993;Bahram et al., 2018).Indeed, reports using both clone libraries and next generation sequencing of PCR amplicons are still frequently published (Tupinambá et al., 2016;Antranikian et al., 2017;Wu et al., 2017;Belmok et al., 2019).However, it is widely acknowledged that this approach is not devoid of potential biases, with steps such as DNA extraction, inhibition of, or unspecific DNA amplification, generation of PCR artefacts and differential amplification all playing a crucial role in result analyses (Delmont et al., 2013).
It has been demonstrated that the primers used in 16S rRNA gene PCR assays greatly affect the microbial taxa detected in environmental samples (Frank et al., 2008;Hong et al., 2009) and factors such as the DNA flanking the template region may also result in preferential amplification of some sequences (Santos et al., 2019).Indeed, the effectiveness of PCR targeting bacterial 16S rRNA genes to evaluate microbial diversity is subject of a plethora of factors (Acinas et al., 2005;Engelbrektson et al., 2010).Furthermore, despite the great advances in DNA sequencing techniques, biases on 16S rRNA gene-based studies is a topic that still requires attention (Klindworth et al., 2013;Kennedy et al., 2014;Brooks et al., 2015).
Primers described as universal to prokaryotic 16S rRNA gene have been demonstrated to be not as effective for the Archaea domain (Kolganova et al., 2002;Baker et al., 2003), reinforcing the utmost importance of adequate archaealspecific primer design (Baker et al., 2003;Gantner et al., 2011).Since most studies comparing prokaryotic 16S rDNA-based methods were performed with Bacteria, little is known about the biases associated with the Archaea domain.Therefore, in the present study we compared the phylogenetic profile obtained when a Cerrado lake sediment DNA sample was submitted to 16S rDNA PCR employing widely used Archaea-specific primers.

DNA Samples
Lake sediment samples were collected in native Cerrado (a Brazilian savannah-like biome) and used in this study.Sediment samples were obtained from a lake known as "Lagoa Rio Preto Alto" (LRPA) in the "Sempre Vivas" National Park, located in the state of Minas Gerais, Brazil, on May 2010 (FAP-DF project 2009/00086-7).Sampling was performed using 10 cm diameter PVC tubes, by introducing the tube up to 5 cm into sediments 1 m below the lake`s water level.

DNA extraction, PCR conditions and 16S rRNA genes libraries construction
DNA was extracted from 0.5 g of each sample with PowerSoil DNA Isolation Kit (MO Bio Laboratories Inc.) according to the manufacturer's instructions.PCR assays were conducted for each sample with three different primer combinations: 21f/958r (5´TTCCGGTTGATCCYGCCGGA-3´/ Brazilian Journal of Biology, 2023, vol.83, e247529 3/8 16S rRNA primer choice and archaeal phylogeny PCR behavior, one mismatch per primer at all locations except at the five bases of the 3´end was allowed (Klindworth et al., 2013).

Nucleotide sequence accession numbers
The nucleotide sequences from this study were deposited in the GenBank database under accession numbers MK527511-MK527839.

Results and Discussion
Three different primer pairs described as universal and specific for archaeal 16S rRNA genes were used to amplify a sediment environmental DNA sample (LRPA).Clone libraries of 16S rRNA genes amplified with each of the three selected primer combinations were obtained for the lake sediment DNA sample, totalizing three libraries (LRPA 21f-958r, LRPA 109f-915r and LRPA 340f-1000r).Despite the similar number of sequenced clones in each library, the number of sequences that were classified as Archaea after quality and chimera analyses was highly different among primer pairs (Table 1).
All 16S rDNA sequences detected with primers 21f-958r were classified as Archaea (Table 1), suggesting a high specificity of this primer combination for archaeal DNA sequences present in our sample.In contrast, primers 109f-915r were less specific, and amplified two bacterial sequences.Although these primers, originally designed to describe methanogens (Großkopf et al., 1998), have been extensively used to describe archaeal communities (Nishizawa et al., 2008;Jeyanathan et al., 2011;Carnevali et al., 2018), previous results of our group revealed that they are effective in the amplification of bacterial DNA sequences, especially when these organisms are present in higher abundance in environmental samples (data not shown).
Considering the current archaeal phylogeny and taxonomy proposed in the literature, classifications obtained with SILVA seemed more adequate, especially at the lower taxonomic levels, as it considers nomenclatures currently employed for many uncultivated groups (e.g.I.1c, Bathyarchaeia, Woesearchaeia), while other databases still present outdated nomenclatures or are not able to classify sequences affiliated to recently proposed taxa.Therefore, classifications obtained with SILVA were selected for further comparisons of the 16S rDNA libraries obtained with the different primer pairs.
As shown in Figure 1, the archaeal communities revealed by each primer pair varied greatly.While sequences amplified with primers 109f-915r and 340f-1000r resulted in similar phyla profiles, with most sequences affiliated to Euryarchaeota, amplicons obtained with primers 21f-958r were mainly associated to Thaumarchaeota and Bathyarchaeota (Figure 1A).In addition, Woesearchaeota could only be detected by 109f-915r and 340f-1000r.At lower taxonomic ranks, differences among the three primer pairs were also observed.With the exception of a few unclassified sequences amplified by 340f-1000r, all Euryarchaeota sequences were affiliated to methanogenic orders.While only Methanocellales and Methanosarcinales could be detected by primers 21f-958r, three additional methanogenic groups were amplified by 109f-915r and 340f-1000r -Methanobacteriales, Methanomicrobiales and Methanomassillicoccales -in different proportions (Figure 1B).Regarding Thaumarchaeota, 16S rDNA sequences from the yet uncultured subgroup I.1c were only revealed by primers 21f-958r and 109f-915r, while sequences belonging to I.1a (Nitrosotaleales) and I.1b (Nitrososphaerales) were amplified by all primers (Figure 1B).
The archaeal community depicted by primers 109f-915r and 340f-1000r was more comprehensive, revealing a high diversity of archaeal groups usually associated with hypoxic eutrophic freshwater sediments, similar to the one analyzed in the present study (e.g.methanogens) (Castelle et al., 2015;Laskar et al., 2018;Zhou et al., 2018).Interestingly, the archaeal diversity established by primers 21f-958r, however, was mainly associated to the TACK superphylum, with many thaumarchaeal sequences, a group not usually reported in lake sediment surveys (Fan and Xing, 2016;Hu et al., 2015).It is worth pointing out that this primer pair also yielded a high number of Bathyarchaeota sequences, a group that has been increasingly demonstrated as fundamental to the ecological dynamics in anoxic environments (Zhou et al., 2018).Thus, while generating fewer amplicons from methanogenic 16S rDNA when compared to the other two pairs, this primer pair provided important insights into this phylum`s diversity.
Despite this fact, such low methanogenic 16S rRNA gene detection in freshwater lake sediments is unusual.One of the main features of these environments is the high abundance of organic matter, especially in the upper layers (Thomaz et al., 2001;Esteves, 2011).Thus, the decomposition of organic matter plays an important ecological role, where complex molecules are degraded, leading to H 2 , CO 2 and CH 4 formation through microbial activity (Sansone and Martens, 1982).In this context, methanogens are key players in this process, considering that methanogenesis serves as the final step in the anaerobic food chain while also maintaining thermodynamically favorable conditions for fermentative and acetogenic processes (Ferry, 2011).For this reason, methanogens are commonly detected in freshwater lake sediments (Zhu et al., 2012;Rodrigues et al., 2014).Thus, primers 21f-958r are possibly not the most adequate when describing these archaeal communities.
These results are reinforced when compared to in silico assays employing SILVA's TestPrime tool (Table 2) (Klindworth et al., 2013).While it is important to highlight that differences between in vitro and in silico assays are a common occurrence, this analysis indicates that the pair 21f-958r has the lowest taxonomic coverage, being restricted to mainly halophiles and TACK superphylum related groups (e.g.Bathyarchaeia, Nitrososphaeria, Thermoprotei).As previously mentioned, this primer pair yielded a notoriously higher relative abundance of TACK superphylum sequences when compared to methanogens, a result that greatly differs from what would be expected from an ecological standpoint as well as the results obtained with the 109f-915r and 340f-1000r pairs.On the other 16S rRNA primer choice and archaeal phylogeny hand, in silico assays indicated that the pair 340f-1000r has the highest taxonomic coverage of the three, with widespread detection across all recognized archaeal taxa by SILVA.Primer pair 109f-915r yielded similar results, though its overall taxonomic coverage was not as high as 340f-1000r.It is also worth pointing out that, as previously mentioned, this primer pair resulted in a few bacterial sequences, something that was not detected on in silico analysis (Table 2).Thus, considering the taxonomic coverage estimations employing the TestPrime tool as well as the results obtained in our in vitro assays, primer pair 340f-1000r is likely best suited to these environments.
OTU based analyses at 97% sequence similarity revealed great differences when the results obtained in LRPA with the three sets of primers are compared.A low number of shared OTUs among all three libraries and a high number of exclusive OTUs retrieved by each primer pair were observed (Figure 1C).Furthermore, α-diversity analyses showed small variations in the Shannon diversity index when each primer pair was used (Table 3), which could lead to different interpretations of a given environment's diversity.It is worth pointing out that the libraries obtained with all three primer pairs had a coverage of around 80% with the number of sequences analyzed.These results indicate that archaeal 16S rRNA gene primer choice greatly affects the community profile obtained, with differences in both taxon detection and OTU estimates.

Conclusions
Altogether, our results highlight the importance of primer choice when describing archaeal communities in PCR based environmental studies.Domain-specific primer design is a well-recognized challenge, and the commonly used primers analyzed in this study may yield different outlooks on archaeal diversity and phylogeny in a variety of habitats.This aspect should be considered during both experimental design and data analyses.Depending on the community composition and archaeal abundance in a given sample, entire groups could possibly be overlooked and OTU estimations at different taxonomic levels may vary.Therefore, as more 16S rRNA gene sequences are deposited and novel groups are discovered, more comprehensive domain specific primers may be designed for archaea, reducing preferential amplification issues.Thus, while PCR based environmental studies have inherent biases, future studies will surely improve archaeal detection when employing this method.

Figure 1 .
Figure 1.Classification with SILVA database of archaeal 16S rRNA sequences amplified from a lake sediment (LRPA) DNA with three different primers sets at phylum level (A) and order level (B).In (C), Venn diagram showing shared and unique OTUs, with 97% sequence similarity, among sequences amplified from LRPA sample with the three different primers sets.

Table 1 .
Number of sequenced clones, quality, and classification at domain level of sequences amplified from a lake sediment (LRPA) DNA sample with three different primer pairs.
a Sequences not identified as 16S rDNA (no significant BLAST hits).b 16S rRNA gene sequences classified with 100% identity in Greengenes, RDP and SILVA databases.c Percentage of high quality, non-chimeric sequences classified as Archaea.

Table 2 .
Taxonomic coverage of in silico assays of each primer pair used in this study according to the TestPrime tool (SILVA).Taxa covered by each primer pair are shown in bold.

Table 3 .
α-Diversity analysis (97% sequence similarity) of archaeal 16S rRNA gene sequences amplified from a lake sediment (LRPA) sample with three different primer pairs.Archaeal sequences were normalized for the smallest library size (n = 63).