Genetic diversity of Toll-like receptor 5 among pig populations

Toll-like receptor 5 (TLR5) recognizes flagellin of Gram-positive and Gram-negative bacteria and plays an important role in the host defense system. Here, we surveyed single nucleotide polymorphisms (SNPs) in the coding sequence of the porcine TLR5 gene in 83 individuals from five pig breeds, including Chinese local populations and Western commercial pig breeds. A total of 19 SNPs of medium polymorphism (0.25 < PIC < 0.5) were identified, three of which were missense mutations that clustered within the extracellular domain of TLR5. One of the non-synonymous SNPs fell within a 228-amino acid region that has been shown to be important for flagellin recognition. Four SNPs were found, at high frequencies, only in Oriental pig breeds. The 19 SNPs defined 30 haplotypes, one of which segregated at high frequency in all samples. Compared with Western pig breeds, Chinese local populations had higher genetic diversity and more haplotypes. Tajima’s test showed no evidence for deviation from neutrality. The data provide useful information for future genetic marker characterization by means of disease association analysis and/or stimulation of mutation carriers with relevant ligands.


Introduction
Innate immunity, which develops within minutes of microbial invasion and is the first line of host defense against infection, triggers and regulates the adaptive immune system (Medzhitov and Janeway Jr, 2000). It can be activated by the recognition of microorganisms through pattern recognition receptors (PRRs) and involves the release of cytokines and chemokines, and the activation of monocytes and macrophages (Janeway Jr and Medzhitov, 2002). Toll-like receptors (TLRs), which recognize structural components unique to pathogenic microbes, are important PRRs of microbial infection and play a central role in the activation of innate immunity and the development of antigen-specific adaptive immunity through their interaction with ligands (Medzhitov et al., 1997; Poltorak et al., 1998; Takeda et al., 2002; Zhang et al., 2004).
TLR5, a member of the TLR family, specifically recognizes flagellin, which is present in the flagellar structure of Gram-positive and Gram-negative bacteria (Hayashi et al., 2001). Like other TLRs, TLR5 is composed of adjacent extracellular, transmembrane and intracellular regions. The extracellular region contains 22 tandem leucine-rich repeats (LRRs) flanked by cysteine-rich capping structures at the N- and C-terminal sides, which are known as LRRNT and LRRCT, respectively (Matsushima et al., 2007). The LRRs are motifs that are 20-30 amino acids in length and are responsible for ligand binding. The intracellular region includes a signaling portion, the Toll/IL-1 receptor (TIR) domain, which binds adaptor molecules and induces cellular immune responses through a signaling cascade (Muzio et al., 2000).
Genetic variation in the porcine TLR5 gene may affect ligand recognition and signal transduction by the receptor and thereby alter the susceptibility or resistance of the host to infectious diseases. The aim of this study was to analyze TLR5 genetic diversity in Western commercial and Chinese local pig populations. A total of 83 animals from five pig breeds were used to explore the genetic variation present in the coding sequence. The results should provide useful information for future genetic marker characterization in studies using disease association analysis and/or stimulation of mutation carriers with relevant ligands, which is important for breeding programs aiming at disease-resistance traits in pigs.

Animals
The sample panel included 83 unrelated animals: Chinese local pig breeds (27 Min and 23 Beijing black pigs), Western commercial pig breeds (13 Yorkshire and 15 Landrace pigs), and 5 Chinese wild boars. Genomic DNA was extracted from ear tissue by a standard phenol-chloroform method.

PCR and sequencing
The complete coding sequence of the porcine TLR5 gene, which is 2571 bp long and composed of a single exon, was amplified with a pair of primers designed using Primer Premier 5.0 according to the sequence deposited in GenBank (No. AB208697). The primer sequences were as follows: 1F-GAA AGC TTA TGG GAG ACT GCC TGG TC (forward), 1R-GCT CTA GAC TAG GAG ATG GTC ACG CTT TG (reverse). The fragment was amplified with high-fidelity Pfu DNA polymerase (TransGen, Beijing, China) using a 3-min step at 95°C, followed by 30 cycles of 95°C for 30 s, 60°C for 30 s, and 72°C for 2 min, with a 7-min final extension at 72°C. The PCR products were sequenced at the Beijing Genomics Institute (Beijing, China), using the primer pair 1F/1R and two additional primers (2F-CAG ATA CCC CTT GTG TGC as the forward primer and 2R-GGG TGA CAG TGA ACA AGA TG as the reverse one).

Sequence analysis
The full-length gene contig was built using the SEQMAN module of LASERGENE (version 6.0). SNPs were detected by sequence alignment with the DNAMAN package (version 5.2.2), and each of them was confirmed by manual inspection of the sequence diagram using the Chromas software. For SNPs occurring in fewer than three individuals, the fragments were reamplified and sequenced again to reduce the risk of reporting PCR artifacts as polymorphisms.
The HAPLOTYPE procedure (PROC HAPLOTYPE) of SAS 9.1 was used to estimate haplotype frequencies by implementing the Expectation-Maximization (EM) algorithm, which is commonly used in haplotype phasing and frequency estimation. Confidence intervals and standard errors for each haplotype frequency estimate were obtained under a binomial assumption, which is the default in the procedure. The estimates were then used to assign the probability that each individual possesses a particular haplotype pair. Individual haplotypes predicted with probabilities < 99% were subject to further verification: the fragments were reamplified, sub-cloned into the pMD-18T vector, and sequenced. In each case, at least five sub-clones were sequenced, including at least one copy of each allele.
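The EM idea behind haplotype frequency estimation can be illustrated with a minimal Python sketch. This is not the SAS procedure used in the study; the genotype coding (0, 1 or 2 copies of the alternate allele per SNP), the function names, and the toy genotypes are all assumptions made for illustration, and Hardy-Weinberg proportions are assumed in the E-step.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of alternate-allele counts (0, 1 or 2), one per SNP.
    Haplotypes are tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites; het sites are overwritten
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, phase):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: posterior weight of each compatible pair (Hardy-Weinberg)
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        converged = max(abs(new_freq[h] - freq[h]) for h in haplotypes) < tol
        freq = new_freq
        if converged:
            break
    return freq


# Toy data (hypothetical): two SNPs, one double heterozygote of unknown phase.
toy_genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]
estimates = em_haplotype_frequencies(toy_genotypes)
```

In the toy data the double heterozygote is resolved towards the two haplotypes already seen in the homozygotes, which is the behaviour that makes EM-based phasing useful when most individuals carry common haplotypes.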
Tajima's test was performed to test for deviation from neutrality. This test, proposed by Tajima (1989), is based on the fact that, under the neutral model, S/a₁ and k are both unbiased estimates of θ (the population mutation rate, 4Nₑμ), where S is the number of segregating sites, k is the average number of pairwise nucleotide differences, and a₁ is the sum of 1/i for i from 1 to n − 1 in a sample of n sequences. DNA sequence polymorphism (DnaSP) software (version 5.10) was used to calculate Tajima's D value and its confidence limits (two-tailed test) by assuming this statistic to have a beta distribution (Tajima, 1989). At the same time, nucleotide diversity (Pi) was calculated using Equation 10.5 of Nei (1987) with DnaSP.
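For readers who want to see how the D statistic is assembled, the following Python sketch computes S, k and D from aligned sequences using the constants given by Tajima (1989). It is an illustration only, not the DnaSP implementation: the function name and the toy alignments are hypothetical, gaps and missing data are not handled, and no significance test is performed.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Return (S, k, D) for a list of aligned, equal-length sequences.

    S: number of segregating sites
    k: average number of pairwise nucleotide differences
    D: Tajima's D = (k - S/a1) / sqrt(e1*S + e2*S*(S-1))
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("at least four sequences are needed (variance is zero otherwise)")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # Segregating sites: alignment columns with more than one nucleotide
    s_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # Average number of pairwise differences
    pairs = list(combinations(sequences, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (k - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))
    return s_sites, k, d


# Toy alignments (hypothetical): an excess of singletons gives D < 0,
# intermediate-frequency variants give D > 0.
singletons = ["ACGT", "ACGA", "ACTT", "ACGT"]
balanced = ["AAGT", "AAGT", "CCGT", "CCGT"]
s1, k1, d1 = tajimas_d(singletons)
s2, k2, d2 = tajimas_d(balanced)
```

The sign of D follows from comparing the two estimators of θ: rare variants inflate S relative to k, whereas variants at intermediate frequency inflate k relative to S.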
The relative synonymous codon usage (RSCU) value, defined as the ratio of the observed frequency of a codon in a gene to the frequency expected under equal usage of synonymous codons, was calculated as follows (Sharp et al., 1986):

$$\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n}\sum_{j=1}^{n} X_j}$$

where $X_i$ is the observed frequency of codon $i$ in the gene under study and $n$ is the number of synonymous codons for the amino acid studied.
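A short Python sketch of the RSCU definition is given here as an illustration. The function name and the codon counts are hypothetical and are not taken from the porcine TLR5 sequence; the counts stand for the observed numbers of each synonymous codon for one amino acid.

```python
def rscu(codon_counts):
    """Compute RSCU for one family of synonymous codons.

    codon_counts: dict mapping each synonymous codon of a single amino acid
    to its observed count X_i. RSCU_i = X_i / (sum of counts / n), where n is
    the number of synonymous codons in the family.
    """
    n = len(codon_counts)
    total = sum(codon_counts.values())
    if total == 0:
        raise ValueError("amino acid not observed; RSCU is undefined")
    return {codon: count * n / total for codon, count in codon_counts.items()}


# Hypothetical counts for the four glycine codons
glycine_counts = {"GGT": 10, "GGC": 30, "GGA": 20, "GGG": 20}
glycine_rscu = rscu(glycine_counts)
# GGT is used half as often as expected under equal usage, GGC 1.5 times as often.
```

By construction the RSCU values within a family sum to n, so a value of 1.00 corresponds to a codon used exactly as often as expected under equal usage.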

Nucleotide polymorphism
A total of 40 SNPs were identified in the coding sequence of the porcine TLR5 gene, 14 of which led to amino acid substitutions. Most of them were present at frequencies lower than 3% and/or were specific to a single pig breed. Only five of the SNPs (12.5%) were transversions, while the others (87.5%) were transitions, showing an obvious transitional bias. The pattern and distribution of base substitutions are shown in Figure 1.
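The distinction between transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) can be checked mechanically. The Python sketch below is illustrative only; the helper names are hypothetical, and the SNP labels are the ones named in this article rather than the full set of 40.

```python
import re

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def classify_snp(label):
    """Classify a coding-sequence SNP label such as 'c.137G > A'."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", label.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognised SNP label: {label}")
    return classify_substitution(match.group(2), match.group(3))


# SNPs named in the text (a subset of the 40 detected)
named_snps = ["c.137G > A", "c.834T > G", "c.1246A > T", "c.360T > C",
              "c.1812A > T", "c.1815T > C", "c.1836G > A"]
classes = {snp: classify_snp(snp) for snp in named_snps}

# Proportion reported for the full data set: 5 transversions out of 40 SNPs
transversion_fraction = 5 / 40
```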
The nucleotide diversity value, Pi, was calculated within populations. Wild boars displayed the highest variability among all populations (Pi = 4.2 × 10⁻³), while Yorkshire pigs showed the least variability (Pi = 1.29 × 10⁻³). Pi was higher in the Chinese local populations (Pi = 3.29 × 10⁻³) than in the Western commercial pig breeds (Pi = 1.89 × 10⁻³). Nineteen SNPs showed an intermediate level of polymorphism (0.25 < PIC < 0.5) (Table 1). Three SNPs, c.137G > A, c.834T > G and c.1246A > T, were non-synonymous, resulting in amino acid replacements of Gly by Asp at position 46, His by Gln at position 278, and Thr by Ser at position 416, respectively. The SNP c.137G > A (p.Gly46Asp) was predicted to cause an acquisition of charge, from neutral to negative, and c.834T > G (p.His278Gln) a change of charge from positive to neutral. SNPs c.137G > A, c.1812A > T, c.1815T > C and c.1836G > A were only found in Oriental pig breeds (Min pig, Beijing black pig, and wild boar). At the loci of the SNPs c.360T > C and c.834T > G, the major alleles in Oriental populations were different from those in Western breeds. All of the medium polymorphic SNPs were distributed in structural domains: seven in LRRs, eight in TIR, and the others in LRRNT and LRRCT (Figure 2).
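The polymorphism classes referred to above can be reproduced with a short calculation. The sketch below assumes that PIC is the polymorphism information content of Botstein et al. (1980), which the text does not state explicitly; the function names and the example allele frequencies are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphism information content for one locus.

    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    (formula of Botstein et al., 1980; assumed here, not stated in the text).
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in allele_freqs]
    cross = sum(2 * squares[i] * squares[j]
                for i in range(len(squares))
                for j in range(i + 1, len(squares)))
    return 1 - sum(squares) - cross


def polymorphism_class(value):
    """Classify a PIC value using the 0.25 and 0.5 thresholds."""
    if value > 0.5:
        return "high"
    if value > 0.25:
        return "medium"
    return "low"


# Hypothetical biallelic SNPs
balanced_snp = pic([0.5, 0.5])   # 0.375, the maximum for two alleles
common_snp = pic([0.8, 0.2])     # falls in the medium class
rare_snp = pic([0.9, 0.1])       # falls in the low class
```

Because a biallelic locus cannot exceed a PIC of 0.375, SNPs can only fall in the low or medium class; under this formula the medium class corresponds to a minor allele frequency above roughly 0.18.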

Haplotype diversity
Thirty haplotypes were predicted from the 19 medium polymorphic SNPs and grouped according to the presence of missense substitutions (Figure 3). Haplotype A1 was present at the highest frequency (36.14%) in the whole sample, whereas the other haplotypes showed frequencies ranging between 0.60% and 12.05%. Group A segregated at the highest frequency (46.38%), while variants carrying non-synonymous mutations had frequencies ranging between 0.60% and 23.49%. The SNPs c.1246A > T, c.1269G > A and c.1278C > T always cosegregated in the 83 individuals from the five pig breeds. The SNPs c.2100T > C, c.2124A > G and c.2127A > G also cosegregated in the pigs investigated, as did the SNPs c.1812A > T and c.1815T > C. Compared with the Western commercial pig breeds, Beijing black pigs and Min pigs had more haplotypes, whereas Yorkshire pigs had the fewest among the pig breeds investigated.

Test for deviation from neutrality
Tajima's test was used to explore whether deviation from neutrality could be detected in the porcine TLR5 gene on the basis of all SNPs detected. The test was applied to each breed separately, to the Oriental breeds, to the Western breeds, and to all breeds combined. Both positive and negative D values were obtained for the full gene and for the ectodomain, but none was significantly different from zero (p > 0.05) (Table 2).

Non-synonymous SNPs
In this study, the genetic diversity of the porcine TLR5 gene was analyzed using PCR and direct sequencing. Nineteen SNPs of intermediate polymorphism were identified in the coding sequence, three of which were non-synonymous substitutions and clustered within the extracellular domain (Table 1, Figure 2). The SNP c.137G > A (p.Gly46Asp) is located in the LRRNT domain, which is composed of 27 amino acids (positions 21 to 47 in the polypeptide, as predicted by Matsushima et al., 2007) and is hypervariable among species (Figure 4). The SNPs c.834T > G (p.His278Gln) and c.1246A > T (p.Thr416Ser) fell into the irregular LRR domains LRR9 and LRR15, respectively (Matsushima et al., 2007).
The LRRNT can shield the hydrophobic core of the first LRR from exposure to solvent, and is therefore expected to play a role in stabilizing the protein structure (Matsushima et al., 2005). Irregular LRRs are common in mammalian TLRs, and their involvement in ligand recognition and binding has been suggested (Bell et al., 2003; Wei et al., 2011). In addition, p.His278Gln occurs within a 228-amino acid region that has been shown to confer species-specific flagellin recognition in the mouse and human (Andersen-Nissen et al., 2007). It will be important to further analyze the effect of these SNPs on receptor function by expressing the porcine TLR5 gene, modified by site-directed mutagenesis, from eukaryotic expression vectors in cell cultures, and/or to analyze their association with immune-related traits in various populations.
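The correspondence between coding-sequence positions and amino acid positions quoted above is simple integer arithmetic, sketched here in Python for illustration. The function name is hypothetical; the LRRNT boundaries (residues 21 to 47) are those quoted in the text, and position 1 is assumed to be the A of the start codon.

```python
def codon_of(cds_position):
    """Map a 1-based coding-sequence position to (codon number, base within codon)."""
    if cds_position < 1:
        raise ValueError("coding-sequence positions are 1-based")
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


LRRNT = range(21, 48)  # residues 21 to 47 inclusive, as quoted in the text

# The three non-synonymous SNPs discussed in the text
residue_137, base_137 = codon_of(137)      # p.Gly46Asp
residue_834, base_834 = codon_of(834)      # p.His278Gln
residue_1246, base_1246 = codon_of(1246)   # p.Thr416Ser
in_lrrnt = residue_137 in LRRNT
```

The same arithmetic places c.2100, c.2124 and c.2127 at the third base of codons 700, 708 and 709, which agrees with their description as synonymous third-position changes.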

Synonymous SNPs
Several synonymous SNPs were found in the LRRs and the TIR domain (Figure 2). Synonymous SNPs do not affect the amino acid sequence, and therefore are not expected to alter the function of the protein. However, accumulating evidence indicates that synonymous SNPs can affect protein function through alteration of mRNA stability, splicing, expression, maturation, and folding (Hoffmeyer et al., 2000; Duan et al., 2003; Johnson et al., 2005). For example, a silent mutation in the MDR1 gene results in decreased levels of mRNA through a change in mRNA stability, thus altering the protein level. Another example is a synonymous SNP in exon 4 of the human leukocyte antigen (HLA)-A2 gene, which results in the alternative expression of HLA-A*01:01:38L through aberrant splicing (Dunn et al., 2011). In addition, the presence of a rare codon, marked by a silent mutation, can affect the translation rate as a result of the scarcity of cognate tRNA, which in turn influences the cotranslational folding and/or insertion of a protein into the membrane, thereby altering the conformation and function of the protein (Kimchi-Sarfaty et al., 2007; Komar, 2007). RSCU is an effective measurement of codon usage bias. If the synonymous codons are used equally, the respective RSCU values would be 1.00. A codon that is used more frequently than expected will have a value higher than 1.00, and a codon that is used less frequently than expected will have a value lower than 1.00. In this study, six synonymous SNPs, c.2100T > C, c.2124A > G, c.2127A > G, c.2170T > C, c.2178C > T, and c.2394A > G, led to codon replacements with markedly different RSCU values. In particular, SNPs c.2100T > C, c.2124A > G, and c.2127A > G always cosegregated, in the pattern of TGG or CAA, in the populations under study (Figure 3).
In the case of SNP c.2100T > C, the RSCU values for codons GGT and GGC were 0.61 and 1.13, respectively; in c.2124A > G, the RSCU values for GAA and GAG were 1.38 and 0.63; and in c.2127A > G, the RSCU values for GCA and GCG were 1.05 and 0.32, respectively. Furthermore, the substitutions all occurred at the third base in the codon, which may have a considerable effect on protein folding (Gupta et al., 2000; Cortazzo et al., 2002). The simultaneous replacement of the more frequently used codons GGC-GAA-GCA by the less frequent ones GGT-GAG-GCG at three closely spaced positions probably slows down the rate of translation at the corresponding mRNA region and, as a result, may influence the cotranslational folding and the function of the protein.

Variability in populations
Among the populations investigated, wild boar showed the highest variability, which is consistent with the common assumption that genetic variability decreases as a result of selective breeding. Compared with the Western commercial populations, Min and Beijing black pigs had higher variability, as revealed by the Pi value and the number of haplotypes (Figure 3), which is consistent with the known breeding history of each breed. The Min pig, a Chinese indigenous breed, and the Beijing black pig, which was produced by crossing Chinese indigenous breeds with Western commercial populations, have a broad genetic base, whereas many alleles have been fixed in the Western commercial breeds during a long period of intensive breeding.

Tajima's test
Tajima's test is a commonly used test of neutrality in population genetic studies. It tests whether the sequences in question fit the neutral model at equilibrium between genetic drift and mutation. If the D value does not differ significantly from zero, the null hypothesis of neutrality is not rejected, which is consistent with a population that has not experienced recent contraction or growth and a locus at which no selection is acting (Tajima, 1989). Conversely, if the D value is significantly different from zero, the null hypothesis is rejected (Tajima, 1989). Unlike some other neutrality tests that use polymorphism data within a single species, such as Fay and Wu's H, Tajima's D test does not require an outgroup sequence (Tajima, 1989; Fay and Wu, 2000). In the present study, Tajima's D statistic for each breed, for Oriental breeds, for Western breeds, and for all breeds combined showed no significant difference from zero (p > 0.05); hence, the null hypothesis was not rejected. This suggests that the variation observed in the porcine TLR5 gene is compatible with random drift of selectively neutral mutants. The result is consistent with the results obtained for the human and chimpanzee TLR5 genes using the same method (Wlasiuk et al., 2009). Studies have shown that the direction of selection in TLRs may differ among species, and that many factors, such as relaxation of purifying selection and the analytical methods used, can influence its detection by statistical methods (Kryazhimskiy and Plotkin, 2008; Barreiro et al., 2009). In addition, the sample size per population in this study was small, which could also affect the results of the test. Therefore, the results of this study should be interpreted with caution.