INTRODUCTION
In recent years, it has become increasingly clear that individual genetic variation is an essential component of overall immune responses that contributes to susceptibility, progression, and outcome of infectious and autoimmune diseases, and cancer. Innumerable studies have uncovered the extent of human genetic variation while trying to map its role in multifactorial and polygenic diseases. These studies showed that there are different types of variants, ranging from single nucleotide polymorphisms (SNPs), with a single base change, to repetitive short- and medium-sized sequences, as well as copy number variations, which may extend throughout large segments of chromosomes. Individual genomes are currently being analyzed in the 1000 Genomes Project to catalogue the full extent of human genetic variation. Previous efforts like HapMap focused on SNPs, aiming to map gene variants that affect health, disease, and individual responses to medications and environmental factors. Today, the number of known SNPs is in the range of 10 million, and HapMap is estimated to contain 80% of all SNPs with frequencies of >10%(1) (useful web links are listed in Appendix 1). These projects have a common characteristic, namely, the mapping of hundreds of thousands of allelic variants. Useful variants for these genome-wide association studies (GWAS) typically exhibit frequencies between 0.5 and 5%, and a great number of cases and controls is needed to obtain statistical power. For example, 90% power to detect an allele with 1% frequency and a risk factor of 2 in a GWAS demands approximately 20,000 individual samples(2). This number can be considerably lower (~300) for an allele frequency of roughly 10%.
Good examples are the studies conducted by the Wellcome Trust Case Control Consortium, which analyzed several thousand individuals in genome-wide scans in order to map the genetic risk for type 2 diabetes, rheumatoid arthritis, and Crohn's disease, to name a few of the target diseases studied(3).
On the other hand, pooling samples can also pose a problem. Disease heterogeneity and population differences act as confounding factors, hindering identification of relevant genes due to the small effect many of the variants have on the disease itself.
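The dependence of sample size on allele frequency described above can be illustrated with a rough calculation. The sketch below is a minimal Python example and is not the method used in reference 2: it assumes a two-sided allelic test comparing two proportions, equal numbers of cases and controls, a genome-wide significance threshold of 5e-8, and that the "risk factor" behaves as an allelic odds ratio. The function names are hypothetical. Under other assumptions the absolute numbers change, so only the trend (rarer alleles require many more samples) should be read from it.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio (assumed model)."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def individuals_per_group(control_freq: float, odds_ratio: float,
                          alpha: float = 5e-8, power: float = 0.90) -> int:
    """Approximate number of cases (= number of controls) for a two-sided allelic test."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2)  # each individual contributes two alleles


if __name__ == "__main__":
    for freq in (0.01, 0.10):
        n = individuals_per_group(freq, odds_ratio=2.0)
        print(f"allele frequency {freq:.2f}: about {n} cases and {n} controls")
```

Relaxing `alpha` to 0.05, as in a single-marker candidate gene study, lowers the required numbers considerably, which is one reason the threshold is exposed as a parameter.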
Studying the genetic background of Brazilian populations has always been a challenging issue. Not only is there a high degree of miscegenation, but the input from different migratory currents also differs from one region to another. Amerindian, Caucasian, and African ethnicities have contributed to this true “melting pot” since the very beginning of colonization by the Portuguese five centuries ago. Studies aiming to assess the relative contribution of the three races forming the gene pool of the Brazilian population through matrilinear(4) and patrilinear(5) descent, using mtDNA and chromosome Y-based methods, confirm historical data of crossbreeding between European men and Amerindian and African women(6,7).
In the Southeast region of Brazil where São Paulo is located, the composition of the mixture includes a majority of Italians, Spaniards, and Germans. However, this region also received a significant input from Africans and Indigenous peoples. In addition, as the most populated city in South America, São Paulo has always been an important destination of migrants from other places in Brazil and from neighboring countries. To account for the genetic admixture, some surrogate markers, such as skin color, are employed in many studies. However, Parra et al.(6) showed that in Brazilians this is a poor substitute for the assessment of individual ancestry. We chose to analyze healthy individuals coming from the same socioeconomic strata as the patients seen at the largest hospital in São Paulo. These individuals mirror the great diversity present in the population of the city of São Paulo, and though ancestry markers were not analyzed, we feel the information should be reported in order to be available for fellow researchers in the field when searching for candidate genes for case-control studies of the many different types of diseases that affect 19 million inhabitants of the extended metropolitan region of São Paulo.
The data shown in this paper include allele frequencies of known polymorphic genes in innate and acquired immunity, the majority with proven impact on gene function. Most of the polymorphisms shown here have an impact upon transcription levels. Others lead to changes in the binding strength to their corresponding ligands and/or in intracellular signal transduction, and even in the half-life of messenger RNAs.
The scope of this paper does not include detailed information on the genes or their polymorphisms, which can be assessed in good textbooks(8).
Innate immunity genes
The innate immune system relies on the recognition of pathogen-associated molecular patterns (PAMPs) on microbial surfaces, resulting in the activation of effector cells capable of clearing infection and inducing inflammation in this process. The pattern recognition molecules can be cell-associated, as is the case for Toll-like receptors (TLRs)(9,10), or soluble, as in the case of the mannose-binding lectin family of proteins(11).
TLRs are predominantly expressed in antigen-presenting cells, either on the surface (TLR 1, 2, 4, 6) or in the cell (TLR 3, 7, 9). Each recognizes a specific PAMP, such as Gram-negative bacterial lipopolysaccharide (TLR-4), flagellin (TLR-5), single-stranded (TLR-7) or double-stranded (TLR-3) viral RNA, or bacterium-derived CpG DNA (TLR-9). TLR-4 is the major representative of the family, and is known to respond to exogenous and endogenous ligands, participating in inflammation and local tissue responses to wounds, hypoxia, or other forms of stress. Diminished response to lipopolysaccharides was mapped to amino acid substitutions in the TLR4 gene(12).
The complement system can be activated in different ways, one of which is the lectin pathway, initiated by mannose-binding lectin (MBL) and reviewed in detail by Dommett et al.(13). MBL is an acute phase reactant that binds to mannose, other sugars, and other microbial compounds by way of the lectin domain. Once activated, the cascade begins with the cleavage of C4 and C2 by MBL-activated MASP-1 and MASP-2. It is intriguing that MBL-2, the functional form of MBL in humans, harbors polymorphisms in coding regions leading to non-conservative amino acid substitutions. In addition, there are promoter polymorphisms in strong linkage disequilibrium, which result in extended haplotypes. Of note, the frequency of these alleles varies greatly worldwide, and many different diseases, infectious or otherwise, have been studied(13).
Interleukin 10 and receptor genes
IL-10 is the prototypical down-regulating effector cytokine in immune response. Its action is crucial for control of inflammation accompanying immune responses. Absence of IL-10 in animal models of disease leads to tissue damage, autoimmunity, or chronic infection. IL-10 is produced by many different immune cells, such as macrophages and dendritic cells, TH1, TH2, and TH17 lymphocytes. It counteracts the production of pro-inflammatory interferon-γ and increases the differentiation of regulatory T cells(14). The gene encoding IL-10 is highly polymorphic, presenting several variants in the 5′ and promoter regions with impact on in vitro and in vivo production(15,16). IL-10 exerts its actions through the specific heterodimeric IL-10 receptor, where chain 1 is responsible for high affinity binding(17).
Immunomodulatory genes in the MHC class III region
Several immunomodulatory genes are located in the MHC class III region. In addition to the well-known inflammatory cytokines tumor necrosis factor alpha (TNF-α) and lymphotoxin alpha (LT-α), functional polymorphisms are being studied in recently described genes, like HLA-B associated transcript 1 (BAT1), nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 (NFKBIL1), and leukocyte-specific transcript 1 (LST1)(18). The exact functional role of these three genes is not yet clear.
BAT1, the most telomeric of the three genes, is a DEAD-box RNA helicase, one of several genes in this region involved in RNA processing(19) and, importantly, a target for immune evasion by cytomegalovirus(20). NFKBIL1, a protein with some homology to the IκB family, was suggested to be involved in the control of cytosolic nuclear transcription factor-kappa B (NF-κB), a major molecule governing the transcription of over 200 different immune response genes(21). A detailed paper(22) recently showed that NFKBIL1 is expressed in all tissues, and that the protein is present in macrophages and T cells in the synovium of rheumatoid arthritis patients. Furthermore, in spite of the apparent homology, this product binds mRNA and seems to function in RNA processing. Finally, LST1 is translated into multiple isoforms with varying expression according to cell type and induction form. Expression in immune cells is widespread and, as a rule, leads to diminished lymphocyte proliferation.
IL-4, −5, −13, and receptor genes
IL-4 is a pleiotropic cytokine essential for IgE synthesis in B cells and for T cell differentiation into the TH2 phenotype(23). TH2 cells secrete IL-4, IL-13, and IL-5. Mast cells also secrete IL-5, and this interleukin activates eosinophils. IL-5, together with IL-4 and IL-13, is deeply involved in the induction and maintenance of allergic processes. The functions of IL-13 in immune surveillance and in TH2-type immune responses partially overlap with those of IL-4, but IL-13 also has an impact on tissue eosinophilia, as well as on tissue remodeling and development of fibrosis(24). Both IL-4 and IL-13 genes harbor functionally relevant polymorphisms. The biological activity of these two cytokines occurs through binding to their specific receptors on target cells. IL-13, which shares several biological functions with IL-4, operates through the IL-13 receptor, a heterodimer formed by the shared IL-4R alpha chain and the IL-13R alpha chain.
Other cytokine and chemokine genes
A great variety of cytokines, chemokines, growth factors, and other molecules are produced at the beginning or during the expansion of innate and acquired immune responses. The type of trigger, the site of the tissue, and the combination of the input of different molecules will shape the nascent immune response, defining the predominant type of cell and portfolio of effector molecules produced, according to Amsen et al.(23) and Pulendran et al.(25). The proinflammatory cascade of events is well-known, and two of the classical effector molecules involved are IL-12 and interferon gamma (IFN-γ). In addition, monocyte chemoattractant protein 1 (MCP-1) and chemokine C-C motif receptor 5 (CCR5) are, respectively, a chemokine and a chemokine receptor that play major roles in proinflammatory events(8). Cytotoxic T lymphocyte-associated protein 4 (CTLA-4) is an essential ligand governing the activity of T lymphocytes. Thus, functional polymorphisms that modify the levels or efficiency of any of these molecules may impact the outcome of the immune response by shifting the balance of inflammatory and regulatory responses. The polymorphisms presented here have been widely studied, and have been found to play a role in several infectious and autoimmune diseases.
OBJECTIVE
To present the frequency of SNPs of some immune response genes in a population sample from São Paulo city.
METHODS
Subjects
Data in this study were gathered from a sample of healthy individuals, non-HLA identical siblings of bone marrow transplant recipients from the Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo, obtained between 1998 and 2005. These individuals underwent clinical and laboratory screening as possible donors, and were deemed fit for donation, but were dismissed due to MHC incompatibility with respective recipients. Samples were obtained after informed consent and permission from the Ethics Committee of the hospital. Since the sample number varied for each SNP analyzed, numbers (n) are shown in tables 1–5.
Table 1 Genotype and allele frequencies of innate immunity gene polymorphisms in a sample of healthy individuals
| SNP | Genotype | Count (%) | AF |
|---|---|---|---|
| TLR4 +896, n = 255 | AA | 223 (87.4) | 0.94 |
| | AG | 31 (12.2) | |
| | GG | 1 (0.4) | 0.06 |
| TLR5 +1174, n = 279 | CC | 261 (93.5) | 0.97 |
| | CT | 17 (6.1) | |
| | TT | 1 (0.4) | 0.03 |
| MBL −550, n = 281 | CC | 117 (41.5) | 0.65 |
| | CG | 134 (48) | |
| | GG | 30 (10.5) | 0.35 |
| MBL −221, n = 281 | CC | 10 (4.0) | 0.16 |
| | CG | 69 (24) | |
| | GG | 202 (72) | 0.84 |
| MBL +4, n = 281 | CC | 158 (56) | 0.74 |
| | CT | 101 (36) | |
| | TT | 22 (8.0) | 0.26 |
| MBL 52*, n = 281 | CC | 259 (92.1) | 0.96 |
| | CT | 21 (7.5) | |
| | TT | 1 (0.4) | 0.04 |
| MBL 54*, n = 281 | AA | 6 (2) | 0.14 |
| | GA | 64 (23) | |
| | GG | 210 (75) | 0.86 |
| MBL 57*, n = 281 | GG | 262 (93) | 0.96 |
| | GA | 19 (7) | |
| | AA | 0 | 0.04 |
| MBL A/O**, n = 281 | A/A | 176 (63) | 0.80 |
| | A/O | 91 (32) | |
| | O/O | 14 (5) | 0.20 |
| MASP2 120, n = 281 | GG | 269 (96) | 0.98 |
| | GA | 12 (4) | |
| | AA | 0 | 0.02 |
SNP: single nucleotide polymorphism; AF: allele frequencies.
*codon;
**C+D+E, which corresponds to variants in positions 52, 54, and 57.
Table 2 Genotype and allele frequencies of IL-10 and IL-10 receptor gene polymorphisms in a sample of healthy individuals
| SNP | Genotype | Count (%) | AF |
|---|---|---|---|
| IL-10 −592, n = 278 | AA | 38 (14) | 0.35 |
| | AC | 119 (43) | |
| | CC | 121 (43) | 0.65 |
| IL-10 −819, n = 278 | CC | 120 (43) | 0.65 |
| | TC | 120 (43) | |
| | TT | 38 (14) | 0.35 |
| IL-10 −1082, n = 278 | AA | 137 (49.3) | 0.7 |
| | AC | 115 (41.4) | |
| | CC | 26 (9.3) | 0.3 |
| IL-10 −2763, n = 278 | AA | 29 (10) | 0.25 |
| | AC | 83 (30) | |
| | CC | 166 (60) | 0.75 |
| IL-10 −2849, n = 278 | AA | 15 (5.4) | 0.18 |
| | AG | 68 (24.5) | |
| | GG | 195 (70.1) | 0.82 |
| IL-10 −3575, n = 278 | TT | 163 (59) | 0.76 |
| | TA | 98 (35) | |
| | AA | 17 (6) | 0.24 |
| IL10R 138*, n = 265 | AA | 211 (80) | 0.9 |
| | AG | 49 (18) | |
| | GG | 5 (2) | 0.1 |
| IL10R 330*, n = 259 | GG | 151 (58.3) | 0.76 |
| | AG | 94 (36.3) | |
| | AA | 14 (5.4) | 0.24 |
SNP: single nucleotide polymorphism; AF: allele frequencies.
*codon.
Table 3 Genotypes and allele frequencies (AF) of MHC III gene polymorphisms in a sample of healthy individuals
| SNP | Genotype | Count (%) | AF |
|---|---|---|---|
| TNFA −238, n = 281 | AA | 1 (0.4) | 0.04 |
| | AG | 18 (6.4) | |
| | GG | 262 (93.2) | 0.96 |
| TNFA −308, n = 281 | AA | 4 (1) | 0.1 |
| | AG | 45 (16) | |
| | GG | 232 (83) | 0.9 |
| LTA +80, n = 273 | CC | 116 (42) | 0.6 |
| | AC | 114 (42) | |
| | AA | 43 (16) | 0.4 |
| LTA +252, n = 279 | AA | 132 (47.3) | 0.7 |
| | AG | 118 (42.3) | |
| | GG | 29 (10.4) | 0.3 |
| NFKBIL1 −63, n = 276 | AA | 38 (14) | 0.35 |
| | AT | 118 (43) | |
| | TT | 120 (43) | 0.65 |
| BAT1 −22, n = 265 | GG | 127 (48) | 0.7 |
| | CG | 110 (41.5) | |
| | CC | 28 (10.5) | 0.3 |
| BAT1 −348, n = 266 | CC | 192 (72) | 0.85 |
| | CT | 68 (26) | |
| | TT | 6 (2) | 0.15 |
| LST1 +290, n = 141 | AA | 7 (5) | 0.09 |
| | AG | 10 (7) | |
| | GG | 124 (88) | 0.91 |
SNP: single nucleotide polymorphism; AF: allele frequencies.
Table 4 Genotypes and allele frequencies of IL-4, IL-5, IL-13, and receptor gene polymorphisms in a sample of healthy individuals
| SNP | Genotype | Count (%) | AF |
|---|---|---|---|
| IL5 −746, n = 128 | CC | 20 (16) | 0.43 |
| | CT | 71 (55) | |
| | TT | 37 (29) | 0.57 |
| IL4 −589, n = 188 | TT | 21 (11) | 0.31 |
| | CT | 68 (38) | |
| | CC | 91 (51) | 0.69 |
| IL4 +33, n = 220 | TT | 30 (14) | 0.31 |
| | CT | 77 (35) | |
| | CC | 113 (51) | 0.69 |
| IL4 +3017, n = 86 | GG | 15 (17.4) | 0.51 |
| | GT | 58 (67.4) | |
| | TT | 13 (15.2) | 0.49 |
| IL13 +2044, n = 160 | AA | 5 (4.0) | 0.2 |
| | AG | 60 (37.0) | |
| | GG | 94 (59.0) | 0.8 |
| IL4R +223, n = 212 | A/A | 49 (19) | 0.50 |
| | A/G | 129 (61) | |
| | G/G | 43 (20) | 0.50 |
| IL13RA1 +1398, n = 85 | AA | 57 (67) | 0.76 |
| | AG | 16 (19) | |
| | GG | 12 (14) | 0.24 |
SNP: single nucleotide polymorphism; AF: allele frequencies.
Table 5 Genotypes and allele frequencies of several immune response gene polymorphisms in a sample of healthy individuals
| SNP | Genotype | Count (%) | AF |
|---|---|---|---|
| IL6 −174, n = 264 | GG | 168 (64) | 0.8 |
| | CG | 83 (31) | |
| | CC | 13 (5) | 0.2 |
| IFNG +874, n = 273 | AA | 95 (35) | 0.6 |
| | AT | 129 (47) | |
| | TT | 49 (18) | 0.4 |
| IL12B +1188, n = 266 | AA | 134 (50) | 0.7 |
| | AC | 109 (41) | |
| | CC | 23 (9) | 0.3 |
| CCR5Δ32, n = 278 | WT*/WT | 252 (90) | 0.95 |
| | DEL**/WT | 24 (9) | |
| | DEL/DEL | 2 (1) | 0.05 |
| MCP1 −2518, n = 256 | AA | 122 (48) | 0.7 |
| | AG | 103 (40) | |
| | GG | 31 (12) | 0.3 |
| CTLA4 −318, n = 217 | CC | 191 (88) | 0.93 |
| | CT | 23 (11) | |
| | TT | 3 (1) | 0.07 |
| CTLA4 +49, n = 228 | AA | 94 (41) | 0.66 |
| | AG | 113 (50) | |
| | GG | 21 (9) | 0.34 |
| CTLA4 CT60, n = 181 | AA | 42 (23) | 0.47 |
| | AG | 87 (48) | |
| | GG | 52 (29) | 0.53 |
SNP: single nucleotide polymorphism; AF: allele frequencies;
*wild type;
**deletion.
DNA extraction and genotyping
Blood samples were drawn and DNA was extracted by the dodecyltrimethylammonium bromide/cetyltrimethylammonium bromide (DTAB/CTAB) method(26) or, alternatively, by salting-out methods(27).
RFLP-PCR genotyping
There was no deviation from expected Hardy-Weinberg proportions in any of the genes analyzed. For genotyping of all SNPs, 100 ng genomic DNA was used. Polymorphisms were typed by PCR-RFLP as described elsewhere (additional information on the SNPs presented is available in Appendix 2 and upon request). Briefly, PCRs were performed in a final volume of 25 µL containing 100 ng genomic DNA, 40 µM of dNTP, 0.2 U of Taq polymerase, 1.5 mM of MgCl2, and 0.25 pM of each primer. In some cases, protocols employed 2.0 mM of MgCl2 and 0.5 pM of each primer. PCR was usually carried out with an initial 5-minute denaturation step at 95°C, followed by 35 cycles of 95°C for 20 seconds, annealing for 30 seconds, and extension at 72°C for 20 seconds, and a final extension step of 5 to 7 minutes at 72°C. An aliquot of 10 µL of the PCR product was digested for 3 hours with the specified restriction enzyme (New England Biolabs), in a total volume of 20 µL at the temperature specified by the manufacturer. Digested products were separated by electrophoresis on 2 to 4% agarose gel, stained with ethidium bromide, and visualized under ultraviolet (UV) light.
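The check for Hardy-Weinberg proportions mentioned above can be sketched as follows. This is a minimal illustration, not necessarily the procedure used in the study: it assumes a biallelic SNP and a chi-square goodness-of-fit test with one degree of freedom, and the function name is hypothetical. The example counts are those reported for TLR4 +896 in Table 1.

```python
from math import erfc, sqrt


def hardy_weinberg_test(n_hom1: int, n_het: int, n_hom2: int):
    """Return (chi-square, p-value) for a biallelic SNP, 1 degree of freedom."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele 1
    q = 1 - p                           # frequency of allele 2
    observed = (n_hom1, n_het, n_hom2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # TLR4 +896 genotype counts from Table 1: AA = 223, AG = 31, GG = 1
    chi2, p_value = hardy_weinberg_test(223, 31, 1)
    print(f"chi-square = {chi2:.4f}, p = {p_value:.3f}")
```

When an expected genotype count is very small, as for the rare homozygotes here, the chi-square approximation is rough and an exact test may be preferable.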
RESULTS
Allele and genotype distributions of 41 different gene polymorphisms, mostly in cytokine genes but also including other immune response genes, are shown in tables 1–5.
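For readers who wish to recompute the tabulated values, the allele frequencies follow from the genotype counts by simple allele counting. The sketch below assumes a biallelic marker; the function name is hypothetical, and the example uses the TLR4 +896 counts from Table 1.

```python
def allele_frequencies(n_hom1: int, n_het: int, n_hom2: int):
    """Return the frequencies of allele 1 and allele 2 from genotype counts."""
    total_alleles = 2 * (n_hom1 + n_het + n_hom2)  # two alleles per individual
    freq1 = (2 * n_hom1 + n_het) / total_alleles   # homozygotes carry two copies
    return freq1, 1 - freq1


if __name__ == "__main__":
    # TLR4 +896: AA = 223, AG = 31, GG = 1
    freq_a, freq_g = allele_frequencies(223, 31, 1)
    print(f"A: {freq_a:.2f}, G: {freq_g:.2f}")
```

Rounded to two decimals, this gives the 0.94 and 0.06 listed for TLR4 +896.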
DISCUSSION
In this article we present a series of allele and genotype frequencies of known and novel immune response genes. Though these genes show a modest contribution to the overall phenotype, it is important to detail the effects each gene has in the development and progression of a given disease. The sum of multiple genetic and environmental factors leads to different clinical presentations and therapeutic responses in each patient(28). Thus, the study of significant numbers of patients carrying the same disease, as well as the comparison between similar diseases (for example, autoimmune diseases), opens the way to identify relevant mechanisms in their pathophysiology. Genetic polymorphisms, such as those shown in this article, were associated with a variety of autoimmune, inflammatory, and infectious diseases, ranging from celiac disease and rheumatoid arthritis to acute myocardial infarction, Chagas’ disease, and viral hepatitis.
The majority of polymorphic sites in the genome are common in populations worldwide, and variants exhibit moderate frequencies(29), implying that most have undergone balancing selection; that is, they have been preserved because, in addition to imparting susceptibility to certain diseases, they also have a beneficial role according to the environmental background of the populations. Two important points should be made concerning some of the polymorphisms we have studied that showed very low frequencies (for example, TNFA −238 and TLR5 +1174). First, low frequencies reduce statistical power, and a greatly increased number of samples needs to be examined to reach significance in association studies. Second, when the impact of the variant upon a phenotype is low, the issue is further complicated. In candidate gene studies, in which cases and controls typically number only in the hundreds, this is an important issue and should be taken into account when choosing target genes. On the other hand, genes with greater effects can be reliably analyzed.
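The effect of allele frequency on power in a candidate gene study of a few hundred subjects can be illustrated numerically. The sketch below is a simplified approximation under stated assumptions, none of which come from the study itself: a two-sided allelic test at a significance level of 0.05, an allelic odds ratio of 1.5, 200 cases and 200 controls, and a normal approximation that ignores the negligible opposite tail. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(control_freq: float, odds_ratio: float,
                       n_cases: int, n_controls: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p0 = control_freq
    # Allele frequency in cases implied by the assumed allelic odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    # Standard error of the frequency difference; 2 alleles per individual
    se = sqrt(p0 * (1 - p0) / (2 * n_controls) + p1 * (1 - p1) / (2 * n_cases))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


if __name__ == "__main__":
    # Frequencies chosen to resemble a rare (0.04) and a common (0.30) allele
    for freq in (0.04, 0.30):
        power = allelic_test_power(freq, odds_ratio=1.5, n_cases=200, n_controls=200)
        print(f"minor allele frequency {freq:.2f}: power of about {power:.2f}")
```

Under these assumptions the rare allele is detected far less often than the common one, and raising `n_controls` while keeping `n_cases` fixed recovers part of the lost power.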
Some considerations may help circumvent or lessen the impact of the issues pointed out above. First and foremost, a robust hypothesis based on clinical evidence is needed. It should be noted that while genome-wide screening does not employ a priori hypotheses, case-control studies will benefit from correlation with data obtained through careful clinical follow-up and detailed laboratory data. The choice of additional markers in the same gene or chromosome region, the analysis of subsequent independent samples, the use of two to four times more controls than patient samples, and care to avoid hidden population structure, which can result in false differences, are additional points to be taken into account. Although many claims of associations have been published, few are subsequently replicated, a problem affecting GWAS as well(2,30).
However, as pointed out by Eric Lander et al.(2), there is still a role for association studies. The primary value of genetic mapping is not risk prediction, but providing novel insights about mechanisms of disease. Knowledge of disease pathways can suggest strategies for prevention, diagnosis, and therapy.
CONCLUSION
Finally, though ancestry was not defined in our study population, we believe that the data presented here can be of great value for case-control studies, to define which polymorphisms are present in biologically relevant frequencies, and to assess targets for therapeutic intervention in polygenic diseases with a component of immune and inflammatory responses.