RESEARCH ARTICLE
Mendelian breeding units versus standard sampling strategies: Mitochondrial DNA variation in southwest Sardinia
Daria Sanna (I); Maria Pala (II); Piero Cossu (I); Gian Luca Dedola (I); Sonia Melis (III); Giovanni Fresu (I); Laura Morelli (I); Domenica Obinu (I); Giancarlo Tonolo (IV, V); Giannina Secchi (VI); Riccardo Triunfo (VII); Joseph G. Lorenz (VIII); Laura Scheinfeldt (IX); Antonio Torroni (II); Renato Robledo (X); Paolo Francalacci (I)
(I) Dipartimento di Zoologia e Genetica Evoluzionistica, Università di Sassari, Sassari, Italy
(II) Dipartimento di Genetica e Microbiologia, Università di Pavia, Pavia, Italy
(III) Department of Neuroscience and Center for Neurovirology, Temple University School of Medicine, Philadelphia, Pennsylvania, USA
(IV) Servizio di Diabetologia, Azienda ASL 2 Olbia, Olbia, Italy
(V) Department of Clinical Sciences, Lund University, Malmö, Sweden
(VI) Servizio di Diabetologia, Università di Sassari, Sassari, Italy
(VII) CRS4, Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna, Cagliari, Italy
(VIII) Department of Anthropology, Central Washington University, Ellensburg, Washington, USA
(IX) Department of Genome Sciences, University of Washington, Seattle, Washington, USA
(X) Dipartimento di Scienze e Tecnologie Biomediche, Università di Cagliari, Monserrato, Cagliari, Italy
Send correspondence to: Daria Sanna, Dipartimento di Zoologia e Genetica Evoluzionistica, University of Sassari, Via Francesco Muroni 25, 07100 Sassari, Italy. E-mail: darsanna@uniss.it
ABSTRACT
We report a sampling strategy based on Mendelian Breeding Units (MBUs), representing an interbreeding group of individuals sharing a common gene pool. The identification of MBUs is crucial for case-control experimental design in association studies. The aim of this work was to evaluate the possible existence of bias in terms of genetic variability and haplogroup frequencies in the MBU sample, due to severe sample selection. In order to reach this goal, the MBU sampling strategy was compared to a standard selection of individuals according to their surname and place of birth. We analysed mitochondrial DNA variation (first hypervariable segment and coding region) in unrelated healthy subjects from two different areas of Sardinia: the area around the town of Cabras and the western Campidano area. No statistically significant differences were observed when the two sampling methods were compared, indicating that the stringent sample selection needed to establish an MBU does not alter original genetic variability and haplogroup distribution. Therefore, the MBU sampling strategy can be considered a useful tool in association studies of complex traits.
Key words: breeding units strategy, mtDNA haplogroup distribution, association studies.
Introduction
Population definition, sample selection and choice of markers are crucial points in human population genetics studies, and the sampling strategy depends principally on the questions being asked. In addition to biological aspects, such studies should also take into account important socio-cultural parameters, such as language and religion, along with social and self-identity affiliation. If a human population is clearly ethnically-identified and recent admixture is negligible, sampling strategies based only on surname (whenever distinctive) and place of birth are preferred, since they allow exclusion of recent immigrants, not yet blended into the gene pool, from the analysis. Moreover, surname and place of birth criteria can be extended from the DNA donors to their ancestors, provided that genealogical information is available.
A more stringent sampling strategy is required in studies based on genome-wide association scans, which look for different allele distributions between individuals with (cases) or without (controls) a phenotype of interest. The case-control experimental design is expected to be appropriate in surveys on homogeneous populations, whereas both false-positive and false-negative results may occur in heterogeneous or substructured populations, if cases and controls are not carefully sampled according to their origin. This scenario is likely to occur in an island like Sardinia, where the majority of the present population is distributed among 363 isolated villages (Siniscalco et al., 1999) which, while sharing common ancestry, might have diversified during many centuries of isolation. Therefore, it is important to identify true Mendelian Breeding Units (MBUs), i.e. interbreeding groups of individuals sharing a common ancestral gene pool. In Sardinia, the most practical way to define an MBU is to derive a direct estimate of the percentage of endogamous mating occurring in the last 200 years. This information was obtained anonymously from municipal and ecclesiastical marriage registers (Siniscalco et al., 1999). However, rigorous sample selection for reconstructing MBUs led to a conspicuous reduction in sample size, which might have significantly skewed haplotypic or allelic frequencies. In a previous paper (Siniscalco et al., 1999), we reported a pilot study on 55 unrelated controls belonging to the MBU of Carloforte, who were genotyped at six markers. We showed there that the allele frequencies, and therefore the genomic profile, remained constant even when only a subset of 20 individuals was analysed.
The main goal of this work was to evaluate the reliability of the MBU approach in describing genetic variation in human populations, particularly regarding its application to association studies of complex traits.
We compared genetic variability in two sets of samples which included different individuals recruited from the same areas, using two different sampling strategies. With the Standard (STD) Method, individuals unrelated for at least two generations were selected on the basis of the surname and place of birth of their grandparents, depicting present-day genetic variation, with the sole exclusion of the most recent immigrants. Using the MBU Method, the selected DNA donors were proven to be descendants of individuals present in the 17th century archives, with no common ancestors for at least five generations. This was ascertained through a complete check of each donor's genealogical history, based on the official records made available to us by the City Halls. Samples collected using the latter method, being representative of population settlements before the migratory events of the last few centuries, allow an extension of the temporal resolution of genetic variability. Therefore, comparison of the two sampling methods might also reveal possible occurrences of diachronic genetic variation in the analysed areas, due to micro-evolutionary dynamics such as drift or gene flow from neighbouring populations.
The analysed samples belong to two different socio-cultural areas, Cabras and western Campidano, whose cultural traits differentiated around the second half of the 19th century: the former, and its neighbouring area, became a flourishing fishing centre, while the latter consists of rural villages whose economy is based on farming and sheep raising.
We studied mitochondrial DNA (mtDNA), since it has been extensively used as a molecular marker during the past 20 years, is maternally inherited, does not recombine and is in a haploid state; thus it is more sensitive than nuclear DNA to the effects of genetic drift and gene flow, and any discrepancy between the two sampling methods is expected to be enhanced.
Materials and Methods
Sample selection
Using the MBU strategy, we analysed 85 unrelated healthy subjects from two areas located in southwestern Sardinia: 35 individuals from Cabras and 50 individuals from western Campidano (Figure 1). Using the STD strategy, we analysed 71 unrelated individuals from the same areas: 48 from Cabras and its neighbouring area (up to 50 km) and 23 from the western Campidano area.
mtDNA analysis
Whole genomic DNA was extracted using standard procedures. For each individual, mitochondrial haplogroup affiliation was determined by both sequencing of the first hypervariable segment (HVS-I) of the control region from position 15997 to 16399 bp (Anderson et al., 1981) and RFLP (Restriction Fragment Length Polymorphism) analysis of the coding region for the presence/absence of haplogroup diagnostic markers (see Table 1 for details).
Data analysis
BioEdit software 7.0.5.2 (Hall, 1999) was used to align the sequences obtained. To characterise genetic variation among sampling sites, estimates of the number of polymorphic sites (S), the number of haplotypes (h), the nucleotide diversity (Pi), and the haplotype diversity (Hd) were obtained using the DnaSP 4.10 software (Rozas and Rozas, 1999). Pearson chi-square (χ2) values (Pearson, 1900) were calculated in order to assess whether there was any difference between the haplotype frequency distributions obtained for the same areas by means of different sampling strategies (MBU and STD). Principal Coordinate Analysis (PCoA) was carried out on the matrix of DNA pairwise differences, using the Genalex 6.3 software (Peakall and Smouse, 2006). The method based on the covariance matrix with data standardisation was applied. In order to assess the occurrence of significant genetic structuring among samples, analysis of molecular variance (AMOVA) was performed on the matrix of pairwise DNA distances among haplotypes, using the Arlequin 3.1 computer package (Excoffier et al., 2005). Furthermore, genetic differentiation between pairs of samples was estimated by pairwise ΦST values, computed from the matrix of haplotype DNA pairwise differences. The significance of the variance components and Φ-statistics was assessed by a random permutation test (10,000 replicates).
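As an illustration of the summary statistics named above (S, h, Hd and Pi), the following minimal Python sketch computes them from a set of aligned sequences. It assumes Nei's standard estimators, equal-length sequences and no gaps or missing data; DnaSP may treat such sites differently, so this is not a reproduction of the software's procedure. The function names and the toy alignment are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def polymorphic_sites(seqs):
    """Number of alignment columns showing more than one state (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: Hd = n / (n - 1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (Pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment (NOT sequences from this study)
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACCTACGA"]

S = polymorphic_sites(toy)       # polymorphic sites
h = len(set(toy))                # distinct haplotypes
Hd = haplotype_diversity(toy)    # haplotype diversity
Pi = nucleotide_diversity(toy)   # nucleotide diversity
print(S, h, round(Hd, 4), round(Pi, 4))
```

Because Hd is scaled by n / (n - 1), it approaches 1 when nearly every sampled individual carries a distinct haplotype, which is the situation described as a high level of variability.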
A Median-Joining network was drawn for each sampling strategy using Network 4.2.0.1 software (http://www.fluxus-engineering.com).
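The pairwise ΦST estimate and its permutation test, described above for the AMOVA, can be sketched as follows. This is a simplified single-level partition (among samples versus within samples) that uses the number of pairwise differences as the distance, in the spirit of the AMOVA framework; it is not Arlequin's code. The function names, the toy sequences, the default of 1,000 replicates and the (hits + 1) / (replicates + 1) convention for the p-value are all assumptions made for illustration.

```python
import random
from itertools import combinations


def n_differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def phi_st(groups):
    """Phi_ST from a list of groups, each a list of aligned sequences."""
    seqs = [s for g in groups for s in g]
    n_total, n_groups = len(seqs), len(groups)
    # Sums of squared deviations derived from pairwise distances
    ssd_total = sum(n_differences(a, b) for a, b in combinations(seqs, 2)) / n_total
    ssd_within = sum(
        sum(n_differences(a, b) for a, b in combinations(g, 2)) / len(g)
        for g in groups
    )
    ssd_among = ssd_total - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    # Average sample size corrected for unequal group sizes
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    return var_among / (var_among + var_within)


def permutation_p_value(groups, replicates=1000, seed=0):
    """Observed Phi_ST and the share of permutations reaching it."""
    rng = random.Random(seed)
    observed = phi_st(groups)
    pooled = [s for g in groups for s in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(replicates):
        rng.shuffle(pooled)  # reassign individuals to groups at random
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)


# Hypothetical toy samples (NOT sequences from this study)
mbu = ["ACGT", "ACGA", "ACCT", "ACGT"]
std = ["ACGT", "ACGA", "TCGT", "ACGT"]
observed, p_value = permutation_p_value([mbu, std], replicates=999, seed=1)
print(observed, p_value)
```

The estimate can be slightly negative when two samples are no more different than random subsets of a single pool; such values are usually read as no differentiation.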
Results
Nucleotide sequence analysis of HVS-I (GenBank accession numbers: HM584611-HM584695 for MBU samples, and HM594952-HM595022 for STD samples) combined with RFLP analysis allowed the clustering of samples from both MBU and STD strategies into nine main haplogroups. They increased to eleven when sub-haplogroups K and U5b3 were also considered (Table 2). Haplogroup H, which includes the Cambridge Reference Sequence (CRS) (Anderson et al., 1981), proved to be the most common. Haplogroup U5b3, reported as Sardinian-specific (Fraumene et al., 2006; Pala et al., 2009), was found in Cabras MBU, western Campidano MBU and Cabras STD, and was absent only from western Campidano STD. The values of genetic diversity, calculated for the dataset of HVS-I, were similar for all regions and sampling strategies considered, showing a high level of variability (Table 3). Furthermore, we found a total of 82 different haplotypes. Those whose occurrence was detected by both sampling methods (MBU and STD) showed comparable relative frequency distributions, with no significant Pearson chi-square values (Table 4).
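The chi-square comparison of haplotype frequencies between the two sampling methods can be illustrated with a minimal sketch. The function name and the counts are hypothetical (they are not the values of Table 4), and the statistic is compared here against the tabulated 5% critical value for two degrees of freedom rather than converted into an exact p-value. With many rare haplotypes the expected counts become small and the chi-square approximation is less reliable.

```python
def pearson_chi_square(table):
    """Return (chi2, degrees of freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of method and haplotype
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts of three shared haplotypes (columns) in the
# MBU and STD samples (rows); these are NOT the values of Table 4.
counts = [
    [10, 6, 4],  # MBU
    [9, 7, 4],   # STD
]
chi2, dof = pearson_chi_square(counts)

CRITICAL_5PCT_DF2 = 5.991  # tabulated chi-square critical value, df = 2
significant = chi2 > CRITICAL_5PCT_DF2
print(round(chi2, 4), dof, significant)
```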
Nucleotide sequences from the control region were combined with RFLP data on the coding region to obtain a single dataset for the following analysis.
The first two coordinates of PCoA, which account for 62.39% of the total variability, identify two main groups of haplotypes. However, haplotypes were not grouped either according to the geographic area of origin (Cabras or western Campidano) or to the sampling strategy adopted (MBU versus STD) (Figure 2).
Accordingly, the analysis of molecular variance (AMOVA) did not indicate significant genetic differentiation among samples (ΦST = 0.0096, p > 0.05). Indeed, nearly all variance was found within samples (99.04%), whereas differences among samples accounted for only 0.96% of the total variation. These results were further confirmed by the pairwise comparison of samples, which did not show any significant genetic differentiation (Table 5).
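The two percentages follow directly from the reported ΦST, assuming a single-level partition of the variance into an among-sample and a within-sample component. The symbols σ²_a, σ²_w and σ²_T are notation introduced here for illustration, not taken from the original analysis.

```latex
\sigma^2_T = \sigma^2_a + \sigma^2_w, \qquad
\Phi_{ST} = \frac{\sigma^2_a}{\sigma^2_T} = 0.0096
\quad\Longrightarrow\quad
\frac{\sigma^2_a}{\sigma^2_T} = 0.96\%, \qquad
\frac{\sigma^2_w}{\sigma^2_T} = 1 - \Phi_{ST} = 99.04\%
```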
Furthermore, network analysis showed similar relationships among haplogroups without geographical structuring when the two sampling methods were compared (Figure 3).
Discussion
Estimates of genetic diversity (Table 3) obtained for the two sampling strategies indicate that repeated haplotypes did not occur at high levels in the STD samples, as could be expected. This finding supports the possible occurrence of a homogeneous population shared by both the western Campidano and Cabras areas, with a consistently high level of genetic variability in the samples obtained by the two sampling methods and a limited influence of stochastic forces.
The similarity of genetic diversity values between areas and sampling strategies may be explained by the lack of diachronic divergence between the present and past genetic settlement of the western Campidano and Cabras areas. This finding is also consistent with a negligible effect of genetic drift in the analysed areas: this stochastic force, if present, could lead to genetic heterogeneity through random loss of haplogroups and alteration of their frequencies. The absence of higher levels of identical haplotypes among the STD samples suggests that no significant founder effects affected the population recently. Consistently, PCoA applied to the combined dataset (control region + coding region) (Figure 2) grouped MBU and STD samples together, without genetic structuring. Such similarity was also confirmed by the non-significant pairwise ΦST values.
Network analysis was also consistent with the results above. The two sampling strategies displayed similar global relationships among mitochondrial haplogroups without geographical structuring, showing that mtDNA haplogroup frequencies and distribution obtained by the MBU method were not skewed by the severe sample selection of the method used.
Overall, these results suggest a lack of genetic differentiation within southwest Sardinia, probably due to continuous gene flow between the areas, either in the past or more recently, which may have counterbalanced the development of microheterogeneity due to genetic drift.
Previous studies carried out on the Y chromosome, a paternally inherited marker, pointed to a similar trend for the entire Sardinian population, suggesting an initial settlement of a relatively large number of individuals with a common origin (Contu et al., 2008) and conspicuous genetic variability.
The presence of genetic structuring is the major obstacle in association studies based on genome-wide scans searching for linkage disequilibrium (LD) between patients and controls (Risch and Botstein, 1996; Terwilliger and Weiss, 1998), even in isolated populations like Finns and Sardinians (Eaves et al., 2000; Taillon-Miller et al., 2000). Pooling individuals belonging to different breeding units may merge alleles that might have different frequencies in different villages, as we have previously reported for some common polymorphisms in Sardinian villages (Robledo et al., 2002).
As previously shown, in a well-defined breeding unit, a small sample was sufficient to describe the genomic profile of the population, which was not affected by severe reduction of sample size (Siniscalco et al., 1999). More importantly, the repeated application of our strategy in different MBUs offers the advantage of reducing the risk of false-positive results due to population stratification, since obtaining similar artifactual results in different MBUs is not anticipated.
In conclusion, the comparison of the variability detected by means of the MBU and STD sampling methods points to a diachronic continuity of the genetic structure of southwestern Sardinia. The benefit of the MBU sampling strategy lies in the possibility of: i) selecting the original population on the basis of written documents and not by inferring surname monophyletism, and ii) not excluding from the analysis unrelated individuals with polyphyletic surnames, when present, in the founder families.
Our results confirm that the MBU sampling strategy, despite the drastic reduction in sample size, does not introduce deviations in gene frequencies, even if haploid markers such as mtDNA are used. Therefore it can be considered a useful tool in association studies of complex traits, making it possible to infer the genetic settlement of the population, recovering the deepest branches of a genealogy and excluding the recent contribution of immigration.
Acknowledgments
We wish to thank all the participants who made this study possible. We are also grateful to Marcello Siniscalco and Marco Casu for helpful discussions and criticism and to Mary Ann Groeneweg for revising the manuscript. This work was supported by funds from the Fondazione Golinelli made available by Marcello Siniscalco, Compagnia di San Paolo (to A.T.) and grants from the Italian Ministry of Research, MIUR (funds ex 60% to R.R. and P.F.).
Received: July 8, 2010; Accepted: February 8, 2011.
Associate Editor: Angela M. Vianna-Morgante
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
References
- Anderson S, Bankier AT, Barrell BG, de Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, et al. (1981) Sequence and organization of the human mitochondrial genome. Nature 290:457-465.
- Contu D, Morelli L, Santoni F, Foster JW, Francalacci P and Cucca F (2008) Y-chromosome based evidence for pre-Neolithic origin of the genetically homogeneous but diverse Sardinian population: Inference for association scans. PLoS One 3:e1430.
- Eaves IA, Merriman TR, Barber RA, Nutland S, Tuomilehto-Wolf E, Tuomilehto J, Cucca F and Todd JA (2000) The genetically isolated populations of Finland and Sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes. Nat Genet 25:320-323.
- Excoffier L, Laval G and Schneider S (2005) Arlequin v. 3.0: An integrated software package for population genetics data analysis. Evol Bioinform Online 1:47-50.
- Fraumene C, Belle EMS, Castri L, Sanna S, Mancosu G, Cosso M, Marras F, Barbujani G, Pirastu M and Angius A (2006) High resolution analysis and phylogenetic network construction using complete mtDNA sequences in Sardinian genetic isolates. Mol Biol Evol 23:2101-2111.
- Hall TA (1999) BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95-98.
- Pala M, Achilli A, Olivieri A, Hooshiar Kashani B, Perego UA, Sanna D, Metspalu E, Tambets K, Tamm E, Accetturo M, et al. (2009) Mitochondrial haplogroup U5b3: A distant echo of the epipaleolithic in Italy and the legacy of the early Sardinians. Am J Hum Genet 84:814-821.
- Peakall R and Smouse PE (2006) Genalex 6: Genetic analysis in Excel. Population genetic software for teaching and research. Mol Ecol Notes 6:288-295.
- Pearson K (1900) On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Phil Mag Ser 5:157-175.
- Risch N and Botstein D (1996) A manic depressive history. Nat Genet 12:351-353.
- Robledo R, Orrù S, Sidoti A, Muresu R, Esposito D, Grimaldi MC, Carcassi C, Rinaldi A, Bernini L, Contu L, et al. (2002) A 9.1-kb gap in the genome reference map is shown to be a stable deletion/insertion polymorphism of ancestral origin. Genomics 80:585-592.
- Rozas J and Rozas R (1999) DnaSP v. 3: An integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
- Siniscalco M, Robledo R, Bender P, Carcassi C, Contu L and Beck J (1999) Population genomics in Sardinia: A novel approach to hunt for genomic combinations underlying complex traits and diseases. Cytogenet Cell Genet 86:148-152 (and Erratum 87:296).
- Taillon-Miller P, Bauer-Sardiña I, Saccone NL, Putzel J, Laitinen T, Cao A, Kere J, Pilia G, Rice JP and Kwok PY (2000) Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat Genet 25:324-328.
- Terwilliger JD and Weiss KM (1998) Linkage disequilibrium mapping of complex disease: Fantasy or reality? Curr Opin Biotechnol 9:578-594.
Publication Dates
Publication in this collection: 02 June 2011
Date of issue: 2011
History
Accepted: 08 Feb 2011
Received: 08 July 2010