From rumors to genetic isolates.

Here we propose a registration process for population genetic isolates, usually geographic clusters of genetic disorders, based on the systematic search for rumors, defined as any type of account regardless of its reliability. Systematically ascertained rumors are recorded and then validated through a series of pre-established, progressive steps. This paper outlines the conceptual basis for this approach and presents preliminary results from a rumor-based nationwide registry of genetically isolated populations, named CENISO (Censo Nacional de Isolados), which has operated in Brazil since 2009. During the first four years of its existence (2009-2013), a total of 191 rumors were registered, corresponding to a prevalence rate of one per million inhabitants of Brazil. When the five statutory geographic regions of Brazil were considered, more rumors were registered for the Northeast (2.11 per million; 95% CI 1.74-2.54) than for the remaining four regions, North, Center-West, Southeast and South, which did not differ among themselves. Almost half (86/191) of the recorded rumors were proven to be geographic clusters; of these, 58 involved autosomal recessive disorders, 17 autosomal dominant, 5 X-linked, 3 multifactorial, and one environmental (thalidomide embryopathy).


Introduction
Rumors, clusters, alarms, and endemics or epidemics constitute a semi-continuous chain of events, some of which can be difficult to categorize, particularly those on the border between two categories. Working definitions of these terms as used in the ECLAMC (Latin American Collaborative Study of Congenital Malformations) have been published elsewhere (Castilla and Orioli, 2004). More authoritative definitions can be found in Last's Dictionary of Epidemiology (Porta, 2008).
Since 1967, the ECLAMC program has conducted birth defect surveillance aimed at the detection and investigation of unusual occurrences in time and/or space. For time clusters, or epidemics, routine monitoring is performed, and quarterly data are compared with those of equivalent surveillance systems through the International Clearinghouse for Birth Defects Surveillance and Research (ICBDSR, 2011). In addition, Focus, a continuous protocol for the study of space clusters, or endemics, based on the systematic evaluation of rumors, has been ongoing since 1967 in all South American countries, including Brazil (Castilla and Orioli, 2004).
The complete and systematic survey of genetically isolated populations and related geographic clusters of genetic disorders has been performed for small, well-defined populations, such as those of Finland (de la Chapelle, 1993), Israel (Zlotogora et al., 2009), and the American Old Order Amish (Strauss and Puffenberger, 2009). In contrast, to the authors' knowledge, a similar study has never been attempted for the population of a large country.
With close to 200 million inhabitants, Brazil accounts for half of the population of South America and one third of that of Latin America. The genetic structure of Brazilian populations has been thoroughly investigated since the late 1950s, among other methodologies, through the study of the frequency of consanguineous marriages in Roman Catholic Church records by Newton Freire-Maia (Freire-Maia, 1958; Freire-Maia and Freire-Maia, 1961), and through the quantification of ethnic admixture from biological markers in northeastern Brazilian populations by Newton Morton and Henrique Krieger (Morton, 1964; Krieger et al., 1965). Furthermore, geographic clusters of some single-gene diseases in Brazil have been known for many decades. Pioneering publications include the studies of new mutations for Grebe's achondrogenesis (OMIM #200700) in the state of Bahia (Quelce-Salgado, 1964), acheiropodia (OMIM #200500) in the state of Minas Gerais (Freire-Maia, 1975, 1981), and oculocutaneous albinism (OMIM #203200) in Ilha dos Lençóis (Freire-Maia et al., 1978). Nevertheless, no systematic nationwide survey of genetically isolated populations and/or geographic clusters of genetic disorders has ever been performed in Brazil.
In 2008, the Brazilian government created the 'Instituto Nacional de Genética Médica Populacional (INAGEMP)' for the study of medical genetics at the population level. Within the framework of this institute, a specific program for the study of genetically isolated populations was started in 2009 under the name CENISO, a Portuguese acronym for National Census of Isolates ('Censo Nacional de Isolados'). CENISO is a surveillance system for sub-populations with endemic chronic diseases, most of them Mendelian disorders, aimed at their study, needs assessment, prevention and care.
This work presents a method for searching for genetic isolates and geographic clusters of genetic conditions based on the systematic collection, recording and validation of rumors, regardless of their source and reliability, as well as the results of its application to Brazilian populations.

Data collection
The data included here correspond to the first 48 months of operation of the CENISO program, from its start in August 2009 until the date of the last rumor included in this report: July 31, 2013.

Geographic distribution
Brazil is officially subdivided into more than 5,500 counties or municipalities, which are grouped into 27 States and, in turn, into 5 Regions. Denominators for local population size were obtained from the Brazilian Institute of Geography and Statistics (IBGE) (Brazilian Institute of Geography and Statistics, 2013), and expected values for disease prevalence rates from the Unified Health System (SUS) (DATASUS, Brazilian Universal Health Service, 2013). Because of the limited sample size (fewer than two hundred observations), only Regions were used to group localities in this work. Each locality was defined by its geographic coordinates in degrees and minutes. For areal units such as municipalities or states, the coordinates of the capital or administrative seat were used.

Illnesses and their etiologies
Illnesses were identified by their OMIM number (www.omim.org) whenever possible. In addition, the ICD10-BPA (International Classification of Diseases, Tenth Revision, extended to a fifth digit by the British Paediatric Association) was used when a specific code was available.

Definitions
Rumors (gossip, hearsay) are any type of account, oral or written, of an unusual occurrence, which in our case would be a suspected isolated population and/or an unusual occurrence of a genetic disorder or malformation.
A geographic cluster is defined by a prevalence rate higher than expected (as determined from comparable population data) for a given disorder, in a population living in a defined geographic area over a long period of time. For genetic diseases, however, this definition was reformulated, because these diseases are usually very rare and their expected prevalence rates are commonly unknown. Because of asymptomatic heterozygotes for recessive conditions and non-penetrant carriers of dominant conditions, what actually needs to be considered is the frequency of genes and genotypes rather than that of phenotypes (Castilla and Orioli, 2004).
Rumor Collection: The core of this program is the regular, ongoing collection of rumors, as defined above, in a given population. The search for rumors is proactive: it consists of disseminating the question "Do you know any population with genetic problems?" This direct, though loose, question was initially disseminated through two main channels: national genetics meetings and the internet. The former included the distribution of a simple reporting form with the collaboration of the Brazilian Society of Genetics and the Brazilian Society of Medical Genetics, while the latter relies on open reporting access on the INAGEMP webpage, http://www.INAGEMP.bio.br.
Rumor Validation: Because rumors are unverified by definition, and most of them will prove to be false, validation efforts should be proportional to the reliability of each rumor. Registration is performed without any exclusion criteria, and registered rumors then undergo a validation process structured in four progressive phases.

Phases of the study
Phase I is the registration of the rumor in the rumor registry, which is open to public consultation at www.INAGEMP.bio.br through the CENISO link and the button 'PARA VER AS POPULAÇÕES JÁ REGISTRADAS (pdf)' ('To see already registered populations (pdf)').
Phase II is the definition of the rumor. A simple one-page form is sent to the person who reported it, to record the name, birth year, birthplace and type of anomaly of each known affected individual in the suspected cluster. Denominators for the reported sample are then estimated from official databases for a given space-time framework, so that the observed rate can be compared with the expected rate for the reported condition. If the information provided substantiates the rumor, Phase III is initiated.
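The text does not specify how the count reported on the Phase II form is weighed against the expected rate. A minimal sketch, assuming that the number of affected individuals follows a Poisson distribution with mean equal to population size times expected prevalence, is a one-sided test for an excess of cases. The function names, the 0.01 threshold and the example figures are hypothetical.

```python
import math


def poisson_upper_tail(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu), summed directly from k upward."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    log_mu = math.log(mu)
    total, i = 0.0, k
    while True:
        # Poisson probability mass at i, computed in log space to avoid overflow
        term = math.exp(-mu + i * log_mu - math.lgamma(i + 1))
        total += term
        # past the mode the terms shrink; stop once they no longer change the sum
        if i > mu and term <= 1e-17 * total:
            return total
        i += 1


def screen_rumor(affected: int, population: int, expected_prevalence: float,
                 threshold: float = 0.01) -> dict:
    """Hypothetical Phase II screen: compare the reported number of affected
    individuals with the number expected from a reference prevalence."""
    expected = population * expected_prevalence
    p_value = poisson_upper_tail(affected, expected)
    return {
        "observed_per_100k": 1e5 * affected / population,
        "expected_count": expected,
        "p_value": p_value,
        "substantiated": p_value < threshold,
    }


# Illustrative only: 5 affected in a community of 20,000, reference prevalence 1 in 100,000
print(screen_rumor(affected=5, population=20_000, expected_prevalence=1e-5))
```

With an expected count of 0.2, observing five affected individuals has a tail probability on the order of one in a million, so the rumor would pass to Phase III; for very rare diseases whose expected prevalence is unknown, the reference value itself is an assumption that has to be stated.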
Phase III consists of further definition and delineation of the rumor through a brief site visit. One or two medical geneticists from the CENISO staff visit the population involved with three objectives: 1) to confirm the cluster, 2) to observe the general conditions of the population in loco, and 3) to establish local contacts with persons and institutions. A recording form with basic information is filled out for later discussion with the CENISO staff. This form includes details of the disease involved and the diagnostic certainty; local resources, including hospitals, day care centers, physicians, nurses and social workers; parish and civil registration books; local natural leaders; the community's perception of the problem; and proposed strategies and action plans. Because travel in a country as large as Brazil usually involves long distances, INAGEMP reserves funds to finance these visits.
Phase IV is the development of a research project, if justified. If the cluster is already being studied by other research groups, collaboration and/or support is offered if needed. Community aspects are discussed with the leaders and support is provided after needs are assessed.
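The four phases behave like a one-way pipeline in which a rumor advances only while the evidence gathered at the current phase supports it. The sketch below models that as a small state machine; the class, field and method names are hypothetical and do not describe the actual CENISO database.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import IntEnum


class Phase(IntEnum):
    REGISTERED = 1   # Phase I: entered in the public registry, no exclusion criteria
    DEFINED = 2      # Phase II: affected individuals listed, denominators estimated
    SITE_VISIT = 3   # Phase III: brief site visit by CENISO geneticists
    RESEARCH = 4     # Phase IV: research project or collaboration, if justified


@dataclass
class Rumor:
    rumor_id: int
    locality: str
    condition: str
    phase: Phase = Phase.REGISTERED
    stopped: bool = False                     # True once validation fails
    log: list = field(default_factory=list)   # audit trail of decisions

    def advance(self, substantiated: bool, note: str = "") -> Phase:
        """Move to the next phase if the current phase substantiates the rumor;
        otherwise stop validation at the current phase."""
        if self.stopped or self.phase is Phase.RESEARCH:
            return self.phase
        if substantiated:
            self.phase = Phase(self.phase + 1)
            self.log.append(f"advanced to {self.phase.name}: {note}")
        else:
            self.stopped = True
            self.log.append(f"stopped at {self.phase.name}: {note}")
        return self.phase


def phase_summary(rumors: list) -> Counter:
    """Count rumors by the phase they have reached."""
    return Counter(r.phase.name for r in rumors)
```

Because registration has no exclusion criteria, every report starts as `REGISTERED`; a tally such as `phase_summary` corresponds to statements like how many rumors have not yet reached Phase II.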

Statistical analysis
95% confidence intervals (CI) were used throughout this study.
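The section does not state how the intervals were computed. Because the reported rates are counts divided by population size, one natural choice is the exact (Garwood) Poisson interval, sketched below with the standard library only by inverting the Poisson CDF through bisection. The method is an assumption, not necessarily the one used in the study, and the function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), with each term computed in log space."""
    if k < 0:
        return 0.0
    if mu <= 0:
        return 1.0
    log_mu = math.log(mu)
    return sum(math.exp(-mu + i * log_mu - math.lgamma(i + 1)) for i in range(k + 1))


def _bisect(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of a continuous function f that changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided CI for the mean of an observed Poisson count k."""
    if k == 0:
        lower = 0.0
    else:
        # lower limit solves P(X >= k | mu) = alpha / 2
        lower = _bisect(lambda mu: 1 - poisson_cdf(k - 1, mu) - alpha / 2, 0.0, float(k))
    # upper limit solves P(X <= k | mu) = alpha / 2
    hi = k + 20 * math.sqrt(k + 1) + 20
    upper = _bisect(lambda mu: poisson_cdf(k, mu) - alpha / 2, float(k), hi)
    return lower, upper


def rate_per_million(k: int, population: float, alpha: float = 0.05) -> tuple:
    """Prevalence per million inhabitants with its exact CI limits."""
    lower, upper = exact_poisson_ci(k, alpha)
    scale = 1e6 / population
    return k * scale, lower * scale, upper * scale
```

For example, `rate_per_million(191, 193.1e6)` gives about 0.989 per million with limits of roughly 0.85 and 1.14, close to the interval reported in the Results; the denominator of 193.1 million is back-calculated here from the reported rate and is not a figure stated in the text.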

Ethics considerations
Data included here refer to identified human subpopulations in which all individual human subjects were anonymized at the initial registration phase. Brazilian legislation (Resolução CNS 466/2012) does not require IRB approval for data obtained from public databases, as is the case for CENISO.

Results
After excluding three duplicate and 13 irrelevant entries, 191 of the 207 reported rumors were registered in the CENISO database during the initial 48 months of operation. A summary of the whole database is available on the INAGEMP-CENISO webpage, http://www.INAGEMP.bio.br/sis/produtos/CENISO_planilha.pdf. Figure 1 shows the geographic distribution of the 191 reported rumors, 86 of which have already been confirmed as clusters, across the five official geographic regions of Brazil; this distribution is further analyzed in Table 1.
While the overall prevalence rate of reported rumors was one per million inhabitants (0.989; 95% CI: 0.854-1.140), the Northeast Region (NE) had a significantly higher prevalence rate of rumors (2.110; CI: 1.740-2.540) than the other four regions, North (N), Center-West (CW), Southeast (SE) and South (S), which did not differ among themselves. This difference also held for clusters, for which the NE had a higher prevalence (1.017 per million; CI: 0.764-1.330) than the remaining four regions (see Table 1).
Of the 191 reported rumors, 86 were proven geographic clusters, involving 58 autosomal recessive, 17 autosomal dominant, 5 X-linked, 3 multifactorial, and 1 environmental disorder. The environmental disorder is thalidomide embryopathy, caused by a drug used to treat leprosy in endemic areas of Brazil (Vianna et al., 2013). Autosomal recessive disorders had the highest prevalence of both rumors [0.534 per million inhabitants (CI: 0.435-0.647)] and clusters [0.300 per million inhabitants (CI: 0.228-0.388)] compared with the remaining etiologic groups: autosomal dominant, X-linked, multifactorial, environmental, and unspecified etiology (Table 2).
Likewise, the NE Region had a significantly higher prevalence rate of rumors of autosomal recessive disorders (1.280 per million; CI: 0.995-1.620) than the other four regions. However, this difference could not be substantiated for proven clusters (Table 3).
The 86 identified clusters involve 70 different disorders, with 16 instances of the same condition clustered in more than one geographic location, none of them contiguous (Table 4). Fifty-one of the 70 identified diseases are autosomal recessive traits, 12 are autosomal dominant, 5 are X-linked recessive, and 2 are of unknown inheritance. The 13 disorders involved in more than one cluster are very rare conditions. All but one produce autosomal recessive phenotypes, with expected gene frequencies below one percent in the general population. These conditions are (MIM codes in parentheses): Spinocerebellar Ataxia Type 3 (109150), Albinism, oculocutaneous (203100), Fraser Syndrome (219000)

Discussion
To the authors' best knowledge, a nationwide, systematic register of genetic clusters such as CENISO in Brazil is an almost unique program, even considering similar, though not identical, ongoing programs in other parts of the world, such as the aforementioned studies of the Amish and Mennonites in the US and Canada (Strauss and Puffenberger, 2009; Rider et al., 2011), Arabs in Israel (Zlotogora et al., 2009), and isolates in Finland (de la Chapelle, 1993). The premise of CENISO is that a registry can be built from sensitive but imprecise rumors, provided that they are validated through a cost-efficient process. This approach could be applied to other populations, not only for the registration and surveillance of endemic areas but also for outbreaks of diseases caused by environmental agents.
Almost half (86/191: 45.0%) of the CENISO-registered rumors were confirmed as geographic clusters, while nearly one third (57/191: 29.8%) have not yet reached validation Phase II (rumor definition) and therefore remain unconfirmed. While low specificity is expected for this search approach, which relies on the extensive recording of unselected anecdotes, the low validation rate could also reflect a limited, largely passive response by the CENISO coordination during the reported period. Nevertheless, unlike time/space clusters of diseases, also known as epidemics (Williams et al., 2002), most genetic etiologic factors are rather stable over time, changing only gradually across generations, mainly because of cultural changes in reproductive patterns within genetic isolates. The persistence of space clusters over such long periods plausibly explains their higher reliability compared with time/space clusters (Williams et al., 2002).
More rumors were located in the Northeast Region than in the rest of Brazil, a finding most likely related to the reported higher prevalence of consanguineous marriages in this region (Freire-Maia, 1958; Krieger et al., 1965; Weller et al., 2012). Two of the five Brazilian geographic regions, North and Center-West, with only about 15 million inhabitants each, are markedly less populated than the Northeast, with 53 million. However, the low concentration of clusters in the remaining two regions, the South with 28 million and the Southeast with 82 million residents, cannot be explained by small population size.
Because most genetically isolated sub-populations are also inbred, for geographic or cultural reasons, the predominance of autosomal recessive phenotypes is expected. Nevertheless, autosomal dominant and X-linked traits, as well as oligogenic or polygenic traits, are also observed among the registered clusters, even though they are not directly related to inbreeding. One possible factor in these situations is the readiness to recognize familial aggregation in small, well-defined populations, where knowledge of relatives and ancestors through several generations is more complete than for families living in large cities. Another explanation is the low mobility of isolated populations, which keeps large pedigrees with dominant or X-linked mutations concentrated in small geographic areas. The aniridia cluster in Alagoas is an example of this phenomenon (Fernandes-Lima et al., 2013). Furthermore, specific mutations could have been enriched by chance in small isolates through genetic drift, starting from a small number of founders and continuing through reductions in effective population size during past periods of high mortality, as reported for the Finnish population (de la Chapelle, 1993).
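The expectation that inbreeding raises the frequency of recessive phenotypes follows from a standard population-genetics result: with recessive allele frequency q and inbreeding coefficient F, the expected frequency of affected homozygotes is q² + Fq(1 - q). The sketch below evaluates it; the allele frequency of 0.005 is purely illustrative (the Results only state that gene frequencies are below one percent), and F = 1/16 is the usual value for offspring of first cousins.

```python
def recessive_homozygote_freq(q: float, F: float = 0.0) -> float:
    """Expected frequency of affected (aa) individuals: q^2 + F*q*(1 - q),
    where q is the recessive allele frequency and F the inbreeding coefficient."""
    if not (0.0 <= q <= 1.0 and 0.0 <= F <= 1.0):
        raise ValueError("q and F must lie in [0, 1]")
    return q * q + F * q * (1.0 - q)


def inbreeding_fold_increase(q: float, F: float) -> float:
    """How many times more frequent affected individuals are under inbreeding F
    than under random mating (F = 0)."""
    base = recessive_homozygote_freq(q)
    if base == 0.0:
        raise ValueError("q must be greater than 0")
    return recessive_homozygote_freq(q, F) / base


FIRST_COUSINS_F = 1 / 16  # inbreeding coefficient of a child of first cousins

q = 0.005  # illustrative allele frequency, below one percent
print(f"random mating: {recessive_homozygote_freq(q):.2e}")
print(f"first cousins: {recessive_homozygote_freq(q, FIRST_COUSINS_F):.2e}")
print(f"fold increase: {inbreeding_fold_increase(q, FIRST_COUSINS_F):.1f}")
```

Here the affected frequency rises about 13-fold. Since the fold increase equals 1 + F(1 - q)/q, the rarer the allele, the stronger the effect of inbreeding, which fits the very rare recessive conditions found in more than one cluster.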
More recently, Moreau et al. (2011) showed a selective advantage for individuals on the wave front of population range expansions in Quebec, Canada. Such differential reproduction among migrants at the wave front can lead to substantial changes in gene frequencies in the derived populations. Given the historical patterns of colonization and internal migration in Brazil, similar phenomena may have occurred here as well. The cancer clusters in southern Brazil (Achatz et al., 2009) could be an example, but further investigation is needed to test this hypothesis.
One limitation of this report is that the completeness of ascertainment of clusters in Brazil is unknown. This will most likely become clearer with time as the register continues to operate, with the essential collaboration of its users, as noted for the Israeli National Genetic Database by Zlotogora et al. (2009).
CENISO plans to extend this registry of rumors and clusters first to the rest of South America and later to the whole of Latin America, once the Brazilian registry is consolidated. This extension to other countries will mainly, though not exclusively, use the ECLAMC hospital network, an INAGEMP program described elsewhere (Castilla and Orioli, 2004).