Social Epidemiology of a Large Outbreak of Chickenpox in the Colombian Sugar Cane Producer Region: a Set Theory-based Analysis

There are few social epidemiologic studies on chickenpox outbreaks, although previous findings suggested the important role of social determinants. This study describes the context of a large outbreak of chickenpox in the Cauca Valley region, Colombia (2003 to 2007), with an emphasis on macro-determinants. We explored the temporal trends in chickenpox incidence in 42 municipalities to identify the places with higher occurrences. We analyzed municipal characteristics (education quality, vaccination coverage, performance of health care services, violence-related immigration, and area size of planted sugar cane) through analyses based on set theory. Edwards-Venn diagrams were used to present the main findings. The results indicated that three municipalities had higher incidences and that poor education quality was the attribute most often associated with a higher incidence. The potential use of set theory for exploratory outbreak analyses is discussed. It is a potentially useful tool for contrasting units when only small sample sizes are available.

1 Instituto Nacional de Salud Pública, Cuernavaca, México. 2 Hospital General Regional, Instituto Mexicano del Seguro Social, Cuernavaca, México.

Introduction
Chickenpox, or varicella, is a benign disease caused by infection with the varicella zoster virus (VZV). In general, cases of chickenpox appear among children between the ages of 1 and 14 1, and when the infection occurs in adolescents or adults the severity is higher than in children. In addition, it is potentially more frequent among immunosuppressed individuals 2. Migration is an important risk factor associated with the occurrence of chickenpox. When adults not exposed to VZV migrate to regions where chickenpox is endemic, the risk of infection is high 3. This has been described as occurring when an individual interacts with others in schools, homes and shopping centers 4.
Studies from Serbia and Montenegro 5, Puerto Rico 6, St. Lucia-West Indies 7, India, Southeast Asia 8, and Somalia 9, and with US Navy and Marine Corps recruits in island territories 10 report that chickenpox is a serious disease among adults in tropical climates, where seroprevalence is lower. However, a recent study in Australia reported that social and cultural characteristics are more significant than climate for VZV transmission 4, suggesting overlaps with some of the determinants of chickenpox outbreaks, which unfortunately are not known.
In the Cauca Valley region of Colombia, a large outbreak of chickenpox was observed and documented between 2003 and 2007 11. This outbreak affected children and adults in similar proportions. However, the causes of this epidemic were not explored. The region involved in this outbreak is located in western Colombia, which includes the Pacific Ocean coast, between 3° 05' and 5° 01' latitude N, 75° 42' and 77° 33' longitude W. The geography of the Cauca Valley region is varied, with coasts, mountains, jungles, and a very fertile valley. In the plains region, sugar cane constitutes the main crop, which is highly important since this relatively small region produces roughly 1.7% of the world's sugar.
The objective of this study was to describe the context of the large outbreak of chickenpox in this region, with special emphasis on any macro-determinants potentially related to incidence. Our a priori hypothesis was that the chickenpox outbreak was related to problems with the performance of health care services, lower educational levels among the population, the migration of vulnerable peoples, and/or changes in sugar cane-related processes. Since only a small sample size was available, it was decided to explore the use of set-theory methods to contrast municipalities. This approach tested a simple method for linking social epidemiology with field epidemiology. The usual methods in social epidemiology include, for example, multilevel analysis or complex multivariate analyses, so the analysis is usually performed by experts.

Methods
A case series study was conducted, with the 42 Cauca Valley municipalities serving as observation units. Agency registries from the epidemiological surveillance system were used to obtain the number of clinical cases between 2003 and 2007. These data originated from the weekly reports of mandatory notification events. According to the Colombian epidemiological surveillance program, a case of chickenpox is clinically defined when a patient has mild to moderate fever with a few general symptoms associated with maculopapular and vesicular lesions that form granulose crusts 12.

Potential contextual determinants
To explore the social and physical environment, five robust indicators were used: (i) vaccination coverage, (ii) production function in a subsidized regime, (iii) education quality, (iv) area size of planted sugar cane, and (v) violence-related immigration. The first three indicators were extracted from the Cauca Valley's 2006 Municipal Management Report 13; the fourth was derived from agricultural data contained in Cauca Valley official statistics; and the last indicator was taken from the Colombian registries of displaced individuals. These indicators were selected as proxy variables related to our study hypothesis.
Vaccination coverage is an indicator based on a mass immunization plan (Plan Ampliado de Inmunización, PAI, in Spanish) 13. The outcome used for this indicator is the measles, mumps and rubella vaccine (MMR triple viral), which is considered to be a good indicator because it requires only one dose at one year old, and it is administered after the child has had all previous vaccines in the vaccination scheme. This indicator reflects the performance of primary health care services since good health care coverage is reflected by a municipality having a high percentage of vaccinations. Previous studies have reported a decrease in vaccination coverage over the last 15 years 14,15,16.
The production function in a subsidized regime is a composite indicator. It relates the economic resources from all financial sources with the expenditure on health care personnel dedicated to identifying and insuring the most vulnerable families, or to carrying out stewardship activities 13. It was calculated as (individuals in the subsidized regime / total municipal population) × 100. The Colombian health system is based on managed care and, therefore, the separation of the financing and provision of health care is the principle used to promote cost-efficiency 17. The system has two types of affiliation: the contributory and the subsidized regime. The former covers those who have the ability to pay (people with full or partial employment) and the latter provides services to those who are not able to pay the necessary contributions (indigent and unemployed people). In this context, good performance in a subsidized regime theoretically corresponds to a high percentage of enrollment 13.
Education quality is an indicator constructed using the percentage of students with medium, high or very high scores on the national exam administered by the Colombian Institute for the Development of Higher Education (Instituto Colombiano para el Fomento de la Educación Superior, ICFES, in Spanish) 13. It is expressed as a percentage and a municipality has better performance when a high percentage of students have medium or high scores.
The area size (hectares) of planted sugar cane is an indicator of the first process in the production of sugar and ethanol to produce fuel. Data used in this analysis were extracted from the Cauca Valley's 2007 official statistics for agricultural assessment. Original data are collected by the Regional Agricultural Planning Unit (Unidad Regional de Planificación Agropecuaria, URPA, in Spanish), and these are available at the official webpage (Evaluaciones agricolas 2000-2009. http://www.valledelcauca.gov.co/agricultura/publicaciones.php?id=1966, accessed on 25/Apr/2009).
Violence-related migration in Colombia is a complex demographic process. It is characterized as either a protracted internal displacement for which the processes of finding lasting solutions have stalled and/or when displaced individuals are marginalized as a consequence of violations or a lack of protection of human rights, including economic, social and cultural rights 18. According to some authors, threats by armed actors are the proximal cause of forced displacement and problems related to land possession are the distal cause 19. Said displacement increases the demand for basic services and infrastructure to satisfy migrants' needs in the reception municipality. Immigration data used for this study were extracted from the official government registries collected by Acción Social (Estadísticas de la población desplazada. http://www.accionsocial.gov.co/Estadisticas/publicacion%20marzo%2031%202009.htm, accessed on 14/May/2009).
Table 1 summarizes these characteristics according to municipality. Note the heterogeneity among the municipalities. In general, the higher percentages for the attributes studied occur in Cali and other municipalities with greater population density and degree of urbanization.

Data analysis
First, a graphical description of the outbreak was developed based on incidence rates (per 100,000 inhabitants), using data from epidemiologic surveillance as the numerators and the official population estimates of each municipality as denominators. This procedure identified the municipalities with the higher incidences. Afterwards, two basic set operations, intersection (∩) and union (∪), were used to identify specific municipal characteristics potentially related to the occurrence of the outbreak. Intersection is defined as the set whose elements are elements of all the sets involved, and union as the set whose elements are elements of at least one of the sets involved in the operation.
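The two steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `incidence_per_100k` and the case and population counts are hypothetical, and the set memberships are restricted to the three highest-incidence municipalities as reported in the Results.

```python
def incidence_per_100k(cases: int, population: int) -> float:
    """Incidence rate per 100,000 inhabitants."""
    return cases * 100_000 / population

# Hypothetical counts, not taken from the surveillance data.
example_rate = incidence_per_100k(cases=550, population=50_000)

# Membership restricted to the three highest-incidence municipalities
# (the full sets contain other municipalities as well).
A_top = {"Pradera", "El Dovio"}  # education quality less than 50%
H_top = {"Pradera", "Ulloa"}     # vaccination coverage more than 80%

both = A_top & H_top    # intersection: elements of all the sets involved
either = A_top | H_top  # union: elements of at least one of the sets

print(example_rate, both, either)
```

Python's built-in `set` type maps directly onto the two operations, which is why no external library is needed for this kind of exploratory contrast.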
Certain sets were characterized using these operations. The interpretation was based on the rules described in Table 2. Sets were described by extension (listing all elements inside curly brackets) or by intension (using notation that denotes the set of all elements satisfying a condition). In some cases, Venn or Edwards-Venn diagrams were used 20. To facilitate the interpretation, the concepts of "sufficient determinant" and "necessary determinant" were used in a manner similar to the causality frameworks of Susser and Rothman 21,22. However, it is important to remember that our analysis does not establish causality but, rather, contrasts contexts, which is one of the three proposed uses of small-N studies 23. In our case, when the presence of a determinant is by itself enough for the higher incidence to occur it is a "sufficient determinant", and when the higher incidence does not occur without its presence it is a "necessary determinant". These methods were considered appropriate since there were too few observations available for a formal multiple statistical analysis.
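As a hedged illustration of these ideas, the sketch below describes a set by extension and by intension and checks necessity and sufficiency as subset relations. The municipality labels `m1` to `m4`, the percentages, and the helper names `is_necessary` and `is_sufficient` are all hypothetical, and the subset-based reading of the two concepts is the conventional one, assumed here rather than taken from the study.

```python
# Hypothetical education-quality percentages for four illustrative units.
quality = {"m1": 42.0, "m2": 48.5, "m3": 71.0, "m4": 93.0}

# By extension: list every element explicitly.
low_quality_extension = {"m1", "m2"}

# By intension: state the condition that elements must satisfy.
low_quality_intension = {m for m, q in quality.items() if q < 50}

def is_necessary(attribute: set, outcome: set) -> bool:
    """Necessary: every unit with the outcome also has the attribute."""
    return outcome <= attribute

def is_sufficient(attribute: set, outcome: set) -> bool:
    """Sufficient: every unit with the attribute also has the outcome."""
    return attribute <= outcome

# Hypothetical set of units with higher incidence.
high_incidence = {"m1", "m3"}

necessary = is_necessary(low_quality_intension, high_incidence)
sufficient = is_sufficient(low_quality_intension, high_incidence)
print(necessary, sufficient)
```

In this toy example the attribute is neither necessary nor sufficient, since `m3` has the outcome without the attribute and `m2` has the attribute without the outcome.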

Results
The chickenpox outbreak is summarized in Figure 1. It was observed that the higher incidences occurred in 2006 and 2007. During these years it was evident that Pradera, El Dovio and Ulloa were the municipalities with the highest incidences of cases (the peaks were higher than 1,000 cases per 100,000 inhabitants), though the first high peak occurred in Ulloa in 2006. Other municipalities with a high number of cases (> 500 cases per 100,000 inhabitants) at any moment during the outbreak were Calima, La Cumbre, Obando, Restrepo, Riofrio, Roldanillo, Vijes and Zarzal. Note that in 2006, Pradera was the first municipality with incidences higher than 1,000 cases per 100,000 inhabitants. El Dovio and Ulloa did not experience this incidence level until 2007.
The following analysis is intended to identify the specific characteristics of Pradera, El Dovio and Ulloa, the municipalities with the highest incidences. Set analysis to identify specific characteristics allowed for the observation of some interesting aggregations (Figure 2). There was a specific set for Pradera characterized by: education quality less than 50%, vaccination coverage more than 80%, inconsistent data on subsidized regime affiliation, more than 10,000 hectares dedicated to sugar cane plantation, and violence-related immigration between 26 and 50 per 10,000 inhabitants (A ∩ H ∩ I ∩ R ∩ V). El Dovio had specific characteristics which included education quality less than 50%, vaccination coverage between 50-80%, subsidized regime between 65-75%, no sugar cane plantations, and violence-related immigration between 26-50 per 10,000 inhabitants (A ∩ G ∩ K ∩ N ∩ V). Ulloa was a municipality characterized by education quality higher than 90%, vaccination coverage more than 80%, subsidized regime more than 90%, no sugar cane plantations, and violence-related immigration higher than 75 per 10,000 inhabitants (D ∩ H ∩ M ∩ N ∩ X).

Table 2
Rules used to describe the sets for Cauca Valley municipalities, Colombia.

Note that sets A, H, N, and V were present in two municipalities, and the other sets were in one municipality exclusively. However, the set A ∩ V (lower level of education quality and an intermediate rate of violence-related immigration) is a very interesting combination of determinants that are present in Pradera and El Dovio. Finally, a complementary analysis with the other eight municipalities with higher incidences (Calima, La Cumbre, Obando, Restrepo, Riofrio, Roldanillo, Vijes and Zarzal) was conducted. The most frequent unitary sets for each determinant were: A (3/8) and D (3/8); F (4/8) and G (3/8); J (3/8); N (4/8), and X (5/8). Moreover, two intersectional sets were identified: A ∩ J ∩ W, which was present in Obando and Zarzal, and M ∩ N ∩ X, which was present in Restrepo. This latter set is also characteristic of Ulloa, as was described previously. Other sets (not shown) were identified but they were not frequent or did not change the described findings. Therefore, in this analysis there were no "sufficient" or "necessary determinants", although set A was the most important from an epidemiologic viewpoint.
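The overlaps reported for the three municipalities with the highest incidences can be reproduced with a short Python sketch. It uses the dual representation, one set of labels per municipality rather than one set of municipalities per label, which is an assumption made here for convenience. The set letters follow the descriptions given above, and the variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Set labels per municipality, following the descriptions in the text.
profiles = {
    "Pradera": {"A", "H", "I", "R", "V"},
    "El Dovio": {"A", "G", "K", "N", "V"},
    "Ulloa": {"D", "H", "M", "N", "X"},
}

# Labels shared by each pair of municipalities.
shared = {
    (a, b): profiles[a] & profiles[b]
    for a, b in combinations(profiles, 2)
}

# Labels common to all three municipalities.
common_to_all = set.intersection(*profiles.values())

# How many of the three municipalities carry each label.
counts = Counter(label for labels in profiles.values() for label in labels)
in_two = {label for label, n in counts.items() if n == 2}

for pair, labels in shared.items():
    print(pair, sorted(labels))
print(sorted(in_two), common_to_all)
```

Under these assumptions, Pradera and El Dovio share A and V, no label is common to all three municipalities, and the labels present in exactly two municipalities are A, H, N, and V, in line with the text.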

Discussion
The chickenpox outbreak described herein affected more than 26,000 individuals in the Cauca Valley region. The distribution of the outbreak was bimodal, with two peaks: the first in 2004 and the second in 2007. The set analysis identified specific municipal characteristics potentially related to the outbreak. The data suggest that municipalities with poorer education quality (set A) were more prone to higher incidences. This set and the adjacent higher category were found in 12 (28.6%) and 22 (52.4%) municipalities, respectively.
Unfortunately, we do not know of similar studies with which to compare these results, since the classic approach to studying outbreaks tends to identify causes only in terms of biological characteristics present among individuals. The study by Pollock & Golding 24, although it included some socioeconomic variables, did not explore contextual variables as in the present study. These authors included 21,123 British children and their main result was that social advantage was linked to patterns of susceptibility to VZV infection. However, this finding is not comparable with our results because the epidemiology of VZV in tropical climates is different. While in temperate climates children are the more affected population, in tropical climates chickenpox mainly affects young adults. This observation contrasts with Cauca Valley data, where all age groups were affected, suggesting the importance of determinants other than biological attributes, such as humidity or temperature.
Another important result was the usefulness of set theory to describe and contrast contextual attributes related to the outbreak. Due to the small sample size available, the use of conventional statistical methods does not provide clear results. Moreover, it is important to remember that the utilization of statistical methods based on probabilities is useful when wanting to make inferences to a population based on a sample; our study did not have an interest in making inferences to a "supra-population".
In addition, using the census of Cauca Valley's municipalities, with 100% of the available data, the identification of the specific socio-historical attributes (or context) of the outbreak was attempted. This was all the more relevant because in 2008, one year after the latest data used in our study, there was a sugar cane workers' strike, for which one of the most important motivations was poor working conditions. The fact that health and labor problems occurred jointly requires studying the context that allows for the development of both social phenomena.
Set theory methods with different degrees of complexity have been used in several fields, such as diagnostic imaging, genetics, gerontology, homeopathy, immunology, pneumology, and pharmacology. Of these, the only epidemiological studies are those by Soriano et al. 25 and Viegi et al. 26. To our knowledge, the present study is the first investigation of an outbreak using set theory. It is our opinion that the use of mathematical methods such as set theory can be complementary when used with an understanding of their rationale. With mathematics, it is possible to explore data in cases that do not fulfill the law of large numbers. Although small-N studies are not frequently used in epidemiology (the best-known exception is the "N = 1 clinical trial") 27, they are an important methodological tool for social sciences (for instance, in historical and comparative research) 23,28,29,30. They are acceptable to describe differences, ascertain determinants, and establish causality in a few cases. According to Lieberson 30, causality in small-N studies is possible only if four assumptions are accepted: (i) a deterministic causal approach, (ii) no measurement errors, (iii) unicausality, and (iv) absence of interaction. With the use of set theory, however, the last two assumptions are not necessary. Union and intersection operations accept the multicausality of disease and allow for exploring interactions between variables; nevertheless, the first two assumptions must be met.

Sets shown in Figure 2: A = education quality less than 50%; D = education quality more than 90%; G = vaccination coverage between 50-80%; H = vaccination coverage more than 80%; I = inconsistent data for subsidized regime; K = subsidized regime between 65-75%; M = subsidized regime more than 90%; N = no sugar cane plantations; R = planted sugar cane more than 10,000 hectares; V = violence-related immigration between 26-50/10,000 inhabitants, and X = violence-related immigration more than 75/10,000 inhabitants.
Some limitations of this study should be considered to understand the scope of the results described. The most important constraint is the lack of understanding of the whole causal web involved in the complexity of a single outbreak. We recognize that many non-measured factors may potentially be related to the occurrence of an outbreak. And yet, this study was able to identify some of the contextual attributes related to the high incidence of VZV in certain municipalities. Additionally, in this study measurement error was possibly present for all attributes analyzed.
In conclusion, the study of outbreaks requires more than individual variables to fully understand causality. The contexts in which high incidences occur can be different (variables) or constant when analyzing certain populations, but with conventional epidemiologic methods it is only possible to explore them in cases of heterogeneity. Thus, the description and contrast of contexts is important, even with small sample sizes. This study made such an analysis possible by using simple set theory operations. Furthermore, future studies could incorporate a similar approach to improve the understanding of the causes of outbreaks. Simple tools such as those described herein may better integrate field epidemiology and social epidemiology. These methods can be used to increase the number of studies on the social epidemiology of infectious diseases 31.

Chickenpox; Disease Outbreaks; Delivery of Health Care
Cad. Saúde Pública, Rio de Janeiro, 27(7):1393-1402, jul, 2011

Contributors
A. J. Idrovo participated in the study design, analysis and interpretation of data, article write-up and approval of the final version. C. Albavera-Hernández and J. M. Rodríguez-Hernández participated in the analysis and data interpretation, write-up and approval of the final version.

Education quality (%)
Rule 1: less than 50. A = {x ∈ S | x meets rule 1}
Rule 2: 50-65. B = {x ∈ S | x meets rule 2}
Rule 3: 66-90. C = {x ∈ S | x meets rule 3}
Rule 4: more than 90. D = {x ∈ S | x meets rule 4}

Vaccination coverage (%)
Rule 5: inconsistent data. E = {x ∈ S | x meets rule 5}
Rule 6: less than 50. F = {x ∈ S | x meets rule 6}
Rule 7: 50-80. G = {x ∈ S | x meets rule 7}
Rule 8: more than 80. H = {x ∈ S | x meets rule 8}

Subsidized regime (%)
Rule 9: inconsistent data. I = {x ∈ S | x meets rule 9}
Rule 10: less than 65. J = {x ∈ S | x meets rule 10}
Rule 11: 65-75. K = {x ∈ S | x meets rule 11}
Rule 12: 76-90. L = {x ∈ S | x meets rule 12}
Rule 13: more than 90. M = {x ∈ S | x meets rule 13}

Planted sugar cane (2007)
Rule 14: no plantations. N = {x ∈ S | x meets rule 14}
Rule 15: less than 1,500 hectares. O = {x ∈ S | x meets rule 15}
Rule 16: 1,501-5,000 hectares. P = {x ∈ S | x meets rule 16}
Rule 17: 5,001-10,000 hectares. Q = {x ∈ S | x meets rule 17}
Rule 18: more than 10,000 hectares. R = {x ∈ S | x meets rule 18}

Violence-related immigration (2006-2007)
Rule 19: no displacement. S = {x ∈ S | x meets rule 19}
Rule 20: less than 25/10,000 inhabitants. T = {x ∈ S | x meets rule 20}
Rule 21: 26-50/10,000 inhabitants. V = {x ∈ S | x meets rule 21}
Rule 22: 51-75/10,000 inhabitants. W = {x ∈ S | x meets rule 22}
Rule 23: more than 75/10,000 inhabitants. X = {x ∈ S | x meets rule 23}
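The rules above amount to binning each indicator into a labeled set. A minimal Python sketch for two of the indicators follows. The function names `education_set` and `vaccination_set` are hypothetical, the use of `None` to mark inconsistent data is an assumption, and so is the treatment of values that fall between the listed ranges (for example, between 65 and 66), which the rules do not specify.

```python
from typing import Optional

def education_set(pct: float) -> str:
    """Assign an education-quality percentage to set A, B, C or D (rules 1-4)."""
    if pct < 50:
        return "A"
    if pct <= 65:
        return "B"
    if pct <= 90:  # assumption: values between 65 and 66 fall in C
        return "C"
    return "D"

def vaccination_set(pct: Optional[float]) -> str:
    """Assign vaccination coverage to set E, F, G or H (rules 5-8)."""
    if pct is None:  # assumption: None marks inconsistent data
        return "E"
    if pct < 50:
        return "F"
    if pct <= 80:
        return "G"
    return "H"

# Hypothetical input values, chosen only to exercise each branch.
print(education_set(45.0), education_set(95.0))
print(vaccination_set(None), vaccination_set(85.0))
```

The same pattern would extend to the remaining indicators, each with its own thresholds and labels.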
Figure 2
Edwards-Venn diagrams with the main five sets identified in the analysis for the three municipalities with higher chickenpox incidences.

Table 1
Characteristics of Cauca Valley municipalities, Colombia.