Implementation of Respondent-driven Sampling among Female Sex Workers in Brazil, 2009

Abstract
Female sex workers are known in Brazil and elsewhere in the world as one of the most-at-risk populations for HIV infection, due to their social vulnerability and factors related to their work. However, the use of conventional sampling strategies in studies on most-at-risk subgroups for HIV is generally problematic, since such subgroups are small in size and are associated with stigmatized behaviors and/or illegal activities. In 1997, a probabilistic sampling technique was proposed for hard-to-reach populations, called Respondent-Driven Sampling (RDS). The method is considered a variant of chain sampling and allows the statistical estimation of target variables. This article describes some assumptions of RDS and all the implementation stages in a study of 2,523 female sex workers in 10 Brazilian cities. RDS proved appropriate for recruiting sex workers, allowing the selection of a probabilistic sample and the collection of previously missing information on this group in Brazil.


Introduction
At the national and international levels, HIV prevention work has been based on the natural history of infection, experiences in specific AIDS programs, and mathematical and simulation models 1,2. Increasing importance has been assigned to most-at-risk population groups, which, although relatively small in size, play a preponderant role in the spread of the disease, especially in countries with concentrated epidemics 3.
In relation to sexual transmission, female sex workers as a group are recognized as a most-at-risk population for HIV infection, both in Brazil 4,5,6 and in various other countries 7,8,9,10,11,12, due to their social vulnerability and factors related to their work, such as multiple sex partners. In addition to sexually transmitted infections (STI), which act as co-factors 13, studies of sex workers have shown that time in the profession and the consumption of illegal drugs are also associated with greater risk of HIV infection 4,5,14,15,16.
In Brazil, the size of the subgroup of female sex workers has been estimated at 1% of the Brazilian female population 15-49 years of age 17, that is, more than half a million women. Since 2002, prostitution has been officially recognized as a profession in the country (Ministry of Labor and Employment, Brazilian Classification of Occupations. http://www.mtecbo.gov.br/cbosite/pages/home.jsf, accessed on Oct/2009), but is still
The situation of increased risk of HIV infection, syphilis, and other STI led to several studies on female sex workers in Brazil beginning in the 1990s 5,19. However, these studies generally used convenience samples, thus hindering the estimation of variables for nationwide monitoring of the HIV epidemic in this population group 20.
Studies in most-at-risk subgroups for STI through conventional sampling strategies are generally problematic. Since these subgroups are numerically small and linked to stigmatized or illegal activities, they are considered hard-to-reach populations 21. Since the mid-1990s, the development of probabilistic sampling methods has been encouraged, not only to ease the difficulties in access, but also to allow statistical estimation of the target variables 21,22,23. In this context, time-space sampling (TSS) 24 and respondent-driven sampling (RDS) 25 were proposed.
Since it was first proposed, RDS has been widely used in various countries in studies of most-at-risk population subgroups for HIV 26,27,28,29,30,31,32,33. In a recent review of studies in the literature that used RDS in hard-to-reach populations, Johnston et al. 34 identified 128 HIV surveillance studies performed outside the United States from 2003 to 2007. Likewise, Malekinejad et al. 35 identified 123 studies, of which 59 were in Europe, 40 in Asia/Pacific, 14 in Latin America, 7 in Africa, and 3 in Oceania.
In Brazil, the commitment to HIV surveillance in most-at-risk groups required nationally representative studies. In 2006, the Department of STD, AIDS, and Viral Hepatitis, under the Ministry of Health, promoted the transfer of sampling methodology for hard-to-reach populations 36. In 2008 and 2009, research projects were conducted in three population subgroups (men who have sex with men [MSM], female sex workers, and drug users), using RDS as the sampling method. The current article aims to describe some assumptions of the RDS sampling method and all the stages in the method's implementation in a study of female sex workers, thereby disseminating in the Brazilian literature a recently proposed sampling methodology for hard-to-reach populations.

Assumptions of the RDS method
The RDS method was proposed in 1997 by Heckathorn 25. It is considered a variant of chain sampling, since members of the target population group recruit their own peers to participate in the study, assuming that capturing individuals in hard-to-reach populations is facilitated when the recruitment is done by individuals from the same population, who are part of a social network.
However, unlike other, non-probabilistic chain sampling methods like the so-called "snowball" technique 20 , RDS is implemented under statistical assumptions that allow calculating the selection probabilities, thus making it a probabilistic sampling method.
RDS assumes that persons with a given characteristic or activity are connected in a social network and have links to other persons with similar characteristics. The data collection is done through successive recruitment cycles called "waves". First, individuals from the target population called "seeds" are selected non-randomly to participate in the study 25. The seeds are asked to recruit a fixed number of peers from among their friends and acquaintances in the same population subgroup. The peers recruited by the seed also recruit other peers, and so on.
The participants' tendency to recruit peers with similar characteristics to their own has been called the "homophily effect" 37. To account for the non-random selection of individuals and the possible over-representation of individuals with given characteristics in the study population, the recruitment is modeled by a Markov process 22,38.
A Markov process is a phenomenon that can be classified into a finite number of discrete states, in which the probability of transition between states over a discrete time interval depends only on the current state, not on the states that preceded it 39. Furthermore, according to the "law of large numbers" 40, the probability of the process being in any given state after a large number of steps is independent of the state in which it began. In the case of recruitment in the RDS method, this means that memory of the recruitment extends only from one wave to the next, that is, the characteristics of the recruited individual depend only on the characteristics of his/her direct recruiter, and not on those of the recruiter's recruiter or of any participant in previous waves. After a sufficient number of waves, the characteristics of the individuals in the final sample are independent of the seed's characteristics.
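The convergence described above can be illustrated with a minimal sketch. It assumes a hypothetical two-group population and made-up recruitment probabilities (the matrix `P` below is not taken from the study) and simply tracks how the expected composition of each wave changes when the chain starts from a seed in one group or the other.

```python
def step(dist, P):
    """Advance one recruitment wave: multiply the state distribution by P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(seed_state, P, waves):
    """Expected group composition after `waves` waves, starting from one seed."""
    dist = [0.0] * len(P)
    dist[seed_state] = 1.0  # the seed belongs to a single known group
    for _ in range(waves):
        dist = step(dist, P)
    return dist


# Hypothetical transition probabilities (illustrative only):
# row = recruiter's group, column = recruit's group.
P = [
    [0.7, 0.3],
    [0.4, 0.6],
]

if __name__ == "__main__":
    for waves in (1, 3, 6, 12):
        from_group_0 = distribution_after(0, P, waves)
        from_group_1 = distribution_after(1, P, waves)
        print(waves, from_group_0, from_group_1)
```

With these illustrative values the two starting points give nearly the same composition after a handful of waves. Raising the diagonal entries of `P`, which corresponds to stronger within-group recruitment, slows the convergence.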
Theoretically, the suggested number of waves for reaching equilibrium is six, since according to Watts 41, all individuals are connected to each other through only a half dozen links. However, if the homophily effect is large, the sampling process needs to consist of a sufficiently large number of waves in order to reach Markov equilibrium (for given variables) and the minimum estimated sample size.
In the RDS method, recruitment is done on the basis of a set number of coupons (usually 2 to 4) distributed to participants in order to invite their peers. To participate in the study, the invited individuals must present their coupons as a guarantee of the network recruitment. Each coupon has a serial number which is used later to reconstruct the social network and the recruitment patterns. Distribution of a set number of coupons to all the participants decreases the possibility that recruiters with large social networks will be overrepresented in the sample as compared to those with small networks. It also reduces the initial influence of the seeds on the final sample and stimulates long recruitment chains, increasing the sample's power to capture "hidden" individuals in the study population 25,38.
The RDS method also draws on the strategy of giving incentives to the participants. The first or "primary" incentive is given to participants when they complete their participation in the study. The second, called the secondary incentive, is given to participants for each peer successfully recruited into the study. Incentives have been used to promote the recruitment and thus the growth of the networks. They can be monetary or non-monetary, such as non-perishable foods, food stamps, or tickets to shows or nightclubs. Choice of the type of incentive depends on the target group's characteristics.
In relation to the study venue, it is important that the study be carried out in a place that is easy for the participants to reach and free of prejudice. Health services are often chosen, in order to promote their use by the study population.
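As a rough sketch of how coupon serial numbers can be turned into recruitment patterns, the snippet below takes a mapping from each participant to her recruiter and derives the wave of every participant and the number of coupons each recruiter had redeemed. The identifiers and the `recruiter_of` structure are hypothetical and chosen only for illustration.

```python
from collections import Counter


def wave_numbers(recruiter_of):
    """Return {participant: wave}. Seeds (recruiter None) are wave 0."""
    waves = {}
    for pid in recruiter_of:
        # Walk up the chain until reaching a seed or an already-resolved member.
        chain = []
        current = pid
        while current is not None and current not in waves:
            chain.append(current)
            current = recruiter_of[current]
        base = -1 if current is None else waves[current]
        # Assign waves from the top of the chain down to `pid`.
        for offset, member in enumerate(reversed(chain), start=1):
            waves[member] = base + offset
    return waves


def coupons_redeemed(recruiter_of):
    """Count how many recruits each recruiter brought into the study."""
    return Counter(r for r in recruiter_of.values() if r is not None)


if __name__ == "__main__":
    # Hypothetical participant ids; None marks a seed.
    recruiter_of = {"S1": None, "A": "S1", "B": "S1", "C": "A", "D": "C"}
    print(wave_numbers(recruiter_of))
    print(coupons_redeemed(recruiter_of))
```

A check such as `max(coupons_redeemed(recruiter_of).values()) <= 3` could then flag any recruiter who appears to have redeemed more coupons than were handed out.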

Analysis of data collected with RDS
In 2004, Salganik & Heckathorn 22 proposed that data collected with RDS be weighted according to an expansion factor based on the inverse of the selection probability. Under the hypothesis that the seed is selected with probability proportional to her/his network size, and that participants randomly recruit their peers from their network of friends and acquaintances, the authors demonstrated that the probability of selecting each participant is also proportional to the network size, i.e., to the number of acquaintances known in the target population.
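The weighting idea can be sketched as follows: if the selection probability is proportional to network size, each participant receives a weight proportional to the inverse of her reported network size, and a prevalence is estimated as a weighted proportion. This is a simplified illustration with made-up values, not necessarily the exact estimator used by the cited authors.

```python
def weighted_prevalence(outcomes, network_sizes):
    """Weighted proportion with weights proportional to 1 / network size.

    outcomes: 0/1 indicators (e.g. 1 = positive test result).
    network_sizes: reported number of peers known, one per participant.
    """
    if len(outcomes) != len(network_sizes):
        raise ValueError("outcomes and network_sizes must have the same length")
    if any(size <= 0 for size in network_sizes):
        raise ValueError("network sizes must be positive")
    weights = [1.0 / size for size in network_sizes]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical data: four participants.
    outcomes = [1, 0, 0, 1]
    network_sizes = [2, 10, 10, 5]
    print("unweighted:", sum(outcomes) / len(outcomes))
    print("weighted:  ", weighted_prevalence(outcomes, network_sizes))
```

Participants with small networks, who are less likely to be reached by a coupon, count for more in the weighted estimate than those with large networks.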
In order to measure the effect of the sampling plan on the variance of the mean estimates, one uses the so-called "design effect", which is calculated as the ratio between the estimate of the variance determined by the sampling plan and the estimate of the variance obtained by a simple random sample of the same size. Design effect is also used to calculate the sample size of future studies that may use the same complex sampling design 42.
In the case of samples collected with RDS, Salganik 43 initially estimated design effects of around 2, through "bootstrap" simulation methods, whose general principle is to use the observed sample to generate a set of replicate samples and produce a set of replicas of the estimates, thereby allowing calculation of the variance and confidence intervals. Later, simulation models that take into account the homophily effect and dependence of observations showed much larger design effects, greater than 4 44. In 2010, the same authors criticized the previously proposed estimators to calculate the variance and confidence intervals 45.
For statistical analysis of data collected with RDS, an application was developed called "Respondent-Driven Sampling Analysis Tool" (RDSAT; http://www.respondentdrivensampling.org/reports/RDSAT_56_Manual.pdf, accessed on 14/Aug/2009), which weights and adjusts the data according to the recruitment patterns observed during the study, considering the homophily effect and the size of each individual's social network. However, the application presents important limitations, like the impossibility of performing multivariate analyses.
Since RDS is a complex and relatively new sampling method, the appropriate statistical techniques for analyzing the collected data are still being developed. For example, over time, two distinct estimators have been proposed for estimating prevalence rates, which have been discussed in the recent literature on RDS 46. While the point estimators are quite similar, some questions remain about the best way to estimate variance 47.
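The definition above translates directly into a short calculation. The sketch below uses hypothetical variance values and shows the two uses mentioned in the text: computing the design effect as a ratio of variances, and inflating a simple-random-sample size by that factor when planning a future study.

```python
import math


def srs_variance_of_proportion(p, n):
    """Variance of a proportion under simple random sampling of size n."""
    return p * (1.0 - p) / n


def design_effect(var_design, var_srs):
    """Ratio of the variance under the actual design to the SRS variance."""
    return var_design / var_srs


def inflated_sample_size(n_srs, deff):
    """Sample size needed under the complex design for the same precision."""
    return math.ceil(n_srs * deff)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    var_srs = srs_variance_of_proportion(p=0.10, n=900)
    var_design = 2.0 * var_srs  # pretend the design doubles the variance
    print("design effect:", design_effect(var_design, var_srs))
    print("sample size:  ", inflated_sample_size(1000, 1.5))
```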

Implementation of RDS among female sex workers in Brazil
The study, called the "Health Chain", was carried out in Brazil from August 2008 to July 2009 and aimed to estimate the prevalence rates for HIV and syphilis and to identify knowledge, attitudes, and practices related to HIV infection and other sexually transmitted diseases in female sex workers. The research project was approved by the Institutional Review Board of the Oswaldo Cruz Foundation (case no. 395/07).
To conduct the research, the Department of STD, AIDS, and Viral Hepatitis chose 10 Brazilian municipalities (counties) based on the size and importance of the local AIDS epidemic. The sample size (2,500 women) was calculated by estimating a 6% HIV prevalence rate, with a 95% confidence interval and a two-tailed error of 1.5%, considering a design effect of 1.5. In each of the municipalities, the attempt was made to distribute the sample proportionally to the municipality's population, while setting a minimum sample of 100 women. Table 1 shows the distribution of the planned and collected sample in the 10 municipalities.
Before the data collection phase, a preparatory survey was done in each municipality to facilitate the study's implementation at each site according to the characteristics of prostitution in each city. The following issues were addressed: identification of the main prostitution venues, diversity of types of female sex workers present in the city, choice of the health service for conducting the study, and identification of possible seeds. In the preparatory survey, focus groups and/or in-depth interviews were held with the sex workers and their local representatives, using a previously prepared script and qualitative methodology 48.
In each municipality, five to ten initial participants, called "seeds", were selected non-randomly. Seeds were chosen with differences in age group, color/race, socioeconomic class, schooling, and types of work venues, like the street, nightclubs, saunas, hotels, call services, and others. In addition, the initial focus was on sex workers who were well-connected in the community and who had reported extensive social networks during the preparatory survey.
Each seed received three coupons to give to her friends or acquaintances. The women invited by the seeds who participated in the study comprised the first "wave". After participating in the interview, they received three coupons themselves to invite their friends and acquaintances. This process was repeated until the planned sample size was reached at each site. Figure 1 shows the flow of participation by each sex worker, beginning with her arrival at the health service.
The coupons were numbered in order to prevent counterfeiting and to allow identifying the recruitment patterns in the population. The participants were only identified by the identification code on the coupon, consisting of two letters (the municipality's initials) and a set of numbers indicating the seed and the wave. The number of coupons handed out for each participant to recruit her peers was predetermined (three). In order to manage the invitations, a program was generated on ACCESS (Microsoft Corp., USA), called "Invitations Manager".
The eligibility criteria were the following: (1) age 18 years or older; (2) being a woman; (3) working as a sex worker in the municipality where the study was being conducted; (4) having had sex in exchange for money at least once in the previous four months; (5) agreeing to participate under the study conditions and to sign the free and informed consent form; (6) presenting a valid invitation to participate in the study; (7) not having participated in the study previously; and (8) not being under the influence of illegal drugs or alcohol at the time of the interview.
The study consisted of a preliminary interview to verify the eligibility criteria and characterize the network and work venue, a self-completed computer questionnaire, and rapid tests for syphilis and HIV. In all ten municipalities, the fieldwork was done in health services, with the teams consisting of: a supervisor, in charge of organizing the field work in the health service, administering the invitations, and supervising the teamwork in the service; an interviewer, in charge of applying the behavioral questionnaire; a counselor, in charge of pre- and post-test counseling; and the testing professional, a health worker with university training in a biomedical field, responsible for applying and interpreting the HIV and syphilis tests. The research teams were also responsible for monitoring the networks and encouraging participation by the sex workers. For this purpose, promotional materials for the study were prepared and distributed in the main prostitution venues in the target cities, with the following slogan: "If you receive an invitation from the Health Chain, be sure to participate. Help form the chain of health".
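The study questionnaire was self-completed and included questions on the following themes: (1) socio-demographic characteristics (age, schooling, marital status, color/race, religion, economic class, etc.); (2) knowledge, attitudes, perceptions, and risk practices for HIV and other STI; (3) HIV transmission routes and forms of prevention; (4) history of HIV/STI testing, knowledge of serological status; (5) history of STI and presentation of symptoms; (6) sexual initiation; (7) number and types of sex partners (men, women; occasional, stable, commercial); (8) condom use (last sexual relations, regularity, unprotected relations with different types of partners and different types of sexual practices); (9) characteristics of the sex trade; (10) drug and alcohol use; and (11) access to preventive activities. The questionnaire was filled out directly by the interviewee on a computer using a program called Audio Computer-Assisted Self-Interview (ACASI; Tufts University, Boston, USA. http://www.tufts.edu/med/nutrition-infection/tnccdaar/acasi/acasi.html). This program allowed the interviewee to listen to all the questions and possible answers, thus facilitating understanding of the questionnaire and simultaneously guaranteeing confidentiality.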
To allow calculating the prevalence rates for HIV and syphilis among female sex workers, rapid tests were performed on the participants (using capillary blood samples), complying with the guidelines and protocols of the Department of STD, AIDS, and Viral Hepatitis, with pre- and post-test counseling by the team counselor. The health workers who performed the rapid tests had university training in a health field and had been specifically trained by the Department of STD, AIDS, and Viral Hepatitis before the study began.
The participants who tested positive received counseling in order to minimize the psychological risks and to notify their steady partners, and were referred to health services under the Unified National Health System (SUS) for the appropriate follow-up according to the guidelines of the Department of STD, AIDS, and Viral Hepatitis. The services for referral of positive cases in each participating municipality were identified before launching the fieldwork.
During the research project's implementation, each participant received the primary incentive when she completed the interview, plus an additional reward (secondary incentive) in the form of cash (varying from BR$5.00 to BR$10.00) for recruiting each peer who successfully completed the study. The primary incentive consisted of condoms, lubricant, educational material, food stamps, transportation vouchers, a snack along with the interview, and a small prize chosen from a variety of beauty products, like a hairbrush, make-up, or manicure kit.
As for the statistical analysis, the data were weighted using the following question to measure the network size: "How many female sex workers do you know personally?". In each municipality, the expansion factors were inversely proportional to the network size. In addition, since the study was done in 10 municipalities, in order to produce results for the entire sample, calibration was performed according to the relative number of women 15-49 years of age in each municipality, assuming a constant proportion of female sex workers in all the sites studied. Interestingly, unlike other studies that used the RDS methodology, the statistical analysis was done for the entire sample, with the data collected in the ten municipalities. The methods proposed for data analysis, including estimation of the variance and design effect in the prevalence rates, as well as the multivariate logistic regression models, are beyond the scope of the current article and will be presented in subsequent publications.
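One possible reading of the two-step weighting described above is sketched below: within each municipality, weights are proportional to the inverse of the reported network size, and the weights are then rescaled so that each municipality's total matches its share of women 15-49 years of age. The municipality labels and population figures are hypothetical, and the details of the study's actual calibration may differ.

```python
from collections import defaultdict


def calibrated_weights(records, female_pop_15_49):
    """Return one weight per record; all weights together sum to 1.

    records: list of (municipality, network_size) tuples.
    female_pop_15_49: {municipality: number of women aged 15-49}.
    """
    # Step 1: raw weight inversely proportional to network size.
    raw = [1.0 / size for _, size in records]

    # Sum of raw weights within each municipality.
    raw_total = defaultdict(float)
    for (city, _), w in zip(records, raw):
        raw_total[city] += w

    # Step 2: rescale so each municipality's weights sum to its population
    # share among the municipalities that actually appear in the sample.
    total_pop = sum(female_pop_15_49[city] for city in raw_total)
    return [
        (w / raw_total[city]) * (female_pop_15_49[city] / total_pop)
        for (city, _), w in zip(records, raw)
    ]


if __name__ == "__main__":
    # Hypothetical municipalities and populations.
    records = [("A", 2), ("A", 4), ("B", 5)]
    population = {"A": 300, "B": 100}
    print(calibrated_weights(records, population))
```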
Concerning the possible selection bias described by Heckathorn in 2002 38, in the case of the Chain of Health project, the homophily effect occurred according to HIV serological status, since participants with HIV infection tended to recruit other HIV-positive participants. These findings indicate a strong dependence among the observations, which needs to be considered in the data analysis. By way of illustration, Figure 2 shows the network of female sex workers participating in the study in the city of Rio de Janeiro, where this tendency can be observed.
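A simple way to look for this kind of pattern in recruitment data is to compare the proportion of HIV-positive recruits among HIV-positive and HIV-negative recruiters. The sketch below does this for a hypothetical list of recruiter-recruit pairs; it is a descriptive cross-tabulation only, not a formal homophily index or the analysis performed in the study.

```python
def positivity_by_recruiter_status(pairs):
    """Proportion of positive recruits, split by the recruiter's status.

    pairs: iterable of (recruiter_positive, recruit_positive) booleans.
    Returns {True: proportion, False: proportion}; None if no recruits.
    """
    # For each recruiter status: [number of positive recruits, all recruits]
    counts = {True: [0, 0], False: [0, 0]}
    for recruiter_positive, recruit_positive in pairs:
        counts[recruiter_positive][1] += 1
        if recruit_positive:
            counts[recruiter_positive][0] += 1
    return {
        status: (positive / total if total else None)
        for status, (positive, total) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical recruitment pairs.
    pairs = [
        (True, True), (True, False), (True, True),
        (False, False), (False, False), (False, True),
    ]
    print(positivity_by_recruiter_status(pairs))
```

A markedly higher proportion in the `True` row than in the `False` row would be consistent with the tendency described above.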

Discussion
RDS is a method based on various statistical assumptions in order to reach a representative sample of the study population. Despite its potential to produce probabilistic samples, since it is a relatively recent method, it has been analyzed and rethought with each new application of the procedure 20,21,22,25,37,43,49.
In Brazil, besides making it possible to fill in existing information gaps on the population of female sex workers, the study enabled evaluating the experience in the application of the RDS sampling method in this population subgroup in ten Brazilian municipalities.
In relation to the incentives, the primary incentive was used from the beginning of the study in all the sites. The secondary incentive was not used at the beginning of the study, on the assumption that there would be spontaneous interest in participating, based on the concepts of solidarity and the social networks among the female sex workers. However, in the majority of the participating municipalities, it proved necessary to use the secondary incentive in order to bolster participation in the study, since the networks were developing too slowly and there were constraints on time and resources to conduct the research. According to Heckathorn 25, while the primary incentive motivates individual participation based on autonomous decisions in the concept of cooperation, the secondary incentive brings the social influence to the surface, which is taken advantage of in the sampling process.
However, provision of the secondary incentive has generated controversy in the international literature 50. On the one hand, the dual incentive stimulates both recruitment and participation in the study; on the other, it can become a kind of parallel currency in groups with low socioeconomic status, generating duplicate enrollments in the sample, participation of non-eligible individuals, and even clashes among individuals in the population segment.
In the case of the Health Chain study, the use of the secondary incentive served to stimulate and speed up the inclusion of participants, and had the merit of obtaining the participation of "hidden", socially excluded sex workers, who had no awareness of the importance of their participation and were interested only in the incentives they were going to receive. However, although the process fostered the use of health services (even if artificially), the desire to participate without meeting the eligibility criteria and the improper receipt of the incentive led to cases of physical assault against the research team.
In the application of RDS among female sex workers in Brazil, a link was also made with the social movement. In each of the municipalities in the study, focal points were chosen, that is, persons with influence in the community of sex workers, who helped to publicize the study and accompany the participants with invitations to the health service. The use of focal points helped overcome the distrust and fear among sex workers, who are routinely exposed to violence and social discrimination 51.
As for the network size, the study showed that some seeds take a long time to "germinate", while some types germinate faster than others, interfering with the sample's diversity. In addition, due to the homophily effect, a long network does not always guarantee capturing the diversity of types of individuals in the population segment, an essential characteristic for the sample to be representative of the target population. In this sense, the data weighting is essential for the sample to be representative of the population group.
As for the data weighting in the current study, the size of the network among sex workers varies according to the work venue, and is smaller among women that work the streets as compared to other places (nightclubs, saunas, hotels).Thus, the original weighting method will have to be improved, considering the possibility of stratification by work area.
In relation to the homophily effect, a tendency was observed among participants with HIV infection to also recruit HIV-positive peers. Considering that the great majority (75%) of HIV-positive participants were unaware of their serological status, this tendency is probably explained by common characteristics in their peer network, like age, prostitution site, type of client, social condition, price charged for sex, drug use, and other factors, and should be investigated in future studies. From the epidemiological point of view, dependence of observations is interesting because it allows understanding the risk factors associated with the social networks; from the statistical point of view, the structure of dependence should be considered when estimating variables and requires specific statistical techniques for the data analysis.
In the international literature, many authors have used traditional multivariate statistical techniques that assume independent observations 52,53,54,55,56,57,58, while few studies have taken into account the structure of dependence of observations in the data analysis 29,59.
In short, the use of RDS proved appropriate for studying female sex workers in Brazil, since it facilitated recruitment and allowed the selection of a probabilistic sample in this population subgroup. On the other hand, the experience with the implementation of RDS leaves some challenges to be explored, like improvement of the methodology for statistical analysis of data collected with RDS, adequate data calibration, and interpretation of the social networks.

HIV; Prostitution; Sexual Behavior
) knowledge, attitudes, perceptions, and risk practices for HIV and other STI; (3) HIV transmission routes and forms of prevention; (4) history of HIV/STI testing, knowledge of serological status; (5) history of STI and presentation of symptoms; (6) sexual initiation; (7) number and types of sex partners (men, women; occasional, stable, commercial); (8) condom use (last sexual relations, regularity, unprotected relations with different types of partners and different types of sexual practices); (9) characteristic of sex trade; (10) drug and alcohol use; and (11) access to preventive activities.

Figure 1 Participation stages in the Chain of Health project.

Figure 2 Distribution of female sex workers' network in the study in Rio de Janeiro, Brazil, according to HIV sero-prevalence.

Table 1
Planned and actual (collected) sample size in the ten municipalities in the study on respondent-driven sampling in female sex workers. Brazil, 2009.