Epidemiological methods for research with drug misusers: review of methods for studying prevalence and morbidity

Epidemiological studies of drug misusers have until recently relied on two main forms of sampling: probability and convenience. The former has been used when the aim was simply to estimate the prevalence of the condition and the latter when in-depth studies of the characteristics, profiles and behaviour of drug users were required, but each method has its limitations. Probability samples become impracticable when the prevalence of the condition is very low, less than 0.5% for example, or when the condition being studied is a clandestine activity such as illicit drug use. When stratified random samples are used, it may be difficult to obtain a truly representative sample, depending on the quality of the information used to develop the stratification strategy. The main limitation of studies using convenience samples is that the results cannot be generalised to the whole population of drug users because of selection bias and a lack of information concerning the sampling frame. New methods have been developed which aim to overcome some of these difficulties, for example, social network analysis, snowball sampling, capture-recapture techniques, the privileged access interviewer method and contact tracing. All these methods have been applied to the study of drug misuse. The various methods are described and examples of their use given, drawn from both the Brazilian and international drug misuse literature.


INTRODUCTION
Research studies undertaken with drug users usually employ either samples drawn from the general population using probabilistic methods or convenience samples of patients enrolled directly from outpatient or inpatient clinics. As will be discussed below, both methods have advantages and disadvantages. Recently, new methods have been developed which, although non-probabilistic, incorporate significant modifications to the data collection process, enabling more representative samples to be obtained. These modifications allow us to go some way towards making inferences about the general population of drug users, something that would not be possible with ordinary convenience samples. Some of these methods are not exactly "new", but their application to epidemiological research, particularly in the drug misuse field, is a relatively recent occurrence.
In this review both traditional and newer methods will be described, with examples of their use and application drawn from both the international and Brazilian drug misuse literature. The emphasis is on methods that aim to estimate the prevalence of drug misuse or to characterise drug misusers.

Random Sample
The epidemiological gold standard for prevalence studies is the probability or random sample. In such samples all members of the population have the same chance of being selected. For relatively common conditions, such as depression or even alcohol misuse, this method is ideal (Meltzer et al. 39, 1994), but problems arise when the condition has a low or very low prevalence. For example, studies suggest that the prevalence of cocaine use among school-aged children in Brazil is around 0.5% or less (Galduróz et al. 23, 1997). Therefore, to find just one case one would need to interview, on average, at least 200 people. To make a statistically powerful estimate of the true prevalence one would need to interview several thousand pupils. The situation is made worse if one wishes to do more than simply count the number of cases and actually interview subjects in detail.
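The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not a method taken from the studies cited: the function names are hypothetical, the sample size formula assumes a simple random sample and a normal-approximation 95% confidence interval, and the margin of error of 0.2 percentage points is an arbitrary choice made for the example.

```python
import math

def expected_interviews_per_case(prevalence):
    """Average number of people who must be screened to find one case."""
    return 1.0 / prevalence

def sample_size_for_margin(prevalence, margin, z=1.96):
    """Sample size at which a normal-approximation confidence interval
    has the requested half-width (margin), for a simple random sample."""
    return math.ceil(z ** 2 * prevalence * (1 - prevalence) / margin ** 2)

# Prevalence of 0.5%, as quoted in the text for cocaine use among pupils.
print(expected_interviews_per_case(0.005))

# Assumed margin: estimate 0.5% to within +/- 0.2 percentage points.
print(sample_size_for_margin(0.005, 0.002))
```

With these assumed inputs the required sample runs to several thousand pupils, in line with the order of magnitude given in the text.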
Almeida-Filho 1 (1992), in the Brazilian multicentre study of psychiatric morbidity, interviewed a random sample of 6,470 individuals with a psychiatric screening instrument (QMPA) and then assessed in greater detail 30% of identified cases and 10% of non-cases (n=836). The samples came from three non-randomly chosen Brazilian cities: Brasília, S. Paulo and Porto Alegre. Psychiatric morbidity ranged from 30 to 50%, but the prevalence of individual psychiatric syndromes was considerably lower. For example, the prevalence of alcohol abuse or dependence among men was around 15% (a total of 42 cases), whilst for women the prevalence was much lower, 0% in S. Paulo and 2.5% in Porto Alegre. Had one wished to perform a detailed analysis of female alcoholics, there would have been insufficient numbers to do so. Indeed, even the prevalence of alcohol dependence among women reported in this study cannot be estimated with a high degree of statistical confidence. The overall prevalence among women was 1.2% (6 cases in a sample of 501 women) with a 95% confidence interval ranging from 0.2% to 2.1%, a tenfold difference. To achieve a prevalence estimate with a narrower confidence interval, one would need to use an even bigger sample than has yet been used, probably of the order of tens of thousands of individuals. However, if we were to screen so many people, the majority would still not have the condition being studied. Put another way, the ratio of cases to non-cases would be extremely low. If each interview has to be paid for, then the overall cost-efficiency of this method would be very low, and the work would be extremely time consuming.
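The width of the interval quoted for women can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval; the study's actual method is not stated in this review, so this is an assumption, and the function name is hypothetical.

```python
import math

def wald_interval(cases, n, z=1.96):
    """Point estimate and normal-approximation confidence interval
    for a proportion observed as `cases` out of `n`."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# 6 cases of alcohol dependence among 501 women, as reported in the text.
p, lower, upper = wald_interval(6, 501)
print(f"prevalence {p:.4f}, 95% CI {lower:.4f} to {upper:.4f}")
```

Under this assumption the limits fall close to the 0.2% and 2.1% reported, which illustrates how wide the interval is relative to the point estimate when only six cases are observed.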
Two additional problems with probability samples are the effect of randomising clusters on the representativeness of the sample and the difficulty of extrapolating the results to those segments of the population that for some reason were unavailable for study. A random sample of the whole of the population of Brazil would be impossible to achieve for two main reasons. First, one would need a list of the whole population that was complete and up to date. It is unlikely that such a complete list exists: not all births are registered; although identity cards are obligatory, not everyone has one; electoral registers are incomplete; and so on. Even if such a list existed, it could never be kept up to date for very long because of births, deaths and migration. Second, in a large country like Brazil a 1% random sample of the whole population could lead to such a wide geographical distribution of selected individuals that it could take years and vast sums of money to travel to interview each person. Consequently, population prevalence studies using random samples almost always use sequential sampling in which some unit other than the individual is randomised. These units, sometimes described as clusters or conglomerates, may be states, municipalities, electoral wards or neighbourhoods. Unfortunately, clusters are not usually uniform or homogeneous. For example, the State of Bahia is quite different from the State of Rio Grande do Sul, and the favela Buraco Quente in S. Paulo does not share the same characteristics as the more exclusive neighbourhood of Morumbi. If details of these characteristics are available, then it is possible to stratify the sample taking them into account. Important stratifying characteristics might include population density, age structure, degree of urbanisation and prevailing socio-economic level of the inhabitants. However, if these details are not available, one can only guess and hope that the final sample is representative.
The second problem with stratified random samples using clusters is that certain sections of the population may be excluded, depending on the nature of the unit used. In population prevalence studies, the most commonly used clusters are neighbourhoods or geographically defined sectors of a city. However, within these sectors it is usual that only residences, i.e. houses or apartments, are chosen. But not all people live in such accommodation and, in terms of drug use, some high-risk individuals may be missed. For example, there may be a high concentration of drug users in prisons, psychiatric hospitals and hostels. Whether to include such institutions in random samples using neighbourhood conglomerates is a difficult question, because many of the people currently in them may be temporary residents. An additional problem is that there is a higher prevalence of drug misuse among homeless people (Armando et al. 2, 1990), who would be missed by a sample that only included residential accommodation. Sampling the homeless is complicated by the fact that there is usually no information on the sampling frame, i.e. how many homeless people there are and how they are distributed in the city.
Studies of drug users that aim to go beyond estimating prevalence, in particular those that aim to undertake more detailed evaluations of drug users, tend to use non-random samples. Until relatively recently the most commonly used non-random sample was the convenience sample.

Convenience Samples
Convenience samples are generally made up of samples of patients from hospital treatment services or patient data obtained from readily available sources, such as hospital admission statistics, case registers and case notes. When studying problems such as drug dependency, which has a low prevalence, it is far easier to accrue large numbers of patients directly from specialised clinics or residential rehabilitation centres. Such an approach is much more cost-efficient than using a probability sample, as all the patients interviewed will have the condition being studied, although strict inclusion criteria may reduce the proportion of "suitable" cases. Several descriptive studies undertaken in Brazil have used convenience samples to study drug users (Murad 40, 1983; Bastos et al. 5, 1988; Castel & Malbergier 12, 1989; Silveira-Filho & César 48, 1991; Bucher et al. 9, 1995; Dunn et al. 17, 1996). For example, Dunn et al. 17 (1996) used information obtained from electronic databases to investigate changes in the routes of administration of cocaine among drug users attending two out-patient clinics over a four-year period.
Convenience samples have several limitations, the most important of which is their lack of representativeness, which means that results from such studies cannot be generalised to the whole population of drug users. Research has shown that patients attending hospital services have longer histories of drug use, are more severely dependent, have more physical and psychological complications and have more difficulty achieving abstinence (Chitwood & Chitwood 13, 1981; Graeven & Graeven 27, 1983; Carroll & Rounsaville 10, 1992; Rounsaville & Kleber 47, 1985).
The problem of generalisability is illustrated by the study of Dunn et al. 17 (1996), which showed that over a four-year period (1990-1993) there had been an increase in the number and proportion of patients reporting that their preferred route of administration of cocaine was by smoking. At the same time there had been a reduction in the proportion of users who were injecting cocaine. But what extrapolations can we make about routes of cocaine administration in the whole population of drug users in S. Paulo during the same period? Unfortunately, we cannot say that there was a true move towards smoking crack-cocaine among drug users; other factors may have produced or at least influenced the results. It may well be that since crack arrived in Brazil in the late 1980s, the number of cocaine injectors and snorters has remained the same and that crack users represent a new and growing population of people who would not otherwise have experimented with cocaine. Alternatively, cocaine snorters and injectors may have found smoking crack-cocaine more enjoyable and therefore given up their previous routes of administration. Another possibility is that intravenous cocaine users, many of whom are infected with HIV (WHO Collaborative Study Group 51, 1993), have been dying of AIDS, leading to a relative increase in the number of crack users. Or perhaps the two clinics studied became famous for treating crack users during this period and, therefore, began to attract a disproportionate number of such referrals.
Measures can be taken to try to ensure that convenience samples are more heterogeneous. This can be done by ensuring that patients are drawn from a variety of settings, so that the characteristics of the final sample are more varied. To achieve this one would need to know something about the general characteristics of patients in contact with all the different kinds of treatment services. Such services might include emergency services, outpatient clinics, inpatient units, private clinics, residential rehabilitation centres, non-governmental organisations and self-help groups. One would also need to know how the characteristics of patients in contact with such services differed from those of drug users in the community, i.e. the sampling frame. Unfortunately, this information is rarely available, so even with careful sampling, one cannot be sure that the final sample will share the same characteristics as the general population of drug users.

Social Network Analysis
Social network analysis (O'Reilly 43, 1988) is a relatively new method that has been applied to the study of drug users. Most research in this field has been done on the role of peer influence in adolescent drug use (Baurman & Ennett 6, 1996), post-treatment abstinence (Goehl et al. 25, 1993) and HIV transmission (Pivnick et al. 45, 1994). A person's social network consists of all those people with whom the individual has contact and in particular those with whom he or she shares some sort of emotional tie. With this approach one attempts to identify the extent of an individual's social network, characterise its members and plot it graphically (Neaigus et al. 42, 1994). Initially the patient is asked to name all the people with whom he or she is in close and regular contact, or has had contact in a specified time period, e.g. during the last month. Some researchers use criteria to define what is meant by a "close" relationship, for example, someone with whom you could discuss a personal problem, borrow money, go on holiday or celebrate a birthday (Fraser & Hawkins 20, 1984). The idea is not simply to gather a list of relatives, friends and acquaintances but to identify individuals with whom the subject has some proximity. Specific questions are then asked to try to characterise the different subgroups or domains within the network, for example, family members, sexual partners, friends, work-mates, neighbours and drug users. The characteristics of each member are recorded, either by direct interviewing or from information obtained from the patient. Measures are made of network size (total number of members in the network) and network density (the number of members who know one another).
Social networks have been found to promote conforming behaviour, which may be either conventional behaviour (Philips 44, 1981) or deviant, such as delinquency (Cohen 14, 1955). Fraser & Hawkins 20 (1984) studied the pre-treatment social networks of opioid and non-opioid drug users. They found that the networks of opioid users tended to have a higher proportion of members involved in street crime, fewer members who used only alcohol or cannabis and fewer contacts that came from social organisations, such as the workplace.
Social network studies have also been used to investigate the risk of HIV transmission within and between networks of drug users (Neaigus et al. 42, 1994) as well as the prevalence of drug use and HIV within individual networks (Pivnick et al. 45, 1994). Neaigus et al. 42 (1994) distinguished between risk networks (the people with whom HIV risk behaviour occurs) and social networks (the people with whom there are social interactions) in a study of intravenous drug users in New York City. They discovered that risk networks overlap with social networks, with 70% of injectors sharing equipment with sexual partners, family members, friends or acquaintances. They also discovered that injectors with more frequent social contacts with non-injectors engaged in lower levels of injecting risk behaviour. The authors suggest that HIV prevention programmes should be developed at the level of social networks, using peer pressure to promote risk reduction.
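The two network measures just described can be illustrated with a small sketch. The members and ties below are invented for the example, and the function names are hypothetical. Density is counted here as the number of pairs of members who know one another; expressing it as a proportion of all possible pairs is a common normalisation and is an assumption added for illustration, not something specified in the studies cited.

```python
from itertools import combinations

def network_size(members):
    """Total number of members named by the subject."""
    return len(members)

def known_pairs(members, ties):
    """Number of pairs of network members who know one another."""
    tie_set = {frozenset(tie) for tie in ties}
    return sum(1 for a, b in combinations(members, 2)
               if frozenset((a, b)) in tie_set)

def density_proportion(members, ties):
    """Known pairs as a proportion of all possible pairs (assumed normalisation)."""
    possible = len(members) * (len(members) - 1) // 2
    return known_pairs(members, ties) / possible if possible else 0.0

# Hypothetical network of one subject.
members = ["mother", "partner", "friend_a", "friend_b"]
ties = [("mother", "partner"), ("partner", "friend_a"), ("friend_a", "friend_b")]

print(network_size(members), known_pairs(members, ties), density_proportion(members, ties))
```

Ties are stored as unordered pairs so that it does not matter in which order two members are recorded.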

Capture-Recapture Methods
Capture-recapture methods are relatively new in terms of epidemiological research but have been used for many years in ecological studies of animal populations (Dunn & Andreoli 16, 1994). The method was first devised by Laplace to estimate the population of France in the eighteenth century and continues to be used to this day in the USA to adjust population estimates made using census data (Wolter 52, 1991). Capture-recapture is mainly used to estimate the prevalence of relatively chronic conditions and is ideally suited to the study of diseases with a low prevalence or "hidden populations" for which probability sampling and direct counting are impracticable (LaPorte 33, 1994).
The simplest form of capture-recapture is the two-sample method. This can best be illustrated using an animal model, for example, how to estimate the number of rabbits on Coney Island. An island is chosen because there is no immigration or emigration of animals. An assumption is made that the population is stable, with the number of births matching the number of deaths. First one would set a series of traps all over the island, placed so that all rabbits had roughly the same chance of being caught. At the end of that day the number of trapped rabbits would be counted. The rabbits would be marked with some indelible substance, such as a daub of paint, to allow future identification and then set free. A few days later, the procedure would be repeated, but perhaps with the traps being placed in different locations to reduce the risk of trap avoidance. On the second occasion one would count both the number of rabbits caught and the number of marked rabbits caught (i.e. rabbits that have been caught on both occasions). The total number of rabbits caught and the number caught twice are proportionally related to the total number of rabbits on the island, and using a relatively simple formula an estimate of the rabbit population of Coney Island can be made, along with appropriate confidence intervals (Dunn & Andreoli 16, 1994). In epidemiological research, instead of "catching patients" we use different sources of patients. One needs to use at least two sources, but the method works better if several sources are used.
Certain assumptions have to be made about these sources. The first is that all subjects who have the condition have an equal chance of being "captured" by each of the sources used, a characteristic known as "equal catchability" (Hook & Regal 31, 1993). This means that a subject's personal characteristics, for example being female, being poor or being black, will not influence his or her chances of appearing at any one of the sources. Inequalities in access to health services often mean that this assumption cannot be sustained.
The second assumption is that the samples are independent of one another, which means that if a subject appears in one source his or her chances of also appearing in another will not be affected. Capture-recapture has been used extensively to estimate the prevalence of drug abuse in various cities throughout the world (Bloor et al. 8, 1991; Simeone et al. 49, 1993; Larson et al. 34, 1994; Mastro et al. 38, 1994; Squires 50, 1995; Korf et al. 32, 1994; Hay & McKeganey 30, 1996). A 1991 study used capture-recapture to estimate the prevalence of intravenous drug misuse in Glasgow, Scotland. It used three sources of patients: those in contact with a range of drug treatment services, a databank of all patients having HIV tests who had been identified as intravenous drug users, and drug users who had been arrested by the police for non-cannabis related drug offences. Positive dependency was found between the drug treatment clinics and the HIV databank, suggesting that drug users who had presented for treatment were more likely to have been referred for an HIV test. Logistic regression analysis was used to correct for the degree of inter-dependence between sources, and a final estimate of 9,424 intravenous drug users was calculated (95% confidence interval 6,964 to 11,884). Despite its limitations this is the most accurate estimate currently available and has been used to shape public health policy, in particular to estimate future demand for treatment and to develop prevention programmes.
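The two-sample calculation can be sketched as follows. The review does not give the formula, so this sketch assumes the classical two-sample estimator usually attributed to Lincoln and Petersen, together with Chapman's bias-corrected version and its variance for an approximate confidence interval. The function names and the rabbit counts are hypothetical.

```python
import math

def lincoln_petersen(n1, n2, m):
    """Two-sample estimate of population size.

    n1: number caught and marked on the first occasion
    n2: number caught on the second occasion
    m:  number caught on the second occasion that were already marked
    """
    if m == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m

def chapman(n1, n2, m, z=1.96):
    """Chapman's bias-corrected estimate with an approximate 95% interval."""
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    half_width = z * math.sqrt(var)
    return n_hat, n_hat - half_width, n_hat + half_width

# Hypothetical counts: 50 rabbits marked, 40 caught later, 10 of them marked.
print(lincoln_petersen(50, 40, 10))
print(chapman(50, 40, 10))
```

Both estimators rely on the closed-population, equal-catchability and independence assumptions; when sources are dependent, as in multi-source epidemiological studies, more elaborate models are needed.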

Snowball Sampling
Snowball sampling is a method that has been used extensively in qualitative sociological research and has only recently been applied to the study of drug users (Avico et al. 3, 1988). A snowball sample is built up by asking individuals who have the characteristic being studied to identify and contact other individuals who share the same characteristic. These individuals are then asked to do the same until an extensive chain of contacts has been built up. The main advantage of snowball sampling is that it allows one to build up large samples of subjects that might otherwise be very difficult to encounter. Such populations are often described as hidden populations (Bloor et al. 8, 1991), for example prostitutes or crack users.
To start a snowball sample in the drug misuse field, a number of initial contacts are made in settings that drug users are known to frequent. In practice these settings might include an outpatient clinic, a self-help group and a location where drugs are sold. Each initial contact is then asked to name someone else he or she knows who is also a drug user and to approach this person to see if he or she would agree to participate in the study. As some initial contacts may have a much wider circle of drug-using acquaintances than others, it is sometimes necessary to put a limit on the number of individuals that any one subject can nominate. The process is repeated in turn with each new contact until a sufficient number of subjects have been interviewed or no more new contacts are forthcoming.
To improve heterogeneity, the initial contacts should include representatives of all important subgroups, such as men, women, people from different social backgrounds, ethnic minorities and different age groups. In addition, it is possible to stipulate that if initial contacts come from treatment settings, they only nominate drug-using acquaintances who are not currently in treatment. Alternatively, if one wishes to have a control or comparison group, one could ask each contact to nominate a non-drug-using acquaintance.
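The recruitment procedure described above can be sketched as a simple chain-referral loop. This is an illustrative sketch only: the function name, the nomination cap of two, the target size and the acquaintance lists are all hypothetical, and a real study would replace the `nominate` function with actual interviews.

```python
from collections import deque

def snowball_sample(seeds, nominate, max_nominations=3, target_size=50):
    """Recruit subjects by chain referral.

    seeds:           initial contacts (e.g. from a clinic or a street setting)
    nominate:        function returning the acquaintances a subject names
    max_nominations: cap on how many people any one subject may nominate
    target_size:     stop once this many subjects have been interviewed
    """
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        sampled.append(person)  # this subject is interviewed
        for contact in nominate(person)[:max_nominations]:
            if contact not in seen:  # never recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sampled

# Hypothetical acquaintance lists for illustration.
acquaintances = {
    "clinic_1": ["a", "b", "c", "d"],
    "street_1": ["c", "e"],
    "a": ["f"],
    "e": ["clinic_1"],
}
sample = snowball_sample(["clinic_1", "street_1"],
                         lambda person: acquaintances.get(person, []),
                         max_nominations=2, target_size=10)
print(sample)
```

The loop stops either when the target number of interviews is reached or when no new contacts are forthcoming, mirroring the two stopping conditions given in the text.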
A further advantage of snowball sampling is that it allows more representative samples of drug users to be recruited than would be possible using ordinary convenience samples. However, just how representative the final sample is will remain unknown unless the characteristics of the sampling frame (all drug users) are known.
Snowball sampling has been used in several Brazilian studies, including: Nappo et al. 41 (1994) among crack users in S. Paulo; Carvalho et al. 11 (1996) in a prevalence study of HIV, hepatitis B, hepatitis C and syphilis among intravenous drug users in Santos; Galduróz & Masur 22 (1990) among illicit drug users in S. Paulo; and Lopes et al. 35 (1996), who used it in a case-control study of risk factors for drug abuse among adults in Rio de Janeiro. In the last of these studies, initial contacts came from ex-drug users, patients in treatment and counsellors working in the drug abuse area. These people nominated drug users who were not currently in treatment, the "cases". Each case was then asked to nominate both another drug user and a non-drug-using friend, who would be used as the control.
In a separate study, Lopes et al. 36 (1996) addressed the possibility that snowball sampling using friendship matching might be subject to selection bias, i.e. subjects with psychiatric disorder being more likely to nominate a friend who shared the same psychiatric characteristics. This was investigated by calculating the proportion (p1) of exposed controls (those with a previous history of psychiatric disorder) selected by exposed cases and comparing this with the proportion (p2) of exposed controls selected by unexposed cases (those without a history of psychiatric disorder). According to Flanders & Austin 19 (1986), selection bias does not occur if p1 = p2. Lopes et al. 36 concluded that there had been no selection bias, as p1 = 0.52 and p2 = 0.51.
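The comparison of p1 and p2 can be sketched as below. The case-control pairs are invented toy data and do not reproduce the counts of the study, which are not given in this review; the function name is hypothetical.

```python
def exposed_control_proportions(pairs):
    """Return (p1, p2) from (case_exposed, control_exposed) pairs.

    p1: proportion of exposed controls among those nominated by exposed cases
    p2: proportion of exposed controls among those nominated by unexposed cases
    """
    by_exposed_cases = [control for case, control in pairs if case]
    by_unexposed_cases = [control for case, control in pairs if not case]
    p1 = sum(by_exposed_cases) / len(by_exposed_cases)
    p2 = sum(by_unexposed_cases) / len(by_unexposed_cases)
    return p1, p2

# Toy data: each tuple is (case has history of disorder, control has history of disorder).
pairs = [(True, True), (True, False), (True, True),
         (False, True), (False, False), (False, False)]
p1, p2 = exposed_control_proportions(pairs)
print(p1, p2)
```

In this toy data p1 is clearly larger than p2, which is the pattern that would suggest friendship matching had introduced selection bias; near-equal values, as reported in the study, would not.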

Privileged Access Interviewer Method
Griffiths et al. 28 (1993) have devised a method aimed at sampling hidden populations called the privileged access interviewer method, a variant of snowball sampling, which they have used to study drug use among prostitutes in south London (Gossop et al. 26, 1994). This method differs from snowball sampling and social network analysis in that it is the characteristics of the interviewer, rather than the subject, that are used to advantage to enter into contact with hidden populations. The privileged access interviewers are trained interviewers who have ready access to drug users, either because they themselves are ex-users or because their work brings them into close contact with users.
In a study of sexual behaviour and its relationship to drug taking among south London prostitutes, Gossop et al. 26 (1994) used seven privileged access interviewers to interview 51 prostitutes. The interviewers came from a variety of backgrounds: four were themselves working either as prostitutes or as "maids" to prostitutes; one knew a pimp and contacted subjects through him; whilst the others were outreach workers whose work brought them into regular contact with prostitutes.
The main difficulty with this method is the identification of suitable privileged access interviewers. On the one hand, interviewers need enough street credibility to be able to approach and be trusted by drug users; on the other, they must be reliable and sufficiently well educated to be trained to use a structured interview schedule and follow the agreed protocol.

Contact Tracing
Contact tracing is a long established method used to track down the contacts of people with infectious diseases. It was developed in the 1940s as a public health measure to control the spread of sexually transmitted diseases. The decision as to what type of contact will be traced depends on several factors: the degree of infectivity of the disease, the seriousness of the infection and the route of transmission. For example, with gonorrhoea, a sexually transmissible disease with a short incubation period, contacts will include only those people with whom the subject has had recent sexual contact. With tuberculosis, by contrast, where transmission of the bacillus occurs via the respiratory system, contacts would include a wider range of individuals, such as family members, close friends and work-mates.
Drug users who share injecting equipment are prone to catching certain blood-borne infections, such as HIV and hepatitis B and C. In Brazil, studies suggest that the prevalence of HIV among intravenous drug users is between 40 and 60% (WHO Collaborative Study Group 51, 1993), that of hepatitis B 40% to 75% (Barata et al. 4, 1993; Carvalho et al. 11, 1996) and that of hepatitis C 75% (Carvalho et al. 11, 1996). Studies from other countries suggest the prevalence of hepatitis C among intravenous drug users may surpass that of either hepatitis B or HIV (Crofts et al. 15, 1994; Majid et al. 37, 1995). Drug users who do not inject may still be involved in at-risk behaviour. For example, crack smokers who spend hours smoking in large groups in relatively confined and poorly ventilated places ("crack houses") may be at risk of tuberculosis (Gilbert & Aitken 24, 1994; Reyes et al. 46, 1996).
Contact tracing has played an important role in documenting the spread of HIV and other infectious diseases among drug misusers. For example, four new cases of HIV were found among six contacts of the first intravenous drug user diagnosed as having HIV in Australia (Blacker et al. 7, 1986). In Brazil, contact tracing has been used to study the transmission of malaria among intravenous drug users (Barata et al. 4, 1993). This study followed an outbreak of five cases of malaria in a non-endemic area (Bauru) where investigation revealed that intravenous cocaine use was the only potential risk factor. Each case was interviewed and asked with whom they had shared injecting equipment in the last 3 months. Attempts were made to interview all contacts personally and find out if they too had injected with anyone else. All subjects were asked to give a blood sample, which was tested for malaria, hepatitis B, HIV and syphilis. From the initial five malaria cases an injecting network of 119 individuals was identified. Of these, 102 agreed to be interviewed and 99 consented to give blood. The results showed that 21% were infected with malaria, 40% with hepatitis B, 58% with HIV and 12% with syphilis. Sharing of injecting equipment had been common, with 58% having shared with between 1 and 6 individuals and 20% with between 6 and 20. Some authors believe that contact tracing should play a greater role in HIV prevention strategies among injecting drug users (Hall & Dolan 29, 1996). From a research point of view, contact tracing could be exploited further to study risk behaviour within individual injecting networks and to see how this is related to the mechanisms and thresholds of transmission for different infectious agents (Dunn 18, 1997).

CONCLUSION
Probability sampling still remains the gold standard by which all other sampling methods must be judged. None of the new methods described here will give results that can be generalised to the whole population of drug users with the same confidence as the results from a simple probability sample. However, when dealing with a condition, such as illicit drug misuse, which has a low prevalence and is a hidden activity, probability samples may be impracticable. Sequential stratified random sampling, using geographical clusters, may not always give samples that are truly representative if the characteristics of the clusters are unknown. Furthermore, part of the at-risk population may be unavailable for study (e.g. people in prisons or hospitals). The new methods described in this review aim to overcome some of these limitations by obtaining samples that are more heterogeneous and that do not suffer from the degree of selection bias that limits the utility and generalisability of ordinary convenience samples.
Until recently, Brazilian studies of drug users have tended to use simple convenience samples, with small numbers of subjects often recruited from just one institution. More ambitious studies have used large stratified random samples to estimate the prevalence of psychiatric morbidity, including alcohol misuse, in the general population (Almeida-Filho 1, 1992) and drug misuse in school children (Galduróz et al. 23, 1997). A small but growing number of pioneering researchers have used some of the new methods described above, including snowball sampling (Galduróz & Masur 22, 1990; Nappo et al. 41, 1994; Carvalho et al. 11, 1996; Lopes et al. 35, 1996) and contact tracing (Barata et al. 4, 1993). Hopefully, this review will bring to the attention of other researchers, working in the drug misuse field or in public health, the importance of these alternative approaches. These methods are more complex and demanding than simple convenience samples but their potential is enormous, particularly in the areas of drug and HIV prevention, harm reduction, calculating the need for service provision and formulating public health policy.