Sampling plan for the National Survey on Sexual Behavior and Perceptions of the Brazilian Population concerning HIV/AIDS, 2005

SAMPLING RESULTS: A probabilistic plan was designed, with 5,040 sampling units drawn from the Brazilian population: individuals aged 16 to 65 years living in the large Brazilian urban centers. It is a complex sampling plan, distributed over eight main estimation domains and designed in multiple stages, with one man or woman interviewed in the last of these stages. Each interviewed unit and each household has a specific probability of belonging to the sample.


INTRODUCTION
Nationwide surveys in the area of sexual behaviors, risks and protection against HIV/AIDS and other sexually transmitted infections (STIs) are necessary in any society that wishes to formulate and evaluate public policies in this field based on consistent empirical data. 1 Likewise, it is important to investigate some of the interfaces between the STIs and consumption of psychoactive substances or the broader field of sexual and reproductive health, for example, decisions concerning the use of contraceptive methods.
With few exceptions, surveys conducted in Brazil and in the majority of low- and middle-income countries are characterized by local or regional coverage. However, national surveys are indispensable, especially in a country with continental dimensions like Brazil, marked by heterogeneities and social, economic and cultural contrasts. A reliable portrait of this society and its dynamic phenomena, such as changes in the sphere of sexual behaviors or the HIV/AIDS epidemic, requires the systematic conduct of national surveys supported by consistent sampling plans.
The objective of the present paper was to describe methodological aspects concerning the definition of the survey domain and the sampling plan designed for a national survey.

Definition of the survey domain
The 2005 "Comportamento sexual e percepções da população brasileira sobre HIV/Aids" Survey (Sexual Behavior and HIV/AIDS Perceptions of the Brazilian Population) encompasses all the Brazilian states, while its previous version, carried out in 1998, included 24 Brazilian states and the Federal District, excluding the states of Tocantins and Roraima. The data of the 2005 survey were obtained by means of a multistage probabilistic sample totaling 5,040 respondents, aged 16 to 65 years, living in the large urban regions of Brazil. Therefore, the sample studied in the 2005 survey was larger than that of the 1998 survey, which was composed of 3,600 individuals.
The Ministry of Health has established four geographic strata of interest, corresponding to groups of Brazilian states, namely: states of the North and Northeast regions; states of the Central-West and Southeast regions excluding São Paulo; states of the South region; and, finally, the state of São Paulo as an additional domain.

Target population
The reference system adopted to define the population of interest to the survey was the Demographic Census of the year 2000, conducted by the Instituto Brasileiro de Geografia e Estatística (IBGE, Brazilian Institute for Geography and Statistics).
The survey used as unit of analysis one of the data aggregation units utilized by IBGE, the information reference system, grouped into 558 microregions. Besides the information that identifies the microregions (name and state), the following variables were used: total population, urban population and population aged 16 to 64 years, the latter as an approximation to the population of interest, whose upper limit is 65 years of age.
The microregions cover large territorial areas, which might hinder access to some units and, consequently, increase survey costs. To minimize this problem, operational decisions were made. The first one was to restrict the survey to dwellers of the urban areas of the microregions.
The second decision was to include only large urban conglomerates, defined as microregions which, in 2000, had more than 100,000 inhabitants living in their urban areas. This measure reduced the number of microregions to 276, corresponding to a 12% reduction in the number of dwellers of interest to the survey, as shown in Table 1. Thus, the survey's target population was defined by the inclusion of all dwellers aged 16 to 65 years, living in urban areas of the microregions which, in 2000, had more than 100,000 inhabitants in their urban zone, except for microregions of the North Region that did not include the capitals of the respective states.
This number of dwellers represented, in 2000, 88% of the Brazilian population in the age group living in urban areas of the country, which corresponded to approximately 80 million people. Table 1 shows the effects of the operational measures upon population totals, and also on the strata of interest. As expected, the largest loss occurred in the North/Northeast stratum, with 76% of coverage, while in São Paulo this rate was 98%.

Sampling plan
The selected sampling plan is stratified in multiple stages, with unequal probabilities of inclusion of the events under analysis.
The four geographic strata established by the Ministry of Health were adopted: states of the North and Northeast regions; of the Central-West and Southeast regions, excluding São Paulo; of the South region; and of the state of São Paulo.
Besides producing reliable estimates for these four strata, the study attempted to produce good estimates concerning the behavior of the population of the Brazilian capitals. To achieve this, in the scope of each stratum, the microregions that contained the capitals were separated from the other microregions. Thus, the 259 microregions were distributed over the eight domains, as presented in Table 2.
However, in the sampling plan design, it was decided that all the microregions containing capitals should be represented in the sample. From the sampling point of view, this means that each of these microregions was considered a stratum. Therefore, the survey population was divided into eight domains of interest to the analysis and 31 strata for the sampling plan design: four containing the inland microregions of each geographic region and 27 corresponding to the microregions containing the capital of each state. The sampling plan design followed some restrictions and premises:
• due to the available budget and in view of the objectives, it was established that the viable size of the sample would be 5,040 individuals;
• aiming to obtain the same precision in each stratum, the size of the sample was fixed at 1,260 households per stratum, an alternative that is more adequate for sub-populations with the same variability;
• the precision of the estimates was based on the supposition of a simple random sampling design, which would produce proportion estimates with a sampling error of approximately 3% in each geographic stratum, and power to detect significant differences between strata of around 4 percentage points; 1
• assuming that the similarity between answers of dwellers from the same census tract (intraclass correlation [1][2][3]) increases the sampling error, 6 and that investigations into the population's social and economic characteristics have suggested that the optimum number per census tract should not surpass 15 households, the reference number for drawing within each tract was fixed at nine households. This corresponded to the draw of 140 tracts per geographic stratum;
• each microregion containing the state capital constituted a special substratum within the respective domain of interest. Thus, the study had 27 pre-selected microregions.
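The precision figures in the premises above can be checked with a short calculation. The sketch below assumes a 95% confidence level and the worst case p = 0.5, neither of which is stated in the text, and uses Kish's approximation for the design effect with a purely hypothetical intraclass correlation. All function names are illustrative.

```python
import math

def srs_margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation interval for a proportion
    under simple random sampling (finite population correction ignored)."""
    return z * math.sqrt(p * (1.0 - p) / n)

def srs_difference_margin(n, p=0.5, z=1.96):
    """Half-width for the difference between two independent proportions,
    each estimated from n units."""
    return z * math.sqrt(2.0 * p * (1.0 - p) / n)

def design_effect(cluster_size, rho):
    """Kish's approximation: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * rho

n_stratum = 1260            # households per geographic stratum (from the text)
households_per_tract = 9    # reference take per census tract (from the text)
tracts_per_stratum = n_stratum // households_per_tract

print(f"margin of error per stratum: {srs_margin_of_error(n_stratum):.3f}")
print(f"margin for a difference between strata: {srs_difference_margin(n_stratum):.3f}")
print("tracts per stratum:", tracts_per_stratum)

RHO = 0.05  # hypothetical intraclass correlation, for illustration only
for m in (9, 15):
    print(f"deff with {m} households per tract: {design_effect(m, RHO):.2f}")
```

Under these assumptions the two margins come out at roughly 0.028 and 0.039, in line with the "approximately 3%" and "around 4 percentage points" quoted above. For any positive intraclass correlation, the smaller take per tract gives the smaller design effect.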
The selected sampling plan comprises four stages for the strata that do not include capitals and three for the strata formed by the microregions of the capitals, where the draw of the microregion is not needed. The sampling units in each stage were defined as follows:
• primary sampling unit (PSU): the microregion;
• secondary sampling unit (SSU): the urban census tract, drawn in the second stage. For this purpose, the census tracts defined by IBGE for the 2000 Demographic Census were used;
• tertiary sampling unit (TSU): the private household;
• quaternary sampling unit (QSU): the individual between 16 and 65 years old.
In sum, the initial sample of 5,040 units was equally divided into 1,260 units per geographic stratum, in order to obtain the same statistical precision for the estimates in each one of the four regional strata. The draw of nine households in each census tract, to control for clustering effects, implied the distribution of 560 census tracts over the four large geographic strata, allocating 140 tracts per stratum. For the above-mentioned reasons, and in order to obtain estimators with equal precision in the eight domains of interest, 70 census tracts should be used in each one of them. However, the number of capitals differs across the geographic domains, which required some adaptations.
As the São Paulo-Capital stratum has only one microregion and the South-Capital one has three, it was decided that 49 census tracts would be allocated to each of these strata. This measure, together with the draw of nine households in all the tracts, would guarantee 441 households in the scope of these strata, an adequate number to produce estimates with an acceptable reliability level. The remaining 91 tracts, which complete the 140 of each geographic stratum, were assigned to the stratum that does not include the microregion of the capital. For each microregion of the sample, seven census tracts were drawn in the subsequent stage.
For the North/Northeast Capital region, with 16 microregions and a slightly larger population, it was decided that 77 census tracts would be assigned and the remaining 63 would be destined to the microregions that do not contain capitals. The combination of these decisions can be seen in Table 3, which describes the final allocation of the number of microregions, census tracts and households assigned by the sample.
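The allocation arithmetic described above can be reproduced in a few lines. This is only a sketch: the capital allocations for North/Northeast, South and São Paulo come from the text, whereas the value for Central-West/Southeast is an assumption (the default even split of 70/70), since Table 3 is not reproduced here. The dictionary keys and function name are illustrative.

```python
HOUSEHOLDS_PER_TRACT = 9       # from the text
TRACTS_PER_STRATUM = 140       # from the text
TRACTS_PER_MICROREGION = 7     # tracts drawn per sampled inland microregion

# Tracts assigned to the capital domain of each geographic stratum.
capital_tracts = {
    "North/Northeast": 77,
    "Central-West/Southeast": 70,  # ASSUMED default split, not stated in the text
    "South": 49,
    "Sao Paulo": 49,
}

def allocate(capital_tracts):
    """Derive inland tracts, households and sampled inland microregions."""
    rows = {}
    for stratum, cap in capital_tracts.items():
        inland = TRACTS_PER_STRATUM - cap
        rows[stratum] = {
            "capital_tracts": cap,
            "inland_tracts": inland,
            "capital_households": cap * HOUSEHOLDS_PER_TRACT,
            "inland_households": inland * HOUSEHOLDS_PER_TRACT,
            "inland_microregions": inland // TRACTS_PER_MICROREGION,
        }
    return rows

allocation = allocate(capital_tracts)
total_households = sum(
    r["capital_households"] + r["inland_households"] for r in allocation.values()
)
for stratum, row in allocation.items():
    print(stratum, row)
print("total households:", total_households)
```

Whatever split is used within a stratum, each stratum keeps 140 tracts and 1,260 households, so the total stays at 5,040.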
For the sample draw in each stratum of the microregions that did not include the capital, both the microregions and the census tracts were drawn with probabilities proportional to their respective sizes. The measure of size adopted was the number of inhabitants for the microregions and the number of occupied households for the census tracts, both according to the data of the 2000 Demographic Census. Households and dwellers were drawn with equal probability.
For the selected household, one person aged between 16 and 65 years was drawn by means of a drawing table that had been previously assigned to the household. 4 With this table, the total sample was balanced between men and women, and this balance was controlled by age group.
The draw of the tracts within the microregion occurred in the following way:
• to each drawn microregion, seven census tracts were allocated;
• the municipalities of the drawn microregion were organized according to size; thus, seven implicit tract strata (zones) were created;
• to each zone corresponded the draw of one census tract;
• the tract that was drawn in this way, whose probability was proportional to its size in the year 2000, was recounted, so as to obtain the updated number of households, that is, its actual size. This count was performed before the draw of the tract's households.
Therefore, a sample with a proportional representation of the cities' size was guaranteed, and, consequently, a greater spread of the drawn units, called "first selection units".
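One way to picture the zone-based draw is the sketch below. It assumes the zones are formed by cutting the cumulative tract size into seven equal parts and that one random point is drawn in each part; the text does not specify how zones were built, so this is an illustration and not the survey's actual procedure. The function name and the toy data are hypothetical.

```python
import random

def draw_one_tract_per_zone(tracts, n_zones=7, rng=None):
    """Select one tract per implicit zone with probability proportional to size.

    tracts: list of (tract_id, size) already ordered as desired, e.g. by
    municipality size. The cumulative size is cut into n_zones equal-width
    zones and one point is drawn uniformly in each zone; the tract whose
    cumulative interval contains the point is selected.
    """
    rng = rng or random.Random()
    total = sum(size for _, size in tracts)
    width = total / n_zones

    # Upper cumulative bound of each tract's interval.
    bounds = []
    running = 0
    for tract_id, size in tracts:
        running += size
        bounds.append((running, tract_id))

    selected = []
    for zone in range(n_zones):
        point = zone * width + rng.random() * width
        for upper, tract_id in bounds:
            if point < upper:
                selected.append(tract_id)
                break
        else:
            # Guard against floating-point rounding at the very end.
            selected.append(bounds[-1][1])
    return selected

# Hypothetical microregion with 70 tracts of 100 occupied households each.
toy_tracts = [(i, 100) for i in range(70)]
sample = draw_one_tract_per_zone(toy_tracts, rng=random.Random(0))
print(sample)
```

Because the tracts are ordered by municipality size before the cut, each zone tends to cover municipalities of similar size, which is what spreads the sample. A tract larger than one zone width could be hit twice in this sketch; a real implementation would need a rule for that case.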
For each household, a second one was drawn within the same census tract and next to the first. It served as a substitute in case of loss of the first selection unit.
In each one of the four large geographic strata, the number of census tracts in each capital was divided proportionally to their population. It was ensured, however, that each capital had at least two tracts. After this allocation, the draw for each one of the 27 microregions of the capitals followed the same procedure described for the microregions that did not include capitals. Thus, the municipalities of each microregion containing the capital were organized by size, the census tracts were divided into zones and one tract of each zone was drawn. The process to select households and dwellers was the same one used for the inland strata, including the draw of substitution units.
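A proportional allocation with a minimum of two tracts per capital could be sketched as follows. The rounding rule (largest remainder) and the way the minimum is enforced are assumptions, since the text only states the proportionality and the floor; the capital labels and populations are hypothetical.

```python
def allocate_tracts(populations, total_tracts, minimum=2):
    """Proportional allocation with a per-capital floor.

    Capitals whose proportional share falls below `minimum` are fixed at
    `minimum`; the remaining tracts are shared proportionally among the
    others and rounded with the largest-remainder rule.
    """
    if minimum * len(populations) > total_tracts:
        raise ValueError("not enough tracts to give every capital the minimum")

    alloc = {}
    remaining = dict(populations)
    tracts_left = total_tracts

    # Fix small capitals at the minimum until every remaining share is large enough.
    while True:
        pop_sum = sum(remaining.values())
        small = [k for k, p in remaining.items()
                 if tracts_left * p / pop_sum < minimum]
        if not small:
            break
        for k in small:
            alloc[k] = minimum
            tracts_left -= minimum
            del remaining[k]

    # Largest-remainder rounding for the other capitals.
    pop_sum = sum(remaining.values())
    quotas = {k: tracts_left * p / pop_sum for k, p in remaining.items()}
    for k, q in quotas.items():
        alloc[k] = int(q)
    leftover = tracts_left - sum(int(q) for q in quotas.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - int(quotas[k]),
                          reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical populations for three capitals sharing 49 tracts.
example = allocate_tracts({"A": 1_800_000, "B": 1_400_000, "C": 400_000}, 49)
print(example)
```

The allocation always sums to the number of tracts available and never gives a capital fewer than the minimum.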

SAMPLING RESULTS
To perform the data collection in the field, a market research company was hired to interview the 5,040 individuals. The development of the questionnaire and the training of the interviewers and field coordinators were carried out and supervised by the survey coordination.
In the fieldwork, 13 census tracts (2.3% of the total) needed to be substituted: six in the North/Northeast region, five in the South region and two in São Paulo, basically because they presented rural characteristics or due to impossibility of access (condominiums). These were replaced with tracts that were located near them and had similar characteristics (middle-income).
Tables 4 and 5 show the absolute and relative figures to evaluate the sample's behavior.
Approximately 70% of the sample was obtained with the first selection households. The main reasons that determined the use of the reserve units were problems with the households (e.g. closed, used only in the summer, places not used for dwelling). The refusal proportion was around 10%, a proportion that can be considered low for surveys of this nature. Furthermore, 5% of the households were discarded because they did not have dwellers with ages complying with the survey's inclusion criteria.
Evaluating the sample performance in the regions, it was observed that the sample collected from first selection units was closer to what had been planned in the North/Northeast stratum, while in the other three regions the performances were very similar, and the substitution of first selection units was more frequent. The analysis of these figures suggests the absence of relevant bias deriving from the data collection work, and it can be considered that these substitutions occurred at random.
The analysis of data obtained through a complex sampling plan requires techniques specially developed for this purpose, mainly statistics computed with weighted data for the unbiased, or almost unbiased, estimation of the population parameters. 5,6 The use of estimators designed for simple random samples may introduce relevant flaws in these estimates, 6 and the inadequacies would be even more pronounced if such estimators were employed to calculate sampling errors. The majority of the computer programs available in the market have options for the calculation of weighted statistics. The same does not occur in the calculation of the sampling variances of estimators, which requires special modules developed for that purpose. 6
The survey's data were used in the analysis of several dimensions of Brazilians' sexual behavior, by different researchers and by means of distinct analysis techniques. The studies begin with the calculation of simple descriptive measures (means, proportions, ratios, indexes, linear combinations of variables, cross-tabulations and others). To obtain adequately precise estimates, the recommended procedures in the calculation of these statistics were:
• the use of a weighting system to produce estimates;
• the intensive use of ratio estimators;
• the estimation of population totals in an indirect way; and
• the use of specific SPSS packages to estimate sampling errors.
Each interviewee was associated with a weight $w_i$ that is the inverse of his/her probability of inclusion. Although one person per household was drawn, the probability of including the household is not the same as that of the interviewee. Thus, another set of weights was constructed to estimate, whenever necessary, characteristics referring to the household. Details of the weights calculation can be found in the Appendix, which explains the statistical procedures. Usually, the sum of these weights reflects the total of the domain described in the reference system. In this survey, the weights were scaled so that their sum corresponded to 5,040, which is the effective size of the collected sample. One weighting system can therefore be converted into the other through multiplication by a constant. In short, each individual was associated with a weight $w_i$ so that $\sum_i w_i = 5040$.
A key statistic in the calculation of several characteristics for a certain variable Y is its weighted total in the sample, given by the expression $T_y = \sum_i w_i y_i$, where $y_i$ is the observed value of the characteristic Y for the i-th individual of the sample. For example, an unbiased estimator of the population mean is the weighted mean in the sample, $\sum_i w_i y_i / \sum_i w_i$. The use of either of the two weighting systems would lead to the same estimate, as the multiplication factor would appear simultaneously in the numerator and in the denominator. The majority of the statistics employed in the analyses focused on here can be expressed as a ratio $r = T_y / T_x$, where $T_y$ and $T_x$ are estimators of the totals of the characteristics of interest Y and X. Therefore, they would also be adequate population estimators.
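The invariance of the weighted mean to the scale of the weights can be illustrated with a minimal sketch. The weights and observed values below are hypothetical, and the function names are illustrative; the target sum of 4 simply plays the role that 5,040 plays in the survey.

```python
def rescale_weights(weights, target_sum):
    """Multiply all weights by a constant so that they add up to target_sum."""
    factor = target_sum / sum(weights)
    return [w * factor for w in weights]

def weighted_total(weights, values):
    """T_y = sum of w_i * y_i over the sample."""
    return sum(w * y for w, y in zip(weights, values))

def weighted_mean(weights, values):
    """Weighted sample mean: T_y divided by the sum of the weights."""
    return weighted_total(weights, values) / sum(weights)

# Hypothetical inverse-probability weights and a 0/1 characteristic
# for four respondents.
expansion = [12000.0, 8000.0, 15000.0, 5000.0]
y = [1, 0, 1, 1]

scaled = rescale_weights(expansion, target_sum=len(y))
mean_expansion = weighted_mean(expansion, y)
mean_scaled = weighted_mean(scaled, y)

print(mean_expansion, mean_scaled)
```

Both calls return the same mean, because the rescaling constant cancels between numerator and denominator. Weighted totals, by contrast, do depend on which weighting system is used.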
Proportion estimation is a direct application of the aforementioned ratio estimator. To exemplify: to estimate the proportion of women in the survey domain who, being between 30 and 60 years old, had more than three children, one needs only to find this value in the sample, that is, the quotient $p_x = T_y / T_x$, where $T_x$ = total number of women and $T_y$ = total number of women between 30 and 60 years old with more than three children, both estimated following the indicated recommendations.
Estimates of totals for this population were based on data from population projections carried out by IBGE. In the previous example, to estimate the total number of women between 30 and 60 years old with more than three children, it would be necessary to obtain, from the IBGE's projections, the projected number $N_x$ of women existing in the universe of reference, and the estimated total will then be $N_x \cdot p_x$. Depending on the available information about the populations, it would also be possible to use more refined methods, taking into account estimates produced for domains and strata.
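The example of the women with more than three children can be written as a small calculation. The respondents, weights and projected population count below are hypothetical, chosen only to show the mechanics of the ratio estimator and of the indirect estimate of a total.

```python
def ratio_estimate(weights, y, x):
    """r = T_y / T_x, with both totals weighted by the sample weights."""
    t_y = sum(w * yi for w, yi in zip(weights, y))
    t_x = sum(w * xi for w, xi in zip(weights, x))
    return t_y / t_x

# Hypothetical sample of five respondents.
weights = [1.2, 0.8, 1.5, 0.5, 1.0]
is_woman = [1, 1, 0, 1, 1]                   # indicator defining T_x
woman_30_60_over_3_children = [1, 0, 0, 0, 1]  # indicator defining T_y

p_x = ratio_estimate(weights, woman_30_60_over_3_children, is_woman)

# Hypothetical projected number of women in the reference universe.
N_x = 1_000_000
estimated_total = N_x * p_x

print(f"estimated proportion: {p_x:.4f}")
print(f"estimated total: {estimated_total:.0f}")
```

The total is obtained indirectly: the sample supplies only the proportion, and the population size comes from an external projection.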
Determining the sampling errors of the estimators in complex sampling plans requires special packages for their calculation. In this survey, the specific SPSS subroutines were used.

COMMENTS
The present paper briefly described some methodological procedures that are necessary to estimate population parameters and their respective sampling errors in studies on diverse aspects of the Survey of Sexual Behavior and HIV/AIDS Perceptions of the Brazilian Population, in 2005.
To perform the field interviews in the 5,040 units allocated in the diverse domains of the survey, it was necessary to have approximately 30% of substitute units, mainly due to refusals and to lack of dwellers with the desired characteristics in the household. The behavior of these substitutions was very similar and did not imply relevant bias. The rate of interviews performed in the first selection households of the North/Northeast region was the highest, while in the other three regions these rates were lower and similar to one another.
Many of the studies described in this supplement compare the results found in 2005 with those of the 1998 survey. The results of the first survey suggested that the researchers should improve their data collection instruments. This was done in such a way as to ensure the comparability of results. Regarding the sampling plan, the main changes introduced in 2005 were: the redefinition of the strata, aiming at other regional configurations, and the increase in the number of microregions eligible for the target population.
Despite the increase in the number of microregions, the same did not occur with the target population, dwellers with ages ranging between 16 and 64 years. The 1998 strata would have, in 2000, around 72 million people, while for the planning of 2005, this number, also in 2000, would be close to 79 million, corresponding to an increase of 7 million people (9.8%). Both surveys would have, therefore, a common universe of 90% of the population of interest with the data from the 2000 Demographic Census.
In the 1998 survey, the population was divided into three strata: North/Northeast (NONO), Expanded Central-West (COEX) and Expanded South (SULX), while the 2005 survey is distributed over four strata. With the purpose of comparing differences between the two analyzed domains, data from the 2000 Census were used, that is, it was estimated what the figures would be in 2000 under the conditions imposed in the 1998 design and in the 2005 one. The results are shown in Tables 6 and 7.
Observing the Tables, it can be verified that in 1998 the domain comprised 183 microregions, compared with 259 in 2005. This increase occurred due to the incorporation of 88 new microregions and the exclusion of 12 former ones. The increase results mainly from the incorporation of new microregions of the most populated states of Brazil: São Paulo, Minas Gerais, Rio de Janeiro, Rio Grande do Sul and Pernambuco.
Although comparisons between the two surveys are drawn based on distinct populations, significant differences observed in relation to the studied characteristics should be attributed to effective changes in behavior, rather than to possible differences between the populations. Attributing a possible difference between results of the two surveys to the fact that we are dealing with different populations would only be possible if the behavior of this new 10% of the population ran, for the most part, in the opposite direction to the results of the 1998 survey. Given the composition of these new microregions, it is very unlikely that this should occur.
Finally, as in the 1998 survey, 17 microregions in the North Region that did not include the respective state capitals were eliminated, due to access problems. In the end, 259 microregions constituted the domain of interest of the 2005 survey.

Table 1. Population figures and the population percentage reached by the survey. Brazil, 2005.
* Source: Demographic Census 2000. Fundação IBGE.
** Population living in permanent private households. Elaborated by the authors.
*** Due to the Census' aggregate information, the number of inhabitants aged 16 to 64 years was used as the measure of the size of the target population.

Table 2. Geographic regions according to the distribution of microregions over domains and population aged 16-64 years. Brazil, 2005.

Table 4. Characteristics of the number and proportion of conducted interviews. Brazil, 2005.

Table 5. Some indicators of interview performance in relation to the total number of eligible households. Brazil, 2005.

Table 6. Distribution of the number of microregions and respective populations aged 16 to 65 years. Brazil, 1998 and 2005.
Out = did not belong to the stratum of that year; NONO = North and Northeast; COEX = Expanded Central-West; SULX = Expanded South; COSU = Central-West and Southeast; SPAU = São Paulo; RSUL = South. Source: Data from the 2000 Demographic Census.

Table 7. Distribution of the population aged 16 to 64 years over the 1998 and 2005 strata.