SciELO - Scientific Electronic Library Online


Cadernos de Saúde Pública

Print version ISSN 0102-311X

Cad. Saúde Pública vol.30 suppl.1 Rio de Janeiro 2014


Sampling design for the Birth in Brazil: National Survey into Labor and Birth

Mauricio Teixeira Leite de Vasconcellos 1  

Pedro Luis do Nascimento Silva 1  

Ana Paula Esteves Pereira 2  

Arthur Orlando Correa Schilithz 2  

Paulo Roberto Borges de Souza Junior 3  

Celia Landmann Szwarcwald 3  

1 Escola Nacional de Ciências Estatística, Instituto Brasileiro de Geografia e Estatística, Rio de Janeiro, Brasil.

2 Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz, Rio de Janeiro, Brasil.

3 Instituto de Comunicação e Informação Científica e Tecnológica em Saúde, Fundação Oswaldo Cruz, Rio de Janeiro, Brasil.


This paper describes the sample design for the National Survey into Labor and Birth in Brazil. Hospitals with 500 or more live births in 2007 were stratified by the five Brazilian macro-regions, by location in a state capital or not, and by type of governance. They were then selected with probability proportional to their number of live births in 2007. An inverse sampling method was used to select as many days (minimum of seven) as necessary to reach 90 interviews per hospital. Postnatal women were sampled with equal probability from the set of eligible women who entered the hospital on the sampled days. Initial sample weights were computed as the reciprocals of the sample inclusion probabilities and were calibrated so that survey estimates of the total number of live births matched the known figures from the Brazilian Information System on Live Births. For the two telephone follow-up waves (6 and 12 months later), each postnatal woman's response probability was modelled using baseline covariate information in order to adjust the sample weights for nonresponse in each wave.

Key words: Sampling Studies; Stratified Sampling; Statistical Models; Parturition


According to do Carmo Leal et al. 1, the objectives of the National Survey into Labour and Birth were: (1) to describe the incidence of excessive caesarean section (according to Robson's groups) and examine its consequences for women's and newborns' health; (2) to investigate the relationship between excessive caesarean section and late preterm birth and low birth weight; and (3) to investigate the relationship between excessive caesarean section and the use of technological procedures after birth.

This article describes the sample design used in the survey, including the definition of the survey population, the stratification of primary sampling units, the criteria for selecting hospitals, days and postnatal women, and the calculation and calibration of the base sample weights. It also describes the strategy used to estimate the response probabilities in the two additional telephone follow-up waves, conducted six and 12 months after the hospital interview, in order to calculate the sample weights for the respondents in each follow-up wave.

Survey population, first stage sampling frame and stratification

The survey population 2 corresponds to the set of postnatal women who gave birth in 2011 in hospitals with 500 or more live births in 2007, according to the Information System on Live Births (SINASC). SINASC was created by the Brazilian Ministry of Health in 1990 to gather epidemiological information on live births in hospitals and households all over the country.

For operational reasons, several groups were excluded from the survey population: postnatal women with severe mental health disorders, homeless women, foreigners who did not understand Portuguese, deaf-mute women, and women sectioned by court order. Given the survey population definition, only hospitals with 500 or more live births in 2007 were included in the first-stage sampling frame. In the end, 1,403 of the 3,961 hospitals registered in 2007 were eligible for the study, accounting for 2,228,534 (77.1%) of the 2,891,328 live births that year.

To ensure that all types of hospital governance (public, private and mixed) were represented in each of the five macro-regions of the country, both in the state capitals and in the other cities (which differ substantially in size and in the kinds of health services available), the hospitals in the first-stage sampling frame were stratified by the combination of macro-region, capital or not, and type of hospital governance. The resulting strata are presented in Table 1. Mixed governance refers to private hospitals with beds contracted by the public sector.

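Table 1 Number of live births and hospitals in the survey population, and sample sizes, by stratum.

| Macro-region | Governance | Total: live births 2007 | Total: hospitals 2007 | Total: hospital sample size | Total: effective sample of women | Capitals: live births 2007 | Capitals: hospitals 2007 | Capitals: hospital sample size | Capitals: effective sample of women | Non-capitals: live births 2007 | Non-capitals: hospitals 2007 | Non-capitals: hospital sample size | Non-capitals: effective sample of women |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Brazil | All | 2,228,534 | 1,403 | 266 | 23,894 | 802,543 | 308 | 84 | 7,551 | 1,425,991 | 1,095 | 182 | 16,343 |
| Brazil | Public | 932,617 | 531 | 95 | 8,537 | 412,069 | 137 | 30 | 2,699 | 520,548 | 394 | 65 | 5,838 |
| Brazil | Mixed | 966,190 | 649 | 115 | 10,330 | 186,580 | 61 | 24 | 2,157 | 779,610 | 588 | 91 | 8,173 |
| Brazil | Private | 329,727 | 223 | 56 | 5,027 | 203,894 | 110 | 30 | 2,695 | 125,833 | 113 | 26 | 2,332 |
| North | Public | 136,987 | 91 | 17 | 1,531 | 57,320 | 14 | 5 | 448 | 79,667 | 77 | 12 | 1,083 |
| North | Mixed | 74,641 | 47 | 10 | 899 | 31,366 | 12 | 5 | 450 | 43,275 | 35 | 5 | 449 |
| North | Private | 10,721 | 9 | 5 | 450 | 10,721 | 9 | 5 | 450 | 0 | 0 | 0 | 0 |
| Northeast | Public | 341,638 | 211 | 31 | 2,779 | 141,079 | 44 | 6 | 538 | 200,559 | 167 | 25 | 2,241 |
| Northeast | Mixed | 273,815 | 160 | 28 | 2,516 | 51,892 | 17 | 5 | 450 | 221,923 | 143 | 23 | 2,066 |
| Northeast | Private * | 46,213 | 31 | 9 | 801 | 42,502 | 26 | 6 | 539 | 3,711 | 5 | 3 | 262 |
| Southeast | Public | 313,853 | 155 | 26 | 2,341 | 141,235 | 53 | 8 | 722 | 172,618 | 102 | 18 | 1,619 |
| Southeast | Mixed | 402,730 | 273 | 42 | 3,776 | 61,976 | 14 | 5 | 452 | 340,754 | 259 | 37 | 3,324 |
| Southeast | Private | 213,047 | 136 | 21 | 1,888 | 113,219 | 51 | 8 | 718 | 99,828 | 85 | 13 | 1,170 |
| South | Public | 74,770 | 36 | 11 | 991 | 31,126 | 10 | 6 | 541 | 43,644 | 26 | 5 | 450 |
| South | Mixed | 156,559 | 130 | 24 | 2,159 | 15,384 | 4 | 4 | 360 | 141,175 | 126 | 20 | 1,799 |
| South | Private | 40,141 | 31 | 11 | 989 | 22,947 | 13 | 6 | 539 | 17,194 | 18 | 5 | 450 |
| Central-West | Public | 65,369 | 38 | 10 | 895 | 41,309 | 16 | 5 | 450 | 24,060 | 22 | 5 | 445 |
| Central-West | Mixed | 58,445 | 39 | 11 | 980 | 25,962 | 14 | 5 | 445 | 32,483 | 25 | 6 | 535 |
| Central-West | Private | 19,605 | 16 | 10 | 899 | 14,505 | 11 | 5 | 449 | 5,100 | 5 | 5 | 450 |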

*Two private hospitals sampled in non-capital cities of the Northeast region could not take part in the study and could not be replaced.

Sample size and its allocation by stratum

According to do Carmo Leal et al. 1, the sample size in each stratum was calculated from the 2007 caesarean section rate in Brazil of 46.6%, to detect differences of 14% between public, mixed and private hospitals at a 5% significance level with 95% power. The resulting minimum sample per stratum was 341 postnatal women. Since the sample was clustered by hospital, a design effect of approximately 1.3 was used to inflate the initial sample sizes, leading to a minimum sample size of 450 postnatal women per stratum.

Although unusual in sample surveys, this way of determining sample size is common in clinical trials and randomized experiments. It derives from a two-tailed test of the hypothesis that the proportions in the treatment and control groups are equal 3. Expression 3.14 from Fleiss 4 was used for this calculation.
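As a check on the figures above, the sketch below computes the per-group size for a two-sided comparison of two proportions using the usual normal-approximation formula with the Fleiss continuity correction. It assumes that "differences of 14%" means 14 percentage points above the 46.6% baseline (46.6% versus 60.6%); under that assumption it reproduces the 341 women quoted in the text. Whether this is exactly expression 3.14 of Fleiss 4 is an assumption, and the helper name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_sample_size(p1: float, p2: float, alpha: float = 0.05,
                               power: float = 0.95,
                               continuity: bool = True) -> int:
    """Per-group n for a two-sided test of H0: p1 == p2 (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile giving the power
    p_bar = (p1 + p2) / 2
    delta = abs(p2 - p1)
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if continuity:
        # Fleiss continuity correction
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2
    return math.ceil(n)


# Assumption: a 14 percentage-point difference from the 46.6% baseline.
n_min = two_proportion_sample_size(0.466, 0.466 + 0.14)
n_clustered = math.ceil(n_min * 1.3)   # inflate by the design effect of ~1.3
print(n_min, n_clustered)              # 341 444
```

Without the continuity correction the same inputs give 327. Inflating 341 by a design effect of 1.3 gives 444, close to the 450 women per stratum adopted in the survey.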

According to do Carmo Leal et al. 1, the sample size also provides 80% power to detect adverse outcomes with a frequency of about 3%, and differences of at least 1.5% between large geographic regions or types of hospital governance (public/private/mixed).

Given the minimum size of 450 postnatal women per stratum, it was decided to select at least five hospitals per stratum, which implies a sample size of 90 postnatal women per hospital. If an equal allocation among the strata had been used, these parameters would have led to a sample size of 210 hospitals. However, an allocation proportional to the number of hospitals was used instead, which resulted in a sample size of 266 hospitals: in every stratum whose allocated sample size was smaller than five hospitals, the sample size was increased to five in order to ensure a minimum of five hospitals and 450 postnatal women, as indicated in Table 1.
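The allocation rule described above can be sketched as follows: hospitals are allocated in proportion to the number of hospitals in each stratum, any stratum below the floor of five is raised to five, and no stratum receives more than its number of hospitals (take-all). The stratum names, counts and total are hypothetical, and rounding to the nearest integer is an assumption, since the exact rounding rule is not given in the text.

```python
def allocate_hospitals(frame_sizes: dict[str, int], n_total: int,
                       minimum: int = 5) -> dict[str, int]:
    """Proportional allocation with a per-stratum floor (hypothetical helper)."""
    big_n = sum(frame_sizes.values())
    allocation = {}
    for stratum, n_h in frame_sizes.items():
        proportional = round(n_total * n_h / big_n)   # nearest-integer share
        # Raise small allocations to the floor, but cap at the stratum size.
        allocation[stratum] = min(n_h, max(proportional, minimum))
    return allocation


# Hypothetical frame: number of eligible hospitals per stratum.
frame = {"A": 200, "B": 50, "C": 4, "D": 146}
alloc = allocate_hospitals(frame, n_total=40)
print(alloc, sum(alloc.values()))  # {'A': 20, 'B': 5, 'C': 4, 'D': 15} 44
```

The floor is why the total (44 here) exceeds the nominal 40, in the same way that raising small strata to five hospitals pushed the survey's total to 266.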

Hospital selection

In the first stage, hospitals were selected with probability proportional to size (PPS), where size was defined as the hospital's number of live births according to SINASC 2007. As usual in PPS selection, hospitals with large numbers of live births (in this case, more than 13 per day on average) were included in the sample with certainty and treated as selection strata for sampling days and postnatal women. In strata with five or fewer hospitals, a take-all procedure was used, and each hospital was also treated as a selection stratum for the subsequent sampling stages.
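A common way to identify certainty units in PPS designs is to flag every unit whose expected number of hits, n·M_i/M, is at least 1, remove it from the frame, reduce n accordingly, and repeat. The sketch below implements that generic rule; it does not reproduce the survey's specific cutoff of 13 live births per day, and the hospital names and sizes are hypothetical.

```python
def split_certainty_units(sizes: dict[str, float], n: int):
    """Iteratively remove units with n_left * M_i / M_left >= 1.

    Returns (certainty units, remaining frame, sample size left for PPS)."""
    certain: list[str] = []
    remaining = dict(sizes)
    n_left = n
    while n_left > 0 and remaining:
        total = sum(remaining.values())
        flagged = [u for u, m in remaining.items() if n_left * m / total >= 1]
        if not flagged:
            break
        for u in flagged:
            certain.append(u)
            del remaining[u]
        n_left = n - len(certain)
    return certain, remaining, n_left


# Hypothetical stratum: annual live births per hospital, n = 3 hospitals.
births = {"h1": 6000, "h2": 1000, "h3": 1000, "h4": 1000, "h5": 1000}
certain, frame, n_pps = split_certainty_units(births, 3)
print(certain, n_pps)  # ['h1'] 2
```

Each certainty hospital then has inclusion probability 1 and, as in the survey, can be treated as its own stratum for the later sampling stages.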

Hospitals were selected systematically 5 after being sorted within each stratum in ascending order of their number of live births in 2007. The inclusion probabilities of hospitals are given by expressions (1a) and (1b) of Figure 1.
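Systematic PPS selection after sorting can be sketched as below. It assumes certainty hospitals have already been removed (so no hospital is hit twice) and returns the usual first-stage inclusion probability n·M_i/M for each hospital; whether this matches expressions (1a) and (1b) of Figure 1 exactly is an assumption, since the figure is not reproduced here. The helper name and sizes are hypothetical.

```python
import random


def systematic_pps(sizes: dict[str, int], n: int, rng: random.Random):
    """Systematic PPS: sort ascending by size, lay sizes end to end, and take
    n equally spaced points from a random start (hypothetical helper)."""
    units = sorted(sizes, key=sizes.get)          # ascending by live births
    total = sum(sizes.values())
    step = total / n                              # sampling interval
    start = rng.uniform(0, step)                  # random start
    points = [start + k * step for k in range(n)]
    selected, cumulative, i = [], 0, 0
    for u in units:
        cumulative += sizes[u]
        while i < n and points[i] < cumulative:   # point falls in this unit
            selected.append(u)
            i += 1
    probs = {u: n * sizes[u] / total for u in units}
    return selected, probs


sizes = {"a": 100, "b": 200, "c": 300, "d": 400, "e": 500, "f": 500}
chosen, pi = systematic_pps(sizes, 3, random.Random(7))
print(chosen, pi["e"])  # pi["e"] = 3 * 500 / 2000 = 0.75
```

Sorting before the systematic pass acts as implicit stratification by hospital size, which is one reason the ascending sort order is worth recording.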

Figure 1 Sample probability scheme.

Selection of survey days

In the second stage of sampling, an inverse sampling method 2,6 was used to select as many days as necessary to reach 90 postnatal women interviewed in the hospital. This method, originally proposed by Haldane 6 to estimate frequencies and proportions, consists of sampling units (in this case, days) one after another until a pre-specified number of successes is observed (in this case, 90 interviews with postnatal women in the hospital).

It is called inverse sampling because, rather than fixing a number of days sufficient to yield an expected sample size of 90 interviews, as done by Veloso et al. 7, it uses the number of interviews completed as the stopping rule for the sequence of consecutive survey days. The first survey day in each hospital was always selected with equal probability from among the days of the year, as indicated by expression (2) of Figure 1. The -1 terms in the numerator and denominator of expression (2) reflect the loss of one degree of freedom due to the stopping rule, as defined by Haldane 6.

To account for differences in the number of live births between weekends and weekdays, a minimum of seven consecutive days was mandatory, and the size of the field team was set to ensure that this rule could be met.
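The stopping rule and Haldane's estimator can be illustrated with the hypothetical helpers below. `days_needed` counts consecutive survey days until the cumulative number of interviews reaches the target, subject to the seven-day minimum; `haldane_estimate` is Haldane's unbiased estimator (m - 1)/(n - 1) of a proportion under inverse binomial sampling, which is the source of the -1 terms. How exactly these quantities enter expression (2) of Figure 1 is not shown in the text, so that mapping is an assumption.

```python
import random


def first_day(rng: random.Random, days_in_year: int = 365) -> int:
    """First survey day chosen with equal probability over the year (0-based)."""
    return rng.randrange(days_in_year)


def days_needed(interviews_per_day: list[int], target: int = 90,
                min_days: int = 7) -> int:
    """Consecutive days sampled under the inverse-sampling stopping rule."""
    total = 0
    for day, k in enumerate(interviews_per_day, start=1):
        total += k
        if total >= target and day >= min_days:
            return day
    raise ValueError("target not reached within the supplied days")


def haldane_estimate(successes: int, trials: int) -> float:
    """Unbiased estimate of p when sampling stops at the m-th success on trial n."""
    return (successes - 1) / (trials - 1)


start = first_day(random.Random(2011))  # hypothetical seed
print(days_needed([12] * 10))   # 8: seven days give 84 interviews, eight give 96
print(days_needed([20] * 10))   # 7: 90 is reached on day 5, but 7 days are mandatory
print(haldane_estimate(3, 10))  # 2/9
```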

Selection of postnatal women

The number of postnatal women to be selected per day and hospital depended on the number of live births and the numbers of interview shifts and interviewers per day in the hospital. To establish the number of shifts and interviewers, the mean number of live births per day per hospital in 2007 was used and four combinations were defined: (1) one interviewer and one shift for four interviews; (2) one interviewer and two shifts for six interviews; (3) two interviewers and one shift for eight interviews; and (4) two interviewers and two shifts for twelve interviews.

To ensure random selection of postnatal women, the survey's central office prepared tables listing the order numbers of the women to be interviewed, according to the number of live births (up to 40) and the number of interviews per day and hospital (4, 6, 8 or 12). A woman's order number was defined by her order of entry into the hospital. Some additional order numbers were selected to replace nonrespondents.

Unfortunately, the number of live births per hospital and survey day was not recorded during fieldwork. To overcome this problem, the SINASC 2011 and 2012 files were processed to determine the number of live births in each hospital on each survey day, as required to calculate the inclusion probabilities described in expression (3) of Figure 1.
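The pre-computed selection tables can be mimicked as below: for each combination of daily live births N (1 to 40) and daily quota k (4, 6, 8 or 12), a random permutation of the order numbers 1..N supplies k order numbers to interview plus a few spare numbers for replacements. The within-day inclusion probability is then k/N (or 1 when N is at most k). The number of spares, the seed and the helper names are assumptions, and this is only one plausible reading of the day-level probability in expression (3) of Figure 1.

```python
import random


def build_selection_tables(max_births: int = 40, quotas=(4, 6, 8, 12),
                           spares: int = 2, seed: int = 2011):
    """Map (N births, quota k) -> (sorted order numbers to interview, spares)."""
    rng = random.Random(seed)
    tables = {}
    for n_births in range(1, max_births + 1):
        for k in quotas:
            perm = rng.sample(range(1, n_births + 1), n_births)  # random order
            take = min(k, n_births)
            tables[(n_births, k)] = (sorted(perm[:take]), perm[take:take + spares])
    return tables


def within_day_probability(n_births: int, k: int) -> float:
    """Equal-probability selection of k among the N women admitted that day."""
    return min(k, n_births) / n_births


tables = build_selection_tables()
chosen, spare = tables[(20, 6)]
print(chosen, spare, within_day_probability(20, 6))  # probability 0.3
```

Because the within-day probability depends on N, it cannot be computed without the daily birth counts, which is why the SINASC files had to be used after fieldwork.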

Treatment of non-responses

Nine sampled hospitals refused to take part in the survey, and three had closed their maternity services before fieldwork started. The established replacement procedure for hospital nonresponse consisted of replacing each non-responding hospital with the next hospital in the same stratum, according to the sort order of hospitals in the first-stage sampling frame. Nevertheless, it was not possible to replace two non-responding private hospitals located in non-capital cities in the Northeast region, as indicated in Table 1.

Nonresponse by postnatal women was treated, where possible, by replacement according to the selection tables prepared for each hospital or through the inverse sampling procedure used to select survey days (more days were added to the sample until 90 complete interviews were achieved per hospital). If a maternity service closed during fieldwork, the inverse sampling procedure was interrupted and restarted as soon as the service reopened.

A total of 1,356 (5.7%) selected postnatal women were replaced: 15% because of early hospital discharge and 85% because they refused to participate. The sample comprised 23,940 postnatal women interviewed in 266 hospitals. During processing, records with no data from the woman or with no newborn medical records were excluded, leaving a final sample of 23,894 postnatal women (Table 1).

Sample weighting and calibration of sample weights

As indicated in Figure 1, the base sample weights were calculated as the reciprocal of the product of the inclusion probabilities at each sampling stage.
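In code, the base weight is simply the reciprocal of the product of the stage-wise probabilities. The three probabilities in the example are hypothetical placeholders for the hospital, day and woman stages; their actual expressions are those in Figure 1, which is not reproduced here.

```python
import math


def base_weight(*stage_probabilities: float) -> float:
    """Reciprocal of the product of the inclusion probabilities of each stage."""
    pi = math.prod(stage_probabilities)
    if not 0 < pi <= 1:
        raise ValueError("overall inclusion probability must lie in (0, 1]")
    return 1 / pi


# Hypothetical: hospital 0.25, day 0.02, woman 0.5 -> pi = 0.0025
print(base_weight(0.25, 0.02, 0.5))  # about 400
```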

As is usual in official statistical surveys (according to Silva 8), the base sample weights were calibrated to enforce coherence between sample estimates and known population totals obtained from an external source. To some extent, calibration also helps to compensate for potential sampling and nonresponse biases.

Since fieldwork was conducted in 2011 (and, for a few hospitals, at the beginning of 2012), it seemed appropriate to ensure coherence between sample-based estimates and the total number of live births obtained from SINASC 2011 for the hospitals in the sampling frame, i.e. those with 500 or more live births in 2007.

For this reason, a ratio-type calibration of the base sample weights was performed within each selection stratum, as indicated in expression (6) of Figure 1.
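A ratio-type calibration within strata can be sketched as follows: every base weight in stratum h is multiplied by the same factor T_h / (sum of base weights in h), where T_h is the known SINASC 2011 total, so that the calibrated weights add up exactly to T_h. The data are hypothetical, and this is the generic ratio adjustment rather than a transcription of expression (6).

```python
from collections import defaultdict


def ratio_calibrate(weights, strata, known_totals):
    """Scale base weights so that their sum in each stratum equals the known total."""
    weight_sums = defaultdict(float)
    for w, h in zip(weights, strata):
        weight_sums[h] += w
    return [w * known_totals[h] / weight_sums[h] for w, h in zip(weights, strata)]


base = [100.0, 150.0, 50.0, 200.0]
strata = ["A", "A", "B", "B"]
calibrated = ratio_calibrate(base, strata, {"A": 300.0, "B": 500.0})
print(calibrated)  # [120.0, 180.0, 100.0, 400.0]
```

Because the factor differs between strata (1.2 and 2.0 here), calibration can widen the spread of the weights, which is consistent with the increase in weight variation reported for the calibrated weights.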

Table 2 compares population data with estimates obtained using the base and the calibrated sample weights. As expected, estimates based on calibrated weights reproduce the known population totals exactly. Also as expected, calibration increases the variation of the sample weights (the coefficient of variation rises from 86.4% to 99.2%, as shown in Table 3). This increase in weight variation is the price paid for ensuring coherent estimates.

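Table 2 Number of live births in 2011: population data from SINASC 2011 and survey estimates using base and calibrated sample weights, by macro-region and type of hospital governance.

| Macro-region | Governance | Population data from SINASC 2011 | Base weight: estimate | Base weight: relative error (%) * | Calibrated weight: estimate | Calibrated weight: relative error (%) * |
|---|---|---|---|---|---|---|
| Brazil | All | 2,337,476 | 2,697,463 | 15.4 | 2,337,476 | 0.0 |
| Brazil | Public | 962,273 | 1,058,939 | 10.0 | 962,273 | 0.0 |
| Brazil | Mixed | 1,036,634 | 1,170,514 | 12.9 | 1,036,634 | 0.0 |
| Brazil | Private | 338,569 | 468,010 | 38.2 | 338,569 | 0.0 |
| North | Public | 154,305 | 161,788 | 4.8 | 154,305 | 0.0 |
| North | Mixed | 57,571 | 83,284 | 44.7 | 57,571 | 0.0 |
| North | Private | 12,690 | 13,430 | 5.8 | 12,690 | 0.0 |
| Northeast | Public | 334,541 | 376,493 | 12.5 | 334,541 | 0.0 |
| Northeast | Mixed | 230,107 | 360,287 | 56.6 | 230,107 | 0.0 |
| Northeast | Private | 110,702 | 67,497 | -39.0 | 110,702 | 0.0 |
| Southeast | Public | 337,772 | 362,600 | 7.4 | 337,772 | 0.0 |
| Southeast | Mixed | 501,644 | 458,582 | -8.6 | 501,644 | 0.0 |
| Southeast | Private | 154,042 | 296,744 | 92.6 | 154,042 | 0.0 |
| South | Public | 66,793 | 75,919 | 13.7 | 66,793 | 0.0 |
| South | Mixed | 182,224 | 197,981 | 8.6 | 182,224 | 0.0 |
| South | Private | 42,932 | 67,762 | 57.8 | 42,932 | 0.0 |
| Central-West | Public | 68,862 | 82,139 | 19.3 | 68,862 | 0.0 |
| Central-West | Mixed | 65,088 | 70,381 | 8.1 | 65,088 | 0.0 |
| Central-West | Private | 18,203 | 22,577 | 24.0 | 18,203 | 0.0 |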

*Relative error (%) = (Estimate – population data) x 100/population data.
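The relative errors in the table follow directly from the footnote's definition; the short check below recomputes the Total row and the row with the largest negative error from the published figures.

```python
def relative_error(estimate: float, population: float) -> float:
    """Relative error (%) = (estimate - population) x 100 / population."""
    return (estimate - population) * 100 / population


print(round(relative_error(2_697_463, 2_337_476), 1))  # 15.4 (Total, base weights)
print(round(relative_error(67_497, 110_702), 1))       # -39.0
```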

Table 3 Summary statistics of the sample weights.

| Summary statistic | Base sample weight | Calibrated sample weight | 1st follow-up wave sample weight | 2nd follow-up wave sample weight |
|---|---|---|---|---|
| Number of observations | 23,894 | 23,894 | 16,109 | 11,925 |
| Minimum | 7.4 | 4.5 | 6.0 | 7.0 |
| First quartile (Q1) | 69.4 | 55.3 | 76.8 | 103.3 |
| Median | 96.1 | 78.6 | 119.0 | 162.6 |
| Third quartile (Q3) | 132.6 | 114.8 | 175.5 | 255.2 |
| Maximum | 3,499.9 | 4,194.9 | 3,870.4 | 7,395.8 |
| Range (maximum - minimum) | 3,492.5 | 4,190.4 | 3,864.4 | 7,388.8 |
| Interquartile range (Q3 - Q1) | 63.2 | 59.5 | 98.7 | 151.9 |
| Mode | 19.3 | 14.9 | 29.6 | 39.5 |
| Mean | 112.9 | 97.8 | 149.1 | 211.0 |
| Standard deviation | 97.6 | 97.0 | 151.5 | 222.4 |
| Coefficient of variation (%) | 86.4 | 99.2 | 101.6 | 105.4 |

Sample weights for the two telephone follow-up waves

As expected, it was not possible to contact all postnatal women interviewed in the baseline survey during the two telephone follow-up waves. Several approaches could be used to compensate for this nonresponse: (1) probabilistic imputation of nonrespondents' data; (2) treating the responding sample as a subsample of the baseline sample; or (3) modelling the probability of response in each follow-up wave as a function of covariates obtained in the baseline survey and using these probabilities to derive nonresponse weight adjustments for the women who responded in each wave.

Based on the numbers of respondents in each follow-up wave shown in Table 3, 67.4% and 49.9% of the women interviewed in the baseline survey responded in the first and second follow-up waves, respectively. Given these high nonresponse rates, the first two options were not considered suitable for nonresponse compensation.
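The percentages quoted above are unweighted response rates, computed from the observation counts reported for the baseline and follow-up weights; the check below reproduces them.

```python
def response_rate(respondents: int, baseline: int) -> float:
    """Unweighted response rate (%) relative to the baseline sample."""
    return 100 * respondents / baseline


baseline = 23_894  # women with baseline weights
for wave, n in (("1st", 16_109), ("2nd", 11_925)):
    print(wave, f"{response_rate(n, baseline):.1f}%")  # 67.4% and 49.9%
```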

Thus the solution adopted was to model the response probabilities using the covariate information available from the baseline survey. The procedure used was proposed by Little 9, and is also described in Lepkowski 10 and Brick & Montaquila 11.

The general idea behind the procedure used to obtain the sample weights in each telephone interview follow-up wave can be described in four steps, as presented in Figure 2.

Figure 2 Modeling response probabilities to calculate adjustments to the weights of the two follow-up waves.

In the first step, a model was fitted to the baseline sample to explain each postnatal woman's probability of responding to a follow-up wave, with the follow-up wave response indicator as the outcome and the baseline covariate information as predictors. This procedure was applied independently for each follow-up wave.

In the second step, each woman's response probability in each follow-up wave was predicted from the model fitted in step one.
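The text does not name the model family; logistic regression is the usual choice for a binary response indicator, so the sketch below assumes one. It fits an unweighted logistic model by plain gradient ascent (a simple stand-in for the Newton-type fitting done by statistical packages) and returns predicted response propensities. The helper names and the toy data, with a single binary covariate, are hypothetical.

```python
import math


def fit_logistic(X, y, lr=0.5, iters=5000):
    """Fit P(respond | x) = 1 / (1 + exp(-(b0 + b.x))) by gradient ascent
    on the mean log-likelihood. X: list of covariate lists; y: 0/1 list."""
    p = len(X[0])
    beta = [0.0] * (p + 1)            # beta[0] is the intercept
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = yi - 1 / (1 + math.exp(-eta))   # observed minus fitted
            grad[0] += resid
            for j, v in enumerate(xi, start=1):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, X):
    """Predicted response probabilities for each row of X."""
    return [1 / (1 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))))
            for xi in X]


# Hypothetical: covariate 0 -> 4 of 10 respond, covariate 1 -> 8 of 10 respond.
X = [[0]] * 10 + [[1]] * 10
y = [1] * 4 + [0] * 6 + [1] * 8 + [0] * 2
beta = fit_logistic(X, y)
print([round(p, 3) for p in predict(beta, [[0], [1]])])  # [0.4, 0.8]
```

With one binary covariate the model is saturated, so the fitted propensities equal the observed response rates in each group, which makes the result easy to check.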

In the third step, for each follow-up wave, the quintiles of the predicted response probabilities were used to define five weight adjustment classes. Within each class, the response rate was estimated as the ratio of the sum of the respondents' baseline calibrated sample weights to the sum of the baseline calibrated sample weights of all postnatal women in the class, as indicated by expression (9) of Figure 2.

In the last step, the reciprocals of the response rates estimated by follow-up wave and weight adjustment class were used to adjust the baseline calibrated sample weights of the postnatal women interviewed in each follow-up wave.
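Steps three and four can be sketched as follows, given predicted propensities from step two. Classes are formed as five equal-count groups of the ranked propensities (a simple way to cut at the sample quintiles, with ties broken arbitrarily); each class's response rate is the ratio of respondents' calibrated weights to all calibrated weights in the class, and respondents' weights are divided by that rate. The helper name and data are hypothetical.

```python
def weighting_class_adjustment(propensity, responded, weights, n_classes=5):
    """Return {index: adjusted weight} for respondents (hypothetical helper)."""
    n = len(propensity)
    order = sorted(range(n), key=lambda i: propensity[i])
    cls = [0] * n
    for rank, i in enumerate(order):
        cls[i] = rank * n_classes // n            # class 0 .. n_classes - 1
    w_all = [0.0] * n_classes
    w_resp = [0.0] * n_classes
    for i in range(n):
        w_all[cls[i]] += weights[i]
        if responded[i]:
            w_resp[cls[i]] += weights[i]
    if any(r == 0 for r in w_resp):
        raise ValueError("a class has no respondents; use fewer classes")
    rate = [r / a for r, a in zip(w_resp, w_all)]  # weighted response rate
    return {i: weights[i] / rate[cls[i]] for i in range(n) if responded[i]}


prop = [i / 10 for i in range(1, 11)]             # hypothetical propensities
resp = [True, False] * 5                          # every other woman responds
w = [float(i) for i in range(1, 11)]              # hypothetical calibrated weights
adjusted = weighting_class_adjustment(prop, resp, w)
print(round(sum(adjusted.values()), 6), round(sum(w), 6))  # 55.0 55.0
```

In this sketch the adjusted respondents' weights in each class add up to the class total of calibrated weights, and all respondents in a class share the same adjustment factor rather than receiving an individual one.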

For the models of response probability, the set of potential predictor variables initially considered included: macro-region; location in a capital city or not; type of hospital governance; postnatal woman's socioeconomic class (A+B, C, or D+E); delivery payment (public, private health insurance, or directly out of pocket); postnatal woman's age class (12-19 years, 20-34 years, and 35 years or more); "Have you got any work where you get paid?" (yes or no); "Were you satisfied with your pregnancy at its beginning?" (yes or no); "Stillbirth or neonatal death of child?" (yes or no); race or skin color (white, black, brown, yellow, or indigenous); "Were there obstetric complications during gestation leading to negative perinatal outcomes?" (yes or no); and, for the second follow-up wave only, whether the woman responded to the first follow-up wave (yes or no).

For the first follow-up wave, the significant predictor variables were the three variables that defined sample strata (macro-region, capital or not and type of hospital governance), postnatal woman’s socioeconomic class and postnatal woman’s age class.

For the second follow-up wave, the significant variables were the same five variables listed above plus "Have you got any work where you get paid?", "Were you satisfied with your pregnancy at its beginning?" and "Stillbirth or neonatal death of child?".

In the third step, the predicted response probabilities were not used directly to adjust the baseline calibrated sample weights, in order to avoid undesirable variation in the final weights. Kish 12 showed that weighting may reduce bias but often increases the variance of weighted estimators: the ratio of the variance of the weighted estimator to that of the corresponding unweighted estimator is approximately 1 plus the squared coefficient of variation of the sample weights. The weighting-class approach of the third and fourth steps therefore corrects the follow-up sample weights for nonresponse while keeping the increase in weight variation to a minimum (Table 3).
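Kish's approximation can be computed directly from a set of weights as 1 + CV², which is algebraically equal to n·Σw²/(Σw)² when the CV uses the divide-by-n standard deviation. The sketch below implements it and, as a rough reading of Table 3, converts the reported coefficients of variation into approximate variance inflation factors (Table 3 may use the n - 1 divisor, which makes no practical difference at these sample sizes). The helper name is hypothetical.

```python
def kish_deff(weights) -> float:
    """Kish's weighting effect: 1 + CV^2 = n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / total ** 2


print(kish_deff([1.0, 1.0, 1.0, 1.0]))  # 1.0: equal weights, no inflation
print(kish_deff([1.0, 3.0]))            # 1.25: mean 2, CV 0.5

# Coefficients of variation reported in Table 3 (base, calibrated, waves 1 and 2)
for cv in (0.864, 0.992, 1.016, 1.054):
    print(f"CV {cv:.1%} -> about {1 + cv ** 2:.2f}")
```

Read this way, the follow-up adjustments raise the approximate inflation factor only from about 1.98 (calibrated baseline weights) to about 2.03 and 2.11.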


To the regional and state coordinators, supervisors, interviewers and crew of the study and the mothers who participated and made this study possible.


1. do Carmo Leal M, da Silva AA, Dias MA, da Gama SG, Rattner D, Moreira ME, et al. Birth in Brazil: national survey into labour and birth. Reprod Health 2012; 9:15.

2. Cochran WG. Sampling techniques. 3rd Ed. New York: John Wiley & Sons; 1977.

3. Altman DG. Practical statistics for medical research. London: Chapman and Hall; 1991.

4. Fleiss JL. Statistical methods for rates and proportions. 2nd Ed. New York: John Wiley & Sons; 1981.

5. Madow WG. On the theory of systematic sampling, II. Annals of Mathematical Statistics 1949; 20:333-54.

6. Haldane JBS. On a method of estimating frequencies. Biometrika 1945; 33:222-5.

7. Veloso VG, Portela MC, Vasconcellos MTL, Matzenbacher LA, Vasconcelos ALR, Grinsztejn B, et al. HIV testing among pregnant women in Brazil: rates and predictors. Rev Saúde Pública 2008; 42:859-67.

8. Silva PLN. Calibration estimation: when and why, how much and how. Rio de Janeiro: Instituto Brasileiro de Geografia e Estatística; 2004. (Textos para Discussão da Diretoria de Pesquisas, 14).

9. Little RJ. Survey nonresponse adjustments. International Statistical Review 1986; 54:139-57.

10. Lepkowski J. Non-observation error in household surveys in developing countries. In: Department of Economic and Social Affairs, Statistics Division, editor. Household surveys in developing and transition countries. New York: United Nations; 2005. p. 149-69. (Series F, 96).

11. Brick JM, Montaquila JM. Nonresponse and weighting. In: Pfeffermann D, Rao CR, editors. Handbook of statistics 29A. Sample surveys: design, methods and applications. Philadelphia: Elsevier; 2009. p. 163-85.

12. Kish L. Weighting for unequal Pi. Journal of Official Statistics 1992; 8:183-200.


National Council for Scientific and Technological Development (CNPq); Science and Technology Department, Secretariat of Science, Technology, and Strategic Inputs, Brazilian Ministry of Health; National School of Public Health, Oswaldo Cruz Foundation (INOVA Project); and Foundation for Supporting Research in the State of Rio de Janeiro (Faperj).

Received: October 09, 2013; Revised: February 26, 2014; Accepted: March 24, 2014

Correspondence M. T. L. Vasconcellos. Escola Nacional de Ciências Estatística, Instituto Brasileiro de Geografia e Estatística. Rua André Cavalcanti 106, Rio de Janeiro, RJ 20031-170, Brasil.


M. T. L. Vasconcellos and P. L. N. Silva developed the sample weighting procedures and wrote the first version of the manuscript, which was revised and approved by all authors. A. P. E. Pereira and A. O. C. Schilithz selected the sample, calculated and calibrated the sample weights, and approved the manuscript. P. R. B. Souza Junior and C. L. Szwarcwald designed the sample and approved the manuscript.

Creative Commons License This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.