Introduction

Spirometric measurements are interpreted by comparing them with reference values derived from a defined population. Spirometry reference values (or standard numbers) are derived from healthy individuals (^{1}). The reference population should represent the general population (^{2},^{3}).

According to the Brazilian Institute of Geography and Statistics (IBGE - Instituto Brasileiro de Geografia e Estatística) (^{4}), Brazil has had many spontaneous migratory flows since 1884, which continue into the present day. Germans, Spaniards, Italians, Japanese, Portuguese, Syrians, Turks, and Africans, along with the native indigenous population, constitute the Brazilian gene pool (^{4}). Thus, Brazilian spirometric reference values could and should present characteristics similar to those of other populations around the world.

The aim of this study is to present an equation for the Brazilian population using an alternative method in order to validate the present reference values for spirometry (^{5}), yield new numbers, or support data from international literature.

Material and Methods

We conducted a cross-sectional study and evaluated healthy individuals (20 to 80 years old) with sedentary lifestyles from December 2010 to July 2014. The study was approved by the Research Ethics Committee of the Universidade do Estado do Rio de Janeiro (UERJ, 2782/2010-CAAE 0226.0.228.00-10). Volunteers were randomly selected from various regions of the state of Rio de Janeiro, in Southeastern Brazil. The locations chosen for pre-selection included the Sports Authority of Rio de Janeiro, the Senior Citizens University, the Herbert Viana Hemotherapy Unit, and the Greater Rio Samba Schools (Salgueiro, Vila Isabel, and Mangueira). Volunteers were undergraduate and graduate students from UERJ and Veiga de Almeida University, visitors of patients from Hospital Universitário Pedro Ernesto, and citizens from various neighborhoods in Rio de Janeiro. Ethnicity was self-defined.

Considering a Brazilian population of 190,000,000 at the time this research started in December of 2010 (IBGE) (^{4}), the sample size was calculated to yield a 95% confidence level with a 5% margin of error. Despite the population growth to 204,761,379 (IBGE 2015) from 2011 to 2015, the calculated number of individuals remained the same. Therefore, in order to obtain a representative sample of the Brazilian population, 377 individuals were necessary. The percentage of individuals for each age group was calculated based on IBGE (^{4}) data.
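The sample size reasoning above can be sketched with Cochran's formula and a finite-population correction. This is an illustrative assumption: the text does not state which formula or expected proportion was used, so the function name `sample_size`, the z value of 1.96, and the proportion of 0.5 are hypothetical choices, and the output is not expected to match the reported 377. The sketch only shows why growth from 190 million to about 205 million leaves the required sample unchanged.

```python
import math


def sample_size(population, z=1.96, margin=0.05, proportion=0.5):
    """Cochran's sample size with finite-population correction.

    All default values are assumptions; the study does not report them.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Finite-population correction; negligible for very large populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


n_2010 = sample_size(190_000_000)   # population when the study started
n_2015 = sample_size(204_761_379)   # IBGE 2015 population
print(n_2010 == n_2015)
```

For populations in the hundreds of millions the correction term is close to 1, so the required sample depends almost entirely on the confidence level and the margin of error.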

The inclusion criteria were a sedentary lifestyle, age between 20 and 80 years, and absence of a smoking history and of pulmonary, cardiac, and neurological diseases. A sedentary lifestyle was defined as no physical activity, or irregular physical activity totaling less than 150 min a week, according to the World Health Organization (^{6}).

The exclusion criteria were smokers and ex-smokers, illegal drug users, and individuals with any of the following: active lifestyles; spirometry results not meeting the criteria for acceptability and reproducibility (^{2}); radiographic evidence of pleuropulmonary lesions [including pulmonary masses, hyperinflation pattern (flattened hemidiaphragm) or interstitial disease]; specific abnormal findings on electrocardiograms such as ischemic region, myocardial infarction, tachyarrhythmia, or complete ventricular blockages; respiratory infections of the upper and/or lower airways in the 6 weeks before the spirometry; a Charlson index above 1 (^{7}); cognitive deficit preventing comprehension of the questionnaire and implementation of the pulmonary function test (^{8}); debilitating and chronic diseases, such as cardiopathies, pneumopathies, and neuropathies; unstable angina or arterial hypertension (systolic arterial pressure >200 mmHg or diastolic arterial pressure >110 mmHg) (^{2}); a recent history of cardiac arrhythmia or myocardial infarction (^{2},^{3}); medication use for treating cardiopathies (beta-blockers) (^{9}); respiratory atopies (^{2},^{3},^{10}); and hemoptysis.

All chest radiographs were analyzed by a pulmonologist holding a specialist title from the Brazilian Society of Pneumology and Tisiology (BSPT). The electrocardiogram was performed by the Cardiology Service. The physician responsible for the report was a cardiologist from the Brazilian Society of Cardiology (BSC). Physical therapists received prior training for the administration of the pre-selection questionnaire (^{2}), the Charlson index (^{7}), and the International Physical Activity Questionnaire (^{11}).

Spirometry

Spirometry exams were conducted from 8:00 am to 12:00 noon, at the Laboratory of Pulmonary Function of the UERJ, a referral center with 25 years of experience accredited by the BSPT for training professionals in pulmonary function in the state of Rio de Janeiro.

The exams followed the American Thoracic Society (ATS) 1987 protocol (^{12}), which was adapted for the Brazilian Guidelines for Pulmonary Function Tests (2002) (^{2}).

The device used was the Vitatrace VT 130 SL (Codax Ltda., Brazil) spirometer, which was integrated with the Spiromatic 2.0 program (Engelógica, Rio de Janeiro, Brazil). A minimum of 3 and a maximum of 8 forced expiratory maneuvers were performed in order to meet the acceptability and reproducibility criteria.

The acceptability of the curves followed the criteria proposed by ATS (^{12}) and BSPT (^{2}): a back-extrapolated volume <5% of the forced vital capacity (FVC) or 150 mL; a forced expiration duration of at least 6 s; a plateau in the volume-time curve lasting at least 1 s after a minimum expiratory time of 6 s, or a volume in the last second lower than 25 mL; and 3 to 8 maneuvers, with at least 3 acceptable and 2 reproducible curves.
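As a minimal sketch, the per-curve acceptability criteria listed above could be checked as follows. The `Maneuver` record, its field names, and the function `is_acceptable` are hypothetical, introduced only for illustration. The "or" conditions are read literally from the text, which is an assumption about how they were applied in practice.

```python
from dataclasses import dataclass


@dataclass
class Maneuver:
    """One forced expiratory maneuver (hypothetical record layout)."""
    fvc_ml: float                # forced vital capacity, mL
    back_extrapolated_ml: float  # back-extrapolated volume, mL
    expiration_s: float          # duration of forced expiration, s
    plateau_s: float             # plateau length on the volume-time curve, s
    last_second_ml: float        # volume exhaled in the last second, mL


def is_acceptable(m: Maneuver) -> bool:
    # Start of test: back-extrapolated volume <5% of FVC or <150 mL.
    start_ok = (m.back_extrapolated_ml < 0.05 * m.fvc_ml
                or m.back_extrapolated_ml < 150)
    # Forced expiration of at least 6 s.
    duration_ok = m.expiration_s >= 6
    # End of test: plateau of at least 1 s, or <25 mL in the last second.
    end_ok = m.plateau_s >= 1 or m.last_second_ml < 25
    return start_ok and duration_ok and end_ok


good = Maneuver(4000, 120, 6.5, 1.2, 10)
slow_start = Maneuver(4000, 250, 6.5, 1.2, 10)
too_short = Maneuver(4000, 100, 4.0, 1.2, 10)
print(is_acceptable(good), is_acceptable(slow_start), is_acceptable(too_short))
```

The session-level requirement (3 to 8 maneuvers, with at least 3 acceptable curves) would be applied on top of this per-curve check.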

The criteria for curve reproducibility followed those proposed by ATS/ERS (European Respiratory Society) (^{3}) and BSPT (^{2}): a difference in FVC between the best two curves <150 mL, a difference in forced expiratory volume in 1 s (FEV_{1}) between the best two curves <150 mL, and a difference in peak expiratory flow (PEF) between the best curves <10%.
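The reproducibility thresholds can be sketched in the same way. Two points are assumptions here: the "best two curves" are taken as the two largest values of each index, and the PEF difference is expressed relative to the larger value. The function names are hypothetical.

```python
def two_largest(values):
    """Return the two largest values, largest first."""
    top = sorted(values, reverse=True)[:2]
    return top[0], top[1]


def is_reproducible(fvc_ml, fev1_ml, pef_l_s):
    """Check the thresholds on lists of values from acceptable curves."""
    if min(len(fvc_ml), len(fev1_ml), len(pef_l_s)) < 2:
        return False  # at least two curves are needed for a comparison
    fvc_a, fvc_b = two_largest(fvc_ml)
    fev1_a, fev1_b = two_largest(fev1_ml)
    pef_a, pef_b = two_largest(pef_l_s)
    return (fvc_a - fvc_b < 150                   # FVC within 150 mL
            and fev1_a - fev1_b < 150             # FEV1 within 150 mL
            and (pef_a - pef_b) / pef_a < 0.10)   # PEF within 10%


ok = is_reproducible([4000, 3900, 3700], [3200, 3120, 3000], [8.0, 7.6, 7.0])
not_ok = is_reproducible([4000, 3800], [3200, 3150], [8.0, 7.9])
print(ok, not_ok)
```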

The outcome variables were FVC, FEV_{1}, FEV_{1}/FVC, PEF, forced expiratory flow between 25 and 75% of the FVC (FEF_{25-75%}), FEF_{50%}, FEF_{75%}, and average forced expiratory flow time (FEFT).

Statistical analysis and derivation of equations

Kolmogorov-Smirnov tests were performed in order to determine whether the study population was homogeneous. Subsequently, parametric tests (Student's *t*-test and the Pearson correlation coefficient) were used to analyze normally distributed values, taking into account the mean and standard deviation. Anthropometric and spirometry data are reported as medians and percentages, using number and point graphs. All analyses were carried out in Stata 14 (StataCorp LP, USA). Estimated coefficients with P<0.05 were considered to be significant.

In the univariate regression analysis, the dependent variables were the spirometric indices. Correlation coefficient tests of the functional parameters with anthropometric variables and their transformations were conducted. Variables that had a P<0.10 were selected for inclusion in the multivariate analysis.

After determining the multiple regression equations, residuals were identified and their adherence to the normal curve was graphically confirmed. In addition, the asymmetry of the equation was analyzed by 4 tests: Mardia's skewness, Mardia's kurtosis, Henze-Zirkler, and Doornik-Hansen. The one with the greatest value was noted.

The residual method was used to establish the lower limit of the reference value once the regression equation was calculated. The standard deviation of the residuals of the equation was multiplied by the constant 1.645, which corresponds to a 95% confidence level, and the result was subtracted from the calculated (predicted) value.
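A minimal sketch of the residual method follows, assuming the lower limit is the predicted value minus 1.645 times the standard deviation of the residuals. The function name `lower_limit` and the sample residuals are hypothetical. The sketch uses the plain sample standard deviation rather than the regression's residual standard error, which is a simplification.

```python
import statistics


def lower_limit(predicted, residuals, z=1.645):
    """Lower limit of the reference value via the residual method.

    z = 1.645 is the one-sided 95% point of the standard normal curve.
    """
    # Sample standard deviation of the residuals (observed minus predicted).
    residual_sd = statistics.stdev(residuals)
    return predicted - z * residual_sd


# Hypothetical residuals (in liters) from a fitted FVC equation.
residuals = [-0.4, -0.1, 0.0, 0.1, 0.4]
limit = lower_limit(4.0, residuals)
print(round(limit, 2))
```

Because the offset is a constant, this approach subtracts the same absolute amount from every predicted value, regardless of age or height.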

The following three equations were calculated for FVC and FEV_{1}: multivariate linear regression, logarithmic regression, and logarithmic regression with the spline variable. The spline variable is used in the equations of the Global Lung Function Initiative (GLI) (^{13}) and in recent Japanese equations (^{14}).

Logarithmic spline equations were calculated with the lambda-median-coefficient of variation (LMS) method. The purpose of the spline is to turn the dependent variable into a non-linear variable (from parametric to non-parametric), allowing it to be an exploratory parameter that can vary slightly (non-linearly). The spline variable was defined after finding the mean, deviation, and asymmetry of the equation, considering the standard normal distribution:

where LMS (λ, μ, σ) = λ (asymmetry), μ (mean), σ (deviation).
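For reference, the standard LMS transformation described by Cole and Green, which is presumably the form intended here, can be written as below. The symbol z for the resulting standard normal deviate and y for the measured value are notational assumptions. In that formulation μ is usually read as the median and σ as the coefficient of variation.

```latex
z =
\begin{cases}
\dfrac{(y/\mu)^{\lambda} - 1}{\lambda\,\sigma}, & \lambda \neq 0, \\[2ex]
\dfrac{\ln(y/\mu)}{\sigma}, & \lambda = 0.
\end{cases}
```

Under this form, the value of y corresponding to a given z is y = μ(1 + λσz)^{1/λ} for λ ≠ 0, so the lower limit of normal is obtained by setting z = -1.645.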

After the transformation and assuming that the variable (y) features a smooth curve, the Box-Cox method was applied, with age interpolation via the Box-Cox Cole and Green methods.

Regression equations were calculated according to Pereira et al. (^{5}), Knudson et al. (^{10}), Crapo et al. (^{15}), Hankinson et al. (^{16}), Pérez et al. (^{17}), Falaschetti et al. (^{18}), and Brändli et al. (^{19}) and compared with the equations of the present study (referred to as Rufino et al.). The values from Kubota et al. (^{14}) and GLI (^{13}) were compared using the spline method (or LMS). The Bland-Altman method was also used for agreement analysis, with GraphPad Prism 6.04 (USA).
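The Bland-Altman agreement analysis mentioned above can be sketched as follows. This is not the GraphPad Prism procedure itself, only the usual calculation of the bias and the 95% limits of agreement. The function name `bland_altman` and the example values are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need the same number of values")
    # Paired differences between the two sets of predicted values.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    # 95% limits of agreement: bias +/- 1.96 standard deviations.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical predicted FVC values (liters) from two reference equations.
equation_a = [4.0, 3.5, 5.0, 4.2]
equation_b = [3.9, 3.6, 4.8, 4.2]
bias, lower, upper = bland_altman(equation_a, equation_b)
print(round(bias, 3), round(lower, 3), round(upper, 3))
```

A bias near zero with narrow limits indicates that two equations can be used interchangeably; a bias far from zero indicates a systematic offset between them.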

Results

From December 2010 to July 2014, a total of 454 individuals were recruited, of whom 55 were excluded (Figure 1). The sample showed homogeneity when grouped by age and by gender (Table 1). More women than men were recruited; this is due to the higher frequency of diseases and the higher level of physical activity among men compared with women. The weight (in kg) was higher in men than in women. However, when corrected by height squared to calculate BMI, differences in weight were non-significant. The lack of significant differences in BMI and age demonstrated that the male and female groups were homogeneous (Table 1). Age ranged from 20 to 74 years in males and from 20 to 80 years in females. There were no differences among age groups selected in 20-year intervals, or between individuals of African and non-African descent, based on self-defined ethnicity (Figure 2).

Except for the FEV_{1}/FVC ratio, spirometric indices were all higher in males than in females. There was a decrease of approximately 20 mL per year of age in FVC and FEV_{1} for both genders. The correlations between anthropometric variables and spirometric indices were assessed and the r^{2} value was identified (Tables 2 and 3; Figure 3).

Equation coefficients for men and women are reported in Table 3. The logarithmic spline data calculated through the LMS method are reported in Table 4. Table 5 shows the correlations between the proposed equations and the eight other equations currently used worldwide. The non-significant differences between the findings obtained through the linear equation and the LMS (spline) equation are reported in Table 6.

Discussion

One of the most important factors in the validity of equations for reference values in a healthy population is the proper definition of health itself (^{1},^{20},^{21}). Being "healthy" means not having a disease or having an effectively controlled disease. However, the drugs used for controlling diseases often change pulmonary function (e.g., beta-blockers) (^{9}), which in turn decreases the pool of healthy people in lung analysis, as the frequency of diseases increases with age (^{22},^{23}). In a Scottish study (^{23}) involving 1,751,841 patients, a large proportion of individuals >60 had eight disorders. At the age of 85, all had at least one disorder and almost 10% had eight disorders, including systemic arterial hypertension, diabetes mellitus, osteoporosis, chronic obstructive pulmonary disease, and cancer. In other words, having multiple co-morbidities is a reality in older people. Therefore, developing equations for the elderly is a challenge. In Brazil, the average life expectancy is about 74 years (^{4}), which is another limiting factor.

Enright et al. (^{24}) derived regression equations from "healthy" adults aged 65 years or older with a smoking history of up to 5 pack-years who had quit smoking before turning 50. There were 288 individuals, with 82 below the age of 70 years. The results demonstrated that PEF was greater than expected in 10% of the cases. Part of this finding was attributed to the systemic arterial hypertension in the controlled "healthy" subjects. The higher the arterial pressure, the greater the maximum expiratory flow. The influence of arterial pressure was also a finding in the work of Pereira et al. (^{5}), which linked lower FVC and FEV_{1} values to arterial hypertension.

Another analysis included healthy individuals who were physically active. It is well known that physical activity can increase lung volume, especially the practice of water polo, basketball, rowing (e.g., canoeing), handball, and soccer (^{25},^{26}). The equations developed in this study were generated from a sedentary population. This probably caused the values to be lower in comparison with the other equations.

Brändli et al. (^{19}) and Crapo et al. (^{15}) did not exclude smokers or ex-smokers from their sample. Today, smoking is seen as an unhealthy habit, since it is well-known that it can lead to an acute or chronic reduction in pulmonary function (^{2},^{3},^{27},^{28}).

The place where the exams are carried out also has a relevant effect on pulmonary function. Values generated with field devices generally use pneumotachographs and are corrected or rendered homogeneous by BTPS (body temperature and pressure, saturated) conditions. However, the environmental pollution factor cannot be corrected, and it directly interferes with lung function, especially in cities with high pollution levels (^{29}). Equations of Crapo et al. (^{15}) and Knudson et al. (^{10}) featured this factor.

The values used in our equations were standardized by using a single device and the same technique in the same laboratory, always during the morning, which differs from all other equations presented and discussed in this article.

In order to properly represent the population being studied, we enrolled the minimum number required. Quanjer et al. (^{30}) considered 100 to be the minimum number per gender required for creating an equation. Similar to the study carried out by Knudson et al. (^{10}) it took many years to obtain the appropriate sample size for our study. Knudson et al. (^{10}) performed a double-triple questionnaire check. We performed a double questionnaire check for morbidity and mortality. The questionnaire by Charlson was also used as a check against the first, which was broader and included questions regarding symptoms. In order to avoid biases regarding "forgotten" diseases or disease denial or impaired respiratory function, chest radiographs were performed in all patients, similarly to Crapo et al. (^{15}), as well as electrocardiograms. These criteria made our sample selection a very rigorous process.

Ethnicity is an important variable that is difficult to identify. In our data, self-declared ethnicity did not differ significantly among the groups from which the reference values were derived. It is understood that ethnicity affects body proportions, such as the Cormic index, which is the ratio between sitting height (encephalic-trunk height) and standing height. Lung volume would be more closely correlated with sitting height than with standing height (stature). This can occur in up to 53% of African descendants and Caucasian-Americans (^{31}). Our analysis did not confirm differences between ethnicities, a result that has also been shown in some Brazilian genetic studies (^{32},^{33}). This might be explained, at least in part, by the broad miscegenation in our population.

Gender can account for up to 30% of pulmonary function variation and the separation of reference equations by gender is common (^{34}). We found differences of up to 31% in respiratory volumes between genders. It is understood that men’s larger lung size also interferes with all the other airway components (^{35}). This partially explains the lower FEV_{1}/FVC in men than women, implying that the airways are more subjected to dynamic compression (^{20}).

The choice of a reference equation can result in the characterization of a specific respiratory disorder in some individuals (^{36}). The GLI (^{13}) used a new statistical model (LMS), which transforms nonparametric samples into parametric samples. The LMS is a method that can equalize errors, and it has been used in various parts of the world. Thus, the method can merge data from different parts of the world using the interpolation variables. The ERS task force derived equations for reference values using a database of 160,000 individuals from 72 pulmonary function laboratories in 33 countries. After applying the exclusion criteria, 97,759 "healthy," non-smoking individuals aged 2.5 to 95 years were included. They were of different ethnicities, such as Caucasians, African-Americans, northern and southeastern Asians, Latin Americans, Native Americans, Polynesians, and Arabs. It is very difficult to obtain homogeneous data with such a mixture of numbers and ethnicities. Therefore, such research should be considered a proposal and not an operational standard, since it was not a prospective and controlled study and data were collected in pulmonary function labs with different quality levels. Despite these limitations, the equations were made using the LMS method (^{13}). When we compared the values from the linear and LMS equations for men and women, neither FVC nor FEV_{1} showed statistically significant differences. In other words, the LMS method or linear regression should yield similar values, which can be beneficial when using the LMS method for international equations in the future.

The linear regression equations featured similar values for the coefficients, residuals, and asymmetry when compared to the logarithmic equations, including those for respiratory flows, a finding that is also consistent with the Knudson et al. (^{10}) and Hankinson et al. (^{16}) equations.

Studies have suggested that FVC and FEV_{1} are proportional to body size (^{2},^{20},^{37}–^{39}). This means that a taller individual, with bigger lungs, would have a greater decrease in pulmonary volumes with age, while smaller individuals would have a smaller decrease (^{20},^{38}). In our study, however, both men and women showed similar decreases with age.

The methodology used in our study of Brazilian equations was crucial to the differences in absolute values relative to the study by Pereira et al. (^{5}), whose research generated higher values than ours. The reference values of Knudson et al. (^{10}) are still valid for the Brazilian population.

The equations derived from the study by Pereira et al. (^{5}) are being used in Brazil. However, the authors had not published spirometry equations for Afro-Brazilian ethnicity. Our findings did not differ between self-defined Afro-Brazilian and non-Afro-Brazilian individuals. Thus, the equations presented can also fill this gap. Another aspect to be noted is that there is a tendency in the literature to treat the Brazilian ethnicity as the same as that of other Latin American countries. However, the migratory and colonizing currents among Latin American countries were different, which could affect spirometry values.

The coefficients of determination of the spirometric equations were not close to 1. One of the reasons for this could be that the equations have frequently used the same variables (gender, age, height). One of the main advantages of our study is the use of a simple formula without logarithmic scales. Another important positive aspect is that we provide a new Brazilian equation obtained with a different method from that of previous studies. As in other countries, such as the United States, lung function laboratories may choose which equation is more suitable.

The LMS model for producing equations can be used in the Brazilian population. One of the characteristics of this method is statistical evolution and the potential to have standard spirometry reference values in the future.