
An Analysis of the 2002 Presidential Elections Using Logistic Regression*

* Editors’ Note: The need to speed up the launch of the first issue of BPSR, which had already been delayed several times, regrettably led the Editors to overlook their duty to inform two contributors of the overlap between their respective pieces. This explains the publication both of this Research Note by Jairo Nicolau, in which he sets out to analyse the 2002 Brazilian presidential election by means of logistic regression, claiming that although this technique is widely used for election studies in other countries, it had been little used in Brazil to date, and of the article by Yan de Souza Carreirão (Relevant Factors for the Voting Decision in the 2002 Presidential Election), in which he investigates the same election by testing some of the main hypotheses about electoral behaviour in the country by means of logistic regression analyses.

Abstract

The 2002 elections were a watershed in Brazilian electoral history. Three aspects of the process in particular have been amply stressed in several analyses. The first is the symbolic dimension of Lula's personal victory, the biography of a man of the people who rises to the country's most important office. The second is the victory of PT, the main leftwing party in the country, winning federal office 22 years after being founded. The third is the dimension of the victory, with the president obtaining a resounding vote (61% of the valid votes), which surpassed that of any other Brazilian president since 1945. Despite this, the efforts made by political science to analyse the 2002 elections remain limited, in particular regarding the use of opinion poll results.


This research note does not intend to carry out a thorough and detailed analysis of the choice of candidates or of the key events in the campaign. The aim is to investigate the variables that may be associated with the voting decision in the 2002 presidential elections. To this end, the results of the election survey conducted by Instituto Universitário de Pesquisas (Iuperj)-2002 and the technique of logistic regression will be used. The latter is widely used for election studies in other countries, but little used in Brazil to date. Therefore, as well as providing a substantial analysis of the data, this research note suggests a methodological option for future studies on Brazilian elections.

Few works have attempted to explain the determinants of the vote in the Brazilian presidential elections on the basis of micro-data. Two works stand out for covering more than one election (Singer 2000; Carreirão 2002). Singer (2000) analysed the results of surveys carried out nationwide in 1989 and in the state of São Paulo in 1994. One of the book's purposes is to show the association between ideology, measured as voters’ positioning on the right-left spectrum, and the vote. The data are analysed by means of bivariate analyses, using classic association tests (chi-square and Cramer's V). Carreirão (2002) analysed several opinion polls conducted during three presidential election campaigns (1989, 1994 and 1998). His aim was also to measure the impact of a set of variables on the vote using bivariate analysis and association tests (gamma). The author considered a larger number of variables and presented the results in a more detailed fashion than Singer (2000).

Despite carefully describing the context and players involved in each election, both works have the limitation of not using multivariate techniques. This means the reader cannot know to what extent the independent variables are associated with one another, or what the impact of each variable is when the others are analysed simultaneously. A series of questions remain open. What might the impact be of voters’ positioning on the right-left spectrum when the effects of party preference and evaluation of the federal government are analysed jointly? Do socio-demographic variables, such as educational level and age, continue to have an effect when variables associated with political attitude, such as party preference and evaluation of the government, are considered?

Carreirão and Barbetta (2004) took a step forward when they proposed a multivariate model for analysing the results of the 2002 elections. They used a multivariate technique (logistic regression) to analyse the data from an opinion poll conducted in Greater São Paulo. But the authors’ pioneering effort was hampered by certain factors. One, recognized by them, refers to the date of the fieldwork, May 2002, therefore before the campaign and the televised electoral broadcasts began. Others derive from the technical choices adopted. The first is the decision to use four binary models that separately compare the preference for one candidate with the option for all the others. For example, one model compared Lula's vote with that of all the other candidates plus blank and spoilt votes. Beyond the theoretical limitations of aggregating substantially different choices in the same category, this decision tends to inflate the hit rate of the “others” category (see note 1). The most appropriate option would have been to use a multinomial logistic regression model, which is employed when the dependent variable is not binary (Tabachnick and Fidell 2001). Another decision that may have affected the analysis of the data is the option for the stepwise method (Backward LR), which excludes all the variables that are not statistically significant from a certain level on (p>0.5). The authors offer no theoretical justification for employing a model that excludes the impact of substantively relevant variables.
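The point about hit rates can be illustrated with a small sketch. The counts below are hypothetical and are not taken from Carreirão and Barbetta's data; the function name `hit_rates` is likewise an assumption made for this illustration. It shows how a large aggregated "others" category can be classified almost perfectly while the candidate of interest is classified poorly, and how the overall hit rate hides that difference.

```python
def hit_rates(pairs):
    """Return (per-category hit rates, overall hit rate).

    `pairs` is a list of (actual, predicted) category labels.
    """
    totals, hits = {}, {}
    for actual, predicted in pairs:
        totals[actual] = totals.get(actual, 0) + 1
        if actual == predicted:
            hits[actual] = hits.get(actual, 0) + 1
    per_category = {c: hits.get(c, 0) / totals[c] for c in totals}
    overall = sum(hits.values()) / len(pairs)
    return per_category, overall


# Hypothetical sample: 20 voters for candidate "A", 80 in the aggregated
# "others" category. The model predicts "others" for most respondents.
pairs = (
    [("A", "A")] * 5
    + [("A", "others")] * 15
    + [("others", "others")] * 80
)

per_category, overall = hit_rates(pairs)
print(per_category)  # hit rate of each category
print(overall)       # overall hit rate
```

In this made-up case the overall hit rate looks respectable even though only a quarter of candidate A's voters are classified correctly, which is why per-category hit rates are reported separately.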

Logistic regression is a multivariate technique that allows one to analyse the relationship between independent variables (quantitative or categorical) and categorical dependent variables (Miles and Shevlin 2001; Tabachnick and Fidell 2001). The main virtue of this technique is that it permits a multivariate analysis of categorical data, which have traditionally been analysed by means of bivariate analyses. Initially utilised in medical research, in which the outcome is often whether or not a patient has a particular illness, logistic regression has been increasingly used in the social sciences, particularly in electoral studies (Clarke et al. 2004; Evans 2004).
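As a compact reminder of what the technique estimates, the binary case can be written as below. The notation ($p$, $\beta_j$, $x_j$) is assumed here for illustration and does not appear in the original note.

```latex
\[
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]
```

Here $p$ is the probability of the outcome coded 1 (for instance, a vote for a given candidate), the left-hand side is the log-odds, and $e^{\beta_j}$ is the odds-ratio associated with a one-unit change in $x_j$, holding the other variables constant.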

Data

The database used was that of the Iuperj-2002 survey, carried out between December 12 and 15, with 2004 voters from 115 municipalities. For the analysis of the data, I used the SPSS binary and multinomial regression models. Annexes 1 and 2 present the results of a bivariate analysis (vote for president/various variables) for the two rounds. It is possible to observe the percentage received by the candidates in each category, as well as a statistical test (Cramer's V) to evaluate the significance of each association.

ANNEX 1. Percentage of the First Round Vote, according to a Set of Variables

                                                  Lula  Serra  Garotinho  Ciro  Blank/Spoilt
Age
  16-24                                             57     19         13     6             5
  25-34                                             57     18          9     9             7
  35-44                                             54     19          9    10             8
  45-59                                             47     30         10     8             5
  60+                                               45     29         12     8             7
  Cramer's V = 0.08; p<0.0001
Sex
  Male                                              58     20          8     8             6
  Female                                            49     24         12     9             6
  Cramer's V = 0.01; p<0.0001
Schooling
  Illiterate/Up to 4th grade                        54     23         11     7             5
  5th to 8th                                        56     20         11     7             6
  9th to 11th                                       50     22         10    11             7
  Higher                                            48     28          7    12             4
  Cramer's V = 0.06; p = 0.06
Self-Defined Colour
  White                                             48     26         11    10             6
  Non-White                                         58     18         10     7             6
  Cramer's V = 0.12; p<0.0001
Religion
  Catholic                                          57     24          5     9             5
  Pentecostal Evangelical                           32     16         44     3             6
  Non-Pentecostal Evangelical                       44     20         24     5             7
  Others                                            56     19          5     9            11
  Cramer's V = 0.42; p<0.0001
Evaluation of Fernando Henrique Government
  Positive (Excellent, Good, Positive Average)      43     33         10     9             5
  Negative (Terrible, Bad, Negative Average)        64     11         11     8             7
  Cramer's V = 0.28; p<0.0001
Party Sympathy
  PT                                                86      4          5     2             3
  PMDB                                              38     45          7     8             3
  PSDB                                              20     63          6    10             1
  Others                                            35     31         19    14             1
  None                                              45     24         13    10             8
  Cramer's V = 0.23; p<0.0001
Position on the Left-Right Spectrum
  Left                                              74      9          7     6             5
  Centre                                            49     20         11    11             8
  Right                                             39     38         12     9             3
  Doesn't know/Didn't answer                        48     22         13     8             9
  Cramer's V = 0.19; p<0.0001

Eight variables were selected, five socio-demographic (age, sex, colour, schooling and religion) and three that evaluate political attributes (evaluation of the Fernando Henrique Cardoso government, sympathy for a political party and position on the right-left spectrum). The treatment given to each is described below.
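For readers unfamiliar with Cramer's V, the statistic reported for the bivariate analyses, the following is a minimal sketch of how it is computed from a contingency table of counts. The counts are hypothetical (the annexes report percentages only, so the survey's own cell counts are not available here), and the function name `cramers_v` is an assumption of this illustration.

```python
import math


def cramers_v(table):
    """Return (chi-square, Cramer's V) for a contingency table of counts.

    `table` is a list of rows, each row a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # V rescales chi-square by sample size and table dimensions,
    # so that it ranges from 0 (no association) to 1.
    k = min(len(table), len(table[0])) - 1
    return chi2, math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows are two groups of voters, columns two candidates.
counts = [[30, 10],
          [10, 30]]
chi2, v = cramers_v(counts)
print(round(chi2, 2), round(v, 2))  # 20.0 0.5
```

Because V is bounded between 0 and 1 regardless of sample size, it is easier to compare across variables than the raw chi-square value.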

Dependent Variables

A bigger challenge in relation to the independent variables is aggregating them into a small number of categories. Even though some information is lost, this step is necessary, as categories with a small number of cases affect the result of the logistic regression.

For the first round, I compared only the main candidates’ vote, each representing a specific category: Lula, Serra, Garotinho and Ciro Gomes. Voters who spoilt or left their ballots blank, who voted for two other candidates (Rui Pimenta and José Maria) or who did not answer, were considered missing (154 cases in total).

For the second round, the option for one of the two candidates (Lula or Serra) was considered. Voters who spoilt or left their ballots blank, or who did not answer, were considered missing (164 cases).

Independent Variables

  • Age – The original variable, measured as an interval variable, was transformed into a categorical variable, with five age groups: 16-24, 25-34, 35-44, 45-59 and 60+ years of age.

  • Gender – Male and female.

  • Colour – The questionnaire asked the interviewees to self-classify in one of four categories: white, black, brown and yellow; these were re-grouped into two categories: white and non-white.

  • Schooling – Four categories: illiterate/up to 4th grade, 5th to 8th grade, 9th to 11th grade and higher education.

  • Religion – The various religious denominations were grouped into three categories: Catholic, evangelical and others.

  • Evaluation of the Fernando Henrique Cardoso government – The original variable was grouped into two: positive (excellent, good, positive average) and negative (terrible, bad and negative average).

  • Party Sympathy – The original categories were grouped into five bands: PT, PMDB, PSDB, others and none.

  • Position on the right-left spectrum – The survey asked voters to place themselves on a five-point scale; the values were recoded into three categories: 1 and 2 (left); 3 (centre); 4 and 5 (right). Voters who did not know or did not answer were kept as a separate, fourth category.
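The recoding described in the list above can be sketched as follows. The function names, and the convention of representing "doesn't know/didn't answer" as `None`, are assumptions of this illustration; the original analysis was carried out in SPSS and its actual coding is not given in the note.

```python
def recode_age(age):
    """Map age in years to the five age groups used in the note."""
    if age < 16:
        raise ValueError("respondents are expected to be 16 or older")
    if age <= 24:
        return "16-24"
    if age <= 34:
        return "25-34"
    if age <= 44:
        return "35-44"
    if age <= 59:
        return "45-59"
    return "60+"


def recode_left_right(score):
    """Map a 1-5 self-placement to left/centre/right.

    `None` stands for "doesn't know/didn't answer" (an assumed coding).
    """
    if score is None:
        return "doesn't know/didn't answer"
    if score in (1, 2):
        return "left"
    if score == 3:
        return "centre"
    if score in (4, 5):
        return "right"
    raise ValueError("score must be between 1 and 5, or None")


print(recode_age(52), recode_left_right(4))  # 45-59 right
```

Keeping the "doesn't know/didn't answer" group as its own category, rather than dropping it, avoids losing those respondents from the multivariate models.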

Multivariate Analysis

Results of the first round

An analysis using multinomial logistic regression was carried out taking Lula's vote as the reference category for the comparison with Serra, Garotinho and Ciro Gomes. A test of the model with the eight independent variables against the model that included only the constant was statistically significant (chi-square = 851.40, p<0.0001), indicating that the predictors as a whole distinguish Lula's vote from that of the other candidates. The pseudo-R2 (Nagelkerke) of 0.42 indicates that the model fits the data well. Using the eight independent variables, the model correctly classified 86% of Lula's vote, 52% of Serra's and 40% of Garotinho's. For Ciro Gomes the result was not satisfactory, as no case was correctly predicted. The model's overall hit rate is 65%.
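To make the role of the reference category concrete, here is a minimal sketch of how a multinomial logit turns log-odds against the reference (Lula) into predicted probabilities, and how a case is then classified. The linear-predictor values are hypothetical and are not estimates from the note; the function name `multinomial_probs` is also an assumption.

```python
import math


def multinomial_probs(log_odds, reference="Lula"):
    """Convert log-odds against a reference category into probabilities.

    `log_odds` maps each non-reference category to its linear predictor;
    the reference category implicitly has a linear predictor of 0.
    """
    exp_terms = {cat: math.exp(value) for cat, value in log_odds.items()}
    denominator = 1.0 + sum(exp_terms.values())
    probs = {cat: term / denominator for cat, term in exp_terms.items()}
    probs[reference] = 1.0 / denominator
    return probs


# Hypothetical linear predictors for one voter (not taken from the tables).
log_odds = {"Serra": -0.5, "Garotinho": -1.5, "Ciro": -2.0}
probs = multinomial_probs(log_odds)

# The usual classification rule assigns the category with the highest probability.
predicted = max(probs, key=probs.get)
print(predicted)  # Lula
```

Under this classification rule, a candidate is predicted only when his probability exceeds that of every other candidate for a given voter, so a candidate with a small share of the vote may never be predicted at all.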

Tables 1, 2 and 3 show the coefficients of the regression (log-odds), the significance level, the odds-ratio and the 95% confidence interval for the odds-ratio. Table 1 presents the comparison between Serra and Lula voters. A large number of categories are statistically significant and probably distinguish Lula voters from Serra voters. Observing the odds-ratio column is particularly interesting. Values above 1 indicate that the chances of the voter voting for Serra increase, while values below 1 indicate that the chances of him/her voting for Serra decrease (or that the chances of him/her voting for Lula increase). For example, an elderly voter (over 60 years old) is 2.07 times more likely to vote for Serra than a young voter (16-24 years old). On the other hand, the fact that a voter is sympathetic to PT reduces by 1/6 the probability of him/her voting for Serra. The statistically significant categories associated with an increased chance of one voting for Serra are: being female; being 45-59 years old; being over 60 years old; being white; being in the 5th to 8th grade schooling bracket; having a positive evaluation of the Fernando Henrique government; being sympathetic to PMDB and PSDB; and being rightwing. The categories associated with a decreased chance of one voting for Serra are: being male; being sympathetic to PT; and being leftwing.
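The relation between the columns of these tables can be sketched as follows: the odds-ratio is the exponential of the log-odds coefficient, and its 95% confidence interval is obtained by exponentiating the interval for the coefficient. The odds-ratio of 2.07 is the figure quoted above; the standard error and the baseline probability are hypothetical, and the function names are assumptions of this illustration.

```python
import math


def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Return (odds-ratio, lower bound, upper bound) for a log-odds coefficient."""
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained after multiplying the baseline odds by `odds_ratio`."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


coef = math.log(2.07)  # log-odds implied by the odds-ratio quoted in the text
std_err = 0.25         # hypothetical standard error
ratio, lower, upper = odds_ratio_with_ci(coef, std_err)

# With a hypothetical baseline probability of 0.25, odds 2.07 times higher
# do not mean a probability 2.07 times higher.
new_prob = apply_odds_ratio(0.25, ratio)
print(round(ratio, 2), round(new_prob, 2))  # 2.07 0.41
```

An odds-ratio is statistically significant at the 5% level when its 95% confidence interval does not include 1. Note also that the odds-ratio multiplies odds, not probabilities, so the change in probability depends on the baseline.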

TABLE 1
Results of the Multinomial Logistic, 2002 Elections - Serra's vote compared to Lula's vote
TABLE 2
Results of the Multinomial Logistic, 2002 Elections - Garotinho's vote compared to Lula's vote
TABLE 3
Results of the Multinomial Logistic, 2002 Elections - Ciro Gomes's vote compared to Lula's vote

Table 2 shows the results of the comparison between Garotinho and Lula. Only five categories are statistically significant (p<0.05), two of them associated with increased chances: being sympathetic to other parties and being evangelical. The increase in the odds-ratio for these is substantial. The chances of voting for Garotinho increase by a factor of 11.5 when one compares evangelical with Catholic voters. The factors associated with decreased chances of one voting for Garotinho are: being male; being sympathetic to PT; and being leftwing.

Table 3 shows the results of the comparison between Ciro Gomes and Lula. The chances of one voting for Ciro Gomes increase in the following cases: being over 60 years old; being in the 9th to 11th grade or higher education brackets; and being in the ‘other’ religious category (i.e., non-catholics and non-evangelicals). The chances decrease among voters sympathetic to PT.

Results of the second round

An analysis was conducted using binary logistic regression to compare Lula's and Serra's vote (Table 4). A test of the model with the eight independent variables against the model that considered only the constant is statistically significant (chi-square = 475.06; p<0.0001), indicating that the set of predictors distinguishes Lula's vote from Serra's vote in the second round. The pseudo-R2 (Nagelkerke) of 0.34 indicates that the model's fit is reasonable. Using the eight independent variables, the model correctly classifies 91% of Lula's voters and 41% of Serra's. The model's overall hit rate is 78%. The statistically significant categories associated with increased chances of voting for Serra are: being female; being 45-59 or over 60 years old; being white; being in the 9th to 11th grade schooling bracket; having a positive evaluation of the Fernando Henrique government; being sympathetic to PSDB; and being rightwing. The categories associated with decreased chances of one voting for Serra are: being male; being sympathetic to PT; and being leftwing. In relation to the first round, two categories ceased to be statistically significant when comparing Serra and Lula: having a higher education and being sympathetic to PMDB.
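The two fit statistics quoted for both rounds, the model chi-square and the Nagelkerke pseudo-R2, are both derived from the log-likelihoods of the constant-only model and the full model. The sketch below uses hypothetical log-likelihoods, since the note does not report them; the function name `model_fit` is an assumption.

```python
import math


def model_fit(ll_null, ll_model, n):
    """Return (likelihood-ratio chi-square, Nagelkerke pseudo-R2).

    ll_null  -- log-likelihood of the constant-only model
    ll_model -- log-likelihood of the model with the predictors
    n        -- number of cases
    """
    chi2 = 2.0 * (ll_model - ll_null)
    cox_snell = 1.0 - math.exp(-chi2 / n)
    # Cox and Snell's measure cannot reach 1; Nagelkerke divides it by its maximum.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return chi2, cox_snell / max_cox_snell


# Hypothetical values: 100 cases split evenly between two outcomes.
n = 100
ll_null = n * math.log(0.5)
ll_model = -50.0
chi2, nagelkerke = model_fit(ll_null, ll_model, n)
print(round(chi2, 2), round(nagelkerke, 2))
```

The chi-square is then compared with a chi-square distribution whose degrees of freedom equal the number of parameters added to the constant-only model.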

TABLE 4
Results of the Binomial Logistic, 2002 Elections, Second Round - Serra's vote compared to Lula's vote

Conclusion

This first analysis of the 2002 election results using logistic regression brings to light a number of interesting results. Despite dealing with a small number of variables, the models for the two rounds had a reasonable fit and classified the cases well, Ciro Gomes excepted. (This is probably due to the small number of cases considered for this candidate.) The coefficients of the variables also show that certain voter characteristics probably distinguished the candidates, particularly Serra and Lula: gender, schooling, age, position on the right-left spectrum, evaluation of the government and sympathy for political parties. It is likely that including other variables in the model (variables relating to perspectives for the future, evaluation of certain campaign issues and some of the candidates’ attributes) would generate more accurate estimates and increase the percentage of correctly classified cases.

There is a long tradition of research in established democracies on the determinants of the vote. Recently, it has benefited from advances in data analysis and from a rich theoretical debate. In Brazil, we still have a long way to go, above all with regard to the improvement of data gathering and analysis. To this end, the exercise carried out in this research note suggests that logistic regression should be adopted more systematically as a major tool.

  • 1
    The hit rate of each model was the following: Lula: 82.5% – others: 86.5%; Serra: 46.4% – others: 94%; Garotinho: 95.6% – others: 44.8%; Ciro: 98.5% – others: 20%. It is no coincidence that the hit rate of the “others” category was so high in all four models.
  • Translated from Portuguese by Leandro Moura

Bibliography

  • Carreirão, Yan de Souza. 2002. A decisão do voto nas eleições presidenciais brasileiras. Rio de Janeiro: FGV Editora.
  • ______, and Pedro Alberto Barbetta. 2004. A eleição presidencial de 2002: A decisão do voto na região da grande São Paulo. Revista Brasileira de Ciências Sociais 56.
  • Clarke, Harold D., David Sanders, Marianne C. Stewart, and Paul Whiteley. 2004. Political choice in Britain. Oxford: Oxford University Press.

ANNEX 1


Percentage of the First Round Vote, according to a Set of Variables

ANNEX 2


Percentage of the Second Round Vote, according to a Set of Variables

Publication Dates

  • Publication in this collection
    08 Sept 2023
  • Date of issue
    2007

History

  • Published
    May 2006
Associação Brasileira de Ciência Política Avenida Prof. Luciano Gualberto, 315, sala 2047, CEP 05508-900, Tel.: (55 11) 3091-3754 - São Paulo - SP - Brazil
E-mail: bpsr@brazilianpoliticalsciencareview.org