The Relationship between Market Sentiment Index and Stock Rates of Return: a Panel Data Analysis

This article analyzes the relationship between market sentiment and future stock rates of return. We used a methodology based on principal component analysis to create a sentiment index for the Brazilian market with data from 1999 to 2008. The sample consisted of companies listed on BM&F BOVESPA which were grouped into quintiles, each representing a portfolio, according to the magnitude of the following characteristics: market value, total annualized risk and listing time on BM&F BOVESPA. Next, we calculated the average return of each portfolio for every quarter. The data for the first and last quintiles were analyzed via two-factor ANOVA, using sentiment index of the previous period (positive or negative) as the main factor and each characteristic as controlling factors. Finally, the sentiment index was included in a panel data pricing model. The results indicate a significant and negative relationship between the market sentiment index and the future rates of return. These findings suggest the existence of a reversion pattern in stock returns, meaning that after a positive sentiment period, the impact on subsequent stock returns is negative, and vice-versa.


Introduction
In recent decades, several studies have tried to improve classical theoretical models by incorporating behavioural aspects that are often neglected. The growth of this non-traditional approach has been motivated by the need to explain regularly observed phenomena in financial markets that are incompatible with the predictions of classical models. Baker and Wurgler (2007) argue that it has become increasingly difficult to explain some financial events with the traditional theory of finance. Such events include investors subject to emotions who do not always price assets at the net present value of their future cash flows. In this context, sentiment can be defined as beliefs about future cash flows and investment risks that are not rationally justifiable considering the information available to the investor.
Early research on behavioural finance occurred in the 1980s, and its main purpose was to demonstrate whether the stock market, as a whole, suffered from mispricing. Without much theoretical support, scholars searched for evidence contradicting the efficient market hypothesis (EMH), which led to the identification of anomalies such as price mean reversion (De Bondt & Thaler, 1985; Fama & French, 1988; Poterba & Summers, 1988) or excessive volatility in the market index not justified by the volatility of the firms' fundamentals of value (Shiller, 1981). More recent studies attempted to provide further explanations for the influence of financial market sentiment, considering the two types of investors in the classification of De Long, Shleifer, Summers and Waldmann (1990): (a) rational arbitrageurs, not influenced by sentiment, and (b) irrational investors, vulnerable to exogenous sentiment. Both types trade in a competitive market and set prices and expected returns for the assets. The ability of rational agents to profit from incorrect pricing is limited in several respects, such as a brief window of opportunity to trade, transaction costs and risks. These barriers explain why prices can deviate from their fundamental values. Mispricing therefore has two potential sources: (a) changes in irrational investors' sentiment or (b) barriers to rational arbitrageurs.
The EMH assumes that price changes must be generated by random processes, with no systematic pattern. If patterns exist, investors would incorporate them to predict future prices and earn abnormal returns. However, assuming that investors do not follow a fully rational behaviour since they present bounded rationality and are subject to the influence of sentiment, and because cross-sectional and/or longitudinal patterns of sentiment-driven mispricing would be difficult to identify directly, our main research question is: are there any longitudinal and cross-sectional predictability patterns in stock returns depending upon proxies for sentiment? To achieve this purpose, the paper: (a) proposes a methodology for creating a sentiment index for the Brazilian market, and (b) verifies whether there is a relationship between market sentiment and future stock rates of return through ANOVA and a panel pricing model estimated with POLS, random and fixed effects and system GMM.
This paper contributes to the current Brazilian literature in behavioural finance by providing an innovative methodology for creating a market sentiment index based on indirect measures from Brazilian firms. Each measure used in the process is fully justified as being related to market sentiment, and the results obtained follow an economic intuition. This paper also advances previous works in this field of study by testing hypotheses on the relation between sentiment and future stock rates of return via ANOVA models and GMM-estimated asset pricing models. Results show a significant and negative relationship between these two variables, suggesting the existence of a reversion pattern in stock returns, meaning that after a positive sentiment period, the impact on subsequent stock returns is negative, and vice-versa. This paper is organised as follows: after this Introduction, the next section presents the Literature Review on market sentiment; then Methodology explains the creation of a sentiment index, ANOVA and portfolio formation, and a pricing model for panel estimation methods. The following section discusses the results of the Brazilian Market Index, the ANOVA tests and panel data estimation results, and the last section presents the conclusions. BAR, Rio de Janeiro, v. 9, n. 2, art. 4, pp. 189-210, Apr./June 2012 www.anpad.org.br/bar

Literature Review
According to Zhang (2008), sentiment can be defined as any erroneous beliefs that individuals have about an economic variable, such as asset prices. For Smidt (1968), it is the presence of sentiment that leads to speculative bubbles. For Zweig (1973), sentiment is related to cognitive biases of investors. C. M. Lee, Shleifer and Thaler (1991) define market sentiment as the part of investors' expectations about asset returns that is not justified by economic fundamentals. Baker and Wurgler (2006) define sentiment as the investor's propensity to speculate; that is, sentiment drives the demand for speculative investments.
According to Shiller (1984), investors' behaviour often leads to fluctuations in asset prices with no justifiable rationale. Black (1986) used the term noise trader sentiment for investors' expectations about asset returns that are not based on fundamentals of value. Likewise, Baker and Wurgler (2006) argue that the main cause of price fluctuations is the difficulty in valuing companies, since investors do not have homogeneous expectations as predicted by the EMH. How market sentiment affects asset prices is a question that still generates different opinions. There are two possible explanations for the existence of these disparities: individuals correctly use misinformation or individuals incorrectly use accurate information. The first alternative assumes that investors adjust their beliefs about the fundamentals of value by incorporating the noise, and the second assumes that they adjust them while misusing statistical tools.
Sentiment can be measured through a latent variable. As Hair, Anderson, Tatham and Black (1998, p. 581) state: "construct or latent variables cannot be measured directly, but can be represented or measured by one or more variables". Thus, one way proposed by researchers to measure the expectation of investors about price trends in the market was to create an index. There are several explanations for the association of a given variable with the construct of sentiment. Some of them relate to market negotiability (turnover, IPOs, volatility) and others try to capture variations in investors' mood (weather, hours of sunshine per day, season of the year, soccer results). For a detailed description of sentiment variables used in behavioural finance studies, see Qiu and Welch (2004) and Bandopadhyaya and Jones (2006).
Many studies have tried to find out whether sentiment has predictive power for stock returns. A variety of sentiment measures have been included in pricing models to test their relationship with stock price behaviour. Lutz (2010) verifies the influence of three different sentiment measures on the future performance of stock prices: Baker and Wurgler's Sentiment Index (Baker & Wurgler, 2006), the smoothed earnings-price ratio, and the VIX (Volatility Index) calculated by the Chicago Board Options Exchange. His dependent variable is the market-weighted portfolio return, using the Fama-French approach. In this study, we use individual stocks in the pricing model, since there is not a concern of stocks being continuously traded without interruption (Saito & Bueno, 2007). His findings show that those sentiment measures have very little out-of-sample predictive power, though they present significant in-sample results. Shu (2010) studies the influence of mood on financial market behaviour. The study shows how investor mood variations affect equilibrium asset prices and expected returns. The results indicate that both equity and bill prices correlate positively with investor mood, with higher asset prices associated with better mood.

Market sentiment index
The first issue to be discussed is how market sentiment can be quantified; only then is it possible to examine whether this variable has any power to predict returns. Thus, it is necessary to create a variable that can measure market sentiment and then check its relationship with the returns of stocks listed on the Sao Paulo Stock Exchange (BM&FBOVESPA). To estimate the sentiment index, we chose to apply the multivariate technique of Principal Component Analysis (PCA). According to Johnson and Wichern (2002), PCA aims to explain the covariance structure of a set of variables through linear combinations of these variables, in order to reduce the data and provide a better interpretation of it.
The purpose of PCA is to replace the original variables by a smaller number of components without incurring a great loss of information. The sufficient number of principal components to adequately represent the theoretical construct under study can be defined by: (a) the relative values of the eigenvalues (variances of the components); (b) the total variance explained by the components; or (c) the interpretation of components and their relationship to the theory. Jolliffe (2002, p. 113) states that the percentage of total variance explained by the number of components remaining in the analysis will vary according to characteristics of the data.
One method used in the literature to determine the number of components to be retained in a PCA is Kaiser's rule (Kaiser, 1960), which states that all components with eigenvalues greater than 1 should be retained. The justification lies in the fact that if all variables were uncorrelated with each other, each eigenvalue (λ) would be equal to 1. Jolliffe (2002, p. 114) states that if λ < 1, then the component provides less information than an original variable and should not be used. Another technique for identifying the number of components is parallel analysis, developed by Horn (1965). The procedure consists of creating a random dataset with the same number of observations and variables as the original data. The correlation matrix for this randomly generated dataset is obtained and its eigenvalues are computed. When the eigenvalues from the random data are larger than the eigenvalues from the PCA of the original data, the corresponding components are mostly random noise, can probably be regarded as spurious, and should not be retained in the model (Franklin, Gibson, Robertson, Pohlmann, & Fralish, 1995). Besides the number of retained components, one must pay attention to the magnitude of the last component's eigenvalue. A value that is too small may indicate a linear dependence in the data (Johnson & Wichern, 2002). If this occurs, one or more variables are redundant in the model and should be excluded.
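The two retention criteria described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the simulated data, and the choice to average the random eigenvalues over several simulated datasets (rather than a single one) are assumptions made here.

```python
import numpy as np

def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix, sorted from largest to smallest."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def kaiser_rule(eigenvalues):
    """Number of components whose eigenvalue is greater than 1."""
    return int(np.sum(np.asarray(eigenvalues) > 1.0))

def parallel_analysis(data, n_sims=200, seed=0):
    """Count leading components whose eigenvalue exceeds the mean eigenvalue
    obtained from random normal data with the same shape as `data`.
    Averaging over n_sims random datasets is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    random_eigs = np.zeros(n_vars)
    for _ in range(n_sims):
        random_eigs += correlation_eigenvalues(rng.standard_normal((n_obs, n_vars)))
    random_eigs /= n_sims
    observed = correlation_eigenvalues(data)
    retained = 0
    for obs, rnd in zip(observed, random_eigs):
        if obs <= rnd:  # component is no stronger than random noise
            break
        retained += 1
    return retained, observed, random_eigs

# Simulated example (hypothetical data): 40 quarters, 5 proxies driven by one factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((40, 1))
proxies = factor + 0.3 * rng.standard_normal((40, 5))
retained, observed, random_eigs = parallel_analysis(proxies)
print("Kaiser rule:", kaiser_rule(observed), "| parallel analysis:", retained)
print("share of variance of first component:", observed[0] / observed.sum())
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the ratio printed on the last line is the share of total variance explained by the first component.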
To construct the market sentiment index, we used the following variables, already used in other works such as Baker and Wurgler (2006, 2007) and Wang, Keswani and Taylor (2006):
- S: percentage of equity share in new issues, given by E_t / (E_t + D_t), where E_t is the total volume of equity issued by firms and D_t is the total volume of debt issued in offerings, according to the Brazilian securities and exchange commission (Comissao de Valores Mobiliarios [CVM]);
- NIPO: number of initial public offerings on BM&FBOVESPA, totalled quarterly;
- TURN: stock turnover, given by the ratio between n_t (total quantity of stocks traded in each quarter) and N_t (total amount of outstanding shares at the end of each quarter);
- DIV: difference between the logarithms of the market-to-book ratios of dividend-payer firms and non-payers. To aggregate these ratios over all dividend payers and non-payers, we calculated a weighted average using the market value of each company;
- TRIN: technical analysis index to capture the market perception, called the Trading Index or contrarian indicator, used to detect overbought and oversold levels in the market. It is also known as the Arms Index, named after its creator, Richard Arms, in the 1970s. It measures the ratio between the average volume of declining stocks and the average volume of advancing stocks. A TRIN ratio of 1 means the market is in balance; above 1 indicates that more volume is moving into declining stocks; and below 1 indicates that more volume is moving into advancing stocks.
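As a rough illustration of how three of these proxies follow from their definitions, the sketch below computes S, TURN and TRIN from raw inputs. The function names, argument layout and sample figures are hypothetical; DIV and NIPO are omitted because they depend on firm-level data not described in detail here.

```python
def equity_share(equity_issued, debt_issued):
    """S_t = E_t / (E_t + D_t): share of equity in total new issues."""
    total = equity_issued + debt_issued
    if total == 0:
        raise ValueError("no equity or debt issued in the period")
    return equity_issued / total

def turnover(traded_shares, outstanding_shares):
    """TURN_t = n_t / N_t: traded quantity over shares outstanding."""
    return traded_shares / outstanding_shares

def trin(price_changes, volumes):
    """Average volume of declining stocks over average volume of advancing stocks.
    Stocks with an unchanged price are ignored (an assumption of this sketch)."""
    declining = [v for c, v in zip(price_changes, volumes) if c < 0]
    advancing = [v for c, v in zip(price_changes, volumes) if c > 0]
    if not declining or not advancing:
        raise ValueError("need at least one declining and one advancing stock")
    return (sum(declining) / len(declining)) / (sum(advancing) / len(advancing))

# Made-up figures for a single period.
print(equity_share(30.0, 70.0))
print(turnover(2_500_000, 10_000_000))
print(trin([1.0, -1.0, 2.0, -3.0], [100, 300, 200, 500]))
```

In the last call the declining stocks trade an average volume of 400 against 150 for the advancing stocks, so the ratio is above 1, which the text associates with volume moving into declining stocks.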
Another important aspect to be considered during the index construction is the correct time instant of the variables, whether they will be contemporary or lagged to form the index, since some of them must reflect changes in sentiment before others (Baker & Wurgler, 2007;Brown & Cliff, 2004). To determine this time instant, first we estimated the index with all five variables and their first lags. Other lags could also be used, but since we worked with quarterly data, it is unlikely that events that occurred six or more months before will have a greater influence on sentiment than more recent or contemporaneous events. From this first stage index, we calculated the correlation matrix between the index and all variables and their lags. To decide which instant of time (t or t-1) should remain in the index, we compared the magnitude of the correlation between each variable (and its lag) with the first stage index, choosing the one with the higher value. After choosing the appropriate instant of time, the parsimonious sentiment index was then calculated.
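The two-stage timing procedure just described can be sketched as below, assuming each proxy is available as a quarterly array. The names `first_component` and `choose_timing` and the simulated inputs are hypothetical. Because the sign of a principal component is arbitrary, the sketch compares absolute correlations, in line with the "magnitude" criterion in the text.

```python
import numpy as np

def first_component(data):
    """Scores of the first principal component of the standardised columns."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    _, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    return z @ eigvecs[:, -1]  # eigh sorts eigenvalues in ascending order

def choose_timing(series):
    """series maps a proxy name to a 1-D array of quarterly values.
    Returns the chosen timing per proxy ('t' or 't-1') and the parsimonious index."""
    names = list(series)
    current = np.column_stack([series[n][1:] for n in names])
    lagged = np.column_stack([series[n][:-1] for n in names])
    # Stage 1: index built from all proxies and their first lags.
    first_stage = first_component(np.hstack([current, lagged]))
    choice, columns = {}, []
    for j, name in enumerate(names):
        corr_now = abs(np.corrcoef(current[:, j], first_stage)[0, 1])
        corr_lag = abs(np.corrcoef(lagged[:, j], first_stage)[0, 1])
        if corr_lag > corr_now:
            choice[name] = "t-1"
            columns.append(lagged[:, j])
        else:
            choice[name] = "t"
            columns.append(current[:, j])
    # Stage 2: parsimonious index from the selected timing of each proxy.
    return choice, first_component(np.column_stack(columns))

# Simulated proxies (hypothetical data), 41 quarters each.
rng = np.random.default_rng(0)
common = rng.standard_normal(41)
proxies = {n: common + 0.5 * rng.standard_normal(41)
           for n in ["S", "NIPO", "TURN", "DIV", "TRIN"]}
timing, index = choose_timing(proxies)
print(timing)
```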
Theoretically, variables which are related to investor behaviour should anticipate market sentiment. Thus, it is expected that TURN_{t-1}, DIV_{t-1} and TRIN_{t-1} present greater correlation with the sentiment index than their contemporaneous values. Moreover, variables that reflect firm behaviour, like S_t and NIPO_t, should be directly related to market sentiment, being more correlated with the index than their respective lags.
Regarding the expected signs, variables related to the intensity of the volume of traded stocks are directly related to market sentiment. Thus, S and NIPO, which indicate a greater supply of equity shares by companies, as well as TURN, that shows increased trading on the stock exchange, must have positive sign in the sentiment index. On the other hand, variables TRIN and DIV, should present negative signs. Dividend-payer firms, in theory, have fewer opportunities to grow since they are not retaining resources to reinvest, and demand for them should occur more strongly when the market is pessimistic and less confident in investment projects. Conversely, when the market is optimistic, the demand should be greater for firms with investment opportunities which pay fewer dividends. The variable TRIN, likewise, has an inverse relationship with the sentiment index. Higher TRIN values indicate the expectation of a pessimistic market and vice-versa.
To ensure that the sentiment index is related to stock rates of return net of the effects of the economic cycle, we generated an orthogonalised index from the residuals of the regressions of the original variables on the economic cycle variables. In this research, the economic cycle variables used were the Gross Domestic Product (GDP) and two dummy variables, dGDP and dSELIC. The first one assumes value 1 in case of a positive change in GDP from one quarter to another and 0 otherwise. The variable dSELIC, in turn, assumes value 0 in case of an increase in the Brazilian base interest rate (SELIC), and value 1 otherwise. The process of orthogonalisation softened the peaks and valleys, but did not affect the trend of the index.
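A minimal sketch of the orthogonalisation step follows: each proxy is regressed by ordinary least squares on a constant and the cycle variables, and the residuals replace the original proxy. The function name, the inclusion of a constant, and the simulated GDP and SELIC paths are assumptions made for illustration only.

```python
import numpy as np

def orthogonalise(proxy, cycle_vars):
    """Residuals from an OLS regression of one sentiment proxy on a constant
    and the economic-cycle variables (one column per variable)."""
    X = np.column_stack([np.ones(len(proxy)), cycle_vars])
    coef, *_ = np.linalg.lstsq(X, proxy, rcond=None)
    return proxy - X @ coef

# Simulated quarterly data (hypothetical values).
rng = np.random.default_rng(0)
gdp = 100 + np.cumsum(rng.normal(0.5, 1.0, 41))
selic = 15 + np.cumsum(rng.normal(0.0, 0.5, 41))
d_gdp = (np.diff(gdp) > 0).astype(float)       # 1 if GDP rose, 0 otherwise
d_selic = (np.diff(selic) <= 0).astype(float)  # 0 if SELIC rose, 1 otherwise
cycle = np.column_stack([gdp[1:], d_gdp, d_selic])

proxy = 0.02 * gdp[1:] + rng.standard_normal(40)
residual = orthogonalise(proxy, cycle)
# The residuals are, by construction, uncorrelated with every cycle variable.
print(np.round(cycle.T @ residual, 8))
```

The orthogonalised index would then be obtained by applying the same PCA procedure to the residual series of all five proxies.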

ANOVA
In order to verify the existence of a relationship between market sentiment and future stock rates of return, we adopted the statistical methodology of analysis of variance (ANOVA). According to Neter, Kutner, Nachtsheim and Wasserman (1996), ANOVA is a versatile statistical tool to study the relationship between a response (dependent) variable and one or more explanatory (independent) variables, especially if the latter represent a qualitative characteristic. In this study, the dependent variable is the quarterly rate of return of portfolios, each representing a quintile, formed according to the magnitude of the characteristic under analysis. In ANOVA each explanatory variable is called a factor. We adopted a two-factor ANOVA for every estimation. One common factor in all analyses presented in this paper is the level (positive or negative) of the market sentiment index. The other factor relates to the attribute used in the formation of portfolios. The firm characteristics were: (a) the market value of the company, (b) the total risk, and (c) age, measured as the number of years since the firm's first appearance on BM&FBOVESPA. These attributes were measured contemporaneously with the rates of return, and the sentiment index refers to the previous quarter (t-1).
The factors may be classified into different categories (levels). The first factor, market sentiment, has two levels: positive or negative, depending on the sign of the variable itself. The other factors, which are related to firm characteristics, are also separated in two levels: companies that are at the most extreme (first and fifth) quintiles. The decision to discard the intermediate quintiles is justified by the fact that firms with extreme values for those attributes are potentially more easily identified by investors in the market, while those in the intermediate quintiles may not be clearly distinguished. The combined levels of the factors are called treatments. Thus, when sentiment level is positive and the attribute (for example, risk) is classified as high, there is a treatment combination that corresponds to positive and high. It means that two factors, each with two levels, generate four different treatments.
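The construction of the four treatments can be illustrated with a short sketch that discards the intermediate quintiles, labels each observation, and averages returns per treatment. The input layout, the labels and the numbers are hypothetical, and treating a sentiment value of exactly zero as negative is an assumption.

```python
from collections import defaultdict
from statistics import mean

def treatment_means(observations):
    """observations: iterable of (lagged_sentiment, quintile, portfolio_return).
    Keeps quintiles 1 and 5 only and returns {treatment: (mean return, count)}."""
    cells = defaultdict(list)
    for sentiment, quintile, ret in observations:
        if quintile not in (1, 5):
            continue  # intermediate quintiles are discarded
        sentiment_level = "Positive" if sentiment > 0 else "Negative"
        attribute_level = "Low" if quintile == 1 else "High"
        cells[(sentiment_level, attribute_level)].append(ret)
    return {cell: (mean(rets), len(rets)) for cell, rets in cells.items()}

# Made-up observations: (sentiment in t-1, quintile, return in t).
sample = [
    (0.5, 1, -0.04), (0.5, 5, 0.02), (0.5, 3, 0.01),
    (-0.3, 1, 0.03), (-0.3, 5, 0.05), (-0.3, 2, 0.00),
]
for treatment, (avg, n) in sorted(treatment_means(sample).items()):
    print(treatment, round(avg, 4), n)
```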
Multifactor ANOVA studies have some advantages over single-factor ANOVA. According to Neter et al. (1996, pp. 797-798), the first benefit is efficiency: in a traditional approach, factors would have to be manipulated one at a time, ceteris paribus, which is not always possible in an observational study. The second advantage is related to the larger amount of information that would be needed to safely draw the same conclusions in a single-factor study. Since multi-factor ANOVA takes into account interaction effects between factors, samples can be smaller. Finally, another advantage concerns the validity of the results, since it is possible to insert another factor to control the results. In this research, the main factor is market sentiment. The other factor, the characteristic of the firm, is used as a control, since it can also influence the response variable.

Pricing model
For a deeper investigation into the relationship between the sentiment index and stock rates of return, an asset pricing model was estimated. A major goal in Finance research is to determine which factors better explain individual asset returns, and asset pricing theory attempts to identify these factors. We proposed a panel data regression model to estimate and test the asset pricing relationship. The estimated model was:
$$R_{i,t} = \beta \, SENT_{t-1} + \gamma' X_{i,t} + \varepsilon_{i,t} \qquad (1)$$
where $R_{i,t}$ is the stock rate of return of firm i in quarter t; $SENT_{t-1}$ is the sentiment index in period t-1; $\beta$ is the parameter associated with the sentiment index; $X_{i,t}$ represents the vector of k control variables; and $\gamma'$ is the transposed (k x 1) vector of control variable parameters. By definition, $\varepsilon_{i,t}$, the error term, should not be correlated with the regressors.
The control variables used in the model were considered important factors in previous asset pricing empirical research. The purpose of including these variables in the model was to verify the influence of sentiment on stock rates of return, free from their effects. The following control variables were used:
- firm size (ln MV): measured by the natural logarithm of the market value of the company;
- market-to-book ratio (MtB);
- financial leverage (LEV): measured by the ratio between the gross debt and the market asset value of the company;
- growth opportunity (GROWTH): given by the percent variation in the net revenues of the company;
- dummy variable indicating the industry of the firm: financial firms were excluded from the sample due to their specific leverage characteristics.
The parameters of the pricing model equation were initially estimated with pooled ordinary least squares (POLS). This method has the disadvantage of not taking into account the unobserved heterogeneity. It means that the POLS estimation does not contain a term related to non-observed effects which captures the peculiarities of the firms that remain invariant over time and that can influence the behaviour of the dependent variable.
The unobserved heterogeneity can be, for example, the firm image perceived by the market or even the quality of management. To consider this aspect, we estimated equation (1) with panel data: fixed effects (FE) and random effects (RE). The RE method assumes that the correlation between the explanatory variables and the unobserved effect is zero. The FE method allows the existence of that correlation, and both estimation results are also reported in Table 10. FE estimation always gives consistent results, although sometimes it is not the most efficient model. To compare both models, we used a modified version of the Hausman test as described by Wooldridge (2002, pp. 290-291) which makes the test robust to heteroskedastic and/or autocorrelated errors. The null hypothesis of the test is that the differences between the coefficients for the two methods are not statistically significant. In case of rejection of the null hypothesis, only FE would be consistent.
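The difference between POLS and FE can be illustrated with a deliberately simple, noiseless example with a single regressor. It is only a sketch of the within (fixed effects) transformation, not the estimation reported in the paper; the data and function names are invented for the illustration.

```python
import numpy as np

def pooled_ols_slope(x, y):
    """Slope from a pooled regression of y on a constant and x."""
    xd, yd = x - x.mean(), y - y.mean()
    return float((xd * yd).sum() / (xd ** 2).sum())

def fixed_effects_slope(x, y, firm):
    """Within estimator: subtract firm means from x and y, then regress."""
    xd, yd = x.astype(float).copy(), y.astype(float).copy()
    for f in np.unique(firm):
        mask = firm == f
        xd[mask] -= xd[mask].mean()
        yd[mask] -= yd[mask].mean()
    return float((xd * yd).sum() / (xd ** 2).sum())

# Hypothetical panel: 3 firms, 4 periods, true slope 2, and a firm effect
# (e.g. management quality) that is correlated with the regressor.
firm = np.repeat(np.arange(3), 4)
effect = np.array([0.0, 5.0, 10.0])[firm]
x = effect + np.tile([0.0, 1.0, 2.0, 3.0], 3)
y = effect + 2.0 * x

print("POLS slope:", round(pooled_ols_slope(x, y), 4))        # biased upwards
print("FE slope:  ", round(fixed_effects_slope(x, y, firm), 4))  # recovers 2
```

Because the firm effect is correlated with the regressor, the pooled slope absorbs part of it, while demeaning by firm removes any time-invariant effect. An RE estimator would be consistent only if that correlation were zero, which is the comparison the Hausman test addresses.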
Both RE and FE estimation procedures require the assumption of strict exogeneity of the explanatory variables. This means that the error term of the model is uncorrelated with the regressors at every instant of time. To check the condition of strict exogeneity of the regressors and validate the RE or FE estimation, Wooldridge (2002, p. 285) proposes two tests. The first one is based on first differences and the second one on the fixed effects estimators. The results led to the rejection of the null hypothesis of strictly exogenous regressors, indicating the need for an estimation method that appropriately addresses the problem of endogenous independent variables.
The GMM estimator can deal with problems of endogeneity using instrumental variables. According to Bond, Hoeffler and Temple (2001, p. 9), in System GMM estimation the instruments used in the level equations are the lagged first differences of the series, and this procedure requires the non-correlation between the lagged first differences of endogenous regressors and the level error term, including the specific effect. Specification tests were applied to verify if model estimation results were acceptable or not.

Sentiment index results
The descriptive statistics of the variables that make up the sentiment index are presented in Table 1. The eigenvalues of the components indicate that the first component explains 49.03% of the total variance of the sample, which is a major part of the common variation of the variables. On the scree plot, only the first component has an eigenvalue greater than 1, leading to the formation of an "elbow". Figures 1(a) and 1(b) show two methods of determining the number of components to be used in the PCA.
The same calculation procedures used for the index with the original variables were applied to the orthogonalised variables. The orthogonalisation process was intended to purge the macroeconomic effects from the sentiment index. All but two variables showed the same time instant in the equations of the original and the orthogonalised indexes. The exceptions were S and TRIN. In the orthogonalised index, the variable S was the only one not to present the expected time instant, since we expected it to be the same as that of NIPO. The signs of the coefficients of all variables in both equations were as expected: positive for S, NIPO and TURN and negative for DIV and TRIN. The magnitude of the coefficients was also close in both indexes, indicating that the process of orthogonalisation did not cause significant changes. Table 2 presents the descriptive statistics of both sentiment indexes, the one with the original variables and the one with orthogonalised variables. Measures of central tendency and dispersion show that both indexes are similar.

ANOVA results
To analyse the relationship between the sentiment index and future stock rates of return, we formed portfolios based on three firm characteristics. To be part of the sample, a company had to have a negotiability ratio (an index created by BM&FBOVESPA) greater than 0.01 in the corresponding year. When the company had more than one class of shares listed on BM&FBOVESPA, we selected the class with the greater trading volume. This restriction is needed since some variables, such as market value or leverage of the company, would be the same for different stock classes. After that, the quarterly rates of return of the sampled companies were classified into quintiles according to the magnitude of: (a) market value of the company, (b) annualized total risk and (c) age. Companies with lower market values (or total risk or age) form the first quintile, whilst the fifth quintile is formed by the firms with the highest market value (or total risk or age). In the specific case of age, the bottom quintile is formed by companies that have been listed on BM&FBOVESPA since January 2, 1986 (the initial date available in the Economatica database).
The separation of companies into quintiles results in a non-uniform distribution of the number of companies in each portfolio over time, ranging from a minimum of 10 firms in the first quintile (in the first quarter of 2002) to a maximum of 42 firms (in the fourth quarter of 2007). The average return per quintile was calculated assuming a naive allocation portfolio, meaning that the weight of each asset is equal to 1/n, where n is the number of stocks in the quintile. Naive allocation was used instead of any other strategy for three reasons. First, it represents the simplest technique that could be followed by investors with no return forecasting ability (S. Lee & Stevenson, 2003). Second, despite the sophisticated allocation strategies available, many of them do not consistently beat a naive portfolio in terms of Sharpe ratio or certainty-equivalent return, as reported by DeMiguel, Garlappi and Uppal (2009). Third, according to Tang (2004), portfolios of the sizes used in this research can eliminate 95% or more of diversifiable risk.
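The quintile formation and the 1/n weighting can be sketched for a single quarter as follows. The function name, the input layout and the rule used to place the quintile breakpoints are assumptions; the paper does not state how ties or uneven group sizes were handled.

```python
def quintile_portfolio_returns(firms):
    """firms: list of (characteristic, quarterly_return) pairs for one quarter.
    Sorts by the characteristic, splits into five groups of near-equal size and
    returns the equally weighted (1/n) return of each group, lowest quintile first."""
    ranked = sorted(firms, key=lambda firm: firm[0])
    n = len(ranked)
    if n < 5:
        raise ValueError("need at least five firms to form quintiles")
    returns = []
    for q in range(5):
        group = ranked[q * n // 5:(q + 1) * n // 5]
        returns.append(sum(ret for _, ret in group) / len(group))
    return returns

# Made-up quarter: ten firms with market values 1..10 and returns of 1% to 10%.
quarter = [(value, 0.01 * value) for value in range(1, 11)]
print([round(r, 4) for r in quintile_portfolio_returns(quarter)])
```

With ten firms each quintile holds two stocks, so every stock receives a weight of 1/2 in its portfolio.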
The orthogonalised sentiment index was classified each quarter as positive or negative and then related to the rate of return of each portfolio in the following period. The portfolios presented in Figure 3, numbered from 1 to 5, are grouped by the market value of the firms. Portfolio 1 contains the smallest firms (measured by market value), with size increasing gradually up to portfolio 5, formed by the biggest companies. The size effect, as proposed by Banz (1981), is not verified for this Brazilian sample. Companies with higher market value have higher average returns than smaller firms, contradicting the findings of Banz (1981). This effect is even stronger after a period of positive sentiment, when the average difference between large and small firms is more noticeable.
The second characteristic analysed was total risk, measured by the standard deviation of daily rates of return. It can be seen in Figure 4 that riskier firms (in the higher quintiles) do not have the expected higher rates of return. After conditioning on the sentiment level, riskier firms have negative rates of return after a period of positive sentiment and more positive rates of return after a period of negative sentiment. The intuition from classical theory says that the higher the risk, the higher the returns should be. However, the sentiment index seems to better explain this difference, since the returns are higher after a period of negative sentiment and lower after a period of positive sentiment.
Figure 4. Future rates of return according to the orthogonalised sentiment index in the previous quarter and total risk. The observations are quarterly rates of return of each portfolio. These were classified into quintiles from 1 to 5. The first quintile contains the observations of rates of return of lower total risk companies. Darker columns represent the average rates of return of the portfolios after a quarter of negative sentiment. Lighter columns represent the average rates of return of the portfolios after a quarter of positive sentiment.
A final characteristic examined was the number of years since the firm's first appearance on BM&FBOVESPA. We sought to determine whether there is a relationship between the rates of return and age. Since there was a significant number of stocks with price series beginning on January 2, 1986, these companies were all classified in a separate category marked with an asterisk in Figure 5. The remaining companies were divided into quartiles following the same logic used for size and risk.
Older companies have positive returns, especially after a negative sentiment period. After a period of positive sentiment, young companies have negative returns, and returns gradually increase with firm age. Younger firms only show positive returns after a negative sentiment period, but not as positive as older firms' returns. These results suggest that older companies, on average, provide higher returns than younger firms regardless of the previous sentiment level. It can also be said that after a positive sentiment period only younger firms show negative rates of return. In this case, the level of sentiment just changes the magnitude of the positive returns of older companies (quintile 5); older companies always present higher returns when compared to younger companies, for any sentiment level. This result may indicate that there is an age premium in the Brazilian market, and that older companies are better known and more established, with more consistent performance than younger firms.
Figure 5. Future rates of return according to the orthogonalised sentiment index in the previous quarter and age. The observations are quarterly rates of return of the portfolios. In this case, since many companies had the same initial listing date available (January 2, 1986), we decided to present the returns of these companies in a separate category, marked by an asterisk. Remaining firms were classified into quartiles from 1 to 4, where 1 is the portfolio formed by younger companies. Darker columns represent the average rates of return of portfolios after a negative sentiment period. Lighter columns represent the average rates of return of portfolios after a positive sentiment period.

ANOVA results for market value factor
The firms' market value was analysed as a control factor in the present study. Table 3 shows the average rate of return, the standard deviation and the number of observations for each treatment. Figure 6(a) displays the average rates of return estimated for each treatment. The lines that connect the averages for the two levels of sentiment (positive and negative) are not parallel, indicating that there may be an interaction effect between the factors. This interaction is more clearly identified when estimating the ANOVA itself, whose results are presented in Table 4, model 1. The observed significance level for the interaction is very close to 0.05. Levene's test indicated that the variances of the residuals were not statistically equal across treatments. Because of this result, a more robust model was estimated to cope with the heteroskedastic errors: an ANOVA with an HC3-type correction of the covariance matrix, as described by Davidson and Mackinnon (1993, pp. 552-556). The results of this new model are presented in Table 4, model 2. The observed significance levels were not very different from those of model 1. Besides heteroskedasticity, non-normality is also a potential concern for the ANOVA. According to Neter et al. (1996, p. 762), when the sample size is sufficiently large, the normality test should be done for each treatment. In general, non-normality comes together with heteroskedasticity, and this study was no exception. One way to deal with non-normality is to apply a transformation to the response variable. However, this strategy was not successful: even after transformation, the rate of return was not normally distributed. The remaining alternative was to verify whether the results are similar under a non-parametric approach.
In this new analysis, the rates of return were converted into ranks and the ranks were treated as the dependent variable. Figure 6(b) presents these new results, and model 3 in Table 4 indicates that the interaction effect is even more significant in the non-parametric approach. Results proved to be consistent with the two previous models, demonstrating the robustness of the estimation. A significant interaction effect in an ANOVA means that the effect of one factor on the response differs across the levels of the other factor. Looking again at Figure 6, if there were no interaction the lines would be parallel. For example, if the average return of high market value companies after a period of positive sentiment were lower, parallelism would be obtained. In order to formally identify the effects of the interaction between the factors, we calculated simultaneous confidence intervals for multiple comparisons of means using the Tukey-Kramer method, which is suitable when the treatments have different numbers of observations (Hsu, 1996). Assuming a 95% simultaneous confidence level, the confidence intervals of the differences between treatments are presented in Table 5. The differences are statistically different from zero whenever the treatment Positive and Low is involved, indicating that low market value firms after positive sentiment periods have rates of return that are significantly lower than those of the other treatments. Therefore, after a period of positive sentiment, investing in companies with small market values is not recommended, since they present significantly lower rates of return than larger companies.
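The rank transformation and the parallelism argument above can be illustrated with a minimal Python sketch. The returns below are hypothetical numbers chosen only for illustration (they are not the values in Table 3), and the function names are our own. The interaction contrast is the difference between the high-minus-low gap after negative sentiment and the same gap after positive sentiment; a value of zero corresponds to parallel lines in Figure 6. The sketch computes point estimates only, not the F-tests of Table 4.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def cell_means(responses, sentiment, size):
    """Average response for each (sentiment, size) treatment."""
    cells = {}
    for y, s, m in zip(responses, sentiment, size):
        cells.setdefault((s, m), []).append(y)
    return {key: mean(vals) for key, vals in cells.items()}

def interaction_contrast(means):
    """High-minus-low gap after negative sentiment minus the same gap
    after positive sentiment. Zero means the two lines are parallel."""
    gap_neg = means[("negative", "high")] - means[("negative", "low")]
    gap_pos = means[("positive", "high")] - means[("positive", "low")]
    return gap_neg - gap_pos

# Hypothetical quarterly returns, for illustration only.
returns = [0.08, 0.06, 0.07, 0.09, 0.05, 0.03, -0.04, -0.06]
sentiment = ["negative"] * 4 + ["positive"] * 4
size = ["high", "high", "low", "low", "high", "high", "low", "low"]

raw_contrast = interaction_contrast(cell_means(returns, sentiment, size))
rank_contrast = interaction_contrast(
    cell_means(average_ranks(returns), sentiment, size)
)
print(raw_contrast, rank_contrast)
```

In this toy data both contrasts are negative, because the gap between large and small firms is much wider after positive sentiment, which is the pattern the text describes for the Positive and Low treatment.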

ANOVA results for risk factor
Next, we investigated the relationship between the factors "market sentiment" and "total risk". Table 6 presents descriptive statistics for the treatments obtained from the combination of these two factors. The difference in the magnitude of the standard deviations between the low and high groups of the risk factor is notable. Figure 7(a) displays the estimated average for each treatment. Once the results are controlled for sentiment, higher returns are not always obtained for higher-risk portfolios. Model 4, presented in Table 7, indicates that there is no significant interaction effect, nor any significant difference between the rates of return of portfolios formed by high-risk and low-risk firms. However, a significant difference between rates of return was found for the sentiment factor, indicating that after a period of negative sentiment the rates of return are higher than those observed after a positive sentiment period, regardless of the level of portfolio risk. Even with the covariance matrix correction for heteroskedasticity in the residuals (model 5) or using ranks rather than rates of return (model 6), the results were similar to those of model 4, indicating that they are quite robust. Figure 7(b) shows the representation of the non-parametric approach. In summary, total risk is not an adequate factor to drive investment decisions in the presence of the sentiment factor. The latter, in fact, determines such decisions, since the rates of return are higher after a period of negative sentiment and lower after a positive sentiment period.
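To give an idea of what the HC3-type correction does when the low-risk and high-risk groups have very different dispersions, the sketch below compares the classical (pooled-variance) standard error of a treatment mean with its HC3 counterpart. It relies on the fact that in a two-way ANOVA with interaction the fitted values are the cell means, so each observation's leverage is 1/n for its cell. The data and function names are hypothetical and only illustrative; this is not the estimation routine behind Table 7.

```python
from math import sqrt
from statistics import mean

def classical_and_hc3_se(cells):
    """cells maps a treatment label to its list of returns (>= 2 per cell).
    Returns {label: (classical_se, hc3_se)} for each treatment mean."""
    residual_ss = {}
    for label, ys in cells.items():
        m = mean(ys)
        residual_ss[label] = sum((y - m) ** 2 for y in ys)

    n_total = sum(len(ys) for ys in cells.values())
    # Classical ANOVA pools the residual variance across all treatments.
    pooled_mse = sum(residual_ss.values()) / (n_total - len(cells))

    out = {}
    for label, ys in cells.items():
        n = len(ys)
        classical = sqrt(pooled_mse / n)
        leverage = 1.0 / n  # hat-matrix diagonal in a cell-means model
        # HC3: each squared residual is inflated by 1 / (1 - leverage)^2,
        # and only the cell's own residuals enter its variance.
        hc3 = sqrt(residual_ss[label] / (1.0 - leverage) ** 2) / n
        out[label] = (classical, hc3)
    return out

# Hypothetical returns: the high-risk cell is far more dispersed.
cells = {
    "low risk": [0.02, 0.03, 0.04, 0.03],
    "high risk": [-0.20, 0.25, 0.10, -0.05],
}
se = classical_and_hc3_se(cells)
for label, (classical, hc3) in se.items():
    print(f"{label}: classical={classical:.4f}  HC3={hc3:.4f}")
```

With unequal variances the pooled standard error is too large for the stable cell and too small for the volatile one, whereas the HC3 version lets each cell speak for itself.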

ANOVA results for age factor
Finally, the relationship between the factors age and market sentiment was investigated. Table 8 presents the descriptive statistics for the treatments related to these two factors. Figure 8(a) displays the estimated average for each treatment. The estimation of model 7, presented in Table 9, indicates that there is no interaction effect between the factors. However, each factor individually shows a significant difference between average rates of return: portfolios have higher rates of return after periods of negative sentiment regardless of the average age of the firms, and portfolios of older firms have higher rates of return regardless of the sentiment level of the previous period. The results of model 8, estimated with the covariance matrix correction for heteroskedasticity, do not differ much from those of model 7. However, model 9, which uses ranks instead of rates of return, shows slightly different results. In particular, the sentiment factor is no longer significant at the 5% level. Figure 8(b) suggests parallelism between the levels of sentiment when using the non-parametric approach, which means that the interaction effect is even less significant. For the age factor, ANOVA suggests that it is more profitable, on average, to invest in

Pricing model
The parameters of the pricing model equation were initially estimated with pooled ordinary least squares (POLS). This method has the disadvantage of not taking unobserved heterogeneity into account. That is, the POLS estimation does not contain a term for unobserved effects, which captures the peculiarities of the firms that remain invariant over time and can influence the behaviour of the dependent variable. These results are reported in Table 10 for comparison purposes only.
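The consequence of ignoring unobserved heterogeneity can be seen in a small sketch. It builds a hypothetical two-firm panel in which the response equals a time-invariant firm effect plus a slope of -0.5 on a generic firm-level regressor x, with the firm effect correlated with x. Pooled OLS is compared with the within (fixed effects) estimator, which subtracts firm means before fitting. All names and numbers are assumptions made for illustration; this is not the pricing model of Table 10.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_transform(values, firm_ids):
    """Subtract each firm's own time average (the fixed-effects transform)."""
    groups = {}
    for v, f in zip(values, firm_ids):
        groups.setdefault(f, []).append(v)
    firm_mean = {f: mean(vs) for f, vs in groups.items()}
    return [v - firm_mean[f] for v, f in zip(values, firm_ids)]

# Hypothetical panel: y = firm effect + true_slope * x, no noise.
firm_ids = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
firm_effect = {"A": 0.0, "B": 3.0}  # time-invariant, correlated with x
true_slope = -0.5
y = [firm_effect[f] + true_slope * xi for f, xi in zip(firm_ids, x)]

pooled = ols_slope(x, y)
fixed_effects = ols_slope(
    within_transform(x, firm_ids), within_transform(y, firm_ids)
)
print(f"pooled OLS slope:    {pooled:.3f}")
print(f"fixed effects slope: {fixed_effects:.3f}")
```

In this example pooled OLS even gets the sign of the slope wrong, because the firm effect is absorbed into the regressor, while the within estimator recovers -0.5.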
The result of the Hausman test led to the rejection of the null hypothesis, i.e., the FE model should be preferred.

Note. The dependent variable is the stock return rate of firm i in quarter t. The independent variables were defined in section 3.4. The estimates for the industry dummies and the intercept are not reported in the table. Time dummies were not used, since the sentiment index is already orthogonalised and captures the effect of macroeconomic changes occurring in the period. The estimator used is System GMM with one or two steps. It is assumed that only the industry dummies are exogenous. The standard errors were obtained using the data clustered by firm and are robust to all forms of heteroskedasticity and autocorrelation of the model errors. *, ** and *** denote statistical significance at the 10%, 5% and 1% levels, respectively. For the first- and second-order autocorrelation, Hansen's J and DIF-Hansen tests, the test statistic is reported with its p-value in parentheses.
A problem that may arise from the use of System GMM estimators is the large number of instruments, which can lead to the over-identification of the model. Therefore, we applied the Sargan/Hansen over-identification test. The null hypothesis of the test is the non-correlation between the set of instruments and the errors, which implies the correct linear specification of the model. The results presented in Table 10 suggest that these conditions are acceptable, since the null hypothesis was not rejected in any of the specifications. Tests for first- and second-order autocorrelation (m1 and m2) proposed by Arellano and Bond (1991, pp. 281-283) are also reported in Table 10. If second-order autocorrelation is present, some lags may be invalid as instruments. Results show a pattern consistent with the hypothesis of non-correlation in all models, with a negative and statistically significant value for m1 and a non-significant value for m2.

BAR, Rio de Janeiro, v. 9, n. 2, art. 4, pp. 189-210, Apr./June 2012 www.anpad.org.br/bar
In order to verify the validity of the additional assumptions required by System GMM when compared to Difference GMM, we performed the difference-in-Hansen test (DIF-Hansen in Table 10). The null hypothesis of the test is that the additional instruments in System GMM are valid. The results show that the null hypothesis cannot be rejected, which reinforces the use of System GMM. For this reason, Difference GMM results are omitted here. To verify the robustness of the System GMM estimation results, both one-step and two-step procedures were run. The two-step estimation, though asymptotically more efficient than the one-step, tends to present downward-biased standard errors. To mitigate this problem, the finite-sample correction proposed by Windmeijer (2005) was used.
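Regarding the autocorrelation tests reported in Table 10, a short derivation shows why a negative and significant m1 together with a non-significant m2 is the expected pattern. The tests are applied to the residuals in first differences. The notation below is our own: we assume the errors in levels, written \varepsilon_{it}, are serially uncorrelated with constant variance \sigma^2.

```latex
\begin{align*}
\Delta\varepsilon_{it} &= \varepsilon_{it} - \varepsilon_{i,t-1} \\
\operatorname{Var}(\Delta\varepsilon_{it}) &= 2\sigma^2 \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\;
                        \varepsilon_{i,t-1} - \varepsilon_{i,t-2})
   = -\sigma^2 \\
\operatorname{Corr}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  &= \frac{-\sigma^2}{2\sigma^2} = -\tfrac{1}{2} \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-2})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\;
                        \varepsilon_{i,t-2} - \varepsilon_{i,t-3})
   = 0
\end{align*}
```

First differencing therefore induces negative first-order correlation by construction, since consecutive differences share the term \varepsilon_{i,t-1}, while second-order correlation in the differences appears only if the errors in levels are themselves serially correlated.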
Table 10 shows that the coefficient of the sentiment index is negative and statistically significant in all models. This implies that after a period of positive sentiment, stocks' rates of return are lower, and vice-versa. This result corroborates the findings of the analysis of variance. Another important fact is that the beta coefficient was not significant in the presence of the market sentiment index. All other control variables were statistically significant to some degree regardless of the estimation method. This result points to the importance of the sentiment index as a relevant factor in pricing models, even in the presence of the measure of systematic risk.

Conclusion
In the classical theory of finance, investor sentiment is not considered an important variable for explaining stock prices. The results presented in this article refute this idea. After proposing a methodology for creating a sentiment index for the Brazilian market, we analysed the relationship between the stock rates of return and the level of market sentiment using analysis of variance and a panel data pricing model. Firms were classified quarterly into quintiles according to the following factors: market value, total risk and age. For each quintile (representing a portfolio of stocks) we calculated the average returns according to the level of the sentiment index in the previous quarter (negative or positive). After a positive sentiment period, stocks which are attractive to optimistic investors and speculators (smaller, riskier and younger firms), and less attractive to arbitrageurs, have lower returns. Moreover, after a period of negative sentiment, this pattern is attenuated (for age and market value factors) or even reversed (for the risk factor).
These conclusions were achieved after a two-way ANOVA with sentiment as the main factor and each of the firm characteristics as the controlling factor. For each attribute the ANOVA helped to identify the presence of interaction between the two factors. In case of no interaction, each factor was individually analysed. The market value factor was the only one that showed a statistically significant interaction with sentiment. In this case only, the four treatments were analysed separately. Results showed that after a period of positive sentiment, low market value stocks had significantly lower returns than other combinations of factors.
The interaction effects between sentiment and each of the other two control factors were not statistically significant. For the risk factor, only sentiment was significant, confirming that after a period of negative sentiment the rates of return are higher than those after a positive sentiment period. Risk itself was not a significant factor: the rates of return of high-risk portfolios were no different from those of low-risk portfolios. For the age factor, it was found that after a period of negative sentiment, returns were significantly higher than after a positive sentiment period, and that the portfolios composed of older companies had significantly higher returns.
All initial ANOVA results were subsequently validated by more robust estimation techniques. Concerns with heteroskedastic residuals were mitigated with the use of an HC3-type covariance matrix correction, as described in Davidson and Mackinnon (1993). Issues with normality of residuals were mitigated with the estimation of a non-parametric model, as suggested by Neter et al. (1996). Results did not change in a relevant manner, showing their robustness.
Finally, we estimated a pricing model including the market sentiment index, the systematic risk (beta) and factors such as market value, market-to-book ratio, leverage, and growth opportunities. Pricing model results confirm that the sentiment variable plays a relevant role. The stability and robustness of these results were investigated by estimating the model using different techniques: POLS, random effects, fixed effects and system GMM. No significant variation was found. These results open up possibilities for future research in finance: other ways of measuring investor sentiment can be employed, the process of orthogonalising the index can be done against other variables, and different control variables can be included in the pricing model. The inclusion of a behavioural variable is encouraged in future asset pricing research.