Readability as a measure of textual complexity: determinants and evidence in Brazilian companies

The aim of this article was to evaluate the effect of company earnings and of harmonization with IFRS on the readability of Management Reports in the Brazilian stock market. There is a gap to be filled both in the elaboration and adaptation of readability measures to the context studied, as the studies tend to replicate the original formulas, and in identifying the determinants of the readability of Brazilian company reports, as the research in this field remains in its infancy and the results are inconclusive. The results provide indications for investors to identify complex textual information and may help public policymakers to establish a simple writing manual, along the lines of the SEC’s 1998 Plain English Handbook. The modified metrics and the one developed overcome the criticisms regarding the use of readability formulas in accounting research and could be used in substitution of the original metrics in future studies. An econometric model was used that presents the determinants of readability. Readability was calculated for the Results Analysis section of the Management Report. The resulting construct is understood via three attributes: persistence, current performance, and the reference benchmark. Harmonization with IFRS is a dummy variable, which delimits the pre- and post-IFRS periods. The hypotheses were tested in a sample of Brazilian companies made up of 714 company-year observations covering the period from 2006 to 2019. The descriptive results show that there is an apparent improvement in the readability of the reports in the pre- and post-IFRS period comparison. The econometric evidence shows that, in general, companies with persistent and positive earnings present less complex reports and are more likely to have highly readable reports, because managers publish reports with better readability to signal positive results to the market.


INTRODUCTION
The set of qualitative and quantitative information expressed in the accounting statements and in the reports elaborated by management is companies' main means of communication with the interested parties, and it is fundamental to assist in the decision making and monitoring of companies. Thus, it is legitimate to argue that the communication of accounting events is just as relevant as the measurement itself. The utility of the information depends on how the message is sent and perceived by the user, because without effective communication, the accounting loses its informational property and just becomes an archival report of statistics about company performance.
The development of reports for practice and research has come into the spotlight. Studying their content is relevant because: (i) they contain a wide set of information (Rutherford, 2005); (ii) they are presented in textual form, in the form of words, tables, graphs, and images (Brennan & Merkl-Davies, 2018), or verbal form, as teleconferences, management presentations, and meetings (Beattie, 2014); (iii) the preparers may present discretionary information due to the freedom involved in their preparation; and (iv) they involve serving publics with different informational needs and varied abilities and knowledge to handle the information. These characteristics prompt a more in-depth investigation into the role of the textual content of reports within the informational environment.
The importance of reports is recognized by public policymakers. According to the Brazilian Securities and Exchange Commission (CVM), information about accounting-financial events should be written in a simple and direct way (CVM/SNC/SEP Circular Letter n. 1, 2005). Likewise, the Brazilian Accounting Pronouncement Committee (CPC) has shown concern about the use of textual information, by supporting the idea of less technical and more informative writing (Technical guideline OCPC-07, 2014).
Analyzing the narrative part of reports provides an understanding of the textual complexity dimension. In the accounting literature, studies use readability measures as a proxy for textual complexity (Jones & Shoemaker, 1994; Loughran & McDonald, 2016) and agree that, for accounting information to achieve its objective, it should be as simple as possible, that is, it should present greater readability (Rutherford, 2003).
Developed in the research on linguistics and educational psychology, readability combines factors related to a text that influence the way in which a group of readers understands it (Dale & Chall, 1948; McLaughlin, 1969). As a general rule, readability formulas attempt to measure the complexity of a text through features such as the use of frequent words and complex syntactic structures. Thus, it is argued that readability captures the degree of textual complexity, where texts with higher (lower) readability provide a less (more) complex read.
Although readability is considered an important topic, the research that uses the original readability metrics in the accounting literature has been criticized, because the formulas should be adapted to the context studied and not simply replicated (Loughran & McDonald, 2016; Rennekamp, 2012). Yet the use of readability formulas should not be abandoned or discouraged (Stone & Parker, 2013), but instead redefined (Loughran & McDonald, 2016; SEC, 1998). Readability indices are not a "seal of approval" for writing (Bogert, 1985), but rather help in identifying elements that hinder interpretation. In particular, this study introduces new data and readability measures. The traditional readability formulas were modified to adapt them to the context studied. The study also develops a readability metric based on the Coh-Metrix-Port computational textual analysis tool. It is worth noting that, unlike Malaquias and Silveira (2019), who elaborate a readability measure in the national context, this study employs other textual attributes to capture readability and uses multivariate statistics and additional robustness tests, which are absent in the aforementioned research.
By measuring readability, it is possible to identify the determinants of textual complexity in the informational environment. Conceptually, managers have incentives to modify the texts of reports according to company earnings, with the aim of actively indicating a positive result (Rutherford, 2003; Smith & Taffler, 1992) or hiding a poor result (Bloomfield, 2008; Li, 2008). This is because, in their role as preparers, managers can benefit from informational asymmetries via linguistic artifices, by presenting a view of company results that is in their own interests (Guay et al., 2016).
The research that strictly analyzes readability in the Brazilian stock market is recent and lacks a conceptual and empirical complement. Gomes, Ferreira, and Martins (2018) and Santos, Calixto, and Bispo (2019) directly study the length of footnotes in relation to technical guideline OCPC 07 of 2014 and identify that this guideline explains the reduction in the length of footnotes. Silva, Rodrigues, and Abreu (2007) investigate a possible relationship between Management Reports and the financial result of companies and identify that larger companies have more extensive reports. Silva and Fernandes (2009) apply the Flesch index to analyze the readability of 4,533 relevant facts disclosed in the years from 2002 to 2006 and observe that only 10% are easy to read. Borges and Rech (2019) seek to identify the determinants of the readability of footnotes. Their results show that, in general, footnotes present a high degree of complexity. By analyzing the bilateral relationship between readability and earnings, Souza et al. (2019) identify that managers deliberately add complexity to the accounting narratives in order to hide information about poor corporate performance. Holtz and Santos (2020) investigate the determinants of the readability of footnotes and verify that the size and performance of companies impact readability metrics.
Even with the aforementioned research, there remains a gap to be filled, both in the elaboration and adaptation of readability measures to the context studied, and in identifying the determinants of readability in Brazilian company reports, as the research in this field remains in its infancy and the results are inconclusive.
It warrants mentioning that Brazil, in particular, has high ownership concentration levels (Leal et al., 2015) and has undergone a recent process of harmonization with the International Financial Reporting Standards (IFRS). Thus, Brazil may be a representative case for other emerging markets, and so a significant contribution to the literature on textual complexity is expected.
Based on the aforementioned gaps and motivated by the need to provoke a reflection on the growing importance of accounting narratives as an instrument for communicating information, this study intends to answer the following question: what are the effects of company earnings and of harmonization with IFRS on the readability of Management Reports in the Brazilian stock market?
To operationalize the research, textual complexity is configured by means of readability indices, the object narrative of the study is the Management Report, harmonization with IFRS is a variable that distinguishes the pre- from the post-IFRS period, and company earnings are understood through three attributes: persistence, current performance, and the reference benchmark.
The findings may be useful for capital market participants. It is hoped that investors will identify and demand less complex disclosures, that is, with greater readability, for their analyses. Moreover, the results may provide support for regulators and standard setters to apply public policies, to mitigate the use of reports that are seen as complex at a local level, since there is no regulation for universal standards for writing reports. It is hoped that managers will observe the potential of reports and undertake proactive initiatives. In a more ambitious analysis, the readability indices applied in this study could be incorporated into a simple writing manual, along the lines of the Plain English Handbook of the US Securities and Exchange Commission (SEC), of 1998.

Relationship between Readability Measures and Earnings Persistence, Current Performance, and the Reference Benchmark
The preparer's perspective is a dimension that can explain the causes of readability levels. Based on this perspective, managers can use subtle mechanisms to influence investors' behavior. The argument is that managers tend to actively indicate a "good" result, while they seek to hide a "poor" result (Bloomfield, 2008; Li, 2008; Rutherford, 2003; Smith & Taffler, 1992). In both cases, it is assumed that managers, in their role as preparers, are partial and present modified earnings.
In accordance with the aforementioned discussion, Li (2008) highlights that: (i) companies whose current performance is considered to be poor publish longer and less readable annual reports and (ii) companies with less complex annual reports have greater performance and more persistent earnings. Dempsey, Harrison, Luchtenberg, and Seiler (2012) corroborate these findings, by identifying that companies with poor results present less readable annual reports. Lo, Ramos, and Rogo (2017) identify that: (i) companies that manage the reference benchmark, that is, they modify their current earnings with the aim of exceeding those of the previous year, present annual reports with low readability; and (ii) companies with satisfactory results present reports with greater readability.
The choice of earnings measures as determinants of readability directs the explanation of the empirical findings. Fundamentally, three attributes of earnings are employed as determinants: persistence, current performance, and the reference benchmark.
Persistence is a desirable attribute that measures the sustainability of earnings. For example, persistent earnings: (i) are sought because they are recurrent (Penman & Zhang, 2002); (ii) have greater informational content (Kormendi & Lipe, 1987); and (iii) reduce companies' cost of equity capital (Francis et al., 2004), among other benefits. As persistent earnings are desirable, companies are expected to have incentives to publish more readable reports when earnings are persistent. In the accounting literature, Li (2008) investigates this relationship, separating his study sample into one group of companies with gains and another with losses. In turn, Souza et al. (2019) use proxies for readability as predictors of future earnings.
The current year's performance, henceforth called performance, forms part of the debate because managers may improve the readability of their reports to indicate positive results (Rutherford, 2003; Smith & Taffler, 1992). Using readability metrics of the "the less, the better" type, Dempsey et al. (2012), Li (2008), Lo et al. (2017), and Souza et al. (2019) observe a negative relationship between performance measures and readability, that is, positive changes in performance promote an improvement in readability.
According to the notion of benchmark, it is conjectured that if a company does not achieve a reference benchmark (current earnings values in comparison with the previous year), the managers will act to hide the information (Lewellen et al., 1996; Lo et al., 2017). This leads to the understanding that more complex reports enhance an information-based obfuscation problem (Bloomfield, 2008). Lo et al. (2017) find evidence that companies that manage their earnings to achieve or exceed a reference benchmark present less readable reports.
Based on the preparer's perspective, it is argued that managers may modify reports according to company earnings. In light of this discussion, the following hypotheses are presented:
H1a: The readability of Management Reports is higher for companies with persistent earnings.
H1b: The readability of Management Reports is higher for companies with better performance.
H1c: The readability of Management Reports is lower for companies that have not exceeded a reference benchmark.

Harmonization with IFRS and its Relationship with the Readability of Reports
Factors of the environment outside the company interfere in the way reports are elaborated, because sometimes these factors predominate, as is the case of harmonization of accounting standards. According to the assumption of obligatory disclosure, the changes promoted over the years affect textual attributes that can modify readability (Cazier & Pfeiffer, 2017; Dyer et al., 2017). This discussion suggests that readability is impacted by accounting harmonization.
The IFRS constitute an accounting standard based on principles whose main aim is to promote an improvement in accounting practices, compared with local standards or other norms (Barth et al., 2008). The stated objective of IFRS 1 (First-time Adoption of International Financial Reporting Standards) is to ensure that financial statements and interim financial reports contain high quality information. This quality can be extended to textual presentation in the form of reports. Cheung and Lau (2016) show that the annual reports of Australian companies are longer but less complex in the post-IFRS period. Boubaker, Gounopoulos, and Rjiba (2019) highlight that harmonization with IFRS improved the readability of the annual reports of French companies. In a multicountry study, Lang and Stice-Lawrence (2015) suggest that textual attributes improved after harmonization with IFRS. In summary, there are indications that IFRS adoption enabled improvements, even if indirectly, in the readability of reports.
Considering the regulatory factor as a measure of the external environment that can influence the form in which reports are elaborated, the following hypothesis is proposed:

SAMPLE SELECTION AND DATA HANDLING
To examine the hypotheses, we chose a sample of Brazilian companies whose shares were traded on the Brasil, Bolsa, Balcão (B3) stock exchange, with available data covering 2006 to 2019. Table 1 summarizes the sample selection and processing procedure.
Note: Financial sector companies (Finance and insurance and Funds) were excluded due to structural, operational, and financial differences (Healy & Wahlen, 1999). The "Others" sector was excluded due to the difficulty of allocating the companies into a specific sector. Companies with missing data for calculating the variables were excluded. The liquidity index was obtained using the formula elaborated by the Economática® database itself. This is labelled as Liquidity and is presented as a control variable in Table 2. Companies with a stock liquidity index below 0.0001 in at least one year of the period analyzed were excluded. According to Silveira (2006), the market values of shares may not be realistically reflected in companies with low liquidity, which hinders the calculation of financial variables. Finally, given that the trends over time may modify the reports (Cazier & Pfeiffer, 2017; Dyer et al., 2017), we chose to adopt a balanced panel.

Source: Elaborated by the authors.
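The liquidity and balanced-panel filters described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (dictionaries with the hypothetical keys `firm`, `year`, and `liquidity`) and the function name are not taken from the study, and the sector and missing-data exclusions are assumed to have been applied beforehand.

```python
from collections import defaultdict


def balanced_panel(rows, years, min_liquidity=0.0001):
    """Keep only firms observed in every year and never below the liquidity cutoff.

    rows: iterable of dicts with hypothetical keys 'firm', 'year', 'liquidity'.
    years: the full list of years the panel must cover.
    """
    by_firm = defaultdict(dict)
    for row in rows:
        by_firm[row["firm"]][row["year"]] = row

    kept = []
    for firm, obs in by_firm.items():
        # Balanced panel: the firm must have an observation in every year.
        if not all(year in obs for year in years):
            continue
        # Exclude firms whose liquidity index falls below the cutoff in any year.
        if any(obs[year]["liquidity"] < min_liquidity for year in years):
            continue
        kept.extend(obs[year] for year in years)
    return kept
```

With 14 years (2006 to 2019), a balanced panel of 714 company-year observations would correspond to 51 companies (714 / 14).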
To calculate the readability measures, the texts were collected from the Results Analysis section of the Management Reports. We chose this section because it is equivalent to the Management Discussion and Analysis, the section most often examined in international studies. This enables a comparison with those studies.
To make the documents readable in the textual analysis software: (i) non-textual elements were excluded, (ii) abbreviations with periods were converted to normal abbreviations, (iii) words with a hyphen were modified, (iv) possible punctuation errors were eliminated, and (v) possible orthographical errors and conversion process errors were corrected.
Initially, the pre-processed texts were converted to PDF and submitted for analysis in the Atlas.ti® software and in the online syllable separator available at https://www.separarensilabas.com/index-pt.php. This enabled the elaboration of a "list of words with three or more syllables." To avoid misleading results due to the excessive classification of easy-to-read words as complex, proper nouns and words in a language other than Portuguese were excluded from the "list of words with three or more syllables." The excluded words did not substantially alter the meaning of the text, but the exclusions were necessary because leaving them in would have distorted the readability calculation. After this pre-processing stage, the "list of words with three or more syllables" was renamed the "list of complex words."
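The construction of the list of complex words can be illustrated with a short sketch. It is not the procedure used in the study, which relied on Atlas.ti® and an online syllable separator: the syllable counter below is a deliberately naive stand-in that counts runs of vowels, and the function names and exclusion lists are hypothetical.

```python
import re

# Vowels (with common Portuguese diacritics) used by the naive syllable proxy.
VOWEL_RUN = re.compile("[aeiouáàâãéêíóôõúü]+")


def rough_syllables(word):
    """Naive proxy: count runs of consecutive vowels.

    This is an assumption for illustration, not true Portuguese syllabification.
    """
    return len(VOWEL_RUN.findall(word.lower()))


def complex_word_list(tokens, proper_nouns=(), foreign_words=(), min_syllables=3):
    """Return the sorted set of words with at least `min_syllables` syllables,
    after dropping proper nouns and non-Portuguese words."""
    excluded = {w.lower() for w in proper_nouns} | {w.lower() for w in foreign_words}
    complex_words = set()
    for token in tokens:
        word = token.lower()
        if word in excluded:
            continue
        if rough_syllables(word) >= min_syllables:
            complex_words.add(word)
    return sorted(complex_words)
```

In practice the two exclusion lists would have to be compiled by hand or from a lexicon; the sketch only shows where they enter the calculation.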

Modified readability formulas
From an empirical perspective, the main readability formulas used in accounting research are the Flesch index, developed by Rudolf Flesch, and the Fog index, developed by Robert Gunning (Li, 2008; Loughran & McDonald, 2016). It should be noted that the original formulas considered words with three or more syllables as complex.
In a recent study, Kim, Wang, and Zhang (2019) modify the traditional readability formulas. The authors maintain the concept of complex words from the original formula, but they subjectively exclude words with three or more syllables that are judged as easy to read. In this research, the list of complex words follows the logic employed by Kim et al. (2019), but with less subjective exclusions. The complex words from the original Flesch and Fog formulas are substituted by the list of complex words described in the previous section.
Originally, higher Flesch values and lower Fog values indicate less complex texts. To standardize the interpretation of the results and obtain a better econometric fit, the natural logarithm (ln) was applied to the Flesch measure, and the Fog measure was multiplied by -1. The modified formulas, which cover the new list of complex words, are labelled as ModFlesch and ModFog and presented in equation 1 and in equation 2, respectively. The modified formulas are valid because they capture the demands that a text places on the readers' previous knowledge and on working memory. Both demands are reflected in the formulas: the length of the words serves as a measure of previous knowledge, and the length of the sentences matters because longer sentences require more working memory and reading capacity (McNamara et al., 2014).
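The two transformations can be sketched in a few lines. The sketch makes two assumptions that should be checked against equations 1 and 2: it uses the standard Gunning Fog form, 0.4 × (words per sentence + percentage of complex words), with the complex-word count taken from the list of complex words, and it takes an already computed Flesch score as input rather than reproducing the Flesch formula itself. The function names are hypothetical.

```python
import math


def mod_fog(n_words, n_sentences, n_complex_words):
    """Fog index multiplied by -1, so that higher values mean more readable text.

    Assumes the standard Gunning Fog form:
    0.4 * (average sentence length + percentage of complex words).
    """
    avg_sentence_length = n_words / n_sentences
    pct_complex = 100.0 * n_complex_words / n_words
    return -0.4 * (avg_sentence_length + pct_complex)


def mod_flesch(flesch_score):
    """Natural logarithm of a Flesch score (the score must be positive)."""
    if flesch_score <= 0:
        raise ValueError("the natural logarithm requires a positive Flesch score")
    return math.log(flesch_score)
```

After both transformations, the two measures share the same "the greater, the better" reading.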

Proposed readability measure
To calculate the proposed readability measure, the texts were initially submitted to the online Coh-Metrix-Port software, available at http://143.107.183.175:22680/. Coh-Metrix-Port is the Portuguese adaptation, with 48 metrics, of Coh-Metrix, a computational textual analysis tool originally elaborated for writing in English. Coh-Metrix, documented by Graesser et al. (2004), was developed, refined, and tested between 2002 and 2011 at the University of Memphis.
Coh-Metrix offers new possibilities for understanding the factors related to a text. Supported by a multilevel discourse structure, it enables the complexity of a text to be measured as a multidimensional construct, thus overcoming the criticisms of applying traditional readability measures. Chang and Stone (2019) introduce Coh-Metrix in accounting studies. The authors use eight orthogonal factors to measure readability and analyze and test the elaborated variable in a set of 370 auditing proposals sent to the state and local governments of the USA. This was the first study, as far as we know, to use Coh-Metrix to elaborate a readability measure in the context of corporate communication.
In Coh-Metrix-Port, the 48 metrics are grouped into ten modules. To calculate the proposed measure, the texts in Word format were submitted to Coh-Metrix-Port. After the individual analysis, 15 metrics that cover six modules were selected to form part of the readability formula. The modules chosen were:
1. Logical operators: In the field of semantics, logical operators are linguistic elements responsible for highlighting the intention of a discourse. These elements were coined by Ducrot (1972 [...]
2. Tokens: [...]
3. Constituents: [...] (Pezatti & Camacho, 1997). Form of calculation: ∑Constituents = (Number of modifiers by nominal syntagmas) + (Sample mean of words before main verbs in the main clause of the sentence).
4. Ambiguity: Ambiguity is a linguistic phenomenon that enables more than one meaning for a word (Ferreira, 2000). Form of calculation: ∑Ambiguity = For every adjective/adverb/noun/verb in a text, the number of meanings presented in the TEP Brazilian Portuguese Electronic Thesaurus (Maziero & Pardo, 2008) are added up and the total is divided by the number of adjectives/adverbs/nouns/verbs, respectively.
5. Coreference: Coreferences are expressions in a text that have the same referent, that is, they refer to the same person or thing (Morgado, 2011). Form of calculation: ∑Coreference = Proportion of adjacent sentences that share one or more arguments (nouns, pronouns, or nominal syntagmas).
6. Anaphoras: Anaphoras retrieve a previous term by means of direct or indirect reference; thus, anaphoras are related to the notion of repetition (Milner, 1982). Form of calculation: ∑Anaphoras = (Proportion of anaphorical references among adjacent sentences) + (Proportion of anaphorical references that refer to a constituent present in up to five previous sentences).
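Two of the forms of calculation above (Ambiguity and Coreference) are simple enough to illustrate directly. The sketch below is not the Coh-Metrix-Port implementation: it assumes that the arguments of each sentence and the number of thesaurus meanings per word have already been extracted by some upstream tool, and all names are hypothetical.

```python
def adjacent_argument_overlap(sentence_arguments):
    """Proportion of adjacent sentence pairs that share at least one argument.

    sentence_arguments: list with one collection of arguments (nouns, pronouns,
    or nominal syntagmas) per sentence, in text order.
    """
    pairs = list(zip(sentence_arguments, sentence_arguments[1:]))
    if not pairs:
        return 0.0
    sharing = sum(1 for first, second in pairs if set(first) & set(second))
    return sharing / len(pairs)


def mean_meanings(words, meanings_by_word):
    """Average number of meanings for the words of one grammatical class.

    meanings_by_word: hypothetical lookup from word to its number of meanings
    in a thesaurus; words missing from the lookup are skipped.
    """
    counts = [meanings_by_word[w] for w in words if w in meanings_by_word]
    return sum(counts) / len(counts) if counts else 0.0
```

The ambiguity score for a text would then call `mean_meanings` once per class (adjectives, adverbs, nouns, and verbs).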
The proposed readability formula is presented in equation 3. This process provides the variable labelled as "Simple Writing," in which scores close to 0 represent documents that better fit writing standards, that is, which are less complex. Table 2 summarizes the main independent variables, the control variables, and the variables used in additional robustness tests.

Table 2
Summary of the variables used in the research

Persistence
Estimate of the slope coefficient β1i based on the model presented in equation 4. This is the first-order autoregressive model (AR1) estimated for every company-year using 7-year windows.
$$\text{Earnings per share}_{i,t} = \beta_{0i} + \beta_{1i}\,\text{Earnings per share}_{i,t-1} + \mu_{i,t} \quad (4)$$
Concept: A positive relationship is expected between the Persistence measure and the readability variables. Earnings per share is understood as the values of profit or loss per share. The interpretation of equation 4 is that higher β1i values indicate high persistence and values close to 0 indicate transitory performance or low persistence. Note that the choice of analysis window follows the research of Francis et al. (2004), who opted to employ a 10-year window. In this research, a 7-year window is preferred, because we chose to employ a balanced data panel. A bigger window of years would substantially reduce the final sample, as was observed in unreported tests, which would hinder the statistical inferences. The procedure for calculating persistence by estimating an autoregressive model followed the studies of Barton, Hansen, and Pownall (2010) and Francis et al. (2004). Unlike the present study, which employs Earnings per share, Barton et al. (2010) use a composite earnings variable and Francis et al. (2004) employ a measure of net income before extraordinary items.
To identify possible econometric problems, the augmented Dickey-Fuller test was applied (Said & Dickey, 1984). The procedure was carried out with the test equation with trend, with drift, and with a 1-period lag, as discussed by Gujarati and Porter (2011). Initially, the test was conducted by company for all the data windows, from 2000 to 2019. At this stage it was not possible to reject H0 for 12 companies, that is, 12 companies presented a non-stationary data series. To test the effect of this violation of the property of a series, the observations were excluded from the sample and the parameters of model (1) were re-estimated. As there were no discrepancies in relation to the original values, we chose to maintain the observations in the database.
Subsequently, the augmented Dickey-Fuller test was conducted by company-year using 7-year windows, which resulted in 14 statistics for the test by company. As before, the companies that presented a non-stationary data series were excluded from the sample of companies and the parameters of model (1) were re-estimated. As no significant changes occurred in the estimate of the parameters, the companies were maintained in the sample.

Performance
= EBIT / Total assets
Concept: The Performance proxy captures accounting performance. A positive relationship is expected between the Performance measure and readability. The operationalization of the variable is in accordance with Li (2008) and Lo et al. (2017).

Benchmark
= 1 if the change in Performance from year t-1 to year t is negative, or 0 otherwise.
Concept: A negative relationship is expected with the readability measures, indicating that companies that did not achieve the reference benchmark in relation to the previous year tend to disclose more complex reports. This variable is in accordance with the one presented by Li (2008) and Lo et al. (2017).

Control variables
Liquidity = Stock liquidity index calculated with the formula elaborated by the Economática® database
Concept: Bloomfield and Wilks (2000) and Heflin, Shaw, and Wild (2005) argue that an improvement in disclosure increases the demand for shares, with a resulting increase in their liquidity. Boubaker et al. (2019) and Lang and Stice-Lawrence (2015) are examples of studies that empirically test this relationship. A positive sign is expected for the liquidity proxy in relation to readability.

Size = ln of Total assets
Concept: Bigger companies tend to be more complex, both operationally and geographically (Dempsey et al., 2012). Such complexity may be reflected in the readability of their reports, because bigger companies issue more complex reports (Ajina et al., 2016; Dempsey et al., 2012; Li, 2008; Lo et al., 2017). The sign expected for the size variable is negative.

Volatility = Standard deviation of the Performance variable (EBIT / Total assets) for a 5-year window.
Concept: It is assumed that, in more volatile business environments, corporate communication is more complex (Li, 2008). Thus, a negative sign is expected for this variable. Cheung and Lau (2016), Li (2008), and Lo et al. (2017) are examples of studies that use this variable.

Indebtedness = (Current liabilities + Non-current liabilities) / Total assets
Concept: Companies with a high level of debts may publish more complex reports, with the aim of persuading capital providers (Dempsey et al., 2012). A negative relationship is expected between indebtedness and readability as in Ajina et al. (2016) and Dempsey et al. (2012).

Age
= ln of the difference, in days, from the date the company was founded to the closing date of the financial statements
Concept: Companies in operation for longer may present less complex reports due to the lower asymmetry and uncertainty in their information (Li, 2008). This variable was used by Li (2008) and Lo et al. (2017).

Ranking
The Ranking variable takes the value of 1 if the company belongs to the last two quintiles (Q4/5) of the readability ranking and 0 if it belongs to the first two quintiles (Q2/5). The readability ranking is captured after multiplying the percentages of shared variance, obtained from the factor loadings of the textual characteristics ModFlesch, ModFog, and Simple Writing.

Note: ln is the abbreviation of the natural logarithm. EBIT is the abbreviation of earnings before interest and taxes.
Source: Elaborated by the authors.
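The earnings variables summarized in Table 2 can be illustrated with a short sketch. It rests on several assumptions that are not stated in the table: the AR(1) model is estimated by ordinary least squares with an intercept, a 7-year window holds seven annual observations and ends in the year being measured, and Volatility uses the sample standard deviation. All function names are hypothetical.

```python
from statistics import mean, stdev


def ar1_slope(series):
    """OLS slope of x_t on x_{t-1} (with intercept): the Persistence estimate."""
    lagged, current = series[:-1], series[1:]
    mean_lag, mean_cur = mean(lagged), mean(current)
    sxx = sum((x - mean_lag) ** 2 for x in lagged)
    if sxx == 0:
        raise ValueError("lagged series has no variation")
    sxy = sum((x - mean_lag) * (y - mean_cur) for x, y in zip(lagged, current))
    return sxy / sxx


def rolling_persistence(eps_by_year, window=7):
    """Persistence per year, estimated on the `window` years ending in that year."""
    years = sorted(eps_by_year)
    estimates = {}
    for end in range(window - 1, len(years)):
        window_years = years[end - window + 1 : end + 1]
        estimates[years[end]] = ar1_slope([eps_by_year[y] for y in window_years])
    return estimates


def benchmark(performance_previous, performance_current):
    """1 if Performance fell relative to the previous year, 0 otherwise."""
    return 1 if performance_current - performance_previous < 0 else 0


def volatility(performance_window):
    """Sample standard deviation of Performance over a window (5 years in Table 2)."""
    return stdev(performance_window)
```

For a company with earnings per share available from 2000 to 2019, `rolling_persistence` returns one estimate per year from 2006 to 2019, that is, 14 estimates.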

Models and Analysis Technique
The relationship between readability, earnings, and harmonization with IFRS, established in the literature, is empirically tested by the following econometric model: Initially, the parameters of the model are estimated with the ModFlesch readability measure. Subsequently, the model is re-estimated, taking as a dependent variable the readability indices obtained from the modified Fog formula (ModFog). Finally, the model is re-estimated with the readability values calculated by the proposed formula (Simple Writing). It is worth noting that the readability indices are of the "the greater, the better" type.
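The specification can be sketched from the variables summarized in Table 2. The form below is an assumption rather than the exact equation estimated: the name IFRS for the harmonization dummy, the coefficient labels, and the way the controls and fixed effects are written are illustrative only.

```latex
\text{Readability}_{i,t} = \beta_0
  + \beta_1\,\text{Persistence}_{i,t}
  + \beta_2\,\text{Performance}_{i,t}
  + \beta_3\,\text{Benchmark}_{i,t}
  + \beta_4\,\text{IFRS}_{t}
  + \sum_{k} \gamma_k\,\text{Control}_{k,i,t}
  + \text{Sector}_{i} + \text{Year}_{t}
  + \varepsilon_{i,t}
```

Here Readability stands in turn for ModFlesch, ModFog, and Simple Writing, and the controls are Liquidity, Size, Volatility, Indebtedness, and Age.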
Based on the discussion proposed by Loughran and McDonald (2016) about the fact that sector characteristics impact the textual analysis and the understanding that temporal trends affect textual attributes such as readability (Cazier & Pfeiffer, 2017; Dyer et al., 2017), the econometric models were estimated including fixed effects by sector and year in a balanced panel.
The value of the ModFlesch index should not be compared with the standard interpretation of the original index, given that the variable was modified in this study. To provide a reference, the modified Flesch measure (ModFlesch) is compared with the natural logarithm of its original counterpart (Original Flesch), which is why the latter is presented in Table 3. The mean and median Original Flesch values (mean = 3.46 and median = 3.53) are lower than those of the modified formula. This indicates that the original formula overstates textual complexity, classifying reports as more complex than they are. The dispersion of the data is greater in comparison to the modified variable (0.32 against 0.04), because the minimum (2.30) and maximum (3.91) values are far from the lower (3.33) and upper (3.66) quartiles. Altogether, a greater concentration of values above the sample mean is noted, as the result of negative asymmetry for the Original Flesch variable. In a direct comparison, substituting the original version for the modified variable in the econometric models would produce biased results, because the textual complexity of the reports would be overestimated.

Descriptive Statistics for the Main Interest and Control Variables
The original values of the Fog measure (Original Fog) are presented for the purposes of comparison with its modification (ModFog). The original index estimates the years of formal education needed to understand the text. The mean (median) value of the Original Fog index is -10.83 (-10.56), which is lower than the modified Fog values, ModFog, -10.29 (-10.02). The adjustment in the mean value of 0.54, derived from the difference between the means of the original and modified Fog, is less accentuated than the one revealed by Kim et al. (2019). The modification procedure of the aforementioned authors resulted in a mean difference of 6.736 (original Fog = 19.693 and modified Fog = 12.957). The difference between this research and the study by Kim et al. (2019) is explained by the formula for calculating complex words, given that Kim et al. (2019) elaborated a list of 2,028 words that were considered complex, according to their understanding. This procedure was not adopted as it was considered discretionary and prohibitive of replication.
The proposed readability measure, Simple Writing, presents values concentrated in the upper part of the sample, which shows negative asymmetry (median higher than the mean: -6.19 > -10.38). The discrepancy between the quartiles (Q 1/4 = -10.83 and Q 3/4 = -3.63) and the oscillations in the dispersion measure (SD = 12.13) are more sensitive in the Simple Writing variable, due to it covering various linguistic elements (Logical operator, Tokens, Constituents, Ambiguity, Coreference, and Anaphora). This could suggest that managers may not use all these linguistic artifices with the same frequency to modify the textual complexity in their reports.
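The reading of asymmetry from the gap between mean and median can be illustrated with Pearson's second skewness coefficient, 3 × (mean − median) / SD. This coefficient is not used in the study; it is brought in here only as a quick check, and the helper name is hypothetical. The inputs are the summary statistics quoted in the text.

```python
def pearson_median_skewness(mean: float, median: float, sd: float) -> float:
    """Pearson's second skewness coefficient; negative when median > mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return 3.0 * (mean - median) / sd


# Summary statistics as reported in the text.
simple_writing = pearson_median_skewness(mean=-10.38, median=-6.19, sd=12.13)
original_flesch = pearson_median_skewness(mean=3.46, median=3.53, sd=0.32)
print(round(simple_writing, 2), round(original_flesch, 2))
```

Both values come out negative, in line with the negative asymmetry described for the Simple Writing and Original Flesch variables.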
On average, the companies present low sustainability of earnings (persistence = mean 0.22). This does not necessarily show that the values of earnings are negative, but rather that persistent earnings are, on average, less recurrent. The higher standard deviation than the mean and median (0.45 > 0.22 and 0.25) is the effect of the oscillation in the companies' earnings per share. From jointly analyzing the values above Q 3/4 (0.62) and below Q 1/4 (-0.16), it is observed that the mean of values in the last quartile is 0.77 against a mean of -0.38 for the first quartile. Thus, in general, considering the absolute values, the companies present less persistent earnings, but the more persistent values are, on average, higher than the least persistent ones.
The Performance variable presented (i) a mean and median value of 0.08, (ii) a relatively narrow range between the extreme points (minimum = -0.09 and maximum = 0.24), and (iii) a low dispersion in relation to the mean (SD = 0.07). Taken together, the positive value and the symmetrical distribution around the mean are consistent with the general notion of a "good" result. Because of this, managers have incentives to disclose more readable information, to highlight the companies' "good" current result.
In relation to the control variables, it is observed that: (i) on average, the companies in the sample present high liquidity (Liquidity, mean = 0.54); (ii) there are indications that there were interruptions and/or accentuated changes in performance over the years analyzed (Volatility, mean = 0.03, standard deviation = 0.04); (iii) the mean level of indebtedness is 0.63, reflecting a debt structure with a tendency for external financing rather than self-financing; and (iv) the Size and Age measures have an apparently symmetrical distribution with dispersion and extreme values in acceptable parameters, showing that they are correctly specified.
Panel B of Table 3 provides the statistics of the main interest variables by reference benchmark type. We chose to segregate the benchmark into negative and positive for comparison purposes. It was found that 366 (348) company-year observations, which represent 51% (49%) of the sample, present negative (positive) year-to-year variations for the benchmark measure. This result is reflected in the low variation of the sum and mean values of the main interest variables. The variability and the proportional distribution make it possible to identify the isolated effect of both the negative and the positive benchmark on the readability measures, with the study focusing on the negative variations of the reference values.
Panel C of Table 3 compares the descriptive statistics of the main interest variables in relation to harmonization with IFRS. Altogether, the results indicate an improvement in readability and a reduction in company earnings, when comparing the pre- and post-IFRS periods. These findings are partially consistent with the predictions that harmonization with IFRS enabled an improvement in the informational environment. It is worth noting that the univariate comparisons ignore other factors that impact the relationships investigated. Moreover, the results were shown to be sensitive to the difference of means or medians tests employed.
Panel D of Table 3 presents the Pearson's correlation matrix. Ignoring the specificities of the estimates for the same construct, the coefficients of the correlations between the dependent variables and the main interest variables are within acceptable parameters (lower than 0.8, according to Gujarati and Porter, 2011). The coefficients are also not considered high when comparing the associations between the independent variables. Specifically, the ModFlesch, ModFog, and Simple Writing readability measures are generally positively (negatively) associated with the Persistence and Performance (benchmark) earnings measures. This suggests that the reports are less complex for the companies that present persistent and positive earnings, and more complex for the ones that did not achieve a reference benchmark.
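The screening rule described above (pairwise Pearson coefficients below 0.8) can be sketched as follows. This is an illustrative check under stated assumptions, not the procedure used in the study: the function names are hypothetical, and the three series are made-up toy values rather than sample data.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def flag_high_correlations(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up toy series (not data from the study).
toy = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, 4, 6, 8, 10],
    "var_c": [3, 1, 4, 2, 5],
}
print(flag_high_correlations(toy))
```

In this toy example only the pair var_a and var_b is flagged, since var_b is an exact multiple of var_a.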
In relation to the proposed measure, it is observed that the Simple Writing variable presents a low value of association with the modified readability measures (ModFlesch = coef. 0.41, p-value < 0.01 and ModFog = coef. 0.34, p-value < 0.01). This result shows that the proposed readability variable covers different linguistic elements to those employed to calculate the modified measures. Table 4 presents the estimates and re-estimates of model (1) that show the impact of earnings and harmonization with IFRS on the readability of the Results Analysis section of the Management Report.

Analysis of the estimates and re-estimates of the parameters of model (1)
Non-metallic minerals and mining, (7) Oil and gas, (8) Chemicals, (9) Steelmaking and metallurgy, (10) Telecommunications, (11) Textiles, (12) Vehicles and parts and Transport and services. According to Gujarati and Porter (2011)

Impact of the earnings measures (persistence, performance, and reference benchmark) on readability
H1a affirms that readability is greater in companies with persistent earnings. The results of the Persistence variable support this hypothesis. Its coefficient is positive and significant for all specifications of the model (column I = coef. 0.011, p < 0.01; column II = coef. 0.393, p < 0.05; and column III = coef. 50.417, p < 0.01). These findings are consistent with the notion that companies with persistent earnings have less complex reports, because the managers publish reports with better readability to indicate the recurrent earnings to the market. These findings add to the international discussions that emphasize that persistent earnings are sought (Francis et al., 2004; Kormendi & Lipe, 1987; Penman & Zhang, 2002) and they complement the studies of Li (2008) and Souza et al. (2019).
Consistent with H1b, which affirms that readability is greater in companies with better performance, the Performance variable is positively related to the ModFlesch and ModFog readability measures, with a coefficient of 0.078, p < 0.01 and a coefficient of 3.696, p < 0.01, respectively. This means that, in the case of satisfactory performance, the managers opt to present less complex reports. This improvement in readability enables the information for decision making to be more easily extracted. This finding contributes to the academic debate that managers have incentives to improve the readability of their reports, to signal a positive result (Rutherford, 2003; Smith & Taffler, 1992), and it provides additional support to the empirical studies of Dempsey et al. (2012), Holtz and Santos (2020), Li (2008), Lo et al. (2017), and Souza et al. (2019).
The counterintuitive result for the Performance variable is the negative coefficient in column (III). This adverse result should be interpreted with caution: first, statistical significance is not observed for the Performance variable; second, the coefficients of the variable, when it explains the Simple Writing measure, are sensitive to alterations in the estimators employed; third, the underlying idea of the proposed model is to measure the direct effect of earnings on the readability of reports. However, it is possible that other characteristics at the company level could explain how much performance influences readability, such as earnings management practices, as investigated by Lo et al. (2017), and financial and institutional characteristics at the company level, which is the object of study of Ajina et al. (2016) and Loughran and McDonald (2014).
Consistent with the prediction of H1c, the reference value measure, benchmark, is negative and statistically significant in all specifications (column I = coef. -0.005, p < 0.10; column II = coef. -0.288, p < 0.10; and column III = coef. -4.100, p < 0.10), suggesting that the companies that did not exceed their previous performance publish more complex reports. This means that, in the case of a negative benchmark, managers prefer not to improve the readability of their reports. This result is aligned with the research of Lo et al. (2017), in which an increase in textual complexity is associated with a negative earnings benchmark.
In summary, the hypotheses derived from the relationship between the earnings construct and readability measures, H1a, H1b, and H1c, should be interpreted separately. Hypotheses H1a and H1c are supported for all specifications of model (1). On the other hand, H1b is only validated in the models with the modified readability measures (ModFlesch and ModFog).

Impact of the harmonization with IFRS measure on readability
The result for the IFRS variable is positive and significant when related to the modified readability measures (column I = coef. 0.018, p < 0.05 and column II = coef. 0.668, p < 0.10). Consistent with H2, there are indications that the readability of reports is greater after harmonization with IFRS. This finding is aligned with the general notion that harmonization improves the informational environment of companies (Horton et al., 2013) and specifically enables improvements in the readability of reports, as observed by Boubaker et al. (2019), Cheung and Lau (2016), and Lang and Stice-Lawrence (2015). Yet, H2 is not confirmed for the relationship between IFRS and the proposed variable, Simple Writing. This result could be interpreted in two ways. First, the harmonization process enabled, even if indirectly, an improvement in the basic lexical elements of the text (length of sentences and syllables), but it did not help in modifying more subtle textual elements, which are modeled in the Simple Writing variable. Second, the Management Report is not a typical accounting document in essence, as it is produced by a manager, with a focus on company management and performance, which suggests that the report could be written without observing the prevailing accounting standards.

Impact of the control variables on the estimates of the parameters of model (1)
For the control variables, a negative and significant relationship is observed between the liquidity of the shares and the ModFlesch and Simple Writing readability proxies, with a coefficient of 0.007, p < 0.05 and coefficient of -7.753, p < 0.01, respectively. The liquidity measure is imprecise by nature, and the existing literature did not apply the Liquidity variable directly as a predictor of readability. The opposite sign to what was expected may be explained by the form of operationalization of the variable and the type of report used.
The company size measure has a positive and significant relationship with ModFlesch (coef. 0.004, p < 0.05) and Simple Writing (coef. 4.527, p < 0.01). Unlike the previous studies, company size is positively reflected in readability, indicating that bigger companies publish less complex reports. One explanation for this may be derived from the external monitoring dimension of the political costs hypothesis of Watts and Zimmerman (1986): bigger companies are more closely monitored by the interested parties, and these may reward (punish) companies with less (more) complex reports.

Additional Analyses
This section presents additional analyses and sensitivity tests to support the main analysis, as shown in Table 5. Table 5 shows the relationship of Persistence and Performance, as well as the intensity of these measures, with the probability of occurrence of high readability. Columns I and IV help in validating hypotheses H1a and H1b, by showing that the probability of occurrence of reports with high readability is related to persistent (coef. 0.948, p < 0.01) and positive earnings (coef. 4.521, p < 0.01). Column III suggests that the disclosure of reports with high readability is concentrated in companies with high persistence (coef. 2.925, p < 0.01). On the other hand, the coefficient of the High Performance variable, column IV, does not confirm the aforementioned relationship.
In summary, the evidence is consistent with the idea that companies present reports with greater readability when their earnings are persistent and positive. It also contributes to understanding that companies with high persistence have more incentives to present reports with high readability. In contrast, no significant evidence is found that managers elaborate less complex reports in the presence of performance classified as high intensity.
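One way to read the size of the coefficients reported in this section is through odds ratios. The sketch below assumes that the probability model is a logit, which the text does not state; under a probit or a linear probability model this reading would not apply. It also says nothing about the scale of the explanatory variables, so each odds ratio refers to a one-unit change. The function name and dictionary labels are hypothetical.

```python
import math


def odds_ratio(coef: float) -> float:
    """Multiplicative change in the odds for a one-unit increase (logit only)."""
    return math.exp(coef)


# Coefficients quoted in the text; the logit specification is an assumption.
reported = {
    "Persistence": 0.948,
    "Performance": 4.521,
    "High Persistence": 2.925,
}
for name, coef in reported.items():
    print(name, round(odds_ratio(coef), 2))
```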

CONCLUSION
The demand for qualitative and quantitative information contained in accounting statements and in the reports elaborated by management requires a better understanding about their communication. Based on the preparer's perspective, managers may prepare reports for the purposes of anticipating investors' reactions, with the aim of actively indicating a positive result or hiding a poor one. Embedded in this discussion, this research aimed to evaluate the effect of company earnings and harmonization with IFRS on the readability of Management Reports in the Brazilian stock market.
The criticisms of the traditional formulas motivated the modifications and the elaboration of an alternative readability measure. The modified readability measures (ModFlesch and ModFog) and the one elaborated with the help of the Coh-Metrix-Port tool (Simple Writing) appear to better capture textual complexity, given that the traditional measures classify information as more complex, when it is not. Fundamentally, employing the traditional measures in the statistical tests would produce biased results, because the aforementioned measures underestimate the capacity of the information users. The conclusions can be summarized as follows.
First, the Management Reports of companies with persistent and positive earnings generally present better readability, that is, less textual complexity. In particular, the effect is confirmed for the intensity of earnings, as reports of companies with high persistence are more likely to present high readability. As persistent and positive earnings are sought, managers apparently signal these results to the market. This may, for example, reduce informational asymmetry in the market, as less complex reports are less costly for extracting relevant information.
Second, when companies do not exceed the gains of the previous year, that is, their reference benchmark, the score of the readability indices decreases. Thus, there is evidence that managers may reduce the readability of their reports with the aim of obfuscating information, when their companies do not exceed the previous year's earnings.
Third, the results indicate an increase in the readability of the reports in the post-IFRS period. Although it could be argued that the IFRS accounting standards have increased accounting complexity (Morais, 2020; Pawsey, 2017), for example in areas such as financial instruments, pensions, the impairment test, and stock-based compensation (Ernst & Young, 2006), the reports do not reflect the negative effects of this complexity, but instead reveal significant improvements in the informational environment.
We recommend that future studies relate the modified readability indices and the proposed measure to variables at the company and market level. Also, new studies could use the discussions raised here to elaborate a dictionary of corporate language, for example, presenting technical jargon and its definitions. This, together with the analysis of readability indices, would help in the convergence toward a more informative language.