1. Introduction
It is a well-known fact that volatility plays a central role in modern finance theory, whether in asset pricing and valuation, portfolio allocation or risk management, among other areas. To many investors, volatility is a synonym for risk: they have a certain level of risk they can, or are willing to, bear in their portfolios. An accurate forecast of their assets' volatilities and correlations is one of the crucial conditions for correctly assessing those portfolios' risk levels.
A volatility model is usually expected to forecast not only the absolute level of future returns, but their whole density. The volatility forecasting theme has always been problematic for both academics and practitioners: unlike other metrics, volatility is not directly observable in the market, which poses an initial problem when assessing a volatility forecasting model's performance. Furthermore, different users (e.g., traders, risk managers, policy makers or academics) can have different understandings of volatility, and those disparities may "arise from differences in how volatility affects their trading strategies, and in how they understand the fundamental mechanism of security valuation in a financial market". Therefore, we can assume there is no single best volatility estimation method, since the answer may differ across uses.
When it comes to volatility modelling, there is an enormous amount of published literature and research. Among those, the AutoRegressive Conditional Heteroskedasticity (ARCH) class of models and their numerous ramifications occupy a special place. Originally presented in the seminal work of Engle (1982) to describe UK inflationary uncertainty, this class of models later became widely used to characterize time-varying financial market volatility. In particular, one of the most important extensions, the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) model, originally presented in Bollerslev (1986), together with its subsequent developments, proved to be fairly robust, becoming very popular among academics and practitioners.
The main objective of this work is to perform an empirical analysis to identify whether GARCH-family volatility forecasting models applied to the USD-BRL currency can benefit from using exogenous variables, namely: implied volatility, represented by the FXVol index calculated by the Brazilian exchange BM&FBovespa; and realized volatility, through intraday high-frequency data. In addition, we also compare those models' performances against more naive volatility estimation techniques, such as the historical standard deviation of returns and the exponentially weighted moving average (EWMA) model. The evidence found in this study supports the conclusion that both implied and realized volatilities can provide useful information when modelling volatility, at least for the local USD-BRL currency market. The model with these variables outperformed, with statistical significance, all other models analyzed in this study, including standard GARCH-family models.
The foreign exchange derivatives market in Brazil is growing very fast. According to BM&FBovespa data, derivatives that have foreign exchange as the underlying asset occupy the second largest position, after interest rate derivatives, in terms of notional value: roughly USD 400 billion, representing 26% of the total market. In spite of this, relatively few studies investigate this asset's volatility in the local exchange market. In fact, as far as it was possible to verify, unlike studies in the North American stock market, where the VIX is widely used as a representative of implied volatility, there are no studies in Brazil using the index calculated by BM&FBovespa as an implied volatility measure. Therefore, we are confident that our work is original and will be relevant not only for other academics, but also for practitioners.
2. Theoretical and Literature Review
Volatility, in a broad view, can be defined as "the spread of all likely outcomes of an uncertain variable". Given the importance of the subject for several areas in finance, it is not surprising that volatility modelling and forecasting have drawn a lot of attention from academics and practitioners over the last decades, fostering the development of several models. Poon and Granger (2001) provide a comprehensive review of several of those studies at the time the paper was published. The authors classify the models into two major families. The first, time-series volatility forecasting models, encompasses those models that use historical data to formulate forecasts, including autoregressive models and stochastic volatility models, among others. The second, option-based volatility forecasting models, comprises those that use the information embedded in traded options to infer the underlying asset's volatility. Nowadays, we could also cite models based on non-parametric methods, as well as those based on neural network algorithms, although these models are out of the scope of this article.
2.1. Realized Range
One set of methods to estimate volatilities is based on the daily amplitude, or range, as described, for example, in Yang and Zhang (2000) and Bollen and Inder (2002). The range can be defined as the difference between the highest and the lowest natural logarithm of prices over a sample interval. For a trading interval t, let us denote O_{t} as the opening price, C_{t} as the closing price, H_{t} as the highest price, and L_{t} as the lowest price. We can then define:
Parkinson (1980) developed an estimator that is based only on the maximum and minimum prices, and can be defined as:
As Yang and Zhang (2000) highlight, this estimator is valid only in the case where there are no jumps, which means the asset is continuously traded.
Garman and Klass (1980) developed an estimator that they claim to be more efficient than the one developed by Parkinson (1980), and which also considers the daily return, as described below:
Both estimators rely on the assumption that 𝜇 = 0, which means there is no drift in the price evolution process. Rogers et al. (1994) present an estimator that overcomes this limitation; however, as with Equation 2, it is only valid in the case where there are no jumps:
Aiming to provide an estimator that overcomes both of those limitations, regarding the drift 𝜇 and the opening jump, Yang and Zhang (2000) presented an estimator based on multiple-period data, with the following form:
where 𝜎^{2}_{(RS)} is given in Equation 4, and 𝜎^{2}_{o}, 𝜎^{2}_{c} and k are defined as:
with
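The four range-based estimators above can be sketched in code. This is a minimal illustration assuming the standard formulas from the cited papers (Parkinson's high/low estimator, the Garman-Klass correction, the drift-independent Rogers-Satchell form, and the Yang-Zhang combination with its usual weight k); the function names are ours, not from the original sources.

```python
import numpy as np

def parkinson(high, low):
    """Parkinson (1980): uses only the daily high and low (no drift, no jumps)."""
    return (np.log(high / low) ** 2) / (4.0 * np.log(2.0))

def garman_klass(o, h, l, c):
    """Garman and Klass (1980): adds open/close information to the range."""
    return 0.5 * np.log(h / l) ** 2 - (2.0 * np.log(2.0) - 1.0) * np.log(c / o) ** 2

def rogers_satchell(o, h, l, c):
    """Rogers et al. (1994): drift-independent, but still assumes no jumps."""
    return np.log(h / c) * np.log(h / o) + np.log(l / c) * np.log(l / o)

def yang_zhang(o, h, l, c, prev_c):
    """Yang and Zhang (2000): multi-period estimator robust to drift and
    opening jumps. Inputs are arrays over an n-day estimation window."""
    n = len(o)
    k = 0.34 / (1.34 + (n + 1) / (n - 1))          # usual YZ weight
    o_ret = np.log(o / prev_c)                     # overnight (close-to-open) returns
    c_ret = np.log(c / o)                          # open-to-close returns
    var_o = np.var(o_ret, ddof=1)                  # overnight variance
    var_c = np.var(c_ret, ddof=1)                  # open-to-close variance
    var_rs = np.mean(rogers_satchell(o, h, l, c))  # average RS component
    return var_o + k * var_c + (1.0 - k) * var_rs
```

Each function returns a daily variance estimate; annualizing or averaging over windows is left to the caller.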
2.2. Realized Variance
^{Andersen and Bollerslev (1997)} were the first to propose the sum of squared intraday returns as an estimator for the actual volatility. We can represent intraday returns as:
where P_{t,d} is the last price observed at timestamp d at date t, r_{t,d} is the return at date t in the d − th interval, reflecting the price variation from timestamp d − 1 to d, at date t. Hence, the realized volatility as a sum of squared returns (SSR) would be:
In their original work, they defined realized variance (RV) as the sum of 288 5-minute squared returns, and found evidence that the performance of ARCH-based models increases when intraday data are used. Similar results were presented in Andersen et al. (1999) and Andersen et al. (2001). It was also pointed out that, due to characteristics of financial time series, intervals shorter than 5 minutes are usually polluted with serial correlation, a consequence of microstructure noise effects. It was determined that an interval between 5 and 30 minutes generally strikes a good balance: short enough to preserve the accuracy of the continuous-record underlying realized volatility measures, and long enough that the influences from market microstructure are not overwhelming (Andersen et al. 2001).
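The sum-of-squared-returns estimator described above reduces to a one-liner; the sketch below assumes one day's worth of evenly spaced 5-minute quotes (the function name is ours).

```python
import numpy as np

def realized_variance(intraday_prices):
    """Sum of squared intraday log-returns (Andersen and Bollerslev, 1997).
    `intraday_prices` holds the 5-minute quotes of a single trading day."""
    log_p = np.log(np.asarray(intraday_prices, dtype=float))
    r = np.diff(log_p)      # r_{t,d} = ln P_{t,d} - ln P_{t,d-1}
    return np.sum(r ** 2)
```

With the 108 five-minute returns per day used later in the paper, this yields one realized-variance observation per trading date.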
In markets where trading does not occur continuously, such as equities or non-global currencies, how to incorporate the overnight return deserves some notes. Hansen and Lunde (2005a) discuss this matter and propose three different approaches to integrate the overnight returns with trading-hours data and obtain a measure of daily integrated variance. The first alternative would be to simply add the squared overnight return to the estimator defined in Equation 8, as below:
where the two terms can be interpreted as estimators of integrated variance, the first representing the inactive period and the second the active market period. As highlighted in Hansen and Lunde (2005a), an obvious drawback of this measure is that the squared overnight return, o^{2}_{t}, is a noisy estimator for the non-trading hours, just as r^{2}_{t} fails as an estimator for the daily integrated variance.
A second approach would be to scale the estimator defined in Equation 8, calculated using trading hours data, by a factor in such a way that the resulting estimator has the correct expected value. This can be represented as:
This approach was used in several studies, such as ^{Martens (2002)}, ^{Koopman et al. (2005)} and ^{Hansen and Lunde (2005b)}. ^{Hansen and Lunde (2005a)} and ^{Martens (2002)} propose ways to estimate the factor 𝜍̂.
To develop the third alternative, Hansen and Lunde (2005a) work on the idea that both previous estimators, 𝜎̃^{2}_{(+ON),t} and 𝜎̃^{2}_{(SCALE),t}, are linear combinations of o^{2}_{t} and 𝜎̃^{2}_{(SSR),t}, with weights (1, 1) and (0, 𝜍̂) respectively. Hence, those can be expressed as:
with 𝜛_{1} and 𝜛_{2} chosen accordingly (for example, minimizing a given error measure). ^{Hansen and Lunde (2005a)} provide a closed-form solution for those weights 𝜛 as below:
where p is a relative importance factor, defined as:
and 𝜈_{0} ≡ E(𝜎̃^{2}), 𝜈_{1} ≡ E(o^{2}_{t}), 𝜈_{2} ≡ E(𝜎̃^{2}_{(SSR),t}), 𝜂^{2}_{1} ≡ var(o^{2}_{t}), 𝜂^{2}_{2} ≡ var(𝜎̃^{2}_{(SSR),t}), and 𝜂_{12} ≡ cov(o^{2}_{t}, 𝜎̃^{2}_{(SSR),t}).
In practice, the authors propose to estimate those parameters using simple sample averages, so the weights can be rewritten as:
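The three overnight-adjustment alternatives can be sketched as follows. This is illustrative only: the scale factor here is estimated as a simple ratio of sample means (one of the possibilities discussed in the literature), and the optimal weights (𝜛₁, 𝜛₂) of the third alternative, which Hansen and Lunde (2005a) give in closed form, are passed in directly rather than derived.

```python
import numpy as np

def rv_plus_overnight(rv_ssr, o2):
    """First alternative: add the squared overnight return to the intraday RV."""
    return o2 + rv_ssr

def rv_scaled(rv_ssr, daily_r2):
    """Second alternative: scale the intraday RV so its sample mean matches
    that of squared close-to-close returns (a simple estimate of ς̂)."""
    rv_ssr = np.asarray(rv_ssr, dtype=float)
    scale = np.mean(daily_r2) / np.mean(rv_ssr)
    return scale * rv_ssr

def rv_weighted(rv_ssr, o2, w1, w2):
    """Third alternative: linear combination ϖ1·o²_t + ϖ2·RV_SSR,t; the paper
    derives (ϖ1, ϖ2) from sample moments, here they are given exogenously."""
    return w1 * np.asarray(o2, dtype=float) + w2 * np.asarray(rv_ssr, dtype=float)
```

Note that the first two estimators are special cases of the third, with weights (1, 1) and (0, 𝜍̂) respectively.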
2.3. Autoregressive Conditional Heteroscedastic Family Models
Originally developed by Engle (1982), the Autoregressive Conditional Heteroscedastic (ARCH) model estimates the conditional variance 𝜎̂_{t}^{2} as a function of lagged squared past returns r^{2}_{t-i}. We can write an ARCH(q) model as below:
where 𝜎_{t}^{2} is the conditional variance; z_{t} is a sequence of independent and identically distributed (iid) random variables with mean 0 and variance 1 (often assumed to follow a Gaussian or a Student-t distribution); 𝜔 > 0 and 𝛼_{i} ≥ 0, for all i > 0, in order to ensure that 𝜎_{t} > 0.
Since volatility series present high persistence, too many parameters are often needed when building an ARCH model, which may lead to higher estimation errors (Tsay 2005). To overcome this problem, Bollerslev (1986) proposed an extension, known as the generalised ARCH (GARCH) model, which according to the author "allows a much more flexible lag structure". In GARCH models, the conditional variance 𝜎_{t}^{2} is a function not only of squared past returns, r^{2}_{t-i}, but also of previous conditional variances. Given that r_{t} and 𝜀_{t} follow Equations 15a and 15b, the GARCH(p, q) model can be written as:
where 𝜔 > 0, 𝛼_{i} ≥ 0, 𝛽_{j} ≥ 0 and ∑^{q}_{i=1} 𝛼_{i} + ∑^{p}_{j=1} 𝛽_{j} < 1. The latter restriction implies that the unconditional variance of r_{t} is finite, therefore making the model stationary.
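For concreteness, the GARCH(1,1) conditional-variance recursion (the specification estimated later in the paper) can be sketched as below; the initialization at the unconditional variance is a common convention, not something the text prescribes.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1) recursion: sigma2_t = omega + alpha*r_{t-1}^2 + beta*sigma2_{t-1}.
    Requires omega > 0, alpha, beta >= 0 and alpha + beta < 1 (stationarity).
    Initialized at the unconditional variance omega / (1 - alpha - beta)."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r))
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2
```

In practice the parameters (𝜔, 𝛼, 𝛽) are obtained by maximum likelihood, as done in the paper via the rugarch package; the recursion above only filters the variance path given those parameters.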
ARCH and GARCH models still fall short in handling the asymmetric effects of positive and negative returns. Glosten et al. (1993) proposed a model that addresses this issue by applying a threshold component to the r^{2}_{t-i} term. It is known as the GJR model, after the authors' names. A GJR(p, q) model assumes the form:
where 𝜔 > 0, and 𝛼_{i}, 𝛽_{j} and 𝛾_{i} are non-negative parameters that satisfy conditions similar to those of GARCH models, and N_{t-i} is an indicator for negative r_{t−i}, that is:
^{Nelson (1991)} proposed another alternative model to capture asymmetric effects of asset returns on volatility processes, the exponential GARCH model. The eGARCH(p, q) model can be expressed as below:
One advantage of the eGARCH model is that the non-negativity restrictions on the coefficients are relaxed, which simplifies the model estimation procedure.
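Because the eGARCH recursion operates on the log-variance, positivity of 𝜎²_t holds for any parameter values. The sketch below uses one common parameterization of Nelson's (1991) model (the one implemented in rugarch); the exact form of Equation 17 in the paper may order the terms differently.

```python
import numpy as np

def egarch11_logvar(returns, omega, alpha, gamma, beta, sigma2_0):
    """One common eGARCH(1,1) parameterization:
    ln(sigma2_t) = omega + alpha*z_{t-1} + gamma*(|z_{t-1}| - E|z|) + beta*ln(sigma2_{t-1}),
    with z_t = r_t / sigma_t and E|z| = sqrt(2/pi) under Gaussian innovations.
    No positivity constraints are needed since the recursion is in logs."""
    r = np.asarray(returns, dtype=float)
    e_abs_z = np.sqrt(2.0 / np.pi)
    sigma2 = np.empty(len(r))
    sigma2[0] = sigma2_0
    for t in range(1, len(r)):
        z = r[t - 1] / np.sqrt(sigma2[t - 1])
        log_s2 = (omega + alpha * z
                  + gamma * (abs(z) - e_abs_z)
                  + beta * np.log(sigma2[t - 1]))
        sigma2[t] = np.exp(log_s2)   # always positive, even with negative coefficients
    return sigma2
```

The asymmetry enters through the 𝛼·z term: negative shocks move the log-variance differently from positive shocks of the same size.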
2.4. Implied Volatility
Previous works suggest that implied volatility (IV) can provide accurate estimators of the underlying asset's price volatility. Some authors argue that, conditional on observed options prices and a valuation model, the expected IV should represent the market's best prediction of a given asset's future volatility, otherwise arbitrage opportunities would emerge (Jorion (1995) and Poon and Granger (2001), to mention two examples). Its forward-looking nature is intuitive and differs from backward-looking historical models, making it a natural proxy for future volatility. However, the use of IV is subject to a series of definitions that, once incorrectly made, act as pitfalls that might undermine a model's performance, as mentioned in Blair et al. (2001).
In order to mitigate these potential issues, several works, such as Fleming et al. (1995), Blair et al. (2001), Koopman et al. (2005), Becker et al. (2006) and Becker et al. (2007), use data from implied volatility indices. According to Fleming et al. (1995), the idea of creating a volatility index that reflects options market quotes emerged shortly after the introduction of exchange-traded options in 1973.
In the case of the studies previously mentioned, the index used was the VIX, calculated by the Chicago Board Options Exchange (CBOE). The VIX aims to reflect the 30-day expected volatility of the S&P 500 Index embedded in options prices. The components of the VIX calculation are near-term and next-term put and call options, with more than 23 days and less than 37 days to expiration. The detailed methodology for calculating the VIX can be found in CBOE (2014), as well as detailed numerical examples.
In Brazil, the BM&FBovespa exchange created a local index, denominated FXVol, to reflect the implied volatility present in the USD-BRL foreign exchange market. The index is calculated based on the listed dollar options contracts negotiated at BM&FBovespa. It reflects the implied volatility for the next 21 working days, and when the options' maturities do not match this period, an interpolation is performed.
The index is calculated using the variance swaps pricing methodology as a framework, through a replicating portfolio, and it makes no assumption regarding a model for options pricing, except that the underlying asset price evolves in a diffusive (continuous) process, hence with no jumps. Therefore, it claims to be model independent. A complete demonstration of how it is calculated can be found in Dario (2007), and a detailed description of the variance swaps methodology can be seen in Dario (2006). The FXVol index is calculated on a daily basis, but it is expressed in annualized terms, considering a year with 252 business days.
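Since the index is quoted in annualized terms over a 252-business-day year, using it alongside daily variances requires a de-annualization step. A minimal sketch, assuming the index is quoted in percent and that the usual square-root-of-time scaling applies (the paper does not spell out this conversion):

```python
def fxvol_to_daily_variance(fxvol_pct, business_days=252):
    """Convert the annualized FXVol index (quoted in percent) to a daily
    variance, assuming a 252-business-day year as in the index methodology."""
    annual_vol = fxvol_pct / 100.0
    return annual_vol ** 2 / business_days
```

For example, an FXVol reading of 16 maps to a daily volatility of roughly 1% (0.16 / √252).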
We have opted not to use another methodology as a proxy for implied volatility for several reasons. To the best of our knowledge, there is no other methodology empirically proven to work better in Brazil. Notice also that the FXVol index is able to capture the volatility smile, which means it incorporates more realistic tail probabilities than implied volatilities derived from at-the-money derivatives. Another reason for this choice is that the data is easily available, with no need for further computations. Finally, the FXVol index methodology is independent of our research and provided by a renowned institution.
2.5. Empirical Literature Review
The topic is widely debated in the international literature. For example, Poon and Granger (2001) analyzed 93 published papers that assessed the performance of several volatility models across different asset classes. Most of the literature on the matter is devoted to stock markets, followed by exchange rates. In general, the authors concluded that, among the analyzed studies, there was evidence in favor of forecasts from implied-volatility-based models when compared pairwise against historical volatility and GARCH models, including in currency markets. However, the results for combinations of forecasts were mixed. Some examples of previous international works discussing the forecasting power of GARCH, RV and IV models include Christensen and Prabhala (1998), Becker et al. (2009), Koopman et al. (2005), Mixon (2009), Jorion (1995), Blair et al. (2001), and Bentes (2015), just to cite a few. The results, in general, favor RV- and IV-based models.
When it comes to the Brazilian market, there are also several studies on the topic. Andrade and Tabak (2001) investigate the relationship between USD-BRL exchange rate implied volatility, obtained from a backward induction process using a Garman-Kohlhagen model and options closing prices, and subsequent realized volatility. The authors found strong evidence that IV contains information about future realized volatility, for both realized volatility measures used. Nevertheless, in line with other studies, the authors also found evidence that IV was an upward-biased estimator of future volatility. Moreira and Lemgruber (2004) investigate the use of high-frequency data in the estimation of daily and intraday volatility, in order to compute value-at-risk forecasts for the IBOVESPA. The authors concluded that the use of intraday data for obtaining forecasts of daily volatility is feasible and presents good results.
Chang et al. (2002) examine the relation between USD-BRL exchange rate implied volatility, obtained from option prices, and subsequent realized volatility. The authors investigate whether implied volatilities contain information about volatility over the remaining life of the option that is not present in past returns. Their evidence suggested that implied volatilities gave superior forecasts of realized volatility compared to GARCH(p,q) and moving average predictors, and that econometric model forecasts do not provide significant incremental information relative to that contained in implied volatilities.
Mota and Fernandes (2004) examine forecasts of Ibovespa volatility, comparing those obtained from ARMA models using Garman-Klass estimators with forecasts obtained from GARCH models, using as benchmark the realized volatility obtained from 15-minute intraday data. The authors concluded that the forecasts obtained from GK-ARMA models are in general as good as those obtained from more complex GARCH models. More examples of previous literature in Brazil are Santanda and Bueno (2008), Woo et al. (2009), Mendes and Accioly (2012), Maciel (2012), and Vicente et al. (2012).
Perhaps the work most related to this study is Accioly and Mendes (2016): they investigate whether the inclusion of realized range variables in GARCH and eGARCH models for the local Brazilian stock market adds information and improves volatility forecasts. They used different definitions of realized range on eight different stocks. The authors found evidence in favor of the extended models, which bring information to the volatility process, with a significant reduction in persistence. We take a step forward by working in a different market (foreign exchange) and also including the foreign exchange implied volatility index (FXVol) as a forecasting variable.
3. Data and Methodology
3.1. Data
There are three main types of inputs used in this work: daily quotes of the Brazilian real exchange rate against the United States dollar (USD-BRL), USD-BRL intraday prices, and the daily foreign exchange implied volatility index (FXVol). Due to data availability, the source for the USD-BRL daily and intraday quotes was the Bloomberg database, which contains consensus quotes formed from information provided by multiple brokers operating in the FX market. The following sections describe each one of them in further detail.
3.1.1. Daily USD-BRL Data
The daily data used in our study consist of USD-BRL currency quotes, where USD is the base currency: in other words, the quote convention used in this work represents the BRL amount (the quote currency) equivalent to one unit of USD (the base currency). The period is from January 2nd, 2009 to June 30th, 2015. Given the very low liquidity and, sometimes, the absence of data, we considered only business days on which BM&FBovespa operated. Also, due to the complete absence of intraday data on seven days, we decided to remove these dates from the database as well. In the end, we work with a total of 1,600 trading days.
For each date t, four prices were obtained: Open, Close, High and Low (respectively O_{t}, C_{t}, H_{t} and L_{t}). Those were captured from Bloomberg database, ticker <USDBRL Curncy>, PX_OPEN, PX_LAST, PX_HIGH and PX_LOW quotes. Figure 1 illustrates the closing prices evolution through time.
The choice of the period after 2008 is due to two main reasons. First, the intraday data available in Bloomberg does not go much further back than the horizon we are analyzing. Second, after the mortgage crisis and Lehman's debacle in 2008, there were major changes in the way market participants model, evaluate and monitor financial risks. Therefore, including data prior to this period would increase the chance of noisy results, due to structural breaks in the models.
3.1.2. Intraday Prices
The intraday data set was obtained from the Bloomberg system, using the GIT screen for the same <USDBRL Curncy> ticker used for the daily data. It consists of USD-BRL quotes at 5-minute intervals between 9:00 and 18:00 BRT, encompassing, when a complete daily data set is available, 108 intraday return observations, and a total of 172,601 individual intraday observations. On some dates (242 of the total 1,600), there were gaps in the Bloomberg data, and not all 5-minute quotes were available. When those gaps occurred between the first and last quotes of the day, we chose to interpolate the available data in order to fill them. After this procedure, only 28 dates had fewer than 108 intraday timestamps in our database (days where the first available quote is later than 09:00 or the last one is earlier than 18:00).
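The gap-filling step described above can be sketched with pandas. This is an illustrative reconstruction under our own assumptions (linear interpolation of interior gaps only, a full 09:00-18:00 grid of 5-minute stamps); the paper does not specify the exact interpolation routine used.

```python
import pandas as pd

def fill_intraday_gaps(quotes):
    """Reindex one day's quotes onto a full 5-minute grid (09:00-18:00) and
    interpolate interior gaps only; leading/trailing gaps stay missing, so
    days starting late or ending early keep fewer than 108 return stamps."""
    day = quotes.index[0].normalize()
    grid = pd.date_range(day + pd.Timedelta(hours=9),
                         day + pd.Timedelta(hours=18),
                         freq="5min")                      # 109 stamps -> 108 returns
    return quotes.reindex(grid).interpolate(method="time", limit_area="inside")
```

The `limit_area="inside"` option is what restricts filling to gaps strictly between the first and last observed quotes, mirroring the treatment in the text.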
The choice of 5-minute intervals is based on previous studies, which consider intervals between 5 and 30 minutes the optimum point balancing the accuracy of the observations against the microstructure problems that can arise at higher sampling frequencies, such as spurious autocorrelation in the series of intraday returns (e.g., Andersen et al. (1999), Andersen et al. (2001), Martens (2001), Blair et al. (2001), Mota and Fernandes (2004), Koopman et al. (2005), among several others).
3.1.3. Implied Volatility
The implied volatility data used in this study is the FX Volatility index (FXVol), calculated by BM&FBovespa and previously described in section 2.4.
The historical series for the index was downloaded from the BM&FBovespa website, through the Information Recovery System. Figure 2 illustrates the evolution of the index over the period we are analyzing. There were eight dates in the analyzed period for which BM&FBovespa did not have the FXVol index available: we decided to linearly interpolate those values using the available data series, in order to fill the gaps.
3.2. Analysis of Daily Returns
Figure 3 illustrates the daily returns of USD-BRL closing prices during the analyzed period. It suggests that the daily close-to-close return series presents some level of clustering and appears stationary.
When we perform the augmented Dickey-Fuller (ADF) test on our daily returns database, we obtain a significant result (p-value: 0.1%; statistic: -43.0585), which supports the stationarity hypothesis of daily returns, as expected.
The autocorrelation (ACF) and partial autocorrelation (PACF) functions of the squared return series, r_{t}^{2}, indicate relevant serial dependence, as shown in Figure 4, suggesting that the series presents conditional heteroscedasticity. Conducting Engle's test for heteroscedasticity on the residual series (𝜀_{t} = r_{t} − r̄), using a 2-lag model as the alternative hypothesis, we obtain a result that strongly rejects the null hypothesis of no ARCH effects in the series, with a statistic value of 174.4, in favor of an ARCH(2) alternative, which supports the adoption of a GARCH(1,1) model in our analysis.
3.3. Realized Volatility
In this section we describe the steps followed to obtain our estimate of the realized variance, 𝜎̃^{2}, and consequently the realized volatility, 𝜎̃, to be used as an input in our models and as a benchmark to assess those models' forecasting accuracy.
Initially, in figure 5 we can see the historical series of the sum of squared intraday returns, 𝜎̃^{2}_{SSR}, as described in Equation 8.
However, as previously discussed in section 2.2, the USD-BRL currency market is not continuously traded; therefore, some sort of adjustment is needed before using those numbers. Hence, we calculated the realized variance following the Hansen and Lunde (2005a) proposal, as described in Equation 11. Henceforth in this work, to simplify the notation, unless otherwise noted, every time we refer to the realized variance, 𝜎̃^{2}, or the realized volatility, 𝜎̃, we will be referring to 𝜎̃^{2}_{𝜛} and 𝜎̃_{𝜛}, respectively.
3.4. Models to Be Compared
The first estimate will be the simple standard deviation (SD) of daily returns, used as an estimate of the asset volatility, as below:
where 𝜎̂^{2}_{(SD),t} represents the conditional variance forecast for date t and r_{t} is the daily return as of date t. The choice of n is quite arbitrary and, as highlighted in Mendes and Duarte (1999), it can impact the results obtained. In this work, we compute 𝜎̂^{2}_{(SD),t} with a moving window of the last 20 business days, similar to what is done in Chang et al. (2002). We tested other moving window sizes and, although the results are not precisely the same, they do not change the overall qualitative findings of this research.
The second estimate will be calculated using an exponentially weighted moving average (EWMA) forecasting model, as follows:
where 𝜆 assumes the value of 0.94, as commonly used in the literature, such as in Moreira and Lemgruber (2004), among several others.
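The EWMA recursion with 𝜆 = 0.94 can be sketched as below; the choice of initial variance (here the sample variance, when none is given) is our assumption, since the text does not state the seed value.

```python
import numpy as np

def ewma_variance(returns, lam=0.94, sigma2_0=None):
    """RiskMetrics-style EWMA: sigma2_t = lam*sigma2_{t-1} + (1-lam)*r_{t-1}^2,
    with lambda = 0.94 as used in the text."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r))
    sigma2[0] = np.var(r) if sigma2_0 is None else sigma2_0  # seed: our choice
    for t in range(1, len(r)):
        sigma2[t] = lam * sigma2[t - 1] + (1.0 - lam) * r[t - 1] ** 2
    return sigma2
```

Unlike the equal-weight SD estimator, the EWMA gives geometrically declining weights to older squared returns, so recent shocks dominate the forecast.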
Similarly to Blair et al. (2001), Koopman et al. (2005), Becker et al. (2007) and Accioly and Mendes (2016), we will use standard GARCH models, including asymmetric returns, realized volatility and implied volatility components in the most comprehensive version. Due to stability problems when estimating the models' coefficients, we chose to include the exogenous variables (i.e., implied volatility and realized variance) only in the eGARCH model version. The general specification of the models is given as follows:
Here N_{t−1} assumes the value 1 when 𝜀_{t−1} < 0, and is zero otherwise; 𝜎̃^{2}_{t-1} is the proxy of realized variance, measured from 5-minute returns on USD-BRL currency quotes, as detailed in section 3.3; and IV_{t−1} corresponds to the daily implied volatility, obtained from the annualized FXVol volatility index.
Equation 21b represents the first two models we will estimate:
I. A GARCH(1,1) model, (GARCH): 𝛿 = 𝜑 = 0.
II. A GJR(1,1) model, (GJR).
By placing constraints on the coefficients of Equation 21c, we can represent the remaining four variance models we will estimate, as described below:
III. An eGARCH(1,1) model (eGARCH): 𝛿 = 𝜑 = 0.
IV. An eGARCH model that includes realized variance as an exogenous variable (eGARCH + RV ): 𝜑 = 0
V. An eGARCH model that includes implied volatility as an exogenous variable (eGARCH + IV ): 𝛿 = 0
VI. An unrestricted model, that considers both, realized variance and implied volatility, simultaneously (eGARCH + RV+ IV).
The general parameter vector 𝜓_{k,t}, for each model k at date t, is (𝜔, 𝛼, 𝛽, 𝛾, 𝛿, 𝜑), and these parameters are estimated by maximum likelihood, using the R software and the rugarch package, which is documented in Ghalanos (2018).
3.5. Forecasting
Following the same criteria as Koopman et al. (2005), the forecasting study is carried out based on a rolling-window procedure. In each case, the volatility models presented in section 3.4 are estimated 400 times, based on 400 different samples of 1,200 daily observations.
Our first sample window starts on January 2nd, 2009 and ends on November 13th, 2013. Using this data, a forecast of volatility is generated for November 14th, 2013. The second sample starts on January 3rd, 2009 and goes up to November 14th, 2013, such that a forecast is generated for November 18th, 2013 (next business day in Brazil), and so on.
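The rolling-window scheme above can be sketched generically. The estimator is abstracted as a callable so that any of the models in section 3.4 can be plugged in; the callable used in the test is a placeholder, not one of the paper's models.

```python
import numpy as np

def rolling_one_step_forecasts(returns, fit_and_forecast, window=1200):
    """Rolling-window scheme: re-estimate on each `window`-day sample and
    produce a one-step-ahead variance forecast for the following day.
    `fit_and_forecast` maps a sample array to the forecast sigma2_{t+1}."""
    r = np.asarray(returns, dtype=float)
    forecasts = []
    for start in range(len(r) - window):
        sample = r[start:start + window]
        forecasts.append(fit_and_forecast(sample))  # forecast for day start+window
    return np.array(forecasts)
```

With the paper's 1,600 trading days and a 1,200-day window, this loop produces exactly the 400 out-of-sample forecasts described in the text.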
The volatility forecast evaluation process is an important matter in determining whether a model k is superior to another. This depends on the choice of both a proper volatility benchmark and a proper comparison criterion, neither of which is an obvious task. In the next subsections we discuss these two issues.
3.6. Volatility Benchmark
Volatility is not directly observable. Consequently, any forecasting error is not observable as well. As we have seen, in several works, the realized variance obtained from intraday returns is often assumed to be an appropriate proxy for the latent volatility process.
In this study, we will consider the realized volatility, defined as the square root of the realized variance detailed in section 3.3 and calculated using squared returns over 5-minute intervals, as our proxy for volatility in model parameter estimation, when applicable, and as our benchmark for model evaluation purposes.
As a robustness exercise, similar to the one performed in ^{Benavides and Capistrán (2012)}, we will run the evaluation process using, as an alternative benchmark, the estimator proposed in ^{Yang and Zhang (2000)} that uses daily range information and is described in section 2.1, Equation 5.
3.7. Comparison Criterion
Several evaluation criteria can be chosen to assess the predictive power or accuracy of a volatility model, and each one is subject to discussion. It is not obvious, as seen in several studies (e.g., ^{Bollerslev et al. (1994)} and ^{Hansen and Lunde (2005b)}), which loss function L would be more appropriate when evaluating volatility models.
Bollerslev et al. (1994) criticize the widespread use of mean squared errors, arguing that it does not properly penalize a method that provides estimates of zero or even negative volatilities, which are clearly misleading and unacceptable. To mitigate this issue, the authors suggest the use of squared percentage errors, absolute percentage errors, or even the loss function implicit in the Gaussian likelihood, known as quasi-likelihood. On the other hand, Koopman et al. (2005) argue that measures like root mean squared percentage error and mean absolute percentage error attribute relatively less weight to forecast errors associated with high values of realized volatility, when compared to root mean squared error or mean absolute error. Hence, both groups of statistics are of interest in their own right. For example, in value-at-risk applications, one may be more interested in forecast accuracy under high-volatility scenarios than under lower-volatility conditions.
Therefore, rather than making a single choice, we will use all the following loss functions in our analysis:
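The families of loss functions discussed above can be sketched as follows. Treat this menu as illustrative: it covers the measures named in the surrounding discussion (squared and absolute errors, their percentage versions, and the Gaussian quasi-likelihood), but the paper's own list of loss functions is the one given in the text.

```python
import numpy as np

def loss_functions(sigma2_hat, sigma2_true):
    """Common volatility loss functions, computed on variance forecasts
    against a realized-variance benchmark."""
    f = np.asarray(sigma2_hat, dtype=float)
    y = np.asarray(sigma2_true, dtype=float)
    err = f - y
    return {
        "MSE":   np.mean(err ** 2),
        "MAE":   np.mean(np.abs(err)),
        "RMSPE": np.sqrt(np.mean((err / y) ** 2)),  # downweights high-vol errors
        "MAPE":  np.mean(np.abs(err / y)),
        # Loss implicit in the Gaussian likelihood (quasi-likelihood)
        "QLIKE": np.mean(np.log(f) + y / f),
    }
```

Note the asymmetry of QLIKE: it penalizes underprediction of variance more heavily than overprediction, which is one reason it is often favored for volatility evaluation.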
Finally, in order to perform pairwise comparisons between the forecasts and identify which model produces the best out-of-sample results, we propose, following Accioly and Mendes (2016), to use the statistic presented in the seminal work of Diebold and Mariano (1995), which aims to compare predictive accuracy, and which we briefly describe below.
Assume we have two competing models for 𝜎̂^{2}_{t}, i and j. The h-step-ahead forecast errors ε for the two models on a given forecast date t are:^{1}
It is assumed that the loss associated with a given model k's forecast is a function g(ε_{k,t}) such that it (1) equals zero when no error occurs; (2) is never negative; and (3) increases with the magnitude of the error. Typically, g(⋅) is the square or the absolute value of the error, and both functions are used in our study.
To determine if a given model predicts better than another, we may test the null hypothesis:
against the alternative hypothesis:
The loss differential between forecasts i and j at time t can be written as g(ε_{i,t+h|t}) − g(ε_{j,t+h|t}). In other words, the null hypothesis of equal predictive accuracy can be represented as:
The Diebold-Mariano test statistic S is then obtained from:

S_{ij} = d̄_{ij} / √(V̂(d̄_{ij}))

where d̄_{ij} = n^{-1} Σ^{n}_{t=t_{0}} d_{i,j,t} is the average loss differential, n stands for the number of loss differential observations and V̂(d̄_{ij}) is a consistent estimate of the variance of d̄_{ij}.
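As an illustration, the statistic can be computed in a few lines of Python. This is a simplified sketch (a hypothetical helper, not the paper's implementation): it uses the plain sample variance of the loss differential and omits the HAC and small-sample corrections of the modified test reported in Tables 4-7.

```python
import math

def diebold_mariano(e_i, e_j, loss=lambda e: e * e):
    """Simplified Diebold-Mariano statistic for equal predictive accuracy.

    e_i, e_j : forecast-error sequences for models i and j.
    loss     : g(.), e.g. squared error (default) or absolute error.
    Note: uses the naive variance of the loss differential, with no
    correction for serial correlation in multi-step forecasts.
    """
    d = [loss(a) - loss(b) for a, b in zip(e_i, e_j)]  # loss differentials
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n       # variance of d
    return d_bar / math.sqrt(var_d / n)                # H0: E[d] = 0
```

A positive statistic means model i's losses are larger than model j's on average, matching the sign convention used in the tables below; the statistic is antisymmetric in (i, j) by construction.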
4. Results and Discussion
4.1. In-Sample Results
Table 1 presents the estimated parameter values, p-values, log-likelihoods (lnL), Akaike information criterion (AIC) and Bayesian information criterion (BIC) for the models defined in section 3.4. The results are based on the full sample period of 6.5 years of data, between January 2nd, 2009 and June 30th, 2015, comprising 1,600 daily return observations.
| | GARCH | GJR | eGARCH | eGARCH+RV | eGARCH+IV | eGARCH+RV+IV |
|---|---|---|---|---|---|---|
| ω | 0.0107 | 0.0114 | -0.0036 | -0.0493 | -0.0485 | -0.0688 |
| | (0.004) | (0.001) | (0.357) | (0.023) | (0.074) | (0.210) |
| α | 0.0997 | 0.1317 | 0.0605 | 0.0682 | 0.0736 | 0.0751 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.001) |
| β | 0.8916 | 0.8992 | 0.9783 | 0.9463 | 0.9486 | 0.9328 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| γ | | -0.0839 | 0.1834 | 0.1527 | 0.1743 | 0.1562 |
| | | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| δ | | | | 0.0379 | | 0.0310 |
| | | | | (0.026) | | (0.177) |
| φ | | | | | 0.0355 | 0.0217 |
| | | | | | (0.084) | (0.428) |
| lnL | -2,044.63 | -2,033.96 | -2,034.81 | -2,029.17 | -2,030.41 | -2,028.31 |
| AIC | 4,095.26 | 4,075.91 | 4,077.62 | 4,068.35 | 4,070.82 | 4,068.61 |
| BIC | 4,111.40 | 4,097.42 | 4,099.13 | 4,095.24 | 4,097.71 | 4,100.88 |

p-values in parentheses.
The complete model specifications are:
Some conclusions can be drawn from these results. Volatility persistence is estimated as significant and close to unity for all models. For example, in the GARCH model we have 𝛼̂ + 𝛽̂ = 0.9913 and in the GJR model 𝛼̂ + 𝛽̂ + 𝛾̂/2 = 0.9890, confirming earlier findings in the literature, such as ^{Koopman et al. (2005)}, ^{Blair et al. (2001)} and ^{Accioly and Mendes (2016)}, among others.
Another interesting observation relates to the non-linearity parameters, 𝛾̂, in the GJR and eGARCH models. These are significant even at the 1% significance level in all cases, and 𝛾̂ is sizable relative to 𝛼̂, suggesting substantial asymmetric effects of returns on variance. Also, as shown in Table 1, two of the proposed expanded models, eGARCH + RV and eGARCH + IV, present coefficients on the exogenous variables, 𝛿̂ and 𝜑̂, that are significant at the 10% significance level, while in the unrestricted model, eGARCH + RV + IV, neither exogenous-variable coefficient is significant at the 10% level.
Looking at the log-likelihood criterion, the unrestricted model presents the best figure, as expected, given that it is a generalization of the other eGARCH models. However, under the other two criteria, AIC and BIC, the eGARCH + RV model holds an advantage over the remaining models, even though the differences are not sizeable.
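The information criteria in Table 1 follow the standard definitions AIC = 2k − 2·lnL and BIC = k·ln(n) − 2·lnL. The quick check below reproduces the GARCH(1,1) figures from its reported log-likelihood, assuming k = 3 variance-equation parameters (ω, α, β) and n = 1,600 observations.

```python
import math

def aic(lnL, k):
    # Akaike information criterion: 2k - 2 lnL
    return 2 * k - 2 * lnL

def bic(lnL, k, n):
    # Bayesian information criterion: k ln(n) - 2 lnL
    return k * math.log(n) - 2 * lnL

# GARCH(1,1) in Table 1: lnL = -2,044.63
print(round(aic(-2044.63, 3), 2))        # → 4095.26, as in Table 1
print(round(bic(-2044.63, 3, 1600), 2))  # ~4,111.4; Table 1 reports 4,111.40
                                         # from the unrounded log-likelihood
```

Both criteria trade fit against parameter count, with BIC penalizing extra parameters more heavily in this sample size, which is why the leaner eGARCH + RV can beat the unrestricted model on BIC despite its lower log-likelihood.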
4.2. Out-of-Sample Analysis
In the out-of-sample analysis, we compare one-day-ahead volatility forecasts, 𝜎̂_{k}, against the benchmarks described in section 3.7, for the period between November 14th, 2013 and June 30th, 2015. Given the lack of previous studies covering both the same asset and the same metrics used in this work, it is difficult to be categorical about the results in absolute terms, or even to compare the loss-function figures against other studies. Therefore, all comparisons and analyses are made in relative terms, among the models built here.
Tables 2 and 3 present the forecast accuracy figures, introduced in section 3.7, using 𝜎̃ and 𝜎_{YZ} as benchmarks, respectively. Each table contains three sections. The first is composed of five columns and presents the calculated loss-function values, as previously described. The following five columns rank each model's performance on each criterion separately, from 1 (best) to 8 (worst). The last two columns consolidate the previously calculated ranks, summing the relative positions in the Sum of Rankings column and ranking those sums in the Final Ranking column. As a robustness check, we also calculated z-scores for each model's loss-function results, given each loss function's sample average and standard deviation, and averaged those z-scores. The ranking resulting from those averages is similar to the one presented in the Final Ranking column in both cases, without any significant change.
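The consolidation procedure just described, ranking the models on each criterion and then summing the ranks, can be sketched as follows. The model names and loss values in the example are placeholders, not the paper's figures, and ties are broken by insertion order here, a simplification relative to the tables, where tied sums share the same final rank.

```python
def final_ranking(scores):
    """scores: {model: {criterion: loss_value}}; lower loss is better.
    Returns (sum_of_ranks, final_rank) per model, mirroring the
    'Sum of Rankings' and 'Final Ranking' columns of Tables 2-3."""
    models = list(scores)
    criteria = next(iter(scores.values())).keys()
    sums = dict.fromkeys(models, 0)
    for c in criteria:
        ordered = sorted(models, key=lambda m: scores[m][c])
        for rank, m in enumerate(ordered, start=1):
            sums[m] += rank  # accumulate this criterion's rank
    final = {m: r for r, m in enumerate(sorted(models, key=sums.get), start=1)}
    return sums, final

# Hypothetical losses for three models under two criteria
sums, final = final_ranking({
    "A": {"RMSE": 0.3, "MAE": 0.2},
    "B": {"RMSE": 0.1, "MAE": 0.4},
    "C": {"RMSE": 0.2, "MAE": 0.1},
})
print(sums)   # → {'A': 5, 'B': 4, 'C': 3}
print(final)  # → {'C': 1, 'B': 2, 'A': 3}
```

Summing ranks rather than raw losses keeps criteria on different scales (e.g. RMSE versus MAPE) from dominating the aggregate, which is the point of the z-score robustness check mentioned above.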
| Models | RMSE | MAE | RMSPE | MAPE | MdAPE | Rank: RMSE | Rank: MAE | Rank: RMSPE | Rank: MAPE | Rank: MdAPE | Sum of Rankings | Final Ranking |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SD | 0.0196 | 0.0016 | 0.7369 | 0.5288 | 0.4110 | 8 | 8 | 2 | 2 | 5 | 25 | 5 |
| EWMA | 0.0188 | 0.0109 | 0.7171 | 0.5214 | 0.4028 | 4 | 5 | 1 | 1 | 3 | 14 | 2 |
| GARCH | 0.0188 | 0.0109 | 0.9335 | 0.6325 | 0.4278 | 7 | 4 | 6 | 6 | 6 | 29 | 6 |
| GJR | 0.0188 | 0.0110 | 0.9554 | 0.6461 | 0.4301 | 6 | 6 | 7 | 7 | 7 | 33 | 7 |
| eGARCH | 0.0188 | 0.0112 | 0.9821 | 0.6675 | 0.4336 | 5 | 7 | 8 | 8 | 8 | 36 | 8 |
| eGARCH+RV | 0.0185 | 0.0107 | 0.9275 | 0.6275 | 0.4100 | 3 | 3 | 5 | 5 | 4 | 20 | 4 |
| eGARCH+IV | 0.0185 | 0.0105 | 0.8714 | 0.5919 | 0.3901 | 2 | 2 | 4 | 4 | 2 | 14 | 2 |
| eGARCH+RV+IV | 0.0184 | 0.0103 | 0.8548 | 0.5795 | 0.3871 | 1 | 1 | 3 | 3 | 1 | 9 | 1 |

Rank columns order the models from 1 (best) to 8 (worst) on each criterion.
For a given model k, RMSE is defined as
| Models | RMSE | MAE | RMSPE | MAPE | MdAPE | Rank: RMSE | Rank: MAE | Rank: RMSPE | Rank: MAPE | Rank: MdAPE | Sum of Rankings | Final Ranking |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SD | 0.0166 | 0.0110 | 1.1490 | 0.6654 | 0.4049 | 8 | 8 | 8 | 4 | 2 | 30 | 7 |
| EWMA | 0.0159 | 0.0106 | 1.1147 | 0.6566 | 0.3859 | 7 | 7 | 7 | 2 | 1 | 24 | 4 |
| GARCH | 0.0145 | 0.0101 | 1.0063 | 0.7016 | 0.4604 | 6 | 4 | 3 | 6 | 6 | 25 | 5 |
| GJR | 0.0144 | 0.0102 | 1.0314 | 0.7138 | 0.4819 | 5 | 5 | 5 | 7 | 7 | 29 | 6 |
| eGARCH | 0.0143 | 0.0103 | 1.0387 | 0.7270 | 0.4869 | 4 | 6 | 6 | 8 | 8 | 32 | 8 |
| eGARCH+RV | 0.0142 | 0.0101 | 1.0102 | 0.6987 | 0.4591 | 3 | 3 | 4 | 5 | 5 | 20 | 3 |
| eGARCH+IV | 0.0140 | 0.0097 | 0.9552 | 0.6579 | 0.4394 | 1 | 2 | 1 | 3 | 4 | 11 | 2 |
| eGARCH+RV+IV | 0.0140 | 0.0097 | 0.9562 | 0.6540 | 0.4296 | 2 | 1 | 2 | 1 | 3 | 9 | 1 |

Rank columns order the models from 1 (best) to 8 (worst) on each criterion.
For a given model k, RMSE is defined as
We can observe in Table 2 that, in terms of RMSPE and MAPE, the EWMA and historical SD models present the best performance when compared to the GARCH models for this sample, with eGARCH + RV + IV being the third top performer. However, when assessing performance through all other criteria, the unrestricted model, eGARCH + RV + IV, and eGARCH + RV outperform all other models. In fact, based on the overall ranking shown in the last column, eGARCH + RV + IV is the model that presents the most consistent performance for the sample period analyzed, followed by the eGARCH + IV and EWMA models.
When using the realized range, 𝜎_{YZ}, as benchmark, a slightly different scenario is observed. Once again, our top performer is model eGARCH + RV + IV; however, it is now followed by models eGARCH + IV and eGARCH + RV, leaving EWMA behind, as shown in Table 3. In fact, EWMA's relative performance in this case is much more irregular: it was ranked 7th in three out of five criteria, yet is among the top performers in the remaining two, MAPE and MdAPE. Irrespective of the benchmark, one conclusion arises: the inclusion of both exogenous variables improved the performance of all GARCH family models. In fact, as can be observed from both tables, the three standard GARCH models perform, on average, not much better than the simple historical SD measure.
Those results are completely in line with the results of the Diebold and Mariano tests shown in Tables 4 and 5. When we consider the squared error as loss function (Table 4), the SD model's forecasts are significantly different from those of all other models at the 10% significance level, and it is outperformed by all of them. At the other extreme, model eGARCH + RV + IV beats all other models at the same significance level, except for EWMA. Regarding the EWMA model, it is worth noticing that, except for the SD comparison, it is not possible to reject the null hypothesis that its errors are, on average, equal to those of the remaining models, which might undermine its apparent superiority shown in Table 2.
| i \ j | SD | EWMA | GARCH | GJR | eGARCH | eGARCH+RV | eGARCH+IV | eGARCH+RV+IV |
|---|---|---|---|---|---|---|---|---|
| SD | | 3.0828 | 2.0266 | 2.0996 | 2.0684 | 2.8190 | 2.9151 | 3.3131 |
| | | (0.002) | (0.043) | (0.036) | (0.039) | (0.005) | (0.004) | (0.001) |
| EWMA | -3.0828 | | -0.1437 | -0.0645 | -0.0160 | 0.6489 | 0.7166 | 1.0345 |
| | (0.002) | | (0.886) | (0.949) | (0.987) | (0.517) | (0.474) | (0.302) |
| GARCH | -2.0266 | 0.1437 | | 0.3384 | 0.3989 | 2.8713 | 2.0918 | 3.2261 |
| | (0.043) | (0.886) | | (0.735) | (0.690) | (0.004) | (0.037) | (0.001) |
| GJR | -2.0996 | 0.0645 | -0.3384 | | 0.2822 | 4.1574 | 2.2446 | 3.5315 |
| | (0.036) | (0.949) | (0.735) | | (0.778) | (0.000) | (0.025) | (0.000) |
| eGARCH | -2.0684 | 0.0160 | -0.3989 | -0.2822 | | 4.2519 | 2.8433 | 4.2281 |
| | (0.039) | (0.987) | (0.690) | (0.778) | | (0.000) | (0.005) | (0.000) |
| eGARCH+RV | -2.8190 | -0.6489 | -2.8713 | -4.1574 | -4.2519 | | 0.5827 | 2.1951 |
| | (0.005) | (0.517) | (0.004) | (0.000) | (0.000) | | (0.560) | (0.029) |
| eGARCH+IV | -2.9151 | -0.7166 | -2.0918 | -2.2446 | -2.8433 | -0.5827 | | 1.8963 |
| | (0.004) | (0.474) | (0.037) | (0.025) | (0.005) | (0.560) | | (0.059) |
| eGARCH+RV+IV | -3.3131 | -1.0345 | -3.2261 | -3.5315 | -4.2281 | -2.1951 | -1.8963 | |
| | (0.001) | (0.302) | (0.001) | (0.000) | (0.000) | (0.029) | (0.059) | |
Diebold and Mariano test statistic S_{ij} as described in Equation 26. Positive values of S_{ij} indicate that g(ε_{i}) > g(ε_{j}). In parentheses, we report the p-values for a two-tailed Student's t-test of the null hypothesis H_{0}: E[g(ε_{i})] = E[g(ε_{j})]. Numbers in bold indicate rejection of the null hypothesis at the 10% significance level.
| i \ j | SD | EWMA | GARCH | GJR | eGARCH | eGARCH+RV | eGARCH+IV | eGARCH+RV+IV |
|---|---|---|---|---|---|---|---|---|
| SD | | 3.1916 | 1.8440 | 1.4913 | 0.9619 | 2.2717 | 2.7842 | 3.3404 |
| | | (0.002) | (0.066) | (0.137) | (0.337) | (0.024) | (0.006) | (0.001) |
| EWMA | -3.1916 | | 0.1899 | -0.1964 | -0.6768 | 0.6798 | 1.2348 | 1.7962 |
| | (0.002) | | (0.850) | (0.844) | (0.499) | (0.497) | (0.218) | (0.073) |
| GARCH | -1.8440 | -0.1899 | | -1.4709 | -2.8449 | 1.8688 | 3.3933 | 5.2654 |
| | (0.066) | (0.850) | | (0.142) | (0.005) | (0.062) | (0.001) | (0.000) |
| GJR | -1.4913 | 0.1964 | 1.4709 | | -2.9483 | 4.8847 | 6.0395 | 7.6759 |
| | (0.137) | (0.844) | (0.142) | | (0.003) | (0.000) | (0.000) | (0.000) |
| eGARCH | -0.9619 | 0.6768 | 2.8449 | 2.9483 | | 7.3480 | 9.0032 | 9.0786 |
| | (0.337) | (0.499) | (0.005) | (0.003) | | (0.000) | (0.000) | (0.000) |
| eGARCH+RV | -2.2717 | -0.6798 | -1.8688 | -4.8847 | -7.3480 | | 3.3721 | 8.0125 |
| | (0.024) | (0.497) | (0.062) | (0.000) | (0.000) | | (0.001) | (0.000) |
| eGARCH+IV | -2.7842 | -1.2348 | -3.3933 | -6.0395 | -9.0032 | -3.3721 | | 4.1862 |
| | (0.006) | (0.218) | (0.001) | (0.000) | (0.000) | (0.001) | | (0.000) |
| eGARCH+RV+IV | -3.3404 | -1.7962 | -5.2654 | -7.6759 | -9.0786 | -8.0125 | -4.1862 | |
| | (0.001) | (0.073) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | |
Modified Diebold and Mariano test statistic S_{ij} as described in Equation 26. Positive values of S_{ij} indicate that g(ε_{i}) > g(ε_{j}). In parentheses, we report the p-values for a two-tailed Student's t-test of the null hypothesis H_{0}: E[g(ε_{i})] = E[g(ε_{j})]. Numbers in bold indicate rejection of the null hypothesis at the 10% significance level.
When we analyse the results based on the absolute-error loss function, shown in Table 5, the conclusions are reinforced. Notice that when comparing SD against GJR and eGARCH, we can no longer reject the null hypothesis that those models' forecasts are as accurate as the historical standard deviation forecasts. Another change from the previous results is that model eGARCH + RV + IV is now significantly superior even to the EWMA model at the 10% significance level.
When we look at the Diebold and Mariano test results using 𝜎_{YZ} and 𝜀_{k}^{2} as loss function, presented in Table 6, we see a clear superiority of models eGARCH + RV + IV and eGARCH + IV, which outperform all other models. On the other hand, model SD is once again outperformed, and we also observe the poor performance of the EWMA model in this scenario. The same conclusions are reinforced when we use |𝜀_{k}| as loss function, as shown in Table 7.
| i \ j | SD | EWMA | GARCH | GJR | eGARCH | eGARCH+RV | eGARCH+IV | eGARCH+RV+IV |
|---|---|---|---|---|---|---|---|---|
| SD | | 2.8644 | 4.2542 | 4.3423 | 4.1941 | 4.7584 | 4.8284 | 5.0789 |
| | | (0.004) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| EWMA | -2.8644 | | 3.1936 | 3.5394 | 3.4625 | 4.0242 | 4.0134 | 4.2580 |
| | (0.004) | | (0.002) | (0.000) | (0.001) | (0.000) | (0.000) | (0.000) |
| GARCH | -4.2542 | -3.1936 | | 0.5684 | 0.8222 | 2.0203 | 3.3261 | 3.5494 |
| | (0.000) | (0.002) | | (0.570) | (0.411) | (0.044) | (0.001) | (0.000) |
| GJR | -4.3423 | -3.5394 | -0.5684 | | 0.8596 | 2.7043 | 3.7586 | 3.6856 |
| | (0.000) | (0.000) | (0.570) | | (0.391) | (0.007) | (0.000) | (0.000) |
| eGARCH | -4.1941 | -3.4625 | -0.8222 | -0.8596 | | 1.5582 | 3.6355 | 2.8593 |
| | (0.000) | (0.001) | (0.411) | (0.391) | | (0.120) | (0.000) | (0.004) |
| eGARCH+RV | -4.7584 | -4.0242 | -2.0203 | -2.7043 | -1.5582 | | 2.4431 | 3.3110 |
| | (0.000) | (0.000) | (0.044) | (0.007) | (0.120) | | (0.015) | (0.001) |
| eGARCH+IV | -4.8284 | -4.0134 | -3.3261 | -3.7586 | -3.6355 | -2.4431 | | -0.3195 |
| | (0.000) | (0.000) | (0.001) | (0.000) | (0.000) | (0.015) | | (0.750) |
| eGARCH+RV+IV | -5.0726 | -4.2580 | -3.5494 | -3.6856 | -2.8593 | -3.3110 | 0.3195 | |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.004) | (0.001) | (0.750) | |
Modified Diebold and Mariano test statistic S_{ij} as described in Equation 26. Positive values of S_{ij} indicate that g(ε_{i}) > g(ε_{j}). In parentheses, we report the p-values for a two-tailed Student's t-test of the null hypothesis H_{0}: E[g(ε_{i})] = E[g(ε_{j})]. Numbers in bold indicate rejection of the null hypothesis at the 10% significance level.
| i \ j | SD | EWMA | GARCH | GJR | eGARCH | eGARCH+RV | eGARCH+IV | eGARCH+RV+IV |
|---|---|---|---|---|---|---|---|---|
| SD | | 2.0254 | 2.3394 | 2.0378 | 1.6774 | 2.4855 | 3.2683 | 3.4941 |
| | | (0.043) | (0.020) | (0.042) | (0.094) | (0.013) | (0.001) | (0.001) |
| EWMA | -2.0254 | | 1.4112 | 1.1529 | 0.8294 | 1.6197 | 2.4636 | 2.6473 |
| | (0.043) | | (0.159) | (0.250) | (0.407) | (0.106) | (0.014) | (0.008) |
| GARCH | -2.3394 | -1.4112 | | -0.8951 | -1.5395 | 0.8874 | 3.7987 | 4.0510 |
| | (0.020) | (0.159) | | (0.371) | (0.124) | (0.375) | (0.000) | (0.000) |
| GJR | -2.0378 | -1.1529 | 0.8951 | | -1.4989 | 2.7156 | 6.3132 | 5.9250 |
| | (0.042) | (0.250) | (0.371) | | (0.135) | (0.007) | (0.000) | (0.000) |
| eGARCH | -1.6774 | -0.8294 | 1.5395 | 1.4989 | | 3.7673 | 7.9018 | 6.2532 |
| | (0.094) | (0.407) | (0.124) | (0.135) | | (0.000) | (0.000) | (0.000) |
| eGARCH+RV | -2.4855 | -1.6197 | -0.8874 | -2.7156 | -3.7673 | | 5.7867 | 7.5518 |
| | (0.013) | (0.106) | (0.375) | (0.007) | (0.000) | | (0.000) | (0.000) |
| eGARCH+IV | -3.2683 | -2.4636 | -3.7987 | -6.3132 | -7.9018 | -5.7867 | | 0.3720 |
| | (0.001) | (0.014) | (0.000) | (0.000) | (0.000) | (0.000) | | (0.710) |
| eGARCH+RV+IV | -3.4941 | -2.6473 | -4.0510 | -5.9250 | -6.2532 | -7.5518 | -0.3720 | |
| | (0.001) | (0.008) | (0.000) | (0.000) | (0.000) | (0.000) | (0.710) | |
Modified Diebold and Mariano test statistic S_{ij} as described in Equation 26. Positive values of S_{ij} indicate that g(ε_{i}) > g(ε_{j}). In parentheses, we report the p-values for a two-tailed Student's t-test of the null hypothesis H_{0}: E[g(ε_{i})] = E[g(ε_{j})]. Numbers in bold indicate rejection of the null hypothesis at the 10% significance level.
The main takeaway from all these results, irrespective of error measure, loss function or specification of realized volatility, is that the models with the exogenous variables outperformed, with statistical significance, all other models analyzed in this study, including the standard GARCH family models. This leads to the conclusion that realized volatility and implied volatility potentially contain relevant information when it comes to USD-BRL currency volatility forecasts. For robustness purposes, we ran the same analysis for forecasts up to 21 days ahead (results available from the authors upon request). The results are in line with the ones presented here, such that the conclusion above remains true (and, in fact, even stronger).
5. Conclusion
This study investigated the inclusion of exogenous variables in GARCH-type models for USD-BRL currency volatility forecasts. More specifically, we included two variables in the standard eGARCH model specification: the USD-BRL currency implied volatility, IV, represented here by the FXVol index calculated by BM&FBovespa; and the realized volatility, 𝜎̃, obtained from high frequency data. We then assessed the performance of the forecasts from those extended models against those produced by the unmodified models, as well as standard models such as the historical standard deviation and the EWMA model.
The evidence found in our study supports the conclusion that both implied and realized volatilities provide useful information when modelling volatility for the local USD-BRL currency market. The extended models outperformed the standard GARCH models as well as the historical standard deviation and the EWMA model. This result is robust to different error measures, different loss functions and different measures of realized volatility as our benchmark. It is interesting to notice that implied volatility is an expectation of future volatility, while realized volatility is the volatility observed in the past: both the (expected) future and the past seem to be important!
One limitation of this work relates to the studied models' bias. This leads to our first suggestion for future research: to investigate the impact of implied volatility and high frequency intraday data on different volatility forecasting model specifications, such as regime switching ARCH (SWARCH) or stochastic volatility models. Another topic for future development relates to the evaluation of model performance. One could use different frameworks to assess performance, such as the superior predictive ability (SPA) test, as used in ^{Hansen and Lunde (2005b)} and ^{Koopman et al. (2005)}.
Another suggestion for future research relates to the benchmarks used. One could investigate different aspects of realized variance as the proper measure for the latent USD-BRL currency volatility process, such as different intraday quote sampling frequencies and microstructure effects, or overnight jumps and the best approach to integrate them, as well as alternative sources for USD-BRL currency data, such as the BM&FBovespa DOL futures. Regarding the use of the FXVol index as a benchmark for implied volatility, one promising research possibility would be to investigate other methodologies to incorporate the observed volatility smile. This is a subject with a wide range of possible approaches, to which both academics and practitioners certainly have a lot to contribute. Hopefully, the conclusions and ideas from this work will contribute to the development of new studies, helping us achieve a better understanding of how our local market volatility works, fostering the local development of new products and helping policymakers and, ultimately, society as a whole.