The role of transition regime models for corn prices forecasting

Given the relevance of corn for the food and fuel industries, analysts and scholars constantly compare the forecasting accuracy of econometric models. These exercises test not only new approaches and methods, but also the addition of fundamental variables linked to the corn market. This paper compares the accuracy of models commonly used in the financial and macro-econometric literature for the period between 1995 and 2017. The main contribution lies in the use of transition regime models, which accommodate structural breaks and perform better for corn price forecasting. The results indicate that the best models are those which consider not only the corn market structure, or macroeconomic and financial fundamentals, but also the non-linear trend and transition regimes, such as threshold autoregressive models.


Introduction
There is a growing need for a broader understanding of commodity price determinants and dynamics. Primary agricultural products such as corn play a dual role in this discussion. They are strongly affected not only by feed and food issues but also by the strong growth of the global population, the need for sustainable food production practices, and their use in fuel complexes, mainly due to the rise in renewable energy use (Gurge, 2011).
According to the Food and Agriculture Organization of the United Nations (2018) database, the world population is projected to reach almost 10 billion by 2050. Additionally, persistent poverty, unemployment, and inequality tend to restrain access to food and obstruct food security and nutrition goals. Reducing poverty and inequality alleviates undernourishment, sparking demand for commodities and consequently affecting their price dynamics.
Fuel commodities, such as corn, are sensitive to shocks in the energy sector. As reported by the economic research department of the United States Department of Agriculture (2019), the share of the corn crop channeled to ethanol production went from 32% to 45% in 2018. This fact is more relevant if we consider that in 2005 only 10% of corn stocks were used for this purpose and that in 2012, given the upward trend in corn prices, the share reached 62%.
As this commodity is in high demand worldwide, it represents an important asset to countries that depend on the primary sector. Thus, many authors explore the role of corn exports in the trade balance. For example, Alvim & Waquil (2005) analyzed how trade agreements affected corn and other Brazilian grains in foreign trade, Mariscal & Powell (2014) investigated how booms and breaks impact the long-run trade balance trends of developing countries, and Winkelried (2018) analyzed the Prebisch-Singer hypothesis of a secular decline in commodity prices with time-series econometrics.
The precision of commodity price forecasts is critical to the decision-making of policymakers in their interventions in world trade, economic growth, and sectoral and sustainability policies. Hence, the recent literature is concerned with the variables and methods that yield the most accurate forecasting models.
Studies have tried to improve forecasting performance by modeling price dynamics under different assumptions. For example, Ahumada & Cornejo (2016) explored the interdependence of corn, soy, and wheat using a Vector Error Correction Model (VECM), and Zhang et al. (2018) applied a quantile regression and a neural network to the soy market in China. Winkelried (2016) decomposed food prices with an L1 filter. Favro et al. (2015) investigated correlation and causality between corn prices, soybean prices, and poultry farming.
Regarding the relationship between commodities and financial variables, Cuaresma et al. (2018) highlight the importance of studying the impact of primary sector price index volatility and stock market volatility by focusing on the Arabica coffee case.
In the corn market, the literature follows two approaches. The first is concerned with volatility forecasting. Benavides (2009) assesses it by comparing implied volatility with time series models. McPhail et al. (2012) and Serra & Gil (2013) take a more structural approach, shedding light on the role of global demand, speculation, energy needs, and macroeconomic conditions. The second approach focuses on accessible but precise price forecasting models for practitioners such as traders, regulators, and producers. Bastianin et al. (2014) point out the economic relationship and time precedence between ethanol and corn. Hoffman et al. (2015) check the quality of the World Agricultural Supply and Demand Estimates (WASDE) of the United States Department of Agriculture. Jadhav et al. (2017) used ARIMA models in corn price forecasting. Xu (2018) analyzes the cointegration among corn prices in 182 spot markets. Osathanunkul et al. (2018) study causality between oil and food commodities with a Bayesian Vector Autoregressive model (BVAR) and a Markov Switching BVAR (MS-BVAR).
In this sense, this paper studies the fit and performance of several methods and models for corn price forecasting. It contributes to the recent literature by accommodating non-linearity and structural breaks in time series using regime shift models. Our findings may help financial analysts, policymakers, and traders, since they point to econometric treatments that improve corn price forecasting accuracy.
Beyond this introduction, the article is composed of four additional sections. The second presents the database and method. The third describes the data treatment. The fourth reports the results. The last presents a few conclusions.

Methodology and Data
To achieve our objectives, a mix of variables and models was used for corn price forecasting. We then compared their forecast accuracy using the Root Mean Square Error (RMSE) and the Diebold-Mariano pairwise test. Adding micro and macroeconomic variables under different assumptions to improve the performance of corn price forecasting required an evaluation of the time-series dynamics.
The time series used are corn production, corn stocks, inflation, corn prices, and interest rates. We used monthly observations from 1995 to 2017 (more details in Table 1).
The data treatment follows Enders (2008), and it begins with the identification and removal of seasonality using X-13 ARIMA. Then, we tested for unit roots and structural breaks with the ADF and Zivot-Andrews tests, respectively. Additionally, we used the Box-Jenkins procedure to correct for autocorrelation. According to Table 2, the following 13 econometric models were used: random walk, AR(p), MA(q), ARIMA(p,d,q), VARs, SETAR(p), STAR, and LSTAR. All models except the Self-Exciting Threshold Autoregressive (SETAR) model were specified with a constant, following Hotelling (1931), to accommodate effects on the level of these models. Specifications of the estimated models are presented in Table 2.
The VAR A model follows Reichsfeld & Roache (2011) and Xu (2018) in acknowledging the price discovery function of futures contracts over spot prices. VAR B is specified considering the corn market structure, as highlighted in Gallagher (1986).

VAR, STAR and empirical strategy
Following the Vector Autoregressive (VAR) models introduced in Sims (1980), the problems of identification and autocorrelation of innovations in multivariate models can be overcome when endogenous effects are considered. Algebraically, the model can be specified as in Equation 1.
Simplifying Equation 2 results in the estimated VAR of Equation 3:

$$X_t = A_0 + \sum_{i=1}^{p} A_i X_{t-i} + \epsilon_t \tag{3}$$

where $X_t$ represents the matrix of variables, $A_0$ a matrix of constants, $A_i$ the matrices of estimated parameters, and $\epsilon_t$ the error term of this model. Therefore, the interdependence of the variables in the VAR accommodates endogenous effects and autoregressive effects. However, the model is an incomplete approach in the presence of non-linearities in the time series phenomenon to be studied. In that regard, the STAR model extends the VAR approach by introducing a state-space transition among regimes. The model was first proposed in Chan & Tong (1986).
Given the dependence of corn prices on global real economic activity, and the possibility of structural breaks, we tested transition regime models. Following Albuquerquemello et al. (2018), who found that the SETAR model performed better than linear models for oil price forecasting, we tested it for corn prices.
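For reference, a two-regime SETAR model and its smooth-transition (LSTAR) counterpart are commonly written as follows. The notation here ($\phi_{j,i}$ for regime coefficients, $d$ for the delay, $c$ for the threshold, $\gamma$ for the smoothness parameter) is an assumption for illustration and is not taken from the paper's own equations.

```latex
% Two-regime SETAR(p) with delay d and threshold c
y_t =
\begin{cases}
\phi_{1,0} + \sum_{i=1}^{p} \phi_{1,i}\, y_{t-i} + \epsilon_t, & y_{t-d} \le c, \\
\phi_{2,0} + \sum_{i=1}^{p} \phi_{2,i}\, y_{t-i} + \epsilon_t, & y_{t-d} > c.
\end{cases}

% LSTAR: the indicator is replaced by a logistic transition function G
y_t = \Big(\phi_{1,0} + \sum_{i=1}^{p} \phi_{1,i}\, y_{t-i}\Big)\big(1 - G(y_{t-d})\big)
    + \Big(\phi_{2,0} + \sum_{i=1}^{p} \phi_{2,i}\, y_{t-i}\Big) G(y_{t-d}) + \epsilon_t,
\qquad
G(y_{t-d}) = \frac{1}{1 + \exp\{-\gamma\,(y_{t-d} - c)\}}, \quad \gamma > 0.
```

As $\gamma \to \infty$ the logistic function approaches a step at $c$, so the LSTAR specification collapses to the SETAR one.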
For the identification and estimation of VAR models in this paper, we follow the steps in Enders (2008) and Lütkepohl (2005): a) unit root tests to check the order of integration of each variable; b) lag order identification relying on information criteria; c) cointegration identification using Johansen (1988); d) evaluation of autocorrelation, heteroskedasticity, and stability of the model; e) in-sample and out-of-sample forecasting; f) evaluation of forecasting performance using accuracy measures.
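Step b) can be illustrated with a minimal sketch of lag selection by AIC. For brevity it uses a univariate AR(p) fitted by OLS rather than a full VAR, and it runs on simulated data; the function names (`solve`, `ar_aic`, `select_lag`) and the simulated AR(2) process are assumptions for illustration, not the paper's implementation.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def ar_aic(y, p, max_p):
    """Fit AR(p) with a constant by OLS and return its AIC.

    Every candidate order uses the same estimation sample (from max_p onward),
    so the criteria are comparable across orders.
    """
    rows = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(max_p, len(y))]
    target = y[max_p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    ssr = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, v in zip(rows, target))
    n = len(target)
    return n * math.log(ssr / n) + 2 * k


def select_lag(y, max_p):
    """Return the order in 1..max_p with the smallest AIC, plus all AIC values."""
    aics = {p: ar_aic(y, p, max_p) for p in range(1, max_p + 1)}
    return min(aics, key=aics.get), aics


# Simulated AR(2) series (illustrative data, not the corn price series).
random.seed(0)
y = [0.0, 0.0]
for _ in range(400):
    y.append(0.5 * y[-1] + 0.3 * y[-2] + random.gauss(0.0, 1.0))

best_p, aics = select_lag(y, max_p=6)
print(best_p, {p: round(a, 2) for p, a in aics.items()})
```

The same logic carries over to a VAR, where the residual variance is replaced by the log determinant of the residual covariance matrix.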

Forecasting Evaluation
Accuracy evaluation proceeds in two steps: i) comparing the Root Mean Square Error of in-sample predictions; and ii) applying the statistic proposed in Diebold & Mariano (2002) to out-of-sample forecasts.
The Diebold-Mariano test makes inference on the difference in precision between two models. The null hypothesis is that the two forecasts have the same accuracy, whereas the alternative hypothesis is that the two forecasts have different levels of precision.
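The RMSE and the Diebold-Mariano statistic can be specified as in Equations 4 and 5:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} e_{t+h|t}^{2}} \tag{4}$$

$$DM = \frac{\bar{d}}{\sqrt{\hat{\omega}_d / T}}, \qquad \bar{d} = \frac{1}{T}\sum_{t=1}^{T}\left[L\left(e^{A}_{t+h|t}\right) - L\left(e^{B}_{t+h|t}\right)\right] \tag{5}$$

where $e^{A}_{t+h|t}$ and $e^{B}_{t+h|t}$ are the h-step-ahead forecasting errors of models A and B, respectively, $L(\cdot)$ is the loss function, and $\hat{\omega}_d$ is the variance-covariance estimate of $d$, corrected for serial autocorrelation.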
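A minimal sketch of both accuracy measures is given below. It assumes squared-error loss and a rectangular window of h - 1 autocovariances for the long-run variance, which are common choices but are not confirmed by the text. The function names and the small data set are illustrative assumptions.

```python
import math


def rmse(actual, forecast):
    """Root mean square error of a forecast series."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def diebold_mariano(actual, forecast_a, forecast_b, h=1):
    """Diebold-Mariano statistic under squared-error loss (an assumption).

    The long-run variance of the loss differential sums autocovariances up to
    lag h - 1. Returns the statistic and a two-sided p-value from the
    standard normal distribution.
    """
    d = [(y - fa) ** 2 - (y - fb) ** 2
         for y, fa, fb in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Made-up numbers for illustration only (not results from the paper).
actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
forecast_a = [a + e for a, e in zip(actual, [0.5, -0.6, 0.7, -0.5, 0.6, -0.7])]
forecast_b = [a + e for a, e in zip(actual, [0.1, -0.1, 0.2, -0.1, 0.1, -0.2])]

stat, p_value = diebold_mariano(actual, forecast_a, forecast_b)
print(rmse(actual, forecast_a), rmse(actual, forecast_b), stat, p_value)
```

A positive statistic indicates that model A has the larger average loss. For h > 1 the rectangular-window variance estimate can turn negative in small samples, in which case a kernel-weighted estimator is a safer choice.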

Time Series Analysis and Variables Treatment
This section presents a preliminary analysis of the variables used and the statistical treatment required for correct identification. Subsection 3.1 explores the features of the variables and the period studied, along with stylized facts. Subsection 3.2 presents the procedures and interpretation of the variables treatment, using statistics to evaluate unit roots, structural breaks, and cointegration.

Time Series Analysis
The time series of corn prices runs from January 1995 to December 2017. Figure 1 shows the trends and behavior of the series. The period between 2000 and 2008 is marked by an accelerated uptrend in prices, which is probably explained by rising per capita income in India and China, which stimulated higher domestic demand for this grain, and by the ethanol market (production) in the U.S.
Nevertheless, Baffes & Haniotis (2010) point out that the growth in consumption of this commodity by India and China is lower than global demand growth. Bobenrieth et al. (2004) and Runge & Senauer (2007) connect this price spiral to a growing demand for corn ethanol in the USA. The latter is not a consensus in the literature, as Gilbert (2010) argues that food price rises are more related to index-based investment in agricultural futures markets than to the above-mentioned factors. The downward shift in corn prices between 2008 and 2011 is largely explained by the subprime crisis. The aftermath is marked by a price reaction due to the low level of stocks. Roberts & Schlenker (2009) and Roberts & Schlenker (2013) shed light on the substitutability of corn in the global market. The market structure of corn suggests that when stocks are extremely low, prices become highly sensitive to marginal shocks.
According to the United States Department of Agriculture (2015), corn is the most produced grain in the world. Production is concentrated in three big players, the USA, China, and Brazil, which represent 65% of global production. Thus, the drop in prices can be explained by rising stock levels in China. Figure 2 presents the trajectory of the main determinants of corn spot prices, as pointed out by the literature review. According to Figure 2, all variables present a structural break due to the global financial crisis in 2008. Additionally, a shift can be observed in the term structure of interest rates in the USA, consistent with Krishnamurthy & Vissing-Jorgensen (2011). Graphically, we can identify two main features to be treated: seasonality and trend.

Variables Treatment
Seasonality, if not correctly addressed, can be detrimental to forecasting. Hence, we use X-13 ARIMA, where the null hypothesis of the QS test is the absence of seasonal effects. Table 3 presents these results. The null hypothesis of no seasonality is rejected for ethanol production and stocks, and for inflation (CPI). These variables are important in this study due to their cross-dependence with the oil market.
Augmented Dickey-Fuller (ADF) test results are reported in Table 4. Only the short- and long-term interest rates are stationary in levels; the other series present a stochastic trend, and stationarity is obtained by differencing them. Next, the Zivot-Andrews test is applied to check for structural breaks in the time series. The results are presented in Table 5. Following Zivot & Andrews (2002), the break is located and estimated endogenously. The null hypothesis is a unit root process with a potential break; the alternative hypothesis is a trend-stationary process with a single potential break. In Table 5, the Zivot-Andrews test indicates, for most of the variables, a structural break in the aftermath of the subprime crisis in 2008. Only inflation, the ethanol spot price, and ethanol production seem to be decoupled from this event. This is expected, as inflation is a first-difference series. Also, the ethanol market was subject to numerous influences from domestic demand and from demand in China and India.
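The logic of testing in levels and then in first differences can be sketched as follows. This is a plain Dickey-Fuller regression with a constant and no augmentation lags, a simplification of the ADF test used in the paper, applied to a simulated random walk. The function names are illustrative assumptions, and the critical value is the usual asymptotic 5% value for the constant-only case.

```python
import math
import random


def difference(series):
    """First difference: x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def dickey_fuller_t(series):
    """t-statistic on rho in dy_t = alpha + rho * y_{t-1} + e_t (no lagged differences)."""
    dy = difference(series)
    lag = series[:-1]
    n = len(dy)
    mean_x = sum(lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lag)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(lag, dy))
    rho = sxy / sxx
    alpha = mean_y - rho * mean_x
    ssr = sum((v - alpha - rho * x) ** 2 for x, v in zip(lag, dy))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se_rho


CRITICAL_5PCT = -2.86  # asymptotic Dickey-Fuller value, regression with a constant

# Simulated random walk (illustrative data, not one of the paper's series).
random.seed(1)
walk = [0.0]
for _ in range(300):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

t_level = dickey_fuller_t(walk)
t_diff = dickey_fuller_t(difference(walk))
print(f"level: {t_level:.2f}  first difference: {t_diff:.2f}  (5% critical {CRITICAL_5PCT})")
```

A statistic below the critical value rejects the unit root. A series that fails to reject in levels but rejects after differencing is treated as integrated of order one.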
The SETAR model is used to accommodate structural breaks and non-linearities in the variables used. It follows the tradition of regime shift models, in which a threshold separates regimes with different parameter values. Once the threshold is identified, the lag order of the model is selected by the Akaike information criterion (AIC) (see Table 6).
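Threshold identification is commonly done by a grid search that minimizes the pooled sum of squared residuals. The sketch below does this for a two-regime SETAR with one lag and delay one on simulated data. The 15% trimming, the function names, and the data-generating process are assumptions for illustration and do not reproduce the paper's estimation.

```python
import random


def ols_line(x, y):
    """Simple regression y = a + b x; returns (a, b, sum of squared residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((v - mx) * (w - my) for v, w in zip(x, y)) / sxx
    a = my - b * mx
    ssr = sum((w - a - b * v) ** 2 for v, w in zip(x, y))
    return a, b, ssr


def fit_setar(series, trim=0.15):
    """Two-regime SETAR(1), delay 1: choose the threshold that minimizes total SSR.

    Candidate thresholds are observed values of y_{t-1}, excluding the lowest
    and highest `trim` share so that each regime keeps enough observations.
    """
    lag, cur = series[:-1], series[1:]
    ordered = sorted(lag)
    n = len(ordered)
    candidates = ordered[int(trim * n): int((1 - trim) * n)]
    best = None
    for c in candidates:
        low = [(x, y) for x, y in zip(lag, cur) if x <= c]
        high = [(x, y) for x, y in zip(lag, cur) if x > c]
        a1, b1, s1 = ols_line(*zip(*low))
        a2, b2, s2 = ols_line(*zip(*high))
        if best is None or s1 + s2 < best[0]:
            best = (s1 + s2, c, (a1, b1), (a2, b2))
    return best[1], best[2], best[3]


# Simulated SETAR process with true threshold 0 (illustrative, not corn prices).
random.seed(42)
y = [0.0]
for _ in range(500):
    prev = y[-1]
    intercept = 0.8 if prev <= 0.0 else -0.8
    y.append(intercept + 0.5 * prev + random.gauss(0.0, 0.5))

threshold, low_params, high_params = fit_setar(y)
print(threshold, low_params, high_params)
```

With more lags per regime, the same search applies with a multiple regression in each regime, and the lag order can then be chosen by AIC conditional on the estimated threshold.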
In the ARMA models, three criteria set an optimal lag of 1. The VAR models range from 5 to 10 lags at most. The SETAR, LSTAR, and STAR models presented an optimal lag length of 2 months. The long-term joint dynamics of these models is a concern, and it is tested with the Johansen cointegration test (see Table 7). The eigenvalue and trace statistics in Table 7 are above their critical values at the 1% level, which means all the models are cointegrated. Therefore, it is desirable to treat these models as Vector Error Correction (VEC) models by inserting an error-correction term in the main equation. Finally, these models are estimated and their residuals tested for autocorrelation and heteroskedasticity with the Box & Pierce (1970) and White (1980) tests, respectively, as shown in Table 8.
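The residual autocorrelation check can be sketched with the Box-Pierce statistic, Q = n times the sum of squared sample autocorrelations up to lag m. The function name and the deliberately autocorrelated toy residuals below are illustrative assumptions.

```python
def box_pierce(residuals, max_lag):
    """Box-Pierce Q = n * sum_{k=1}^{m} r_k^2, with r_k the lag-k sample autocorrelation."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k
    return n * q


CHI2_5PCT_DF1 = 3.841  # 5% critical value of the chi-square distribution, 1 degree of freedom

# Toy residuals that alternate in sign, so lag-1 autocorrelation is close to -1.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
q_stat = box_pierce(alternating, max_lag=1)
print(q_stat, q_stat > CHI2_5PCT_DF1)
```

Under the null of no autocorrelation, Q is compared with a chi-square distribution. When the residuals come from a fitted ARMA(p, q) model, the degrees of freedom are reduced to m - p - q.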
As shown in Table 8, most models presented stability in their residuals. Therefore, the estimators of these models are consistent.

Results
In this section, we present the main results of this study. Section 4.1 evaluates in-sample and out-of-sample forecasting accuracy. The outcomes offer policymakers, traders, agriculture and equity research analysts, hedgers, and portfolio managers a deeper understanding of the best econometric model to forecast corn prices.

Forecasting Performance
The main signal of forecasting performance comes from the study of the loss function. Table 9 shows the in-sample Root Mean Square Error of each model. The lowest in-sample RMSE is obtained by model VEC A, which uses the price discovery role of futures contracts in the prediction. This is the same model used by Xu (2018) for forecasting corn prices in the U.S. market. The out-of-sample results are shown in Table 10. Despite the above result, we can only affirm that VEC F has better out-of-sample performance if it survives the Diebold-Mariano pairwise test on the loss function of the prediction errors. In that regard, the Diebold-Mariano test is useful to test the statistical difference among forecasts (see Table 11). Given the results in Table 11, the SETAR model outperforms the other models in out-of-sample forecasting, with the LSTAR in second place. When considering the loss function of the forecasting errors, the Diebold-Mariano test points to the SETAR and LSTAR models as more accurate than the VEC F model. This result is noteworthy, since it shows that even when corn market fundamentals and macroeconomic variables are included, univariate models that treat structural breaks and non-linearities perform better. This finding corroborates the importance of studying the dynamics of corn prices, as previously done for other commodities, such as crude oil, by Albuquerquemello et al. (2018). In this sense, the corn price series is similar to the oil price series, as world demand and real economic activity can suddenly influence its long-run trend, causing a structural break.

Robustness test
This section provides a robustness test for forecasting corn prices. The robustness exercise consists of excluding the first difference from the variables. In our estimates, the main models for forecasting corn prices within the sample were VEC A, VEC B, VEC C, and VEC D, with VEC A having the greatest predictive power, as shown in Table 9. Our robustness test corroborates the importance of including the futures price in a vector autoregression, since even when the first difference is excluded from the indicators, VEC A remains the main model for forecasting corn prices. This result is shown in Table 12.

Conclusion
Given the relevance of corn for the food and fuel industries, scholars and financial market practitioners are constantly testing new models for corn price forecasting. The purpose is to achieve the highest forecasting accuracy, reduce trading costs, and enhance hedging practices against price volatility. This paper delves into this topic, estimating and comparing numerous econometric models.
Since shifts and level changes are common in time series, we decided to investigate them in corn prices. The empirical exercise found a statistically significant structural break in 2010. We then checked whether treating these problems could enhance forecasting performance. For this purpose, we used Self-Exciting Threshold Autoregressive (SETAR) family models.
The estimation and comparison of SETAR family models and models commonly employed in the literature, such as ARIMA and VEC, provided useful findings. The main results indicate that the SETAR model performed better than the others in out-of-sample forecasting. For in-sample forecasting, the best model was VEC A, which had the lowest RMSE. These results highlight the importance of treating structural breaks and non-linearities in corn price series beforehand, as it helps in achieving higher performance.
The results inform market practitioners and researchers of important facts, such as: i) data treatment that considers structural breaks and non-linearities helps in forecasting corn prices; ii) models that accommodate structural breaks can easily be implemented for forecasting, speculation, and hedging purposes. In this sense, we strongly recommend the use of SETAR family models for worldwide corn market analysis.
The present work achieved its objectives and highlighted the most suitable models for maize price forecasting. However, the work has some limitations. A relevant limitation is that the study did not use models capable of dealing with a large number of predictors, that is, large (sparse) models. Within this class, factor models, machine learning, and deep learning models deserve to be highlighted. Currently, due to the high availability of data, high-dimensional models are gaining prominence in forecasting exercises, as the literature indicates that this class of models may be more accurate than traditional models. The models used in this study can deal with only a small set of predictors. Also, the variable of interest can be influenced by high-frequency (daily) variables, and the work left the MIDAS model out of the forecast model mix. The latter can estimate a dependent variable with a frequency different from that of its predictors.
Based on the limitations mentioned above, we suggest carrying out forecasting exercises that include a larger number of predictors through the use of high-dimensional models. It is also important to investigate the use of the MIDAS model.