Price trends of Agave Mezcalero in Mexico using multiple linear regression models

ABSTRACT: This study developed a multiple linear regression model to estimate the average rural price (ARP) of Agave Mezcalero in Mexico using data from the period 1999-2018. The variables used to generate the model represented supply and demand: planted area, yield, exports and the ARP of Agave Tequilero and Mezcalero. The analysis was carried out with a multiple linear regression model (MLRM) fitted by the least squares method in the statistical package R. The following variables were identified as having a significant influence on the ARP: the yield of Agave Mezcalero (YAM), the ARP of Agave Tequilero (ARPAT) and the new planted area of Agave Tequilero lagged by 6 periods (NPAAT_{t-6}). Overall, three models were generated. Model 2, which has two independent variables, was considered the most appropriate because it allows forecasts to be made from the new planted area of Agave Tequilero. YAM and NPAAT_{t-6} were useful in predicting 65.5% of the annual variations in the ARP and helped recognize the negative trend of the Agave price from 2020 to 2024. Therefore, the use of the MLRM to estimate the Agave ARP can be a useful tool in predicting the performance of this crop.


INTRODUCTION
In Mexico, the Agave Mezcalero production chain has important economic and social relevance. In 1994, the World Intellectual Property Organization (WIPO) granted the Denomination of Origin of Mezcal (DOM) to the following states: Oaxaca, Guerrero, Durango, San Luis Potosí, and Zacatecas. Later, in 2001, the Mexican Institute of Industrial Property (IMPI) also incorporated Guanajuato, Tamaulipas, Michoacán and Puebla into the DOM. Overall, nine states are protected by the DOM, and among them Oaxaca is considered the main producer of Agave Mezcalero. In 2018, Oaxaca contributed up to 56% of the national total planted area (SIACON-SADER, 2020). The Mezcal Regulatory Council (CRM) estimates that national production of certified Mezcal, according to NOM 070 (DOF, 2017), increased from 980,000 liters in 2011 to 7.15 million liters in 2019, of which Oaxaca produced 77.4% and 90.1%, respectively (CRM, 2020). Likewise, the 2011 census of Agave producers and mezcal distillers presented the political districts of the state of Oaxaca that make up the mezcal region (Figure 1), including Tlacolula, Yautepec, Miahuatlán, Sola de Vega and Ocotlán.
Ciência Rural, v.53, n.2, 2023.
Finally, it is important to mention that the area planted with A. angustifolia Haw. has undergone considerable change. The CRM indicated that 86% of the mezcal commercialized in 2019 was obtained from A. angustifolia and the remaining 14% from other species (CRM, 2020). In recent years, mezcal obtained from wild species has led to an increase in the price of mezcal, but further research is needed on this issue because its magnitude has not been measured.
One of the central objectives of economic development has been to increase the income of people in economically depressed areas (BANERJEE et al., 2019). In rural localities, this development has been pursued in three directions around agricultural production as an income-generating activity: the dispersion of subsidies or direct delivery of cash (MARDERO et al., 2018); agricultural financing or credit (BANERJEE et al., 2019; DONG & YANG, 2021); and investment in infrastructure, equipment, or strategic productive assets such as plant material or the development of perennial plantations (MUÑOZ-RODRÍGUEZ et al., 2020).
As part of the investment in strategic productive assets, the production of Agave Mezcalero and its distillate can be a profitable option (NAVARRETE-BOLAÑOS et al., 2021) for achieving the objective of economic development in rural populations. This requires planning production over time in order to avoid the negative effects of price fluctuations.
The main problem that has arisen in the last 54 years in the Agave Mezcalero production chain is the cyclical crisis of shortage and excess supply of Agave. This is due to its various uses, such as serving as raw material for distilling mezcal and, more recently, for producing Agave syrup. As a result, crises have occurred every 17 to 18 years, as happened in 1966, 1984, 2001 and 2019 (PALMA et al., 2016; PLASCENCIA & PERALTA, 2018; SIACON-SADER, 2020). National and international supply and demand factors, as well as institutional ones, may be behind these fluctuations (GAYTÁN, 2018).
It is probable that the variation in Agave prices has influenced the agricultural supply of raw material and the demand for land. Field research carried out over ten years in the state of Oaxaca and other parts of Mexico showed that in years of abundant Agave availability, prices for this raw material fell sharply. This in turn caused the abandonment of plantations and the conversion of fields to crops such as corn, livestock fodder, vegetables or fruit trees, depending on the geographical region. Conversely, when prices rise the cultivated area increases, and new areas are even opened in barely disturbed ecosystems, such as low-elevation deciduous forests. This not only puts animal and plant biodiversity at risk, but also increases the extraction of wild Agave (BLAS-YAÑEZ & THOMÉ-ORTIZ, 2021), which reduces their populations and puts their long-term conservation at risk (AGUIRRE-DUGUA & EGUIARTE, 2013). Currently, Mexico is experiencing a boom in new Agave plantations for mezcal production after the drastic decrease in cultivation from 2010 to 2015. That decrease led to an increase in the market price of the raw material from $900.00 to $10,500.00 pesos per ton between 2010 and 2018.
One of the techniques used to forecast prices with more than two variables is the multiple linear regression model (MLRM), which has already been used to estimate the gold price (ISMAIL et al., 2009; MANOJ & SURESH, 2018) and trends in the stock market (SHAKHLA et al., 2018). In agriculture, it has been used to analyze corn production (LÚCIO et al., 2001) and to predict firmness as a quality attribute of kiwifruit (TORKASHVAND et al., 2017). MARTÍNEZ (2014) forecast the average rural price of Agave Mezcalero using a fourth-degree polynomial econometric analysis. This yielded an accurate estimate of the upward price trend for the period 2015-2018. The model considered the relationship between two variables: the average rural price and the area sown. However, it did not include among the explanatory variables the influence of Agave Tequilero, whose production is of utmost importance in defining the price of Agave Mezcalero. Therefore, a model is required that can analyze more than two variables, such as the MLRM.
In this context, it is necessary to obtain updated and sufficient information to design and plan cultivation programs and to generate price forecasts. Such a tool will allow both producers and government institutions to make better investment decisions in the production of Agave Mezcalero, with the aim of reducing the pronounced stages of crisis and overproduction of raw material in the Agave Mezcalero production chain. Based on the above, this study developed a multiple linear regression model to estimate trends in the ARP of Agave Mezcalero in Mexico during the period 1999-2018, in order to provide strategic information to the government and to producers involved in the crop.

Variable selection
The selection of variables was based on interviews with ten experts knowledgeable in the subject: three researchers, three representatives of government institutions, and four producers involved in the Agave Mezcalero production chain. The information obtained from the interviews coincided with that indicated in the literature.
The following variables were identified as influencing the average rural price of Agave Mezcalero: area of Agave Tequilero planted and harvested, consumption of Agave by the industry, and exports of Mezcal and Tequila (BAUTISTA & SMIT, 2012; CAMACHO-VERA et al., 2021); planting density affecting Agave yield in plantations, as well as the planted and harvested area of Agave Mezcalero (BAUTISTA & SMIT, 2012; MARTÍNEZ, 2014); and the average rural price of Agave Tequilero (PALMA et al., 2016).

Data collection process
Average rural prices (ARP) of Agave Mezcalero, in Mexican pesos, were taken from a time series published by SIAP (AgriFood and Fisheries Information Service) (SIACON-SADER, 2020) for 1999 to 2018. These prices are the dependent variable in the model. The following variables were identified as influencing the average rural price of Agave Mezcalero: yield of Agave Mezcalero in tons per hectare (YAM), yield of Agave Tequilero in tons per hectare (YAT), average rural price of Agave Tequilero in pesos (ARPAT), new planted area of Agave Tequilero in hectares (NPAAT_{t-6}), new planted area of Agave Mezcalero in hectares (NPAAM_{t-6}), volume of exports of Agave Tequilero in thousands of pesos (EXPT), Agave consumption by the tequila industry (ACI), harvested area of Agave Mezcalero in hectares (HAAM), harvested area of Agave Tequilero in hectares (HAAT), total planted area of Agave Tequilero in hectares (TPAAT) and total planted area of Agave Mezcalero in hectares (TPAAM).
Cruz-Ramírez et al.
To define the lagged variables NPAAT_{t-6} and NPAAM_{t-6} for a given year, the information presented by SIAP was adjusted in accordance with its regulations and estimation procedure (SIAP, 2019): the total planted area in the previous year was subtracted from the total planted area in the current year. To analyze the impact of this new area, an average maturation period of 6 years from planting to harvest was assumed (6-period lag). For example, the effect on the 2018 price is analyzed in terms of the new planted area established in 2012.
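The construction of the lagged variable described above can be sketched in a few lines of Python. This is only an illustrative sketch: the planted-area figures are hypothetical placeholders rather than SIAP data, the function names `new_planted_area` and `lag_series` are assumptions introduced here, and the study itself was processed in R.

```python
# Hypothetical total planted area in hectares by year (NOT SIAP data).
total_planted_ha = {
    2011: 1000.0, 2012: 1400.0, 2013: 1500.0, 2014: 1450.0,
    2015: 1600.0, 2016: 1900.0, 2017: 2100.0, 2018: 2500.0,
}


def new_planted_area(total_by_year):
    """New area in year t = total planted in t minus total planted in t-1."""
    return {
        year: total_by_year[year] - total_by_year[year - 1]
        for year in sorted(total_by_year)
        if year - 1 in total_by_year
    }


def lag_series(series_by_year, periods=6):
    """Align each year t with the value observed in year t - periods."""
    return {
        year: series_by_year[year - periods]
        for year in sorted(series_by_year)
        if year - periods in series_by_year
    }


new_area = new_planted_area(total_planted_ha)
new_area_lag6 = lag_series(new_area, periods=6)

# The 2018 price is paired with the new area established in 2012.
print(new_area_lag6)
```

With these placeholder figures, only 2018 has a value six years earlier, so the lagged series contains a single entry. A year-on-year difference can be negative (2014 in this example); how such cases were treated is not stated in the text, so the sketch leaves them unchanged.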
ACI and EXPT were obtained from datasets published by the Tequila Regulatory Council (CRT, 2020), and the remaining independent variables were obtained from SIAP (SIACON-SADER, 2020). Both datasets covered the period from 1999 to 2018. The Pearson correlation coefficient between the ARP and each of the independent variables identified was estimated. Significant associations and high correlation values were taken as a good indication to proceed with the regression analysis and the subsequent choice of the best model.
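As an illustration of this screening step, the Pearson coefficient and the usual t statistic for testing whether it differs from zero can be computed as in the following sketch. The two short series are hypothetical, and the function names `pearson_r` and `t_statistic` are assumptions made for this example; the paper does not state which test was used to obtain the significance levels.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("series must have equal length and at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for H0: correlation = 0."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical series: a price that tends to fall as yield rises.
price = [10.0, 8.0, 9.0, 5.0, 4.0]
crop_yield = [1.0, 2.0, 3.0, 4.0, 5.0]

r = pearson_r(price, crop_yield)
t = t_statistic(r, len(price))
print(round(r, 4), round(t, 4))
```

The t statistic would then be compared with a Student t distribution with n - 2 degrees of freedom to obtain the P value.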

Specification of the regression model
To evaluate the importance of endogenous variables in the model and their influence on the ARP, the estimates were made through an MLRM using the least squares method (LSM) (HYNDMAN & ATHANASOPOULOS, 2018) with the following equation:

$$Y = b_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k + e$$

where $b_1$ is the intercept term and $b_2, b_3, \dots, b_k$ are the coefficients corresponding to the predictor variables in the model. Each coefficient measures the effect of a change in the variable $X_k$ on the expected value of $Y$ while the other variables remain constant (HILL et al., 2018). To complete the model, a random error term denoted by the letter "e" is included.
The LSM was used to calculate the coefficients from the data, since it obtains them by minimizing the sum of the squared errors.
Therefore, the values of $b_1, b_2, b_3, \dots, b_k$ are chosen to minimize the following expression (HYNDMAN & ATHANASOPOULOS, 2018):

$$\sum_{t=1}^{T} e_t^2 = \sum_{t=1}^{T} \left(Y_t - b_1 - b_2 X_{2,t} - b_3 X_{3,t} - \dots - b_k X_{k,t}\right)^2$$

Here $t = 1, \dots, T$ indexes the annual observations and $Y$ refers to the dependent variable, namely the ARP of Agave Mezcalero. The independent variables are those related to the production, price and commercialization of Agave Tequilero and Mezcalero. For the regression analysis in each particular case, the significance level of the t test, the coefficient of determination R² and the value of the F-test were considered.
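The least squares criterion above can be illustrated with a minimal sketch that solves the normal equations directly. It is not the procedure used in the study, which relied on R; the helper names `solve_linear_system`, `fit_ols` and `r_squared` and the small data set are assumptions made for illustration, with the response generated exactly from known coefficients so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [b1, b2, ..., bk] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coeffs):
    """Coefficient of determination of the fitted model."""
    fitted = [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictors (X2, X3); Y is built from Y = 10 - 2*X2 + 0.5*X3.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0), (6.0, 4.0)]
y = [10.0 - 2.0 * x2 + 0.5 * x3 for x2, x3 in rows]

coeffs = fit_ols(rows, y)
fit_quality = r_squared(rows, y, coeffs)
print([round(c, 6) for c in coeffs], round(fit_quality, 6))
```

Because the response contains no noise, the sketch recovers the generating coefficients and an R² of essentially 1. With real data the residuals would be non-zero, and the t and F tests mentioned above would be needed to judge significance.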
When proposing the first MLRM with the significant variables reported from the correlation analysis, a possible multicollinearity problem was found between the explanatory variables. To correct this, it was necessary to reduce the number of variables in the model. When the number of variables that impact the price is very high, the progressive regression method can be used, as has been done in estimating the price of gold (ISMAIL et al., 2009) and, more recently, residential property prices (CUI, 2020). In this paper, the number of variables was reduced because two independent variables were correlated with each other. To correct the multicollinearity problem, two further MLRMs were fitted, each excluding one of these two variables, in order to evaluate the performance of each model and select the most suitable one. The three models were processed in the R statistical package, version 3.6.3 (R DEVELOPMENT CORE TEAM, 2020). The results of models 2 and 3 were plotted in a line chart (Figure 2) using Microsoft Excel 365. This made it possible to compare the predictions of each model with the observed ARP, that is, the values published by SIAP for the period 1999 to 2019.

RESULTS AND DISCUSSION
Table 1 shows the correlation matrix of the ARP against each of the independent variables. A highly significant positive linear correlation (P < 0.001) was found between the ARP and ARPAT. In turn, a significant negative linear correlation (P < 0.01) was found between the ARP and NPAAT_{t-6}, as well as between the ARP and YAM (P < 0.05). This indicates that when the ARPAT increases, the ARP also increases, whereas when the YAM and NPAAT_{t-6} increase, the ARP decreases. Likewise, a correlation was found between the independent variables ARPAT and NPAAT_{t-6}, which could cause multicollinearity problems if both variables were included in the model. ARPAT had the highest correlation with the ARP, followed by NPAAT_{t-6} and YAM.
Model 1 includes all the potential independent variables that were identified and is defined as follows:

$$\mathrm{ARP} = 5975^{*} - 77.93^{*}\,\mathrm{YAM} + 0.2149^{***}\,\mathrm{ARPAT} - 0.01251\,\mathrm{NPAAT}_{t-6}$$

where * P < 0.05, ** P < 0.01, *** P < 0.001. NPAAT_{t-6} did not have a regression coefficient with an acceptable level of significance. The model explains 88.7% of the variation in the ARP. There is currently no precise definition of the multicollinearity problem (WOOLDRIDGE, 2019). Nonetheless, it is advisable to avoid strong correlation between two or more independent variables within the model. For this reason, in the following models the variables ARPAT and NPAAT_{t-6} were considered separately.
Model 2 includes the YAM and the lagged variable NPAAT_{t-6} as independent variables, and is defined as follows:

$$\mathrm{ARP} = 13020^{***} - 174.2^{**}\,\mathrm{YAM} - 0.03655^{**}\,\mathrm{NPAAT}_{t-6}$$

The two regression coefficients are significantly different from zero. The variance inflation factor (VIF) for the YAM and NPAAT_{t-6} is 1.09 and 1.09, respectively. These values are below the threshold of 10, indicating that there are no multicollinearity problems. The value of the Durbin-Watson test was 1.52, which indicated that the assumption of non-autocorrelation of the residuals is fulfilled. In this model, the independent variables explain 65.7% of the variation in the ARP. The MLRM has been used as an adequate tool to evaluate price trends considering the effects of lagged independent variables (X_{t-n}) in the available data (ISMAIL et al., 2009).
Model 3 considers the YAM and ARPAT as explanatory variables and is specified as follows:

$$\mathrm{ARP} = 5594.64 - 74.86^{*}\,\mathrm{YAM} + 0.24^{***}\,\mathrm{ARPAT}$$

The two regression coefficients are significantly different from zero. The variance inflation factor (VIF) for YAM and ARPAT is 1.44 and 1.44, respectively. In this model, 86.8% of the variation in the ARP can be explained by the YAM and ARPAT, and the value of the Durbin-Watson test was 1.79. The procedure carried out in this paper is in line with that proposed by SHAKHLA et al. (2018) to forecast stock market price trends. These authors included the prices of other financial indicators among the explanatory variables of an MLRM, since the prices of similar or substitute products may be highly correlated.
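The two diagnostics reported for models 2 and 3 can be sketched as follows. In a model with exactly two predictors, the auxiliary R² used in the VIF equals the squared Pearson correlation between them, so both predictors share the same VIF, which agrees with the identical pairs of values reported above. The function names and the residual series are hypothetical and serve only to illustrate the formulas.

```python
import math


def vif_two_predictors(r):
    """VIF shared by both predictors when their Pearson correlation is r."""
    return 1.0 / (1.0 - r * r)


def implied_abs_correlation(vif):
    """Absolute correlation between two predictors implied by their VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Correlations implied by the reported VIF values (1.09 and 1.44).
r_model2 = implied_abs_correlation(1.09)
r_model3 = implied_abs_correlation(1.44)
print(round(r_model2, 2), round(r_model3, 2))

# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(dw_alternating)  # 3.0
```

Under this reading, the reported VIFs would imply absolute correlations of roughly 0.29 between YAM and NPAAT_{t-6} and roughly 0.55 between YAM and ARPAT. The Durbin-Watson statistic ranges from 0 to 4, and values near 2 suggest little first-order autocorrelation.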
In models 2 and 3, the assumption of normally distributed residuals was met according to the frequency distribution in a histogram and a normal probability plot (Q-Q plot). The assumption of homoscedasticity was validated by plotting the residuals against the values fitted by the model.
Figure 2 shows the estimates from models 2 and 3 in comparison with the ARP observed in the analyzed period. Model 2 allows prices to be estimated for 6 years beyond the period analyzed, using NPAAT_{t-6} and the average YAM observed over the last 5 years. The year 2019 offers newly observed data that serve to measure the predictive capacity of the model. It should be underlined that model 2 predicts a decrease in prices for the year 2020.
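The forecasting procedure described for model 2 can be sketched as below, using the coefficients reported for that model. The yield series and the lagged planted area are hypothetical inputs chosen only to show the arithmetic, and the function names are assumptions made for this example.

```python
def predict_arp_model2(yam, npaat_lag6):
    """ARP from model 2: 13020 - 174.2 * YAM - 0.03655 * NPAAT_{t-6}."""
    return 13020.0 - 174.2 * yam - 0.03655 * npaat_lag6


def mean_of_last(values, n=5):
    """Average of the last n observations, used in place of future YAM."""
    if len(values) < n:
        raise ValueError("not enough observations")
    return sum(values[-n:]) / n


# Hypothetical yields in tons per hectare (NOT observed data).
hypothetical_yam = [48.0, 50.0, 52.0, 49.0, 51.0, 50.0, 48.0]
# Hypothetical new planted area of Agave Tequilero six years earlier, in hectares.
hypothetical_npaat_lag6 = 10000.0

yam_forecast_input = mean_of_last(hypothetical_yam, n=5)
arp_forecast = predict_arp_model2(yam_forecast_input, hypothetical_npaat_lag6)
print(yam_forecast_input, round(arp_forecast, 2))
```

With an average yield of 50 t/ha and 10,000 ha of new area six years earlier, the equation gives 13020 - 8710 - 365.5 = 3944.5 pesos. Since the planted area enters with a 6-period lag, areas already recorded up to 2018 are what allow estimates up to 2024 without forecasting the planted area itself.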
The data sets used provide no information on the trend of the ARP of wild Agave, since there are no statistics available for these species. However, a decrease in the ARP estimated in this paper will in practice be directly related to the general prices of the Agave used to produce mezcal, insofar as they are substitute or complementary products, as in the relationship observed in model 3 between the ARP and the average rural price of Agave Tequilero. Additionally, it was not possible to evaluate the effect of the increase in mezcal exports on the ARP because the published statistics are incomplete for the period analyzed. It is essential for decision-makers within the Agave Mezcal production chain to have information about Agave price trends. Overcoming the overproduction of Agave Mezcalero requires economic and productive planning of planting that considers demand and supply in the local and national markets, as well as the introduction of price control measures (BAUTISTA et al., 2017).
Different studies have shown a close relationship between the production of Agave Tequilero and Agave Mezcalero. PLASCENCIA & PERALTA (2018) affirmed that this relationship has been direct and historical, whereas the dependence on both Agaves to produce syrup is considered another important factor (MARTÍNEZ, 2014). As a result, the effects of the tequila industry and related products on the mezcal industry are visible (PALMA et al., 2016). Therefore, the production variables of Agave Tequilero were analyzed in this study to estimate the extent to which they affect the ARP of Agave Mezcalero. The prediction equations were composed of variables for which information is accessible in practice. However, no statistical information is available for the wild Agave that are also used to produce mezcal. Hence, the analyses in this study were developed only with information on A. angustifolia Haw. and drew on the analysis carried out by MARTÍNEZ et al. (2014).
The trend in the ARP of Agave was analyzed by developing three MLRMs. First, all the potential independent variables were included in a first model. The second model was considered the most appropriate because it allows forecasts to be made from the new planted area of Agave Tequilero. Finally, the third model had the highest predictive power, although it was limited since there was a dependency between the ARP and ARPAT. Regression coefficients provided a means to assess the relative importance of the independent variables in the establishment of the ARP. Increases in the ARPAT result in increases in the ARP, while increases in the YAM and NPAAT_{t-6} cause decreases in the ARP.
Subsequent studies should provide mechanisms to improve the compilation of statistical information that can be used to improve price forecast models.

CONCLUSION
Three independent variables were identified that had an influence on the ARP in the period analyzed: the yield of Agave Mezcalero in tons per hectare (YAM), the ARP of Agave Tequilero in pesos (ARPAT) and the new planted area of Agave Tequilero in hectares with a 6-period lag (NPAAT_{t-6}). Two models were proposed, each excluding one of the correlated independent variables, to eliminate the multicollinearity problem within the MLRM with respect to the 3 variables that affect the ARP. Among the models developed, the model with YAM and NPAAT_{t-6} as explanatory variables was useful in predicting 65.5% of the annual variations in the ARP and allowed forecasting a negative trend in the price of Agave from 2020 to 2024. This information could be useful for planning a planting program of Agave Mezcalero. The evidence suggested that, due to the growth of Agave plantations, the ARP will decrease to a low point in 2024 but still remain above the critical prices of the period 2006-2014, in which producers suffered significant losses in income derived from the crop. The analysis suggested that the increase or decrease in prices is determined to a greater extent by the planted area and, consequently, by the production offered of both Agave Mezcalero and Agave Tequilero. Therefore, planning production over time is the best option to address the problem of price fluctuations, as has been the case with other agricultural crops. In sum, a multiple linear regression model can be a useful tool in predicting the average rural price of Agave.

Figure 1 - Producing states in Mexico and the mezcal region in Oaxaca. Source: own elaboration with data from OEIDRUS (2011) and DOF (2015).

Figure 2 - Time series of the average rural prices of Agave Mezcalero, with observed and estimated data from models 2 and 3. Source: own elaboration.

Table 1 - Correlation matrix between the average rural price of Agave Mezcalero and the selected independent variables of production, price and commercialization of Agave Tequilero and Mezcalero.