Estimates of reference evapotranspiration in the municipality of Ariquemes ( RO ) using neural networks GMDH-type

Reference evapotranspiration is a climatological variable of great importance for water use dimensioning in irrigation methods. In order to contribute to the climatic understanding of Ariquemes, Rodônia state, Brazil, the study aims to model the behavior of the time series of reference evapotranspiration using a GMDH-type (Group Method of Data Handling) artificial neural network (ANN) and to compare it with the SARIMA (Seasonal Autoregressive Integrated Moving Average) methodology. Data from the National Institute of Meteorology INMET, obtained at the Automatic Weather Station of Ariquemes, from January 2011 to January 2014, were used. Data analysis was performed using software R version 3.3.1 through the GMDH-type ANN package. Modeling by GMDH-type ANN led to results similar to the results of the SARIMA model, thus constituting an option to predict climatic time series. GMDH-type models with larger numbers of inputs and layers presented lowest mean square error.


Introduction
Study and knowledge on the behavior of climatological variables are essential to understand the climate of a certain region because climatic predictions may help in the decisionmaking, aiming at either the right time for planting or the choice of the irrigation technique, and contribute to the maximization and development of agricultural production.The state of Rondônia, Brazil, and the municipality of Ariquemes are in great agricultural expansion, but studies involving climatic variables are still scarce for the region (Carvalho et al., 2016).In this context, reference evapotranspiration is an important variable for agriculture and, along with crop coefficient (Kc), is extremely important for the processes of irrigation, water use dimensioning, and production in various localities (Alves Sobrinho et al., 2011;Carvalho et al., 2011;Cavalcante Junior et al., 2011).The non-existence of a standard method to accurately predict climatic time series boosts the search for techniques that are able to perform predictions considerably close to the observed values, justifying the use of artificial neural networks (ANNs), which are Metaheuristics based on the functioning of the human brain (Braga et al., 2016).According to Acock & Pachepsky (2000), Ferraz (2014) and Camelo et al. (2017), ANNs have been successfully used to model complex time series in various fields of knowledge.
The type of ANN used in the present study was GMDH (Group Method of Data Handling), a group method of data handling based on the Rosenblatt's perceptron method (Farlow, 1981), in which the original mathematical algorithm was developed by the Russian mathematician Ivakhnenko (1968) and the implementation for R software (R Core Team, 2016) was developed by Dag & Yozgaligil (2016).
This study aimed to model the behavior of the time series of reference evapotranspiration in the municipality of Ariquemes (RO), using a GMDH-type artificial neural network, and compare it to the SARIMA (Seasonal Autoregressive Integrated Moving Average) model.

Material and Methods
Ariquemes is a municipality in the northwestern part of the Rondônia state, Brazil, located at latitude 9º 54' 48" S and longitude 63º 2' 27" W, at altitude of 142 m (Figure 1).Its population is estimated as 107,345 inhabitants (IBGE, 2017) and it has an area of 4,427 km 2 .
The climate of Ariquemes follows the Köppen's classification which is applied to almost the entire state of Rondônia, being equatorial, predominantly hot and humid, with two welldefined seasons: the rainy season, which lasts seven months (October to April), and the dry season (June to August), with May and September as months of transition between both regimes (Carvalho et al., 2016).
Reference evapotranspiration was calculated using the data available in INMET (2014), obtained in the Automatic Weather Station of Ariquemes/RO, from January 2011 to January 2014.The tower of the weather station is situated by the geographic coordinates 9º 56' S and 62º 57' W, at 140 m above the sea level (Figure 1).The evapotranspiration data used in the study were calculated according to the Penman-Monteith-FAO model (Allen et al., 1998) and published by Carvalho & Delgado (2016).
The development of the Group Method of Data Handling (GMDH) algorithm was described by Dag & Yozgaligil (2016) and assumes a time series with t time units and p inputs.Models are constructed for the data with time lags, where the numbers of observations are represented in the columns by t -p, and p is the number of inputs of the lagged time series.The variable z is constructed with the values that best estimate the y values and the other variables are taken into the model with lags of the time series x i , with i = 1, 2, …, p.
Then, the algorithm selects the best model which explains the relationship between response and lagged time series, via transfer functions.The following transfer functions were used (Kondo & Ueno, 2012;Basheer & Khamis, 2016) ), polynomial (z = y) and tangent [z = tan(y)].
In the perspective of the time series, the algorithm learns the relation between lags.After learning the relations, it selects the way to follow in algorithm.Then, initially the GMDH (Ivakhnenko, 1968) is used to construct the polynomials (Eq.1): This algorithm has 6 coefficients to be estimated in each model and they are estimated by Regularized Least Square Estimation (RSLE).
The GMDH algorithm is a system of layers which contain neurons.The number of neurons in each layer is defined by the number of inputs of variables, given by the combination of all pairs taken two by two, i.e., where: m -number of variables; a, b, c, d, … -coefficients of the variables in the polynomial (called weights); y -response variable; and, x i and x j -the lagged time series to be regressed (lags).
Usually, the terms are calculated up to square terms, having the following form (Eq. 2): The GMDH algorithm considers all pairwise combinations of p of the lagged time series.Thus, each combination enters each neuron.Using these two inputs, a model is constructed to estimate the desired output.Eq. 3 considers m = 2 (x i and x j ), and six coefficients are required in each model when estimated: According to an external criterion, p neurons are selected and h -p neurons are eliminated from the network.The present study used the same external criterion of the author of the package, i.e., the Prediction Mean Square Error (PMSE).Thus, outputs obtained from selected neurons become the inputs for the next layer, continuing the process up to the last layer, where only one neuron is selected.The output obtained from the last layer is the forecasted value for the time series.
The GMDH-type Neural Network package of R software proposed by Dag & Yozgatligil (2016) provides the following values of regularizing parameters for λ = {0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24}.The authors claim that regularized least squares estimation is used when there is possibility of a multi-collinearity problem by integrating a regularization parameter, λ, into the estimation step.However, Tikhonov (1963) claimed that for λ = 0 the method of regularized least squares converges to the least square method.In this case, for the fitting of the models of the study, where different inputs and layers were tested, it was opted for considering λ = 0, i.e., the models were fitted by the usual method of least squares.
The data set was separated into a learning set and a testing set.In the GMDH package, the standard is 70% of the time series for the learning set and the rest for the testing set.
In the SARIMA modelling followed in the study (Box et al., 2008), at each time instant t, there is a set of values that the series can assume and with which the possibilities of occurrence are associated.The SARIMA model is adequate for prediction of time series whose stochastic process is not stationary.Thus, the original series will be subjected to some differencing procedures in order to make it stationary.The required number of difference to make a series stationary is called order of integration (d).The general SARIMA structure (p, d, q) x (P, D, Q)s is expressed by Eq. 4: where: B -represents a lag operator; B S -represents a seasonal lag operator; f P (B) -represents the autoregressive component of order p; q q (B) -represents the moving average component of order q; F P (B S ) -represents the seasonal autoregressive component of order p; q Q (B S ) -represents the seasonal moving average component of order Q; ϵ t = f p (B)Z t -white noise; d -represents the number of differences; ( (3) (4) This difference operator is defined as:

Results and Discussion
In the period from January 2011 to December 2013, the mean value of reference evapotranspiration for the station of Ariquemes (RO) was 4.03 mm d -1 (standard deviation -s = 0.97 mm d -1 ), varying from 1.06 mm d -1 in the minimum estimate to 6.76 mm d -1 in the maximum estimate.The relative air humidity in this period was 81.02% on mean (s = 8.52), varying from 39.79% in the dry period to 94.33% in the rainy period.The mean rainfall was 6.44 mm d -1 (s = 14.24), reaching maximum value of 120.80 mm d -1 , with annual mean of 2290 mm y -1 .Air temperature was on average 25.5 ºC, varying from 15.3 to 40.1 ºC.
Table 1 shows the mean square errors (MSE) of the fitted models for the reference evapotranspiration -ETo (mm d -1 ).
The model with 4 inputs, 3 layers and Sigmoid transfer function (Table 1) had the lowest mean square error (MSE = 0.7006).The coefficients of the fitted model are described in Eq. 6: Data analysis was conducted using R software Version 3.3.1 (R Core Team, 2016) through the GMDH-type Neural Network package (Dag & Yozgatligil, 2016) and SARIMA (Boshnakov & Halliday, 2018).As a procedure of data analysis, an exploratory analysis was initially carried out to identify mean values observed, as well as maximum and minimum values for the period, identifying possible errors or extreme values.Then, the models were fitted according to the number of input variables (3 and 4 inputs), number of layers (1, 2 and 3 layers) and transfer functions (Polynomial, Sigmoid, RBF and Tangent); in this case, the best fitted model was considered as that with lowest mean square error (MSE) of the predicted values.
After that, the equations used by the GMDH-type Neural Network method were identified and the predicted, observed and forecasted values and confidence intervals for the series were plotted.
In the SARIMA modeling, the best fits of the models were selected based on the following criteria: Mean Square Error (MSE) and Akaike Information Criterion -AIC.AIC is used to compare two models and is expressed in Eq. 5: The mean square error (MSE) is defined as: where n is the number of predictions (Montgomery & Runger, 2016).It is worth highlighting that the components of the SARIMA modeling will be analyzed and, if not significant, new models will be fitted, removing its components, i.e., the autoregressive, moving average, seasonal autoregressive and seasonal moving average components may be removed from the models, thus obtaining the models Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), among others.
To measure the quality of the fit of the models, the forecasted values were compared to the values observed in the period from January 1 to 5, 2014.This analysis considered the percentage relative error  It can be noted that the model ARMA (2, 0, 2) obtained lower MSE (0.696) than the GMDH modeling (MSE = 0.7006).Despite the lower value observed for the ARMA model, it is suggested that the results of the modeling by GMDH-type neural network were satisfactory because the mean square errors were similar.Dag & Yozgaligil (2016) found better indices in GMDH-type ANN when compared to the modeling methods ARIMA and Exponential smoothing to predict time series of cancer death rates in Pennsylvania from 1930 to 2000.Samsudin et al. (2009), also using cancer death data, from Pennsylvania, but analyzing with Minitab software, compared the results of the GMDH model, hybrid with genetic algorithms, with those of ARIMA and found favorable results for the GAGMDH-type neural networks.
In the study conducted by Samsudin et al. (2011), analysing time series of monthly fluxes of the Selangor and Bermam rivers in Malaysia, the authors observed better results of GMDH-type ANN compared to SARIMA modeling.Likewise, Shabri & Samsudin (2014) found favorable results for GMDHtype ANN in their study conducted in northern Canada, with lynx trapped per year.2. GMDH models with higher number of inputs and layers were more efficient, with lower mean square error.

Figure 1 .
Figure 1.Map of location of the automatic weather station (AWS) of Ariquemes (RO) represents ordinary difference; and, ∇ S d = (1 -B S ) D -represents seasonal difference.
forecasted value; and, x -value observed on the day.

Figure 2 .Figure 2
Figure 2. Observed and fitted values for the series of ETo through the Group Method of Data Handling (GMDH), with 4 inputs, 3 layers and sigmoid transfer function

Figure 3 .
Figure 3. Observed and fitted values in the model Autoregressive Moving Average (ARMA) (2,0,2) for the ETo time series in the municipality of Ariquemes (RO) in the period from January 1, 2011, to January 5, 2014

Figure 4 .
Figure 4. Fitted and observed values in the models Autoregressive Moving Average (ARMA) (2,0,2) and Group Method of Data Handling (GMDH) for the ETo time series in the municipality of Ariquemes (RO) in the period from December 1, 2013, to January 5, 2014

Conclusions 1 .
Modeling by GMDH-type Neural Networks led to results similar to the results of the SARIMA model, thus constituting an option for prediction of climatic time series.