Mathematical modeling of climatological data to estimate passion fruit crop yield (Passiflora edulis)

1 Doctorate student in Engineering, Full-Time Research Professor, Universidad Libre, Bogotá, Colombia. E-mail: leylan.ramirezc@unilibre.edu.co (ORCID 0000-0002-0651-0971)
2 PhD in Geography, Director of the Environmental Engineering Program, Universidad Libre, Bogotá, Colombia. E-mail: ginap.gonzaleza@unilibre.edu.co (ORCID 0000-0001-9210-3653)
3 PhD in Agroecology, Full-Time Research Professor, Universidad Pedagógica y Tecnológica de Colombia, Escuela de Administración de Empresas Agropecuarias, Duitama, Boyacá, Colombia. E-mail: jose.cleves@uptc.edu.co (ORCID 0000-0001-9717-9753)

Abstract
Passion fruit crop yield depends on the behavior of climatic variables, and modeling how yield depends on these variables provides information that facilitates agribusiness decision making. The main aim was to estimate passion fruit crop yield using mathematical models. Multivariate and univariate statistical analyses of meteorological variables were carried out for the observation period between 2007 and 2014 at selected weather stations located in the Colombian middle tropics (Department of Huila). The relationship between yield and the following agroclimatic variables was analyzed at monthly resolution: temperature, sunlight, relative humidity, rainfall and ENSO, using empirical and mechanistic models recommended in the scientific literature. Results showed that the multiple regression model underestimates the highest yield peaks and has a low fit, while univariate models such as ARIMA showed a better fit to the time series analyzed. Stewart's water-yield model performed better in estimating yield as a function of evapotranspiration in the different phenological phases.


Introduction
Climate plays an important role in crop development, and its three most important components are light, temperature and rainfall (CAMPOS ARANDA, 2005). Relative humidity and the El Niño-Southern Oscillation (ENSO) climatic pattern, measured through the ONI index, are also considered, because these extreme phenomena affect crop yields and threaten food security (IDEAM, 2013).
Additionally, it has been shown that the microclimate affects plant growth and development, which are in turn linked to climate behavior on a larger scale (macroclimate); under these conditions, the results of studies can be generalized to describe the relationships between vegetation and climate (CAMPOS ARANDA, 2005).
The production of fruit species such as Passiflora edulis Sims L. f. Flavicarpa and purpurea depends on multiple agroclimatic variables, such as altitude, latitude, temperature, relative humidity, radiation, rainfall, wind speed and sunlight, among others. These are conditions that humans cannot control and that directly affect crop performance. Management decisions depend on the farmer's knowledge (cultural or learned) of climate behavior and its impact on crop yield (MORLEY-BUNKER, 1999).
The scientific-technical background of production forecasts is the knowledge of the relationship between the ecophysiological requirements of the species and the environmental supply, which in this case refers to the supply and intensity of meteorological factors. This knowledge is fundamentally supported by statistics (KANTANANTHA; STEWART, 2007), so that forecasts are accurate and useful to other specialists (DELGADILLO-RUIZ et al., 2016; RUÍZ-RAMÍREZ; HERNÁNDEZ-RODRÍGUEZ; ZULETA RODRÍGUEZ, 2011; MARTÍNEZ VENTURA, 2006).
In this regard, forecasting models that estimate production volumes in semi-permanent fruit species such as passion fruit contribute to its understanding. The most widely used models are simulation models (SOTO G.; COTES T.; RODRÍGUEZ C., 2017), statistical-environmental models, statistical-biometric models and statistical models by sampling (KANTANANTHA; STEWART, 2007).
For this reason, quantitative techniques are useful tools to minimize economic losses and to strengthen the availability and supply of the product in the market. In addition, they help farmers reduce the uncertainty in their decisions and in the economic consequences of the uncertain behavior of the climate system, which has strong impacts on phytosanitary management, quality, production volume and, ultimately, prices (PANNELL; MALCOLM; KINGWELL, 2000; MOSCHINI; HENNESSY, 2001).
Other fields in which crop yield forecasts are applied are food security, crop insurance policy indemnities, and import and export plans, since they inform decisions that are relevant for planning state subsidies. These decisions are considered of national interest and are therefore strategic in nature (KANTANANTHA; STEWART, 2007). At producer level, a yield forecast before harvest time generates information to plan crops and to consider strategies that minimize eventual losses (KANTANANTHA; STEWART, 2007).
For the purpose of this research, empirical and mechanistic modeling approaches widely reported in the scientific literature were analyzed to estimate yield from agroclimatic variables, and the advantages of their application were assessed through the selected case (KANTANANTHA; SERBAN; GRIFFIN, 2010; KANTANANTHA; STEWART, 2007; KUSUMASTUTI; DONK; TEUNTER, 2016; MUSSHOFF; HIRSCHAUER, 2007). Forecasts aim to capture more closely the dynamics of each of the cultural practices (transitory and permanent), taking into account the production cycle (MARTÍNEZ VENTURA, 2006).

Selection of agroclimatic stations
Weather stations were selected from the station catalog of the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM, 2019) according to the following criteria: information completeness above 75%; the greatest number of series per year of climatic variables; classification as main weather stations (CP); and coverage, taking as reference the minimum Euclidean distance from each station to the locations of the observed production units.
The mathematical programming model is described below:
Sets
I: Producers
J: Weather stations
Parameters
$d_{ij}$: Euclidean distance from producer i to agroclimatic station j.
Decision variable
$x_{ij}$: Binary variable equal to 1 if weather station j is assigned to producer i, and 0 otherwise.
Rev. Bras. Frutic., Jaboticabal, 2021, v. 43, n. 3: (e-182)
Equation (1) is the objective function, which minimizes the distance between producers and stations; equation (2) assigns producers to the closest weather stations; and equation (3) ensures that every producer is assigned to one weather station. This model was solved with the Branch and Bound method (LAWLER; WOOD, 1966) using the OpenSolver software (OPENSOLVER.ORG, 2020). The selected stations are listed in Table 1.
Among the locations of the most important passion fruit producers, the municipality of La Plata (Huila) stands out. It is located in the central mountain range at coordinates 2°23'00'' N and 75°56'00'' W, with an average temperature of 23 °C and an altitude of 1180 meters above sea level. Crop production has different degrees of technification, and the weather stations closest to passion fruit producers, selected in this research on the basis of the mathematical programming model, are 21105040 and 21035040.
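The assignment logic can be illustrated with a minimal Python sketch. It assumes that any producer may be linked to any station with no capacity limit, in which case minimizing the total distance reduces to choosing the nearest station for each producer; this simplification is an assumption, not the formulation solved in OpenSolver. The producer names and planar coordinates are hypothetical placeholders; only the two station codes come from the text.

```python
import math


def assign_nearest_station(producers, stations):
    """Assign each producer to the station at minimum Euclidean distance.

    producers, stations: dicts mapping a name to planar (x, y) coordinates.
    Returns {producer: (station, distance)}.
    """
    assignment = {}
    for p_name, p_xy in producers.items():
        # With one station per producer and no capacity limits, the total
        # distance is minimized by one nearest-station choice per producer.
        best = min(stations, key=lambda s: math.dist(p_xy, stations[s]))
        assignment[p_name] = (best, math.dist(p_xy, stations[best]))
    return assignment


# Hypothetical planar coordinates in km (NOT the real locations).
producers = {"producer_A": (0.0, 0.0), "producer_B": (10.0, 2.0)}
stations = {"21105040": (1.0, 1.0), "21035040": (9.0, 3.0)}

result = assign_nearest_station(producers, stations)
for producer, (station, distance) in result.items():
    print(f"{producer} -> station {station} ({distance:.2f} km)")
```

If capacity limits or a maximum number of stations were added, the problem would no longer decouple and an integer programming solver would be needed.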

Weather data
Historical monthly averages of the climatic variables temperature, relative humidity, sunlight and rainfall were collected. ONI indexes from 2000 to 2004 and from 2007 to 2014 were obtained at monthly resolution; the database was built for the time periods with the greatest availability of information.

Crop yield data
Yield data (ton ha⁻¹ month⁻¹) for passion fruit cultivation in the municipalities where the weather stations were located were collected from reports of the Ministry of Agriculture and Rural Development, and the yield behavior according to the phenological stage of the crop was considered.
The first step consisted of creating the database in a time series structure (DALININA, 2017; DERRYBERRY, 2014; R, 2019; METCALFE; COWPERTWAIT, 2009), and evaluating the stationarity, seasonality and decomposition of the crop yield series.

Descriptive exploratory analysis
The methodology proposed by different authors was followed (BOX et al., 2015; METCALFE; COWPERTWAIT, 2009; CLAYTON et al., 2002; SHUMWAY; STOFFER, 2017), since their techniques address trend and seasonality, among other aspects. Analyses were carried out for statistical processing and graph elaboration in the Python programming language under the Jupyter interface (version 5.0.0) and in the R language under RStudio (version 1.2.1335) (available at https://jupyter.org and https://www.rstudio.com/).

Description of empirical mathematical models
To explain the yield behavior of Passiflora edulis Sims L. f. Flavicarpa and purpurea, empirical mathematical models based on yield forecasts were used. These statistical procedures have the disadvantage of requiring a significant volume of data over several periods of time for the model to be reliable. Such simple and direct methods have been developed with variables such as rainfall, temperature, solar radiation and sometimes nutrients to estimate yield (VANDENDRIESSCHE; VAN ITTERSUM, 1995). The techniques reported with this approach are: Autoregressive Integrated Moving Average (ARIMA), Multiple Regression, robust regression models and Neural Networks. Each of the techniques applied is described below.

Multiple Regression
A regression model in which the dependent or response variable Y is related to more than one explanatory variable is called a multiple linear regression model (DERRYBERRY, 2014) (Eq. 4).

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + e_i \qquad (4) \]

Where the disturbance $e_i$ is the error associated with measurement i of the value $X_{pi}$, under the assumption that $e_i \sim N(0, \sigma^2)$. The relationship between two or more variables can be analyzed through equations of this form. Residuals represent the variability that is not captured by the deterministic relationship represented by the regression function (HENGL, 2009; HENGL; HEUVELINK; ROSSITER, 2007).
Based on descriptive statistical analyses, a model with all climatic variables was proposed, and the backward elimination of the least significant variables proposed by Montgomery, Peck and Vining (2002) was applied.

ARIMA model
The ARIMA model has been widely used in scientific and technical fields to identify temporal stochastic processes and to estimate models for their prediction and control (BOX; JENKINS, 1976). In the ARIMA (p, d, q) model, p is the order of the autoregressive process, d is the number of differences needed for the process to become stationary and q is the order of the moving average process. It can be represented as follows (Eq. 5):

\[ \Delta^d Y_t = c + \varphi_1 \Delta^d Y_{t-1} + \dots + \varphi_p \Delta^d Y_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t \qquad (5) \]

Where d is the order of differencing used to make the original series stationary, $\varphi_1, \dots, \varphi_p$ are the parameters of the autoregressive part, $\theta_1, \dots, \theta_q$ are the parameters of the moving averages, c is the constant term and $\varepsilon_t$ is the stochastic disturbance. This methodology uses the data of a single variable to identify the characteristics of its underlying probabilistic structure, in contrast to traditional procedures, which identify models based on an explanatory theory of the phenomenon under study (DALININA, 2017).

Robust regression models
The Theil-Sen regressor uses a generalization of the median in multiple dimensions and is therefore a non-parametric statistical model robust to multivariate atypical values. This estimator is unbiased for the true slope in simple linear regression (KUMAR SEN, 1968). For many distributions of the response error, the estimator has high asymptotic efficiency relative to the least squares estimate (KUMAR SEN, 1968).
Gaussian process regression implements Gaussian processes for regression purposes, which is possible through optimization and random algorithms to improve the fit (KUMAR SEN, 1968).

Multilayer Perceptron (MLP)
The multilayer perceptron (MLP) trains iteratively, since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update them (KINGMA; BA, 2014; HE et al., 2015).
MLP is a supervised learning algorithm that learns a function $f(\cdot): R^m \to R^o$ by training on a data set, where m is the number of input dimensions and o is the number of output dimensions. Given a set of characteristics $X = x_1, x_2, \dots, x_m$ and a target y, it can learn a non-linear approximation for regression. It differs from logistic regression in that between the input layer and the output layer there may be one or more non-linear layers, called hidden layers (SCIKIT-LEARN.ORG, 2019).
The input layer consists of a set of neurons that represent each input feature. Each neuron in the hidden layer transforms the values from the previous layer with a weighted linear sum, followed by a nonlinear function such as the hyperbolic tangent. The output layer receives the values from the last hidden layer and converts them into output values (SCIKIT-LEARN.ORG, 2019). The regression class used with the multilayer perceptron trains with backpropagation and no activation function in the output layer, uses the squared error as the loss function, and outputs a set of continuous values. The mathematical formulation is given for a set of training data $(x_1, y_1), \dots, (x_n, y_n)$, for which an MLP with one hidden layer learns the function $f(x) = W_2 \, g(W_1^T x + b_1) + b_2$, where g is the activation function. The model parameters $W_1$, $W_2$ are the weights of the input and hidden layers, respectively, and $b_1$, $b_2$ represent the bias added to the hidden layer and the output layer (SCIKIT-LEARN.ORG, 2019).

Description of mechanistic mathematical models
Mechanistic mathematical models assume that the crop growth system has a known structure and can be described mathematically. These models aim to integrate knowledge and test hypotheses; however, they have a high predictive level since their variables are easily measurable (VANDENDRIESSCHE; VAN ITTERSUM, 1995). Within this category, the recommended models are the water-yield models (ALLEN et al., 1998). The main advantage of the latter is that they make knowledge accessible to non-experts, non-researchers and farmers.
Mechanistic mathematical models describe crop physiology but still require further development to improve prediction accuracy (VANDENDRIESSCHE; VAN ITTERSUM, 1995). The formal description of the estimation models considered in this research is given below.

Water-yield models
The water-yield response is a function defined by FAO (FAO, 2012), whose model addresses this relationship through the simple Eq. 6:

\[ 1 - \frac{Y_a}{Y_x} = K_y \left( 1 - \frac{ET_a}{ET_x} \right) \qquad (6) \]

Where $Y_a$ and $Y_x$ are the actual and maximum yields, $ET_a$ and $ET_x$ are the actual and maximum evapotranspiration and $K_y$ is the yield response factor, which represents the effect of a decrease in evapotranspiration on yield losses.
Other water-yield models are described as production functions that depend on the crop water requirements. The best known and most accepted are the Jensen and Stewart models, which simulate the relationship between yield and the evapotranspiration deficit in the various crop growth stages.
The Jensen model was developed by Jensen (JENSEN, 1968) and is given by Eq. 7:

\[ \frac{Y_a}{Y_{ck}} = \prod_{i=1}^{n} \left( \frac{ET_{ai}}{ET_{cki}} \right)^{\lambda_i} \qquad (7) \]

Where $Y_a$ is the yield of the irrigation deficit treatments; $Y_{ck}$ is the crop yield with full irrigation treatments; $ET_{ai}$ is the actual crop evapotranspiration in growth stage i of the irrigation deficit treatment; $ET_{cki}$ is the maximum evapotranspiration in growth stage i under full irrigation; $\lambda_i$ is the Jensen index of crop yield sensitivity to water deficit; i is the growth stage; and n is the number of stages.
Eq. 7 can be solved by taking logarithms, obtaining Eq. 8:

\[ \ln \frac{Y_a}{Y_{ck}} = \sum_{i=1}^{n} \lambda_i \ln \frac{ET_{ai}}{ET_{cki}} \qquad (8) \]

The Stewart model, developed by Stewart (1976), is given by Eq. 9:

\[ \frac{Y_a}{Y_{ck}} = 1 - \sum_{i=1}^{n} K_{yi} \left( 1 - \frac{ET_{ai}}{ET_{cki}} \right) \qquad (9) \]

Where $K_{yi}$ is the water deficit sensitivity index of the crop yield; the other parameters have been previously defined.

Determination of the water deficit sensitivity indexes of models
Given the complexity and limited availability of information to establish the effects of irrigation on yield response, regression models are used to estimate the constants $K_{yi}$ and $\lambda_i$ of the Stewart and Jensen models, respectively. For the evaluation of models, the weather station with the greatest amount of information and measured variables was used, which in this case was station 21055020. To calculate evapotranspiration, the Penman-Monteith equation was used, which is recommended by FAO (FAO, 2012) and accepted by the international scientific community in various research studies for its fit to observed values. Evapotranspiration of the passion fruit crop was calculated from the crop coefficients ($K_c$) for each phenological stage reported by Torrente (2009).
To determine maximum yield, the study on the effect of irrigation and fertilization on passion fruit yield and quality was considered (DORADO G.; TAFUR H.; RIOS R., 2013). From the treatment of maximum irrigation and its yield response without fertilization, a polynomial regression model that describes this behavior is obtained, with adjusted R² of 0.96 (Fig. 1).
From these studies, the sensitivity indexes proposed by Stewart and Jensen are obtained for each of the stages, together with the proposed models for each phenological stage analyzed for the passion fruit crop under the climatic conditions of the station identified with code 21055020.

Evaluation of the fit of models
To evaluate the goodness-of-fit of models, regression coefficients ($\beta$) and coefficients of determination (R²) were analyzed. In addition, estimation errors and modeling quality were calculated through the Mean Absolute Deviation (MAD), given by Eq. 10:

\[ MAD = \frac{1}{n} \sum_{i=1}^{n} \left| S_i - M_i \right| \qquad (10) \]

Where $S_i$ is the value obtained from the yield model and $M_i$ is the observed crop yield value.
Regarding the ARIMA, robust regression and multilayer perceptron models, the best model to predict the series under study was selected with the Akaike information criterion (AIC) (CRYER; CHAN, 2008): the lower its value, the better the relative quality of the model in terms of the information lost by the statistical model estimated for the crop yield series.
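The two evaluation measures can be sketched in a few lines of Python. The MAD function assumes the usual definition, the mean of the absolute differences between modeled and observed values. The AIC function uses the least-squares form n ln(RSS/n) + 2k, which assumes Gaussian errors and drops additive constants; statistical packages may report a different but equivalent-for-ranking value. All numbers are hypothetical and are not taken from the study.

```python
import math


def mean_absolute_deviation(simulated, observed):
    """Mean of |S_i - M_i| over all observations."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(s - m) for s, m in zip(simulated, observed)) / len(observed)


def gaussian_aic(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors (constants dropped)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)  # residual sum of squares
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical monthly yields (ton/ha/month), for illustration only.
observed = [1.2, 1.5, 1.1, 1.8]
simulated = [1.0, 1.6, 1.3, 1.5]
residuals = [s - m for s, m in zip(simulated, observed)]

mad = mean_absolute_deviation(simulated, observed)
aic_small = gaussian_aic(residuals, n_params=1)
aic_large = gaussian_aic(residuals, n_params=2)
print(f"MAD = {mad:.3f}")
print(f"AIC with 1 parameter = {aic_small:.3f}, with 2 parameters = {aic_large:.3f}")
```

For the same residuals, each additional parameter raises the AIC by 2, which is how the criterion penalizes model complexity.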

Dispersion statistics for climatic and yield variables
The central tendency measures suggest normal distribution behavior (Table 2). Among the agroclimatic variables, rainfall has high variability, with records ranging from 6.6 mm/month to 364.7 mm/month. A similar case was found for sunlight, which ranges from 54.2 hours/month to 218.9 hours/month, on average, while relative humidity does not exceed 94%. The analysis of atypical data, using the interquartile range as a measure of statistical dispersion, allows the extreme values of the analyzed variables to be located. At station 21035040, relative humidity had extreme values of 87%, 90.5% and 94%, which occurred in June, July and August 2008, respectively, based on historical information reported by IDEAM. The La Niña anomaly was reported in the Andean region with strong intensity between June 2007 and February 2008; according to the source consulted, this phenomenon generated excessive rains in the middle and southern parts of the Andean region, which significantly affected air temperature and rainfall (IDEAM, 2014). Another extreme value detected was rainfall of 260.5 mm/month in November 2013.
At station 21105040, relative humidity had an atypical value of 54% in September 2009. In this period, IDEAM reported a weak El Niño phenomenon with a rainfall deficit condition (IDEAM, 2014); according to observed data, rainfall in that period was 19.5 mm/month.
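The interquartile-range screening described above can be sketched as follows. The 1.5 × IQR fences (Tukey's rule) and the inclusive quartile method are assumptions, since the text does not state which variant was used; the humidity values are hypothetical, with only the 54% figure echoing the atypical value reported for station 21105040.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical monthly relative humidity (%), for illustration only.
relative_humidity = [78, 80, 79, 81, 82, 80, 79, 54]

print(iqr_outliers(relative_humidity))
```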

Passion fruit crop yield
In the seven years considered at station 21035040, near passion fruit producers, the average passion fruit crop yield was 1.38 ton ha⁻¹ month⁻¹; 75% of the observations were at or below 1.59 ton ha⁻¹ month⁻¹, and the maximum value was 2.23 ton ha⁻¹ month⁻¹. A cyclical pattern of annual variation was observed, with higher production volume in February and March and lower production volume in July and August (Fig. 2). In the cross-plots of climatic variables against crop yield, marked variations were observed. Less marked annual cyclical variations were detected for temperature, and atypical values were found in some months for sunlight and rainfall.
On the other hand, the bivariate correlation matrix for station 21035040 shows a weak positive association between yield and temperature (0.3426), as well as between ENSO and temperature (0.389) (Fig. 3).

Figure 3. Cross correlation matrix between variables. Station 21035040
At station 21105040, the strongest positive linear association was found between rainfall and relative humidity (0.636), and a strong negative association was found between relative humidity and temperature (-0.80); the other variables show weaker associations (0.36). The linear relationship between yield and temperature is positive and weak at station 21035040 (0.34), while at station 21105040 yield is associated with relative humidity (0.3) and rainfall (0.36) (Fig. 4).
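Each entry of such a correlation matrix is a Pearson coefficient. A minimal sketch of its computation is shown below; it assumes the standard Pearson definition (the text does not name the coefficient explicitly), and the temperature and yield values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly values, for illustration only.
temperature = [22.1, 23.4, 22.8, 24.0, 23.1, 22.5]
crop_yield = [1.30, 1.45, 1.20, 1.60, 1.50, 1.25]

print(f"r = {pearson_r(temperature, crop_yield):.3f}")
```

Because the coefficient only measures linear association, a value near zero does not rule out a nonlinear relationship between the variables.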

Mathematical models to predict low yield with empirical approach
Multiple linear regression model
In this methodology, five models were tested to find the agroclimatic variables that best explained passion fruit crop yield. As previously explained, the statistical analysis did not show a significant linear association between variables. Multiple regression models were therefore analyzed to explain yield at the selected stations (Table 3). In the first multivariate regression model, the variable temperature had a p-value (0.001) well below 0.05; additionally, the p-value of the F statistic (0.007) is lower than 0.05, which indicates that the model is significant at the 5% level. However, the model explains little of the yield variability, as evidenced by R² (0.141) and adjusted R² (0.103) (CRYER; CHAN, 2008). In the multivariate regression model for station 21105040, p-values below 0.05 were observed for relative humidity (0.002) and temperature (0.001), with a low fit: R² (0.327) and adjusted R² (0.266).
Although the fit of the models is low, they may still be estimated as descriptive models; for the practical purposes of the research, the analysis continued with the residuals, to validate the assumptions that give statistical support to the estimates and predictions of the model (Fig. 5).
In Figure 5, which shows the normal probability plot for stations 21035040 and 21105040, a line is observed in the studentized residuals, demonstrating the non-normality of errors.

Figure 5. Analysis of residuals
The Shapiro-Wilk test was applied in this first analysis as a normality test, with hypotheses H0: residuals are normally distributed and Ha: residuals are not normally distributed, for α = 0.05 (95% confidence level). The p-value of the test for the residuals of station 21035040 is 0.003632, which is lower than α, so the null hypothesis is rejected, confirming the non-normality of residuals.
According to the above, in the multiple regression model for station 21035040, the significant variable that best explains yield is temperature. To improve the estimation of the model, the parameters β were re-estimated with temperature as the independent variable and yield as the dependent variable. For the expression obtained, the Shapiro-Wilk test was applied to verify the good specification, and the resulting p-value of 0.007516 once again confirms the violation of the normality assumption of residuals.
To determine the good specification of the model, ANOVA with an F test was applied; the hypotheses tested were H0: poor model specification and Ha: good model specification. In this case, a p-value of 0.0006318 < α = 0.05 was obtained, which leads to rejecting H0 and suggests that the model is biased and inconsistent. Under these criteria, it is necessary to apply a data transformation to improve the normality assumptions and the model specification.
The suggested transformation method is Box-Cox (KUTNER et al., 2005), as it corrects biases in the distribution of errors and unequal variances. To carry out the transformation, a range of λ values is selected, searching for the transformation that best approximates the data to normality. The likelihood function for λ was estimated with the Box-Cox transformation function implemented in R.
In this case, the likelihood function is maximized at λ = 0.3838, which suggests the natural logarithm transformation of the data. When applying the transformation and running the Shapiro-Wilk normality test, a p-value of 0.5354 was obtained; at the 95% confidence level, the hypothesis of normality of residuals is accepted and the homoscedasticity assumption is verified.
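For reference, the one-parameter Box-Cox family can be sketched in Python as below. The function follows the standard definition, (y^λ - 1)/λ for λ ≠ 0 and ln(y) for λ = 0; the yield values are hypothetical, and only λ = 0.3838 comes from the text. The sketch does not reproduce the likelihood maximization performed in R.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return math.log(y)  # limiting case as lambda tends to 0
    return (y ** lam - 1.0) / lam


# Hypothetical yields (ton/ha/month), for illustration only.
yields = [0.9, 1.38, 1.59, 2.23]

for value in yields:
    print(value, round(box_cox(value, 0.3838), 4), round(math.log(value), 4))
```

Printing both columns side by side shows how far the λ = 0.3838 transform is from the logarithmic one for values in this range.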
To confirm the homoscedasticity assumption, the Breusch-Pagan test was applied to test the equality of variances. A p-value of 0.228 was obtained, so the null hypothesis is not rejected. Finally, low correlations were found between the explanatory variables of the proposed models, which validates the assumption of non-multicollinearity between regression variables.
For the independence assumption, the Durbin-Watson test was applied, with the null hypothesis that the autocorrelation is zero against the alternative that it is nonzero. The p-value is 6.306e-13, so the alternative hypothesis was accepted. This result is due to the nature of the data, which show serial autocorrelation.
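The statistic behind this test is simple to compute. The sketch below implements the standard Durbin-Watson statistic, the sum of squared successive differences of the residuals divided by their sum of squares; values near 2 indicate no first-order autocorrelation, values near 0 positive autocorrelation and values near 4 negative autocorrelation. It does not compute the p-value, and the residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order serial autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly, as in a seasonal monthly series.
persistent = [0.30, 0.25, 0.20, 0.10, -0.10, -0.20, -0.25, -0.30]

print(f"DW = {durbin_watson(persistent):.3f}")
```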
The fit of the models is low, which reveals that climatic variables alone do not fully explain yield (ton ha⁻¹ month⁻¹); other variables such as agrological soil class, fertility level, soil management, and management and availability of water resources should also be considered. Additionally, each of the tests explained above was applied to each of the multiple regression models to verify assumptions, with results and conclusions similar to those obtained for station 21035040.
Figure 6 shows the yield associated with climatic variables for stations 21035040 and 21105040, together with the stationarity, seasonality and decomposition of the yield series (ton ha⁻¹ month⁻¹) over the analyzed time horizon. A pattern with annual periodicity was observed, with higher peaks in the first quarter of the year and lower values in the third quarter, and a slight trend from 2011 that continues until 2014, when the maximum yield of the series exceeds 2.23 ton ha⁻¹ month⁻¹ of passion fruit. The series was then decomposed into its seasonal, trend and residual parts.
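A classical additive decomposition of this kind can be sketched without external libraries. The version below assumes an additive model, a centered 2 × 12 moving average for the trend and monthly means of the detrended series for the seasonal part; the source does not state which decomposition routine was used. The series is synthetic, built from a hypothetical linear trend plus a hypothetical zero-sum monthly pattern.

```python
def decompose_additive(series, period=12):
    """Split a series into trend, seasonal and residual parts (additive model)."""
    n = len(series)
    if period % 2 != 0 or n < 2 * period:
        raise ValueError("needs an even period and at least two full periods")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        # Centered 2 x period moving average: the two end points get half weight.
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    raw = [sums[k] / counts[k] for k in range(period)]
    offset = sum(raw) / period  # center the seasonal pattern on zero
    seasonal = [raw[t % period] - offset for t in range(n)]
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: hypothetical trend plus a zero-sum monthly pattern.
pattern = [0.4, 0.5, 0.3, 0.1, 0.0, -0.2, -0.5, -0.4, -0.2, -0.1, 0.0, 0.1]
series = [1.0 + 0.01 * t + pattern[t % 12] for t in range(48)]

trend, seasonal, residual = decompose_additive(series)
print([round(s, 2) for s in seasonal[:12]])
```

The first and last six months have no trend estimate because the centered window does not fit there.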

ARIMA model
The passion fruit yield series is stationary, a result corroborated by the Dickey-Fuller test, in which the null hypothesis of non-stationarity is rejected at the 5% significance level (Table 4).

Table 4. Dickey-Fuller statistic to test the stationarity of the series.
Dickey-Fuller statistic = -6.463
Lag order = 4
P-value = 0.01

Using the auto.arima function of the forecast library, the best ARIMA model for the analyzed yield variable is (Table 5):

AIC = -341.78
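In an ARIMA (p, 1, q) specification, d = 1 means that the model is fitted to the first differences of the series. For a monthly series with annual periodicity such as the one analyzed here, a common diagnostic is the autocorrelation of those differences at lag 12. The sketch below shows both steps on a synthetic series (hypothetical trend and monthly pattern); it is an illustration of the diagnostic, not a reproduction of the auto.arima search.

```python
def difference(series, lag=1):
    """Return the lag-differenced series y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((v - mean) ** 2 for v in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Synthetic monthly series: hypothetical trend plus a repeating annual pattern.
pattern = [0.4, 0.5, 0.3, 0.1, 0.0, -0.2, -0.5, -0.4, -0.2, -0.1, 0.0, 0.1]
series = [1.0 + 0.01 * t + pattern[t % 12] for t in range(96)]

first_diff = difference(series, lag=1)
print(f"ACF of first differences at lag 12: {autocorrelation(first_diff, 12):.3f}")
```

A large value at lag 12 after first differencing indicates that seasonal structure remains and that a seasonal term or a seasonal difference may be worth testing.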
The residual plot of the second model (Fig. 7) corrects distortions of the original model, and its AIC is lower than that of ARIMA (0,1,0), which is why this model was selected as the best model. In the residual plots of the passion fruit yield models ARIMA (0,1,0) and ARIMA (12,1,0), a clear pattern is present in the ACF / PACF, and it repeats at lag 12. This suggests that the models may improve with a different specification, such as p = 12 or q = 12.

Robust regression models
Robust regression models were defined considering yield as a function of the explanatory variable temperature.
Using Jupyter from the Anaconda distribution, the following results were obtained (Table 6). These results show that the fit of the models is minimal, which indicates that these techniques are not adequate for the data under analysis.
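For a single explanatory variable, the Theil-Sen idea reduces to the classic Sen estimator: the slope is the median of the slopes of all pairs of points. The sketch below implements that simple case, taking the intercept as the median of y - slope·x, which is one common convention; it is not the multivariate implementation used by scikit-learn. The temperature and yield values are hypothetical and include one deliberate outlier.

```python
import statistics
from itertools import combinations


def theil_sen(x, y):
    """Median-of-pairwise-slopes estimator for simple linear regression."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Hypothetical data: yield rises 0.1 per degree, with one outlier at the end.
temperature = [20.0, 21.0, 22.0, 23.0, 24.0]
crop_yield = [1.0, 1.1, 1.2, 1.3, 5.0]

slope, intercept = theil_sen(temperature, crop_yield)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The outlier barely affects the fitted line, which is the property that motivates robust estimators; a least squares fit on the same data would be pulled strongly upward.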

Multilayer Perceptron (MLP)
Given the low fit of the multiple regression and the results of the correlation matrix between variables, it was suggested that these relationships may be nonlinear. For this reason, the multilayer perceptron was considered as a nonlinear multiple regression method. Input neurons are defined through the training data for the different agroclimatic stations analyzed, whose variables are relative humidity, temperature, sunlight, rainfall and the ENSO index.
Using the MLPRegressor class of the Python module sklearn.neural_network (SCIKIT-LEARN.ORG, 2019), an MLP with 3 hidden layers was estimated, each with as many neurons as input variables, with activation function f(x) = max(0, x). The solution method is an optimizer of the quasi-Newton family, and the number of iterations for the MLP estimation is 1500. The yield results obtained for each of the selected stations are shown in Table 7. According to the results, the multilayer perceptron models have an average mean absolute deviation of 0.36, which indicates that accuracy and fit are very low, given the deviation of the forecast from the yield observed at each of the selected stations.
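To make the structure of such a network concrete, the following sketch computes the forward pass of a small MLP regressor with the activation f(x) = max(0, x) in the hidden layers and no activation in the output layer. It uses a single hidden layer of two neurons for brevity, and all weights, biases and inputs are hypothetical; it does not train the network or reproduce the fitted models of Table 7.

```python
def relu(values):
    """Element-wise f(x) = max(0, x)."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, inputs):
    """Weighted linear sum plus bias for each neuron of a layer."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)
    ]


def mlp_predict(x, hidden_layers, output_layer):
    """Forward pass: ReLU hidden layers, identity (linear) output layer."""
    activation = x
    for weights, bias in hidden_layers:
        activation = relu(dense(weights, bias, activation))
    out_weights, out_bias = output_layer
    return dense(out_weights, out_bias, activation)[0]


# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output.
hidden_layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0])]
output_layer = ([[2.0, 1.0]], [0.1])

print(mlp_predict([3.0, 1.0], hidden_layers, output_layer))
```

In scikit-learn, the configuration described in the text would presumably correspond to MLPRegressor(hidden_layer_sizes=(5, 5, 5), activation="relu", solver="lbfgs", max_iter=1500), although the exact call is not given in the source.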

Mathematical Models to Predict Low Yield with Mechanistic Approach
The $K_{yi}$ and $\lambda_i$ values estimated for the Stewart and Jensen models in the pre-flowering stage, using the $K_c$ values reported in the literature for the passion fruit crop (DE LIMA CORREA, 2004; TORRENTE, 2009), are shown in Fig. 8. The sensitivity indexes associated with the effect of irrigation on yield differ for each year of the analysis period. With the $K_c$ obtained by De Lima, $K_{yi} < 1$, which means that the crop is more tolerant to water deficit and partially recovers from stress, showing proportionally smaller yield reductions as water use decreases (FAO, 2012). In general, for passion fruit cultivation, the behavior is associated with $K_{yi} < 1$ for the $K_c$ values proposed by both De Lima and Torrente.
For the flowering stage, Figure 9 shows the calculated values of the sensitivity indexes of the water-yield models.
With the $K_c$ proposed by De Lima, the sensitivity index $K_{yi}$ is lower than 1, while with the $K_c$ proposed by Torrente, $K_{yi}$ is greater than 1 only for the year 2007. In general, as in the pre-flowering stage, the passion fruit crop is tolerant to water deficit during the flowering stage. Finally, during the harvest stage (Fig. 10), $K_{yi} > 1$, which indicates that the $K_c$ values interpret the crop yield conditions as a function of water needs, confirming that the crop is tolerant to water stress during all phenological stages. The sensitivity index constants for the phenological phases of the passion fruit crop, calculated for the Stewart and Jensen equations under the mechanistic approach, are described below (Table 8):
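How the sensitivity indexes translate into yield can be sketched as follows. The functions assume the FAO single-stage relation 1 - Ya/Yx = Ky (1 - ETa/ETx), the multiplicative Jensen form (product over stages of the evapotranspiration ratio raised to λ) and an additive form commonly attributed to Stewart (1 minus the sum over stages of Ky times the relative evapotranspiration deficit). The Stewart form in particular is an assumption about the equation used in the study, and all numerical values are hypothetical rather than taken from Table 8.

```python
import math


def fao_actual_yield(y_max, ky, et_actual, et_max):
    """FAO relation solved for actual yield: Ya = Yx * (1 - Ky * (1 - ETa/ETx))."""
    return y_max * (1.0 - ky * (1.0 - et_actual / et_max))


def jensen_relative_yield(et_ratios, lambdas):
    """Jensen form: product over stages of (ETa_i / ETck_i) ** lambda_i."""
    return math.prod(r ** lam for r, lam in zip(et_ratios, lambdas))


def stewart_relative_yield(et_ratios, ky_stages):
    """Assumed additive Stewart form: 1 - sum of Ky_i * (1 - ETa_i / ETck_i)."""
    return 1.0 - sum(k * (1.0 - r) for r, k in zip(et_ratios, ky_stages))


# Hypothetical ETa/ETck ratios for pre-flowering, flowering and harvest.
et_ratios = [0.90, 0.80, 0.95]
lambdas = [0.3, 0.5, 0.2]    # hypothetical Jensen indexes
ky_stages = [0.5, 0.8, 1.1]  # hypothetical Stewart indexes

print(f"Jensen  Ya/Yck = {jensen_relative_yield(et_ratios, lambdas):.3f}")
print(f"Stewart Ya/Yck = {stewart_relative_yield(et_ratios, ky_stages):.3f}")
print(f"FAO Ya = {fao_actual_yield(2.0, 1.0, 80.0, 100.0):.2f} ton/ha/month")
```

With an index below 1 a given evapotranspiration deficit produces a proportionally smaller yield loss, and with an index above 1 a proportionally larger one.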

Comparison of empirical and mechanistic models
Empirical models such as multiple linear regression describe yield through explanatory variables, while ARIMA describes it through the history of the series: its autoregressive components, moving averages and trend smoothing. According to the results, and due to the low fit of the regression forecast models, their use is recommended for descriptive purposes only. In the same comparative analysis, it was observed that in the initial years the regression model tends to underestimate the highest yield peaks, since it cannot take temporal variations into account; this is one of the advantages of the ARIMA model over multiple linear regression (Fig. 11).
Regarding mechanistic models, the best performance was obtained for the Stewart model.
The climatic conditions affect crop ecophysiology, and by understanding and managing these variables, the environmental offer in association with the cultural management of farmers can be optimized (FISCHER; MELGAREJO, 2017).
Under tropical conditions, temperature is the variable that most significantly explains crop yield, in accordance with other investigations (CLEVES et al., 2016; MAYORGA et al., 2020). It was also evidenced that crop evapotranspiration significantly influences aspects such as crop quality and yield (FISCHER; CUTLER, 2018).
This article contributes to technical and scientific knowledge through empirical models constructed from data, initially considering multivariate models to determine the relationship between the independent variables temperature, relative humidity, rainfall, sunlight and the ONI index and the dependent variable yield. With respect to the literature, new analysis variables are incorporated, such as the ONI index, sunlight and relative humidity. The document also contributes to discriminating among the different methodologies reported in the literature for yield forecasting, namely mechanistic and empirical mathematical models. Among mechanistic models, it was possible to determine the water sensitivity index and the recommended crop coefficient for passion fruit; among empirical ones, machine learning models such as robust regression and multilayer perceptron were tested. This knowledge allows establishing the best methodology to predict crop yield.
For future studies, it is suggested to estimate specific models and to carry out all the necessary error and coefficient tests for each variable, in order to obtain models with higher fit coefficients, improve their stability and provide a valid basis for prediction, so that forecast results can be considered acceptable and close to future reality.

Conclusions
Some alternatives to be considered in the future are the analysis of ARIMA models with the inclusion of explanatory variables, the use of Multiple Linear Regression models with lagged data and the modeling of Multiple Series, including in this case other series corresponding to some variables (evaporation, transpiration, evapotranspiration etc.).
Missing data for the main climatological variables limit the application of models, and the lack of coordination among the public entities that administer the data is evident, which affects the quality of results.
The use of methodologies and models that evaluate the effect of agroclimatic factors on yield is a fundamental tool for decision making by farmers, contributing to reduce uncertainty.
Inferential statistics for the construction of models allow the deduction and estimation of phenomena from available information. The modeling of data patterns allows the researcher to draw inferences and confirm hypotheses about the characteristics of future observations. The hypotheses established in this article, such as the relationship between yield and climatic variables, could not be confirmed with the analyzed information, given that the correlations between variables may be nonlinear in nature, whereas the methods applied to find them are based on linear relationships.
Univariate models such as ARIMA better explain yield behavior, since a MAD of 0.007 was obtained, while the water-yield model performs better in explaining yield behavior as a function of evapotranspiration, which is related to temperature, rainfall and wind speed; its absolute deviation of errors was 0.309 using the Kc proposed by De Lima. The multivariate regression model is a tool that describes the behavior of the climatic variable system, but its predictive use is not recommended due to its low performance. The multilayer perceptron had a MAD of 0.367.