Computational intelligence to study the importance of characteristics in flood-irrigated rice

ABSTRACT.

The study of traits in crops enables breeders to guide selection strategies and accelerate the progress of genetic breeding. Although the simultaneous evaluation of characteristics in a plant breeding programme provides large quantities of information, identifying which phenotypic characteristic is the most important is a challenge facing breeders. Thus, this work aims to quantify the best approaches for prediction and establish a network of better predictive power in flood-irrigated rice via methodologies based on regression, artificial intelligence, and machine learning. Multiple regression, computational intelligence, and machine learning were used to predict the importance of the characteristics. Computational intelligence and machine learning were notable for their ability to extract nonlinear information from model inputs. Predicting the relative contribution of auxiliary characteristics through computational intelligence and machine learning proved to be efficient in determining the relative importance of variables in flood-irrigated rice. In this study, the characteristics indicated to assist in decision making are flowering, number of filled grains per panicle, and panicle length. A network with a single hidden layer of 15 neurons was observed to be efficient in determining the relative importance of variables in flood-irrigated rice.

Keywords:
Oryza sativa L.; multiple regression; computational intelligence; machine learning.

Introduction

Plant breeding is effective in increasing the productivity of crops. The primary objective of plant breeding is to increase the frequency of favourable alleles in plant populations such that superior crops are developed with high productivity, resistance to diseases and pests, tolerance to abiotic stresses, and superior adaptation to environments (Yu, Campbell, Zhang, Walia, & Morota, 2019).

In general, productivity prediction is performed using multiple linear regression. Although useful, multiple regression models have some limitations, such as their dependence on sample size. Specifically, when the number of observations is less than the number of parameters, it is not possible to obtain the estimates using the usual estimation methods. Additionally, such models cannot fit the complex nonlinear relationships that may exist in some data sets. Artificial neural networks (ANNs) provide an interesting alternative because they can capture nonlinear relationships between predictors and responses (Gianola, Okut, Weigel, & Rosa, 2011; Skawsang, Nagai, Nitin, & Soni, 2019) and do not require prior assumptions about the data sets.

The application of artificial intelligence, such as ANNs, allows the capture of nonlinear effects in the data set and has been used in prediction studies in plant breeding (Silva et al., 2014; Silva et al., 2017; Sant'Anna et al., 2019). However, although ANNs are powerful predictive tools compared to conventional models, such as multiple linear regression (Paruelo & Tomasel, 1997; Olden & Jackson, 2002; Beck, 2018), they have the limitation of not quantifying the importance of the variables.

Quantifying the importance of variables for prediction in breeding programmes allows for faster progress when selecting and predicting characteristics that have low heritability and/or are difficult to measure. Although simultaneous evaluation of characteristics provides a wide variety of information, identifying which predictor variable is most important is a challenge for breeders (Parmley, Higgins, Ganapathysubramanian, Sarkar, & Singh, 2019). The importance of variables can be quantified with ANNs through algorithms such as that of Goh (1995), who proposed a modification of Garson's (1991) algorithm that consists of partitioning the neural network connection weights to determine the relative importance of each variable entering the network.

Other interesting alternatives for studies of the prediction and importance of variables are methodologies based on machine learning, such as decision trees (Beucher, Møller, & Greve, 2019; Parmley et al., 2019) and their refinements, such as bagging (Degenhardt, Seifert, & Szymczak, 2019), random forest, and boosting (Degenhardt et al., 2019). Such methodologies allow good predictions, and the importance of the characteristics can be obtained through measures based, for example, on the Gini index and entropy (Hastie, Tibshirani, & Friedman, 2009). These methodologies enable the quantification of the impact of the disruption or disturbance of the input information on the estimate of the determination coefficient.

Methodologies based on regression, artificial intelligence, and machine learning have been used successfully in prediction studies. Parmley et al. (2019) evaluated high-dimensional phenotypic characteristics of soybeans through a machine learning approach to predict seed yield, with a view to the prescriptive development of cultivars for agricultural practices. Skawsang et al. (2019) applied such methodologies to predict the population of insect pests using climatic and phenological factors of the host plant. However, there are no studies in the literature related to yield prediction and verification of the importance of variables for grain yield in rice crops. Unlike regression methods, artificial intelligence and machine learning make no prior assumptions about the data structure and capture both linear and nonlinear dependencies between the predictor and response variables, which makes them suitable tools for the researcher.

Given the above, this work aims to i) predict grain yield, grain length and width ratio, and panicle length in flood-irrigated rice through regression, artificial intelligence, and machine learning methodologies; ii) quantify the best approaches to prediction; and iii) establish a network of better predictive power in flood-irrigated rice.

Material and methods

Description of the experiment

The experiments were carried out in the State of Minas Gerais, Brazil, in the experimental fields of the Agricultural Research Corporation of Minas Gerais (EPAMIG), in the municipalities of Leopoldina (21°31'48.01'' S, 42°38'24'' W), Lambari (21°58'11.24'' S, 45°20'59.6'' W), and Janaúba (15°48'77'' S, 43°17'59.09'' W). Seventy-five genotypes belonging to the flood-irrigated rice breeding programme were evaluated in the agricultural year 2012/2013. The design was randomized blocks with three replications.

The evaluated characteristics were grain yield (GY, kg ha-1), panicle length (PL, cm), and grain length-to-width ratio (LGW), which were used as response variables. The remaining characteristics were used as explanatory variables (inputs): plant height (HP, cm), flowering (FL, days), lodging (LO), number of full grains per panicle (GP), percentage of full grains (FG, %), tillering (TI), grain length (GL, mm), grain width (GW, mm), grain thickness (GT, mm), and weight of 100 grains (WG, g). These variables were used to build the artificial neural networks for the flood-irrigated rice genotypes in the State of Minas Gerais.

Methodologies for predicting and verifying the importance of characteristics

Multiple regression

Multiple regression, through the stepwise strategy (Ghani & Ahmad, 2010), was used to predict the response variables grain yield, panicle length, and grain length-to-width ratio as a function of the other measured variables, which were considered explanatory.

The adopted model is represented by Equation 1:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon, \qquad (1)$$

where y is the response variable (grain yield, panicle length or grain length-to-width ratio), x1 to xk are the explanatory variables, β0 represents the intercept, β1 to βk are the linear coefficients associated with x1 to xk, and ε is the residual effect.

The estimate of the coefficient of determination R2 was used to verify how much of the total variation of the dependent variable is explained by the independent variables.

The description of R2 is found in Equation 2:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad (2)$$

where yi is the observed value, ŷi is the predicted value, ȳ is the mean of the observed values, and n is the number of observations.
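As a minimal sketch of Equation 2, the coefficient of determination can be computed directly from the observed and predicted values. The function name `r_squared` and the toy values below are illustrative assumptions and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    n = len(observed)
    mean_observed = sum(observed) / n
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - ss_residual / ss_total


# Toy values (hypothetical, for illustration only).
toy_observed = [1.0, 2.0, 3.0, 4.0]
r2_perfect = r_squared(toy_observed, toy_observed)   # predictions equal observations
r2_mean_only = r_squared(toy_observed, [2.5] * 4)    # predicting the mean everywhere
```

A model that reproduces the observations gives an R2 of 1, whereas a model that always predicts the mean gives an R2 of 0, which is why larger estimates are read as better fit.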

Artificial intelligence

For better network efficiency, before training and validation, the data were normalized to the range between -1 and 1. In each location, the training data set comprised 2/3 of the phenotypic information: the information from two of the three replications was aggregated for training, and the information from the remaining replication was used as the validation set. In this k-fold cross-validation strategy (k = 3 partitions), the individuals from each replication participated in the validation data set once.
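The normalization and the replication-based partitioning described above can be sketched as follows. This is an illustration under simple assumptions (min-max scaling per variable and one replication label per plot); the helper names `scale_to_unit_range` and `replication_folds` are hypothetical.

```python
def scale_to_unit_range(values):
    """Min-max scaling of one variable to the interval [-1, 1]."""
    lowest, highest = min(values), max(values)
    return [2.0 * (v - lowest) / (highest - lowest) - 1.0 for v in values]


def replication_folds(replication_labels):
    """Return (training_indices, validation_indices) pairs.

    Each replication is held out once for validation while the
    remaining replications form the training set (k = 3 when there
    are three replications).
    """
    folds = []
    for held_out in sorted(set(replication_labels)):
        training = [i for i, r in enumerate(replication_labels) if r != held_out]
        validation = [i for i, r in enumerate(replication_labels) if r == held_out]
        folds.append((training, validation))
    return folds


scaled = scale_to_unit_range([10.0, 15.0, 20.0])
folds = replication_folds([1, 2, 3, 1, 2, 3])  # two plots per replication
```

With three replications, each fold trains on two thirds of the plots and validates on the remaining third.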

Multilayer Perceptron - PMC

The maximum number of training epochs was set at 5,000; the mean square error (MSE), used as the criterion to stop training the network, was defined as 1.0 × 10−3. All trained networks had one neuron in the output layer and a single hidden layer with 15 neurons. The hyperbolic tangent sigmoid activation function was used in the hidden layer, and the training algorithm was Bayesian regularization backpropagation. The efficiency of the prediction was quantified by the R2.

Importance of variables

To quantify the importance of variables through the PMC network, two techniques were used. The first is based on the Garson (1991) algorithm modified by Goh (1995) (AG), which consists of partitioning the neural network connection weights to determine the relative importance of each input variable within the network. This algorithm describes the relative magnitude of the importance of the descriptors (predictors) in their connection with the outcome variables through the dissection of the synaptic weights of the neural network. In the second technique, the importance of the variables (inputs) is assessed through the impact that the disruption or disturbance of the information of a given input has on the estimate of the determination coefficient. Thus, this importance is estimated by exchanging the phenotypic values of each variable, or by making them constant, and verifying the changes in the estimates of the R2. When the values of a variable are disturbed and R2 decreases, there is an indication that the input variable is important relative to the others for purposes of prediction with the network already established.
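The weight-partitioning idea can be illustrated with a short sketch for a network with one hidden layer and a single output neuron. It uses one common formulation of Garson's algorithm (the absolute share of each input within a hidden neuron, weighted by the absolute hidden-to-output weight); this is an assumption and may differ in detail from the implementation used in the study. The function name and the weights are hypothetical.

```python
def garson_importance(input_hidden, hidden_output):
    """Relative importance of each input from absolute connection weights.

    input_hidden[i][j]: weight from input i to hidden neuron j.
    hidden_output[j]:   weight from hidden neuron j to the single output.
    The returned importances sum to 1.
    """
    n_inputs = len(input_hidden)
    n_hidden = len(hidden_output)
    totals = [0.0] * n_inputs
    for j in range(n_hidden):
        # Share of each input in the absolute weights entering neuron j.
        column_sum = sum(abs(input_hidden[i][j]) for i in range(n_inputs))
        for i in range(n_inputs):
            share = abs(input_hidden[i][j]) / column_sum
            totals[i] += share * abs(hidden_output[j])
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Hypothetical weights: two inputs, two hidden neurons, one output.
weights_in = [[1.0, 1.0],
              [3.0, 1.0]]
weights_out = [1.0, 1.0]
importance = garson_importance(weights_in, weights_out)
```

Because only absolute values are used, this measure ranks inputs by magnitude of contribution and does not indicate the direction of their effect.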

Radial Basis Function network - RBF

The radial basis function network is characterized by having only one hidden layer and making use of the Gaussian activation function (Cruz & Nascimento, 2018). The structure of the RBF to better predict grain yield, panicle length, and grain length-to-width ratio was established with 10 to 30 neurons (increased by 2 with each processing) and a radius between 5 and 15 (increased by 0.5). The efficiency of the prediction was measured by the R2, and the relative importance of each input was measured by the technique of disturbing the information of each explanatory variable, as already described for PMC.
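The search space implied by these ranges can be enumerated as below. The sketch assumes that every neuron count was combined with every radius (a full grid), which the text does not state explicitly; `best_configuration` and the scoring function passed to it are hypothetical placeholders for fitting an RBF network and returning its R2.

```python
# Hidden-layer sizes: 10, 12, ..., 30. Radii: 5.0, 5.5, ..., 15.0.
neuron_counts = list(range(10, 31, 2))
radii = [5.0 + 0.5 * step for step in range(21)]

# Assumption: every neuron count is paired with every radius.
configurations = [(n, r) for n in neuron_counts for r in radii]


def best_configuration(score):
    """Return the (neurons, radius) pair with the highest score.

    `score` is a caller-supplied function, e.g. one that trains an RBF
    network with the given settings and returns its validation R2.
    """
    return max(configurations, key=lambda config: score(*config))


# Dummy score used only to exercise the search; not a real network.
example_best = best_configuration(lambda n, r: -abs(n - 20) - abs(r - 7.5))
```

Under the full-grid assumption, 11 neuron counts and 21 radii give 231 candidate structures per response variable and environment.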

Machine learning

To predict grain yield, panicle length, and grain length-to-width ratio and to quantify the importance of variables through a machine learning approach, a decision tree and its refinements (random forest, bagging, and boosting) were used. The R2 measured the quality of the predictive model fit, and the mean squared error (MSE) was used to quantify the importance of variables in flood-irrigated rice crops. The mean squared error was estimated as described in Equation 3 below:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad (3)$$

where y i and ŷ i correspond to the observed and predicted values of observation in genotype i, respectively, and n is the total number of observations (variable, depending on the environment analysed).

In these techniques, the importance of an explanatory variable is quantified by the mean decrease in prediction precision, expressed as the percentage increase in the mean squared error (IMSE). This measure is obtained by exchanging (permuting) the values of each variable in the data set and comparing the resulting prediction with that of the original, unchanged data set. Analogous to regression analysis, it is the average increase in the squared residuals of the data set when the variable is exchanged (Li & Zha, 2019). Higher values of IMSE indicate greater importance of the variable. For better efficiency in estimating the importance of variables, 5,000 trees were generated.
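A minimal sketch of the IMSE idea follows. It permutes one explanatory variable at a time and reports the mean percentage increase in MSE relative to the unpermuted data. For brevity, a fixed prediction function stands in for the fitted tree ensemble, and the data, the number of repeats, and the names `mse` and `percent_increase_mse` are assumptions made for illustration.

```python
import random


def mse(observed, predicted):
    """Mean squared error, as in Equation 3."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


def percent_increase_mse(predict, rows, observed, n_repeats=20, seed=1):
    """Mean percentage increase in MSE after permuting each column."""
    rng = random.Random(seed)
    baseline = mse(observed, [predict(row) for row in rows])
    increases = []
    for col in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen column exchanged.
            permuted = [row[:col] + (value,) + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            permuted_mse = mse(observed, [predict(row) for row in permuted])
            total += 100.0 * (permuted_mse - baseline) / baseline
        increases.append(total / n_repeats)
    return increases


# Hypothetical data: the response depends on the first variable only.
rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 6.0), (4.0, 1.0), (5.0, 4.0), (6.0, 2.0)]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
observed = [2.0 * row[0] + e for row, e in zip(rows, noise)]
predict = lambda row: 2.0 * row[0]  # stand-in for a fitted model

imse = percent_increase_mse(predict, rows, observed)
```

In this toy case, permuting the first variable inflates the error, whereas permuting the second, which the stand-in model ignores, leaves the error unchanged.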

The analyses were performed with the aid of R software using the NeuralNetTools (Beck, 2018) and Genes (Cruz, 2016) packages, which use an interface with MATLAB software (Matlab, 2016).

Results and discussion

Prediction by different approaches

The estimates of R2 for all methodologies using the explanatory variables to predict grain yield (GY), panicle length (PL), and grain length-to-width ratio (LGW) in flood-irrigated rice are shown in Figure 1. Based on Figure 1, it is possible to compare and define the variables that proved to be most efficient for the prediction of GY, PL, and LGW. Higher values of this estimate indicate that the target prediction variable has a better fit than the other explanatory variables (Roy & Roy, 2008; Hassanzadeh, Ghavami, & Kompany-Zareh, 2015). Among the methodologies used in this study, multiple regression showed a lower estimate of R2 (Figure 1) for the same variable, indicating the existence of nonlinear associations between the explanatory variables that were not considered in the model. Artificial intelligence and machine learning methodologies, in turn, stood out for their ability to extract nonlinear information from model inputs (Parmley et al., 2019; Skawsang et al., 2019), as seen in Figure 1. Other authors have already highlighted the ability of neural networks to better capture nonlinear relationships when compared to conventional methodologies (Silva et al., 2014; Sant'Anna et al., 2019).

Figure 1
Maximum estimate of the coefficient of determination in three environments to predict grain yield (GY), panicle length (PL), and grain length and width ratio in flood-irrigated rice (LGW). A: panicle length; B: grain yield; C: grain length-to-width ratio; RG: multiple regression; PMC: multilayer perceptron; RBR: radial base network; AD: decision tree; FA: random forest; BA: bagging; BO: boosting.

The results obtained by the different approaches show that there was a discrepancy between the maximum estimates of R2 for all predictive variables in the same environments (Figure 1). In the Leopoldina environment, the artificial intelligence approach provided the highest estimates for the predictive variables PL and GY with the RBF procedure, 83.44 and 78.90%, respectively. The GY response variable had the best estimate of R2 in the Lambari and Janaúba environments with the PMC network with only one neuron in the output layer and a single hidden layer (Figure 1). In the Leopoldina and Lambari environments, for the LGW response variable, the maximum estimate of R2 was approximately 100% by the multiple regression and artificial intelligence approaches. In Janaúba, on the other hand, the maximum estimate for this variable was 62%. The differences in the results obtained in these analyses indicate that the environment influences the estimation of R2 and, consequently, the cause-and-effect relationships between the response variable and the set of explanatory variables.

Machine learning approaches proved to be more efficient than other approaches (Figure 1). There was a low estimate of R2 for the predictive variable GY in the Janaúba environment with the random forest procedure, which corresponds to 18.57%. This result is inferior to all the approaches used in this study. In this same environment, but with the bagging procedure, the estimate of R2 was 94.76%. High estimates of R2 (above 80%) were obtained using machine learning methodologies with the bagging and boosting procedures for all predictive variables (Figure 1). The decision tree (AD) and random forest methodologies did not stand out from the other machine learning procedures (Figure 1). Sousa et al. (2020) emphasized that the AD's low predictive accuracy can be improved using ensemble methods such as bagging, random forest, and boosting. These strategies combine multiple AD to reduce the variability.

Random forests and bagging have good predictive performance in practice; they work well for high-dimensional problems and can be used with multiclass output, categorical predictors, and imbalanced problems (Gregorutti, Michel, & Saint-Pierre, 2017). These authors obtained satisfactory variable selection results with the random forests algorithm in the presence of correlated predictors.

When the variables are correlated, the simple correlation coefficient produces incomplete information. This is because a high correlation between two variables may result from the effect of a third variable, or of a group of variables, on them. Traditional methods, such as path analysis, which decomposes correlations into direct and indirect effects on the main variable, and logistic regression become unstable in the presence of high correlations. Multicollinearity is caused by the high correlation between the variables, which leads to a lack of fit of the model that affects the estimates of the parameters. In the literature, the ability of ANNs to circumvent the problem of multicollinearity has already been highlighted (Cruz & Nascimento, 2018). These authors presented an application in which a response variable is predicted through five explanatory variables. The inclusion of a sixth explanatory variable, which assumed the same values as the fifth variable, did not affect the accuracy of the ANN (Adaline) in any way. However, they reinforce that in the classic multiple linear regression approach there would be no solution, since the predictor matrix X would have two linearly dependent columns, so that the established multicollinearity would lead to an X'X matrix without an ordinary inverse.
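The singularity of X'X under a duplicated column can be checked with a small exact computation. The sketch below uses an intercept and one explanatory variable, then adds an identical copy of that variable; the values and the names `gram_matrix` and `determinant` are illustrative assumptions and are not the data of Cruz and Nascimento (2018).

```python
from fractions import Fraction


def gram_matrix(columns):
    """X'X computed from the columns of X, using exact arithmetic."""
    return [[sum(Fraction(a) * Fraction(b) for a, b in zip(ci, cj))
             for cj in columns]
            for ci in columns]


def determinant(matrix):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = Fraction(0)
    for j, value in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * value * determinant(minor)
    return total


intercept = [1, 1, 1, 1]
x5 = [2, 4, 5, 7]   # hypothetical explanatory variable
x6 = list(x5)       # exact copy, as in the duplicated-variable example

det_without_copy = determinant(gram_matrix([intercept, x5]))
det_with_copy = determinant(gram_matrix([intercept, x5, x6]))
```

A zero determinant means X'X has no ordinary inverse, so the usual least squares estimator is not uniquely defined once the duplicated column is included.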

The efficiency of ANNs in prediction problems, given their ability to extract relevant information from large data sets and generalize relatively inaccurate information (Porwal, Carranza, & Hale, 2003), was very well expressed by the results obtained (Figure 1). The same can be seen for methodologies based on machine learning, which are capable of handling reduced or redundant information in the input variables (Quinlan, 1996). However, an analysis that is as important as the prediction itself, and which is often not carried out, is the identification of the most important explanatory variables, even though this constitutes important information for understanding the adjusted model and for decision making about dimensionality reduction in future studies (Beucher et al., 2019). Thus, after the prediction analysis, the importance of variables was quantified using artificial intelligence and machine learning methods to identify, among the set of explanatory variables, those that should be prioritized and identified as auxiliary characteristics in indirect responses to selection.

Importance of variables in prediction by the artificial intelligence approach

For ease of interpretation, we denote by R2 the quality of prediction of the methodology and by R2* this same quality of adjustment after the disturbance in the explanatory variable.

Multilayer Perceptron (PMC)

Neural networks tend to perform well when compared to other predictive algorithms based on machine learning (Santos, Dean, Weaver, & Hovanski, 2018). These algorithms are capable of learning from linear and nonlinear relationships in the data (Somers & Casal, 2009; Haddouche, Chetate, & Said Boumedine, 2018). They can also measure and incorporate direct effects and effects of interaction between variables in predictive models (Tsang, Cheng, & Liu, 2017).

The PMC network is widely used in the predictive process (Gedeon, Wong, & Harris, 1995; Santos et al., 2018), since several research groups have shown mathematically that, with only a single hidden layer, this network works very well with different numbers of neurons in the hidden layer (De Oña & Garrido, 2014; Santos et al., 2018).

The importance of the variables was quantified by assigning a zero value to the phenotypic information related to each variable and observing the resulting changes in the values of R2*. The results of the PMC network are shown in Table 1. It is important to note that, in this table, a reduction in R2* after assigning a zero value to the phenotypic information of a variable indicates that this variable is important relative to the others for prediction with the network already established.
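The zeroing strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `predict` function is a hypothetical stand-in for an already trained network, the data are toy values, and the coefficient of determination is assumed to be 1 - SS_res/SS_tot, which may differ from the R2* used in the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def zeroing_r2(predict, X, y):
    """Return {column index: R2 obtained after setting that column to zero}."""
    results = {}
    for col in range(len(X[0])):
        # Copy every row with the chosen column replaced by zero.
        X_zero = [row[:col] + [0.0] + row[col + 1:] for row in X]
        results[col] = r_squared(y, [predict(row) for row in X_zero])
    return results


# Hypothetical stand-in for a trained network: column 0 matters a lot,
# column 1 only a little.
def predict(row):
    return 2.0 * row[0] + 0.1 * row[1]


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 6.0]]
y = [predict(row) for row in X]  # toy targets, so the baseline R2 is 1

r2_after_zeroing = zeroing_r2(predict, X, y)
for col, r2 in r2_after_zeroing.items():
    print(f"column {col}: R2 after zeroing = {r2:.3f}")
```

In this toy case, zeroing column 0 lowers the coefficient of determination far more than zeroing column 1, which is the pattern read as greater importance in Table 1.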

Table 1
Estimates of the coefficient of determination (R2*) obtained with the PMC network for predicting grain yield, panicle length, and grain length-to-width ratio after perturbation (zero value assignment) of the explanatory variable values.

The results in Table 1 show large discrepancies in R2* among environments, which makes interpretation difficult. For the response variable LGW, the approach was effective in identifying grain length and width as important, given the reduction in the estimate of R2* when a zero value was assigned to their phenotypic information. Such changes must be read relative to the R2 of prediction, which was approximately 100% in Leopoldina and Lambari and 63% in Janaúba (Figure 1). For Leopoldina, for example, zeroing the variables HP, GL, and TI gave R2* values of 0.04, 0.52, and 1.70, respectively (Figure 1). This result shows that these variables are important in predicting GY because the disturbance in their values led to a considerable reduction in the quality of the fit. In Lambari, the variable with the highest contribution was FL. Regardless of the predicted variable, the PMC networks, with only one neuron in the output layer and a single hidden layer, agreed that the most important variables were grain width and length, given the sharp falls in the estimate of R2* observed when these variables were zeroed.

To overcome the difficulties faced when adopting PMC networks to study the importance of variables, an alternative is to use the Garson algorithm (GA), which partitions the ANN connection weights to determine the relative importance of each input variable within the network. The weights that connect neurons in an ANN are partially analogous to the coefficients in a generalized linear model (Beck, 2018), so that the combined effects of the weights on the model's predictions represent the relative importance of predictors in their associations with the variable being predicted. The large number of adjustable weights in an artificial neural network makes it very flexible in modelling nonlinear effects but imposes challenges for its interpretation. For this algorithm, the number of neurons that gave the maximum estimate of R2* was used, to obtain a better estimate of the relative contribution of the variables.
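As an illustration of weight partitioning, the sketch below follows one common formulation of a Garson-type calculation for a network with a single hidden layer and one output neuron: each input's share of the absolute weights entering a hidden neuron is scaled by the absolute weight leaving that neuron, and the totals are expressed as percentages. The exact variant used in this study is not reproduced here, and the weights are hypothetical values chosen only for the example.

```python
def garson_importance(w_in, w_out):
    """Relative importance (%) of each input from connection weights.

    w_in[i][j] is the weight from input i to hidden neuron j;
    w_out[j] is the weight from hidden neuron j to the single output.
    """
    n_inputs = len(w_in)
    contrib = [0.0] * n_inputs
    for j, v in enumerate(w_out):
        # Total absolute weight entering hidden neuron j.
        col_total = sum(abs(w_in[i][j]) for i in range(n_inputs))
        if col_total == 0.0:
            continue  # a neuron with no incoming weight contributes nothing
        for i in range(n_inputs):
            # Share of input i in neuron j, scaled by the outgoing weight.
            contrib[i] += abs(w_in[i][j]) / col_total * abs(v)
    total = sum(contrib)
    return [100.0 * c / total for c in contrib]


# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output.
w_in = [[0.8, -0.6],
        [0.2, 0.4]]
w_out = [1.0, -0.5]

importance = garson_importance(w_in, w_out)
for i, pct in enumerate(importance):
    print(f"input {i}: {pct:.2f}%")
```

Because only absolute values are used, this kind of partitioning ranks the inputs by magnitude of contribution and carries no information about the direction of their effect.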

The percentages of the relative contribution estimated by the GA method are described in Table 2. In this table, for the response variable GY, the results consistently pointed to plant height (HP), flowering (FL), and the number of full grains per panicle (GP) as the variables with the greatest relative contribution. For the response variable PL, the variable with the greatest relative contribution was grain yield (GY) in Leopoldina and Lambari; in Janaúba, however, the variable that stood out was the length and width of grains. For the response variable LGW, the percentages of the relative contribution revealed that grain length and grain width had the largest relative contribution. This result was expected since grain length and grain width are determinants of LGW. The results indicate that the GA approach is efficient in quantifying the importance of variables in studies involving PMC neural networks.

Table 2
Percentages of the relative contribution estimated by the method of Garson (1991) modified by Goh (1995) of 12 variables to predict grain yield, panicle length, and grain length and width ratio in flood-irrigated rice in three environments in the State of Minas Gerais.

Radial Basis Function Network (RBF)

The importance of flood-irrigated rice traits was also quantified by assigning a zero value to the information of each input variable after the RBF network was established; the results are described in Table 3. The values in this table are the R2* estimates obtained after perturbing the input variables, that is, after assigning a zero value to each explanatory variable in turn. With this strategy, drastic reductions in R2* were observed for the most important variables, grain length (GL) and grain width (GW), when the target prediction variable was LGW. For the other response variables, the results were highly discrepant in quantifying the true importance of the variables. When the response variable was GY, the variables with the greatest reduction in R2* in Janaúba were flowering (FL) - R2* = 23.80 and weight of 100 grains (WG) - R2* = 19.91; in Leopoldina, they were plant height (HP) - R2* = 21.26, grain width (GW) - R2* = 24.83, and weight of 100 grains (WG) - R2* = 24.25; and in Lambari, the most important variable using this approach was flowering (FL) - R2* = 28.43.

For the response variable PL, changes in R2* were observed in Leopoldina and Lambari for flowering (FL) - R2* = 47.77 and R2* = 46.76, respectively. In Leopoldina, the percentage of full grains (FG) - R2* = 25.51 also showed a drastic reduction in R2*. In Lambari, lower estimates of R2* were obtained for weight of 100 grains (WG) - R2* = 45.60. For Janaúba, the results show that the most important variables using the RBF network were grain width (GW) - R2* = 19.76 and weight of 100 grains (WG) - R2* = 23.11.

Therefore, there is some agreement between the results of the two computational intelligence methodologies, PMC networks and RBF networks.

Table 3
Estimates of the coefficient of determination for the prediction of grain yield, panicle length, and grain length-to-width ratio using the RBF network after assigning a zero value to the genotype information.

Importance of variables in prediction by the machine learning approach

Table 4 shows the averages of the relative contributions of the explanatory variables for predicting grain yield, panicle length, and grain length-to-width ratio, estimated as the percentage increase in mean squared error (IMSE). This measure is obtained by permuting the values of each variable in the data set and comparing the resulting prediction with that of the original, unpermuted data set. The interpretation differs from that of the computational intelligence methodologies (PMC and RBF networks), for which lower values of R2* indicated greater importance of a variable for the model. In the machine learning approach, the importance of an explanatory variable is related to the estimated average decrease in the accuracy of the model measured by IMSE, so that the higher this estimate, the greater the importance of the variable.
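The permutation idea behind IMSE can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: `predict` is a hypothetical stand-in for a fitted bagging, boosting, or random forest model, the data are simulated, and the increase is computed on the same data rather than on out-of-bag samples as tree-ensemble software typically does.

```python
import random


def mse(predict, X, y):
    """Mean squared error of the model on the given data."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def percent_increase_mse(predict, X, y, col, n_repeats=20, seed=0):
    """Average % increase in MSE after shuffling one column."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    total_increase = 0.0
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the data with only the chosen column permuted.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        total_increase += mse(predict, X_perm, y) - base
    return 100.0 * (total_increase / n_repeats) / base


# Hypothetical fitted model: column 0 is a strong predictor, column 1 a weak one.
def predict(row):
    return 3.0 * row[0] + 0.1 * row[1]


# Simulated data: targets follow the model plus small random noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [predict(row) + data_rng.gauss(0.0, 0.1) for row in X]

imse = [percent_increase_mse(predict, X, y, col) for col in range(2)]
for col, value in enumerate(imse):
    print(f"column {col}: IMSE = {value:.1f}%")
```

Shuffling the strong predictor inflates the error much more than shuffling the weak one, so it receives the higher IMSE, which is the direction of interpretation used for Table 4.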

Table 4
Average estimates of the relative contributions of the explanatory variables for predicting grain yield, panicle length, and grain length-to-width ratio in flood-irrigated rice using machine learning approaches in three environments in Minas Gerais.

Based on Table 4, the variables with the highest estimates in all machine learning methodologies were grain length (GL) and grain width (GW) when the target prediction variable was the grain length and width ratio (LGW), in all environments. For this same response variable, another variable with a high IMSE estimate was panicle length (PL) in Leopoldina and Lambari; in Janaúba, this variable was not among the most important, owing to its low IMSE percentage. On the other hand, weight of 100 grains (WG) and the number of full grains per panicle (GP) proved effective in predicting LGW by boosting. This procedure proved to be more consistent in predicting variables compared to the others.

The variable with the highest IMSE estimate when PL was the target prediction variable was plant height (HP) for Leopoldina and Lambari. In Janaúba, on the other hand, this variable did not stand out in predicting PL. In Leopoldina, another variable that stood out in predicting PL was the number of grains filled per panicle (GP), for all machine learning approaches. With PL as the response variable, GY presented the highest IMSE in Janaúba for the bagging procedure. For the boosting procedure and the same response variable, the results show discrepancies. Nevertheless, this procedure was more consistent in predicting the variable. With boosting and PL as the prediction target, the variables GP, GY, and LGW stood out in Leopoldina. In Lambari, other variables showed better performance in predicting PL, for example, GW, GY, and LGW, and in Janaúba, they were PL, GW, GP, GY, and LGW.

When the target prediction variable was GY, in Leopoldina, the variables with high IMSE percentage estimates were plant height (HP) and grain length (GL) in all machine learning procedures. In Lambari, on the other hand, the variable that stood out was panicle length (PL). In this environment, another variable that showed better predictive performance for GY was flowering (FL) in bagging and random forest. In the boosting procedure, the variables that stood out were HP, GL, PL, GP, WG, and LGW in all environments.

The literature has highlighted machine learning techniques as efficient tools for quantifying the relative importance of variables, in view of their simplicity, the absence of assumptions about the distribution of explanatory variables, and their robustness to quantity, redundancy, and environmental influences (Tan et al., 2014; Beucher et al., 2019). On the other hand, we verify this premise for the regression method. Random forests and bagging have good predictive performance in practice; they work well for high-dimensional problems and can be used with multiclass output, categorical predictors, and imbalanced problems (Gregorutti et al., 2017). These authors obtained satisfactory variable selection results with the random forests algorithm in the presence of correlated predictors.

Grain yield is a trait controlled by several genes and therefore shows quantitative inheritance (Freitas et al., 2007). Grain yield thus depends on the interaction of several yield components, for example, number of spikelets and grains per panicle, mass of a thousand grains, spikelet fertility index, and panicle length, which are controlled by genetic and environmental factors. The length of the panicle, the number of spikelets per panicle, the fertility of the spikelets, and the mass of a thousand grains directly affect grain yield (Evans & Bhatt, 1977). Thus, knowledge of these relationships can help breeders select new cultivars, which can increase the productivity and quality of grains and decrease the cost of production and the environmental impact.

The longer the flowering period in the rice crop, the more photoassimilates are produced and translocated to the grains and, consequently, the higher the grain yield. Thus, late-cycle cultivars tend to be more productive than early-cycle ones, since a greater amount of photoassimilates is translocated to the grains. According to Ntanos and Koutroubas (2002), productivity in rice has been explained by differences in the dynamics of the distribution of assimilates between organs during plant growth and development. These studies found that the production of dry matter and the translocation of photoassimilates contributed significantly to the development of grains in different cultivars and, consequently, had a direct relationship with grain yield.

Grain dimensions are the main determinants of grain weight, one of the three components of grain yield (number of panicles per plant, number of grains per panicle, and weight of grains); therefore, they are important characteristics that affect yield in rice. In plant breeding applications, grain size is generally assessed by the weight of the grain, which is positively correlated with various characteristics, including the length, width, and thickness of the grain (Fan et al., 2006). These characteristics also influence acceptability for consumers, and therefore, the size and shape of the rice grain are important target characteristics for breeders (Huang et al., 2012; Anacleto et al., 2015). Cultivars of the short and long types are highly preferred by many consumers in Japan, South Korea, and North China, while consumers in India, the USA, and other countries in South and Southeast Asia prefer long and medium grains (Misra et al., 2017).

Methodologies based on machine learning and computational intelligence do not depend on stochastic information and tend to be more efficient. These methodologies make no assumptions about the model but capture complex factors such as epistasis and dominance in prediction models. It is not necessary to know whether the data have these effects, and no assumptions about the distribution of phenotypic values are required (Sousa et al., 2020). Machine learning algorithms have the advantage of modelling data in a nonlinear and nonparametric manner (Osco et al., 2020). Unlike many traditional statistical methods, these algorithms are built to deal with noisy, complex, and heterogeneous data (Osco et al., 2019).

In this study, we compare different approaches to quantifying the importance of variables to identify relevant predictive variables within a regression problem. Additionally, we included in our comparison a traditional method that aims to find a small subset of important variables with ideal forecasting performance in flood-irrigated rice.

It is noteworthy that the 13 characteristics used in this study are laborious to obtain, and their evaluation can be costly when a large number of genotypes must be evaluated. In this context, identifying the most important characteristics for prediction is necessary, since it makes it possible to reduce physical effort, cost, labour, and time in experimentation (Ferreira et al., 2017).

Predicting the importance of flood-irrigated rice characteristics is of paramount importance for breeding programmes, as it makes genotype selection more practical, in addition to serving as a theoretical and practical framework in support of the recommendation of new cultivars. In practical terms, these results are consistent.

Therefore, our study presents the performance of some methodologies to evaluate the relative contributions of each variable through computational intelligence and machine learning in flood-irrigated rice culture. An approach to quantify the effect of explanatory variables on genetic improvement has successfully identified the true importance of each variable, including those that exhibit strong and weak correlations with the main variables, which in our case are grain yield, length of panicle and grain length-to-width ratio.

Researchers can now identify the individual and interactive contributions of the predictor variables to the rice crop using artificial intelligence and machine learning.

Conclusion

Computational intelligence and machine learning methodologies were able to quantify the importance of explanatory variables in the prediction of grain yield, grain length-to-width ratio, and panicle length in rice. In addition, artificial intelligence and machine learning are able to handle reduced or redundant information in the input variables. The characteristics able to assist in decision making are flowering, number of filled grains per panicle, and panicle length. The network with only one hidden layer with 15 neurons was efficient in determining the relative importance of variables in flooded rice.

Acknowledgements

The authors would like to thank the Research Support Foundation of the State of Minas Gerais, the National Council for Scientific and Technological Development, and the Coordination for the Improvement of Higher Education Personnel for the financial support, and Dr. Orlando Peixoto de Morais (in memoriam), researcher at Embrapa Rice and Beans. This study was financed in part by the Coordination for the Improvement of Higher Education Personnel - Brazil (CAPES) - Finance Code 001. The authors gratefully acknowledge the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) for the researcher fellowship to ICS (2018/26408-0).

References

  • Anacleto, R., Cuevas, R. P., Jimenez, R., Llorente, C., Nissila, E., Henry, R., Sreenivasulu, N. (2015). Prospects of breeding high-quality rice using post-genomic tools. Theoretical and Applied Genetics, 128(8), 1449-1466. DOI: https://doi.org/10.1007/s00122-015-2537-6
  • Beck, M. W. (2018). NeuralNetTools: Visualization and analysis tools for neural networks. Journal of Statistical Software, 85(11), 1-20. DOI: http://dx.doi.org/10.18637/jss.v085.i11
  • Beucher, A., Møller, A. B., & Greve, M. H. (2019). Artificial neural networks and decision tree classification for predicting soil drainage classes in Denmark. Geoderma, 352, 351-359. DOI: https://doi.org/10.1016/j.geoderma.2017.11.004
  • Cruz, C. D. (2016). Genes Software - extended and integrated with the R, Matlab and Selegen. Acta Scientiarum. Agronomy, 38(4), 547-552. DOI: http://dx.doi.org/10.4025/actasciagron.v38i4.32629
  • Cruz, C. D., & Nascimento, M. (2018). Inteligência computacional aplicada ao melhoramento genético. Viçosa, MG: Editora UFV.
  • De Oña, J., & Garrido, C. (2014). Extracting the contribution of independent variables in neural network models: a new approach to handle instability. Neural Computing and Applications, 25(3-4), 859-869. DOI: https://doi.org/10.1007/s00521-014-1573-5
  • Degenhardt, F., Seifert, S., & Szymczak, S. (2019). Evaluation of variable selection methods for random forests and omics data sets. Briefings in Bioinformatics, 20(2), 492-503. DOI: https://doi.org/10.1093/bib/bbx124
  • Evans, L. E., & Bhatt, G. M. (1977). Influence of seed size, protein content and cultivar on early seedling vigor in rice. Canadian Journal of Plant Science, 57(3), 929-935. DOI: https://doi.org/10.4141/cjps77-133
  • Fan, C., Xing, Y., Mao, H., Lu, T., Han, B., Xu, C., … Zhang, Q. (2006). GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theoretical and Applied Genetics, 112(6), 1164-1171. DOI: https://doi.org/10.1007/s00122-006-0218-1
  • Ferreira, M. G., Azevedo, A. M., Siman, L. I., Silva, G. H., Carneiro, C. S., Alves, F. M., … Nick, C. (2017). Automation in accession classification of Brazilian Capsicum germplasm through artificial neural networks. Scientia Agricola, 73(3), 203-207. DOI: http://dx.doi.org/10.1590/1678-992X-2015-0451
  • Freitas, J. G., Cantarella, H., Salomon, M. V., Malovolta, V. M. A., Castro, L. H. S. M., Gallo, P. B., & Azzini, L. E. (2007). Produtividade de cultivares de arroz irrigado resultante da aplicação de doses de nitrogênio. Bragantia, 66(2), 317-325. DOI: http://dx.doi.org/10.1590/S0006-87052007000200016
  • Garson, G. D. (1991). Interpreting neural network connection weights. Artificial Intelligence Expert, 6, 46-51.
  • Gedeon, T. D., Wong, P. M., & Harris, D. (1995). Balancing bias and variance: network topology and pattern set reduction techniques. Berlin, Heidelberg, GE: Springer Berlin Heidelberg.
  • Ghani, I. M. M., & Ahmad, S. (2010). Stepwise multiple regression method to forecast fish landing. Procedia - Social and Behavioral Sciences, 8, 549-554. DOI: https://doi.org/10.1016/j.sbspro.2010.12.076
  • Gianola, D., Okut, H., Weigel, K. A., & Rosa, G. J. M. (2011). Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics, 12(87), 1-14. DOI: https://doi.org/10.1186/1471-2156-12-87
  • Goh, A. T. C. (1995). Back-propagation neural networks for modeling complex systems. Artificial Intelligence in Engineering, 9(3),143-151. DOI: https://doi.org/10.1016/0954-1810(94)00011-S
  • Gregorutti, B., Michel, B., & Saint-Pierre, P. (2017). Correlation and variable importance in random forests. Statistics and Computing, 27, 659-678. DOI: https://doi.org/10.1007/s11222-016-9646-1
  • Haddouche, R., Chetate, B., & Said Boumedine, M. (2018). Neural network ARX model for gas conditioning tower. International Journal of Modeling and Simulation, 39(3), 166-177. DOI: https://doi.org/10.1080/02286203.2018.1538848
  • Hassanzadeh, Z., Ghavami, R., & Kompany-Zareh, M. (2015). Radial basis function neural networks based on the projection pursuit and principal component analysis approaches: QSAR analysis of fullerene[C60]-based HIV-1 PR inhibitors. Medicinal Chemistry Research, 25, 19-29. DOI: https://doi.org/10.1007/s00044-015-1466-x
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction (2nd ed.). New York, NY: Springer.
  • Huang, X., Zhao, Y., Wei, X., Li, C., Wang, A., Zhao, Q., … Han, B. (2012a) Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nature Genetics, 44, 32-39. DOI: https://doi.org/10.1038/ng.1018
  • Li, L., & Zha, Y. (2019). Estimating monthly average temperature by remote sensing in China. Advances in Space Research, 63(8), 2345-2357. DOI: https://doi.org/10.1016/j.asr.2018.12.039
  • Matlab. (2016). Software. Natick, MA: The MathWorks Inc.
  • Misra, G., Badoni, S., Anacleto, R., Graner, A., Alexandrov, N., & Sreenivasulu, N. (2017). Whole genome sequencing-based association study to unravel genetic architecture of cooked grain width and length traits in rice. Scientific Reports, 7(12478), 1-16. DOI: https://doi.org/10.1038/s41598-017-12778-6
  • Ntanos, D. A., & Koutroubas, S. D. (2002). Dry matter and N accumulation and translocation for Indica and Japonica rice under Mediterranean conditions. Field Crops Research, 74(1), 93-101. DOI: https://doi.org/10.1016/S0378-4290(01)00203-9
  • Olden, J. D., & Jackson, D. A. (2002). Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks. Ecological Modelling, 154(1-2), 135-150. DOI: https://doi.org/10.1016/s0304-3800(02)00064-9
  • Osco, L. P., Ramos, A. P. M., Moriya, E. A. S., Bavaresco, L. G., Lima, B. C., Estrabis, N., ... Araújo, F. F. (2019). Modeling hyperspectral response of water-stress induced lettuce plants using artificial neural networks. Remote Sensing, 11(23), 1-15. DOI: https://doi.org/10.3390/rs11232797
  • Osco, L. P., Ramos, A. P. M., Pinheiro, M. M. F., Moriya, E. A. S., Imai, N. N., Estrabis, N., … Creste, J. E. (2020). A machine learning framework to predict nutrient content in valencia-orange leaf hyperspectral measurement. Remote Sensing, 12(6), 1-21. DOI: http://dx.doi.org/10.3390/rs12060906
  • Paliwal, M. & Kumar, U. A. (2011). Assessing the contribution of variables in feed forward neural network. Applied Soft Computing, 11, 3690-3696.
  • Parmley, K. A., Higgins, R. H., Ganapathysubramanian, B., Sarkar, S., & Singh, A. K. (2019). Machine learning approach for prescriptive plant breeding. Scientific Reports, 9(1), 1-12. DOI: https://doi.org/10.1038/s41598-019-53451-4
  • Paruelo, J. M., & Tomasel, F. (1997). Prediction of functional characteristics of ecosystems: a comparison of artificial neural networks and regression models. Ecological Modelling, 98(2-3), 173-186. DOI: https://doi.org/10.1016/s0304-3800(96)01913-8
  • Porwal, A., Carranza, E. J. M., & Hale, M. (2003). Artificial neural networks for mineral potential mapping; a case study from Aravalli Province, Western India. Natural Resources Research, 12(3), 155-171. DOI: https://doi.org/10.1023/A:1025171803637
  • Quinlan, J. R. (1996). Learning decision tree classifiers. ACM Computing Surveys, 28(1), 71-72. DOI: https://doi.org/10.1145/234313.234346
  • Roy, P. P., & Roy, K. (2008). On some aspects of variable selection for partial least squares regression models. QSAR & Combinatorial Science, 27(3), 302-313. DOI: https://doi.org/10.1002/qsar.200710043
  • Sant’Anna, I. C., Ferreira, R. A. D. C., Nascimento, M., Carneiro, V. Q., Silva, G. N., Cruz, C. D., ... Chagas, F. E. O. (2019). Multigenerational prediction of genetic values using genome-enabled prediction. PLoS ONE, 14(1), 1-14. DOI: https://doi.org/10.1371/journal.pone.0210531
  • Santos, R. P., Dean, D. L., Weaver, J. M., & Hovanski, Y. (2018). Identifying the relative importance of predictive variables in artificial neural networks based on data produced through a discrete event simulation of a manufacturing environment. International Journal of Modelling and Simulation, 39(4), 234-245. DOI: https://doi.org/10.1080/02286203.2018.1558736
  • Silva, G. N., Nascimento, M., Sant’Anna, I. C., Cruz, C. D., Caixeta, E. T., Carneiro, P. C. S., ... Oliveira, M. S. (2017). Artificial neural networks compared with Bayesian generalized linear regression for leaf rust resistance prediction in Arabica coffee. Pesquisa Agropecuária Brasileira, 52(3), 186-193. DOI: http://dx.doi.org/10.1590/s0100-204x2017000300009
  • Silva, G. N., Tomaz, R. S., Sant’Anna, I. C., Nascimento, M., Bhering, L. L., & Cruz, C. D. (2014). Neural networks for predicting breeding values and genetic gains. Scientia Agricola, 71(6), 494-498. DOI: http://dx.doi.org/10.1590/0103-9016-2014-0057
  • Skawsang, S., Nagai, M., Nitin, K., & Soni, P. (2019). Predicting rice pest population occurrence with satellite-derived crop phenology, ground meteorological observation, and machine learning: A case study for the central plain of Thailand. Applied Sciences, 9(22), 1-19. DOI: https://doi.org/10.3390/app9224846
  • Somers, M. J., & Casal, J. C. (2009). Using artificial neural networks to model nonlinearity: The case of the job satisfaction-job performance relationship. Organizational Research Methods, 12(3), 403-417. DOI: https://doi.org/10.1177/1094428107309326
  • Sousa, I. C., Nascimento, M., Silva, G. N., Nascimento, A. C. C., Cruz, C. D., Fonseca, F., ... Caixeta, E. T. (2020). Genomic prediction of leaf rust resistance to Arabica coffee using machine learning algorithms. Scientia Agricola, 78(4), 1-8. DOI: http://dx.doi.org/10.1590/1678-992x-2020-0021
  • Tan, K., Li, E., Du, Q., & Du, P. (2014). An efficient semi-supervised classification approach for hyperspectral imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 97, 36-45. DOI: http://dx.doi.org/10.1016/j.isprsjprs.2014.08.003
  • Tsang, M., Cheng, D., & Liu, Y. (2017). Detecting statistical interactions from neural network weights. In 6th International Conference on Learning Representations (p. 1-21). Vancouver, CA: ICLR. DOI: https://doi.org/10.48550/arXiv.1705.04977
  • Yu, H., Campbell, M. T., Zhang, Q., Walia, H., & Morota, G. (2019). Genomic Bayesian confirmatory factor analysis and Bayesian network to characterize a wide spectrum of rice phenotypes. G3: Genes, Genomes, Genetics, 9(6), 1975-1986. DOI: https://doi.org/10.1534/g3.119.400154

Publication Dates

  • Publication in this collection
    03 Mar 2023
  • Date of issue
    2023

History

  • Received
    23 Dec 2020
  • Accepted
    16 Apr 2021
Editora da Universidade Estadual de Maringá - EDUEM Av. Colombo, 5790, bloco 40, 87020-900 - Maringá PR/ Brasil, Tel.: (55 44) 3011-4253, Fax: (55 44) 3011-1392 - Maringá - PR - Brazil
E-mail: actaagron@uem.br