Prediction of Land Suitability for Crop Cultivation Using Classification Techniques

Abstract

Agriculture, the backbone of every country, has been an emerging field of research, particularly in the recent past. The soil type and environment are critical factors that drive agriculture, especially in terms of crop prediction. To determine which crops grow best in certain types of soil and environment, their characteristics must first be ascertained. In the past, farmers picked suitable crops for cultivation based on first-hand experience. Today, however, identifying appropriate crops for particular areas has become a difficult proposition. The application of machine learning techniques to agriculture is an emerging field of research that helps predict crops for easy cultivation and improved productivity. In this work, a comparative analysis is undertaken using several classifiers, namely the k-Nearest Neighbor (kNN), Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF) and Bagging, to help suggest the most suitable cultivable crop(s), based on soil and environmental characteristics, for a specific piece of land. The algorithms are trained with the training data and subsequently tested with the soil and climate-based test dataset. The results of all the approaches are evaluated to identify the best classification techniques. Experimental results show that the bagging method outclasses the others with respect to all performance metrics.

Keywords:
agriculture; soil; environmental; crop; machine learning; classification

HIGHLIGHTS

Comparative analysis of Machine Learning techniques for Crop prediction.

Performance evaluation based on soil characteristics only, environmental characteristics only, and both.

Performance analysis for classifiers using k-fold validation.

Performance analysis using data splitting method.

INTRODUCTION

Agriculture is a unique business proposition, with crop production largely dependent on the climate and soil. Consequently, agribusiness forecasting, plant disease recognition, and pesticide optimization are examined using a slew of data mining procedures prior to crop cultivation. Soil is a material asset that impacts land use. It is a natural resource, given the benefits it offers in terms of agricultural productivity. Minerals such as nitrogen, potassium, and phosphorus contribute to the organic composition of soil with their specific characteristics. Environmental factors such as the seasons, soil types, rainfall, and temperature also greatly impact crop cultivation. The interactions between crops, the environment and the weather are semi-linear, which makes prediction considerably difficult. Machine learning could offer an effective alternative for crop cultivation predictions. Recommending suitable crops for a particular area is a major concern in agriculture, and is something that can be addressed through machine learning techniques.

Machine learning offers multiple methods to recognize rules and trends in large datasets, and has demonstrated a well-known predictive ability. A predictive model can be developed on its own. Unfortunately, however, machine learning approaches have so far not been applied on a large scale in the country, chiefly because numerical and plant simulation methods are still in vogue. In machine learning, classification techniques [1-3] are used to predict the classes for each record in a dataset. Besides the use of advanced classification methods in remote sensing, such as support vector machines, random forests, and rotation forests, scientists and researchers have worked to improve classification accuracy for analyzing predictions and helping make appropriate decisions. In general, there are three types of learning used in prediction: supervised, unsupervised and reinforcement learning. Supervised learning trains models with labelled soil and environmental characteristics as inputs and different crops as outputs. Hence, it can correctly predict the suitable crop for unknown samples, containing soil and environmental characteristics, from the testing set. Supervised learning is used to classify the category of crops, while unsupervised learning groups similar crops together (clustering). Supervised learning predicts the target class from the current input, whereas in reinforcement learning decision-making is sequential: the next input depends on the outcome of the learner. Hence, compared to the other two learning paradigms, supervised learning is the most suitable for crop cultivation prediction. This work uses supervised learning classification techniques for prediction, and shows their validity and quality, alongside those of graded crop mapping methods, following a comparative analysis.
The primary contribution of this work is its attempt to find the best classification method to predict suitable crops for cultivation, based on the soil and environment.

Related work

Belson [4] described DTs as models of classification and regression, developed in a tree-like architecture. A decision tree organizes a dataset into small homogeneous subsets (sub-populations), while simultaneously creating a corresponding tree map. Kohonen [5] described instance-based models (IBM) as memory-based models that learn from the training set by contrasting new examples with stored instances. Bayesian models (BM) are a family of probabilistic graphical models that support Bayesian inference. They belong to the category of supervised learning models, and are used to solve classification or regression problems. Pearl [7] discussed the Bayesian network, and Quinlan [8] the Iterative Dichotomizer, the most common learning algorithms in this class. Russell and Norvig [9] elaborated on the NB, Gaussian Naive Bayes, and multinomial Naive Bayes.

Ensemble learning (EL) models are designed to improve the predictive quality of a given statistical learning approach or model fitting technique by constructing a linear combination of simple base learners. Breiman [10] discussed the bootstrap aggregating or bagging algorithm, Freund and Schapire [11] proposed AdaBoost to reduce the errors of learning algorithms, and Schapire [12] implemented the boosting algorithm. Smola and coauthors [13] described the most widely used SVM algorithms, including support vector regression. By turning the original feature space into a feature space of a higher dimension, the classification capabilities of conventional SVMs are significantly enhanced using the "kernel trick". Breiman [14] described RF as a combination of tree predictors, with each tree dependent on the values of a separately sampled random vector with the same distribution for all the trees in the forest. As the number of trees in the forest grows, the forest generalization error converges to a limit. Cultivable crops have been predicted primarily on the basis of climatic features, giving the C4.5 algorithm an accuracy score of approximately 95% [15]. Sellam and Poovammal [16] studied the environmental factors affecting crop yield (regions under cultivation, annual rainfall and food price indices), defined the relationship between them, and used algorithms like regression analysis (RA) and linear regression (LR) to analyze crop yields. Priya and coauthors [17] used real-time Tamil Nadu data to predict crop yields using the RF method. Jahan [18] predicted soil types, based on their characteristics and fertility, using an NB classifier. Galvão and coauthors [19] proposed a multiple linear regression (MLR) method for a corn dataset containing soil parameters as inputs, with a variable elimination method applied to improve performance. Prasad Babu and coauthors [20] proposed a tomato crop advisory system based on soil and climate factors, built with the ID3 algorithm and refined with optimization rules to improve performance.
Jeong and coauthors [21] predicted crop yield using wheat and maize datasets containing environmental factors. The prediction was carried out using RF and MLR techniques and, from the results, it is evident that the RF technique is efficient for crop yield analysis.

Motivation and justification

Several parameters impact agricultural production, including those to do with climate (temperature, humidity and moisture), precipitation (irrigation, rainfall, and region-wise precipitation), and soil (potential of Hydrogen (pH), nutrients, organic carbon, and minerals like phosphorus, among others). Farmers still follow the standard practices adhered to by their ancestors. Soil characteristics in a particular region make it most appropriate for certain crops. Repeated planting of the same crops, however, decreases soil fertility and results in chemical accumulations that alter soil pH. The radical climatic changes characteristic of recent times can be effectively countered by the cultivation of alternative crops. The manual prediction and data collection involved in identifying suitable crops are drawbacks in agriculture, and manual prediction is easily thrown off by climatic changes. With advances in technology, the data produced are enormous in size, and can be mined for interesting trends in miscellaneous fields. The use of machine learning in agriculture helps farmers immensely. Thus justified, we use machine learning techniques to predict suitable crops for specific areas of land. These techniques work best when all soil types and environmental conditions are taken into consideration.

In machine learning, classification is the key to predicting the crop/s to be cultivated in specific areas. This work attempts an overview of techniques that help pick suitable crops, based on the soil and environment, using supervised learning techniques like the kNN, NB, DT, SVM, RF and bagging for crop prediction. Each algorithm has its pros and cons. The kNN does not work well with imbalanced data but resolves multi-class problems. The NB is very fast and can be used in real-time predictions, though it assumes that each feature contributes independently to the outcome. Data normalization is not needed in the DT, which is, however, highly sensitive to the data: a slight change in the data is enough to change the outcome entirely. The SVM does not work well on overlapping classes, but is little affected by outliers. The RF handles errors in imbalanced data, but incurs high computational costs while training a large number of deep trees. Bagging works well on high-dimensional data, and its performance is not affected by missing values in the dataset. However, it introduces a certain level of difficulty in the form of a loss of interpretability with regard to the model used. Since each classifier carries out prediction in its own unique way, it is essential to find the most accurate classifier for crop prediction. Motivated by the facts above, this work focuses on finding the best classifier for crop prediction so as to maximize production.

Outline of the work

Figure 1 depicts the overall process of this work. First, the input data are preprocessed to handle missing values, eliminate redundant data, and standardize the dataset. Next, the preprocessed data are subjected to several classification techniques to determine the most suitable crops for a particular stretch of land. Prior to applying the classification techniques, the dataset is split into training and testing sets. Samples from the training dataset are used to train the classification algorithm to find the crop/s ideally suited to cultivation in a specific area. The unknown data from the testing dataset are given to the trained classifier to predict a suitable crop, following which the results are evaluated using different performance metrics. An analysis is undertaken to obtain the best classification method. Information on the predicted or recommended crop/s to be grown can be provided to the farming community, based on the results obtained.

Organization of the paper

The remaining part of the paper is organized as follows: Section II describes the methodology for crop prediction, Section III discusses the experimental results, and the final section concludes the paper.

Figure 1
Outline of the Work.

MATERIAL AND METHODS

Background study

Predicting crops for cultivation enables agricultural departments to put in place strategies for improvement. Crop prediction is based on factors such as the climate, geography, genetics, politics and economics. Risks related to these variables can be quantified if the appropriate computational or quantitative methodologies are implemented. Bootstrap aggregating (bagging) is a meta-algorithm for machine learning that enhances the consistency and precision of the algorithms used through statistical classification and regression. It also significantly reduces variability and prevents overfitting [22]. As noted earlier, preprocessing techniques can easily be incorporated into the learning algorithms that constitute the ensemble. Because of their simplicity and strong generalization potential, several methods have been developed using bagging ensembles to fix class disparity issues. This section compares different existing classification techniques and identifies the best for crop prediction.

K Nearest Neighbor

The kNN is a non-complex algorithm that predicts a suitable crop based on similarity measures [23]. Closeness is calculated with distance measures like the Euclidean distance and Manhattan distance [24]. In this work, the Euclidean distance is used to find the shortest distance between training and testing samples. The majority class among the top nearest neighbors is assigned as the suitable crop for cultivation. In the kNN, feature vectors are stored in the training phase of the algorithm, and the target crop class is assigned as the most frequent label among the nearest training samples. To validate the model, k-fold (10-fold) cross-validation is used: the model is fitted on k-1 folds and validated on the k-th fold. Figure 2 depicts the workflow of the kNN.
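The kNN workflow described above can be sketched with scikit-learn. This is an illustrative sketch only, not the authors' implementation; synthetic data stand in for the 1000-sample, 16-attribute crop dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the soil/environment crop dataset.
X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# Euclidean distance between training and testing samples, as in this work.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")

# 10-fold cross-validation: fit on k-1 folds, validate on the k-th fold.
knn_scores = cross_val_score(knn, X, y, cv=10)
print(round(knn_scores.mean(), 3))
```

The majority label among the `n_neighbors` closest training samples becomes the predicted crop class.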

Figure 2
Flow diagram of kNN.

Naive Bayes

The Naïve Bayes technique assigns class labels to problem instances for constructing classifier models [23], and is based on Bayes' theorem [18]. The NB is not a single algorithm for training a classifier but a family of algorithms based on common principles. It assumes that the value of a particular feature is independent of the value of any other feature, given the class variable [23]. It works on probability theory, choosing as the suitable crop the class with the maximum probability for a testing sample. The potential of the NB classifier for crop prediction is evaluated using k-fold cross-validation: k-1 folds are used to train the model and the k-th fold is used to validate it. Figure 3 shows the NB flow diagram of the crop prediction process.
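The maximum-probability rule can be made concrete with a Gaussian NB sketch (again on synthetic stand-in data, not the paper's dataset): the predicted crop is the class whose posterior probability is highest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

nb = GaussianNB().fit(X_train, y_train)

# One posterior probability per class; the class (crop) with the maximum
# posterior is chosen, which is exactly what predict() returns.
posteriors = nb.predict_proba(X_test)
chosen = nb.classes_[np.argmax(posteriors, axis=1)]
print((chosen == nb.predict(X_test)).all())
```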

Figure 3
Flow diagram of NB.

Decision Tree

The decision tree is a single-tree predictive model based on the tree data structure [18]. A tree consists of decision nodes and decision leaves [24]. Each split is labeled with an input feature, and each leaf with a target class, that is, a crop. The algorithm executes a top-down approach, choosing at each step the variable value that best splits the set of items [25], and thereby makes the decision to find the suitable crop for cultivation. The aptness of this technique for crop cultivation prediction is examined by k-fold cross-validation: the k-th fold is used for testing the model and the remaining k-1 folds are used to train it. Figure 4 illustrates the DT workflow of the prediction process.
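A minimal sketch of the DT step, under the same synthetic-data assumption as the earlier sketches:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# Top-down induction: at each node, the feature/threshold that best splits
# the samples (by Gini impurity here) is chosen; leaves hold crop classes.
tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree_scores = cross_val_score(tree, X, y, cv=10)
print(round(tree_scores.mean(), 3))
```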

Figure 4
Flow diagram of DT.

Support Vector Machine

The SVM is a supervised machine learning algorithm which separates data with decision surfaces. The decision surfaces divide the data into two groups on either side of a hyperplane [26]. The training points closest to the hyperplane specify the support vectors. This hyperplane is used for the crop prediction process: the crop class predicted on the basis of this surface is recommended for cultivation. Further, the ability of this technique is examined using k-fold cross-validation. In this work, 10 folds are used, where k-1 folds fit the model for crop prediction and the k-th fold tests it. Figure 5 represents the SVM workflow for crop prediction.
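The kernel trick mentioned in the related work can be sketched as follows (illustrative only; the RBF kernel and feature scaling are assumptions, since the paper does not state its kernel):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# RBF kernel: samples are implicitly mapped to a higher-dimensional space
# where a separating hyperplane is easier to find; SVMs are distance-based,
# so features are standardized first.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm_scores = cross_val_score(svm, X, y, cv=10)
print(round(svm_scores.mean(), 3))
```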

Figure 5
Flow diagram of SVM.

Random Forest

The random forest is a popular and powerful supervised machine learning algorithm that resolves both classification and regression problems [17]. The RF is a multi-tree model which includes a large number of individual decision trees. To decide the suitable crop for a test sample, it aggregates the votes from the different decision trees and, based on the result, recommends the suitable crop. Additionally, this technique is evaluated by k-fold cross-validation for predicting the suitable crop: the model is fitted using k-1 folds and tested using the k-th fold. Figure 6 depicts the workflow of the RF.
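The vote-aggregation step can be sketched as below (illustrative; 100 trees is an assumption, the paper does not report the forest size):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# Each of the 100 trees votes on a test sample; the aggregated (majority)
# class is the recommended crop.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf_scores = cross_val_score(rf, X, y, cv=10)
print(round(rf_scores.mean(), 3))
```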

Bagging

Bagging, also known as bootstrap aggregating, was introduced by Breiman [10], and is used to train and combine multiple copies of a learning algorithm [23]. It improves the stability of the learning algorithm and enhances the results of the prediction algorithm [22]. Bagging splits the training samples into subsamples to train the models for crop prediction, and takes the votes from each subsample's model to predict the suitable crop for the testing dataset. In this work, Adaptive Bagging (AdaBag) is used for the prediction process. Since bagging does not permit weight recalculation, there is no need to change the weight update equation or modify the algorithm's calculations. To estimate the accuracy of the bagging technique for crop prediction, k-fold cross-validation is used: k-1 folds train the model and the k-th fold validates it. The workflow of bagging is given in Figure 7.
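The paper uses the AdaBag implementation; a comparable sketch with scikit-learn's `BaggingClassifier` (whose default base learner is a decision tree, matching the tree-based ensemble described later) looks like this. The estimator count of 50 is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# Each of the 50 models is trained on a bootstrap subsample of the training
# folds; their votes are combined into a single crop prediction.
bag = BaggingClassifier(n_estimators=50, random_state=0)
bag_scores = cross_val_score(bag, X, y, cv=10)
print(round(bag_scores.mean(), 3))
```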

Crop prediction procedure

The algorithm for crop prediction is given below. The soil and environmental parameters are given as inputs, and a suitable crop is the output.

Algorithm

  • Step 1: Import a set of data.

  • Step 2: Preprocess the data to handle missing values and duplicates, and to standardize the data. Preprocessing also converts the target variable into a factor (categorical) variable.

  • Step 3: Split the preprocessed data to be used in the training and testing datasets.

Training Phase

  • Step 4: Take 70% of the samples from the training dataset as training samples.

  • Step 5: Apply the classification algorithm to the training samples.

  • Step 6: Train the classification algorithm well with the training dataset to find a suitable crop.

Testing Phase

  • Step 7: Take 30% of the samples from the testing dataset as testing samples.

  • Step 8: Apply the trained classifier to all the testing samples used to identify a suitable crop for cultivation in a particular patch of land.

  • Step 9: The trained classifier finds the target label for new instances to predict a suitable crop.

  • Step 10: Finally, the result recommends a suitable crop for cultivation.

Figure 6
Flow diagram of RF.

RESULTS AND DISCUSSIONS

Dataset Description

This work utilizes an agricultural dataset that includes soil characteristics and environmental factors, collected from the Agricultural Department of Sankarankovil Taluk, Tenkasi District, Tamil Nadu, India. The dataset contains 1000 instances and 16 attributes, where 12 attributes are soil characteristics and the remaining 4 environmental. In this work, the 9 crops used for the prediction process include paddy, maize, black gram, green gram, rinja gram, rinjal, lady’s finger, tomato and chickpea. The data are collected from various villages in and around Sankarankovil.

Table 1 presents information on the soil type, a brief description of the soil, and the environmental attributes impacting crop prediction.

Figure 7
Flow diagram of Bagging.

Table 1
Dataset Description of Crop Dataset.


Performance Metrics

The performance of crop prediction is measured using the following performance metrics. The formulae, and a description of each used in the experimental analysis, are given in Table 2.

Table 2
Performance Metrics Description
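Since Table 2 is not reproduced here, the following hedged sketch shows one standard way the six metrics can be computed for multi-class crop prediction (specificity, which scikit-learn does not provide directly, is derived per class from the confusion matrix as TN/(TN+FP) and macro-averaged; the choice of macro averaging is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
pred = BaggingClassifier(random_state=0).fit(X_train, y_train).predict(X_test)

acc = accuracy_score(y_test, pred)
kappa = cohen_kappa_score(y_test, pred)          # agreement beyond chance
prec = precision_score(y_test, pred, average="macro")
rec = recall_score(y_test, pred, average="macro")  # recall = sensitivity
f1 = f1_score(y_test, pred, average="macro")

# Specificity per class from the confusion matrix: TN / (TN + FP).
cm = confusion_matrix(y_test, pred)
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
tn = cm.sum() - tp - fp - fn
spec = np.mean(tn / (tn + fp))
print(round(acc, 3), round(kappa, 3), round(spec, 3))
```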


This section compares several classification techniques for crop prediction, based on the soil and environmental conditions of a particular land area, using the performance metrics of accuracy, kappa, precision, recall, specificity and F1 score.

Performance comparison of Classification techniques based on Soil Characteristics

Table 3 shows a performance evaluation of classification techniques, based only on the soil conditions discussed in Table 1.

Table 3
Performance comparison of classification methods based on soil conditions.

Table 3 shows that bagging identifies cultivable crops more accurately, based on soil characteristics, than the other techniques. Further, bagging takes votes over its subsample models for improved performance, based on which it offers better crop prediction accuracy than the other methods.

Performance comparison of Classification techniques based on Environmental Conditions

Table 4 represents a performance analysis of classification methods, based only on environmental factors such as texture, season, rainfall and average temperature.

Table 4
Performance comparison of classification methods based on environmental conditions

Table 4 shows that bagging selects cultivable crops, based on environmental characteristics, more accurately than the other techniques. In addition, bagging is a homogeneous ensemble method; in this work, the decision tree is used as the base learner of the ensemble. Bagging splits the whole dataset, which contains soil and environmental characteristics, into subsamples. The different subsamples are separately used to train single decision tree models, each of which predicts the suitable crop. Finally, the outcomes of the models are combined by voting to produce a single recommendation for crop cultivation.

Performance comparison of Classification techniques based on Soil and Environmental Characteristics

Table 5 represents a comparative performance analysis of the classification techniques based on both soil and environmental factors.

Table 5
Performance comparison of classification methods based on Soil and Environmental conditions.

Table 5 shows that bagging produces more accurate results than the others, based on both soil and environmental characteristics. The variance of an estimate is reduced considerably by the bagging technique, using its aggregation procedure. Consequently, it has better crop prediction accuracy than other methods.

Table 3, Table 4 and Table 5 together show that the bagging technique achieves its best crop prediction accuracy when both soil and environmental characteristics are used, compared to soil characteristics alone or environmental factors alone.

Performance evaluation of Classification techniques using k-fold validation

To validate the performance of the classification techniques for crop prediction, the fold variation method is used. Table 6 shows a performance evaluation of the classification techniques in finding the most suitable crop for a particular land area, under various fold counts, to obtain the best fold setting for all the classification methods. The fold counts vary from 10 to 90.

Table 6
Performance of the Classification methods based on fold variation

Table 6 shows that the classification techniques perform best under 10-fold cross-validation in terms of accuracy, kappa, precision, recall, specificity and F1 score. Table 6 also clearly shows that the bagging classifier outperforms the other methods at 10 folds.
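The fold-variation experiment can be sketched as follows (an illustrative loop on synthetic data; the sampled fold counts are assumptions within the paper's 10-90 range, and the bagging classifier stands in for all six methods):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# Mean cross-validated accuracy of the bagging classifier as the fold
# count varies from 10 to 90.
fold_accuracy = {}
for k in (10, 30, 50, 70, 90):
    scores = cross_val_score(BaggingClassifier(random_state=0), X, y, cv=k)
    fold_accuracy[k] = scores.mean()
print(fold_accuracy)
```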

Performance evaluation of classification techniques using data splitting validation

To validate the crop prediction performance, a validation method termed data splitting is used. The graphical representations below show a performance evaluation of the classification techniques for finding suitable crops for a particular land area, based on data splitting, to get the best training-testing split. The splits range from 25% - 75% to 75% - 25% (training - testing). Performances are evaluated using the metrics of accuracy, kappa, precision, recall, specificity and F1 score.
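The data splitting sweep can be sketched as follows (illustrative; the intermediate split ratios are assumptions within the stated 25%-75% range, and bagging again stands in for all six classifiers):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=16, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# Test-set share from 75% (a 25-75 training-testing split) down to 25%
# (a 75-25 split); accuracy is recorded for each split.
split_accuracy = {}
for test_share in (0.75, 0.60, 0.50, 0.40, 0.30, 0.25):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_share, random_state=0, stratify=y)
    clf = BaggingClassifier(random_state=0).fit(X_tr, y_tr)
    split_accuracy[test_share] = clf.score(X_te, y_te)
print(split_accuracy)
```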

Figure 8 shows that the bagging classifier works better in the 70% - 30% splitting range than in other splitting ranges, based on the metrics mentioned in Table 2.

Figure 8
Performance evaluations of Bagging classifier using data splitting method

Figure 9 presents a performance evaluation of the RF classifier based on several metrics; the RF classification technique works best with the 70% - 30% data split. Figure 10 shows a performance evaluation of the SVM classification method, whose prediction accuracy is lower than that of the RF and bagging; from the results, it is evident that the SVM performs best with the 70% - 30% split. Figure 11 clearly shows that the DT works better with the 70% - 30% split than with other splits, though the experimental results reveal that the decision tree classifier does not outperform the SVM, RF and bagging algorithms. Figure 12 depicts that the DT, SVM, RF and bagging algorithms outperform the NB classifier; the NB classifier, too, works best with the 70% - 30% split. From Figure 13, it is evident that the kNN classifier performs better with the 70% - 30% data split than with other splits. Further, the kNN technique has the lowest prediction accuracy of all the techniques.

Figure 9
Performance evaluations of RF classifier using data splitting method

Figure 10
Performance evaluations of SVM classifier using data splitting method

Figure 11
Performance evaluations of DT classifier using data splitting method

Figure 12
Performance evaluations of NB classifier using data splitting method

Figure 13
Performance evaluations of kNN classifier using data splitting method

The figures mentioned above reveal that all the classifiers perform much better with the 70% - 30% training-testing split. The bagging classifier makes the best predictions, compared to the other methods.

CONCLUSION

This work presents a comparative analysis of classification approaches such as the kNN, NB, DT, SVM, RF and bagging to predict suitable crop/s for particular land areas. The results are compared with respect to performance metrics like accuracy, kappa, sensitivity, specificity, precision and F1-score. Owing to its combination of multiple learners, the bagging algorithm offers better predictions than the other algorithms, based on the soil and environmental conditions observed from the experimental results. The algorithms above only provide guidelines for suitable crops for specific areas of land. Future directions include suggestions on fertilizer use for crops, as well as recommendations on alternative crops for arable land.

Acknowledgments

We would like to thank the Department of Agriculture, Sankarankovil Taluk, Tenkasi District, Tamil Nadu, India, for providing the data for this analysis.

REFERENCES

  • 1
    Duda RO, Hart PE, Stork DG. Pattern classification and scene analysis. New York: Wiley, 1973; 3: 731-9.
  • 2
    Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. CRC Press, 1984.
  • 3
    Neapolitan RE. Models for reasoning under uncertainty. Applied Artificial Intelligence, 1987;1(4):337-66.
  • 4
    Belson WA. Matching and prediction on the principle of biological classification. J. R. Stat. Soc. Ser. C. Appl. Stat., 1959;8(2):65-75.
  • 5
    Kohonen T. Learning vector quantization. Neural Networks, 1988;1:303.
  • 6
    Atkeson CG, Moore AW, Schaal S. Locally weighted learning. In: Lazy Learning, Springer, Dordrecht, 1997:11-73.
  • 7
    Pearl J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1988;552.
  • 8
    Quinlan JR. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1993:235-40.
  • 9
    Russell SJ, Norvig P. Artificial Intelligence: A Modern Approach. Prentice Hall: Upper Saddle River, New Jersey, USA, 1995;9.
  • 10
    Breiman L. Bagging predictors. Machine Learning, 1996;24:123-40.
  • 11
    Freund Y, Schapire RE. Experiments with a new boosting algorithm. In: International Conference on Machine Learning, 1996;96:148-56.
  • 12
    Schapire RE. A brief introduction to boosting. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999;99:1401-6.
  • 13
    Smola A, Burges C, Drucker H, Golowich S, Hemmen LV, Müller KR, Schölkopf B, et al. Regression Estimation with Support Vector Learning Machines. Master's Thesis, Technical University of Munich, Germany, 1996:1-78.
  • 14
    Breiman L. Random forests. Machine Learning, 2001;45(1):5-32.
  • 15
    Veenadhari S, Misra B, Singh CD. Machine learning approach for forecasting crop yield based on climatic parameters. In: 2014 International Conference on Computer Communication and Informatics, IEEE, 2014:1-5.
  • 16
    Sellam V, Poovammal E. Prediction of crop yield using regression analysis. Indian J. Sci. Technol., 2016;9(38):1-5.
  • 17
    Priya P, Muthaiah U, Balamurugan M. Predicting yield of the crop using machine learning algorithm. Int. J. Eng. Sci. Res. Technol., 2018;7(1):1-7.
  • 18
    Jahan R. Applying Naive Bayes classification technique for classification of improved agricultural land soils. Int. J. Eng. Sci. Res. Technol., 2018;6(5):189-93.
  • 19
    Galvao RKH, Araujo MCU, Fragoso WD, Silva EC, Gledson EJ, Soares SFC, et al. A variable elimination method to improve the parsimony of MLR models using the successive projections algorithm. Chemometr. Intell. Lab. Syst., 2008;92(1):83-91.
  • 20
    Prasad Babu MS, Ramana Murty NV, Narayana SVNL. A web based tomato crop expert information system based on artificial intelligence and machine learning algorithms. International Journal of Computer Science and Information Technologies, 2010;1(3):6-15.
  • 21
    Jeong JH, Resop JP, Mueller ND, Fleisher DH, Yun K, Butler EE, Timlin DJ, et al. Random forests for global and regional crop yield predictions. PLoS One, 2016;11(6).
  • 22
    Zala DH, Chaudhri MB. Review on use of BAGGING technique in agriculture crop yield prediction. International Journal for Scientific Research & Development, 2018;6(8):675-7.
  • 23
    Pudumalar S, Ramanujam E, Harine Rajashree R, Kavya C, Kiruthika T, Nisha J. Crop recommendation system for precision agriculture. In: 2016 Eighth International Conference on Advanced Computing (ICoAC), IEEE, 2017:32-6.
  • 24
    Anantha Reddy D, Bhagyashri D, Watekar A. Crop recommendation system to maximize crop yield in Ramtek region using machine learning. Int. J. Sci. Res. Sci. Technol., 2019;6(1):485-9.
  • 25
    Balducci F, Impedovo D, Pirlo G. Machine learning applications on agricultural datasets for smart farm enhancement. Machines, 2018;6(3):38-59.
  • 26
    Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett., 1999;9(3):293-300.

Edited by

Editor-in-Chief:

Paulo Vitor Farago

Associate Editor:

Adriel Ferreira da Fonseca

Publication Dates

  • Publication in this collection
    25 Oct 2021
  • Date of issue
    2021

History

  • Received
    28 July 2020
  • Accepted
    16 Feb 2021
Instituto de Tecnologia do Paraná - Tecpar Rua Prof. Algacyr Munhoz Mader, 3775 - CIC, 81350-010 Curitiba PR Brazil, Tel.: +55 41 3316-3052/3054, Fax: +55 41 3346-2872 - Curitiba - PR - Brazil
E-mail: babt@tecpar.br