Automation in accession classification of Brazilian Capsicum germplasm through artificial neural networks

Germplasm classification by species requires specific knowledge of the crop of interest. Therefore, efforts to automate this process are necessary for the efficient management of collections. Automation of germplasm classification through artificial neural networks may be a viable and less laborious strategy. The aims of this study were to verify the potential for classifying Capsicum accessions by species based on morphological descriptors and artificial neural networks, and to establish the most important descriptors and the best network architecture for this purpose. Five hundred and sixty-four plants from 47 Brazilian Capsicum accessions were evaluated. Neural networks of the multilayer perceptron type were used to automate species identification through 17 morphological descriptors. Six network architectures were evaluated, with the number of neurons in the hidden layer ranging from 1 to 6. The relative importance of the morphological descriptors in the classification process was established by Garson’s method. Corolla color, corolla spot color, calyx annular constriction, fruit shape at pedicel attachment, and fruit color at mature stage were the most important descriptors. The network architecture with 6 neurons in the hidden layer was the most appropriate in this study. The possibility of classifying Capsicum plants by species through artificial neural networks with 100 % accuracy was verified.


Introduction
Capsicum is a genus of the highly diverse Solanaceae family with origins in South and Central America (Nicolai et al., 2013). Cultivated forms of Capsicum represent one of the most economically important vegetable crops worldwide (Albrecht et al., 2012), used as fresh vegetables and spices (Ibiza et al., 2012). Five domesticated Capsicum species are recognized: C. annuum, C. baccatum L., C. chinense Jacq., C. frutescens L. and C. pubescens Ruiz et Pav. Twenty-five additional wild Capsicum species are documented (Djian-Caporalino et al., 2007). These species have been introduced from South America, and C. baccatum var. pendulum (Willd.) Eshbaugh is endemic to the south-south-eastern region of Brazil (Albrecht et al., 2012).
High levels of global biodiversity and a limited number of taxonomists represent significant challenges to the future of biological study and conservation. The main problem is that almost all taxonomic information exists in languages and formats that are not easily understood or shared without a high level of specialized knowledge and vocabulary. Thus, taxonomic knowledge is localized within limited geographical areas and among a limited number of taxonomists. Furthermore, an expert on one species or family may be unfamiliar with another (Cope et al., 2012). This lack of accessibility of taxonomic knowledge to the general public has been termed the "taxonomic crisis" (Dayrat, 2005). This has led to an increasing interest in automating the process of species identification and related tasks (Cope et al., 2012).
Artificial neural networks (ANN), as a pattern recognition tool, have been used for modeling complex systems (Azevedo et al., 2015). Their main advantages are that they are non-parametric, enable nonlinear solutions, and consider several explanatory variables simultaneously (Niska et al., 2010). This technique has been successfully used in the identification of Banksia integrifolia genotypes (Pandolfi et al., 2009), Camellia species (Lu et al., 2012), weed species (Li et al., 2009) and wheat plants among weeds (Gomez-Casero et al., 2010).
The aim of this study was to classify Brazilian pepper germplasm accessions by species through artificial neural networks using morphological descriptors; to establish the most important descriptors; and to determine the best network architecture.
Forty-seven Capsicum accessions of the Vegetable Germplasm Bank of the Federal University of Viçosa (Table 1) were evaluated. Of these accessions, 17 belong to C. annuum var. annuum; 20 to C. baccatum var. pendulum; one to C. baccatum var. baccatum; seven to C. chinense; and two to C. frutescens. Seedlings were produced in expanded polystyrene trays of 128 cells. Transplanting to the field was carried out when seedlings presented 5 or 6 true leaves. Plants were spaced 1.0 m between rows and 0.6 m between plants. Twelve plants were evaluated for each of the 47 accessions, making 564 plants in total.
For better network efficiency, input data were normalized to the interval from -1 to 1 before training. The maximum number of training epochs was set to 1000; the minimum mean squared error (MSE) for stopping was set to 1.0 × 10⁻⁷, and the maximum number of successive failures (early stopping) was set to 6.
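As an illustration of the normalization step, the following minimal sketch rescales one descriptor column linearly so that its minimum maps to -1 and its maximum to 1. The function name `scale_to_interval`, the example scores, and the handling of constant columns are assumptions made for this example; the source does not state which software or exact formula was used.

```python
def scale_to_interval(values, lower=-1.0, upper=1.0):
    """Linearly rescale numbers so that min(values) -> lower and max(values) -> upper."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant column: map every value to the midpoint (an assumption of this sketch).
        return [(lower + upper) / 2.0 for _ in values]
    span = v_max - v_min
    return [lower + (upper - lower) * (v - v_min) / span for v in values]


# Hypothetical scores of one morphological descriptor for four plants.
scores = [1, 3, 5, 9]
scaled = scale_to_interval(scores)
print(scaled)
```

Each descriptor would be rescaled separately, so that descriptors recorded on different scales contribute on a comparable range to the network inputs.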
All trained networks had one neuron in the output layer and a single hidden layer. In order to identify the best network architecture, 1 to 6 neurons were tested in the hidden layer, making a total of 6 architectures. The hyperbolic tangent activation function was used for the neurons in the hidden layer, and the linear function was used for the output layer. At the beginning of training, the synaptic weights are randomly generated, which influences the final result. Thus, 1000 trainings were carried out for each network architecture. Network efficiency was presented in bar graphs of the accuracy rate (percentage of correct classifications), accompanied by the respective standard deviations.
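The structure of such a network can be sketched as follows: a single hidden layer with hyperbolic tangent activation feeding one linear output neuron, with randomly drawn starting weights. This is only an illustrative forward pass, not the authors' implementation; training is omitted, and the names `init_network` and `forward`, the uniform initialization range, and the all-zero example plant are assumptions.

```python
import math
import random


def init_network(n_inputs, n_hidden, rng):
    """Draw random starting weights; every weight list ends with a bias term."""
    hidden = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Forward pass: tanh hidden layer followed by one linear output neuron."""
    hidden, output = network
    activations = [
        math.tanh(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
        for row in hidden
    ]
    return sum(w * a for w, a in zip(output[:-1], activations)) + output[-1]


# One untrained network per architecture (1 to 6 hidden neurons, 17 inputs).
rng = random.Random(0)
networks = {n: init_network(17, n, rng) for n in range(1, 7)}
plant = [0.0] * 17  # hypothetical normalized descriptor scores
outputs = {n: forward(net, plant) for n, net in networks.items()}
```

Because the starting weights depend on the random draw, two networks with the same architecture generally give different results, which is why repeating the training many times per architecture is informative.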

Determining variable importance
In order to reduce the number of traits required for plant classification, the most important traits were determined by the method of Garson (1991). The relative contribution (%) was presented in a bar graph, with the respective standard deviations. Subsequently, 1000 new trainings were carried out for each of the six network architectures, considering only the most important traits as input.
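A minimal sketch of one commonly cited formulation of Garson's algorithm for a network with a single output neuron is given below. For each hidden neuron, the absolute input weights are expressed as shares of that neuron's total, multiplied by the absolute hidden-to-output weight, summed over hidden neurons, and rescaled to percentages. Variants of the algorithm exist, so this should not be read as the exact computation used in the study; the function name `garson_importance` and the example weights are hypothetical, and bias weights are ignored.

```python
def garson_importance(w_input_hidden, w_hidden_output):
    """Relative contribution (%) of each input under one formulation of Garson's algorithm.

    w_input_hidden[i][j] is the weight from input i to hidden neuron j;
    w_hidden_output[j] is the weight from hidden neuron j to the single output.
    Bias weights are left out. Assumes no hidden neuron has all-zero input weights.
    """
    n_inputs = len(w_input_hidden)
    n_hidden = len(w_hidden_output)
    # Total absolute input weight arriving at each hidden neuron.
    column_totals = [
        sum(abs(w_input_hidden[i][j]) for i in range(n_inputs))
        for j in range(n_hidden)
    ]
    # Share of each input at each hidden neuron, weighted by the output connection.
    raw = [
        sum(
            abs(w_input_hidden[i][j]) / column_totals[j] * abs(w_hidden_output[j])
            for j in range(n_hidden)
        )
        for i in range(n_inputs)
    ]
    total = sum(raw)
    return [100.0 * r / total for r in raw]


# Hypothetical weights: two inputs, two hidden neurons.
importance = garson_importance([[3.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
print(importance)
```

Since only absolute values are used, the result indicates how strongly a descriptor contributes but not the direction of its effect.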

Results and Discussion
High efficiency of artificial neural networks was found in the classification of Brazilian Capsicum accessions by species. When six neurons were used in the hidden layer, the accuracy rate was 100 % in the 1000 trainings (Figure 1). This accuracy rate decreased as the number of neurons in the hidden layer was reduced. When only one neuron was used in the hidden layer, the mean accuracy rate over the 1000 trainings was 78 %. Li et al. (2009), studying weed classification, also observed a decrease in the efficiency of neural networks with a reduced number of neurons in the hidden layer.
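The accuracy rate of one training and its summary over repeated trainings can be computed as in the sketch below. The species labels and predictions are invented for illustration only, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def accuracy_rate(predicted, observed):
    """Percentage of plants whose predicted species matches the observed one."""
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * hits / len(observed)


# Hypothetical labels for four plants and the predictions of two trainings.
observed = ["annuum", "baccatum", "chinense", "frutescens"]
run_a = ["annuum", "baccatum", "chinense", "frutescens"]
run_b = ["annuum", "baccatum", "chinense", "chinense"]

rates = [accuracy_rate(run, observed) for run in (run_a, run_b)]
mean_rate = mean(rates)
sd_rate = stdev(rates)  # sample standard deviation (an assumption of this sketch)
```

With many trainings per architecture, the mean and standard deviation give the height and error bar of each bar in a graph of accuracy by number of hidden neurons.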
Figure 2 shows the network architecture with six neurons in the hidden layer, which was the most efficient (Figure 1). In a similar study, Li et al. (2009) used 60 neurons in the hidden layer and found a 78 % accuracy rate. The greater efficiency found in the present study (100 %) may be explained by the greater number of plants evaluated (564 plants), since the availability of large data sets is important for efficient network training (Azevedo et al., 2015). Furthermore, the explanatory variables used in this study may have been more appropriate than those used by Li et al. (2009).
Although the 17 descriptors used in this study are easy to measure, their evaluation may be unfeasible if a very large number of plants must be classified. In this context, identifying the most important traits for prediction through ANN becomes necessary, since it makes it possible to reduce computational effort and labor (Paliwal and Kumar, 2011).
After discarding the descriptors with a lower relative contribution, 1000 new trainings were carried out, and a high level of efficiency in plant classification was confirmed (Figure 4). When using six neurons in the hidden layer, a 100 % accuracy rate was found. Very similar results were also found for network architectures with four and five neurons in the hidden layer. In this way, the feasibility of reducing the number of descriptors to five was verified. The morphological descriptor with the highest relative contribution when considering six neurons in the hidden layer was corolla color (56 %), followed by corolla spot color (18 %), calyx annular constriction (11 %), fruit shape at pedicel attachment (8 %), and fruit color at the mature stage (7 %) (Figure 5). The three most important traits (corolla color, corolla spot color, and calyx annular constriction) are associated with flowers. Traits associated with flowers were also considered important in the classification of pepper varieties by Sudré et al. (2010). According to these authors, the different species and varieties of peppers can be differentiated by morphological traits, especially those of flowers, such as the position of the flower and the pedicel, the presence or absence of spots on the petal lobes, the edge of the calyx, and the number of flowers per internode.
The possibility of automation through ANNs for species classification is important, since it eliminates the need for extensive knowledge of taxonomy (Cope et al., 2012). Furthermore, it enables the evaluation of a large number of plants in an easy, efficient and less laborious way (Husin et al., 2012).

Conclusion
The morphological descriptors corolla color, corolla spot color, calyx annular constriction, fruit shape at pedicel attachment, and fruit color at mature stage are the most important in this study. The network architecture with six neurons in the hidden layer is the most appropriate. It is possible to classify plants of the Capsicum genus by species with 100 % accuracy through artificial neural networks based on morphological descriptors.

Figure 1 − Accuracy rate and its standard deviation regarding the classification of Capsicum species through artificial neural networks of multilayer perceptron type, using 17 morphological descriptors.

Figure 4 − Accuracy rate and its standard deviation in terms of the classification of pepper species through artificial neural networks of multilayer perceptron type, using 5 morphological descriptors.