Data mining as a hatchery process evaluation tool

The hatchery is one of the most important segments of the poultry chain, and generates an abundance of data, which, when analyzed, allow for identifying critical points of the process. The aim of this study was to evaluate the applicability of the data mining technique to databases of egg incubation of broiler breeders and laying hen breeders. The study uses a database recording egg incubation from broiler breeders housed in pens with shavings used as litter and natural mating, as well as laying hen breeders housed in cages with artificial insemination. The data mining (DM) technique was applied in a classification task, using the type of breeder and housing system to delineate the classes. The database was analyzed in three different ways: original database, attribute selection, and expert analysis. Models were selected on the basis of model precision and class accuracy. The data mining technique allowed for the classification of hatchery fertile eggs from different genetic groups, with hatching rates and the percentage of fertile eggs being the attributes with the greatest classification power. Broiler breeders showed higher fertility (> 95 %), but higher embryonic mortality between the third and seventh day of incubation (> 0.5 %) when compared to laying hen breeders' eggs. In conclusion, when data mining is applied to the hatchery process, attribute selection and strategies based on the experience of experts can improve model performance.


Introduction
The expansion of poultry farming is founded on artificial incubation, one of the most important segments of the poultry chain, which ensures the performance of breeders and the quality of day-old chicks. With the evolution of available technology, the hatchery process now generates more data with potential for information extraction that can help increase process performance.
Classic data processing methods are at times not efficient in elaborating biological responses and factors (Mehri, 2013) such as genetic selection for meat or egg production, which influence the development and metabolism of the embryo during the hatching process (Buzala et al., 2015; Burggren et al., 2015).
A number of data and knowledge strategies to develop precision livestock farming are currently emerging, such as expert systems involving data mining (Vale et al., 2008), fuzzy logic (Sousa et al., 2016; Dominiak and Kristensen, 2017), and artificial neural networks (ANN; Kumar and Hancke, 2015). The performance of ANN when applied to egg hatchability prediction is satisfactory (Bolzan et al., 2008), but ANN does not explain how the features are used to solve the problem. On the other hand, data mining (DM) expresses the acquired knowledge as logical rules, for example through the construction of classification trees. DM has been successfully applied in animal science to animal environment (Vale et al., 2010), laying breeders' production (Ferreira et al., 2013), breeding certification (Vieira et al., 2015), animal welfare (Moi et al., 2014), and egg quality prediction (Orhan et al., 2016).
When DM is applied to organizing and analyzing a sizeable pool of data, solutions are obtained faster than with traditional methods and with optimal predictive performance, both of which are needed in industrial applications (Koksal et al., 2011). Another advantage of DM is that it can handle a mixture of numeric, categorical, and date data and can tolerate missing and noisy data, thereby increasing efficiency and detecting potential quality problems in processes (Lei-da Chen and Frolick, 2000), including the poultry chain.
The aim of this study was to evaluate the applicability of the DM technique, through the building of decision trees, to hatchery data from broiler breeders and laying hen breeders housed under different systems.

Materials and Methods
The study used a hatchery database of brown egg laying hen breeders and broiler breeders covering April to September 2011. Broiler breeders were housed in pens with shavings used as litter, and reproductive management was by natural mating. Laying hen breeders were housed in cages and bred by artificial insemination. It was expected that the different systems would generate differences that the DM technique would probably explain.
The analysis used the DM technique for a classification task, with two genetic classes as the class attribute: broiler breeders (BB) and laying hen breeders (LH). The database had 49 egg hatching batches, 33 from LH and 16 from BB, and the attributes and distributions are in Table 1.

Biometry, Modeling and Statistics
Research Article. Hatchery process evaluation tool. Sci. Agric. v.77, n.4, e20180074, 2020
The database was analyzed using the Weka® software tool, version 3.7.8, with the J48 classification algorithm. The output is a classification tree, a graphical view in the form of an inverted tree: the root node is the attribute with the highest classification power, and the branches below it are formed by the other attributes that allow for classification.
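To illustrate what "highest classification power" means at the root node, the following minimal Python sketch searches for the single threshold on a numeric attribute that best separates two classes by information gain. It is a simplified stand-in for the idea behind J48 (Weka's implementation of C4.5), not Weka's code: the midpoint convention for candidate thresholds, the function names, and the eight batches are assumptions made for illustration only and are not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    # Candidate thresholds: midpoints between consecutive distinct values (assumption).
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [c for v, c in zip(values, labels) if v <= threshold]
        right = [c for v, c in zip(values, labels) if v > threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical batches (not from the study): fertile eggs (%) and breeder class.
fertile_pct = [91.0, 92.5, 93.0, 94.0, 96.0, 97.0, 98.0, 94.5]
breeder = ["LH", "LH", "LH", "LH", "BB", "BB", "BB", "LH"]

threshold, gain = best_threshold(fertile_pct, breeder)
print(threshold, round(gain, 3))  # 95.25 0.954
```

In this toy data the split separates the classes perfectly, so the gain equals the entropy of the class distribution. A full tree learner would repeat the search over every attribute and recurse into each branch.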
The data mining technique was applied according to Vale et al. (2008) following the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology. Three approaches to database analysis were used as follows: 1. analysis of the database without attribute selection; 2. attribute selection using the algorithms available in Weka®; and 3. attribute selection with the help of experts with a minimum of two years of hatchery experience.
The selection of attributes by Weka® algorithms seeks to find in databases those attributes which are determinants in the process of knowledge extraction. The algorithms used for the attribute selection were: InfoGain, which evaluates the value of an attribute by measuring the gain of information with respect to the class; CFS (Correlation-based Feature Selection), which evaluates the value of a subset of attributes by considering the individual predictive capacity of each attribute along with the degree of redundancy between them; and Gain-Ratio, which evaluates the value of an attribute by measuring the gain ratio with respect to the class (Witten et al., 2016).
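The difference between the InfoGain and Gain-Ratio criteria can be seen in a small Python sketch. It assumes attributes that are already nominal (numeric attributes would need to be discretized first) and uses the usual definitions: information gain is H(class) - H(class | attribute), and gain ratio divides that gain by the entropy of the attribute itself. The attribute names and values are hypothetical, and CFS is not sketched here.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(attribute, labels):
    """H(class) - H(class | attribute) for a nominal attribute."""
    groups = defaultdict(list)
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(attribute, labels):
    """Information gain normalized by the entropy of the attribute."""
    split_info = entropy(attribute)
    return info_gain(attribute, labels) / split_info if split_info > 0 else 0.0


# Hypothetical, already discretized attributes for eight batches (not study data).
breeder = ["LH", "LH", "LH", "LH", "BB", "BB", "BB", "BB"]
attributes = {
    "fertility_band": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "batch_id": ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"],
    "setter": ["a", "b", "a", "b", "a", "b", "a", "b"],
}

for name, column in attributes.items():
    print(name, round(info_gain(column, breeder), 3), round(gain_ratio(column, breeder), 3))
```

Both `fertility_band` and `batch_id` reach the maximum information gain here, but gain ratio penalizes `batch_id` for having one value per batch. That penalty is the reason gain ratio is often preferred when attributes differ widely in their number of values.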
The selection of the best model was based on the general precision of the models and the accuracy of the classes according to Vale et al. (2008) and Vale et al. (2010), using a contingency matrix and interpretation of the classification rules by experts.
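As a rough illustration of how such figures can be read off a contingency matrix, the Python sketch below computes an overall rate of correct classification and a per-class rate. It assumes that "model precision" is the overall fraction of correctly classified batches and that "class accuracy" is the fraction of each class's batches classified correctly; the exact definitions in Vale et al. (2008) may differ. The class sizes (33 LH, 16 BB) follow the database description, but the cell counts are hypothetical and do not reproduce any of the reported models.

```python
def evaluate(matrix):
    """Summarize a contingency matrix given as matrix[actual][predicted] = count."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    overall = correct / total
    # Per-class rate: correctly classified batches of a class / all batches of that class.
    per_class = {c: matrix[c][c] / sum(matrix[c].values()) for c in matrix}
    return overall, per_class


# Hypothetical cell counts for illustration only.
confusion = {
    "LH": {"LH": 30, "BB": 3},
    "BB": {"LH": 2, "BB": 14},
}

overall, per_class = evaluate(confusion)
print(round(overall, 3), {c: round(v, 3) for c, v in per_class.items()})
```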
The database was submitted to an analysis of variance using the R® software package, version 2016.
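The study ran this analysis in R; for consistency with the other sketches, the statistic behind a one-way analysis of variance is shown here in Python. The function returns only the F statistic and its degrees of freedom (a p-value would additionally require the F distribution), and the two samples are hypothetical values, not data from the database.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fertile egg percentages for two classes (not study data).
lh_fertile = [92.0, 93.0, 94.0]
bb_fertile = [96.0, 97.0, 98.0]

f_stat, df_between, df_within = one_way_anova_f([lh_fertile, bb_fertile])
print(f_stat, df_between, df_within)  # 24.0 1 4
```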

Results
DM resulted in three classification trees of great relevance. The first tree, modeled from the database without attribute selection (Figure 1, model precision of 76 %), classified hatching eggs from LH more accurately than those from BB, with class accuracies of 0.82 and 0.63, respectively.
The second classification approach used only relative values as directed by an expert. The percentage of fertile eggs was the main attribute able to classify the strains (Figure 2, model precision of 90 %, class accuracy of 0.94 and 0.82 respectively for LH and BB). The model rules determined that a percentage of fertile eggs less than or equal to 95 % was the variable with the highest classification power to define LH. This result confirms the results of Figure 1.
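The use of relative values in the expert-directed approach amounts to a preprocessing step that turns absolute egg counts into percentages, so that batches of different sizes become comparable. A minimal Python sketch follows. The field names, the use of eggs set as the denominator for every rate, and the example batch are all assumptions for illustration; the attribute definitions actually used are those of Table 1.

```python
def to_relative(batch):
    """Convert the absolute egg counts of one batch into percentages of eggs set."""
    eggs_set = batch["eggs_set"]
    return {
        "fertile_pct": 100.0 * batch["fertile_eggs"] / eggs_set,
        "hatch_pct": 100.0 * batch["chicks_hatched"] / eggs_set,
        # Embryonic mortality between the third and seventh day of incubation.
        "m2_mortality_pct": 100.0 * batch["dead_day_3_to_7"] / eggs_set,
    }


# Hypothetical batch (not from the study database).
example_batch = {
    "eggs_set": 10000,
    "fertile_eggs": 9600,
    "chicks_hatched": 8500,
    "dead_day_3_to_7": 40,
}

relative = to_relative(example_batch)
print(relative)
```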
The third tree using the Weka® attribute selection approach is shown in Figure 3 (94 % model precision, and class accuracy 1.00 and 0.84 respectively for LH and BB). The percentage of fertile eggs was the only variable capable of classifying LH and BB egg performance.
The attributes of root nodes from the three decision trees were submitted to analysis of variance, and the means between the classes were different (p < 0.001). Where the number of eggs per batch is not constant, absolute egg number information may not be a good indicator of the incubation process, requiring relative values that can be preprocessed. In the second tree, BB has higher egg fertility and higher embryo mortality in the M2 phase, between the third and seventh day of incubation (Buzala et al., 2015; Van Emous et al., 2015). This higher early embryonic mortality is due to eggshell quality, egg size, and higher egg contamination from litter and nests (Ishaq et al., 2014; Parisi et al., 2015). The result reinforces the efficiency of the technique in extracting patterns that differentiate groups.
The third classification tree focuses on the minimum information necessary to describe knowledge (Buczak and Guven, 2016). The rate of fertile eggs had great classification power and served as the premise for classification rules linked with the production process (Vale et al., 2008).
The three approaches used in this DM study confirm the importance of preprocessing data. Feature selection by experts or using algorithms is necessary in order to improve model precision and accuracy by removing irrelevant and redundant data (Kashef and Nezamabadi-Pour, 2015; Ramya et al., 2017).
The differences between the means of the root node attributes of LH and BB confirm that the classes analyzed differ, which supports the applicability of the DM technique and the use of decision trees for extracting useful patterns from hatchery databases. Thus, in line with other applications in animal science, decision trees can be applied across the industry to find patterns and solutions for quality problems quickly and efficiently.

Conclusion
The data mining technique allows for the classification of hatchery fertile eggs from different genetic groups. Hatching rates and the percentage of fertile eggs are the attributes with the highest classification power. Feature selection and the implementation of strategies based on the experience of experts improve model performance.

Acknowledgments
To the FAPERGS (Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul) for providing a Master's degree scholarship.

Discussion
Housing LH in cages results in a lower percentage of contaminated eggs because the eggs have no contact with litter or feces. The bacterial count in eggs of birds housed in litter systems is higher when compared to eggs from caged systems (Roberts and Chousalkar, 2014; Parisi et al., 2015).
A higher number of infertile eggs for LH is associated with artificial insemination, using diluted semen once a week (Getachew, 2016; Hughes, 1978). The ability of DM to find variations in hatching rate and infertility can identify eggs from different farms with different sanitary status and male quality.
The higher relative air humidity in the incubator for BB is related to the larger egg size and the greater number of eggshell pores: these eggs lose more water during embryo development, which changes the incubation environment (Boleli et al., 2016; Araújo et al., 2017). The presence of this variable in the construction of rules is consistent, since many incubation problems arise from changes in the variables of the physical incubation environment.
The model considered the absolute number of incubated eggs when the hatchery batch size is constant.