ABSTRACT
Eggs are a widely consumed source of protein, and consumers often prefer free-range eggs, which have higher nutritive value and command higher prices. However, dishonest traders sometimes mislabel cage eggs as free-range eggs for unjustified profit. The biochemical methods currently used to differentiate between cage and free-range eggs can involve chemical reagents, sample preparation, and costly instruments. In this study, physical trait measurements were combined with machine learning to identify eggs according to their farming system. Twenty-seven physical traits were measured for 480 eggs using simple tools, and multicollinearity was reduced by comparing correlation coefficients, leaving 16 physical traits. Multi-layer Perceptron Neural Network, Naive Bayes, Linear Support Vector Classifier, Radial Basis Functions Support Vector Classifier, and Random Forest were used to create recognition models, and leave-one-out cross-validation was used for training and evaluation. The Multi-layer Perceptron Neural Network achieved the best classification performance, with an accuracy of 0.94167 and an F1 score of 0.94118. This result demonstrates that the physical traits of eggs provide sufficient features for the Multi-layer Perceptron Neural Network classifier. Compared with mainstream biochemical methods, the proposed approach differentiates between cage and free-range eggs using only physical trait measurements, thereby avoiding the need for chemical reagents, sample preparation, and expensive instruments.
Keywords:
Egg quality; Cage egg; Free-range egg; Physical traits; Machine learning
INTRODUCTION
Eggs are one of the most widely consumed, low-priced foods in the world (Lesnierowski et al., 2018), providing nutrients such as proteins, fats, minerals, and vitamins (Narushin et al., 2002). Cage and free-range eggs are two categories of table eggs. Free-range egg producers attempt to exclude antibiotics, hormones, or steroids in hens’ farming (Marventano et al., 2020) and only feed hens with organic grains, oil seeds, and roughage. Conversely, cage egg hens are usually fed with artificial feed which may contain genetically modified organisms, animal byproducts, and synthetic additives. Furthermore, free-range hens have continuous access to open-air runs during the day with low stocking intensity. Consequently, the differences in nutrition and price between free-range and cage eggs are attributed to differing feeding methods (Liao et al., 2023). Consumers tend to opt for free-range eggs because of their benefits in nutrition, flavor, cultural appeal, and food safety (Rondoni et al., 2020). Both higher feeding costs and consumer preference drive the price of free-range eggs to be higher than that of cage eggs. This, in turn, causes some dishonest merchants to adulterate free-range eggs with cage eggs, or even sell cage eggs as free-range eggs.
Research on discriminating between cage and free-range eggs currently focuses on two main aspects: biochemical traits and physical traits. To obtain biochemical information about eggs, biological and chemical technologies are required. For instance, Puertas et al. (Puertas et al., 2019; Puertas et al., 2023a) used the lipid extract of egg yolk to discriminate eggs from different farming systems based on ultraviolet-visible-near-infrared spectroscopy coupled with quadratic discriminant analysis, and found that the plasma obtained after whole egg fractionation by centrifugation is also useful for identifying cage and free-range eggs. Moreover, 16S rRNA gene sequencing (Wilson et al., 2021) and inductively coupled plasma mass spectrometry (Barbosa et al., 2014) provide alternative methods. These techniques are all based on chemical principles and have the advantages of being quantifiable and highly accurate. They can comprehensively evaluate the quality and characteristics of eggs from different perspectives. However, they are expensive and require specialized equipment, knowledgeable personnel, and time-consuming sample preparation. In addition, some methods may harm the environment because they consume large quantities of reagents and chemicals, and they cannot be used repeatedly.
Previous studies have reported that the physical characteristics of eggs (such as egg weight, egg shape index, albumen height, Haugh units, yolk color, and thickness) differ according to the farming method, including feed, stocking density, lighting, and activity level (Alagawany et al., 2020; Idowu et al., 2024). Other studies have also documented the impact of different housing systems on egg quality characteristics (Dikmen et al., 2017). Therefore, eggs from different farming systems have different physical traits (Dikmen et al., 2016; Antonella et al., 2021).
Physical trait measurements are simple, low-cost operations that cause no pollution and do not require professional technicians or chemical reagents. Using simple tools such as a Roche color fan, an electronic balance, and a vernier caliper, the physical traits of eggs can be obtained. Oguz et al. (Oguz et al., 2017) compared the effects of different levels of expanded perlite on egg quality traits. By measuring the weight of the albumen and the yolk, the length and width of the albumen, and the diameter of the yolk, they found that dietary expanded perlite can be added at the 1% level in laying hen rations without changing animal performance. Huseyin et al. (Huseyin et al., 2015) demonstrated that olive leaf powder can be used in layer diets for reducing egg yolk cholesterol content and egg yolk coloring. The study by Jayasena et al. (Jayasena et al., 2012) demonstrated that internal physical quality parameters, including shape index, Haugh unit, and albumen index, deteriorate significantly with increasing storage time. Due to the differences in feed and stocking density between cage and free-range hens, the physical traits of cage and free-range eggs may differ to a statistically significant degree (Hidalgo et al., 2008). However, no study has yet aimed at distinguishing cage and free-range eggs through physical traits.
The two aforementioned ways of evaluating egg quality have their own advantages and disadvantages. However, food regulators may need a simple, fast, and portable method for identifying cage and free-range eggs that can be used anywhere to obtain instant results. Machine learning (Jordan et al., 2015) is a sub-field of artificial intelligence and computer science that uses data and algorithms to replicate the way in which humans learn, incrementally improving its accuracy. Among the various algorithms involved, classification algorithms, which automatically sort data into one or more of a given set of classes, have been used in a multitude of fields (Sisodia et al., 2018; Chen et al., 2023; Huang et al., 2023; Gao et al., 2024; Qiu et al., 2024). Therefore, this paper seeks to classify cage and free-range eggs based on their physical characteristics by using machine learning. Five models constructed using Random Forest (RF) (Sipper et al., 2021), Multi-layer Perceptron (MLP) Neural Network (Gardner et al., 1998), Naive Bayes (NB) (Vu et al., 2022), Linear Support Vector Classifier (Linear SVC) (Erman et al., 2023), and Radial Basis Functions Support Vector Classifier (RBF SVC) (De Oliveira Nogueira et al., 2022) are compared in terms of their classification performance. A flowchart representing the construction of a recognition model for egg farming system is presented in Figure 1.
MATERIALS AND METHODS
Data Sources
In this study, the data came from 480 eggs of six different varieties (Cage 1, 2, 3 and Free-range 1, 2, 3) sold by the same egg producer (Hengliang Agricultural Technology Co., Ltd.). This sample size can be questioned in light of ongoing discussions regarding the reliability of mathematical models constructed using small sample sets (Cohen et al., 2015; Vabalas et al., 2019). However, relevant studies have indicated that, under appropriate parameters, models constructed using MLP Neural Network can be equally reliable even with small sample sizes (Alwosheel et al., 2018; D’souza et al., 2020). Therefore, based on previous research on egg quality characteristics (Caglayan et al., 2009; Nematinia et al., 2018; Ghanima et al., 2020), we selected a sample size of 80 for each of the six varieties. This approach ensures model accuracy while lowering the threshold for data collection in modeling. The main information of the eggs is shown in Table 1.
Data Collection
On the fifth day after the production date stated on the packaging, measurements were taken at a temperature of 22 °C and a humidity of 50%. Egg weight, eggshell weight, and yolk weight were measured on an electronic scale with a precision of 0.01 g. A vernier caliper with a resolution of 0.001 mm was used to measure the height of the egg’s albumen after cracking the egg onto a glass plate, leaving the albumen and yolk in their natural positions. Three locations where the albumen joined the yolk (roughly 1 cm away) were measured, and the readings were then averaged. The vernier caliper was also used to measure the thickness of the shell, including the shell membrane, at the sharp and blunt ends of the egg. The total protein content of the albumen and yolk was measured using a tissue or cell total protein extraction kit (Solarbio, Beijing, China). Finally, a Roche color fan with 15 yellow tones was used to assess the hue of the yolk. The albumen height and egg weight are related by the Haugh unit (Eisen et al., 1962). The formula in Table 2 was used to determine the Haugh unit score, where the albumen’s height is measured in millimeters and the egg’s weight in grams (Monira et al., 2003). Other calculations were also performed according to the formulas in Table 2 (Anderson et al., 2004). In total, we obtained the 27 traits shown in Table 2:
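Because Table 2 is not reproduced here, the following minimal sketch assumes that the Haugh unit is computed with the commonly cited form HU = 100·log10(H − 1.7·W^0.37 + 7.6), with H the albumen height in millimeters and W the egg weight in grams. The function name and the example egg are hypothetical and are not taken from the study's data.

```python
import math


def haugh_unit(albumen_height_mm: float, egg_weight_g: float) -> float:
    """Haugh unit from albumen height (mm) and egg weight (g).

    Assumes the commonly cited form HU = 100 * log10(H - 1.7 * W**0.37 + 7.6).
    """
    inner = albumen_height_mm - 1.7 * egg_weight_g ** 0.37 + 7.6
    if inner <= 0:
        raise ValueError("inputs fall outside the range where the formula is defined")
    return 100.0 * math.log10(inner)


# Hypothetical egg: 6.0 mm albumen height, 60 g weight.
print(round(haugh_unit(6.0, 60.0), 1))
```

For a fixed egg weight, a taller albumen gives a larger Haugh unit, which is why the score is read as a freshness indicator.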
Data Preprocessing
As the first step in data analysis, data cleaning plays a crucial role in determining the accuracy of the model results. For data cleaning, we employed box plots (Jeffrey et al., 2022) to identify and eliminate outliers, that is, values falling outside the upper and lower bounds. We cleaned the data set containing the 27 egg traits and found that it was relatively clean, without invalid information, non-standard column names, inconsistent formats, missing values, outliers, or duplicate values.
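A box-plot check of this kind can be sketched as follows. The text does not state the whisker rule, so the sketch assumes the conventional bounds Q1 − 1.5·IQR and Q3 + 1.5·IQR; the function names and the sample values are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) box-plot bounds; k = 1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) according to the box-plot bounds."""
    lower, upper = iqr_bounds(values, k)
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Hypothetical measurements with one obviously extreme value.
kept, outliers = split_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(outliers)
```

Applied trait by trait, an empty outlier list for every trait would correspond to the "relatively clean" data set described above.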
Feature selection has been demonstrated to be a useful and effective preprocessing strategy in data mining and machine learning. In machine learning, features are ideally independent of one another. When two or more features are strongly correlated, multicollinearity arises, which can make the model unstable and produce results that are hard to explain (Cai et al., 2018). On this basis, 11 traits were removed by comparing the correlation coefficients.
Standardization is also referred to as scaling, mapping, or a preprocessing stage (Ramakrishnaiah et al., 2023). It maps data from an existing range to a new range, yielding uniform and convergent input or output data, which benefits the predictions of the subsequent model (Singh et al., 2020). The most prevalent method is Min-Max Normalization (Li et al., 2011), which, as Eq. (1) shows, maps data uniformly to the [0, 1] interval:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$
where $x$ and $x'$ are the inputs before and after standardization, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum inputs, respectively.
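As a minimal sketch, Min-Max Normalization of a single feature column can be written as below. The function name and the handling of a constant column (mapped to zeros) are assumptions made for illustration.

```python
def min_max_scale(values):
    """Map a list of numbers to the [0, 1] interval with min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant column carries no information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical egg weights in grams.
print(min_max_scale([50.0, 60.0, 70.0]))
```

Scaling each trait separately keeps traits measured in grams, centimeters, and percentages on a comparable footing before they reach the classifier.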
Classification Algorithms
This study trained models on the physical feature data using a variety of machine learning classification algorithms, including Linear SVC (Karatzoglou et al., 2006), RBF SVC (Jiang et al., 2016), Naive Bayes (Li et al., 2022), RF (Riantini et al., 2023), and MLP Neural Networks (Popescu et al., 2009). Because each algorithm has its own scope of application, no algorithm is inherently better or worse than the others. The aim of this experiment was to find the classification algorithm best suited to this study. The Python code was written in Visual Studio Code (version 1.74.3).
The goodness-of-fit evaluation indicators are defined as follows (Tharwat, 2021), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
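The standard confusion-matrix definitions of accuracy, precision, recall, and F1 score can be sketched in a few lines. The function name and the counts in the example are hypothetical; they are not the counts behind the results reported in this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
print(classification_metrics(tp=45, fp=5, fn=10, tn=40))
```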
Multi-layer Perceptron Neural Network
The MLP Neural Network is a type of artificial neural network (ANN). It may have several hidden layers between the input and output layers. Without hidden layers, it can deal with linearly separable data. The simplest MLP model contains a single hidden layer, giving a three-layer structure, as shown in Figure 2.
The layers of the multi-layer perceptron are fully connected, as shown in the above figure (here, full connection means that any neuron on the upper layer is connected with all neurons on the lower layer). The input layer is located on the bottom layer of the multi-layer perceptron, the hidden layer is located in the middle, and the output layer is located on the top layer.
Assuming that the input layer is represented by the vector X, the output of the first neuron in the hidden layer is $f(w_1 X + b_1)$, where $w_1$ is the weight (also called the connection coefficient) and $b_1$ is the offset. Since the task is a binary classification of free-range versus cage eggs, the function f used here is the sigmoid function (Dombi et al., 2022):
$$f(z) = \frac{1}{1 + e^{-z}}$$
The output value of this function can range from 0 to 1. The output samples greater than or equal to 0.5 will be assigned to the positive class if the threshold is set to 0.5, and the remaining samples will be assigned to the negative class.
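The neuron output and the 0.5 threshold rule can be illustrated with the short sketch below. It assumes that f is the logistic sigmoid, which is consistent with an output range of 0 to 1; the weights, bias, and inputs are hypothetical values, not trained parameters.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(weights, bias, x):
    """Output f(w . x + b) of a single neuron with a sigmoid activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def predict_label(probability, threshold=0.5):
    """Assign the positive class (1) when the output reaches the threshold."""
    return 1 if probability >= threshold else 0


# Hypothetical weights, bias, and two scaled features.
p = neuron_output([0.5, -0.25], 0.1, [2.0, 4.0])
print(p, predict_label(p))
```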
Cross-Entropy Loss Function
By measuring the difference between the true labels of the training data and the probabilities predicted at each iteration, cross-entropy can assess the model’s degree of optimization (Kline et al., 2005). Different loss functions are used in the MLP Neural Network depending on the type of task. Cross-entropy is the loss function chosen for classification tasks and, for the two-class case, is defined as:
$$Loss(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]$$
In this study, y represents the real class label of free-range and cage eggs. ŷ is the prediction class label, which is activated and calculated by the SoftMax function in the output layer, and then used to normalize the output value of the classifier. The label “0” in the output result is free-range eggs, and the label “1” is cage eggs.
Regularization
Regularization’s primary goal is to keep model parameters within reasonable bounds in order to avoid overfitting and excessively large parameters. Regularization also restates an ill-posed problem in a stable form (Regińska et al., 1996), giving it a solution that depends stably on the data. For the model, regularization controls complexity. Regularization is carried out by adding a ‘penalty’ to the loss function (Zhang et al., 2023), a common choice being the L2 norm. The cross-entropy loss function after L2 weight-decay regularization is given by the following formula (Li et al., 2020):
$$Loss_{reg}(\hat{y}, y, W) = Loss(\hat{y}, y) + \alpha \lVert W \rVert_2^2$$
where Loss(ŷ, y) is the non-regularized loss function. The L2 regularization term, α‖W‖₂², penalizes excessively complex models. α > 0 is a hyperparameter that controls the magnitude of the penalty.
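The combination of cross-entropy and an L2 penalty can be sketched as follows. The sketch assumes the two-class form of cross-entropy averaged over samples and a penalty of exactly α times the sum of squared weights; libraries may scale the penalty by a different constant factor. All names and values are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean two-class cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def l2_penalty(weights, alpha):
    """alpha * squared L2 norm of a flat list of weights."""
    return alpha * sum(w * w for w in weights)


def regularized_loss(y_true, y_prob, weights, alpha):
    """Cross-entropy plus the L2 weight-decay term."""
    return binary_cross_entropy(y_true, y_prob) + l2_penalty(weights, alpha)


# Hypothetical labels, predicted probabilities, and weights.
print(regularized_loss([1, 0], [0.5, 0.5], [3.0, 4.0], alpha=0.01))
```

A larger α pulls the weights toward zero more strongly, trading some fit on the training data for a simpler model.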
Naive Bayes
Naive Bayes is one of the most popular classification algorithms (Zhao et al., 2023). It is a classifier built on Bayes’ theorem and the assumption that features are conditionally independent given the class. Because it is computed using the Bayesian formula, the Naive Bayes method has a strong mathematical foundation and stable classification efficiency. The NB model has a straightforward algorithm, few estimated parameters, and is not sensitive to missing data.
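To make the conditional-independence assumption concrete, the sketch below implements a small Gaussian Naive Bayes classifier. The text does not state which Naive Bayes variant was used, so the Gaussian likelihood is an assumption, and the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior and a per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest log posterior for one sample."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for value, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Toy data: two hypothetical features, two classes.
X = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [3.0, 1.0], [3.2, 1.1], [2.9, 0.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 5.0]))
```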
Random Forest
Random Forest (Jiang et al., 2023a) is an algorithm used primarily in regression and classification scenarios that trains on, classifies, and predicts samples using multiple decision trees. While classifying the data, it can score the importance of each variable and assess its contribution to the classification. As an ensemble learning algorithm, Random Forest has no dependency between its weak learners, so they can be trained in parallel.
Randomness is at the center of an RF, and it takes two forms. First, the training sample for each tree is drawn at random from the initial training data and has the same size as the initial data. Second, only a randomly chosen subset of the features is considered when each decision tree is built. These two kinds of randomness reduce the correlation between decision trees and raise the model’s accuracy.
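The two sources of randomness can be sketched as below. The sketch assumes that rows are sampled with replacement (bootstrap sampling), as in the standard Random Forest formulation, and uses a subset of 4 out of 16 features purely as a hypothetical setting; the feature names are placeholders.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (assumed bootstrap sampling)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(feature_names, k, rng):
    """Choose k distinct features at random for one tree or split."""
    return rng.sample(feature_names, k)


rng = random.Random(0)                           # fixed seed for repeatability
features = [f"trait_{i}" for i in range(1, 17)]  # placeholder names for 16 traits
rows = bootstrap_indices(480, rng)
subset = random_feature_subset(features, 4, rng)
print(len(set(rows)), subset)
```

Because sampling is with replacement, each tree typically sees only part of the distinct rows, which is one reason the trees differ from one another.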
SVC
SVC (Alves et al., 2014) is a method for classifying both linear and non-linear data. It seeks the optimal separating hyperplane between the two sample classes, that is, the one that maximizes the margin between them. Data that are not linearly separable in the original space are mapped by a kernel function from a low-dimensional space to a high-dimensional space, where linear classification becomes possible. Additionally, the optimization that identifies the best separating hyperplane is still computed in the original low-dimensional space through the kernel function. The points closest to the hyperplane on either side are the support vectors that determine the optimal separating hyperplane, whereas points far from the hyperplane have no bearing on it.
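The two kernels compared in this study can be illustrated with the following sketch. It assumes the usual definitions, a dot product for the linear kernel and exp(−γ‖x − z‖²) for the radial basis function kernel; the value of γ and the sample points are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: the ordinary dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2); gamma is a hypothetical setting."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two hypothetical scaled feature vectors.
print(linear_kernel([0.0, 0.0], [1.0, 1.0]), rbf_kernel([0.0, 0.0], [1.0, 1.0]))
```

The kernel value stands in for an inner product in the high-dimensional space, so the mapping itself never has to be computed explicitly.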
Leave-One-Out Cross-Validation Method
Leave-One-Out Cross-Validation (LOOCV) (Wong, 2015) is a special case of k-fold cross-validation in which k equals the number of samples. k-fold cross-validation randomly divides the data set into k parts, of which (k-1) parts form the training set and the remaining part forms the test set. This process is repeated k times. For example, the first part is used to test the fit of the model built on the remaining (k-1) parts, then the second part is used as the test set for the model built on its corresponding training set, and so on. On completion, each part has been used as the test set exactly once, and the fit over the k tests is averaged to obtain the fit of the model.
This method can avoid overfitting and is suitable for small sample data, while also having a better fitting effect. Figure 3 shows how to use LOOCV.
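The LOOCV procedure can be sketched with a deliberately simple stand-in classifier. The nearest-centroid rule below is used only to keep the example self-contained and is not one of the models evaluated in this study; the function names and toy data are hypothetical.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_index) pairs, one for each sample."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], i


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: predict the class whose mean vector is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])))


def loocv_accuracy(X, y):
    """Average accuracy over all leave-one-out iterations."""
    correct = 0
    for train_idx, test_idx in leave_one_out_splits(len(X)):
        train_X = [X[j] for j in train_idx]
        train_y = [y[j] for j in train_idx]
        if nearest_centroid_predict(train_X, train_y, X[test_idx]) == y[test_idx]:
            correct += 1
    return correct / len(X)


# Toy data: one hypothetical scaled feature, two classes.
X = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y))
```

With a data set of N samples, the generator produces N splits, each with N − 1 training samples and a single held-out sample.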
Parameters
The leave-one-out method was used for cross-validation, where the number of iterations N is 480, the number of training samples in each iteration is 479, and the number of validation samples is 1. Some parameters of the experimental model (MLP Neural Network) and the comparison models are shown in Table 3.
RESULTS
Results of Data Cleaning
Box plots were employed to perform data cleaning and to confirm a distribution free from outliers or extreme values, as shown in Figure 4. In the traditional perception of Chinese consumers, cage eggs are usually heavier than free-range eggs. As can be seen in Figure 4, the median whole-egg weight of cage eggs was larger than that of free-range eggs, but more than 25% of cage eggs were lighter than any free-range egg, indicating that it is not reliable to discriminate cage and free-range eggs solely on the basis of egg weight. The upper quartile of free-range eggs shows that almost 75% of the albumen ratios of free-range eggs are lower than those of cage eggs, while almost 75% of the yolk ratios of cage eggs are lower than those of free-range eggs. Usually, the flavor of an egg is directly proportional to the proportion of yolk (Liao et al., 2023). Therefore, consumers who prefer a richer taste might choose free-range eggs. On the other hand, cage eggs, which usually have more albumen, could be more suitable for fitness enthusiasts who need more protein. The median eggshell thickness of free-range eggs was slightly larger than that of cage eggs. Eggshell thickness is related to transportation costs; for example, the thicker the eggshell, the lower the damage rate, which can lead to savings in packaging and protection materials. The median Haugh unit of free-range eggs is lower than that of cage eggs. In general, the larger the Haugh unit, the fresher the egg (Liao et al., 2023). The lower Haugh unit of free-range eggs may be attributable to the outdoor environment, which can have a negative effect on the eggshell membrane and delay the collection of eggs (Lund et al., 1938).
Figure 4. Box plots of the physical traits of cage and free-range eggs. Units are as follows: whole egg weight (g) (W-egg); eggshell thickness (cm) (T-shell), magnified 1000 times to display the results more clearly; albumen ratio (%) (R-albumen); yolk ratio (%) (R-yolk); Haugh unit (Haugh).
Results of Feature Selection
The heat map in Figure 5 shows the correlation coefficient between each pair of features. An absolute correlation coefficient higher than 0.8 indicates multicollinearity. Therefore, we deleted the non-conforming features and selected the following features as the final variables: whole egg weight (g), weight of thick albumen (g), weight of thin albumen (g), yolk height (unseparated from albumen) (cm), egg yolk height (separated from albumen) (cm), egg yolk diameter (cm), egg yolk chroma, protein in yolk (mg/g), protein in albumen (mg/g), eggshell ratio (%), albumen ratio (%), yolk ratio (%), yolk index 1 (separated from albumen), egg shape index, eggshell thickness (cm), and Haugh unit.
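A correlation filter with the 0.8 cut-off described above can be sketched as follows. The text does not say which feature of a correlated pair was dropped, so the greedy rule here (keep a feature only if it is not strongly correlated with any feature already kept) is an assumption, and the toy columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_collinear(columns, threshold=0.8):
    """Greedy filter (assumed rule): keep a feature only if |r| <= threshold
    against every feature kept so far, scanning columns in their given order."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: albumen weight tracks egg weight closely, yolk color does not.
toy = {
    "egg_weight": [50, 55, 60, 65, 70],
    "albumen_weight": [30, 33, 36.5, 39, 42.5],
    "yolk_color": [9, 12, 8, 11, 10],
}
print(drop_collinear(toy))
```

The outcome of a greedy filter depends on the order in which features are scanned, so the order itself is a modeling choice.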
Results of Classification Model
We used the MLP Neural Network model to classify the preprocessed data and distinguish free-range eggs from cage eggs as accurately as possible. As shown in Figure 6, the loss value decreases as the number of epochs increases and gradually converges, indicating that the performance of the model improves through the iteration process. The curve starts to converge when the epoch reaches 300. This indicates that the initial parameters are well-chosen, allowing the model to update its parameters more quickly towards the optimal solution, which is a positive indicator of model performance (Sinha et al., 2010).
Under the same conditions, we also trained and validated the NB, RF, Linear SVC, and RBF SVC models. Figure 7 shows the classification results of the five models. The red and blue scatter points represent cage and free-range eggs, respectively. The scatter distribution of the predicted values from the MLP Neural Network model matched that of the input data, demonstrating a high level of prediction accuracy. Figure 7 shows that the classification performance of the MLP Neural Network is superior to that of the other four models.
Model Evaluation
Various assessment indicators (see MATERIALS AND METHODS for detailed formulas) were used to evaluate the model’s classification performance more precisely. When a true sample is incorrectly classified, the egg producers or sellers are likely to doubt the detection result and use alternative methods to ascertain the sample’s class. However, they will not raise a concern if a false sample is identified as true. The precision, recall, and F1 score are calculated to study such problems (Tharwat, 2021). Because of their high price, free-range eggs are likely to be replaced by cage eggs. As a result, we consider free-range eggs to be negative and cage eggs to be positive. The higher the precision, the fewer negative samples are incorrectly classified as positive. The higher the recall, the fewer positive samples are missed. The F1 score is defined as the harmonic mean of the model’s precision and recall. A high F1 score implies high stability of the model. In our opinion, cage eggs sold as free-range eggs should be recognized as much as possible to protect consumer rights, so recall is more important than precision in discriminating the egg farming system.
Table 4 shows that the recall rate of the NB classifier is the highest, reaching 100%. This is typically because the NB algorithm tends to classify more samples as positive cases to ensure that as few positive cases as possible are missed. However, this strategy may also lead to a higher number of false positives, thereby reducing the overall accuracy. The performance metrics of the RBF SVC are all relatively low, making it unsuitable for identifying egg farming systems. We found that the MLP Neural Network’s precision and F1 score were the highest among the models, with its accuracy reaching 94.167%. This demonstrated that the MLP Neural Network model outperformed the other models in terms of egg farming system classification performance.
Based on Table 4, it can be concluded that the MLP Neural Network, NB, and RF are the top three models. Therefore, in addition to the aforementioned indicators, we also plotted the ROC (Receiver Operating Characteristic) curves (Tatliparmak et al., 2023) of the MLP Neural Network, NB, and RF models in Figure 8 (Right) to provide a more intuitive comparison of their performance. This curve represents the relationship between the False Positive Rate (FPR) and the True Positive Rate (TPR). The higher the TPR or the lower the FPR, the better the performance of the model. Therefore, the more the ROC curve of the model is convex to the upper left, or the larger the AUC value (the area under the ROC curve), the better the classification performance of the model. The curve of the MLP Neural Network is more convex to the upper left than those of the other two models, and the AUC values are 0.98 (MLP Neural Network) > 0.95 (RF) > 0.92 (NB). In order to investigate the relationship between precision and recall in the models, the Precision-Recall curve (Miao et al., 2022) was also drawn in Figure 8 (Right). The curve of the MLP Neural Network model is closest to the upper right corner, indicating that it can maintain high precision while achieving a high recall rate. These results show that the MLP Neural Network model has the best classification effect.
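The AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample. The sketch below computes it directly from that pairwise definition, counting ties as one half; the function name, labels, and scores are hypothetical and unrelated to the values in Figure 8.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve from its pairwise-ranking definition."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = cage, 0 = free-range) and predicted scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking, while an AUC of 1.0 means every positive sample is scored above every negative sample.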
DISCUSSION
Machine learning leveraging multidimensional data excels at classifying food quality (Rachineni et al., 2022; Gao et al., 2024; Xiao et al., 2024). In this research, we used a vernier caliper, a Roche color fan, and an electronic scale to make simple physical measurements of free-range and cage eggs. We then removed multicollinearity between features and retained a small set of valuable features to construct the mathematical models. Our result demonstrates that the physical characteristics of eggs are sufficiently informative for a machine learning classifier. Compared with the discrimination models of egg farming systems constructed using biochemical techniques (Melgaço et al., 2014; Puertas et al., 2019; Puertas et al., 2023b), our model showed similarly strong predictive ability without expensive equipment or complex sample pretreatment. In 2019, Galic et al. (Galic et al., 2019) compared the physical and mechanical characteristics of free-range and cage eggs. However, that study did not use the data from different farming systems to establish classification models.
We developed recognition models using MLP Neural Network, NB, Linear SVC, RBF SVC, and RF, and performed training and evaluation with leave-one-out cross-validation. The results showed that the MLP Neural Network model exhibited the best classification performance. The MLP Neural Network can capture complex nonlinear relationships in the data (such as those involving eggshell thickness and yolk proportion) through its hidden layers and activation functions. The physical property data of eggs are high-dimensional, including size, volume, surface area, shape index, and so on. The MLP Neural Network is effective in handling high-dimensional data, extracting features through its multi-layer structure, and performing classification. In previous studies, the MLP Neural Network has been widely used for classification, regression, and prediction. Researchers have used MLP Neural Networks to estimate greenhouse gas emissions from agricultural regions and enterprises (Nanda et al., 2023), predict the prices of pine logs (Lamichhane et al., 2023), and analyze leakage in campus water supply systems (Azgomi et al., 2023), achieving high accuracy in all these objectives. Taken together with these prior studies, our findings indicate that the MLP Neural Network is a reasonable choice for distinguishing the farming system of eggs.
It has to be acknowledged that this paper has some shortcomings. Firstly, organic farming system eggs, which are currently favored by mid to high-end consumers, were not included in this work. Additionally, the use of tools such as calipers and balances to distinguish eggs based on farming systems is more suitable for food regulatory authorities. This work, however, did not aim to provide feasible identification methods for consumers.
CONCLUSIONS
In this study, 27 physical characteristics of both cage and free-range eggs were obtained using simple tools without chemical reagents. The multicollinearity between features was then reduced by comparing correlation coefficients. Lastly, MLP Neural Network, Naive Bayes, Random Forest, and SVC models were trained and evaluated using leave-one-out cross-validation on the pre-processed data. The results indicated that the MLP Neural Network model had the best performance in classifying free-range and cage eggs. The model’s accuracy was 0.94167 and its F1 score was 0.94118, allowing fast, economical, and reliable authentication of the egg farming system.
Given the shortcomings of this work, future efforts could focus on: (1) including a wider variety of eggs from different farming systems as part of the dataset to build a multi-class model for egg farming systems, and (2) incorporating computer vision along with the photo-taking capabilities of smartphones to explore the visual distinctions of egg appearances based on farming systems, thus making it convenient for consumers.
ACKNOWLEDGMENTS
This work was supported by The Natural Science Foundation of Fujian Province of China [2022J01821 and 2022J05163], The National Key R&D Program of China [2020YFD0900904], and The National Natural Science Foundation of China [11705068].
REFERENCES
-
Alagawany M, El-Hindawy MM, El-Hack ME, et al. Influence of low-protein diet with different levels of amino acids on laying hen performance, quality and egg composition. Anais da Academia Brasileira de Ciencias 2020;92(1):e20180230. http://doi.org/10.1590/0001-3765202020180230
» http://doi.org/10.1590/0001-3765202020180230 -
Alves JCL, Henriques CB, Poppi RJ. Classification of diesel pool refinery streams through near infrared spectroscopy and support vector machines using C-SVC and ?-SVC. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2014;117(3):389-96. https://doi.org/10.1016/j.saa.2013.08.018
» https://doi.org/10.1016/j.saa.2013.08.018 -
Alwosheel A, Cranenburgh S van, Chorus CG. Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis. Journal of Choice Modelling 2018;28:167-82. https://doi.org/10.1016/j.jocm.2018.07.002
» https://doi.org/10.1016/j.jocm.2018.07.002 -
Anderson KE, Tharrington JB, Curtis PA, et al. Shell characteristics of eggs from historic strains of single comb white leghorn chickens and the relationship of egg shape to shell strength. International Journal of Poultry Science 2004;3(1):17-9. http://doi.org/10.3923/ijps.2004.17.19
Antonella DZ, Marco C, Erika P, et al. Is the farming method (cage, barn, organic) a relevant factor for marketed egg quality traits? Livestock Science 2021;246. https://doi.org/10.1016/j.livsci.2021.104453
Azgomi H, Haredasht FR, Safari Motlagh MR. Diagnosis of some apple fruit diseases by using image processing and artificial neural network. Food Control 2023;145:109484. https://doi.org/10.1016/j.foodcont.2022.109484
Barbosa RM, Nacano LR, Freitas R, et al. The use of decision trees and Naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs. Journal of Food Science 2014;79(9):C1672-C7. https://doi.org/10.1111/1750-3841.12577
Caglayan T, Alasahan S, Kirikçi K, et al. Effect of different egg storage periods on some egg quality characteristics and hatchability of partridges (Alectoris graeca). Journal of Poultry Science 2009;88(6):1330-3. https://doi.org/10.3382/ps.2009-00091
Cai J, Luo J, Wang S, et al. Feature selection in machine learning: a new perspective. Neurocomputing 2018;300:70-9. https://doi.org/10.1016/j.neucom.2017.11.077
Cohen TN, Wiegmann DA, Shappell SA. Evaluating the reliability of the human factors analysis and classification system. Aerospace Medicine and Human Performance 2015;86(8):728-35. https://doi.org/10.3357/amhp.4218.2015
Chen S, Wang Y, Zhu Q, et al. Fast recognition of the harvest period of Porphyra haitanensis based on mid-infrared spectroscopy and chemometrics. Journal of Food Measurement and Characterization 2023;17:5487-96. https://doi.org/10.1007/s11694-023-01999-1
D'souza RN, Huang PY, Yeh FC. Structural analysis and optimization of convolutional neural networks with a small sample size. Scientific Reports 2020;10(1):834. https://doi.org/10.1038/s41598-020-57866-2
De Oliveira Nogueira T, Palacio GBA, Braga FD, et al. Imbalance classification in a scaled-down wind turbine using radial basis function kernel and support vector machines. Energy 2022;238(1):122064. https://doi.org/10.1016/j.energy.2021.122064
Dikmen BY, Ipek A, Sahan Ü, et al. Egg production and welfare of laying hens kept in different housing systems (conventional, enriched cage, and free range). Journal of Poultry Science 2016;95(7):1564-72. http://doi.org/10.3382/ps/pew082
Dikmen BY, Ipek A, Sahan Ü, et al. Impact of different housing systems and age of layers on egg quality characteristics. Turkish Journal of Veterinary and Animal Sciences 2017;41(1):77-84. http://doi.org/10.3906/VET-1604-71
Dombi J, Jónás T. The generalized sigmoid function and its connection with logical operators. International Journal of Approximate Reasoning 2022;143:121-38. https://doi.org/10.1016/j.ijar.2022.01.006
Eisen EJ, Bohren BB, McKean HE. The Haugh unit as a measure of egg albumen quality. Poultry Science 1962;41(5):1461-8. https://doi.org/10.3382/ps.0411461
Erman ME, Stelian N, Vasile S. Decision tree versus linear support vector machine classifier in the screening of medial speech sounds: a quest for a sound rationale. Studies in Health Technology and Informatics 2023;309:73-7. http://doi.org/10.3233/SHTI230742
Galic A, Bedekovic D, Kovacev I, et al. Physical and mechanical characteristics of Hisex Brown hen eggs from three different housing systems. South African Journal of Animal Science 2019;49(3):468-76. http://doi.org/10.4314/sajas.v49i3.7
Gao Z, Chen S, Huang J, et al. Real-time quantitative detection of hydrocolloid adulteration in meat based on swin transformer and smartphone. Journal of Food Science 2024;89(7):4359-71. https://doi.org/10.1111/1750-3841.17159
Gao Z, Lin Q, He Q, et al. Rapid detection of spoiled apple juice using electrical impedance spectroscopy and data augmentation-based machine learning. Chiang Mai Journal of Science 2024;5:e2024071. https://doi.org/10.12982/CMJS.2024.071
Gardner MW, Dorling SR. Artificial neural networks (the multilayer perceptron)-a review of applications in the atmospheric sciences. Atmospheric Environment 1998;32(14):2627-36. https://doi.org/10.1016/S1352-2310(97)00447-0
Ghanima MMA, Elsadek MF, Taha AE, et al. Effect of housing system and rosemary and cinnamon essential oils on layers performance, egg quality, haematological traits, blood chemistry, immunity, and antioxidant. Animals 2020;10(2):245. https://doi.org/10.3390/ani10020245
Hidalgo A, Rossi M, Clerici F, et al. A market study on the quality characteristics of eggs from different housing systems. Food Chemistry 2008;106(3):1031-8. https://doi.org/10.1016/j.foodchem.2007.07.019
Huseyin C, G. E. Effect of olive leaf (olea europaea) powder on laying hens performance, egg quality and egg yolk cholesterol levels. Asian-Australasian Journal of Animal Sciences 2015;28(4):538-43. https://doi.org/10.5713/ajas.14.0369
Huang Z, Xiao Y, Xiao Y, et al. Rapid recognition of processed milk type using electrical impedance spectroscopy and machine learning. International Journal of Food Science and Technology 2023;58(6):3121-34. https://doi.org/10.1111/ijfs.16440
Idowu OPA, Kareem DU, Oke OE, et al. Effects of housing systems and laying phases on external and internal egg quality characteristics of indigenous guinea fowl hens. Translational Animal Science 2024;8:txae011. https://doi.org/10.1093/tas/txae011
Jayasena DD, Cyril HW, Jo C. Evaluation of egg quality traits in the wholesale market in Sri Lanka during the storage period. Journal of Animal Science and Technology 2012;54(3):209-17. http://doi.org/10.5187/JAST.2012.54.3.209
Jeffrey W, Raymond RH, Joseph JPJ, et al. Wavelet analysis of variance box plot. Journal of Applied Statistics 2022;49(14):3536-63. https://doi.org/10.1080/02664763.2021.1951685
Jiang L, Yao R. Modelling personal thermal sensations using C-Support Vector Classification (C-SVC) algorithm. Building and Environment 2016;99:98-106. https://doi.org/10.1016/j.buildenv.2016.01.022
Jiang M, Wang J, Hu L, et al. Random forest clustering for discrete sequences. Pattern Recognition Letters 2023a;174:145-51. https://doi.org/10.1016/j.patrec.2023.09.001
Jiang S, Yang C, Wang R, et al. Resting-state functional connectivity in a non-human primate model of cortical ischemic stroke in area F1. Magnetic Resonance Imaging 2023b;104:121-8. https://doi.org/10.1016/j.mri.2023.10.005
Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science 2015;349(6245):255-60. https://doi.org/10.1126/science.aaa8415
Karatzoglou A, Meyer D, Hornik K. Support vector machines in R. Journal of Statistical Software 2006;15(9):1-28. https://doi.org/10.18637/jss.v015.i09
Kline DM, Berardi VL. Revisiting squared-error and cross-entropy functions for training neural network classifiers. Neural Computing and Applications 2005;14(4):310-8. http://doi.org/10.1007/s00521-005-0467-y
Lamichhane S, Mei B, Siry J. Forecasting pine sawtimber stumpage prices: A comparison between a time series hybrid model and an artificial neural network. Forest Policy and Economics 2023;154:103028. https://doi.org/10.1016/j.forpol.2023.103028
Lesnierowski G, Stangierski J. What's new in chicken egg research and technology for human health promotion? - A review. Trends in Food Science & Technology 2018;71:46-51. https://doi.org/10.1016/j.tifs.2017.10.022
Li L, Doroslovacki M, Loew MH. Approximating the gradient of cross-entropy loss function. IEEE Access 2020;8:111626-35. https://doi.org/10.1109/ACCESS.2020.3001531
Li L, Zhou Z, Bai N, et al. Naive Bayes classifier based on memristor nonlinear conductance. Microelectronics Journal 2022;129:105574. https://doi.org/10.1016/j.mejo.2022.105574
Li W, Liu Z. A method of SVM with normalization in intrusion detection. Procedia Environmental Sciences 2011;11:256-62. https://doi.org/10.1016/j.proenv.2011.12.040
Liao W, Cai H, Lian H, et al. Quality evaluation of table eggs under different rearing systems in China. Food Science and Technology 2023;43:e110322. http://doi.org/10.1590/fst.110322
Lund WA, Heiman V, Wilhelm LA. The relationship between egg shell thickness and strength. Poultry Science 1938;17:372-6. https://doi.org/10.3382/PS.0170372
Marventano S, Godos J, Tieri M, et al. Egg consumption and human health: an umbrella review of observational studies. International Journal of Food Sciences and Nutrition 2020;71(3):325-31. https://doi.org/10.1080/09637486.2019.1648388
Melgaço BR, Ramos NL, Rodolfo F, et al. The use of decision trees and Naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs. Journal of Food Science 2014;79(9):C1672-7. http://doi.org/10.1111/1750-3841.12577
Miao J, Zhu W. Precision-recall curve (PRC) classification trees. Evolutionary Intelligence 2022;15(3):1545-69. http://doi.org/10.1007/S12065-021-00565-2
Monira KN, Salahuddin M, Miah G. Effect of breed and holding period on egg quality characteristics of chicken. International Journal of Poultry Science 2003;2(4):261-3. https://doi.org/10.3923/ijps.2003.261.263
Nanda AK, Gupta S, Saleth AL, et al. Multi-layer perceptron's neural network with optimization algorithm for greenhouse gas forecasting systems. Environmental Challenges 2023;11:100708. https://doi.org/10.1016/j.envc.2023.100708
Narushin VG, Romanov MN, Bogatyr VP. AP-animal production technology: relationship between pre-incubation egg parameters and chick weight after hatching in layer breeds. Biosystems Engineering 2002;83(3):373-81. https://doi.org/10.1006/bioe.2002.0122
Nematinia E, Abdanan Mehdizadeh S. Assessment of egg freshness by prediction of Haugh unit and albumen pH using an artificial neural network. Journal of Food Measurement and Characterization 2018;12(3):1449-59. https://doi.org/10.1007/s11694-018-9760-1
Oguz FK, Gumus H, Oguz MN, et al. Effects of different levels of expanded perlite on the performance and egg quality traits of laying hens. Revista Brasileira de Zootecnia 2017;46(1):20-4. http://doi.org/10.1590/s1806-92902017000100004
Popescu MC, Balas VE, Perescu-Popescu L, et al. Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems 2009;8(7):579-88. https://dl.acm.org/doi/10.5555/1639537.1639542
Puertas G, Vázquez M. Fraud detection in hen housing system declared on the eggs' label: An accuracy method based on UV-VIS-NIR spectroscopy and chemometrics. Food Chemistry 2019;288:8-14. https://doi.org/10.1016/j.foodchem.2019.02.106
Puertas G, Cazón P, Vázquez M. A quick method for fraud detection in egg labels based on egg centrifugation plasma. Food Chemistry 2023a;402:134507. https://doi.org/10.1016/j.foodchem.2022.134507
Puertas G, Cazón P, Vázquez M. Application of UV-VIS-NIR spectroscopy in membrane separation processes for fast quantitative compositional analysis: a case study of egg products. LWT 2023b;174:114429. https://doi.org/10.1016/j.lwt.2023.114429
Qiu J, Lin Y, Wu J, et al. Rapid beef quality detection using spectra pre-processing methods in electrical impedance spectroscopy and machine learning. International Journal of Food Science & Technology 2024;59(3):1624-34. https://doi.org/10.1111/ijfs.16915
Rachineni K, Rao Kakita VM, Awasthi NP, et al. Identifying type of sugar adulterants in honey: combined application of NMR spectroscopy and supervised machine learning classification. Current Research in Food Science 2022;5:272-7. https://doi.org/10.1016/j.crfs.2022.01.008
Ramakrishnaiah Y, Macesic N, Webb GI, et al. EHR-QC: a streamlined pipeline for automated electronic health records standardisation and preprocessing to predict clinical outcomes. Journal of Biomedical Informatics 2023;147:104509. https://doi.org/10.1016/j.jbi.2023.104509
Rondoni A, Asioli D, Millan E. Consumer behaviour, perceptions, and preferences towards eggs: A review of the literature and discussion of industry implications. Trends in Food Science & Technology 2020;106:391-401. https://doi.org/10.1016/j.tifs.2020.10.038
Riantini V, Budi HA, Wira AF, et al. Machine learning remote sensing using the random forest classifier to detect the building damage caused by the Anak Krakatau Volcano tsunami. Geomatics, Natural Hazards and Risk 2023;14(1):28-51. http://doi.org/10.1080/19475705.2022.2147455
Reginska TA. A regularization parameter in discrete ill-posed problems. SIAM Journal on Scientific Computing 1996;17(3):740-9. https://doi.org/10.1137/S106482759325267
Singh D, Singh B. Investigating the impact of data normalization on classification performance. Applied Soft Computing 2020;97:105524. https://doi.org/10.1016/j.asoc.2019.105524
Sinha S, Singh TN, Singh VK, et al. Epoch determination for neural network by self-organized map (SOM). Computational Geosciences 2010;14(1):199-206. https://doi.org/10.1007/s10596-009-9143-0
Sipper M, Moore JH. Conservation machine learning: a case study of random forests. Scientific Reports 2021;11(1):3629. http://doi.org/10.1038/S41598-021-83247-4
Sisodia D, Sisodia DS. Prediction of diabetes using classification algorithms. Procedia Computer Science 2018;132:1578-85. https://doi.org/10.1016/j.procs.2018.05.122
Tatliparmak AC, Yilmaz S, Ak R. Importance of receiver operating characteristic curve and decision curve analysis methods in clinical studies. The American Journal of Emergency Medicine 2023;70:196-7. https://doi.org/10.1016/j.ajem.2023.06.018
Tharwat A. Classification assessment methods. Applied Computing and Informatics 2021;17(1):168-92. https://doi.org/10.1016/j.aci.2018.08.003
Vabalas A, Gowen E, Poliakoff E, et al. Machine learning algorithm validation with a limited sample size. PloS One 2019;14(11):e0224365. https://doi.org/10.1371/journal.pone.0224365
Vu DH, Vu TS, Luong TD. An efficient and practical approach for privacy-preserving Naive Bayes classification. Journal of Information Security and Applications 2022;68:103215. https://doi.org/10.1016/j.jisa.2022.103215
Wilson A, Chandry PS, Turner MS, et al. Comparison between cage and free-range egg production on microbial composition, diversity and the presence of Salmonella enterica. Food Microbiology 2021;97:103754. https://doi.org/10.1016/j.fm.2021.103754
Wong TT. Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognition 2015;48(9):2839-46. https://doi.org/10.1016/j.patcog.2015.03.009
Xiao Y, Cai H, Ni H. Identification of geographical origin and adulteration of Northeast China soybeans by mid-infrared spectroscopy and spectra augmentation. Journal of Consumer Protection and Food Safety 2024;19(1):99-111. https://doi.org/10.1007/s00003-023-01471-8
Zhang S, Xie L. Leader learning loss function in neural network classification. Neurocomputing 2023;557:126735. https://doi.org/10.1016/j.neucom.2023.126735
Zhao X, Xia Z. Secure outsourced NB: Accurate and efficient privacy-preserving Naive Bayes classification. Computers & Security 2023;124:103011. https://doi.org/10.1016/j.cose.2022.103011
Data availability statement
Some or all data, models, or code that support the findings of this study are available from the corresponding author.
Disclaimer/Publisher’s Note
The published papers’ statements, opinions, and data are those of the individual author(s) and contributor(s). The editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions, or products referred to in the content.
Publication Dates
Publication in this collection: 25 Nov 2024
Date of issue: 2024
History
Received: 22 Dec 2023
Accepted: 16 Sept 2024