Managing Collinearity in Modeling the Effect of Age in the Prediction of Egg Components of Laying Hens Using Stepwise and Ridge Regression Analysis

The relationships between egg measurements [egg weight (EGWT), egg width (EGWD), egg shape index (EGSI), egg volume (EGV) and egg density (EGD)] and egg components [eggshell (SWT), yolk (YWT) and albumen (AWT)] were investigated in laying hens at 32, 45, and 59 weeks of age, with the objective of managing multicollinearity (MC) using stepwise regression (SR) and ridge regression (RR) analyses. There were significant correlations among egg traits that led to MC problems in eggs of all age groups. Hen age influenced egg characteristics and the magnitude of the correlations among egg characteristics. Eggs produced at an older age had significantly (p<0.01) higher EGWT, EGWD, EGV, YWT and AWT than those produced at a younger age. The SR model alleviated the MC problem in eggs produced at 32 weeks, although the condition index remained greater than 30; the single predictor retained, EGWT, predicted egg components with R² ranging from 60 to 99%. The SR models for eggs produced at 45 and 59 weeks indicated MC problems, with variance inflation factor (VIF) values greater than 10; the four predictors retained (EGWT, EGWD, EGV and EGD) significantly predicted egg components with R² ranging from 76 to 99%. The RR analysis provided VIF values lower than 10 and eliminated the MC problem for eggs produced at any age. It is concluded that the RR analysis provided an ideal solution for managing the MC problem and successfully predicting egg components of laying hens from egg measurements.


INTRODUCTION
The characteristics of chicken eggs are an important part of consumer acceptability (Bejaei et al., 2011). Most eggs marketed in Saudi Arabia are sold in their shell, and a consumer's first impression of any egg purchased is based on their perception of weight and shell quality (Attia et al., 2014). The market requirements for egg measurements encourage the industry to pay attention to egg weight and egg components. However, no defined grades have been developed for shell eggs sold in Saudi Arabia, and little attention has been focused on the characteristics of the egg and its components.
Morphological characters provide useful information on the characteristics and quality of egg components due to the inherent relationship among all biological characters (Olawumi & Ogunlade, 2008; Shafey et al., 2014). Many factors affect egg characteristics, including the age of the laying hens. Hen age influences egg weight (Johnston & Gous, 2007; Zita et al., 2009), egg shape index (Van den Brand et al., 2004), yolk weight (Van den Brand et al., 2004; Zita et al., 2009), albumen weight (Zita et al., 2009), and eggshell proportion and quality (Abrahamsson & Tauson, 1998; Wahlstrom et al., 1999; Silversides & Scott, 2001; Zita et al., 2009). However, there is little detailed information on the prediction of the weight of egg components throughout the laying cycle. Much of the published information seems to predict only egg components from a set of egg measurements (Shafey et al., 2014). In addition, the interrelationships among egg characteristics and layer age have been investigated to an extent, but information on the problem of collinearity is still scarce.

Collinearity or multicollinearity (MC) occurs when a regression model includes two or more highly related predictors. MC reduces the stability of the corresponding parameter estimates, increases standard errors, and decreases the power to measure effects (Kreft & de Leeuw, 1998; Harrell, 2001; Cohen et al., 2003). Shafey et al. (2014) investigated the problem of MC in the estimation of egg components (eggshell weight, yolk weight, and albumen weight) of meat-type chicken eggs produced by 36-wk-old breeders. They found MC problems in egg weight, egg shape index and their interaction, as shown by the variance inflation factor (VIF ≥ 10), the condition index (CI ≥ 30) and the high corresponding proportions of variance of egg weight, egg shape index and their interaction, respectively.
There are many statistical solutions to correct the MC problem. These include stepwise regression (Yakubu, 2009) and ridge regression analysis (Smith & Campbell, 1980; Schoeman et al., 2002; Pimentel et al., 2007). Stepwise regression builds a model by successively adding or removing collinear predictors based solely on the t-statistics of their estimated coefficients. Ridge regression improves the accuracy of a regression model by reducing the apparent magnitude of the correlations (Hoerl & Kennard, 1970a, 1970b; Hoerl et al., 1975; Marquardt & Snee, 1975; Mahajan et al., 1977).
Therefore, the objectives of this study were to ascertain the existence of collinearity in the estimation of egg components based on the morphological traits (egg weight, egg width, egg shape index, egg volume, and egg density) of eggs laid by 32-, 45-, and 59-wk-old laying hens and to correct the problem, if detected, using stepwise regression and ridge regression analysis.

MATERIALS AND METHODS
A total of 540 freshly-laid eggs produced by a meat-type breeder flock (Ross-Alwadi, Riyadh, Saudi Arabia) were collected when hens were 32, 45 and 59 weeks old, with 180 eggs from each age group. Eggs were candled to detect cracks, then numbered and weighed (EGWT) individually on an electronic scale with ± 0.01 g precision. Egg length (EGL) and width (EGWD) were measured using a steel vernier caliper graduated to one tenth of a millimeter. Egg shape index (EGSI) was calculated using Equation 1, according to Carter (1968): EGSI = (EGWD/EGL) × 100 (Eq. 1). Egg volume (EGV) was measured by the water displacement technique. The difference between the weight of an egg in air and its weight in water is equal to the weight of the water displaced by the egg (Archimedes' principle). The weight of the water displaced in grams is equal to the volume of the egg in milliliters (Carr, 1939). Egg density (EGD) was calculated by dividing EGWT by EGV.
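The derived egg measurements described above can be sketched in a few lines of Python. This is only an illustration of Equation 1 and the water displacement calculation: the function names and the numeric values of the example egg are hypothetical, and the volume step assumes a water density of 1 g/mL.

```python
def egg_shape_index(width_mm, length_mm):
    """Eq. 1: EGSI = (EGWD / EGL) x 100."""
    return width_mm / length_mm * 100.0


def egg_volume_ml(weight_in_air_g, weight_in_water_g):
    """Water displacement: grams of displaced water equal the volume in mL
    (assumes water density of 1 g/mL)."""
    return weight_in_air_g - weight_in_water_g


def egg_density(weight_in_air_g, volume_ml):
    """EGD = EGWT / EGV, in g/mL."""
    return weight_in_air_g / volume_ml


# Hypothetical egg, not taken from the study data
egsi = egg_shape_index(width_mm=44.0, length_mm=57.0)
egv = egg_volume_ml(weight_in_air_g=62.0, weight_in_water_g=5.0)
egd = egg_density(weight_in_air_g=62.0, volume_ml=egv)
print(round(egsi, 2), egv, round(egd, 4))
```

Because EGD is defined as the ratio of EGWT to EGV, these three measurements are algebraically linked, which is one reason collinearity among the predictors is to be expected.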
Eggs were broken, and the albumen and yolk were separated. The yolk was then carefully rolled on a paper towel to remove extra white and chalaza. When the chalaza was not removed by this process, a razor was used to remove it from the yolk, and the clean yolk was weighed (YWT). The eggshell was washed, air dried overnight and weighed (SWT) on an electronic scale. The eggshell membrane was not separated from the eggshell; thus, eggshell weight includes membrane weight. All weights were measured to the nearest 0.1 g. Albumen weight (AWT) was calculated by subtracting the total of YWT and SWT from EGWT. Eggshell weight percentage (SWTP) was calculated using Equation 2: SWTP = SWT × 100/EGWT (Eq. 2); yolk weight percentage (YWTP) was calculated using Equation 3: YWTP = YWT × 100/EGWT (Eq. 3); and albumen weight percentage (AWTP) was calculated using Equation 4: AWTP = AWT × 100/EGWT (Eq. 4). The independent variables were EGWT, EGWD, EGSI, EGV and EGD, and the dependent variables were YWT, SWT and AWT.
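As a worked illustration of Equations 2 to 4, the following sketch derives AWT by difference and then the three percentages. The function name and the example weights are hypothetical and are not values from the study.

```python
def egg_components(egwt, ywt, swt):
    """Derive albumen weight and component percentages from weights in grams."""
    awt = egwt - (ywt + swt)           # albumen weight by difference
    return {
        "AWT": awt,
        "SWTP": swt * 100.0 / egwt,    # Eq. 2
        "YWTP": ywt * 100.0 / egwt,    # Eq. 3
        "AWTP": awt * 100.0 / egwt,    # Eq. 4
    }


# Hypothetical egg: 60 g whole egg, 18 g yolk, 6 g shell (with membrane)
result = egg_components(egwt=60.0, ywt=18.0, swt=6.0)
print(result)
```

Since AWT is obtained by subtraction, the three percentages always sum to 100, so any measurement error in YWT or SWT is carried into AWT.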

Statistical analysis
Data were analyzed for descriptive statistics (mean, coefficient of variation, minimum and maximum values and standard deviation). Stepwise regression analyses were used to predict the egg components (SWT, YWT and AWT) from a set of explanatory variables of egg characteristics (EGWT, EGWD, EGSI, EGV and EGD). The model that describes the regression analysis (Alexopoulos, 2010) is given in Equation 5: $Y = a + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + e$ (Eq. 5)

where Y = dependent variable ($Y_1$, $Y_2$ or $Y_3$ for SWT, YWT or AWT); a = intercept; $\beta$'s = regression coefficients; X's = independent variables ($X_1$, $X_2$, $X_3$, $X_4$ and $X_5$ for EGWT, EGWD, EGSI, EGV and EGD, respectively); and e = the error term. A significance level of 0.15 was chosen for the stepwise selection to restrict the number of variables allowed into the regression model.
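The correlation coefficients among the independent egg variables, VIF and CI were computed to determine whether MC was present. The VIF was calculated following Rook et al. (1990), as in Equation 6: $VIF_i = 1/(1 - R_i^2)$ (Eq. 6), where $R_i^2$ = coefficient of determination obtained when the i-th independent variable is regressed on the remaining independent variables.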
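To make the VIF diagnostic concrete, the sketch below computes VIF values as the diagonal elements of the inverse of the predictor correlation matrix, which is algebraically equivalent to $1/(1 - R_i^2)$. It is a minimal pure-Python illustration, not the SAS computation used in the study; the helper names and the six example eggs are hypothetical.

```python
import math


def pearson_matrix(columns):
    """Correlation matrix of equally long predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF_i = 1 / (1 - R_i^2) = i-th diagonal of the inverse correlation matrix."""
    inv = invert(pearson_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]


# Hypothetical measurements for six eggs (not study data)
egwt = [55.1, 58.3, 60.2, 62.8, 65.0, 67.4]   # egg weight, g
egv = [50.9, 53.6, 55.5, 57.7, 60.1, 62.0]    # egg volume, mL
egsi = [76.2, 78.1, 75.4, 77.9, 76.8, 77.0]   # egg shape index, %
vifs = vif([egwt, egv, egsi])
print([round(v, 1) for v in vifs])
```

In this made-up sample, egg weight and egg volume move almost in lockstep, so their VIF values exceed the threshold of 10, while the shape index is far less affected.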
The CI measurement was also computed following the procedures adopted by Pimentel et al. (2007). In ridge regression analysis, the cross-product matrix for the independent variables ($X_1$ to $X_5$) is centered and scaled to one on the diagonal elements. The ridge constant k is then added to each diagonal element of the cross-product matrix. The ridge regression estimates for each variable are the least squares estimates obtained by using the new cross-product matrix. Let X be an n × p matrix of the independent variables after centering the data, and let Y be an n × 1 vector corresponding to the dependent variable ($Y_1$, $Y_2$ or $Y_3$). Let D be a p × p diagonal matrix with diagonal elements as in X'X. The ridge regression estimate corresponding to the ridge constant k can be computed as in Equation 7: $\hat{\beta}(k) = D^{-1/2}(Z'Z + kI_p)^{-1}Z'Y$ (Eq. 7), where $Z = XD^{-1/2}$ and $I_p$ is a p × p identity matrix.
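The ridge estimate of Equation 7 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function names are hypothetical, the intercept is recovered from the variable means (a convention assumed here, not stated in the text), and the example data are synthetic rather than study data.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_coefficients(columns, y, k):
    """Eq. 7: b(k) = D^(-1/2) (Z'Z + k I_p)^(-1) Z'Y, with Z = X D^(-1/2)."""
    n, p = len(y), len(columns)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    means = [sum(c) / n for c in columns]
    xc = [[v - m for v in c] for c, m in zip(columns, means)]  # centered X
    d_sqrt = [math.sqrt(sum(v * v for v in c)) for c in xc]    # sqrt(diag X'X)
    z = [[v / s for v in c] for c, s in zip(xc, d_sqrt)]       # Z = X D^(-1/2)
    ztz_k = [[sum(a * b for a, b in zip(z[i], z[j])) + (k if i == j else 0.0)
              for j in range(p)] for i in range(p)]            # Z'Z + k I_p
    zty = [sum(a * b for a, b in zip(z[i], yc)) for i in range(p)]
    scaled = solve(ztz_k, zty)
    slopes = [c / s for c, s in zip(scaled, d_sqrt)]           # back to raw units
    # Assumption: intercept recovered from the means of Y and the predictors
    intercept = y_mean - sum(b * m for b, m in zip(slopes, means))
    return intercept, slopes


# Synthetic example: y = 1 + 2*x1 + 3*x2 exactly
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [9.0, 8.0, 19.0, 18.0, 26.0]
ols_intercept, ols_slopes = ridge_coefficients([x1, x2], y, k=0.0)
ridge_intercept, ridge_slopes = ridge_coefficients([x1, x2], y, k=0.1)
print(ols_slopes, ridge_slopes)
```

With k = 0 the expression reduces to ordinary least squares; a positive k shrinks the standardized coefficients, trading a small bias for lower variance when predictors are collinear.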

SAS procedure
Egg parameter results were analyzed by one-way analysis of variance using SAS PROC ANOVA. Pearson correlation coefficients between egg variables were calculated using the SAS PROC CORR procedure. Predictive regression models were constructed using SAS PROC REG. To test for MC in the predictor variables, VIF, CI and TOL were calculated for each predictor in the analysis (PROC REG with options VIF, CI and TOL). The egg parameters were submitted to a stepwise multiple regression procedure to determine the equation that best estimates egg components (PROC REG with option stepwise). Ridge regression analysis was performed using the ridge option in PROC REG. All analyses were performed using the SAS statistical analysis system (SAS, 2006).

RESULTS AND DISCUSSION
Descriptive statistics of eggs laid by broiler breeders at 32, 45 and 59 weeks of age are shown in Table 1. Hen age is one of the main factors influencing egg characteristics (Van den Brand et al., 2004; Rizzi & Chiericato, 2005; Johnston & Gous, 2007; Zita et al., 2009). Eggs produced by older hens presented significantly (p<0.01) higher EGWT, EGWD, EGV, YWT, AWT, and SWT than those produced by younger hens (59>45>32, and 59>32>45 weeks of age, respectively). Eggs produced at 32 weeks of age had significantly (p<0.01) higher SWTP and AWTP and lower YWTP than those produced at an older age (32>59>45 weeks of age). The EGD was significantly (p<0.01) lower at 45 weeks of age than at 32 and 59 weeks of age. EGSI was not influenced by hen age. These findings are in agreement with Van den Brand et al. (2004), Johnston & Gous (2007), and Zita et al. (2009), who reported that EGWT, YWT, AWT and YWTP increased with the age of the hen, whereas AWTP and SWTP decreased with age (Rizzi & Chiericato, 2005; Zita et al., 2009).
The correlation coefficients among characteristics of eggs produced at 32, 45 and 59 weeks of age are shown in Table 2. In eggs produced at 32, 45 and 59 weeks of age, there were significant positive correlations between the weight of egg components (SWT, YWT and AWT) and EGWT (r = 0.77-0.99, p<0.001), EGV (r = 0.60-0.96, p<0.001) and EGWD (r = 0.22, p<0.01, to 0.74, p<0.001), and between EGD and the egg components SWT and AWT (r = 0.20, p<0.01, to 0.28, p<0.001). The high positive correlations between egg components and EGWT or EGV suggest that EGWT and EGV are good indicators and predictors of the components of eggs produced at 32, 45 and 59 weeks of age. However, the results indicated that the correlations differed among eggs produced at the three ages. The correlation coefficients between egg components and EGWD or EGD ranged from 0.20 (p<0.01) to 0.74 (p<0.001), and it seems that the correlation coefficient changes with the size of the egg. The correlations between egg measures (EGWT and EGWD) and egg components (SWT, YWT and AWT) were in agreement with Olawumi & Ogunlade (2008) and Shafey et al. (2014), who found that EGWT and EGWD had significant positive correlations with egg components.
The EGWT, EGV, and EGWD of the eggs produced at 32 and 45 weeks of age were positively correlated with AWTP and negatively correlated with YWTP (r = 0.23-0.91, p<0.01 to p<0.001). The EGD of the eggs produced at 32 and 59 weeks of age was negatively correlated with YWTP (r = 0.17, p<0.05, to 0.20, p<0.01), and positively correlated with YWT (r = 0.15, …).
A summary of the regression analysis of egg measurements is shown in Table 3. Regression analysis indicated that the VIF values of EGWD and EGSI of the eggs produced at 45 and 59 weeks of age were between 1.4 and 3.7. However, the strong correlations found among egg measures led to MC problems in all age groups. The VIF and CI values ranged from 62 to 5568 and 35 to 4698, respectively. The VIF and CI are two collinearity diagnostic measures that indicate the level of MC and therefore the reliability of a multiple regression analysis. The upper limits for VIF and CI are considered to be 10 and 30, respectively (Belsley, 1982; Karakus et al., 2010).
The results of stepwise regression analysis are given in Table 4. The stepwise regression model for eggs produced at 32 weeks of age included EGWT as the only significant variable, with R² ranging from 60 to 99% for egg components (SWT, YWT and AWT). This finding suggests that 60 to 99% of the variation in components (SWT, YWT and AWT) of eggs produced at 32 weeks of age can be explained by EGWT. However, VIF and CI values were 1.0 and 44, respectively. The slightly high CI value indicated a moderate MC problem. The stepwise regression models for the prediction of components of eggs produced at 45 weeks of age indicated MC problems, with VIF and CI ranging from 2.84 to 4615 and from 43 to 5640, respectively. The variables included in the model were EGWT, EGWD, EGV and EGD, with R² ranging from 93 to 99%. This finding suggests that 93 to 99% of the variation in components of the eggs produced at 45 weeks of age can be explained by EGWT, EGWD, EGV and EGD. The stepwise regression models for the prediction of components of the eggs produced at 59 weeks of age also indicated MC problems, with VIF and CI ranging from 1.02 to 5486 and from 46 to 6719, respectively. The variables included in the model were EGWT and EGWD; EGWT, EGWD, EGV and EGD; and EGWT, EGSI and EGD for the prediction of SWT, YWT and AWT, respectively. These variables explain about …
The results of ridge regression analysis are shown in Table 5 and Figures 1, 2, and 3 for eggs produced at 32, 45 and 59 weeks of age, respectively. The VIF values produced by ridge regression ranged from 4.9 to 8.2, 1.4 to 8.7 and 1.3 to 4.1 for eggs produced at 32, 45 and 59 weeks of age, respectively, all of which were lower than 10. These findings suggest that ridge regression analysis provided an ideal solution for managing the moderate and severe MC problems and successfully predicting egg components (SWT, YWT and AWT) from measurements (EGWT, EGWD, EGSI, EGV and EGD) of eggs produced at 32, 45 and 59 weeks of age. This study is in agreement with earlier studies, which indicated that the use of ridge regression eliminated the problem of MC (Schoeman et al., 2002; Pimentel et al., 2007).
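The drop in VIF under ridge regression can be illustrated with a commonly used definition of the ridge VIF, the diagonal of $(R + kI)^{-1} R (R + kI)^{-1}$, where R is the predictor correlation matrix. Whether the reported values were computed in exactly this way is an assumption; the correlation matrix, the k values and the function names below are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ridge_vif(corr, k):
    """Diagonal of (R + kI)^(-1) R (R + kI)^(-1); equals the usual VIF at k = 0."""
    p = len(corr)
    shifted = [[corr[i][j] + (k if i == j else 0.0) for j in range(p)]
               for i in range(p)]
    shifted_inv = invert(shifted)
    m = matmul(matmul(shifted_inv, corr), shifted_inv)
    return [m[i][i] for i in range(p)]


# Hypothetical correlation matrix with two nearly collinear predictors
corr = [[1.00, 0.98, 0.30],
        [0.98, 1.00, 0.25],
        [0.30, 0.25, 1.00]]
trace = {k: ridge_vif(corr, k) for k in (0.0, 0.01, 0.05, 0.1)}
for k, values in trace.items():
    print(k, [round(v, 2) for v in values])
```

Scanning a grid of k values in this way mirrors the idea of a ridge trace: k is increased until the VIF values fall below the chosen threshold while the coefficients stabilize.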

Figure 1 - Ridge trace plots of k value for eggshell weight (a), yolk weight (b) and albumen weight (c) of eggs produced at 32 weeks of age.

Table 1 - Measurements of Ross strain eggs produced at 32, 45 and 59 weeks of age.
2 Standard error of mean.

Table 2 - Coefficient of correlation of the measurements of Ross strain eggs produced at 32, 45 and 59 weeks of age.

Table 3 - Summary of regression analysis of measurements of Ross strain eggs produced at 32, 45 and 59 weeks of age.

Table 4 - Summary of stepwise regression analysis of measurements of Ross strain eggs produced at 32, 45 and 59 weeks of age.

Table 5 - Summary of ridge regression analysis of measurements of Ross strain eggs produced at 32, 45 and 59 weeks of age.