ANALYSIS OF COVARIANCE STRUCTURES AND REPEATABILITY IN A SEGREGATING GUAVA POPULATION

The objective of this study was to analyze the covariance structure and to estimate repeatability for variables related to guava productivity, namely fruit weight (FW), fruit number (FN) and fruit production (PROD), over three harvests in 95 genotypes of a segregating population. The study also aimed to choose the most appropriate covariance structure for observations within the same individual by means of Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC). A covariance structure between repeated measures could be incorporated into the statistical model, with the autoregressive and compound symmetry forms being the most adequate. The repeatability coefficients obtained for FW (0.25), FN (0.14), and PROD (0.29) were considered low, indicating that three harvests were not sufficient to select the best individuals accurately in the study population. For FW and PROD, accuracy estimates of around 0.50 could be obtained from five measurements, whereas more harvests would be necessary for FN. These values indicate that, in segregating guava populations, evaluations in the first harvests are not enough to select more stable genotypes for the variables considered in this study.


INTRODUCTION
Guava (Psidium guajava L.) is one of the tropical fruit trees that has shown the greatest increase in planted area. Its fruits are destined for industrial processing and for the fresh fruit market, and they are highly perishable due to their intense metabolism during maturation (HONG et al., 2012). According to Cavalini et al. (2015), quality attributes are influenced by variety, edaphoclimatic conditions and cultural practices. In this sense, the use of cultivars adapted to different edaphoclimatic conditions, with potential for producing good quality fruits, is of great importance for increasing the cultivated area and consolidating the crop in Brazil (PEREIRA; NACHTIGAL, 2009; NIMISHA et al., 2013).
During the plant selection process, it is important to obtain more precise estimates of the genetic superiority of individuals (NEGREIROS et al., 2008). To this end, repeated measurements are made on the same individual, especially in perennial plants (NASCIMENTO FILHO et al., 2009; DANNER et al., 2010; BRUNA; MORETO; DALBÓ, 2012; CARGNIN, 2016). In addition, in guava, the farmer obtains several harvests from the same plant. Therefore, an analysis method involving repeated measures is appropriate. Repeated measures experiments violate two basic assumptions of the analysis of variance when analyzed under the split-plot approach: randomization, because evaluation times cannot be randomized within treatments, and independence of errors, because the measurements are taken on the same experimental units over time and are therefore correlated (ROSÁRIO et al., 2005).
Resende (2002) argues that when analyzing variables from repeated measures, it is advisable to study their covariance structure. The correlation between measurements on the same individual over time can be modeled through the variance and covariance structure of the errors. Modeling an appropriate variance and covariance structure is essential for inferences about the means to be valid. In this sense, Cecon et al. (2008), working with the production of 50 clones of Conilon coffee over five years, verified that the model that provided the best fit was heterogeneous compound symmetry (HCS). Resende, Thompson, and Welham (2006) worked with 1,800 individuals of mate in three harvests and concluded that the heterogeneous autoregressive (HAR), structured antedependence (SAD) and multivariate models presented basically the same deviance, but the autoregressive model with heterogeneous variances was chosen because it is more parsimonious according to Akaike's Information Criterion (AIC). Jaffrézic and Pletcher (2000) also reported the superiority of the autoregressive model with heterogeneous variances. However, Gilmour et al. (2004) highlighted HAR and SAD as the more favorable models in general.
Many methods have been developed to facilitate the choice of the covariance structure that best explains the variability of the response and the correlation between repeated measurements (FLORIANO et al., 2006; SILVA; DUARTE; REIS, 2015). The main model selection criteria used in computer programs are Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC), which are based on the likelihood of the model and depend on the number of observations and of model parameters. The objectives of this study were to analyze the covariance structure and to estimate repeatability for variables related to guava productivity in three harvests of 95 genotypes from a segregating population. A further aim was to choose the most appropriate covariance structure for observations within the same individual by means of the AIC and SBC.

MATERIAL AND METHODS
Seventeen segregating guava families were evaluated in the present study. Crosses between the parents were established taking into account genetic diversity information obtained by Pessanha et al. (2011). The families were obtained after controlled biparental pollinations of selected guava trees. The crosses were carried out in the municipality of Bom Jesus do Itabapoana, located in the Northwest Fluminense region.
Once the 17 segregating guava families had been obtained, the experiment was set up in a randomized block design with 2 replicates and 12 plants per experimental plot. Management, fertilization, cultivation, pruning and harvesting of the experimental plants followed the technical recommendations for the crop, and fertilization was based on the results of local soil analyses.
During the fruiting phase, 10 fruits were sampled per individual, harvested at maturity stage 1, with dark green skin (CAVALINI et al., 2006). Three evaluations of the guava genotypes were analyzed, corresponding to the harvests of February 2011, January 2012, and October 2012. The characteristics evaluated were: i) fruit weight (FW), in g, measured on a semi-analytical balance from a sample of ten fruits harvested from each genotype; ii) number of fruits per plant (FN), counted for each individual at the beginning of the harvest (whether the fruits were viable or not); and iii) total production (PROD), in g, estimated by multiplying FN by FW.
The following statistical model was used:
$$y_{ijgk} = \mu + p_i + f_g + m_j + b_k + e_{ijgk},$$
in which: $y_{ijgk}$ is the observation on plant $i$ of family $g$ at measurement $j$ in block $k$; $\mu$ is the overall mean; $p_i$ is the random effect of plant $i$, NID $(0, \sigma^2_p)$; $f_g$ is the random effect of family $g$, NID $(0, \sigma^2_f)$; $m_j$ is the fixed effect of measurement $j$; $b_k$ is the random effect of block $k$, NID $(0, \sigma^2_b)$; and $e_{ijgk}$ is the random error, $e \sim N(0, R)$, in which $R$ is the variance and covariance matrix used to model error dependence (LITTELL et al., 2006). This error dependence was assumed to occur between measurements $j$ and $j'$ ($j \neq j'$) taken on the same plant $i$, of the same genotype and in the same block. Accordingly, the covariance matrices considered for $R$ were the variance components, compound symmetry, first-order autoregressive, and unstructured forms.
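For three harvests, the four candidate forms of the error covariance matrix can be written out as below. This is a sketch in the parameterization commonly used by mixed-model software; the symbols $\sigma^2$, $\sigma_c$, $\rho$ and $\sigma_{jj'}$ are notation assumed here for illustration and are not taken from the original text.

$$
R_{\mathrm{VC}} = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix},
\qquad
R_{\mathrm{CS}} = \begin{bmatrix} \sigma^2 + \sigma_c & \sigma_c & \sigma_c \\ \sigma_c & \sigma^2 + \sigma_c & \sigma_c \\ \sigma_c & \sigma_c & \sigma^2 + \sigma_c \end{bmatrix},
$$

$$
R_{\mathrm{AR(1)}} = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix},
\qquad
R_{\mathrm{UN}} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{bmatrix}.
$$

Under these forms, the variance components, compound symmetry, first-order autoregressive and unstructured matrices require 1, 2, 2 and 6 covariance parameters, respectively, which is what the information criteria penalize when the structures are compared.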
The model, under each of the covariance matrices considered, was fitted to the data by restricted maximum likelihood using PROC MIXED in SAS (Appendix 1). A linear mixed model has the following general form:
$$y = X\beta + Zu + e,$$
in which: $y$ is the known vector of observations; $\beta$ is the parametric vector of fixed effects, with incidence matrix $X$; $u$ is the parametric vector of random effects, with incidence matrix $Z$; and $e$ is the vector of random errors (LITTELL et al., 2006).
The Akaike information criterion (AIC) was used to identify the best matrix, according to Littell, Henry, and Ammerman (1998): the lower its value, the better the fit of the model in question. To select the most appropriate model, the AIC and SBC are calculated for each model considered, which yields a ranking of the candidate models. The AIC, based on the log-likelihood (LL, or $-2\mathrm{LL}$) of the fitted model, $\log L(\hat{\theta})$, can be calculated by:
$$\mathrm{AIC} = -2\log L(\hat{\theta}) + 2d,$$
in which $d$ represents the total number of fixed-effect parameters and variance components estimated in the model. Among all the models considered, the one with the lowest AIC value is considered the best.
The Schwarz Information Criterion, or Schwarz Bayesian Criterion (SBC), is so named because Schwarz presented a Bayesian argument for it in 1978. The SBC is calculated by:
$$\mathrm{SBC} = -2\log L(\hat{\theta}) + d\log n,$$
in which $L(\hat{\theta})$ is the maximized likelihood, $d$ is the number of estimated parameters, and $n = \sum_i n_i$ (the sum of the sizes of all vectors $y_i$). Both the SBC and the AIC penalize more complex models, that is, models with more parameters. According to this criterion, the best model is the one with the lowest SBC. The repeatability and accuracy coefficients were estimated using mixed modeling, as described by Viana and Resende (2014).
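A minimal sketch of how the two criteria rank candidate covariance structures, assuming the usual smaller-is-better forms AIC = -2 log L + 2d and SBC = -2 log L + d log n. The -2 log L values below are hypothetical placeholders, not the values of Table 3; the function names are illustrative; and n = 285 assumes 95 genotypes with three complete harvests each. Only covariance parameters are counted in d here, since the fixed effects are common to all candidates and do not change the ranking.

```python
import math


def aic(neg2_loglik, d):
    """Akaike's Information Criterion, smaller-is-better form."""
    return neg2_loglik + 2 * d


def sbc(neg2_loglik, d, n):
    """Schwarz's Bayesian Criterion, smaller-is-better form."""
    return neg2_loglik + d * math.log(n)


# Hypothetical (-2 log L, number of covariance parameters) per structure.
# These are NOT the values reported in Table 3.
candidates = {"AR": (2000.0, 2), "SC": (2000.5, 2), "UN": (1995.0, 6)}
n_obs = 285  # assumption: 95 genotypes x 3 harvests, no missing records

scores = {
    name: (aic(m2ll, d), sbc(m2ll, d, n_obs))
    for name, (m2ll, d) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_sbc = min(scores, key=lambda name: scores[name][1])

for name, (a, s) in scores.items():
    print(f"{name}: AIC={a:.1f} SBC={s:.1f}")
print("best by AIC:", best_by_aic, "| best by SBC:", best_by_sbc)
```

With these placeholder inputs the unstructured model has the best raw likelihood but loses on both criteria once its six parameters are penalized, and the gap is wider under the SBC because log(285) exceeds 2.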

RESULTS AND DISCUSSION
The covariance parameter estimates for the AR (first-order autoregressive), SC (compound symmetry), and UN (unstructured) structures for fruit weight (FW), fruit number (FN) and fruit production (PROD) are presented in Table 1. For FW, the compound symmetry parameter of the covariance structure, which specifies the covariance between two measurements on the same plant, was -7.3879, and the residual variance estimate was 1676.48.
The correlation between two measurements on the same individual, obtained from the covariance parameter estimates, was:
$$\hat{\rho} = \frac{-7.3879}{-7.3879 + 1676.48} = -0.00443.$$
This negative correlation between two measurements within the same genotype indicates that this type of covariance structure cannot be used to model the residuals in order to estimate repeatability with greater accuracy.
The estimates of the variance and of the autocorrelation parameter obtained for the AR covariance structure were, respectively, 1669.35 and 0.03045 for fruit weight (FW), 1709.38 and 0.08027 for fruit number (FN), and 37579062 and 0.1445 for fruit production (PROD) (Table 1). For the unstructured covariance structure, the correlations for FW generally decreased as the time interval between harvests increased (correlation matrix). Variances also increased with harvests in the covariance matrix, from 128.22 in crop 1 to 3279.61 in crop 3 for FN, and from 6316707 in crop 1 to 68876835 in crop 2 for PROD. This pattern of increasing variance is not accommodated by the compound symmetry (SC) or by the first-order autoregressive (AR) covariance model.
In Table 2, the correlations for the AR structure appear as in the 'R Correlation Matrix' output of PROC MIXED in SAS. For FW, they are 0.03045 for adjacent crops and $0.000927 = 0.03045^2$ for crops two periods apart. For FN, they are 0.08027 for adjacent crops and $0.00644 = 0.08027^2$ for crops two periods apart. For PROD, they are 0.1445 for adjacent crops and $0.02088 = 0.1445^2$ for crops two periods apart. For the SC structure, the correlations are -0.00443 for FW, 0.06895 for FN, and 0.1302 for PROD.
Two model fit criteria calculated by PROC MIXED were used to decide which of the three covariance structures would be used to evaluate the model and perform the final inference: the Akaike Information Criterion (AIC) and the Schwarz Bayesian Criterion (SBC). These are essentially likelihood values penalized by the number of estimated parameters. SBC imposes a more severe penalty than AIC (LITTELL; HENRY; AMMERMAN, 1998).
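The AR and SC correlations quoted for the three variables can be checked directly from the reported parameter estimates. The sketch below assumes the usual definitions, in which the compound symmetry correlation is the common covariance divided by the total variance and the first-order autoregressive correlation between harvests j and j' is the autocorrelation parameter raised to |j - j'|. The function names are illustrative and not part of the original analysis.

```python
def cs_correlation(cov_cs, residual_var):
    """Correlation implied by compound symmetry: covariance / total variance."""
    return cov_cs / (cov_cs + residual_var)


def ar1_correlation_matrix(rho, n_times):
    """AR(1) correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


# Compound symmetry estimates reported for FW.
fw_cs = cs_correlation(-7.3879, 1676.48)
print("FW, SC correlation:", round(fw_cs, 5))

# AR(1) autocorrelation estimates reported for each variable.
ar_rho = {"FW": 0.03045, "FN": 0.08027, "PROD": 0.1445}
for trait, rho in ar_rho.items():
    first_row = ar1_correlation_matrix(rho, 3)[0]
    print(trait, "AR(1) correlations, lags 0 to 2:", [round(v, 6) for v in first_row])
```

The lag-2 entries are simply the squares of the lag-1 entries, which is why the AR structure cannot represent correlations that stay roughly constant, or variances that grow, across harvests.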
Regarding the structure of the R matrix of variances and covariances, the AR and SC structures have similar AIC values, which are also the lowest. The SC and AR structures therefore describe the variables best and are the most appropriate, because they yielded the lowest values of the two fit criteria (Table 3). Freitas, Presotti, and Toral (2005) concluded that the most adequate covariance structure for modeling weight variables from birth to two years of age for all breeds was the unstructured one, followed by the factor-analytic structure for Nelore, Gir and Indubrasil, and by heterogeneous compound symmetry for Guzerá.
According to Gilmour et al. (2004), the multivariate model, also called the unstructured covariance matrix between crops (UN), which treats each crop as if it were a different variable, is a complete and adequate model for analyzing a set of variables of this nature. This covariance structure is applied to all random factors in the model, such as genotypic treatment effects, plot effects, and residual effects. However, when a relatively large number of crops (three or more) is considered, such a model is difficult to fit because convergence is hard to achieve.

Repeatability estimates
Repeatability estimates are an important parameter when choosing a genotype, according to Cruz, Regazzi, and Carneiro (2012), since they can predict the stability of the response of production-related variables. In this context, the coefficient of repeatability allows determining the number of phenotypic observations that must be performed, with minimum cost and labor. Repeatability thus measures the mean correlation between two productions of the same individual.

Table 4 shows the values of the coefficients of repeatability and accuracy estimated and predicted via mixed modeling, based on the three harvests.The values of the repeatability coefficients obtained for fruit weight (0.25), fruit number (0.14), and fruit production (0.29) were considered low according to Resende (2002).This fact indicates that a single observation of the individual does not represent its real capacity, and therefore more than one observation is necessary for decision making regarding its use.These low values also indicate the irregularity of the individuals' superiority from one measurement to another.
It is possible to observe that for the variables FW and PROD, five measurements would be necessary to possibly reach an accuracy of around 0.50.As for the variable FN, it was observed that an accuracy value of 0.32 would only be reached with 10 measurements.
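As a rough complement, the classical repeatability-only expressions can be sketched as below: the coefficient of determination of the mean of $m$ measurements, $R^2 = mr / [1 + (m - 1)r]$, and its inversion for the number of measurements needed to reach a target $R^2$. These are the purely phenotypic formulas found in standard quantitative genetics texts. They are not the accuracy reported in Table 4, which also depends on heritability, so the numbers are not expected to match that table. The target of 0.80 and the function names are illustrative assumptions.

```python
def determination(r, m):
    """Coefficient of determination of the mean of m measurements."""
    return m * r / (1 + (m - 1) * r)


def measurements_needed(r, target):
    """Number of measurements required to reach a target determination."""
    return target * (1 - r) / ((1 - target) * r)


# Repeatability coefficients reported in this study.
repeatability = {"FW": 0.25, "FN": 0.14, "PROD": 0.29}
target = 0.80  # hypothetical target, chosen only for illustration

for trait, r in repeatability.items():
    r2_three = determination(r, 3)
    m_needed = measurements_needed(r, target)
    print(f"{trait}: R2 with 3 harvests = {r2_three:.3f}, "
          f"harvests for R2 = {target:.2f}: {m_needed:.1f}")
```

The two functions are inverses of each other, so feeding the number returned by `measurements_needed` back into `determination` recovers the target. The lower the repeatability, the faster the required number of harvests grows.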
One of the difficulties found in evaluation and selection in breeding programs is determining the number of evaluations (cuts or harvesting times) necessary to estimate the differences between the genotypes evaluated. The process usually involves a large number of experiments, with several stages and the evaluation of different characteristics, which implies the use of considerable labor and time. As an alternative to overcome such limitations, the estimation of the repeatability coefficient (r) can be used to reduce the number of evaluations (CRUZ; REGAZZI; CARNEIRO, 2012; LESSA et al., 2014; NEGREIROS et al., 2014). Thus, the minimum number of repeated measures is determined so that selection can be carried out with a given degree of accuracy and efficiency, at minimum cost and effort. According to Viana and Resende (2014), the different objectives of selection should be considered: (i) for short-term improvement (maximization of gain in the current generation), efficiency must be analyzed per selection cycle; (ii) for long-term improvement by recurrent selection, annual efficiency must be verified. According to the different repeatability scenarios examined by these authors, with an estimated repeatability of 0.75 it is not worth evaluating more than three harvests for short-term breeding, and for long-term breeding by mass selection the ideal is to select based on only one harvest.
By the same criterion established by Viana and Resende (2014), with an estimated repeatability of 0.50, the use of one or two harvests also contributes to long-term improvement. The use of three harvests, or of more than three, only becomes advantageous for long-term improvement when repeatability is equal to or lower than 0.45 and 0.35, respectively. In the case of guava, this is a relevant point to be considered, since the breeding program for which this population study was designed is a long-term one. It can also be noted that the smaller the generation interval, the less advantageous the use of repeated measures becomes, which is not the case for the species in question.
The second criterion can be defined as a function of the selective accuracy or of the determination (reliability) chosen a priori. In this case, however, inferences about the adequate number of measurements to reach a given reliability in selection for additive and genotypic values also depend on the heritability estimates of the characteristics, and not only on repeatability (VIANA; RESENDE, 2014).

CONCLUSIONS
In the present study, the autoregressive and compound symmetry covariance structures provided the best fit.
For variables FW, FN and PROD, the values of repeatability coefficients were considered low, indicating that more than one observation of the individual would be necessary for adequate accuracy.

Table 1. Estimates of covariance parameters of the autoregressive (AR), compound symmetry (SC), and unstructured (UN) structures for productivity variables in guava.

Table 2. Estimated correlation matrices for the autoregressive (AR), compound symmetry (SC), and unstructured (UN) structures for productivity variables in guava.

Table 3. Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC) for the statistical fit of the autoregressive (AR), compound symmetry (SC), and unstructured (UN) covariance structures for productivity variables in guava.

Table 4. Predicted coefficients of repeatability and accuracy (efficiency of the number of measurements) for the same individual over several crops, for characteristics related to productivity in guava.