Generalized mixed linear modeling approach to analyze nodulation in common bean inbred lines

The objective of this work was to compare distributions for the modeling of the number and dry matter weight of nodules (DWN) of Rhizobium from different inoculants in common bean (Phaseolus vulgaris) inbred lines subjected to nitrogen doses, as well as to identify the best inoculant for those lines. The experiment was carried out in a randomized complete block design, arranged in split-split plots, with three factors – four inbred lines, five nitrogen doses (0, 20, 40, 60, and 80 kg ha-1), and three inoculants (CIAT 899, UFLA 02-100, and peat) – and four replicates. The number of nodules and their dry matter weight were analyzed with the generalized linear mixed modeling approach. The highest number of nodules was obtained with the CIAT 899 inoculant, at the dose of 20 kg ha-1 N (260 nodules), followed by UFLA 02-100, at 80 kg ha-1 (109 nodules), and peat alone at 20 kg ha-1 (98 nodules). The DWN with CIAT 899 exceeded in 0.66 g the DWN with UFLA 02-100, and in 0.95 g the DWN obtained without inoculation (inoculated with peat alone). The use of the negative binomial distribution and of the gamma distribution is a simple way to control data overdispersion of the nodule number and data underdispersion of DWN, respectively.


Introduction
Studies on Rhizobium inoculation and other plant growth promoting rhizobacteria (PGPR), under Brazilian conditions (Hungria et al., 2003), have been immensely important for improving the crop performance of common bean (Phaseolus vulgaris L.).
Nodulation and symbiotic nitrogen fixation (SNF) depend on the microsymbiont strain, N availability, and plant cultivar.According to Hungria et al. (2015), Rhizobium is certainly one of the most employed PGPR in the world.Increased SNF in Southern Brazil Oxisols has been observed with the use of Rhizobium tropici, which is a species more adapted to acid soils and to higher temperatures (Ali et al., 2009).Rhizobium tropici CIAT 899 is an efficient strain approved for commercial use in Brazil (Hungria et al., 2003).However, as P. vulgaris is a relatively promiscuous host, new Rhizobium spp.nodulating common beans are frequently reported, in different parts of the world, which means that their classification is always under review (Vieira et al., 2010).
High-nitrogen availability in the soil may reduce the symbiosis for nitrogen fixation, even if it did not affect nodulation (Souza et al., 2011).However, reduced availability of N also limits the symbiosis performance (Rosolem, 1987).Current research has confirmed that low-N doses in the sowing have favored the development of bean root system, enhancing the sites of root infection and contributing to nitrogen fixation (Brito et al., 2011).Soares et al. (2016) showed that the application of small doses (20 kg ha -1 ) of N in the sowing of common beans improves nodule formation of R. tropici strain CIAT 899.Consequently, the adequate doses of N fertilization may increase SNF and grain yield, while reducing environmental impacts and spending with fertilizers.
The number of nodules has been frequently analyzed -as an indicator trait of SNFwith continuous distribution (Ferreira et al., 2009;Fonseca et al., 2013).However, it has been shown that this trait does not follow a continuous distribution, and data overdispersion has increased the variability in nodulation studies with different cultivars (Simonsen et al., 2015;Silva et al., 2017).Overdispersion is a common feature of biological models, but researchers often fail to adequately model the excess variation driving it.This may result in biased parameter estimates and standard errors, which potentially compromise the conclusions (Harrison, 2015).
The objective of this work was to compare distributions for the modeling of the number and dry matter weight of Rhizobium nodules from different inoculants in common bean inbred lines subjected to N doses, as well as to identify the best inoculant for those lines.
The plots were formed by the inbred lines; the split plot, by the N doses, and the split-split plots, by the inoculants.Each experimental unit (split-split plot) had six 4-m rows, spaced 0.45 m apart from each other, with 0.5-m bordering space at the row ends.
Seed were treated with fungicide and insecticide, 48 hours before sowing.The following commercial products were used: pyraclostrobin, methyl thiophanate, and fipronil.The inoculants were prepared with sterile 79 medium (peat based), with a final concentration of 10 9 cells g -1 (Vincent, 1970).They were added to seed at 100 g kg -1 (w/v).Inoculation was performed right before sowing with the following treatments: Rhizobium tropici CIAT 899; Rhizobium etli UFLA 02-100; and peat (no inoculation).
At the R6 stage (full flowering), ten plants (from rows 4 and 5) were randomly collected per split-split plot, to evaluate the nodulation (number of nodules, NN).At the laboratory, roots were carefully washed and kept in a cold room at 5°C.The nodules were removed from the roots, counted, weighted, dried at 55°C for approximately 72 hours, and weighted again to obtain the dry matter weight of nodules (DWN).
A generalized linear mixed modeling (GLMM) approach was used to analyze the dry matter weight of nodules (DWN) and the number of nodules (NN).For NN, Poisson and negative binomial distributions were tested and implemented in Proc GlimMix of SAS.For DWN, normal and gamma distributions were compared.Random effects were used for modeling whole plots, the split plots, and the split-split plots.Log link function was used for all models.
Classical Pearson's chi-square and generalized Pearson's chi-square were used to evaluate the fit of the model (McCullagh & Nelder, 1989).Additionally, in order to find the best model for NN, Poisson and negative binomial (NB) distributions were tested based on the log likelihood value.Type III analysis was performed to evaluate the statistical significance of the main effects and of the interaction in the models for NN and NDW.If interactions were significant at 5% probability, they were shown; otherwise, only the main effects were shown.The means of all statistically significant interactions were compared with contrast statement in Proc GlimMix of SAS.
To construct a GLMM regression of N doses and NN for each inoculant, a log likelihood function was adjusted, using a NB distribution in Proc NLMixed of SAS.The parameterizations used to construct the log likelihood functions of NB were chosen to facilitate expressions for the mean parameters modeled through (inverse) link functions, and for scale parameters.The a priori (initial) values of parameterization were based on parameters estimated with a TableCurve 2D model, previously adjusted.In addition, the log likelihood function parameterized for means and the dispersion parameter were defined as: l y w y w y w y w in which: μ i is the general mean; y i is the response vector with negative binomial distribution; ϕ (or k) is the negative binomial dispersion parameter; and w i is the known weight for each observation w i = l for all observations).For a given ϕ, the negative binomial distribution was considered exponential.Parameter ϕ is related to the scale of the data, for being part of the variance function.

Results and Discussion
The residuals for NN and DWN were nonnormal and showed heterogeneity of variances, acording to the Shapiro-Wilk and Bartlett tests, respectively.If the assumptions of normal distribution of residuals, homogeneity of variances, and additivity are not met, the statistical conclusions are not consistent.These results were reinforced by the over-and underdispersion of NN and DWN data, respectively.
Actually, count data are frequently overdispersed, since some samples have extremely low or high values of NN, which can cause a considerable overinflation of parameter estimates and overly optimistic statistical results.Therefore, the number of nodules per plant is commonly represented by distributions other than normal (Silva et al., 2017).In fact, the Poisson distribution is the simplest procedure for modeling count data, and it is considered a standard (Cameron & Trivedi, 2013).For the legume Medicago lupulina, Simonsen et al. (2015) used a Poisson distribution for this variable.In the present study, the mean and variance of the number of nodules were extremely different, which suggests an overdispersion of the data.
Poisson and binomial distributions are frequently used for modeling nodulation data (Silva et al., 2017); Pesq.agropec.bras., Brasília, v.52, n.12, p.1178-1184, dez.2017 DOI: 10.1590/S0100-204X2017001200006 however, for overdispersed data, some distributions were developed for goodness-of-fit and modeling, including: negative binomial, quasi-likelihood, and generalized Poisson distributions (Wedderburn, 1974;Harrison, 2015).In the present study, the frequency of number of nodules was very variable, and the observed nodulation frequency differed from that expected one with the Poisson distribution (Figure 1).Consequently, two different distribution models (Poisson and negative binomial) were tested.Overdispersion was no longer an issue under negative binomial distribution (ϕ =1.0), and the goodness-of-fit of the model was better in this case than with the Poisson distribution (ϕ =35.08) (Table 1).Based on the log likelihood measure, the NB distribution [-2 log(L) NB = 520.01;-2 log(L) Poisson = 6175.89]was preferred.This distribution is also specific for count response variables, and has been applied as a standard underlying equation suitable for modeling biological count data (Pérez-Hernández & Giesler, 2014).DWN data were underdispersed ϕ < 1 with normal distribution (Table 1).According to the fit statistics, the underdispersion was less aggravated by gamma distribution modeling (based on ϕ statistics).In fact, the generalized chi-square showed a value closer to 1 with this distribution than with the normal distribution (Table 1).For NN, as expected, some differences in significant effects were identified in a comparison between the type III results of Poisson and NB distributions (Table 2).However, based on our -previously mentioned -preliminary statistics, type III analysis supported that the GLMM with NB distribution showed that N doses (D), inoculants (I), and their interaction (D x I) effects were statistically significant.The adjusted GLMM regression models (Table 3) expressed the NN relationship between CIAT 899, UFLA 02-100, and peat alone (absence of inoculation, AI), within the doses, and showed that CIAT 899 produced the highest NN at 20 kg ha -1 (260 nodules), whereas UFLA 02-100 produced the maximum NN at 80 kg ha -1 (109 nodules), and the treatment without inoculation showed the maximum NN with 20 kg ha -1 (98 nodules).Evaluating inoculation with R. tropici, Soares et al. (2016) found 649 and 423 nodules in the treatments with inoculation and inoculation + 20 kg ha -1 , and 0.66 g and 0.87 g DWN, respectively.
A superior nodulation capacity of inoculant CIAT 899 was observed, in comparison to UFLA 02-100, and to the treatment without inoculation, for all N doses (Table 3).Common bean inoculation with CIAT 899 resulted in a nodulation increase of 388%, in comparison to treatments without N fertilization, and without rhizobium inoculation (Valadão et al., 2009).In comparison with other rhizobia species, R. tropici (CIAT 899) is genetically more stable and more tolerant to several stresses, such as low-soil pH and high temperatures, which are common in tropical environments, and to several antimicrobials, including pesticides (Ormeño-Orrillo et al., 2012).Therefore, this strain is widely used in commercial inoculants destined for application to common bean in South America and Africa (Hungria et al., 2003).
The GLMM of DWN showed that N doses and inoculant effects were statistically significant in both models, with normal and gamma distributions (Table 2).In addition, this response variable was less affected by underdispersion than NN was by overdispersion.This suggests that mixed linear models can be used to analyze DWN. In fact, according to Harrison (2015), observation-level random effects can provide a simple means to control overdispersion that can easily be implemented in mixed-effect model packages.
The estimated difference between the inoculants for DWN showed that CIAT 899 exceeded in 0.66 g the inoculant UFLA 02-100, and in 0.95 g the absence of inoculation (Table 4).In a dystrophic Oxisol, Valadão et al. (2009) observed statistical differences between CIAT 899 and the treatment without inoculation, in common beans.This result confirmed that CIAT 899 is an important symbiont for nodulation and for symbiotic N fixation in the Brazilian acid soils (Oxisols).Nonetheless, in studies with a eutrophic Oxisol, Fonseca et al. (2013) reported no statistical differences between the inoculants CIAT 899 and UFLA 04-173, and the absence of inoculation for NN, DWN, and shoot-N content.Similarly, Ferreira et al. (2009) reported no statistical differences among the inoculants UFLA 02-68, UFLA 02-86, CIAT 899, UFLA 02-127, and UFLA 02-100, and the absence of inoculation, for DWN in the same soil type (Lavras, MG).
Despite the small sample size of the common bean lines evaluated for NN and DWN in the present study, the generalized mixed modeling approach may represent an alternative to analyze data with nonnormal distribution, and should be applied when NN and DWN do not meet the assumptions of normality and homogeneity of variance, or in the case of the commonly observed over-and underdispersion of the data.Generalized linear mixed model CIAT 899 log(nn) = 5.3546 -0.019D + 0.1344D 0.5 UFLA 02-100 log(nn) = 4.3050 -0.1639D + 0.0011D 2 + 0.7177D 0.5 AI log(nn) = 3.9704 -0.1407D + 0.0008D 2 + 0.6945D 0.5 χ 2 , chi-square; AI, absence of inoculation (inoculation with peat alone).

Conclusion
Negative binomial and gamma distributions constitute a simple means to overcome over-and underdispersed data, respectively, and they can easily be included in mixed model analyses for evaluating the nodulation and the nodule dry weight in common beans (Phaseolus vulgaris). (1)Contrasts were constructed using least square means based on GLMM; however, for presentation purposes, differences of original means are shown. (2)Difference between means on the raw data scale.AI, absence of inoculation.

Figure 1 .
Figure 1.Observed and expected frequencies for Poisson distribution of the number of nodules (NN) in common bean (Phaseolus vulgaris).

Table 1 .
Goodness-of-fit statistics of the negative binomial (NB) and Poisson distributions, and of normal and gamma distributions used for model comparison of the number of nodules (NN) and dry weight of nodules (DWN) in common bean (Phaseolus vulgaris), respectively.

Table 2 .
Comparison of type III analysis for Poisson and negative binomial distributions, and for normal and gamma distributions of the number of nodules (NN) and dry weight of nodules (DWN) in common bean (Phaseolus vulgaris), respectively.

Table 3 .
Analyses of contrasts and generalized linear regression mixed models for the number of nodules in common bean (Phaseolus vulgaris), and for the evaluated inoculants within different nitrogen doses.

Table 4 .
Difference of least square means between pair of inoculants(1), for dry weight of nodules in common bean (Phaseolus vulgaris).