Some Computational Aspects to Find Accurate Estimates for the Parameters of the Generalized Gamma distribution

In this paper, we discuss computational aspects to obtain accurate inferences for the parameters of the generalized gamma (GG) distribution. Usually, the solution of the maximum likelihood estimators (MLE) for the GG distribution have no stable behavior depending on large sample sizes and good initial values to be used in the iterative numerical algorithms. From a Bayesian approach, this problem remains, but now related to the choice of prior distributions for the parameters of this model. We presented some exploratory techniques to obtain good initial values to be used in the iterative procedures and also to elicited appropriate informative priors. Finally, our proposed methodology is also considered for data sets in the presence of censorship.


Introduction
The Generalized gamma (GG) distribution is very flexible to be fitted by reliability data due to its different forms for the hazard function. This distribution was introduced by Stacy [24] and has probability density function (p.d.f.) given by where t > 0 and θ = (φ, µ, α) is the vector of unknown parameters. The parameters α > 0 and φ > 0 are shape parameters and µ > 0 is a scale parameter. This parametrization is a standard form of the GG distribution originated from a general family of survival functions with a location and scale changes of the form, Y = log(T ) = µ + σW where W has a generalized extreme value distribution with parameter φ and α = 1 σ . The survival function is given by where Γ[y, x] = ∞ x w y−1 e −w dw is the upper incomplete gamma function. Many lifetime distributions are special cases of the GG distribution such as the Weibull distribution (when φ = 1), the gamma distribution (α = 1), log-Normal (limit case when φ → ∞) and the generalized normal distribution (α = 2). The generalized normal distribution also is a distribution that includes various known distributions such as half-normal (φ = 1/2, µ = 1/ √ 2σ), Rayleigh (φ = 1, µ = 1/ √ 2σ), Maxwell-Boltzmann (φ = 3/2) and chi (φ = k/2, k = 1, 2, . . .).
Although the inferential procedures for the gamma distribution can be easily be obtained by classical and Bayesian approaches, especially using the computational advances of last years in terms of hardware and software [13], the inferential procedures for the generalized gamma distribution under the maximum likelihood estimates (MLEs) can be unstable (e.g., [7,14,23,25]) and its results may depend on the initial values of the parameters chosen in the iterative methods. Ramos et al. [19] simplified the MLEs under complete samples and used the obtained results as initial values for censored data. However, the obtained MLEs may not be unique returning more than one root. Additionally, the obtained results lies under asymptotic properties in which are not achieved even for large samples (n > 400) as was discussed by Prentice [17]. A similar problem appears under the Bayesian approach, where different prior distributions can have a great effect on the subsequent estimates. It is important to point out that, the literature on GG distribution is very extensive, with many studies considering statistical inference as well as others more concentrated in applications [2-4, 15, 16, 20-22].
In this study, methods exploring modifications of Jeffreys' priors to obtain initial values for the parameter estimation are proposed. The obtained equations under complete and censored data were used as initial values to start the iterative numerical procedures as well as to elicited informative prior under the Bayesian approach. An intensive simulation study is presented in order to verify our proposed methodology. These results are of great practical interest since it enable us for the use of the GG distribution in many application areas. The proposed methodology is illustrated in two data sets. This paper is organized as follows: Section 2 reviews the MLE for the GG distribution. Section 3 presents the Bayesian analysis. Section 4 introduces the methods to obtain good initial values under complete and censored data. Section 5 describes a simulation study to compare both classical and Bayesian approaches. Section 6 presents an analysis of the data set. Some final comments are made in Section 7.

Classical inference
Maximum likelihood estimates are obtained from from the maximization of the likelihood function . Let T 1 , . . . , T n be a random sample of a GG distribution, the likelihood function is given by ( From ∂ ∂α log(L(θ; t)), ∂ ∂µ log(L(θ; t)) and ∂ ∂φ log(L(θ; t)) equal to zero, the obtained likelihood equations are . The solutions of (3) provide the maximum likelihood estimates [7,23]. Numerical methods such as Newton-Rapshon are required to find the solution of the nonlinear system.
Under mild conditions the maximum likelihood estimators of θ are not biased and asymptotically efficient. These estimators have an asymptotically normal joint distribution given by where I(θ) is the Fisher information matrix In the presence of censored observations, the likelihood function is From ∂ ∂α log(L(θ; t, δ)), ∂ ∂µ log(L(θ; t, δ)) and ∂ ∂φ log(L(θ; t, δ)) equal to zero, we get the likelihood equations, The solutions of the above nonlinear equations provide the MLEs of φ, µ and α. Other methods using the classical approach have been proposed in the literature to obtain inferences on the parameters of the GG distribution [1,4,8].

A Bayesian approach
The GG distribution has nonnegative real parameters. A joint prior distribution for φ, µ and α can be obtained from the product of gamma distributions. This prior distribution is given by From the product of the likelihood function (2) and the prior distribution (6), the joint posterior distribution for φ, µ and α is given by, The conditional posterior distributions for φ, µ and α needed for the Gibbs sampling algorithm are given as follows: Considering the presence of right censored observations, the joint posterior distribution for φ, µ and α, is proportional to the product of the likelihood function (4) and the prior distribution (6), resulting in The conditional posterior distributions for φ, µ and α needed for the Gibbs sampling algorithm are given as follows: Using prior opinion of experts or from the data (empirical Bayesian methods), the hyperparameters of the gamma prior distributions can be elicited using the method of moments. Let λ i and σ 2 i , respectively, be the mean and variance of θ i ,

Useful equations to get initial values
In this section, an exploratory technique is discussed to obtain good initial values for numerical procedure used to obtain the MLEs for GG distribution parameters. These initial values can also be used to elicit empirical prior distributions for the parameters (use of empirical Bayesian methods). Although different non-informative prior distributions could be considered for the GG distribution as the Jeffreys' prior or the reference prior [9], such priors involve transcendental function in φ such as digamma and trigamma functions with does not allow us to obtain closed-form estimator for φ. A simple non-informative prior distribution representing the lack of information on the parameters can be obtained by using Jeffreys' rule [9]. As the GG distribution parameters are contained in positive intervals (0, ∞), using Jeffreys' rule the the following prior is obtained The joint posterior distribution for φ, µ and α, using Jeffreys' rule is proportional to the product of the likelihood function (2) and the prior distribution (12), resulting in, However, the posterior density (13) is improper, i.e., > 0 and c 1 and g 1 are positive constants such that the above inequalities occur. For more details and proof of the existence of these constants, see Ramos [18].
Since the Jeffreys' rule prior returned an improper posterior, a modification in this prior was considered. This modified prior is given by The present modification had two main objectives. First was to produce a modified prior for π(α) that behave similar to π(α) ∝ α −1 (see Figure 1). Second, was to construct a joint prior distribution that yields a proper posterior distribution. The joint posterior distribution for φ, µ and α, using the prior distribution (14) is proportional to the product of the likelihood function (2) and the prior distribution (14) resulting in, This joint posterior distribution is proper, i.e., where θ = (φ, µ, α) and The proof of this result is presented in Appendix A at the end of the paper. The marginal posterior distribution for α and φ is obtained by integrating (15) with respect to µ, i.e., Therefore, the marginal posterior distribution of α and φ is given by The logarithm of the above marginal distribution is given by The first-order partial derivatives of log (p M (α, φ|t)), with respect to α and φ, are given, respectively, by, The maximum a posteriori estimates ofφ andα related to p M (α, φ|t) are obtained by solving ∂ log (p M (α, φ|t)) ∂α = 0 and ∂ log (p M (α, φ|t)) ∂φ = 0.
By solving ∂ log (p M (α, φ|t)/ ∂α, we havẽ By solving ∂ log (p M (α, φ|t)/ ∂φ and replacing φ byφ we haveα can be obtained finding the minimum of Using one dimensional iterative methods, we obtain the value ofα. Replacingα in equation (17), we haveφ. The preliminary estimate µ can be obtained from and after some algebraic manipulation, This approach can be easily used to find good initial valuesφ,μ andα. These results are of great practical interest since it can be used in a Bayesian analysis to construct empirical prior distributions for φ, µ and α as well as for initialization of the iterative methods to find the MLEs of the GG distribution parameters.

A numerical analysis
In this section, a simulation study via Monte Carlo method is performed in order to study the effect of the initial values obtained from (17, 18 and 19) in the MLEs and posterior estimates, considering complete and censored data. In addition, MCMC (Markov Chain Monte Carlo) methods are used to obtain the posterior summaries. The influence of sample sizes on the accuracy of the obtained estimates was also investigated. The following procedure was adopted in the study: (1) Set the values of θ = (φ, µ, α) and n.
The chosen values for this simulation study are θ = ((0.5, 0.5, 3),(0.4, 1.5, 5)) and n = (50, 100, 200). The seed used to generate the random values in the R software is 2013. The comparison between the methods was made through the calculation of the averages and standard deviations of the N estimates obtained through the MLEs and posterior means. It is expected that the best estimation method has the means of the N estimates closer to the true values of θ with smaller standard deviations. It is important to point out that, the results of this simulation study were similar for different values of θ.

A classical analysis
In the lack of information on which initial values should be used in the iterative method, we generate random values such thatφ ∼ U(0, 5),μ ∼ U(0, 5) andα ∼ U(0, 5). This procedure is denoted as "Method 1". On the other hand, from equations (17, 18 and 19), good initial values are easily obtained to be used in iterative methods to get the MLEs considering complete observations. In the presence of censored observations we discuss the possibility of using only the complete observations, in order to obtain good initial valuesφ,μ andα. This procedure is denoted as "Method 2". Tables 1-3 present the means and standard deviations of the estimates of N = 50, 000 samples obtained using the MLEs, calculated by the methods 1 and 2 for different values of θ and n, using complete and censored observations. We also have the coverage probability (CP) with a nominal 95% confidence level.  We can observe in the results shown in Tables 1 to 3, that the MLEs are strongly affected by the initial values. For example, using method 1, for µ = 0.5andn = 50, the average obtained from N = 50, 000 estimates of µ is equal to 1.816, with standard deviation equals to 1.973. Other problem using such methodology is that the coverage probability tends to decrease even considering large sample sizes.
Using our proposed methodology (method 2) it can be seen that in all cases the method gives good estimates for φ, α and µ. The coverage probabilities of parameters show that, as the sample size increase the intervals tend to 95% as expected. Therefore, we can easily obtain good inferences for the parameters of the GG distribution for both, complete or censored data sets.
The OpenBUGS was used to generate chains of the marginal posterior distributions of φ, µ and α via MCMC methods. The code is given by In the case of censored data, we only have to change dggamma(phi,mu,alpha) in the code above by dggamma(phi,mu,alpha)I(delta[i],). The hyperparameters phi1,phi2,. . .,alpha2 are obtained by our initial values. The main advantage of using the OpenBUGS is due its simplicity, here, the proposal distribution for generating the marginal distributions are defined by the program according to its structure. Therefore we only have to set the data to be fitted. For each simulated sample 8, 500 iterations were performed. As "burn-in samples", we discarded the 1000 initial values.To decrease the autocorrelations bettwen the samples in each chain, the thin considered was 15, obtaining at the end three chains of size 500, these results were used to obtain the posterior summaries for φ, µ and α. The convergence of the Gibbs sampling algorithm was confirmed by the Geweke criterion [6] under a 95% confidence level.
To illustrated the importance of good initial values and informative priors, firstly we considered a simulation study using flat priors, i.e., prior distributions that have large variance, here we assume that θ i ∼ Uniforme(0, 40), i = 1, . . . , 3 or θ i ∼ Gamma(0.01, 0.01). Table 4 displays the means, standard deviations of the posterior means from 1000 samples obtained using the flat priors calculated for different values of θ and n.
From this table, we observe that flat priors with vague information may affect the posterior estimates, specially for small sample sizes, for instance in the case of n = 50 and α = 3.0 we obtained the mean of the estimates given by 8.762 with standard deviation 44.485 which is undesirable. On the other hand, tables 5-6 display the means, standard deviations and the coverage probabilities of the estimates of 1000 samples obtained using the initial values (IV) and the Bayesian estimates (BE) of the posterior means, calculated for different values of θ and n, using complete data.   We observed from Tables 5-6 that using the values calculated from equations (17)(18)(19) in the method of moments (11) with mean λ =θ and with variance σ 2 θ =θ to get the hyperparameter values of the gamma distribution, we obtained very good posterior summaries for φ, µ and α, under a Bayesian approach.
The parameters estimation were also considered under censored observations (20% of censoring). Here, we used the same procedures described in Martinez et al. [12] to generate the pseudo random samples under the same conditions described in the beginning of this section. Tables 7-8 display the means, standard deviations and the coverage probabilities of the estimates of 1000 samples obtained using the initial values and the Bayes estimates of the posterior means, calculated for different values of θ and n, using censored observations (20% censoring) Table 6. Means and standard deviations from initial values and posterior means subsequently obtained from 1000 simulated samples with different values of θ and n, using the gamma prior distribution with λ = (φ,μ,α), σ 2 θ = (1, 1, 1), σ 2 θ = (φ,μ,α) and σ 2 θ = (10, 10, 10) assuming complete observations θ σ 2 θ = (1, 1, 1) σ 2 θ = (φ,μ,α)   (17)(18)(19) in the method of moments (11) with mean λ =θ and with variance σ 2 θ =θ are usefull to get the hyperparameter values of the gamma distribution under censored data allowing us to obtain good posterior summaries for φ, µ and α, under a Bayesian approach.

Real data applications
In this section, the proposed methodology is applied using two data sets from the literature. The GG distribution is assumed to analyze these data sets and the obtained results are compared with other models such as the Weibull, Gamma and Lognormal distributions, using the Akaike information criterion (AIC = −2l(α; x) + 2k), corrected Akaike information criterion (AICc = AIC + (2 k (k + 1))/(n − k − 1)) and the Bayesian information criterion (BIC = −2l(α; x) + k log(n)), where k is the number of parameters to be fitted andα is the estimate of α. In all cases, Bayesian approach is used to obtain the estimates of the parameters for the different distri- Table 8. Means and standard deviations from initial values and posterior means subsequently obtained from 1000 simulated samples with different values of θ and n, using the gamma prior distribution with λ = (φ,μ,α), σ 2 θ = (1, 1, 1), σ 2 θ = (φ,μ,α) and σ 2 θ = (10, 10, 10) assuming censored data (20% censoring). θ σ 2 θ = (1, 1, 1) σ 2 θ = (φ,μ,α)  butions. Addionaly, we assume gamma priors with hyperparameters values equal to 0.1 for the parameters of the Weibull, Gamma and Lognormal distributions. For each case, 30, 000 Gibbs samples were simulated using MCMC methods taking every 15 th generated sample obtaining a final sample of size 2, 000 to be used to get the posterior summaries.

Remission times of patients with cancer
In this section, we consider a data set that represents the remission times (in months) of a random sample of 128 bladder cancer patients [11]. This data set does not has censored values (see Table 9). From equations (17)(18)(19), we haveφ = 0.5665,μ = 0.2629 andα = 1.4669. Using (11) and assuming as mean λ =θ and variance σ 2 θ =θ, we obtain the priors φ ∼ Gamma(0.5665, 1), µ ∼ Gamma(0.2629, 1) and α ∼ Gamma(1.4669, 1). The posterior summaries obtained from MCMC methods are given in Table [? ].  Figure 2 shows the survival function fitted by the different probability distributions and the empirical survival function. Table 10 presents the results of the AIC, AICc and BIC criteria for the different probability distributions, considering the blader cancer data introduced in Table 8.  Based on the AIC and AICc criteria, we concluded from the results of Table 11 that the GG distribution has the best fit for the bladder cancer data.

Data from an industrial experiment
In this section, we considered a lifetime data set related to the cycles to failure for a batch of 60 electrical appliances in a life test introduced by Lawless [10] (+ indicates the presence of censorship).
From equations (17)(18)(19), we haveφ = 0.5090,μ = 0.2746 andα = 1.7725. Using (11) and assuming as mean λ =θ and variance σ 2 θ =θ, we elicit the priors φ ∼ Gamma(0.5090, 1), µ ∼ Gamma(0.2746, 1) and α ∼ Gamma(1.7725, 1). The posterior summaries obtained from MCMC methods are given in Table 13.  Figure 3 shows the survival function fitted by probability distributions and the empirical survival function. Table 14 presents the results of the AIC, AICc and BIC criteria for different probability distributions, considering the industrial failure data set introduced in Table 12.   Based on the AIC and AICc criteria, we concluded from Table 14 that the GG distribution has the best fit for the industrial data. Additionally, the GG distribution is the only one between probability distributions assumed that allows bathtub shape hazard. This is an important point for the use of the GG distribution in applications.

Concluding remarks
The generalized gamma distribution has played an important role as lifetime data, providing great flexibility of fit. In this paper, we showed that under the classical and Bayesian approaches, the parameter estimates usually do not exhibit stable performance in which can lead, in many cases, to different results. This problem is overcome by proposing some empirical exploration methods aiming to obtain good initial values to be used in iterative procedures to find MLEs for the parameters of the GG distribution. These values were also used to elicit empirical prior distributions (use of empirical Bayesian methods). These results are of great practical interest since the GG distribution can be easily used as appropriated model in different applications.