A new model for describing remission times: the generalized beta-generated Lindley distribution

New generators are required to define wider distributions for modeling real data in survival analysis. To that end we introduce the four-parameter generalized beta-generated Lindley distribution. It has explicit expressions for the ordinary and incomplete moments, mean deviations, generating and quantile functions. We propose a maximum likelihood procedure to estimate the model parameters, which is assessed through a Monte Carlo simulation study. We also derive an additional estimation scheme by means of least square between percentiles. The usefulness of the proposed distribution to describe remission times of cancer patients is illustrated by means of an application to real data.

This paper addresses similar issues to Zhu and Galbraith using a different approach. We consider the generalized beta-generated (GBG) family of distributions pioneered by Alexander et al. (2012), which has three shape parameters.
The Lindley (L) distribution was first used by Lindley (1958) to measure the difference between fiducial and posterior distributions in Bayesian analysis. Its probability density function (pdf) (for z > 0) with scale parameter λ > 0, say L(λ), is given by
$$g(z; \lambda) = \frac{\lambda^2}{1+\lambda}\,(1+z)\,e^{-\lambda z}. \qquad (1)$$
Its cumulative distribution function (cdf) is given by
$$G(z; \lambda) = 1 - e^{-\lambda z}\left(1 + \frac{\lambda z}{1+\lambda}\right). \qquad (2)$$
Ghitany et al. (2008) discussed and studied various properties of the pdf (1). The L distribution has an important role in stress-strength reliability modeling and describes well some types of data sets, but it has lower flexibility in modeling asymmetric and/or heavy-tailed data. Further, it can accommodate hazard rate functions (hrfs) that are increasing, decreasing or constant but not unimodal, bathtub and other shapes, which are desirable in lifetime data analysis. To overcome this, several works proposed new distributions by adding parameters to the Lindley distribution. For example, Sankaran (2015) used this law as the mixing distribution of a Poisson parameter to generate a discrete model called the Poisson-Lindley distribution. Pararai et al. (2015) defined the Kumaraswamy Lindley-Poisson distribution and explored some of its properties. Another extension, named the generalized Lindley distribution, was studied by Ashour and Eltehiwy (2015).
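As a quick numerical companion to the Lindley pdf and cdf, the following minimal sketch evaluates both functions in plain Python. The function names `lindley_pdf` and `lindley_cdf` are illustrative choices, not notation from the paper, and the finite-difference check is only a sanity test that the cdf differentiates to the pdf.

```python
import math


def lindley_pdf(z, lam):
    """Lindley density lam^2 / (1 + lam) * (1 + z) * exp(-lam * z) for z > 0."""
    if z <= 0:
        return 0.0
    return lam ** 2 / (1.0 + lam) * (1.0 + z) * math.exp(-lam * z)


def lindley_cdf(z, lam):
    """Lindley cdf 1 - exp(-lam * z) * (1 + lam * z / (1 + lam)) for z > 0."""
    if z <= 0:
        return 0.0
    return 1.0 - math.exp(-lam * z) * (1.0 + lam * z / (1.0 + lam))


# Sanity check: a central finite difference of the cdf should match the pdf.
z, lam, h = 1.5, 2.0, 1e-6
slope = (lindley_cdf(z + h, lam) - lindley_cdf(z - h, lam)) / (2.0 * h)
print(abs(slope - lindley_pdf(z, lam)) < 1e-6)
```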
A profusion of new classes of distributions has recently proven useful to applied statisticians working in various areas of scientific investigation. Generalizing existing distributions by adding shape parameters leads to more flexible models. Let g(x; τ) and G(x; τ) be the pdf and cdf of a baseline distribution having parameter vector τ. Alexander et al. (2012) defined the pdf and cdf of the GBG-G distribution (for x ∈ X ⊆ R) using three additional positive shape parameters a, b and c by
$$f(x; \tau, a, b, c) = \frac{c\, g(x; \tau)}{B(a, b)}\, G(x; \tau)^{ac-1}\,\bigl[1 - G(x; \tau)^{c}\bigr]^{b-1} \qquad (3)$$
and
$$F(x; \tau, a, b, c) = I\bigl(G(x; \tau)^{c}; a, b\bigr), \qquad (4)$$
respectively, where I(x; a, b) denotes the incomplete beta function ratio and B(a, b) is the complete beta function.
In this paper, we propose a new lifetime model called the GBG-Lindley (GBGL) distribution. We also study some of its structural properties and present the maximum likelihood estimation of the parameters. A Monte Carlo study is performed in order to assess the proposed estimation procedure.
Further, we present evidence that the new model can (i) compensate for the lack of flexibility of the Lindley distribution and (ii) produce better fits than the following distributions:
• the Lindley-exponential (LE) model (Bhati and Malik 2015);
• the generalized L (GL) model (Nadarajah et al. 2012);
• the transmuted Lindley (TL) model (Mansour and Mohamed 2015).
This comparison is carried out on two data sets: failure times of items under a change in stress and remission times (in months) of cancer patients.
This paper is organized as follows. In Section 2, we introduce the GBGL distribution and provide plots of its density function and hrf. We derive linear representations for the pdf and cdf (Section 3), explicit expressions for the quantile function (qf) (Section 4), ordinary and incomplete moments, mean deviations, Bonferroni and Lorenz curves (Section 5) and generating function (Section 6). A procedure for determining the maximum likelihood estimates (MLEs) of the model parameters is addressed in Section 7. Section 8 presents empirical results for the proposed model. Concluding remarks are offered in Section 9.

-THE GBGL DISTRIBUTION
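Applying (1) and (2) in equations (3) and (4), the pdf and cdf of the GBGL distribution (for x > 0) are, respectively, given by
$$f_{GBGL}(x; \lambda, a, b, c) = \frac{c\,\lambda^{2}\,(1+x)\,e^{-\lambda x}}{(1+\lambda)\,B(a, b)}\; G(x; \lambda)^{ac-1}\,\bigl[1 - G(x; \lambda)^{c}\bigr]^{b-1} \qquad (7)$$
and
$$F_{GBGL}(x; \lambda, a, b, c) = I\bigl(G(x; \lambda)^{c}; a, b\bigr), \qquad (8)$$
where $G(x; \lambda) = 1 - e^{-\lambda x}\,[1 + \lambda x/(1+\lambda)]$ is the L cdf given in (2). For simplicity, we denote f_GBGL(x; λ, a, b, c) and F_GBGL(x; λ, a, b, c) by f(x) and F(x). Hereafter, a random variable X having density (7) is denoted by X ∼ GBGL(λ, a, b, c).
Clearly, the L distribution arises as the basic exemplar by taking a = b = c = 1 in (7). As mentioned in the introduction, we motivate the paper by comparing the performance of the new distribution with those of the L, LE and GL models fitted to a real data set.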
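The construction of the GBGL density from the Lindley baseline can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names (`gbgl_pdf`, `simpson`) are hypothetical, the integration range [0, 20] is an arbitrary truncation that is adequate for λ = 2, and the composite Simpson rule is used only as a simple stand-in for any quadrature routine.

```python
import math


def lindley_pdf(x, lam):
    return lam ** 2 / (1.0 + lam) * (1.0 + x) * math.exp(-lam * x)


def lindley_cdf(x, lam):
    return 1.0 - math.exp(-lam * x) * (1.0 + lam * x / (1.0 + lam))


def gbgl_pdf(x, lam, a, b, c):
    """GBG density with Lindley baseline: c*g*G^(ac-1)*(1-G^c)^(b-1)/B(a,b)."""
    if x <= 0:
        return 0.0
    G = lindley_cdf(x, lam)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (c * lindley_pdf(x, lam) * G ** (a * c - 1.0)
            * (1.0 - G ** c) ** (b - 1.0) / math.exp(log_beta))


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


# With a = b = c = 1 the density reduces to the Lindley baseline.
print(abs(gbgl_pdf(1.3, 2.0, 1.0, 1.0, 1.0) - lindley_pdf(1.3, 2.0)) < 1e-12)
# The density should integrate to (approximately) one.
print(simpson(lambda x: gbgl_pdf(x, 2.0, 2.0, 2.0, 2.0), 0.0, 20.0))
```

The same `gbgl_pdf` can be divided by one minus a numerically integrated cdf to explore hrf shapes for other parameter choices.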
The qf is useful for determining various mathematical properties of a distribution. For a positive random variable X ∼ F, the qf of X is defined from the generalized inverse of its cdf for a fixed probability u, namely
$$Q(u) = \inf\{x : F(x) \ge u\}, \qquad u \in (0, 1).$$
Then, the qf of the GBGL model can be determined by inverting (8) as
$$Q_{GBGL}(u) = Q_{L}\bigl(\bigl[Q_{\beta(a,b)}(u)\bigr]^{1/c}\bigr), \qquad (9)$$
where Q_{β(a,b)}(u) = I^{−1}(u; a, b) is the beta qf and Q_L(u) is the qf of the L distribution with parameter λ.
Consider the Lambert W-function as the principal solution for w = W(z) in z = w e^w. A power series expansion for W(z) = ProductLog[z] can be obtained using the software Mathematica. The qf of X can then be expressed in terms of the Lambert function, based on a result given by Jodrá (2010).
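To make the Lambert-function route concrete, here is a minimal sketch of the Lindley qf and of a GBGL sampler. Several points are assumptions of this illustration rather than statements from the paper: setting G(x; λ) = u in the Lindley cdf and rearranging gives w e^w = −(1 + λ)(1 − u) e^{−(1+λ)} with w = −(1 + λ + λx) ≤ −1, so the branch used here is W_{−1} (not the principal branch); the Newton solver `lambert_wm1` is a hand-rolled stand-in for a library routine; and the sampler uses beta variates V ∼ Beta(a, b), so that X = Q_L(V^{1/c}) has cdf I(G(x)^c; a, b).

```python
import math
import random


def lambert_wm1(z, tol=1e-14, max_iter=100):
    """Branch W_{-1}(z) for -1/e < z < 0: the root w < -1 of w * exp(w) = z.

    Solved in log form: with v = -w > 1, v - log(v) = -log(-z).
    """
    if not (-1.0 / math.e < z < 0.0):
        raise ValueError("z must lie in (-1/e, 0)")
    L = -math.log(-z)
    v = 1.0 + 2.0 * L              # starting point to the right of the root
    for _ in range(max_iter):
        slope = 1.0 - 1.0 / v
        if slope <= 0.0:           # numerically at the branch point v = 1
            break
        step = (v - math.log(v) - L) / slope
        v -= step
        if abs(step) <= tol * v:
            break
    return -v


def lindley_cdf(x, lam):
    return 1.0 - math.exp(-lam * x) * (1.0 + lam * x / (1.0 + lam))


def lindley_quantile(u, lam):
    """Q_L(u) = -1 - 1/lam - W_{-1}(-(1+lam)(1-u)exp(-(1+lam))) / lam, 0 <= u < 1."""
    z = -(1.0 + lam) * (1.0 - u) * math.exp(-(1.0 + lam))
    return -1.0 - 1.0 / lam - lambert_wm1(z) / lam


def gbgl_rvs(n, lam, a, b, c, seed=None):
    """Draw n GBGL variates as Q_L(V^(1/c)) with V ~ Beta(a, b)."""
    rng = random.Random(seed)
    return [lindley_quantile(rng.betavariate(a, b) ** (1.0 / c), lam)
            for _ in range(n)]


sample = gbgl_rvs(5, lam=2.0, a=2.0, b=2.0, c=2.0, seed=1)
print(sample)
```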

In Figure 1(c), we present one case of random number generation at (λ, a, b, c) = (2, 2, 2, 2) based on Q_GBGL(p), obtained by evaluating uniform distribution outcomes in its argument. In Figure 1, we display possible shapes of the pdf and hrf of the GBGL model for some parameter values. The hrf can take the four most common forms for applications to real data: increasing, decreasing, bathtub and unimodal shapes, which is an important characteristic of the new lifetime model.
The skewness (B) and kurtosis (K) coefficients are two important tools to understand a distribution. Easy procedures to quantify B and K were proposed by Bowley (1920) and Moors (1984), given by, respectively,
$$B = \frac{Q(3/4) + Q(1/4) - 2\,Q(1/2)}{Q(3/4) - Q(1/4)} \quad \text{and} \quad K = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)},$$
where Q(·) denotes the qf. In particular, for our proposal, Figures 2(a)-2(c) and 2(d)-2(f) display GBGL skewness and kurtosis measures for some parametric points, respectively. The former quantity indicates how symmetric the model is, while the latter measures how the shape of the model under study relates to that of the Gaussian law. These plots indicate that one may define symmetrical and non-symmetrical laws from our model. It is easier to specify curves with a long tail to the right. Density curves distinct from the Gaussian one are obtained.
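The quantile-based measures of Bowley and Moors only need a quantile function, so they are easy to sketch. In the illustration below the function names are hypothetical, and the exponential and logistic quantile functions are used purely as stand-ins with known answers; any qf, including a GBGL qf, could be passed instead.

```python
import math


def bowley_skewness(Q):
    """Bowley skewness from a quantile function Q defined on (0, 1)."""
    return (Q(0.75) + Q(0.25) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))


def moors_kurtosis(Q):
    """Moors kurtosis based on octiles of the quantile function Q."""
    return ((Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8))
            / (Q(6 / 8) - Q(2 / 8)))


def exponential_quantile(u):
    """Stand-in example: unit exponential qf."""
    return -math.log(1.0 - u)


def logistic_quantile(u):
    """Stand-in example: standard logistic qf (symmetric about zero)."""
    return math.log(u / (1.0 - u))


print(bowley_skewness(exponential_quantile))   # positive: right-skewed
print(bowley_skewness(logistic_quantile))      # essentially zero: symmetric
print(moors_kurtosis(logistic_quantile))
```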

-LINEAR REPRESENTATIONS
In this section, we present linear representations for (7) and (8) in order to obtain explicit expressions for some moment-type quantities of the GBGL model. We prove that the expansions, in the form of Theorem 1 and Corollary 1, depend only on the GL distribution (Nadarajah et al. 2012).
Theorem 1. The pdf of X ∼ GBGL(λ, a, b, c) can be expressed by the linear combination
$$f(x) = \sum_{l=0}^{\infty} w_l\, g_l(x), \qquad (10)$$
where g_l(x) denotes the GL density with scale and shape parameters λ and (a + l)c, respectively, and
$$w_l = \frac{(-1)^{l}}{(a+l)\,B(a, b)} \binom{b-1}{l}.$$
The proof of this theorem is given in Appendix A.
Corollary 1. The cdf of X is given by
$$F(x) = \sum_{l=0}^{\infty} w_l\, G_l(x),$$
where G_l(x) denotes the GL cdf with parameters λ and (a + l)c.
The following results indicate that moment-type quantities of the GL model can be obtained from the corresponding quantities of the gamma distribution.
Theorem 2. The cdf of Z ∼ GL(λ, c) can be expressed as a linear combination, with coefficients v_{i,k}, of gamma cdfs H_{i,k}(z), where H_{i,k}(z) denotes the gamma cdf with shape parameter (k + 1) and scale parameter (i + 1)λ. The proof of this theorem is given in Appendix B.
Corollary 2. The pdf of Z ∼ GL(λ, c) is given by
$$f_{GL}(z) = \sum_{i,k} v_{i,k}\, h_{i,k}(z), \qquad (12)$$
where h_{i,k}(z) denotes the gamma density with shape parameter (k + 1) and scale parameter (i + 1)λ.
An Acad Bras Cienc (2017) 89 (3)
Finally, the main result of this section provides a simple way for obtaining the properties of the new model by means of the classical gamma model.
Theorem 3. As a consequence of Theorem 1 and Corollary 2, we can write the density of X as
$$f(x) = \sum_{i,k} w_{i,k}\, h_{i,k}(x),$$
where the coefficients w_{i,k} combine the weights w_l of Theorem 1 with the quantities v_{i,k}, and v_{i,k} and h_{i,k}(x) are defined in Theorem 2 and Corollary 2, respectively.
The proof of this theorem is given in Appendix C.

-QUANTILE FUNCTION
For some models, it is possible to invert the cdf. However, for some other distributions, the inverse function cannot be obtained in closed form. We shall resort to power series methods for the GBGL model. They are at the heart of many solutions in applied mathematics and statistics. First, based on equation (2), we have the following theorem for the qf of the L model.
Theorem 4. The L qf can be expressed as a power series. The quantity π_k and the proof of this theorem are given in Appendix D.
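In the following, we use an equation of Gradshteyn and Ryzhik (2000) for a power series raised to a positive integer j,
$$\left(\sum_{i=0}^{\infty} a_i\, x^{i}\right)^{j} = \sum_{i=0}^{\infty} c_{j,i}\, x^{i},$$
where the coefficients c_{j,i} (for i = 1, 2, ...) are determined from the recurrence equation (for i ≥ 1)
$$c_{j,i} = (i\, a_0)^{-1} \sum_{m=1}^{i} \bigl[m\,(j+1) - i\bigr]\, a_m\, c_{j,i-m}$$
and c_{j,0} = a_0^j. The coefficient c_{j,i} follows from c_{j,0}, ..., c_{j,i−1} and then from the quantities a_0, ..., a_i.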
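The recurrence for the coefficients of a power series raised to an integer power is easy to verify on small polynomials. The sketch below is illustrative only: the function name and the truncation argument `n_terms` are assumptions, and the input series is taken to have a_0 ≠ 0 as the recurrence requires.

```python
def power_series_coeffs(a, j, n_terms):
    """First n_terms coefficients c_{j,i} of (sum_i a[i] * x**i) ** j.

    Uses c_{j,0} = a[0]**j and, for i >= 1,
    c_{j,i} = (i * a[0])**(-1) * sum_{m=1}^{i} (m*(j+1) - i) * a[m] * c_{j,i-m}.
    Coefficients a[m] beyond the end of the list are treated as zero.
    """
    if a[0] == 0:
        raise ValueError("the recurrence requires a[0] != 0")
    c = [float(a[0]) ** j]
    for i in range(1, n_terms):
        s = 0.0
        for m in range(1, i + 1):
            a_m = a[m] if m < len(a) else 0.0
            s += (m * (j + 1) - i) * a_m * c[i - m]
        c.append(s / (i * a[0]))
    return c


# (1 + x)^3 = 1 + 3x + 3x^2 + x^3
print(power_series_coeffs([1.0, 1.0], 3, 5))
# (1 + 2x + 3x^2)^2 = 1 + 4x + 10x^2 + 12x^3 + 9x^4
print(power_series_coeffs([1.0, 2.0, 3.0], 2, 5))
```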
Corollary 3. The GBGL qf can be expanded as

-MOMENTS
Henceforth, let Y_{i,k} ∼ Gamma(k + 1, (i + 1)λ). Next, we obtain the ordinary and incomplete moments of X from the corresponding moments of Y_{i,k}. Based on Theorem 3, we can write
$$\mu_n = E(X^{n}) = \sum_{i,k} w_{i,k}\, E\bigl(Y_{i,k}^{n}\bigr).$$
We have the following corollary from the moments of Y_{i,k}.
Corollary 4. Suppose that µ_n = E(X^n) exists. Then, µ_n follows as a linear combination of the nth moments of Y_{i,k}. Further, µ_n can be expressed in terms of Q_L(u). Thus, an alternative expansion for µ_n can be obtained from Theorem 4 in the following corollary.
Corollary 5. Suppose that µ_n = E(X^n) exists. Then, µ_n admits a power series expansion, where the quantities f_{n,j} are determined from (13)-(14) and the quantity π_l is defined in Appendix D.
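The idea of obtaining moments from gamma moments can be illustrated in the simplest special case a = b = c = 1, where the Lindley density is itself a two-component gamma mixture (weights λ/(1 + λ) and 1/(1 + λ) on gamma shapes 1 and 2). The sketch below assumes the gamma components are parameterized by a rate λ, and it checks the mixture result against the known closed form n!(λ + n + 1)/[λ^n (λ + 1)]. The function names are hypothetical, and the general GBGL weights w_{i,k} are not reproduced here.

```python
import math


def gamma_moment(n, shape, rate):
    """E(Y^n) for Y ~ Gamma(shape, rate): Gamma(shape + n) / (Gamma(shape) * rate^n)."""
    return math.exp(math.lgamma(shape + n) - math.lgamma(shape)) / rate ** n


def lindley_moment(n, lam):
    """nth Lindley moment as a linear combination of two gamma moments."""
    w_shape1 = lam / (1.0 + lam)      # weight of the Gamma(1, lam) component
    w_shape2 = 1.0 / (1.0 + lam)      # weight of the Gamma(2, lam) component
    return (w_shape1 * gamma_moment(n, 1.0, lam)
            + w_shape2 * gamma_moment(n, 2.0, lam))


def lindley_moment_closed_form(n, lam):
    return math.factorial(n) * (lam + n + 1.0) / (lam ** n * (lam + 1.0))


for n in (1, 2, 3):
    print(n, lindley_moment(n, 2.0), lindley_moment_closed_form(n, 2.0))
```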
Next, we obtain the incomplete moments of X.
Corollary 6. Suppose that the nth incomplete moment of X, say $T_n(y) = \int_0^{y} x^{n} f(x)\,dx$, exists. Then, T_n(y) follows from the corresponding incomplete moments of Y_{i,k}, where the quantity w_{i,k} is defined in (11). Equations (16), (17) and (18) are the main results of this section.
Another important application of the first incomplete moment refers to the Lorenz and Bonferroni curves defined (for a given probability p) by L(p) = T_1(x_p)/µ_1 and B(p) = T_1(x_p)/(p µ_1), respectively, where x_p can be evaluated numerically by (9) with u = p. These curves are very useful in economics, demography, insurance, engineering and medicine.
Figure 3 displays plots of the Bonferroni and Lorenz curves for selected parameter values. The nth moment of the residual life, say v_n(t) = E[(X − t)^n | X > t] (for t > 0 and n = 1, 2, ...), is easily obtained from (18). A special case is the mean residual life (MRL) function at age t given by v_1(t) = E[(X − t) | X > t], which represents the expected additional life length for a unit which is alive at age t.
The nth moment of the reversed residual life, given by
$$M_n(t) = E\bigl[(t - X)^{n} \mid X \le t\bigr] = \frac{1}{F(t)} \int_0^{t} (t - x)^{n} f(x)\,dx$$
(for t > 0 and n = 1, 2, ...), uniquely determines F(x) and follows from v_n(t).

-GENERATING FUNCTION
A first representation for the moment generating function (mgf) M(s) of X can be based on the L qf. Expanding the exponential function, and after some algebra using (15), we have the following corollary.
Corollary 7. The mgf of X can be expressed as 0 and the coefficients t j s are defined in Theorem 4.1.
A second representation for M(s) comes from the gamma generating function. We can write
$$M(s) = \sum_{i,k} w_{i,k}\, M_{i,k}(s),$$
where w_{i,k} is defined by (11) and M_{i,k}(s) is the mgf of Y_{i,k}. Equations (19) and (20) are the main results of this section.

-ESTIMATION
Several approaches for parameter estimation were proposed in the statistical literature but the maximum likelihood method is the most commonly employed. The MLEs enjoy desirable properties for constructing confidence intervals. In this section, we investigate the estimation of the parameters of the GBGL distribution by maximum likelihood for complete data sets. Alternatively, we propose another estimation procedure that relies on the squared distance between theoretical and empirical GBGL quantiles. Both estimation methods will be compared in the next section of numerical results.
Consider a random variable X ∼ GBGL(a, b, c, λ) and let θ = (a, b, c, λ)^T be the parameter vector. Thus, the associated log-likelihood function for one observation x is
$$\ell(\theta; x) = \log c + 2\log\lambda - \log(1+\lambda) - \log B(a, b) + \log(1+x) - \lambda x + (ac - 1)\log G(x; \lambda) + (b - 1)\log\bigl[1 - G(x; \lambda)^{c}\bigr], \qquad (21)$$
where G(x; λ) is the L cdf given in (2). The MLE of θ is determined by maximizing $\ell_n(\theta) = \sum_{i=1}^{n} \ell(\theta; x_i)$ for a given data set x_1, ..., x_n. Equation (21) can be maximized either directly by using R (optim function), SAS (PROC NLMIXED) or the Ox program (sub-routine MaxBFGS), or by solving the nonlinear likelihood equations obtained by differentiating this equation.
Based on equation (21), the components of the unit score function are obtained by differentiation with respect to each parameter; they involve the digamma function ψ(·).
Although these equations cannot be solved analytically, a numerical solution can be determined by using computing packages. Iterative techniques such as Newton-Raphson type algorithms can be adopted to obtain the MLEs.
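A minimal sketch of the log-likelihood computation is given below. It is illustrative only: the data are a small made-up sample (not the data analyzed in the paper), the function names are hypothetical, and no four-parameter optimizer is included. For the Lindley submodel a = b = c = 1, the likelihood equation reduces to the quadratic x̄λ² + (x̄ − 1)λ − 2 = 0, whose positive root gives a closed-form MLE that can serve as a starting value or as the fit under the null model.

```python
import math


def lindley_cdf(x, lam):
    return 1.0 - math.exp(-lam * x) * (1.0 + lam * x / (1.0 + lam))


def gbgl_loglik(theta, data):
    """Total GBGL log-likelihood for theta = (a, b, c, lam) and positive data."""
    a, b, c, lam = theta
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    total = 0.0
    for x in data:
        G = lindley_cdf(x, lam)
        total += (math.log(c) + 2.0 * math.log(lam) - math.log(1.0 + lam)
                  - log_beta + math.log(1.0 + x) - lam * x
                  + (a * c - 1.0) * math.log(G)
                  + (b - 1.0) * math.log1p(-G ** c))
    return total


def lindley_mle(data):
    """Closed-form MLE of lam in the Lindley submodel (a = b = c = 1)."""
    xbar = sum(data) / len(data)
    return (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)


# Hypothetical toy sample, for illustration only.
data = [0.5, 0.8, 1.2, 1.7, 2.3, 3.1]
lam_hat = lindley_mle(data)
print(lam_hat, gbgl_loglik((1.0, 1.0, 1.0, lam_hat), data))
```

In practice the negative of `gbgl_loglik` would be handed to a general-purpose optimizer over all four parameters, as the text describes for R, SAS and Ox.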
For interval estimation and hypothesis tests on the model parameters, we require the observed information matrix. The 4 × 4 unit observed information matrix, J = J(θ) = {J_rs}, where J_rs = −∂²ℓ(θ; x)/∂θ_r ∂θ_s, is given in Appendix E. Likelihood ratio tests can be performed for the new distribution in the usual way.
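As an illustration of a likelihood ratio test of the Lindley submodel (a = b = c = 1) against the full GBGL model, the sketch below computes the statistic w = 2(ℓ_full − ℓ_null) and its asymptotic chi-square p-value with 3 degrees of freedom. The closed form used for the chi-square survival function holds only for 3 degrees of freedom, the function names are hypothetical, and the log-likelihood values are invented for the example.

```python
import math


def chi2_sf_3df(w):
    """P(chi-square with 3 df > w) = erfc(sqrt(w/2)) + sqrt(2w/pi) * exp(-w/2)."""
    if w <= 0.0:
        return 1.0
    return (math.erfc(math.sqrt(w / 2.0))
            + math.sqrt(2.0 * w / math.pi) * math.exp(-w / 2.0))


def lr_test(loglik_full, loglik_null):
    """Likelihood ratio statistic and p-value for three restricted parameters."""
    w = 2.0 * (loglik_full - loglik_null)
    return w, chi2_sf_3df(w)


# Hypothetical maximized log-likelihoods, for illustration only.
w, p_value = lr_test(loglik_full=-120.0, loglik_null=-124.5)
print(w, p_value)
```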

-LEAST SQUARE ESTIMATION
An alternative to the maximum likelihood method is the least square estimation discussed by Ashour and Eltehiwy (2015). For the GBGL model, the least square estimates (LSEs) â, b̂, ĉ and λ̂ of a, b, c and λ are defined as the arguments that minimize a least-squares objective function evaluated at x_(1), ..., x_(n), where x_(i) is a possible outcome of the ith order statistic based on an n-point random sample obtained from X ∼ GBGL(a, b, c, λ).
The minimum point (â, b̂, ĉ, λ̂) can also be obtained as a solution of the system of non-linear equations that results from setting the partial derivatives of the objective function with respect to a, b, c and λ equal to zero. The required derivatives were obtained using Mathematica.
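The least-squares idea can be sketched as follows. The exact objective function is not reproduced in the text above, so this example assumes the commonly used form Σ_i [F(x_(i)) − i/(n + 1)]², which may differ from the one used by the authors. It also restricts attention to the Lindley submodel (a = b = c = 1) with a one-dimensional grid search, and it uses a made-up sample; all function names are hypothetical.

```python
import math


def lindley_cdf(x, lam):
    return 1.0 - math.exp(-lam * x) * (1.0 + lam * x / (1.0 + lam))


def lse_objective(cdf, data):
    """Assumed objective: sum of squared gaps between cdf(x_(i)) and i/(n+1)."""
    xs = sorted(data)
    n = len(xs)
    return sum((cdf(x) - i / (n + 1.0)) ** 2
               for i, x in enumerate(xs, start=1))


def lindley_lse(data, grid):
    """Grid-search LSE of lam in the Lindley submodel."""
    return min(grid,
               key=lambda lam: lse_objective(lambda x: lindley_cdf(x, lam), data))


# Hypothetical toy sample and search grid, for illustration only.
data = [0.5, 0.8, 1.2, 1.7, 2.3, 3.1]
grid = [0.01 * k for k in range(1, 1001)]
lam_lse = lindley_lse(data, grid)
print(lam_lse)
```

For the full four-parameter model the same objective would be minimized with a numerical optimizer, with F replaced by the GBGL cdf.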

-SIMULATION STUDY
We perform a Monte Carlo simulation study (with 1,000 replications) to quantify some asymptotic properties of both MLEs and LSEs of the GBGL parameters. We also measure the effects of the MLEs and LSEs for the additional parameters, (â, b̂, ĉ), on the corresponding estimators of the baseline parameter, λ, and vice versa.
To that end, we consider λ ∈ {0.5, 1, 2, 3, 4}, a = b = c ∈ {2, 5} and sample size n ∈ {50, 100, 150}. Additionally, as figures of merit, we consider the average estimates due to MLEs and LSEs and their mean square errors (MSEs). The simulation results are given in Tables I and II. As expected, the MSEs and biases for the two proposed procedures tend to decrease when the sample size increases. Additionally, increasing the additional parameters implies that the MLE and LSE of λ will have smaller MSEs and biases. Real scenarios having higher additional parameters will lead to more biased MLEs. Moreover, for approximately 83% of cases, MLEs outperform LSEs in terms of MSEs.
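The structure of such a Monte Carlo experiment can be sketched in a reduced form. The example below is not the four-parameter study reported in Tables I and II: it covers only the Lindley baseline (a = b = c = 1), draws samples through the exponential/gamma mixture representation of the Lindley law, uses the closed-form MLE of λ, and fixes an arbitrary seed and replication count. All names are hypothetical.

```python
import math
import random


def lindley_rvs(n, lam, rng):
    """Draw n Lindley(lam) variates: Exp(lam) w.p. lam/(1+lam), else Gamma(2, rate lam)."""
    p_exp = lam / (1.0 + lam)
    return [rng.expovariate(lam) if rng.random() < p_exp
            else rng.gammavariate(2.0, 1.0 / lam)   # second argument is a scale
            for _ in range(n)]


def lindley_mle(data):
    """Closed-form MLE of lam for the Lindley distribution."""
    xbar = sum(data) / len(data)
    return (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)


def monte_carlo(lam, n, replications=500, seed=2017):
    """Return the average estimate and the MSE of the MLE over the replications."""
    rng = random.Random(seed)
    estimates = [lindley_mle(lindley_rvs(n, lam, rng)) for _ in range(replications)]
    mean_est = sum(estimates) / replications
    mse = sum((e - lam) ** 2 for e in estimates) / replications
    return mean_est, mse


for n in (50, 100, 150):
    mean_est, mse = monte_carlo(lam=2.0, n=n)
    print(n, round(mean_est, 4), round(mse, 5))
```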

-APPLICATIONS TO REAL DATA
In this section, we perform two applications to real data sets. Initially, we consider data obtained from accelerated life testing of 40 items with change in stress from 100 to 150 at a given time instant (Murthy et al. 2004, p. 236, Dataset 12.2). In this first study, we aim to compare the Lindley and GBGL models and, to this end, we use the likelihood ratio statistic to test the hypothesis H_0: a = b = c = 1 ⇔ H_0: GBGL ≡ Lindley. Table III and Figure 4 display the main results. One can note that the baseline and proposed models are statistically distinct for any nominal level higher than 4%. Fits with respect to both the empirical density and the cumulative distribution function confirm that our model describes the data better than the Lindley model.
Second, our aim is to explain remission times (in months) of a random sample of 128 bladder cancer patients (Lee and Wang 2003). To that end, we consider the GBGL distribution, the Lindley baseline, and three other extended Lindley models, namely the LE, GL and TL distributions described in Section 1. Table IV lists the MLEs and their standard errors (SEs) for each fitted model. One can note that all estimates are statistically significant. The plots in Figure 5 display the empirical pdf and cdf and the fitted versions for the three best models according to the subsequent discussion.
Both GBGL and LE models describe well the empirical density of the remission times, but only our proposed model fits well the empirical cdf.
In order to compare quantitatively the competitive models, we adopt two criteria: the Akaike Information Criterion (AIC) and Kolmogorov-Smirnov (KS) statistic.These statistics are widely used to determine how closely a specific cdf fits the associated empirical distribution for a given data set.The smaller these statistics are, the better the fit is.
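Both criteria are simple to compute once a model has been fitted. The sketch below is illustrative: `aic` takes a maximized log-likelihood and the number of estimated parameters, and `ks_statistic` evaluates the largest gap between a fitted cdf and the empirical cdf at the sample points. The numbers and the uniform cdf in the example are placeholders, not results from the paper.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 * (maximized log-likelihood)."""
    return 2.0 * n_params - 2.0 * loglik


def ks_statistic(cdf, data):
    """Kolmogorov-Smirnov distance sup_x |F_n(x) - F(x)| for a fitted cdf F."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # The empirical cdf jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - F, F - (i - 1) / n)
    return d


# Placeholder inputs: a hypothetical log-likelihood and a uniform(0, 1) cdf.
print(aic(loglik=-100.0, n_params=4))
print(ks_statistic(lambda x: x, [0.25, 0.5, 0.75]))
```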
Table V presents the values of these statistics for some models. The GBGL model provides the best fit to these data among the current models. Thus, our proposal can be a competitive distribution compared with other extended Lindley models: L, L exponential (Bhati and Malik 2015) and GL.

-CONCLUDING REMARKS
In this paper, we propose a new four-parameter distribution called the generalized beta-generated Lindley (GBGL) model. Some of its structural properties (such as the moments and generating function) have been derived from a linear representation for the GBGL density function. We propose a procedure for determining the maximum likelihood estimates (MLEs) of the model parameters. A simulation study is performed to validate the MLEs. We have also indicated an additional estimation process based on the least square method between percentiles. Finally, two applications to real data sets provide evidence that the proposed model can yield better fits than the Lindley baseline and other extended Lindley models.

[Figure panel (b), for cdfs: empirical and estimated cdfs versus observed values; legend: Lindley Exponential, Generalized Lindley, GBG-L.]

APPENDIX B: PROOF OF THEOREM 2
Using the power series expansion, we obtain an expansion of the GL cdf in terms of H_{i,k}(x), where H_{i,k}(x) denotes the gamma cdf with shape parameter (k + 1) and scale parameter (i + 1)λ.
We can change $\sum_{j=0}^{i}\sum_{k=0}^{j+1}$ by $\sum_{k=0}^{i+1}\sum_{j=\delta_k}^{i}$, where δ_k = 0 for k = 0, 1 and δ_k = k − 1 for k ≥ 2, which is easily verified by a Cartesian plot of k versus j. Then, rearranging terms, the new cdf follows as a double linear combination of gamma cdfs.
By differentiating the last equation, we obtain the corresponding expansion for the density, where h_{i,k}(x) denotes the gamma density with parameters (k + 1) and (i + 1)λ.

APPENDIX C: PROOF OF THEOREM 3
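From equations (10) and (12), the density of X can be written as the double linear combination of gamma densities stated in Theorem 3, where the coefficients involve the binomial term $\binom{(a+l)c-1}{i}$ and v_{i,k} is defined in Theorem 2.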

APPENDIX D: QUANTILE FUNCTION
We derive a power series for Q_GL(u) in the following steps. First, we use a power series for Q^{−1}(a, 1 − u). Second, we obtain a power series for the argument 1 − exp[−Q^{−1}(a, 1 − u)]. Third, we derive a power series for the L qf using the Lagrange theorem in order to obtain a power series for Q_GL(u).
Then, the inverse function G^{−1}(x) exists and is single-valued in the neighborhood of the point x = x_0. The inverse power series x = Q_L(u) is given by Then, we can write the GL qf as follows where u_0 = 1 and $f_i = (-\lambda)^{i+1}\left[\frac{1}{(i+1)!} - \frac{1}{(1-\lambda)\,i!}\right]$ for i ≥ 0. Further, we have where 0 = −1, i = − i , 0 = 1 and i = 1 f0 ∞ j=1 f j i−j . Thus, we obtain from equation (22)

Figure 1 - The GBGL pdf and hrf. Panel (c): illustration of the random number generator.

Figure 2 - GBGL skewness and kurtosis curve plots for some parametric points.

Figure 4 - Histogram and fitted pdfs and cdfs for the L and GBGL models in the first application (see the colors in the online version).

Figure 5 - Histogram and fitted pdfs and cdfs for several models in the second application (for the interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).