## Print version ISSN 0001-3765On-line version ISSN 1678-2690

### An. Acad. Bras. Ciênc. vol.89 no.3 Rio de Janeiro July/Sept. 2017

#### https://doi.org/10.1590/0001-3765201720160455

Mathematical Sciences

A new model for describing remission times: the generalized beta-generated Lindley distribution

1Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, Av. Prof. Moraes Rego, 1235, 50740-540 Recife, PE, Brazil

ABSTRACT

New generators are required to define wider distributions for modeling real data in survival analysis. To that end we introduce the four-parameter generalized beta-generated Lindley distribution. It has explicit expressions for the ordinary and incomplete moments, mean deviations, generating and quantile functions. We propose a maximum likelihood procedure to estimate the model parameters, which is assessed through a Monte Carlo simulation study. We also derive an additional estimation scheme by means of least square between percentiles. The usefulness of the proposed distribution to describe remission times of cancer patients is illustrated by means of an application to real data.

Key words: GBG generator; remission times; Extended Lindley model; quantile function; Lambert function

INTRODUCTION

The statistical literature is filled with hundreds of continuous univariate distributions, see Johnson et al. (1994). Recent procedures for building meaningful distributions (called generators) have been proposed. As important generators, the two-piece approach pioneered by Hansen (1994) and the beta family defined by Eugene et al. (2002) and Jones (2004) have received prominent positions.

Many papers have applied these techniques to provide more skewness in generalizations of well-known symmetric distributions. As an example, Aas and Haff (2006) presented an extension for the Student’s t-distribution.

Using the two-piece method with a view to finance applications, Zhu and Galbraith (2010) argued that, in addition to Student’s t parameters, three shape parameters are required: one parameter to control asymmetry in the center of a distribution and two parameters to control the left and right tail behavior.

This paper addresses similar issues to Zhu and Galbraith using a different approach. We consider the generalized beta generated (GBG) family of distributions pioneered by Alexander et al. (2012), which has three shape parameters.

The Lindley (L) distribution was firstly used by Lindley (1958) in order to measure the difference between Fiducial and posterior distributions related to Bayesian analysis. Its probability density function (pdf) (for z>0 ) with parameter λ>0 , say L (λ) , is given by

g(z;λ)=λ21+λ(1+z)e-λz, (1)

where λ>0 is a scale parameter. Its cumulative distribution function (cdf) is given by

G(z;λ)=1-e-λz(1+λz1+λ). (2)

Ghitany et al. (2008) discussed and studied various properties of the pdf (1). The L distribution has an important role in stress-strength reliability modeling and describes well some types of data sets, but it has lower flexibility in modeling asymmetric and/or heavy tail data. Further, it can accommodate hazard rate functions (hrfs) that are increasing, decreasing or constant but not unimodal, bathtub and other shapes, which are desirable in lifetime data analysis. To overcome this, several works proposed new distributions by adding parameters to the Lindley distribution. For example, Sankaran (2015) used such law as the mixing distribution of a Poisson parameter to generate a discrete model called the Poisson-Lindley distribution. Pararai et al. (2015) defined the Kumaraswamy Lindley-Poisson distribution and explored some of its properties. Another extension, named as the generalized Lindley distribution, was studied by Ashour and Eltehiwy (2015).

A profusion of new classes of distributions has recently proven useful to applied statisticians working in various areas of scientific investigation. Generalizing existing distributions by adding shape parameters leads to more flexible models. Let g(x;𝝉) and G(x;𝝉) be the pdf and cdf of a baseline distribution having parameter vector 𝝉 . Alexander et al. (2012) defined the pdf and cdf of the GBG-G distribution (for x𝒳 ) using three additional positive shape parameters a , b and c by

f𝒢𝒢(x;𝝉,a,b,c)=cB(a,b)-1g(x;𝝉)G(x;𝝉)ac-1[1-G(x;𝝉)c]b-1 (3)

and

FGBG(x;𝝉,a,b,c)
=
I(G(x;𝝉)c;a,b) (4)
=
B(a,b)-10G(x;𝝉)cωa-1(1-ω)b-1𝑑ω,

respectively, where I(x;a,b) denotes the incomplete beta function ratio and B(a,b) is the complete beta function.

In this paper, we propose a new lifetime model called the GBG-Lindley (GBGL) distribution. We also study some of its structural properties and present the maximum likelihood estimation of the parameters. A Monte Carlo study is performed in order to assess the proposed estimation procedure.

Further, we present evidence that the new model can (i) compensate the Lindley ability lack as well as (ii) produce better fits than the following distributions:

• The Lindley-exponential (LE) model (Bhati and Malik 2015), whose pdf and cdf are, respectively, given by

f(x;α,λ)
=α2λe-λx(1-e-λx)α-1[1-log(1-e-λx)]1+α and
F(x;α,λ)
=(1-e-λx)α[1+α-αlog(1-e-λx)]1+α;

• the generalized L (GL) model (Nadarajah et al. 2012) whose pdf and cdf are, respectively, given by

f𝒢(x;λ,c)=cλ21+λ(1+x)e-λx(1-1+λ+λx1+λ)c-1 (5)

and

F𝒢(x,λ,c)=(1-1+λ+λx1+λe-λ)c; (6)

• the transmuted Lindley (TL) model (Mansour and Mohamed 2015), whose pdf and cdf are, respectively, given by

f𝒯(x;λ,θ,δ,α)
=
θ2θ+1(1+x)e-θx×
{(1+λ)δ[1-θ+1+θxθ+1e-θx]δ-1
-
λα[1-θ+1+θxθ+1e-θx]α-1}

and

F𝒯(x;λ,θ,δ,α)
=(1+λ)[1-θ+1+θxθ+1e-θx]δ-λ[1-θ+1+θxθ+1e-θx]α.

This comparison is performed in terms of both items under change in stress and the efficiency in describing remission times (in months) of cancer patients.

This paper is organized as follows. In Section 2, we introduce the GBGL distribution and provide plots of its density function and hrf. We derive linear representations for the pdf and cdf (Section 3), explicit expressions for the quantile function (qf) (Section 4), ordinary and incomplete moments, mean deviations, Bonferroni and Lorenz curves (Section 5) and generating function (Section 6). A procedure for determining the maximum likelihood estimates (MLEs) of the model parameters is addressed in Section 7. Section 8 presents empirical results for the proposed model. Concluding remarks are offered in Section 9.

THE GBGL DISTRIBUTION

Applying (1) and (2) in equations (3) and (4), the pdf and cdf of the GBGL distribution (for x ) are, respectively, given by

fGBGL(x;λ,a,b,c)
=
cλ2(1+x)(1+λ)B(a,b)e-λx[1-e-λx(1+λx1+λ)]ac-1× (7)
{1-[1-e-λx(1+λx1+λ)]c}b-1

and

FGBGL(x;λ,a,b,c)
=
I([1-e-λx(1+λx1+λ)]c;a,b). (8)

For simplicity, we denote fGBGL(x;λ,a,b,c) and FGBGL(x;λ,a,b,c) by f(x) and F(x) . Hereafter, a random variable X having density (7) is denoted by X GBGL (λ,a,b,c) .

Clearly, the L distribution arises as the basic exemplar by taking a=b=c=1 in (7). As mentioned in the introduction, we motivate the paper by comparing the performance of the new distribution with those of the L, LE and GL models fitted to a real data set.

The qf is useful for determining various mathematical properties of a distribution. For a positive random variable XF , the qf of X is defined from the generalized inverse of its cdf for a fixed probability u , namely

QX(u)=inf{x+:uF(x)},u(0,1).

Then, the qf of the GBGL model can be determined by inverting (8) as

QGBGL(u)=QL([Qβ(a,b)(u)]1/c), (9)

where Qβ(a,b)(u)=I-1(u;a,b) is the beta qf and QL(u) is the qf of the L distribution with parameter λ .

Consider the Lambert W-function as the principal solution for w=W(z) in z=wew . We have the power series expansion for W(z)=ProductLog[z] using the software Mathematica

W(z)=i=1(-1)i+1ii-2zi(i-1)!.

Then, we obtain

W(z)=z-z2+3z32-8z43+125z524-54z65+16807z7720+O(z8).

The qf of X can be expressed in terms of the Lambert function as

QGBGL(u)
=QL(Qβ(a,b)(u)1/c)
=-1-1λ-1λW([1+λ]e1+λ[Qβ(a,b)(u)1/c-1]),

where the last identity holds based on a result given by Jodrá (2010).

In Figure 1(c), we present one case of generation at (λ,a,b,c)=(2,2,2,2) based on QGBGL(p) by evaluating the uniform distribution outcomes in its argument. In Figure1, we display possible shapes of the pdf and hrf of the GBGL model for some parameter values. The hrf can take the most four common forms for applications to real data: increasing, decreasing, bathtub and unimodal shapes, which is an important characteristic of the new lifetime model.

##### Figure 1

The GBGL pdf and hrf.

The skewness (B) and kurtosis (K) coefficients are two important tools to understand a distribution. Easy procedures to quantify B and K were proposed by Bowley (1920) and Moors (1984) given by, respectively: In particular, for our proposal,

B
=QGBGL(3/4)+QGBGL(1/4)-2QGBGL(1/4)QGBGL(3/4)-QGBGL(1/4)

and

K
=[QGBGL(7/8)-QGBGL(5/8)]+[QGBGL(3/8)-QGBGL(1/8)]QGBGL(6/8)-QGBGL(2/8).

Figures 2(a)-2(c) and 2(d)-2(f) display GBGL skewness and kurtosis measures for some parametric points, respectively. It is known that former quantity points out how symmetrical is the model, while the second measures whether the shape of under study model is related to that due to the Gaussian law. These plots indicate that one may define symmetrical and non-symmetrical laws from our model. It is easer to specify curves with long tail to the right. Densities curves distinct from the Gaussian one are obtained.

##### Figure 2

GBGL skewness and kurtosis curve plots for some parametric points.

LINEAR REPRESENTATIONS

In this section, we present linear representations for (7) and (8) in order to obtain explicit expressions for some type-moment quantities of the GBGL model. We prove that the expansions – in the form of Theorem 1 and Corollary 1 – can depend only on the GL distribution (Nadarajah et al. 2012).

Theorem 1.The cdf of XGBGL(λ,a,b,c) can be expressed by the linear combination

f(x)=l=0ζlgl(x),

where gl(x) denotes the GL density with scale and shape parameters λ and (a+l)c , respectively, and

ζl=(-1)l(a+l)B(a,b)(b-1l).

The proof of this theorem is given in Appendix A.

Corollary 1.The cdf of X is given by

F(x)=l=0ζlGl(x),

where Gl(x) denotes the GL cdf with parameters λ and (a+l)c .

The following results indicate that type-moment quantities of the GL model can be obtained from those corresponding quantities of the gamma distribution.

Theorem 2.The cdf of ZGL(λ,c) can be expressed as

G(z)=i=0k=0i+1wi,kHi,k(z), (10)

where Hi,k(z) denotes the gamma cdf with shape parameter (k+1) and scale parameter (i+1)λ , respectively,

wi,k=(-1)ivi,k(1+λ)i+2(1+i)k+1(c-1i) (11)

and

vi,k=j=δkiλj-k+1k!(ij)(j+1k).

The proof of this theorem is given in Appendix B.

Corollary 2.The pdf of ZGL(λ,c) is given by

f(z)=i=0k=0i+1wi,khi,k(z), (12)

where hi,k(z) denotes the gamma density with shape parameter (k+1) and scale parameter (i+1)λ .

Finally, the main result of this section provides a simple way for obtaining the properties of the new model by means of the classical gamma model.

Theorem 3.As consequences of Theorem 1 and Corollary 2, we can write the density of X as

f(x)=i=0k=0i+1τi,khi,k(x),

where

τi,k=l=0ζlwi,k(l),wi,k(l)=(-1)ivi,k(1+λ)i+2(1+i)k+1((a+l)c-1i),

and vi,k and hi,k(x) are defined in Theorem 2 and Corollary 2, respectively.

The proof of this theorem is given in Appendix C.

QUANTILE FUNCTION

For some models, it is possible to invert the cdf. However, for some other distributions, this inverse function of cannot be obtained in closed-form. We shall resort to power series methods for the GBLG model. They are at the heart of many solutions in applied mathematics and statistics. First, based on equation (2), we have the following theorem for the qf of the L model,

Theorem 4.The L qf can be expressed as a power series

QL(u)=n=0tnun,

where tn=k=n+1(-1)k-n(kn)πk . The quantity πk and the proof of this theorem are given in Appendix D.

In the following, we use an equation of Gradshteyn and Ryzhik (2000) for a power series raised to a positive integer j

(i=0aixi)j=i=0cj,ixi, (13)

where the coefficients cj,i (for i=1,2, ) are determined from the recurrence equation (for i1 )

cj,i=(ia0)-1m=1i[m(j+1)-i]amcj,i-m (14)

and cj,0=a0j . The coefficient cj,i follows from cj,0,,cj,i-1 and then from the quantities a0,,ai .

Corollary 3.The GBGL qf can be expanded as

Q𝐺𝐵𝐺𝐿(u)=j=0ejuj/a, (15)

where ej=i,r=0tisr(i/c)ηr,j , ηr,j=(jθ¯0)-1m=1j[m(r+1)-j]θ¯mηr,j-m and θ¯i is given in Appendix D.

MOMENTS

Henceforth, let Yi,kGamma(k+1,(i+1)λ) . Next, we obtain the ordinary and incomplete moments of X from the corresponding moments of Yi,k . Based on Theorem 3, we can write

μn=E(Xn)=i=0k=0i+1τi,kE(Yi,kn).

We have the following corollary from the moments of Yi,k .

Corollary 4.Suppose that μn=E(Xn) exists. Then,

μn=E(Xn)=i=0k=0i+1τi,k[(i+1)λ]n(k+1)[n], (16)

where k[n]=k(k+1)(k+n-1),nN .

Further, we can express μn in terms of QL(u) as

μn=i=0k=0i+1τi,k01QL(u)nua(i+c)-1𝑑u.

Thus, an alternative expansion for μn can be obtained from Theorem 4 in the following corollary.

Corollary 5.Suppose that μn=E(Xn) exists. Then,

μn=i,j=0k=0i+1τi,kfn,j[a(i+1)+j], (17)

where the quantities fn,j are determined from (13)-(14) as

en,j=(it0)-1m=1j[m(n+1)-j]tmen,j-m

for j1 , fn,0=t0n , tm=l=m+1(-1)l-m(lm)πl and the quantity πl is defined in Appendix D.

Next, we obtain the incomplete moments of X .

Corollary 6.Suppose that the n th incomplete moment of X , say Tn(y)=0yxnf(x)dx , exists. Then,

Tn(y)
=
0yxni=0k=0i+1wi,k(λ+λi)k![(k+1)x]λ+λi-1exp[-(λ+λi)x]𝑑x (18)
=
i=0ϵiyλ+λi+n[(λ+λi)y]-(λ+λi+n)
×
[Γ(λ+λi+n)-Γ(λ+λi+n,(λ+λi)y)],

where the quantity wi,k is defined in (11) and ϵi=k=0i+1wi,kλ(1+i)k!(k+1)λ(1+i)-1 .

Equations (16), (17) and (18) are the main results of this section.

The amount of scatter in a population is evidently measured to some extent by the totality of deviations from the mean and median given by δ1=𝔼(|X-μ1|) and δ2=𝔼(|X-M|) , respectively. They can be expressed in terms of the first incomplete moment by δ1=2μ1F(μ1)-2T1(μ1) and δ2=μ1-2T1(M) , respectively, where F(μ1) follows from (8) and T1() is the first incomplete moment given by (18) with n=1 .

Another important application of the first incomplete moment refers to the Bonferroni and Lorenz curves defined (for a given probability p ) by L(p)=T1(xp)/μ1 and B(p)=T(xp)/(pμ1) , respectively, where xp can be evaluated numerically by (9) with u=p . These curves are very useful in economics, demography, insurance, engineering and medicine.

Figure3 displays plots of the Bonferroni and Lorenz curves for selected parameter values.

##### Figure 3

Bonferroni and Lorenz curves.

The n th moment of the residual life, say vn(t)=E[(X-t)nX>t] (for n=1,2, ) uniquely determines F(x) . It is given by vn(t)=1R(t)t(x-t)nf(x)dx , which is easily obtained from (18). A special case is the mean residual life (MRL) function at age t given by v1(t)=E[(X-t)X>t] , which represents the expected additional life length for a unit which is alive at age t .

The n th moment of the reversed residual life given by Mn(t)=1F(t)0t(t-x)nf(x)dx , (for t>0 and n=1,2, ) uniquely determines F(x) and follows from vn(t) .

GENERATING FUNCTION

A first representation for the moment generating function (mgf) M(s) of X can be based on the L qf. We can write

M(s)=01exp[sQGL(u)]du.

Expanding the exponential function, and after some algebra using (15), we have the following corollary.

Corollary 7.The mgf of X can be expressed as

M(s)=cB(a,b+1)-1i=0(-1)i(bi)ρ(s,a[i+c]-1), (19)

where

ρ(s,a[i+c]-1)
=
01exp[sQL(u)]ua(i+c)-1𝑑u=j,k=0skdk,j[a(i+c)+j]k!,

dk,j=(jt0)-1m=1j[m(k+1)-j]tmdk,j-m (for j1 ), dk,0=t0k and the coefficients tj s are defined in Theorem 4.1.

A second representation for M(s) comes from the gamma generating function. We can write

M(s)=i=0k=0i+1wi,kMi,k(s),

where wi,k is defined by (11) and Mi,k(s) is the mgf of Yi,k given by

Mi,k(s)=1[1-λs(1+i)]k+1,s<λ-1. (20)

Equations (19) and (20) are the main results of this section.

ESTIMATION

Several approaches for parameter estimation were proposed in the statistical literature but the maximum likelihood method is the most commonly employed. The MLEs enjoy desirable properties for constructing confidence intervals. In this section, we investigate the estimation of the parameters of the GBGL distribution by maximum likelihood for complete data sets. Alternatively, we propose other estimation procedure that rely on squared distance between theoretical and empirical GBGL quantiles. Both estimation methods will be compared in the next section of numerical results.

MAXIMUM LIKELIHOOD ESTIMATION

Consider a random variable X GBGL (a,b,c,λ) and let 𝜽=(a,b,c,λ)T be the parameter vector. Thus, the associated log-likelihood function for one observation x is

(𝜽;x)
=
log(c)+2log(λ)+log(1+x)-log(1-λ)-nlog[B(a,b)]-λx (21)
+
(ac-1)log[1-e-λx(1+λx1+λ)]
+
(b-1)log{1-[1-e-λx(1+λx1+λ)]c}.

The MLE of 𝜽 is determined by maximizing ln(𝜽)=i=1n(𝜽;xi) for a given data set x1,,xn . Equation (21) can be maximized either directly by using the R (optim function), SAS (PROC NLMIXED), Ox program (sub-routine MaxBFGS) or by solving the nonlinear likelihood equations obtained by differentiating this equation.

Based on equation (21), the components of the unit score function

𝕌(𝜽)=(Ua,Ub,Uc,Uλ)=((𝜽;x)a,(𝜽;x)b,(𝜽;x)c,(𝜽;x)λ)

are given by

Ua=Ua(𝜽)
=
-ψ(a)+ψ(a+b)+clog{1-e-λx[1+λx1+λ]},
Ub=Ub(𝜽)
=
-ψ(b)+ψ(a+b)+log{1-{1-e-λx[1+λx1+λ]}c},
Uc=Uc(𝜽)
=
1c+alog[1-e-λx(1+λx1+λ)]
-
(b-1)[1-e-λx(1+λx1+λ)]clog[1-e-λx(1+λx1+λ)]1-[1-e-λx(1+λx1+λ)]c

and

Uλ=Uλ(𝜽)
=
2λ-11+λ-x
+
(ac-1){xe-λx(1+λx1+λ)-e-λx[x1+λ-λx(1+λ)2]}1-e-λx(1+λx1+λ)
-
c(b-1){xe-λx(1+λx1+λ)-e-λx[x1+λ-λx(1+λ)2]}1-[1-e-λx(1+λx1+λ)]c
×
[1-e-λx(1+λx1+λ)]c-1,

where ψ() is the digamma function.

Although these equations cannot be solved analytically, a numerical solution can be determined by using computing packages. Iterative techniques such as Newton-Raphson type algorithms can be adopted to obtain the MLEs.

For interval estimation and hypothesis tests on the model parameters, we require the observed information matrix. The 4×4 unit observed information matrix,

J=J(𝜽)(JaaJabJacJaλJbaJbbJbcJbλJcaJcbJccJcλJλaJλbJλcJλλ),

where Jrs=-2(𝜽;x)/θrθs , is given in Appendix E. Likelihood ratio tests can be performed for the new distribution in the usual way.

LEAST SQUARE ESTIMATION

An alternative estimation to the maximum likelihood method is the least square estimation discussed by Ashour and Eltehiwy (2015). For the GBGL model, the least square estimates (LSEs), a^ , b^ , c^ and λ^ of a,b,c and λ are defined as those arguments that minimize the objective function:

Q(a,b,c,λ)=i=1n{I([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)-in+1}2,

where x(i) is a possible outcome of the i th order statistic based on a n-points random sample obtained from XGBGL(a,b,c,λ) .

The minimum point (a^,b^,c^,λ^) can also be given as a solution of the following system of non-linear equations:

Q(a,b,c,λ)a=Q(a,b,c,λ)b=Q(a,b,c,λ)c=Q(a,b,c,λ)λ=0,

where the i th components in the sums are

Qi(a,b,c,λ)a=
2{I([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)-in+1}
×{-B(a)(a,b)B(a,b)I([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)
+1B(a,b)0{1-e-λx(i)[1+λx(i)1+λ]}clog(w)ωa-1(1-ω)b-1dω},
Qi(a,b,c,λ)b=
2{I([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)-in+1}
×{-B(b)(a,b)B(a,b)I([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)
+1B(a,b)0{1-e-λx(i)[1+λx(i)1+λ]}clog(1-w)ωa-1(1-ω)b-1dω},
Qi(a,b,c,λ)c=
2{I([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)-in+1}
×{1B(a,b)[1-e-λx(i)(1+λx(i)1+λ)]c(a-1)
{1-[1-e-λx(i)(1+λx(i)1+λ)]c}b-1
[1-e-λx(i)(1+λx(i)1+λ)]clog[1-e-λx(i)(1+λx(i)1+λ)]}

and

Qi(a,b,c,λ)λ=
2{I([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)-in+1}
×{cB(a,b)[1-e-λx(i)(1+λx(i)1+λ)]ac-1
{1-[1-e-λx(i)(1+λx(i)1+λ)]c}b-1
{xe-λx[1+λx1+λ-1(1+λ)]-λe-λx(1+λ)2}}.

Here, B(a)(a,b)=B(a,b)/a=B(a,b)[ψ(a)-ψ(a+b)] , B(b)(a,b)=B(a,b)/a=B(a,b)[ψ(b)-ψ(a+b)] and (obtained by Mathematica)

0[1-e-λx(i)(1+λx(i)1+λ)]clog(w)wa-1(1-w)b-1dw=
-[1-e-λx(i)(1+λx(i)1+λ)]cΓ(a)2
×Hqp({a,a,1-b},{1+a,1+a},[1-e-λx(i)(1+λx(i)1+λ)]c)
+cI([1-e-λx(i)(1+λx(i)1+λ)]c;a,b)log[1-e-λx(i)(1+λx(i)1+λ)]

and Hqp({,,},{,,},) represents the hypergeometric function.

NUMERICAL RESULTS

SIMULATION STUDY

We perform a Monte Carlo simulation study (with 1,000 replications) to quantify some asymptotic properties of both MLEs and LSEs of GBGL parameters. We also measure both the effects of the MLEs and LSEs for the additional parameters, (a^,b^,c^) , over the corresponding estimators of the baseline parameter, λ^ , and reciprocally.

To that end, we consider λ{0.5,1,2,3,4} , a=b=c{2,5} and sample size n{50,100,150} . Additionally, as figures of merits, we consider the average estimates due to MLEs and LSEs and their mean square errors (MSEs). The simulation results are given in TableI andII.

As expected,the MSEs and biases for the two proposed procedures tend to decrease when the sample size increases. Additionally, increasing the additional parameters implies that the MLE and LSE of λ will have smaller MSEs and biases. Real scenarios having higher additional parameters will conduct to more biased MLEs. Moreover, for approximately 83% of cases, MLEs outperform LSEs in terms of MSEs.

TABLE I Simulation results for MLEs.

𝜽 n MLEs MSEs
a b c λ a b c λ
(2,2,2,0.5) 50 2.018 2.031 2.014 0.502 0.060 0.061 0.045 0.001
100 2.023 2.002 2.019 0.500 0.029 0.026 0.021 0.000
150 2.012 2.002 2.010 0.500 0.019 0.019 0.014 0.000
(5,5,5,0.5) 50 5.011 5.042 5.007 0.501 0.177 0.183 0.107 0.000
100 5.008 5.014 5.006 0.500 0.078 0.077 0.047 0.000
150 5.006 5.013 5.004 0.500 0.060 0.059 0.036 0.000
(2,2,2,1) 50 2.030 2.023 2.024 1.002 0.061 0.062 0.046 0.003
100 2.015 2.008 2.013 1.001 0.027 0.026 0.020 0.001
150 2.014 2.000 2.012 0.999 0.019 0.019 0.014 0.001
(5,5,5,1) 50 5.011 5.040 5.007 1.001 0.171 0.174 0.103 0.001
100 5.001 5.024 5.000 1.001 0.085 0.088 0.052 0.000
150 5.010 5.009 5.007 1.000 0.060 0.058 0.036 0.000
(2,2,2,2) 50 2.040 2.015 2.034 2.001 0.060 0.062 0.045 0.014
100 2.010 2.017 2.008 2.005 0.030 0.030 0.023 0.007
150 2.002 2.015 2.001 2.006 0.019 0.020 0.014 0.005
(5,5,5,2) 50 5.016 5.027 5.012 2.001 0.172 0.168 0.104 0.002
100 5.004 5.017 5.002 2.001 0.086 0.088 0.052 0.001
150 5.002 5.018 5.001 2.001 0.057 0.057 0.034 0.001
(2,2,2,3) 50 2.018 2.032 2.014 3.017 0.061 0.058 0.045 0.034
100 2.016 2.008 2.013 3.002 0.030 0.027 0.022 0.016
150 2.018 1.998 2.015 2.996 0.019 0.018 0.014 0.011
(5,5,5,3) 50 4.986 5.066 4.987 3.009 0.171 0.176 0.103 0.006
100 5.013 5.019 5.009 3.001 0.087 0.087 0.053 0.003
150 5.000 5.020 5.000 3.003 0.059 0.058 0.035 0.002
(2,2,2,4) 50 2.020 2.027 2.016 4.017 0.056 0.059 0.042 0.062
100 2.006 2.018 2.004 4.015 0.027 0.029 0.020 0.031
150 2.009 2.008 2.008 4.005 0.019 0.019 0.014 0.021
(5,5,5,4) 50 5.016 5.044 5.010 4.006 0.169 0.179 0.102 0.011
100 5.017 5.016 5.012 4.001 0.094 0.090 0.057 0.006
150 5.019 4.994 5.014 3.997 0.060 0.059 0.036 0.004

TABLE II Simulation results for LSEs.

θ n LSEs MSEs
a b c λ a b c λ
(2,2,2,0.5) 50 1.972 1.996 1.969 0.491 0.049 0.011 0.061 0.003
100 1.992 1.995 1.991 0.498 0.021 0.009 0.026 0.001
150 1.987 1.982 1.983 0.498 0.017 0.008 0.020 0.001
(5,5,5,0.5) 50 5.011 5.011 5.013 0.501 0.046 0.032 0.063 0.000
100 5.041 5.036 5.048 0.502 0.060 0.040 0.083 0.000
150 4.936 4.945 4.923 0.497 0.030 0.021 0.042 0.000
(2,2,2,1) 50 2.002 1.999 2.002 0.994 0.066 0.032 0.081 0.013
100 1.998 1.998 1.999 1.000 0.034 0.035 0.041 0.008
150 1.996 2.004 1.997 0.998 0.021 0.008 0.026 0.004
(5,5,5,1) 50 4.998 5.007 4.998 0.999 0.114 0.077 0.156 0.002
100 5.037 5.026 5.043 1.003 0.116 0.079 0.157 0.001
150 4.990 4.993 4.988 0.999 0.023 0.014 0.032 0.000
(2,2,2,2) 50 1.986 2.025 2.000 1.980 0.069 0.061 0.082 0.045
100 2.001 1.993 2.004 2.004 0.031 0.040 0.038 0.024
150 1.998 2.002 2.002 1.993 0.021 0.021 0.025 0.016
(5,5,5,2) 50 4.950 4.972 4.940 1.989 0.196 0.120 0.274 0.011
100 4.942 4.955 4.929 1.986 0.130 0.081 0.181 0.008
150 4.980 4.986 4.975 1.996 0.053 0.030 0.074 0.003
(2,2,2,3) 50 1.993 2.000 2.003 2.993 0.068 0.087 0.077 0.078
100 2.005 1.968 2.004 3.063 0.037 0.107 0.036 0.122
150 1.998 1.994 2.002 3.008 0.019 0.043 0.023 0.031
(5,5,5,3) 50 4.893 4.910 4.867 2.960 0.287 0.190 0.403 0.039
100 5.010 5.014 5.011 2.998 0.132 0.085 0.183 0.016
150 4.940 4.951 4.927 2.976 0.074 0.051 0.106 0.009
(2,2,2,4) 50 2.004 1.995 2.007 4.018 0.062 0.120 0.074 0.072
100 1.983 2.011 2.004 4.001 0.039 0.096 0.037 0.109
150 1.997 1.998 1.999 4.001 0.019 0.040 0.023 0.024
(5,5,5,4) 50 4.949 4.949 4.933 3.967 0.323 0.257 0.454 0.060
100 5.004 5.012 5.004 3.992 0.169 0.123 0.234 0.033
150 4.944 4.943 4.930 3.972 0.118 0.089 0.165 0.025

APPLICATIONS TO REAL DATA

In this section, we perform two applications to real data sets. Initially, we consider data obtained from accelerated life testing of 40 items with change in stress from 100 to 150 at an time instant (Murthy et al. 2004, p. 236, Dataset 12.2). In this first study, we aim to compare Lindley and GBGL models and, for such end, we use the likelihood ratio statistic to test the hypothesis H0:a=b=c=1H0:GBGLLindley . TableIII and Figure4 display associated main results. One can note that baseline and proposed models are statistically distinct for any nominal level higher than 4% . Fits with respect both empirical density and cumulative distribution function confirm that our model describe data better than the Lindley model.

TABLE III MLEs of fitted models to Stress data and likelihood ratio statistics.

Model MLEs (SEs)
a b c λ H0:GBGL×L(a=b=c=1)
GBG-L 0.096 (0.014) 0.065 (0.020) 47.284 (1.383) 10.859 (0.384) 8.311 ( α^=0.040 )
L 0.177 (0.007) - - -

Second, our aim is also to explain remission times (in months) of a random sample of 128 bladder cancer patients (Lee and Wang 2003). To that end, we consider the GBGL distribution, the Lindley baseline, and other three extended Lindley models, namely the LE, GL and TE distributions described in Section 1. Table IV lists the MLEs and their standard errors (SEs) for each fitted model. One can note that all estimates are statistically significant. The plots in Figure5 display the empirical pdf and cdf and the fitted versions for the three best models according to the subsequent discussion.

Both GBGL and LE models describe well the empirical density of the remission times, but only our proposed model fits well the empirical cdf.

### Histogram and fitted pdfs and cdfs for several models at second application.

TABLE IV MLEs of the fitted models to the current data.

Model MLEs (SEs)
L λ^=0.19(0.0123)
LE α^=1.55(0.1647) , λ^=0.11(0.0136)
GL α^=0.73(0.0917) , λ^=0.16(0.0165)
GBGL λ^=0.24(0.0335) , a^=0.03(0.0077) ,
b^=0.31(0.0807) , c^=34.95(6.8937)
TL λ^=-0.38(0.1494) , θ^=0.03(0.0028) ,
δ^=0.73(0.1347) , α^=0.73(0.1833)

In order to compare quantitatively the competitive models, we adopt two criteria: the Akaike Information Criterion (AIC) and Kolmogorov-Smirnov (KS) statistic. These statistics are widely used to determine how closely a specific cdf fits the associated empirical distribution for a given data set. The smaller these statistics are, the better the fit is.

TableV presents the values of these statistics for some models. The GBGL model provides the best fit to these data among the current models. Thus, our proposal can be a competitive distribution compared with other extended Lindley models: L, L exponential (Bhati and Malik 2015) and GL.

TABLE V Goodness-of-fit measures.

Model Dependent on the pdf Dependent on the cdf
AIC KS p-value
L 814.3574 0.1075 0.1163
LE 802.3400 0.0575 0.8105
GL 809.6681 0.1982 1.273×10-4
GBGL 801.9561 0.0282 0.9999
TL 813.6681 0.088731 0.2875

CONCLUSIONS

In this paper, we propose a new four-parameter distribution called the generalized beta-generated Lindley (GBGL) model. Some of its structural properties (such as the moments and generating function) have been derived from a linear representation for the GBGL density function. We propose a procedure for determining the maximum likelihood estimates (MLEs) of the model parameters. A simulation study is performed to validate the MLEs. We also have indicated an additional estimation process based on the least square method between percentiles. Finally, two applications to real data sets provide evidence that the proposed model can be better than the Lindley model and some of its extensions, namely the exponentiated Lindley and generalized Lindley distributions.

ACKNOWLEDGMENTS

The authors also acknowledge partial support from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.

REFERENCES

1 AAS K AND HAFF IH. 2006. The generalized hyperbolic skew Student’s t-distribution. J Financial Econom 4: 275-309. [ Links ]

2 ALEXANDER C, CORDEIRO GM, ORTEGA EMM AND SARABIA JM. 2012. Generalized beta-generated distributions. Comput Stat Data Anal 56: 1880-1897. [ Links ]

3 ASHOUR SK AND ELTEHIWY MA. 2015. Exponentiated Power Lindley distribution. J Adv Res 6: 895-905. [ Links ]

4 BHATI D AND MALIK MA. 2015. On Lindley-exponential distribution: properties and application. Metron 73: 335-357. [ Links ]

5 BOWLEY AL. 1920. Elements of statistics. Scribner’s sons, New York. [ Links ]

6 CORDEIRO GM AND LEMONTE AJ. 2011. The β-birnbaum–Saunders distribution: An improved distribution for fatigue life modeling. Comput Stat Data Anal 55: 1445-1461. [ Links ]

7 EUGENE N, LEE C AND FAMOYE F. 2002. Beta-normal distribution and its applications. Commun Stat Theory Methods 31:497-512. [ Links ]

8 GHITANY M, ATIEH B AND NADARAJAH S. 2008. Lindley distribution and its applications. Math Comput Simul 78: 493-506. [ Links ]

9 GRADSHTEYN IS AND RYZHIK IM. 2000. Table of Integrals, Series, and Products. Academic Press, San Diego. [ Links ]

10 HANSEN BE. 1994. Autoregressive conditional density estimation. Int Econom Rev 35: 705-730. [ Links ]

11 JODRÁ P. 2010. Computer generation of random variables with Lindley or Poisson-Lindley distribution via the Lambert W function. Math Comput Simul 81: 851-859. [ Links ]

12 JOHNSON NL, KOTZ S AND BALAKRISHNAN N. 1994. Continuous Univariate Distributions I. Wiley, New York. [ Links ]

13 JONES MC. 2004. Families of distributions arising from distributions of order statistics. Test 13: 1-43. [ Links ]

14 LEE ET AND WANG JW. 2003. Statistical methods for survival data analysis. J Wiley e Sons, New Jersey. [ Links ]

15 LINDLEY D. 1958. Fiducial distributions and Bayes’ theorem. J R Stat Soc Series B Stat Methodol 20: 102-107. [ Links ]

16 MANSOUR M AND MOHAMED S. 2015. A New Generalized of Transmuted Lindley Distribution. Appl Math Sci 55: 2729-2748. [ Links ]

17 MOORS JJA. 1984. A Quantile Alternative for Kurtosis. J R Stat Soc Ser D 37: 25-32. [ Links ]

18 MURTHY DNP, XIE M AND JIANG R. 2004. Weilbul Models. J Wiley e Sons, New Jersey. [ Links ]

19 NADARAJAH S, BAKOUCH HS AND TAHMASBI R. 2012. A generalized Lindley distribution. Sankhya Ser B 73: 331-359. [ Links ]

20 PARARAI M, OLUYEDE BO AND WARAHENA-LIYANAGE G. 2015. Kumaraswamy Lindley-Poisson distribution: theory and applications. Asian J Math Appl 2015: 1-30. [ Links ]

21 SANKARAN M. 2015. The discrete Poisson-Lindley distribution. Biometrics 26: 145-149. [ Links ]

22 ZHU D AND GALBRAITH JW. 2010. A generalized asymmetric Student-t distribution with application to financial econometrics. J Econom 157: 297-305. [ Links ]

Appendix A

Proof of Theorem 1

In this section, we prove that both the GBGL density and cdf, say f(x) and F(x) , respectively, can be represented as linear combinations of GL densities and cdfs.

From equation (7), we have

f(x)=cλ2(1+x)(1+λ)B(a,b)e-λx[1-e-λx(1+λx1+λ)]ac-1A(x;λ,b,c),

where

A(x;λ,b,c)={1-[1-e-λx(1+λx1+λ)]c}b-1.

Using the power series, we obtain

A(x;λ,b,c)=n=0(-1)n(b-1n)[1-e-λx(1+λx1+λ)]nc.

Then, we can write

f(x;λ,a,b,c)
=
cλ2(1+x)(1+λ)B(a,b)e-λx[1-e-λx(1+λx1+λ)]ac-1×
×
n=0(-1)n(b-1n)[1-e-λx(1+λx1+λ)]nc
=
n=0ζngn(x),

where

ζn=(-1)l(a+n)B(a,b)(b-1n)

and gn(x) denotes the GL density with parameters λ and (a+n)c .

Thus, the corresponding cdf is given by

F(x;λ,a,b,c)=n=0ζnGn(x).

Appendix B: Proof of Theorem 2

Let

J(x;c,λ)=0x(1+s)e-λs[1-1+λ+λs1+λe-λs]c-1ds.

Using the power series expansion, we obtain

J(x;c,λ)
=
i=0(-1)i(c-1i)(1+λ)i0x(1+s)(1+λ+λs)iexp(-λis-λs)ds
=
i=0(-1)i(c-1i)(1+λ)ij=0i(ij)λj0x(1+s)j+1exp[-λs(1+i)]ds
=
i=0j=0ik=0j+1(-1)iλj(c-1i)(ij)(1+λ)i0xs(k+1)-1exp[-(1+i)λs]ds.

Further,

F(x)
=
cλ21+λJ(x;c,λ)=i=0j=0ik=0j+1(-1)iλj-k+1k!(1+λ)i+2(1+i)k+1
×
(c-1i)(ij)(j+1k)Hi,k(x),

where Hi,k(x) denotes the gamma cdf with shape parameter (k+1) and scale parameter (i+1)λ .

We can change j=0ik=0j+1 by k=0i+1j=δki , where δ0=0 for k=1,2 and δk=k-1 for k2 , which is very easy to prove by a cartesian plot of k versus j . Then, we have

F(x)=i=0k=0i+1j=δki(-1)iλj-k+1k!(1+λ)i+2(1+i)k+1(c-1i)Hi,k(x)

and rearranging terms, we obtain

F(x)=i=0k=0i+1(-1)ivi,k(1+λ)i+2(1+i)k+1(c-1i)Hi,k(x),

where

vi,k=j=δkiλj-k+1k!(ij)(j+1k).

Setting

wi,k=(-1)ivi,k(1+λ)i+2(1+i)k+1(c-1i),

the new cdf follows as a double linear combination of gamma cdfs

F(x)=i=0k=0i+1wi,kHi,k(x).

By differentiating the last equation, we obtain

f(x)=i=0k=0i+1wi,khi,k(x),

where hi,k(x) denotes the gamma density with parameters (k+1) and (i+1)λ .

Appendix C: Proof of Theorem 3

We can write from equations (10) and (12)

f(x)=l,i=0k=0i+1ζlwi,k(l)hi,k(x)i=0k=0i+1τi,khi,k(x),

where

τi,k=l=0ζlwi,k(l),wi,k(l)=(-1)ivi,k(1+λ)i+2(1+i)k+1((a+l)c-1i)

and vi,k is defined in Theorem 1.

Appendix D: Quantile function

We derive a power series for QGL(u) following the steps. First, we use a power series for Q-1(a,1-u) . Second, we obtain a power series for the argument 1-exp[-Q-1(a,1-u)] . Third, we derive a power series for the L qf using the Lagrange theorem in order to obtain a power series for QGL(u) .

We introduce the following quantities defined by Cordeiro and Lemonte (2011). Let Q-1(a,z) be the inverse function of

Q(a,z)=1-γ(a,z)Γ(a)=Γ(a,z)Γ(a)=u.

A power series for Q-1(a,1-u) is given in the Wolfram website 1 1http://functions.wolfram.com/GammaBetaErf/InverseGammaRegularized/06/01/03/ as

Q-1(a,1-u)
=
w+w2a+1+(3a+5)w32(a+1)2(a+2)+[a(8a+33)+31]w43(a+1)3(a+2)(a+3)
+
{a(a[a(125a+1179)+3971]+5661)+2888}w524(a+1)4(a+2)2(a+3)(a+4)+𝑂(w6),

where w=[uΓ(a+1)]1/a . We can write the last equation as

z=Q-1(a,1-u)=r=0δrur/a,

where δi=b¯iΓ(a+1)i/a . Here, b¯0=0 , b¯1=1 and any coefficient b¯i+1 (for i1 ) can be obtained from the cubic recurrence equation

b¯i+1=
1i(a+i){r=1is=1i-s+1b¯rb¯sb¯i-r-s+2s(i-r-s+2)
×r=2ib¯rb¯i-r+2r[r-a-(1-a)(i+2-r)]}.

We have b¯2=1/(a+1) , b¯3=(3a+5)/[2(a+1)2(a+2)] , etc. Next, we present some algebraic details for the GL qf, say QGL(u) . The cdf of X is given by (8). By inverting F(x)=u , we obtain (9). We can determine the L qf using the Lagrange theorem. We consider that the power series expansion holds

x=G(u)=x0+k=1fk(u-u0)k,f1=G(u)0,

where G(u) is analytic at a point u0 that gives a simple x0 -point.

Then, the inverse function G-1(x) exists and is single-valued in the neighborhood of the point x=x0 . The inverse power series x=QL(u) is given by

x=QL(u)=u0+k=1πk(u-u0),

where

πk=1k!dk-1dxk-1{[ψ(x)]k}|x=x0 and ψ(x)
=
x-x0G(x)-u0.

Then, we can write the GL qf as follows

G(x)=1-(1+λx1+λ)e-λx=u0+xi=0fixi,

where u0=1 and fi=(-λ)i+1[1(i+1)!-1(1-λ)i!]fori0 .

Further, we have

ψ(x)
=
x-x0G(x)-u0=1i=0fixi (22)
=
1λ(-1+11+λ)i=0ϱixi=(1+λλ2)i=0ϱ¯ixi,

where ϱ¯0=-1 , ϱ¯i=-ϱi , ϱ0=1 and ϱi=1f0j=1fjϱi-j .

Thus, we obtain from equation (22)

dk-1dxk-1{[ψ(x)]k}|x=x0=νk,k-1(1+λ)k(k-1)!λ2k, (23)

where νk,i=(kϱ¯0)-1m=1k[m(i+1)-k]ϱ¯mνk,i-m and νk,0=ϱ¯0i=1 .

From equations (22) and (23), the quantity πk is given by

πk=1k!dk-1dxk-1{[ψ(x)]k}|x=x0=νk,k-1(1+λ)kkλ2k.

Hence, the Lindley qf reduces to

QL(u)=k=1νk,k-1(1+λ)kkλ2k(u-1)n.

An alternative expression for QL(x) is given by

QL(u)=n=0tnun,

where tn=k=n+1(-1)k-n(kn)πk .

Thus, we can obtain

QGBGL(u)
=k=0tk[Qβ(a,b)(u)]i/c,

where tn=k=n+1(-1)k-n(kn)πk , πk=νk,k-1(1+λ)kkλ2k , νk,i=(kϱ¯0)-1m=1k[m(i+1)-k]ϱ¯mνk,i-m and νk,0=ϱ¯0i=1 , ϱ¯0=-1 , ϱ¯i=-ϱi , ϱ0=1 and ϱi=1f0j=1fjϱi-j and fi=(-λ)i+1[1(i+1)!-1(1-λ)i!] .

The beta qf reduces to

Qβ(a,b)(u)=j=0θ¯juj/a,

where the transformed variable is v=[aβ(a,b)u]1/a , θ¯j=θj[aβ(a,b)u]1/a ,

θj={0,if j=01,if j=1γj if j2

and

γj=
1[j2+(a-2)j+1-a]{(1-δj,2)r=2i-1γrγj+1-r[r(1-a)(j-r)
-r(r-1)]+r=1j-1s=1j-rγrγsγj+i-r-s[r(r-a)+s(a+b-2)(j+1-r-s)]},

where δj,2=1 if i=2 and δj,2=0 if i2 . The first quantities are γ2=b-1a-1 , γ3=(b-1)(3ab+5b-4)2(a+1)2(a+2) , γ4=(b-1)[a+(6b-1)a+(b+2)(8b-5)a+(33b2-30b+4)a+b(31b-47)+18]/[3(a+1)3(a+2)(a+3)],

For z(0,1) and any real non-integer α , we have

zα=r=0sr(α)zr,

where

sr(α)=l=r(-1)r+l(αl)(lr).

Finally, using (13), we obtain

QGBGL(u)=j=0ejuj/a,

where ej=i,r=0tisr(i/c)ηr,j , ηr,j=(jθ¯0)-1m=1j[m(r+1)-j]θ¯mηr,j-m and θ¯i is given before.

Appendix E: Information Matrix

The elements of the unit observed information matrix J(𝜽) for the parameters (a,b,c,λ) are given by:

Jaa=
-[ψ(a)-ψ(a+b)],Jab=ψ(a+b),
Jac=
log[1-e-λx(1+λx1+λ)],
Jaλ
={xe-λx(1+λx1+λ)-e-λx[x1+λ-λx(1+λ)2]}1-e-λx(1+λx1+λ),
Jbb
=-[ψ(b)-ψ(a+b)],
Jbc
=[1-e-λx(1+λx1+λ)]clog[1-e-λx(1+λx1+λ)]1-[1-e-λx(1+λx1+λ)]c,
Jbλ
=
-c[xe-λx(1+λx1+λ)-e-λx(x1+λ-λx(1+λ)2)]1-[1-e-λx(1+λx1+λ)]c
×
[1-e-λx(1+λx1+λ)]c-1,
Jcc
=
-(b-1)[1-e-λx(λxλ+1+1)]c[1-e-λx(λxλ+1+1)]c{1-[1-e-λx(λxλ+1+1)]c}2
×
log2[1-e-λx(λxλ+1+1)]
-
(b-1)[1-e-λx(λxλ+1+1)]clog[1-e-λx(λxλ+1+1)]1-[1-e-λx(λxλ+1+1)]c
×
log[1-e-λx(λxλ+1+1)]-1c2,
Jcλ
=
-ae-λx(λxλ+1+1)[-λx(xλ+1-λx(λ+1)2)-x(λxλ+1+1)]1-e-λx(λxλ+1+1)
+
(b-1)e-λx(λxλ+1+1)[-λx(xλ+1-λx(λ+1)2)-x(λxλ+1+1)][1-e-λx(λxλ+1+1)]{1-[1-e-λx(λxλ+1+1)]c}
×
[1-e-λx(λxλ+1+1)]c
+
c(b-1)e-λx(λxλ+1+1)[-λx(xλ+1-λx(λ+1)2)-x(λxλ+1+1)]{1-[1-e-λx(λxλ+1+1)]c}2
×
[1-e-λx(λxλ+1+1)]c[1-e-λx(λxλ+1+1)]c-1log[1-e-λx(λxλ+1+1)]
-
(b-1)c[xe-λx(λxλ+1+1)-e-λx(xλ+1-λx(λ+1)2)]1-[1-e-λx(λxλ+1+1)]c
×
[1-e-λx(λxλ+1+1)]c-1log[1-e-λx(λxλ+1+1)].

Received: July 25, 2016; Accepted: April 3, 2017

Correspondence to: Abraão David Costa do Nascimento E-mail: abraao.susej@gmail.com

All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License.