Abstract
New generators are required to define wider distributions for modeling real data in survival analysis. To that end, we introduce the four-parameter generalized beta-generated Lindley distribution. It has explicit expressions for the ordinary and incomplete moments, mean deviations, and generating and quantile functions. We propose a maximum likelihood procedure to estimate the model parameters, which is assessed through a Monte Carlo simulation study. We also derive an additional estimation scheme based on least squares between percentiles. The usefulness of the proposed distribution for describing remission times of cancer patients is illustrated by means of an application to real data.
GBG generator; remission times; Extended Lindley model; quantile function; Lambert function
INTRODUCTION
The statistical literature contains hundreds of continuous univariate distributions; see Johnson et al. (1994). Recently, procedures for building meaningful distributions (called generators) have been proposed. Among the important generators, the two-piece approach pioneered by Hansen (1994) and the beta family defined by Eugene et al. (2002) and Jones (2004) occupy prominent positions.
Many papers have applied these techniques to provide more skewness in generalizations of well-known symmetric distributions. As an example, Aas and Haff (2006) presented an extension for the Student’s t-distribution.
Using the two-piece method with a view to finance applications, Zhu and Galbraith (2010) argued that, in addition to Student’s t parameters, three shape parameters are required: one parameter to control asymmetry in the center of a distribution and two parameters to control the left and right tail behavior.
This paper addresses similar issues to Zhu and Galbraith using a different approach. We consider the generalized beta generated (GBG) family of distributions pioneered by Alexander et al. (2012), which has three shape parameters.
The Lindley (L) distribution was first used by Lindley (1958) to measure the difference between fiducial and posterior distributions in Bayesian analysis. Its probability density function (pdf) (for $x>0$) with parameter $\theta>0$, say L($\theta$), is given by
$$g(x;\theta)=\frac{\theta^{2}}{1+\theta}\,(1+x)\,e^{-\theta x}, \tag{1}$$
where $\theta$ is a scale parameter. Its cumulative distribution function (cdf) is given by
$$G(x;\theta)=1-\frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}. \tag{2}$$
Ghitany et al. (2008) discussed and studied various properties of the pdf (1). The L distribution has an important role in stress-strength reliability modeling and describes some types of data sets well, but it has limited flexibility in modeling asymmetric and/or heavy-tailed data. Further, it can accommodate hazard rate functions (hrfs) that are increasing, decreasing or constant but not unimodal, bathtub and other shapes, which are desirable in lifetime data analysis. To overcome this, several works have proposed new distributions by adding parameters to the Lindley distribution. For example, Sankaran (2015) used this law as the mixing distribution of a Poisson parameter to generate a discrete model called the Poisson-Lindley distribution. Pararai et al. (2015) defined the Kumaraswamy Lindley-Poisson distribution and explored some of its properties. Another extension, named the generalized Lindley distribution, was studied by Ashour and Eltehiwy (2015).
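As a minimal sketch of the baseline model, the Lindley pdf and cdf can be coded directly from their standard closed forms. The function names below are illustrative and not taken from the paper; the hazard helper is simply the ratio pdf/(1 - cdf).

```python
import math

def lindley_pdf(x, theta):
    """Lindley density theta^2/(1+theta) * (1+x) * exp(-theta*x) for x >= 0."""
    if x < 0:
        return 0.0
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lindley_cdf(x, theta):
    """Lindley cdf 1 - (1+theta+theta*x)/(1+theta) * exp(-theta*x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def lindley_hazard(x, theta):
    """Hazard rate computed as pdf / survival function."""
    return lindley_pdf(x, theta) / (1.0 - lindley_cdf(x, theta))
```

A quick sanity check is that a central finite difference of `lindley_cdf` reproduces `lindley_pdf` at the same point.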
A profusion of new classes of distributions has recently proven useful to applied statisticians working in various areas of scientific investigation. Generalizing existing distributions by adding shape parameters leads to more flexible models. Let $g(x)$ and $G(x)$ be the pdf and cdf of a baseline distribution indexed by a parameter vector. Alexander et al. (2012) defined the pdf and cdf of the GBG-G distribution (for $x\in\mathbb{R}$) using three additional positive shape parameters $a$, $b$ and $c$ by
$$f(x)=\frac{c}{B(a,b)}\,g(x)\,G(x)^{ac-1}\,\left[1-G(x)^{c}\right]^{b-1} \tag{3}$$
and
$$F(x)=I_{G(x)^{c}}(a,b), \tag{4}$$
respectively, where $I_{y}(a,b)=B(a,b)^{-1}\int_{0}^{y}t^{a-1}(1-t)^{b-1}\,dt$ denotes the incomplete beta function ratio and $B(a,b)=\Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the complete beta function.
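The generator can be sketched for an arbitrary baseline by passing the baseline pdf and cdf as callables. This is an illustration under the standard GBG form of Alexander et al. (2012), f(x) = c g(x) G(x)^(ac-1) [1 - G(x)^c]^(b-1) / B(a,b); the names `gbg_pdf` and `beta_fn` are hypothetical.

```python
import math

def beta_fn(a, b):
    """Complete beta function B(a, b), computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def gbg_pdf(x, g, G, a, b, c):
    """GBG density built on a baseline with pdf g and cdf G (both callables)."""
    Gx = G(x)
    if Gx <= 0.0 or Gx >= 1.0:
        return 0.0  # outside the interior of the baseline support
    return (c / beta_fn(a, b)) * g(x) * Gx ** (a * c - 1.0) * (1.0 - Gx ** c) ** (b - 1.0)
```

With a uniform baseline on (0, 1) and a = 2, b = c = 1 the density is 2x, and with a = b = c = 1 the baseline density is recovered.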
In this paper, we propose a new lifetime model called the GBG-Lindley (GBGL) distribution. We also study some of its structural properties and present the maximum likelihood estimation of the parameters. A Monte Carlo study is performed in order to assess the proposed estimation procedure.
Further, we present evidence that the new model can (i) compensate for the Lindley model's lack of flexibility and (ii) produce better fits than the following distributions:
- the Lindley-exponential (LE) model (Bhati and Malik 2015), whose pdf and cdf are, respectively, given by
- the generalized L (GL) model (Nadarajah et al. 2012), whose pdf and cdf are, respectively, given by
and
- the transmuted Lindley (TL) model (Mansour and Mohamed 2015), whose pdf and cdf are, respectively, given by
and
This comparison is performed on two real data sets: lifetimes of items subjected to a change in stress and remission times (in months) of cancer patients.
This paper is organized as follows. In Section 2, we introduce the GBGL distribution and provide plots of its density function and hrf. We derive linear representations for the pdf and cdf (Section 3), explicit expressions for the quantile function (qf) (Section 4), ordinary and incomplete moments, mean deviations, Bonferroni and Lorenz curves (Section 5) and generating function (Section 6). A procedure for determining the maximum likelihood estimates (MLEs) of the model parameters is addressed in Section 7. Section 8 presents empirical results for the proposed model. Concluding remarks are offered in Section 9.
THE GBGL DISTRIBUTION
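Applying (1) and (2) in equations (3) and (4), the pdf and cdf of the GBGL distribution (for $x>0$) are, respectively, given by
$$f(x)=\frac{c\,\theta^{2}}{(1+\theta)\,B(a,b)}\,(1+x)\,e^{-\theta x}\left[1-\frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}\right]^{ac-1}\left\{1-\left[1-\frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}\right]^{c}\right\}^{b-1} \tag{7}$$
and
$$F(x)=I_{\left[1-\frac{1+\theta+\theta x}{1+\theta}\,e^{-\theta x}\right]^{c}}(a,b). \tag{8}$$
For simplicity, we omit the dependence of $f(x)$ and $F(x)$ on the parameters $a$, $b$, $c$ and $\theta$. Hereafter, a random variable $X$ having density (7) is denoted by $X\sim\text{GBGL}(a,b,c,\theta)$.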
Clearly, the L distribution arises as the basic exemplar by taking $a=b=c=1$ in (7). As mentioned in the introduction, we motivate the paper by comparing the performance of the new distribution with those of the L, LE and GL models fitted to a real data set.
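The relation between the GBGL pdf and cdf can be checked numerically. The sketch below assumes the GBG form with a Lindley baseline and evaluates the incomplete beta ratio by a plain composite Simpson rule, which is a swapped-in numerical shortcut and is only adequate here for a >= 1 and b >= 1 (otherwise the beta integrand is singular at the endpoints). All function names are illustrative.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

def beta_fn(a, b):
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def lindley_cdf(x, theta):
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def gbgl_pdf(x, theta, a, b, c):
    """GBGL density: GBG generator applied to the Lindley baseline."""
    g = theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)
    G = lindley_cdf(x, theta)
    return c / beta_fn(a, b) * g * G ** (a * c - 1.0) * (1.0 - G ** c) ** (b - 1.0)

def gbgl_cdf(x, theta, a, b, c):
    """GBGL cdf as the incomplete beta ratio evaluated at G(x)^c."""
    y = lindley_cdf(x, theta) ** c
    integrand = lambda t: t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)
    return simpson(integrand, 0.0, y) / beta_fn(a, b)
```

Integrating `gbgl_pdf` from 0 to x should agree with `gbgl_cdf` at x, and setting a = b = c = 1 should return the Lindley cdf.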
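The qf is useful for determining various mathematical properties of a distribution. For a positive random variable $X$, the qf of $X$ is defined from the generalized inverse of its cdf $F$ for a fixed probability $u\in(0,1)$, namely
$$Q(u)=F^{-1}(u)=\inf\{x>0:\,F(x)\ge u\}.$$
Then, the qf of the GBGL model can be determined by inverting (8) as
$$Q(u)=Q_{L}\!\left([Q_{B}(u)]^{1/c}\right), \tag{9}$$
where $Q_{B}(u)$ is the beta qf with parameters $a$ and $b$, and $Q_{L}(\cdot)$ is the qf of the L distribution with parameter $\theta$.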
Consider the Lambert W-function as the principal solution for $w$ in $z=w\,e^{w}$. We have the power series expansion for $W(z)$ using the software Mathematica
$$W(z)=\sum_{n=1}^{\infty}\frac{(-1)^{n-1}\,n^{n-2}}{(n-1)!}\,z^{n}.$$
Then, we obtain
The qf of can be expressed in terms of the Lambert function as
where the last identity holds based on a result given by Jodrá (2010).
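To illustrate the role of the Lambert function, the sketch below inverts the Lindley cdf through the closed form we recall from Jodrá (2010), Q_L(u) = -1 - 1/theta - W_{-1}(-(1+theta)(1-u) e^{-(1+theta)})/theta. Solving the cdf equation for x > 0 leads to the lower real branch W_{-1}, which is what the code uses; this branch choice, the bisection solver and all names are assumptions of the sketch rather than the paper's implementation. A GBGL variate is then obtained by composing with a beta variate.

```python
import math
import random

def lambert_w_minus1(z):
    """Lower real branch W_{-1}(z) for -1/e <= z < 0, by bisection on w*exp(w) = z."""
    if not (-1.0 / math.e <= z < 0.0):
        raise ValueError("z must lie in [-1/e, 0)")
    hi, lo = -1.0, -2.0
    while lo * math.exp(lo) < z:   # w*exp(w) rises toward 0 as w -> -infinity
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < z:
            hi = mid               # root lies further to the left
        else:
            lo = mid
    return 0.5 * (lo + hi)

def lindley_cdf(x, theta):
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def lindley_qf(u, theta):
    """Lindley quantile function for 0 <= u < 1 via W_{-1}."""
    z = -(1.0 + theta) * (1.0 - u) * math.exp(-(1.0 + theta))
    return -1.0 - 1.0 / theta - lambert_w_minus1(z) / theta

def gbgl_sample(theta, a, b, c, rng=random):
    """One GBGL draw: V ~ Beta(a, b), then X = Q_L(V^(1/c))."""
    u = min(rng.betavariate(a, b) ** (1.0 / c), 1.0 - 1e-12)  # keep u strictly below 1
    return lindley_qf(u, theta)
```

For example, `gbgl_sample(1.0, 2.0, 1.5, 1.2, random.Random(0))` returns one positive pseudo-random observation.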
In Figure 1(c), we present one case of random number generation from the GBGL model, obtained by evaluating its qf at uniform distribution outcomes. In Figure 1, we display possible shapes of the pdf and hrf of the GBGL model for some parameter values. The hrf can take the four most common forms for applications to real data: increasing, decreasing, bathtub and unimodal shapes, which is an important characteristic of the new lifetime model.
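The skewness ($B$) and kurtosis ($K$) coefficients are two important tools to understand a distribution. Simple quantile-based procedures to quantify $B$ and $K$ were proposed by Bowley (1920) and Moors (1984), respectively. In particular, for our proposal,
$$B=\frac{Q(3/4)+Q(1/4)-2\,Q(1/2)}{Q(3/4)-Q(1/4)}$$
and
$$K=\frac{Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)}{Q(6/8)-Q(2/8)},$$
where $Q(\cdot)$ is the GBGL qf.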
Figures 2(a)-2(c) and 2(d)-2(f) display the GBGL skewness and kurtosis measures for some parameter values, respectively. The former quantity indicates how symmetrical the model is, while the latter measures how the shape of the model under study relates to that of the Gaussian law. These plots indicate that both symmetrical and asymmetrical laws can be obtained from our model. It is easier to specify curves with a long right tail. Density curves distinct from the Gaussian one are obtained.
Figure 2: GBGL skewness (panels a-c) and GBGL kurtosis (panels d-f).
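The two quantile-based measures only need a quantile function, so they can be sketched for any model. The helper names are illustrative; the exponential qf is included purely as a test case and is not part of the paper.

```python
import math

def bowley_skewness(Q):
    """Bowley skewness from a quantile function Q defined on (0, 1)."""
    return (Q(0.75) + Q(0.25) - 2.0 * Q(0.5)) / (Q(0.75) - Q(0.25))

def moors_kurtosis(Q):
    """Moors kurtosis based on octiles of the quantile function Q."""
    return (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / (Q(6 / 8) - Q(2 / 8))

def exponential_qf(p):
    """Unit exponential quantile function, used only as an example."""
    return -math.log(1.0 - p)
```

For a symmetric law such as the uniform on (0, 1) the Bowley measure is zero, whereas the unit exponential gives a positive value, reflecting its long right tail.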
LINEAR REPRESENTATIONS
In this section, we present linear representations for (7) and (8) in order to obtain explicit expressions for some moment-type quantities of the GBGL model. We prove that the expansions, in the form of Theorem 1 and Corollary 1, depend only on the GL distribution (Nadarajah et al. 2012).
Theorem 1. The pdf of $X$ can be expressed by the linear combination
where denotes the GL density with scale and shape parameters and , respectively, and
The proof of this theorem is given in Appendix A.
Corollary 1. The cdf of is given by
where denotes the GL cdf with parameters and .
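The linear representation can be checked numerically. The sketch below assumes the standard binomial expansion of [1 - G(x)^c]^(b-1), which gives weights w_i = (-1)^i C(b-1, i) / [(a+i) B(a,b)] attached to GL densities with power parameter c(a+i); the exact notation of the weights in Theorem 1 may differ, and all names are illustrative.

```python
import math

def beta_fn(a, b):
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def lindley_pdf(x, theta):
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lindley_cdf(x, theta):
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def gl_pdf(x, theta, alpha):
    """GL (exponentiated Lindley) density: alpha * g(x) * G(x)^(alpha - 1)."""
    return alpha * lindley_pdf(x, theta) * lindley_cdf(x, theta) ** (alpha - 1.0)

def gbgl_pdf(x, theta, a, b, c):
    G = lindley_cdf(x, theta)
    return (c / beta_fn(a, b) * lindley_pdf(x, theta)
            * G ** (a * c - 1.0) * (1.0 - G ** c) ** (b - 1.0))

def gbgl_pdf_series(x, theta, a, b, c, terms=200):
    """Truncated linear combination of GL densities (assumed weights, see lead-in)."""
    B = beta_fn(a, b)
    total, binom = 0.0, 1.0            # binom holds the generalized C(b-1, i)
    for i in range(terms):
        w = (-1) ** i * binom / ((a + i) * B)
        total += w * gl_pdf(x, theta, c * (a + i))
        binom *= (b - 1.0 - i) / (i + 1.0)
    return total
```

Because G(x)^c < 1 for finite x, the terms decay geometrically and a moderate truncation point is usually enough away from the far right tail.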
The following results indicate that moment-type quantities of the GL model can be obtained from the corresponding quantities of the gamma distribution.
Theorem 2. The cdf of can be expressed as
where denotes the gamma cdf with shape parameter and scale parameter , respectively,
and
The proof of this theorem is given in Appendix B.
Corollary 2. The pdf of is given by
where denotes the gamma density with shape parameter and scale parameter .
Finally, the main result of this section provides a simple way for obtaining the properties of the new model by means of the classical gamma model.
Theorem 3. As consequences of Theorem 1 and Corollary 2, we can write the density of as
where
and and are defined in Theorem 2 and Corollary 2, respectively.
The proof of this theorem is given in Appendix C.
QUANTILE FUNCTION
For some models, it is possible to invert the cdf. However, for some other distributions, the inverse of the cdf cannot be obtained in closed form. We shall resort to power series methods for the GBGL model, which are at the heart of many solutions in applied mathematics and statistics. First, based on equation (2), we have the following theorem for the qf of the L model.
Theorem 4. The L qf can be expressed as a power series
where . The quantity and the proof of this theorem are given in Appendix D.
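In the following, we use an equation of Gradshteyn and Ryzhik (2000) for a power series raised to a positive integer $n$
$$\left(\sum_{i=0}^{\infty}a_{i}\,x^{i}\right)^{n}=\sum_{i=0}^{\infty}c_{n,i}\,x^{i}, \tag{13}$$
where the coefficients $c_{n,i}$ (for $i=1,2,\ldots$) are determined from the recurrence equation (for $i\ge 1$)
$$c_{n,i}=(i\,a_{0})^{-1}\sum_{m=1}^{i}\left[m(n+1)-i\right]a_{m}\,c_{n,i-m}, \tag{14}$$
and $c_{n,0}=a_{0}^{n}$. The coefficient $c_{n,i}$ follows from $c_{n,0},\ldots,c_{n,i-1}$ and then from the quantities $a_{0},\ldots,a_{i}$.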
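The recurrence for the coefficients of a power series raised to a positive integer is easy to sketch in code. The version below assumes the usual statement c_{n,0} = a_0^n and c_{n,i} = (i a_0)^{-1} sum_{m=1}^{i} [m(n+1) - i] a_m c_{n,i-m}, with a_0 nonzero; the function name is illustrative.

```python
def power_series_power(a, n, terms):
    """Return the first `terms` coefficients of (sum_i a[i] x^i)^n.

    `a` lists the known coefficients a_0, a_1, ...; missing ones are taken as 0.
    Requires a[0] != 0.
    """
    c = [a[0] ** n]
    for i in range(1, terms):
        s = 0.0
        for m in range(1, i + 1):
            if m < len(a):
                s += (m * (n + 1) - i) * a[m] * c[i - m]
        c.append(s / (i * a[0]))
    return c
```

For instance, `power_series_power([1, 1], 3, 5)` recovers the binomial coefficients of (1 + x)^3 followed by a zero.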
Corollary 3. The GBGL qf can be expanded as
where , and is given in Appendix D.
MOMENTS
Henceforth, let . Next, we obtain the ordinary and incomplete moments of from the corresponding moments of . Based on Theorem 3, we can write
We have the following corollary from the moments of .
Corollary 4. Suppose that exists. Then,
where .
Further, we can express in terms of as
Thus, an alternative expansion for can be obtained from Theorem 4 in the following corollary.
Corollary 5. Suppose that exists. Then,
where the quantities are determined from (13)-(14) as
for , , and the quantity is defined in Appendix D.
Next, we obtain the incomplete moments of .
Corollary 6. Suppose that the th incomplete moment of , say , exists. Then,
where the quantity is defined in (11) and .
Equations (16), (17) and (18) are the main results of this section.
The amount of scatter in a population is measured to some extent by the totality of deviations from the mean and median, given by $\delta_{1}=E|X-\mu'_{1}|$ and $\delta_{2}=E|X-M|$, respectively, where $\mu'_{1}=E(X)$ and $M=Q(1/2)$ is the median. They can be expressed in terms of the first incomplete moment by $\delta_{1}=2\mu'_{1}F(\mu'_{1})-2m_{1}(\mu'_{1})$ and $\delta_{2}=\mu'_{1}-2m_{1}(M)$, respectively, where $F(\mu'_{1})$ follows from (8) and $m_{1}(z)=\int_{0}^{z}x\,f(x)\,dx$ is the first incomplete moment given by (18) for order one.
Another important application of the first incomplete moment refers to the Bonferroni and Lorenz curves defined (for a given probability $\pi$) by $B(\pi)=m_{1}(q)/(\pi\mu'_{1})$ and $L(\pi)=m_{1}(q)/\mu'_{1}$, respectively, where $q=Q(\pi)$ can be evaluated numerically by (9) with $u=\pi$. These curves are very useful in economics, demography, insurance, engineering and medicine.
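The mean deviations and the Bonferroni and Lorenz curves all reduce to evaluations of the first incomplete moment, which can be sketched with numerical integration for any positive model. The code below uses a plain Simpson rule in place of the series expansion and takes the pdf, cdf and qf as callables; the names are hypothetical, and the unit exponential is used only as a test case with known answers.

```python
import math

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

def first_incomplete_moment(pdf, z):
    """m_1(z) = integral of x * pdf(x) over [0, z]."""
    return simpson(lambda x: x * pdf(x), 0.0, z)

def mean_deviations(pdf, cdf, mean, median):
    """Mean deviations about the mean and about the median."""
    d1 = 2.0 * mean * cdf(mean) - 2.0 * first_incomplete_moment(pdf, mean)
    d2 = mean - 2.0 * first_incomplete_moment(pdf, median)
    return d1, d2

def bonferroni_lorenz(pdf, qf, mean, p):
    """Bonferroni and Lorenz ordinates at probability p."""
    m1 = first_incomplete_moment(pdf, qf(p))
    return m1 / (p * mean), m1 / mean
```

For the unit exponential the two deviations are 2/e and log 2, which gives a quick check of the formulas before applying them to a fitted model.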
Figure 3 displays plots of the Bonferroni and Lorenz curves for selected parameter values.
The th moment of the residual life, say (for ) uniquely determines . It is given by , which is easily obtained from (18). A special case is the mean residual life (MRL) function at age given by , which represents the expected additional life length for a unit which is alive at age .
The th moment of the reversed residual life given by , (for and ) uniquely determines and follows from .
GENERATING FUNCTION
A first representation for the moment generating function (mgf) of can be based on the L qf. We can write
Expanding the exponential function, and after some algebra using (15), we have the following corollary.
Corollary 7. The mgf of can be expressed as
where
(for ), and the coefficients are defined in Theorem 4.
A second representation for comes from the gamma generating function. We can write
where is defined by (11) and is the mgf of given by
Equations (19) and (20) are the main results of this section.
ESTIMATION
Several approaches for parameter estimation have been proposed in the statistical literature, but the maximum likelihood method is the most commonly employed. The MLEs enjoy desirable properties for constructing confidence intervals. In this section, we investigate the estimation of the parameters of the GBGL distribution by maximum likelihood for complete data sets. Alternatively, we propose another estimation procedure that relies on the squared distance between theoretical and empirical GBGL quantiles. Both estimation methods are compared in the section of numerical results.
MAXIMUM LIKELIHOOD ESTIMATION
Consider a random variable GBGL and let be the parameter vector. Thus, the associated log-likelihood function for one observation is
The MLE of is determined by maximizing for a given data set . Equation (21) can be maximized either directly by using the R (optim function), SAS (PROC NLMIXED), Ox program (sub-routine MaxBFGS) or by solving the nonlinear likelihood equations obtained by differentiating this equation.
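As a sketch of the objective that such optimizers would maximize, the total log-likelihood can be written by taking logarithms of the GBG density with a Lindley baseline. The function name and argument order are assumptions; the code only evaluates the log-likelihood and leaves the maximization to whichever routine is preferred.

```python
import math

def gbgl_loglik(data, theta, a, b, c):
    """Total GBGL log-likelihood for positive observations in `data`.

    Returns -inf when any parameter is non-positive, so that the function
    can be handed to a general-purpose optimizer.
    """
    if min(theta, a, b, c) <= 0.0:
        return float("-inf")
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    const = math.log(c) + 2.0 * math.log(theta) - math.log(1.0 + theta) - log_beta
    total = 0.0
    for x in data:
        # Lindley cdf at x; assumed strictly inside (0, 1) for the data at hand
        G = 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)
        total += (const + math.log(1.0 + x) - theta * x
                  + (a * c - 1.0) * math.log(G)
                  + (b - 1.0) * math.log1p(-G ** c))
    return total
```

With a = b = c = 1 the function reduces to the Lindley log-likelihood, whose maximizer has the known closed form in terms of the sample mean; this gives a convenient check.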
Based on equation (21), the components of the unit score function
are given by
and
where is the digamma function.
Although these equations cannot be solved analytically, a numerical solution can be determined by using computing packages. Iterative techniques such as Newton-Raphson type algorithms can be adopted to obtain the MLEs.
For interval estimation and hypothesis tests on the model parameters, we require the observed information matrix. The unit observed information matrix,
where , is given in Appendix E. Likelihood ratio tests can be performed for the new distribution in the usual way.
LEAST SQUARE ESTIMATION
An alternative estimation to the maximum likelihood method is the least square estimation discussed by Ashour and Eltehiwy (2015). For the GBGL model, the least square estimates (LSEs), , , and of and are defined as those arguments that minimize the objective function:
where $x_{(i)}$ is a possible outcome of the $i$th order statistic based on an $n$-point random sample obtained from $X$.
The minimum point can also be given as a solution of the following system of non-linear equations:
where the th components in the sums are
and
Here, , and (obtained by Mathematica)
and represents the hypergeometric function.
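A least squares criterion between empirical and theoretical quantiles can be sketched as follows. The exact objective function is not recoverable from the text, so this version assumes the sum of squared differences between the ordered observations and the model quantiles at plotting positions i/(n+1); both the plotting positions and the function name are assumptions.

```python
import math

def quantile_lse_objective(sample, qf):
    """Sum of squared distances between order statistics and model quantiles.

    `qf` is the model quantile function for fixed parameter values; the
    plotting positions i/(n+1) are an assumption of this sketch.
    """
    xs = sorted(sample)
    n = len(xs)
    return sum((x - qf(i / (n + 1.0))) ** 2 for i, x in enumerate(xs, start=1))
```

In practice the objective would be minimized over the parameters that define `qf`, for instance by wrapping it in a function of the parameter vector and passing that to a numerical optimizer.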
NUMERICAL RESULTS
SIMULATION STUDY
We perform a Monte Carlo simulation study (with 1,000 replications) to quantify some asymptotic properties of both MLEs and LSEs of GBGL parameters. We also measure both the effects of the MLEs and LSEs for the additional parameters, , over the corresponding estimators of the baseline parameter, , and reciprocally.
To that end, we consider , and sample size . Additionally, as figures of merit, we consider the average estimates given by the MLEs and LSEs and their mean square errors (MSEs). The simulation results are given in Tables I and II.
As expected, the MSEs and biases for the two proposed procedures tend to decrease when the sample size increases. Additionally, increasing the additional parameters implies that the MLE and LSE of $\theta$ will have smaller MSEs and biases. Real scenarios having higher additional parameters will lead to more biased MLEs. Moreover, for approximately 83% of cases, MLEs outperform LSEs in terms of MSEs.
APPLICATIONS TO REAL DATA
In this section, we present two applications to real data sets. First, we consider data obtained from accelerated life testing of 40 items with a change in stress from 100 to 150 at a time instant (Murthy et al. 2004, p. 236, Dataset 12.2). In this first study, we aim to compare the Lindley and GBGL models and, to that end, we use the likelihood ratio statistic to test the hypothesis $H_{0}: a=b=c=1$. Table III and Figure 4 display the main results. One can note that the baseline and proposed models are statistically distinct for any nominal level higher than . Fits with respect to both the empirical density and the cumulative distribution function confirm that our model describes the data better than the Lindley model.
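The likelihood ratio comparison between the Lindley and GBGL fits can be sketched as below. It assumes the null hypothesis fixes the three additional shape parameters at one, so the statistic is referred to a chi-square law with 3 degrees of freedom, whose survival function has a closed form in terms of the complementary error function. The function names and the example log-likelihood values are hypothetical and are not the values reported in the paper.

```python
import math

def chi2_sf_3df(w):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    if w <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(w / 2.0)) + math.sqrt(2.0 * w / math.pi) * math.exp(-w / 2.0)

def lr_test(loglik_full, loglik_null):
    """Likelihood ratio statistic and p-value for the 3 extra shape parameters."""
    w = max(2.0 * (loglik_full - loglik_null), 0.0)
    return w, chi2_sf_3df(w)

# Hypothetical maximized log-likelihoods, for illustration only
statistic, p_value = lr_test(-100.0, -105.0)
```

A small p-value indicates that the restriction to the Lindley model is rejected in favor of the wider GBGL model.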
Second, we aim to explain remission times (in months) of a random sample of 128 bladder cancer patients (Lee and Wang 2003). To that end, we consider the GBGL distribution, the Lindley baseline, and three other extended Lindley models, namely the LE, GL and TL distributions described in Section 1. Table IV lists the MLEs and their standard errors (SEs) for each fitted model. One can note that all estimates are statistically significant. The plots in Figure 5 display the empirical pdf and cdf and the fitted versions for the three best models according to the subsequent discussion.
Both GBGL and LE models describe well the empirical density of the remission times, but only our proposed model fits well the empirical cdf.
In order to compare quantitatively the competitive models, we adopt two criteria: the Akaike Information Criterion (AIC) and Kolmogorov-Smirnov (KS) statistic. These statistics are widely used to determine how closely a specific cdf fits the associated empirical distribution for a given data set. The smaller these statistics are, the better the fit is.
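Both criteria are simple to compute once a model has been fitted. The sketch below uses the usual definitions, AIC = 2k - 2 log-likelihood with k estimated parameters, and the KS statistic as the largest vertical distance between the fitted cdf and the empirical distribution function; the function names are illustrative.

```python
def aic(loglik, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2.0 * k - 2.0 * loglik

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between a fitted cdf and the empirical cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # compare with the empirical cdf just after and just before the jump at x
        d = max(d, i / n - F, F - (i - 1) / n)
    return d
```

For the GBGL model one would pass k = 4 and the fitted GBGL cdf; the Lindley baseline has k = 1.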
Table V presents the values of these statistics for some models. The GBGL model provides the best fit to these data among the fitted models. Thus, our proposal can be a competitive alternative to the Lindley model and its extensions considered here: the L, LE (Bhati and Malik 2015) and GL models.
CONCLUSIONS
In this paper, we propose a new four-parameter distribution called the generalized beta-generated Lindley (GBGL) model. Some of its structural properties (such as the moments and generating function) have been derived from a linear representation for the GBGL density function. We propose a procedure for determining the maximum likelihood estimates (MLEs) of the model parameters. A simulation study is performed to validate the MLEs. We also present an additional estimation process based on the least squares method between percentiles. Finally, two applications to real data sets provide evidence that the proposed model can be better than the Lindley model and some of its extensions, namely the Lindley-exponential and generalized Lindley distributions.
ACKNOWLEDGMENTS
The authors also acknowledge partial support from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.
REFERENCES
- AAS K AND HAFF IH. 2006. The generalized hyperbolic skew Student’s t-distribution. J Financial Econom 4: 275-309.
- ALEXANDER C, CORDEIRO GM, ORTEGA EMM AND SARABIA JM. 2012. Generalized beta-generated distributions. Comput Stat Data Anal 56: 1880-1897.
- ASHOUR SK AND ELTEHIWY MA. 2015. Exponentiated Power Lindley distribution. J Adv Res 6: 895-905.
- BHATI D AND MALIK MA. 2015. On Lindley-exponential distribution: properties and application. Metron 73: 335-357.
- BOWLEY AL. 1920. Elements of statistics. Scribner’s Sons, New York.
- CORDEIRO GM AND LEMONTE AJ. 2011. The β-Birnbaum–Saunders distribution: An improved distribution for fatigue life modeling. Comput Stat Data Anal 55: 1445-1461.
- EUGENE N, LEE C AND FAMOYE F. 2002. Beta-normal distribution and its applications. Commun Stat Theory Methods 31: 497-512.
- GHITANY M, ATIEH B AND NADARAJAH S. 2008. Lindley distribution and its applications. Math Comput Simul 78: 493-506.
- GRADSHTEYN IS AND RYZHIK IM. 2000. Table of Integrals, Series, and Products. Academic Press, San Diego.
- HANSEN BE. 1994. Autoregressive conditional density estimation. Int Econom Rev 35: 705-730.
- JODRÁ P. 2010. Computer generation of random variables with Lindley or Poisson-Lindley distribution via the Lambert W function. Math Comput Simul 81: 851-859.
- JOHNSON NL, KOTZ S AND BALAKRISHNAN N. 1994. Continuous Univariate Distributions I. Wiley, New York.
- JONES MC. 2004. Families of distributions arising from distributions of order statistics. Test 13: 1-43.
- LEE ET AND WANG JW. 2003. Statistical methods for survival data analysis. John Wiley & Sons, New Jersey.
- LINDLEY D. 1958. Fiducial distributions and Bayes’ theorem. J R Stat Soc Series B Stat Methodol 20: 102-107.
- MANSOUR M AND MOHAMED S. 2015. A New Generalized of Transmuted Lindley Distribution. Appl Math Sci 55: 2729-2748.
- MOORS JJA. 1984. A Quantile Alternative for Kurtosis. J R Stat Soc Ser D 37: 25-32.
- MURTHY DNP, XIE M AND JIANG R. 2004. Weibull Models. John Wiley & Sons, New Jersey.
- NADARAJAH S, BAKOUCH HS AND TAHMASBI R. 2012. A generalized Lindley distribution. Sankhya Ser B 73: 331-359.
- PARARAI M, OLUYEDE BO AND WARAHENA-LIYANAGE G. 2015. Kumaraswamy Lindley-Poisson distribution: theory and applications. Asian J Math Appl 2015: 1-30.
- SANKARAN M. 2015. The discrete Poisson-Lindley distribution. Biometrics 26: 145-149.
- ZHU D AND GALBRAITH JW. 2010. A generalized asymmetric Student-t distribution with application to financial econometrics. J Econom 157: 297-305.
Appendix A: Proof of Theorem 1
In this section, we prove that both the GBGL density and cdf, say and , respectively, can be represented as linear combinations of GL densities and cdfs.
From equation (7), we have
where
Using the power series, we obtain
Then, we can write
where
and denotes the GL density with parameters and .
Thus, the corresponding cdf is given by
Appendix B: Proof of Theorem 2
Let
Using the power series expansion, we obtain
Further,
where denotes the gamma cdf with shape parameter and scale parameter .
We can change by , where for and for , which is very easy to prove by a cartesian plot of versus . Then, we have
and rearranging terms, we obtain
where
Setting
the new cdf follows as a double linear combination of gamma cdfs
By differentiating the last equation, we obtain
where denotes the gamma density with parameters and .
Appendix C: Proof of Theorem 3
We can write from equations (10) and (12)
where
and is defined in Theorem 1.
Appendix D: Quantile function
We derive a power series for following the steps. First, we use a power series for . Second, we obtain a power series for the argument . Third, we derive a power series for the L qf using the Lagrange theorem in order to obtain a power series for .
We introduce the following quantities defined by Cordeiro and Lemonte (2011). Let be the inverse function of
A power series for is given on the Wolfram website (http://functions.wolfram.com/GammaBetaErf/InverseGammaRegularized/06/01/03/) as
where . We can write the last equation as
where . Here, , and any coefficient (for ) can be obtained from the cubic recurrence equation
We have , , etc. Next, we present some algebraic details for the GL qf, say . The cdf of is given by (8). By inverting , we obtain (9). We can determine the L qf using the Lagrange theorem. We consider that the power series expansion holds
where is analytic at a point that gives a simple -point.
Then, the inverse function exists and is single-valued in the neighborhood of the point . The inverse power series is given by
where
Then, we can write the GL qf as follows
where and .
Further, we have
where , , and .
Thus, we obtain from equation (22)
where and .
From equations (22) and (23), the quantity is given by
Hence, the Lindley qf reduces to
An alternative expression for is given by
where .
Thus, we can obtain
where , , and , , , and and .
The beta qf reduces to
where the transformed variable is , ,
and
where if and if . The first quantities are , ,
For and any real non-integer , we have
where
Finally, using (13), we obtain
where , and is given before.
Appendix E: Information Matrix
The elements of the unit observed information matrix for the parameters are given by:
Publication Dates
Publication in this collection: Jul-Sep 2017
History
Received: 25 July 2016
Accepted: 3 Apr 2017