The Nadarajah-Haghighi Lindley distribution

Abstract: We define a new lifetime model based on compounding the Lindley and Nadarajah-Haghighi distributions. The proposed distribution is very competitive to other lifetime models. Some of its mathematical properties are investigated including generating function, mean residual life, moments, Bonferroni and Lorenz curves and mean deviations. We discuss the estimation of the model parameters by maximum likelihood. We provide a simulation study and two applications to real data for illustrative purposes. We prove empirically that the new distribution yields good fits to both data sets, and it can be a useful alternative for other classical lifetime models.

The Lindley distribution was pioneered by Lindley (1958) in the context of fiducial and Bayesian inference.It is a mixture of the exponential and length-biased exponential distributions.Let Y be a Lindley random variable with parameter γ > 0 having probability density function (pdf) where the mixing proportion is γ/(1 + γ).The survival function of Y is Various of its statistical properties were discussed in details by Ghitany et al. (2008b).The authors also showed that the Lindley distribution is quite competitive with the exponential distribution.Gupta and Singh q(z) = αλ(1 + λz) α-1 e 1-(1+λz) α and Q(z) = e 1-(1+λz) α , respectively.Some generalizations of the NH distribution have been proposed in recent years, such as the exponentiated Nadarajah-Haghighi (Lemonte 2013) and gamma Nadarajah-Haghighi (Bourguignon et al. 2015), among others.By using the discrete-continuous compounding approach, we have the Poisson gamma Nadarajah-Haghighi (Ortega et al. 2015) and geometric Nadarajah-Haghighi (Marinho 2016) distributions.
A comprehensive review on compounding method for generating distributions can be found in Tahir et al. (2016).They pointed out a different compounding approach by taking the minimum between two continuous distributions.In the discrete-continuous compositions, N is a discrete random variable representing the number of identical elements having some continuous distribution.For the continuous-continuous compositions, we suppress the condition to be identically distributed and set N = 2.Some well-known continuous-continuous compounded models are the additive Weibull (Xie andLai 1995, Lemonte et al. 2014), exponential-Weibull (Cordeiro et al. 2014) and generalized exponential-exponential (Popovíc et al. 2015) distributions, among others.
In this paper, we introduce a new continuous-continuous compounded model referred to as the Nadarajah-Haghighi Lindley (NHL) distribution.The new three-parameter distribution is obtained by compounding the Lindley and NH distributions.We assume that Y and Z are independent random variables and define X = min(Y, Z) as a NHL random variable, whose survival function is given by The cumulative distribution function (cdf) of X is where x > 0. The parametric space of the cdf in (1) can be defined as Θ = {(α, λ, γ) : α ≥ 0, γ ≥ 0, λ ≥ 0, max(α, γ) > 0, max(λ, γ) > 0}.
The pdf and hazard rate function (hrf) of X are given by and , respectively.Henceforth, we consider that X ∼ NHL(α, λ, γ).The proposed distribution contains as special models some well-known distributions.For γ = 0, the NHL reduces to the NH distribution.If γ = 0 and α = 1, we have the exponential distribution.For α = 0 or λ = 0, we have the Lindley distribution.
Figure 1 displays plots of the pdf of X for some parameter values.The new density presents decreasing and reverse J shaped curve.Figure 2 reveals that the NHL distribution can have decreasing, increasing, upside-down bathtub and bathtub-shaped hazard functions.This feature makes the new distribution very competitive to the Weibull, gamma and exponential distributions that exhibit only monotonic hazard rates.According to Nadarajah et al. (2011) this is a major weakness because most empirical life systems have bathtub shapes for their hrfs.The NHL distribution has the following theoretical motivations: (i) It can be useful in engineering and reliability for modeling a system having two sub-systems functioning in series independently at a given time.If Y and Z denote the lifetimes of these independent sub-systems following the Lindley and NH distributions, then the lifetime X of the system has the NHL distribution.Cordeiro et al. (2014)   biological and medical applications.(iii) Nadarajah and Haghighi (2011) mentioned some advantages of the NH model such as the ability to model data with mode fixed at zero and the fact that it can be interpreted as a truncated Weibull distribution.The NHL distribution also accumulates this advantages since it has as special model the NH distribution.(iv) Some of its mathematical properties are easily obtained.Further, the NHL distribution has some practical motivations: (i) It can be used for modeling data with bathtub and unimodal failure rates, which are very common in many applied areas.(ii) The finite sample behavior of the maximum likelihood estimates of the its parameters is adequate.(iii) It is really a very competitive model to well-known distributions such as the Weibull, exponentiated Weibull, NH and Lindley distributions as proved empirically in two applications to real data in Section 7.
The rest of this paper is outlined as follows.In Sections 2-4, we obtain a range of mathematical properties of the NHL distribution.In Section 5, we consider the maximum likelihood method to estimate the model parameters.We perform a simulation study in Section 6.Two real data applications are provided in Section 7. Some concluding remarks are offered in Section 8.

-MEAN RESIDUAL LIFE
The mean residual life is a relevant characteristic to the design of safe systems in a wide variety of applications in engineering and reliability.Given that a component survives up to time x > 0, the residual life is defined by It represents the period beyond x until the time of failure.Note that We consider the integral Setting u = (1 + λ t) α , we have t = (u 1/α -1)/λ.So, the above integral reduces to By expanding , and using the binomial theorem for (u 1/α -1) i , we can write where and Γ(a, z) = Γ(a)γ(a, z) = ∞ z t a-1 e -t dt is the upper incomplete gamma function.Note that (3) can be approximated by setting large values in the upper limit of the sum, say N. Hence, For illustrative purposes, we provide a numerical study to analyze de convergence of the expansion (4) by comparing its results with those from numerical integration.Table I presents the calculations for N = 5, 10, 15 and 20 and x = 1.This expansion presents good approximations for the mean residual life.All computations are obtained using the R software.We provide the scripts for these calculations in the Appendix A.

-GENERATING FUNCTION AND MOMENTS
We denote by M(t) the moment generating function (mgf) of X. From Equation (2), we obtain Setting u = (1 + λx) α , we have By expanding using the binomial theorem for (u 1/α -1) i and, after some algebra, we obtain (for γ > 0 and t < γ) where The generating function M(t) is useful for computing moments of the NHL distribution by differentiation.Thus, to considering values of the parameters α, λ and γ for which the above expansion converges, the nth derivative gives where is the falling factorial.Therefore, It is possible to obtain an approximation for μ n by truncating the previous expansion.Hence, for N large enough, (5) We use the R software to compute μ n from ( 5) and compare the results with those obtained by numerical integration.Table II presents the results for some parametric values and N = 5, 10, 15 and 20.In fact, they illustrate that the proposed expansion provides reasonable approximations for the moments.For n = 1, note that the series converges for all selected parameterizations even for very small N = 5.More terms may be required when n increases.We provide the scripts for these calculations in Appendix A. The central moments (μ n ) and cumulants (κ n ) of X can be determined from these raw moments using well-known relationships.An alternative expression for the skewness of X can be expressed as (MacGillivray 1986) where u ∈ (0, 1) and ]/2 equals the median of X for all u, and the numerator of ρ(u) is zero.Thus, the farther ρ(u) is from the horizontal line ρ(u) = 0, means higher asymmetry.Decrescent form in ρ(u) indicates positive asymmetry.Plots of the MacGillivray skewness for some parameter values are displayed in Figure 3.Note that for λ = 1.0 and γ = 0.1, increases in α imply less skewed to the left allowing to model data with a distribution of heavy tails.An opposite result is observed by setting α = 0.5 and γ = 0.1 and increasing λ.Less asymmetric distributions arise for higher values of λ.

-INCOMPLETE MOMENTS
The sth incomplete moment of X is defined by m s (y) = y 0 x n f(x)dx.Thus, by inserting (2) in m s (y), we have Using the exponential and binomial expansions, we obtain where and Γ * y (a) = γ(a, (1 + λy) α )γ(a, 1) = (1+λy) α 1 t a-1 e -t dt.We compute approximations for m s (y) by setting values in the upper limit of the sum of equation ( 6).The results for some parameter values and y are shown in Table III.This expansion presents good approximations for the incomplete moments.We provide the scripts for these calculations in Appendix A.  Equation ( 6) is the main result of this section.In various practical situations, the shape of many distributions can be usefully described by the incomplete moments.For example, the mean deviations about the mean and median and Lorenz and Bonferroni curves are simple applications of the first incomplete moment.Nowadays in the literature, these curves are the most used curves in inequality analysis.Also, note that (6) can be used to approximate μ n by setting large values in y.
For a given probability π, the Lorenz and Bonferroni curves are defined by B(π) = m 1 (q)/(πμ 1 ) and L(π) = m 1 (q)/μ 1 , respectively, where q = Q(π) is the qf of X evaluated at π. Plots of the Lorenz and Bonferroni curves for some parameter values are displayed in Figure 4.

-MAXIMUM LIKELIHOOD ESTIMATION
Let x 1 , • • • , x n be a sample of size n from the NHL(α, λ, γ) distribution and θ = (α, λ, γ) T the parameter vector of interest.The log-likelihood function for θ based on this sample is given by The maximum likelihood estimates (MLEs) of the model parameters can be obtained by maximizing (7).There are several routines for numerical maximization in the R program (optim function), SAS (PROC NLMIXED), Ox (MaxBFGSsub-routine), among others.Alternatively, we can differentiate (7) and solve the resulting nonlinear likelihood equations using the quasi-Newton BFGS and Newton-Raphson algorithms.The score components can be obtained from the authors under request.
Under standard regularity conditions and for large n, the distribution of θ can be approximated by a trivariate normal N 3 (0, J( θ) -1 ) distribution, where J(θ) is the observed information matrix.In addition, this approximation can also be used for constructing confidence regions and testing hypotheses on the parameters.

-SIMULATION STUDY
In this section, we provide a Monte Carlo simulation study to evaluate the adequacy of the MLEs of the parameters of the NHL distribution.The simulations are performed by generating observations from eight scenarios with different parameter combinations.Note that the cdf in (1) can only be inverted numerically.However, it is possible generating random numbers by using the qfs of the Lindley and NH distributions.
Jodrá (2010) used the Lambert W function to generate random numbers from the Lindley distribution.
where W -1 is the negative branch of the Lambert W function, which can be implemented in R (lamW package) or C (GNU Scientific Library).
The number of Monte Carlo replications is set at N = 10, 000.We use the optim subroutine with analytical derivatives in R for maximizing the log-likelihood function.We compare the performance of the estimators by computing the mean estimates and root mean square errors (RMSEs) from N Monte Carlo replications.We take the sample size as n ∈ {50, 100, 300, 500}.
The simulation results are given in Table IV.As expected, under first-order asymptotic theory, the mean estimates of the parameters tend to be closer to the true values and the RMSEs decrease when the sample size n increases for all scenarios.The MLEs of λ and γ are more accurate than those of α.It is noteworthy that the estimates of α have considerable biases, specially in small samples.Nevertheless, we can build bias-corrected point estimator through bootstrap method, see Davison (1997).
The non-parametric bootstrap procedure for bias correction applied in this paper can be described as follows: 1. Letting x = x 1 , . . ., x n be a observed random sample, we obtain the MLE θ for the NHL parameter vector θ; 6.The bias corrected estimate of θ is given by θ = θ -B( θ).
In order to evaluate the effectiveness of such correction, we carry out a separate simulation experiment.The numbers of Monte Carlo and bootstrap replications are equal to 1, 000 and n varies in {20, 30, 50, 70, 100}.We study the performance of the bootstrap bias corrected estimates by computing the total relative bias defined in Cribari-Neto and Soares (2003), which is a measure given by the sum of the absolute values of the individual relative biases.Thus, the total relative bias is an aggregate measure of the biases of the parameters estimates.
The plots in Figure 5 show the behavior of the total relative bias for the uncorrected and corrected estimators.They reveal that the corrected estimators are more reliable than the MLEs in small sample sizes.In fact, the corrected estimators outperform the usual MLEs for both scenarios.In addition, it is also observed that the asymptotic property of unbiasedness of both estimators is satisfied, because their biases decrease when the sample size increases.

-APPLICATIONS
In this section, we fit the NHL distribution to two real data sets.They illustrate the potentiality of this distribution for modeling positive data.The first data set represents the times to reinfection of sexually transmitted diseases (STDs) for eight hundred and seventy seven patients.The data are taken from Section 1.12 of Klein and Moeschberger (1997).The second data set corresponds to the exceedances of flood peaks (in m 3 /s) of the Wheaton River near Carcross in Yukon Territory, Canada.The data consist of 72 exceedances for the years 1958 -1984, rounded to one decimal place, see Choulakian and Stephens (2001).Table V provides a descriptive summary for both data sets.
Note that the first data set has large amplitude and variance.Its measures of central tendency, such as mean, median and mode, are quite distant when compared among them.Besides, both data sets present positive values for the skewness and kurtosis.The exceedances of flood peaks data also present large amplitude and variance, but their values are lower than those for reinfection times.
For modeling these data sets, we fit the NHL distribution and also considered the fits of six related distributions: the ENH introduced by Lemonte (2013) with pdf where α > 0 and β > 0 are shape parameters and λ > 0 is a scale parameter; the exponentiated Weibull (EW), whose pdf is where α > 0 and β > 0 are shape parameters and λ > 0 is a scale parameter; the Weibull model arises from the EW model when β = 1; and the NH, Lindley and exponential distributions, which are NHL special models.We estimate the NHL parameters and the parameters for the above models by maximum likelihood.Based on the results in Section 6, the MLEs obtained for the exceedances of flood peaks data are obtained from the bootstrap bias corrected estimators, with B = 1, 000.The goodness-of-fit statistics considered are: Akaike information criteria (AIC), consistent Akaike information criteria (CAIC), Kolmogorov-Smirnov (KS), Cramér-von Mises (W * ) and Anderson-Darling (A * ).The lower are these statistics, the better is the adjustment to the data.The MLEs and goodness-of-fit statistics are calculated using the AdequacyModel script in the R software.
Tables VI and VIII list the MLEs (and the corresponding standard errors in parentheses) of the unknown parameters for the fitted models to the first and second data sets, respectively.We note that all distributions present reasonable estimates for the standard errors.Tables VII and IX present the goodness-of-fit statistics  for reinfection times and flood peaks, respectively.According to these statistics, the NHL distribution provides a good fit and is quite competitive with the other current distributions.
For both data sets, the NHL distribution yields the best fit under all goodness-of-fit statistics.Figure 6 displays the histograms and the estimated densities for three competitive models according to their goodness-of-fit statistics.This graphical inspection also indicates the superiority of the new distribution to the reinfection times and flood peaks data.We also provide the TTT plots and estimated hazard functions of the NHL fitted model for the reinfection times and flood peaks data in Figures 7 and 8, respectively.They reveal that the NHL hazard function for both fitted models present bathtub shapes, which is in agreement with their corresponding TTT plots.Figure 9 displays the plots of the estimated cdfs for the most competitive  models and empirical cumulative function to both data sets.These plots illustrate the good adjustment of the NHL distribution.Hence, these results reveal that the proposed distribution can be a very effective alternative to the well-known Weibull, EW and ENH distributions, among others.

-CONCLUSIONS
We introduce the Nadarajah-Haghighi Lindley (NHL) model by compounding the Lindley and Nadarajah-Haghighi distributions.Once we have a composition by taking the minimum of two continuous independent random variables, the proposed distribution might be useful in engineering for modeling the failure time of systems composed of two independent components in series.The NHL distribution has as special cases the Lindley, Nadarajah-Haghighi and exponential distributions.Further, it is a competitive

-AUTHOR CONTRIBUTIONS
The participation of the authors in the production of the manuscript is as follows: first author, characterization of the new distribution, mathematical properties and implementation of computational routines.The second author, application, simulation studies, and computational routines.The third author, review and general correction of the paper.

Figure 1 -
Figure 1 -Plots of the NHL density.
studied a similar situation for the exponential-Weibull model.(ii) The stochastic representation X = min{Y, Z} can arise in several

Figure 2 -
Figure 2 -Plots of the NHL hazard function.
FERNANDO A. PEÑA-RAMÍREZ, RENATA R. GUERRA AND GAUSS M. CORDEIRO THE NHL DISTRIBUTION

Figure 3 -
Figure 3 -Skewness of the NHL distribution for some parameter values.

Figure 4 -
Figure 4 -(a) Bonferroni curves for the NHL model for some parameter values; (b) Lorenz curves for the NHL model for some parameter values.

2.
From the original data, generate a bootstrap sample x * b = (x * b 1 , . . ., x * b n ), that is, take a size n random sample from x with replacement; 3. Calculate the MLE of θ from the bootstrap sample x * b , namely θ * ; 4. Repeat the steps (2) and (3) for a very large number B of times, thus obtaining θ * 1 , . . ., θ * B ; 5. Compute the bias estimate by B( θ)

Figure 5 -
Figure 5 -Total relative bias of the estimator for (a) Scenario 2 and (b) Scenario 5.

Figure 6 -
Figure 6 -Histogram and estimated pdfs of (a) the NHL, ENH and EW models for the reinfection times; and (b) the NHL, ENH and Exp models for the exceedances of flood peaks.

Figure 7 -
Figure 7 -(a) TTT plot; and (b) estimated hazard function for the reinfection times data.

Figure 8 -
Figure 8 -(a) TTT plot; and (b) estimated hazard function for the exceedances of flood peaks data.

Figure 9 -
Figure 9 -Empirical and estimated cdfs of (a) the NHL, ENH and EW models for the reinfection times; and (b) the NHL, ENH and Exp models for the exceedances of flood peaks.