## Print version ISSN 0001-3765On-line version ISSN 1678-2690

### An. Acad. Bras. Ciênc. vol.91 no.3 Rio de Janeiro  2019  Epub Aug 19, 2019

#### https://doi.org/10.1590/0001-3765201920190002

Mathematical Sciences

Improved Bayes estimators and prediction for the Wilson-Hilferty distribution

1Departamento de Matemática Aplicada e Estatística, Universidade de São Paulo, Avenida TrabalhadorSão-Carlense, 400, 13566-590 São Carlos, SP, Brazil

2Departamento de Estatística, Universidade Federal de São Carlos, Rodovia Washington Luis, Km 235, 13565-905 São Carlos, SP, Brazil

ABSTRACT

Abstract: In this paper, we revisit the Wilson-Hilferty distribution and presented its mathematical properties such as the r-th moments and reliability properties. The parameters estimators are discussed using objective reference Bayesian analysis for both complete and censored data where the resulting marginal posterior intervals have accurate frequentist coverage. A simulation study is presented to compare the performance of the proposed estimators with the frequentist approach where it is observed a clear advantage for the Bayesian method. Finally, the proposed methodology is illustrated on three real datasets.

Key words Bayesian analysis; Bayesian prediction; censored data; objective prior; Wilson-Hilferty distribution.

1 - INTRODUCTION

In this paper, we revisit the Wilson-Hilferty (WH) distribution, showing that this distribution is appropriate for modeling data with increasing and bathtub hazard rates. This model takes the name of a fundamental statistical technique given by Wilson and Hilferty (1931), the so-called Wilson-Hilferty transformation. This procedure gives a normal approximation to the cube root of a chi-squared variable, and it is essential for a wide range of scientific processes. Ishikawa et al. (2014) considered such transformation to improve the accuracy of likelihood analysis in the cases where only a small number of modes are available in power spectrum measurements. Zemzami and Benaabidate (2016) considered the WH transformation to reduces the effect of local variations in artificial neural networks. Here, we are not introducing a new distribution but providing common mathematical functions and inferential procedures related to this vital model to be used in different frameworks.

For the WH distribution a critical reparametrization is presented. This approach returns a simple probability density function (PDF) where its parameters are orthogonal in the sense discussed by Cox and Reid (1987). Besides, we considered the presence of randomly censored data, since it has received particular attention in medical experiments and industrial lifetime testing. To the best of our knowledge, there is no evidence of the use of the WH distribution to describe time-to-event data and reliability applications so far. Therefore, these results are useful contributions to be used by reliability engineers and practitioners of statistical analysis of lifetime data in general.

Due to the limited number of data that are usually observed in reliability application, we consider an objective Bayesian analysis to achieve inference that does not depend on asymptotic results such as the frequentist inference. In this case, a Bayesian approach using an overall reference prior is presented. It is proved that such prior is also a matching prior, i.e., the resulting marginal posterior intervals have accurate frequentist coverage (Tibshirani 1989). Moreover, the resulting posterior is proper and has interesting properties, such as one-to-one invariance, consistent marginalization, and consistent sampling properties.

Moreover, we also propose efficient closed-form maximum a posteriori probability (MAP) estimators for both parameters in the case of complete data, which is essential for practical purposes, since they can be applied for real-time computing estimators in embedded technology. An intensive simulation study is presented to show the usefulness of the Bayesian approach. Finally, a Bayes prediction of the future failure time is presented based on observed order statistics. These results are applied in three lifetime data to demonstrate how the WH distribution can be applied in real situations.

The remainder of this study is organized as follows. Section 2 presents some mathematical properties of the WH distribution. In Section 3, we present the parameter estimates for the model based on Maximum Likelihood Estimator (MLE) for complete and censored data. Section 4 presents the Bayesian estimators for the distribution. Section 5 is devoted to present simulation studies to compare the performance of the estimators. The predictive modeling for the WH distribution is discussed in Section 6. In Section 7, we prove the relevance of the distribution by considering three real lifetime datasets. Finally, Section 8 summarizes the present study.

2 - WILSON-HILFERTY DISTRIBUTION

Let T be a non-negative random variable with WH distribution, then its PDF is

f(t|ϑ)=3Γ(α)(αλ)αt3α1exp(αλt3), (1)

where α>0 and λ>0 are shape and scale parameters, respectively, and ϑ=(α,λ) , hereafter, we will assume that ϑ represents these parameters. This model is a particular case of a three-parameter generalized gamma (GG) distribution proposed by Stacy (1962) (replacing 3 by p>0 in equation (1)) that unifies the WH distribution with other important models such as, Weibull distribution, gamma, Nakagami, and the lognormal distribution, to list a few.

The cumulative function of T is given by

F(t|ϑ)=1Γ(α)γ(α,αλt3),

where γ(y,x)=0xwy1ewdw is known as the lower incomplete gamma function.

The raw moments of the model are given as

E(Tr|ϑ)=Γ(α+r/3)Γ(α)(λα)r/3, for r, (2)

which can also be obtained using a reparametrization in the r-moment of the GG distribution given in Khodabin and Ahmadabadi (2010). Hence, the mean and variance of (1) are, respectively, given by

E(T|ϑ)=Γ(α+1/3)Γ(α)(λα)13 and (3)
Var(T|ϑ)=λ(1(Γ(α+1/2)Γ(α))2). (4)

2.1 - QUANTITATIVE METHODS FOR THE RELIABILITY

Throughout this subsection, we present some crucial measures in many areas of science, mainly in reliability engineering. Such quantitative measures for the reliability are: Reliability Function, Failure Rate Function and Mean Residual Life.

2.1.1 - Reliability function and hazard rate function

The reliability function of an observational unity is given by

R(t|ϑ)=1Γ(α)Γ(α,αλt3),

where Γ(y,x)=xwy1ewdw is known as the upper incomplete gamma function. In general applications, R(t) means the probability that an observation does not fail in the time interval (0,t] .

When dF(t)dt=f(t) exists, the hazard function of a component is obtained by considering h(t|ϑ)=f(t|ϑ)/R(t|ϑ) . So, for the WH distribution, the hazard function is given by

h(t|ϑ)=3(αλ)αt3α1exp(αλt3)Γ(α,αλt3)1. (5)

A considerable attribute of the function (5) is that it presents three distinct phases, i.e., decreases first, then remains roughly constant and ultimately increases. This result is proved in Theorem 2.1.

Theorem 2.1. For the WH distribution the hazard function h(t|ϑ) is bathtub (increasing) shaped for 0<α<13 (α13) , for all λ>0 .

Proof. Consider that

ω(t|ϑ)=ddtlogf(t|ϑ)=(3α1)t3αtλ. (6)

Note that for α13 and λ>0 , ω(t|ϑ) has a decreasing behavior, then from Lemma 2.1.2 presented in Ramos et al. (2018), h(t|ϑ) is increasing. On the other hand, for 0<α<13 and λ>0 , we have that ω(t|ϑ) has bathtub shape property with a global minimum at t*=λλ3α . Hence, h(t|ϑ) is also bathtub shaped.

The behavior of the hazard function (5) when t0 and t is given, respectively, by

h(0|ϑ )={,ifα <133λ ,ifα =130,ifα >13 and h(|ϑ )=.

2.1.2 - Mean residual life function

The mean residual life (MRL) is an essential measure in reliability and survival analysis applications and represents the expected additional lifetime given that a component has survived until time t t , i.e., it summarizes the entire remaining life from components or systems. This measure gives substantial information for maintenance strategies involving repair and replacement.

Proposition 2.2. The mean residual life function r(t|ϑ) of the WH distribution is given by

r(t|ϑ)=1R(t|ϑ)tyf(y|ϑ))dyt=(λα)13(Γ(α+13,αλt3)Γ(α,αλt3))t. (7)

The behaviors of the MRL function in (7) when t0 and t are, respectively

r(0|ϑ)=(λα)13(Γ(α+13)Γ(α))andr(|ϑ)=1h(|ϑ)=0.

Theorem 2.3. The MRL function r(t|ϑ) of the WH distribution has either a unimodal or decreasing shape when 0<α<13 , or α13 , respectively, for all λ>0 .

Proof. If α13 and λ>0 , we have that h(t|ϑ) is increasing. In this case, using the Lemma II.4 presented by Ramos et al. (2018), r(t|ϑ) is decreasing. Now, if λ>0 and 0<α<13 , then h(t|ϑ) has a bathtub shape and since h(0)r(0)>1 , from the same Lemma II.4 used above, we have that r(t|ϑ) has a unimodal shape property.

Figure 1 presents different shapes of the hazard and the mean residual life function. The hazard function is characterized by three distinct regions according to a bathtub curve of the hazard rate. The assessment of the impact of each region it is essential to improve the reliability of the components and systems by preventive maintenance or to eliminate early-life failures.

3 - CLASSICAL INFERENCE

3.1 - COMPLETE DATA

In this section, we use the maximum likelihood method to obtain the classical estimators of the unknown parameters of the WH distribution which may be obtained by direct maximization of the likelihood function. Let T1,,Tn be a random sample such that TiiidWH(λ,α) , then, the likelihood function from (1) is given by

L(ϑ|𝐭)=3nΓ(α)n(αλ)nα{i=1nti3α1}exp(αλi=1nti3). (8)

The estimates are obtained by maximizing the likelihood function. From the expressions λlogL(ϑ|𝐭)=0 and αlogL(ϑ|𝐭)=0 , the likelihood equations are given by

n(1+log(αλ))nψ(α)+3i=1nlog(ti)=1λi=1nti3 (9)

and

nαλ+αλ3i=1nti3=0, (10)

where ψ(k)=klogΓ(k)=Γ(k)Γ(k) is the digamma function. The MLE of λ̂ is given by

λ̂=1ni=1nti3. (11)

Substituting λ̂ in (9), the MLE of α̂ can be obtained by solving

ξ(α;𝐭)=log(1ni=1nti3)3ni=1nlog(ti)log(α)+ψ(α). (12)

Theorem 3.1. Suppose that λ̂=1ni=1nti3 is the MLE of λ . Hence, the root of ξ(α;𝐭)=0 , α̂ , is unique.

Proof. Since log(α)ψ(α) is strictly monotone and continuous with range in (,0) , we have that for α>0 there is a unique solution in

log(α)ψ(α)=log(1ni=1nti3)1ni=1nlog(ti3),

and the proof is completed.

The MLEs are asymptotically normally distributed with a bivariate normal distribution given by

(α̂MLE,λ̂MLE)N2[(α,λ),I1(α,λ))] for n,

where I(α,λ) is the Fisher information matrix

I(α,λ)=n[(αψ(α)1)α00αλ2], (13)

and ψ(k)=kψ(k) is the trigamma function. These results are useful to construct asymptotic confidence intervals for the parameters of the WH distribution.

3.2 - CENSORED DATA

A fundamental problem that one meets with lifetime data is that available data are a mixture of complete and incomplete observations due to many reasons. The case most easily encountered of incompleteness is random censoring (for more details see Lawless 2011). In this type of censoring there are the follow random variables, the lifetime Ti related to i th individual and the censoring time Ci , such that data observed are (ti,δi) , where ti=min(Ti,Ci) and the censoring indicator δi=I(TiCi) , resulting 1 if the lifetime is observed and 0 otherwise. This type of censoring has as special cases the type I and II censoring mechanisms. We assume that the random censoring times Ci s are independent of Ti s. In this case, the likelihood function for ϑ is given by L(ϑ,𝐭)=i=1nf(ti|ϑ)δiR(ti|ϑ)1δi .

Letting the failure times T1,,Tn be a random sample of WH distribution, the likelihood function considering data with random censoring is given by

L(α,λ|𝐭,𝛅)=3dΓ(α)n(αλ)dαexp(αλi=1nδiti3)×i=1n{ti(3α+1)δiΓ(α,αλti3)1δi}, (14)

where d=i=1nδi . The logarithm of the likelihood function (14) is given by

l(α,λ|𝐭,𝛅)=dlog(3)+dαlog(αλ)+(3α1)i=1nδilog(ti)nlog(Γ(α))+i=1n(1δi)log(Γ(α,αλti3))αλi=1nδiti3, (15)

From l(α,λ|𝐭,𝛅)/α=0 and l(α,λ|𝐭,𝛅)/λ=0 , the likelihood equations are given as follows

dlog(αλ)+d+3i=1nδilog(ti)+i=1n(1δi)Ψ1(α,λti3)nψ(α)=1λi=1nδiti3 (16)
i=1n(1δi)Ψ2(α,λti3)+i=1nδiαλ3ti3dαλ=0, (17)

where Ψ 1(a,b)=alogΓ (a,ab) and Ψ 2(a,b)=blogΓ (a,ab) can be computed numerically. Numerical methods are required to find the solution of these non-linear equations.

4 - BAYESIAN INFERENCE

In this section, we considered a Bayesian approach to obtain the parameter estimators of the WH distribution. Our interest is to find accurate estimates and prediction (condition/performance in the future). These problems are often challenging to deal with when using classical inference methods. In particular, our setting presents many interest quantities, and we are simultaneously interested in all the parameters of the model and several functions of them.

One may consider the Jeffreys prior, which is obtained by taking the square root from the determinant of the Fisher information matrix in (13), i.e.,

πJ(λ,α)αψ(α)1λ. (18)

On the other hand, the Jeffreys prior may not be a good choice in the multiparametric case Bernardo (2005). Moreover, as it will be discussed further, this prior is not a matching prior for both parameters, i.e., the marginal posterior intervals have not accurate frequentist coverage in a sense discussed by Tibshirani (1989). Another well-known class of non-informative priors is then considered, the so-called reference priors Bernardo (1979). The main idea for obtaining such prior is to maximize the expected Kullback-Leibler divergence between the posterior distribution and the prior distribution. The obtained reference prior provides posterior distributions with interesting properties such as invariance, consistency under marginalization and consistent sampling properties.

In cases where the Fisher information matrix has orthogonal parameters, the following Lemma (see Berger et al. 2015) can be used to easily obtain a one-at-a-time reference prior for any chosen parameter of interest and any ordering of the nuisance parameters in the derivation (hereafter referred to as overall reference prior).

Lemma 4.1. Berger et al. (2015). Considering the unknown parameters ϑ=(ϑ1,ϑ2) and the posterior distribution p (ϑ1,ϑ2|t) which is asymptotically normally distributed with dispersion matrix S(ϑ1,ϑ2)=I1(ϑ1,ϑ2) . If I(ϑ1,ϑ2) is of the form

I(ϑ1,ϑ2)=diag(f1(ϑ1)g1(ϑ2),f2(ϑ2)g2(ϑ1)),

where fi() and gi() are positive functions of ϑi for i=1,2 i=1,2 , then the overall reference prior is given by

πR(ϑ1,ϑ2)=f1(ϑ1)f2(ϑ2).

Consider the posterior distribution with dispersion matrix S(α,λ)=I1(α,λ) is given by

p(ϑ|𝐭)1Γ(α)n(αλ)nα{i=1nti3α}exp(αλi=1nti3).

Since ti3,i=1,,n are constant terms supose that yi=ti3 . Then, we have that

p(ϑ|𝐭)1Γ(α)n(αλ)nα{i=1nyiα}exp(αλi=1nyi).

The posterior above has the same form of the posterior distribution of the gamma model reparametrized by its mean. Crain and Morgan (1975) proved that for the exponential family the posterior distribution tends asymptotically to the multivariate normal. Since our model is included in this case, Lemma 4.1 can be used to derive the overall referente prior.

The overall reference prior for the WH distribution is given by

πR(α,λ)1λαψ(α)1α. (19)

An interesting aspect of this overall reference prior is that such prior satisfies the solutions of both partial differential equations

{λ{ι12πι22(ι11ι122/ι22)12}α{π(ι11ι122/ι22)12}=0α{ι12πι11(ι22ι122/ι11)12}λ{π(ι22ι122/ι11)12}=0

where ιij(ϑ),i,j=1,2 , are the elements of the Fisher information matrix given in (13). This implies that the credible interval for ϑi,i=1,2 has a coverage error O O ( n^{-1} n1 ) in the frequentist sense, i.e.,

P[ϑiϑi1a(π;X)|ϑj]=1aO(n1),

where ϑi1a(π;X)|ϑj refers to the (1a) th quantile of the posterior distribution of ϑi . It is important to point out that the Jeffreys prior only satisfies the second partial differential, therefore, such prior is not a matching prior for both parameters, which is undesirable.

4.1 - COMPLETE DATA

The joint posterior distribution for α and λ , produced by the overall reference prior is proportional to the product of the likelihood function (8) and the prior distribution (19), resulting in

p(α,λ|𝐭)αψ(α)1α12λΓ(α)n(αλ)nα{i=1nti3α1}exp(αλi=1nti3).

The normalizing constant is given by

d(𝐭)=𝒜αψ(α)1α12λΓ(α)n(αλ)nα{i=1nti3α1}exp(αλi=1nti3)dϑ,

and 𝒜={(0,)×(0,)} is the parameter space for ϑ . The posterior distribution (20) will be proper if and only if d(𝐭)< .

Theorem 4.2. The posterior distribution (20) is proper if and only if n1 .

Proof. Since αψ(α)1α12λΓ(α)n(αλ)nα{i=1nti3α1}exp(αλi=1nti3)0 , by Tonelli theorem (see Folland 1999) we have

d(𝐭)=𝒜αψ(α)1α12λΓ(α)n(αλ)nα{i=1nti3α1}exp(αλi=1nti3)dϑ=00αψ(α)1α12λΓ(α)n(αλ)nα{i=1nti3α1}exp(αλi=1nti3)dλdα=0αψ(α)1α12Γ(nα)Γ(α)n(i=1nti)3α(i=1nti3)nαdα=s1(𝐭)+s2(𝐭)

where

s1(𝐭)=01αψ(α)1α12Γ(nα)Γ(α)n(i=1nti)3α(i=1nti3)nαdα

and

s2(𝐭)=1αψ(α)1α12Γ(nα)Γ(α)n(i=1nti)3α(i=1nti3)nαdα.

Ramos et al. (2018) already proved in the context of the Nakagami-m distribution that

limα0+αψ(α)1α12=1,limααψ(α)1α12=12and
Γ(nα)Γ(α)nαn1 for α(0,c] and Γ(nα)Γ(α)nnnααn12

for α[c,) and for any c>0 c>0 . Therefore, we have that

s1(𝐭)=01αψ(α)1α12Γ(nα)Γ(α)n(i=1nti)3α(i=1nti3)nαdα01α12α12αn1(i=1nti)3α(i=1nti3)nαdα=01αn2eq(t)nαdα<

and

s2(𝐭)=1αψ(α)1α12Γ(nα)Γ(α)n(i=1nti)3α(i=1nti3)nαdα1α12α12nnααn12(i=1nti)3α(i=1nti3)nαdα=1αn32ep(t)nα<,

where p(t)=log(1ni=1nti3i=1nti3n)>0 and q(t)=log(i=1nti3i=1nti3n)>0 by the inequality of the arithmetic and geometric means. Finally, we obtain d(𝐭)=s1(𝐭)+s2(𝐭)< .

The marginal posterior distribution for α is given by

p(α|𝐭)αψ(α)1αΓ(nα)Γ(α)n(i=1nti)3α(i=1nti3)nα. (21)

The conditional posterior distribution for λ is given by

p(λ|α,𝐭)IG(nα,αi=1nti3), (22)

where IG (,) denotes the inverse gamma distribution with PDF f(y|a,b)=baya1exp(by1)/Γ(a) and shape parameter a and scale parameter b .

4.2 - MAXIMUM A POSTERIORI PROBABILITY (MAP) ESTIMATORS

Firstly, we considered the Bayesian approach to derive efficient closed-form estimators for parameters of the generalized gamma distribution when one of the parameters is known. Let W follows a GG distribution with α,λ and ϕ parameters, then, the likelihood function is given by

L(ϑ𝐰)=φ nΓ(α)n(αλ)nα{i=1nwiφ α1}exp(αλi=1nwiφ ). (23)

Although different objective priors could be considered such as Jeffreys’ prior or reference priors (Ramos et al. 2017) to perform inference with this generalized model, they depend on polygamma functions which do not allow us to obtain closed-form estimators. On the other hand, they considered the following objective prior

π (ϑ )1λ c1α c2φ c3, (24)

where ci0,i=1,2,3 are known hyperparameters. From the likelihood function (23) and the prior distribution (24), the joint posterior distribution for ϑ is given by,

p(ϑ |w)=1d(w)φ nc3α c2λ c1Γ (α )n(α λ )nα {i=1nwiφ α }exp(α λ i=1nwiφ ), (25)

where d(𝐰) is a normalized constant with parameter space 𝒜={(0,)×(0,)×(ϵ,M)} , 0<ϵ<3 is a small constant and M>3 is a large constant. We select (ϵ,M) for the interval of φ as the interest is in the situation where φ =3 . Hence, any interval (ϵ,M) that contains φ =3 is adequate for our purposes.

To achieve the maximum a posteriori probability estimator of ϑ we consider ϑ ^ MAP=argmaxϑ log(π (ϑ |ω )) . Hence, we have that the MAP estimate of λ is given by

λ̂=α̂i=1nwiφ ̂nα̂+c1 (26)

As can be noted, (26) will be equal to the MLE (11) if and only if c1=0 , i.e, λ is unbiased when φ =3 . Hence we consider only that c1=0 . Moreover, if c1=0 , then (25) is a proper posterior distribution, i.e., d(𝐰)< (see Louzada et al. 2018). The MAP estimate of α is given by

α̂=(nc3)(1λ̂i=1nwiφ ̂log(wiφ ̂)+i=1nlog(wiφ ̂)), (27)

and the MAP estimate for φ is obtained by solving the non-linear equation

log(α̂)ψ(α̂)=log(λ̂)1ni=1nlog(wiφ ̂)+c2nα̂.

For the WH distribution ( ϕ=3 ), the MAP estimator for λ is λ̂MAP=1ni=1nti3 and the estimate of α is obtained from

α ^ MAP=(nc3)1ni=1nti3(1ni=1nti3i=1nlog(ti3)i=1nti3log(ti3)) (28)

It is possible to show that the asymptotic variance σ̂α2 and σ̂λ2 of α̂MAP and λ̂MAP are given by

σ̂α2=(nc3)2n3(α2+α3ψ(α+1)) (29)

and

σ̂λ2=λ2nα (30)

4.3 - CENSORED DATA

In the case of censored data, the joint posterior distribution for α and λ , produced by the overall reference prior is given by

p(α,λ|𝐭,𝛅)αψ(α)1α3dλΓ(α)n(αλ)dαi=1n{ti3αδi}××i=1n{Γ(α,αλti3)1δi}exp(αλi=1nδiti3). (31)

To sample from the joint posterior distribution using the Markov Chain Monte Carlo (MCMC) methods it is necessary to consider the conditional distributions for the parameters. The conditional posterior distribution of α is given by

p(α|λ,𝐭,𝛅)1Γ(α)nαψ(α)1αexp(αλi=1nδiti3)×(αλ)dαi=1n{ti3αδiΓ(α,αλti3)1δi}.

The conditional posterior distribution of λ is given by

p(λ|α,𝐭,𝛅)(1λ)dα+1i=1nΓ(α,αλti3)1δiexp(αλi=1nδiti3).

Since the marginal distributions do not belong to any known parametric family, the Metropolis-Hastings algorithm was considered in order to generate samples from the marginal posterior. The Gamma distribution was used as transition kernel q(ϑi(j)|ϑi(*),b),i=1,2 , for sampling values of ϑi , where b is the parameter that control the rate of acceptance of the algorithm. In our case, we choose b to be equal to one. However, other higher values can also be considered. To increase the convergence of the MCMC method, we could use the closed-form estimators as a good initial value for λ and α .

The Metropolis-Hastings algorithm operates as follows:

1. 1. Start with an initial value α(1),λ(1) and set the iteration counter j=1;

2. 2. Generate a random value α(*) from the proposal Gamma(α(j),b) with PDF given by f(ya,b)=baΓ(a)ya1expby and random values u1 and u2 from an independent uniform in (0,1) ;

3. 3. Evaluate the acceptance probability

η1(α(j),α(*))=min(1,p(α(*)|λ(j),𝐭,𝛅)p(α(j)|λ(j),𝐭,𝛅)q(α(j)α(*),b)q(α(*)|α(j),b));
• 4. If η1(α(j),α(*))u1 , then α(j+1)=α(*) . Otherwise, α(j+1)=α(j) ;

• 5. Generate a random value λ(*) from the proposal Gamma(λ(j),b) and a random value u from an independent uniform in (0,1) ;

• 6. Evaluate the acceptance probability

η2(λ(j),λ(*))=min(1,p(λ(*)|α(j+1),𝐭,𝛅)p(λ(j)|α(j+1),𝐭,𝛅)q(λ(j)|λ(*),b)q(λ(*)|λ(j),b));
• 7. If η2(λ(j),λ(*))u2 , then λ(j+1)=λ(*) . Otherwise, λ(j+1)=λ(j) ;

• 8. Change the counter from j to j+1 and return to step 2 until convergence is reached.

5 - NUMERICAL ANALYSIS

In this section, the proposed estimation methods are investigated via Monte Carlo simulation in order to compare their efficiency. We used two of the most commonly used measures to assess the performance of the estimators, namely, the mean relative error (MRE) and the root-mean-square error (RMSE), given by MREϑi=1Nj=1Nϑ̂i,jϑi and RMSEϑi=1Nj=1N(ϑ̂i,jϑi)2 , for i=1,2, respectively, where N is the number of estimates and ϑ̂i,j,,j=1,,N is the estimates obtained through the MLE and MAP estimators. The MRE and the RMSE of the λ are the same for the different estimation procedures.

Taking into account this approach, the most efficient estimators are those ones yielding MREs closer to one with smaller RMSEs. It is also computed the 95% coverage probability ( CP95% ) of the credibility intervals (CI) obtained from the Bayesian estimators and the confidence intervals from the classical approach for α and λ . Under this approach, the estimators with best coverage probabilities will show the frequencies of intervals that cover the true values of ϑ closer to the nominal level. The samples of size n were generated by assuming TWH(λ,α) . The results were computed by using the software R. Overall, the main aim of this section is to compared the Bayes estimators with the classical approach in order to verify if our proposed estimators return more accurate results. This approach has been considered for other models such as Gamma (Louzada et al. 2019), Weibull (Teimouri et al. 2013), Kumaraswamy (Dey et al. 2018) and the GG distribution (Ramos et al. 2017), to list a few.

5.1 - COMPLETE DATA

Recall that it was presented a class of MAP estimators for α that depends on c3 . Therefore, before going on with the simulation study, it is necessary to find a value for c3 in which the MRE is closer to one. Figure 2 presents the MREs for αMAP considering different values of c3 , for ϑ=(3,4) and ϑ=(10,4) and n=20 . We observed that a good choice is c3=2.9 . Therefore, we considered such value in (28), where n3 . Since both estimators obtained from the reference posterior and the closed-form Bayes estimators came from the Bayesian approach, they will be referred to as Bayes and CBayes estimators, respectively. To achieve the MLEs, numerical techniques must be used to solve the non-linear equation (12), the uniroot function available in R was considered to find the estimate, and as provided theoretically, we find a unique solution in all situations.

Figure 3 shows the MREs, RMSEs and CPs for the estimates of α and λ obtained by using the Monte Carlo (MC) method where α=(4,0.5) , λ=2 and n=(15,20,,100) . The horizontal lines in the figures correspond to the minimum MREs and RMSEs, equal to one and zero, respectively. Notice that for λ we have the same estimator using the three different estimators for complete data. In this case, both MRE and RMSE are similar. On the other hand, the confidence and credibility intervals are computed differently for each method. We observe that the estimates of the parameters are asymptotically unbiased, i.e., the MREs tend to one when n increases and the RMSEs decrease to zero. However, the Bayes estimators, obtained from the reference posterior, outperform the other estimation procedures and present extremely efficient estimates for both parameters even for small sample sizes. It worth mentioning that if one needs fast computation, the CBayes returned similar estimates when compared to the Bayes estimators, obtained by considering reference priors.

5.2 - CENSORED DATA

In this section, we considered the Bayes estimators in the presence of randomly censored data. The censored data were generated as follows. We first generated two random samples of size n , where XWH(α,λ) and WU(0,tc) , with tc as fixed value. Then, from the obtained samples, we took Tj=min(Xj,Wj),j=1,,n , and defined δj=0 if Tj=Xj or δj=1 if Tj=Wj .

Note that tc has to be selected in order to obtain the desired proportion of censoring. In order to obtain approximately 0.3 and 0.5 proportions of censored data, i.e., 30% and 50% of censorship we considered tc=9.3 and tc=4.8 , respectively. The simulation study is performed by considering ϑ=(2,4) , N=100,000 and n=(20,25, , 100) . To achieve the MLEs considering censored data the maxLik package to maximize the log-likelihood function (15). Different initial values were considered which returned the same estimates. Figure 4 shows the MREs, RMSEs and the coverage probabilities from the estimates of α and λ obtained by using the MC method where α=4 , λ=2 .

The results indicate that the MLE of α has a systematic bias, as shown in Figure 4. On the other hand, from the Bayes estimators, we have observed more accurate results. The coverage probabilities are also close to the nominal levels as the MLE. Therefore, the Bayes estimators are also recommended to perform inference for the parameters of the WH distribution in the presence of censoring.

6 - PREDICTION ANALYSIS

In many application, we may be interested in identifying the time to the next event (failure). In these contexts, predicting future observations plays an important role, especially, when costs are linked with the events. In order to provide a predictive analysis of future events, we consider the Bayesian prediction with WH distribution using observed order statistics.

Several authors have performed prediction analysis from the Bayesian point of view (see, for instance, Pradhan and Kundu (2011), Kundu and Raqab (2012), Asgharzadeh et al. (2015)) for other distributions and the predictive distribution of future failure times and its respective credibility intervals.

We denote the order statistics of T1,,Tm by T(1)<<T(m) . Let T(m+1)<<T(n) be the unobserved future sample and t(m) denote the m -th order statistic. From the Markov property of the conditional order statistics, we have

fTm+k(y|𝐭)=fTm+k|Tm(y|𝐭)=(nm)!f(y)(F(y)F(tm))k1(1F(y))nmk(k1)!(nmk)!(1F(tm))nm, (32)

for y>t(m) . After some algebra we have

fTm+k(y|𝐭)=3(nm)!(k1)!(nmk)!y3α1(αλ)αexp(αλy3)×(γ(α,αλy3)γ(α,αλtm3))k1(Γ(α,αλy3))nmk(Γ(α,αλt(m)3))nm. (33)

The posterior predictive density of T(m+k) given 𝐭 is given by

pTm+k(y|𝐭)=00fTm+k(y|𝐭)p(λ,α|𝐭,𝛅)dλdα,

where p(λ,α|𝐭,𝛅) is given in (31). Consequently, the predictive density of T(m+k) according to y>t(m) is given by

fTm+k*(y|𝐭)=00fTm+k|Tm(y|𝐭)p(λ,α|𝐭,𝛅)dλdα. (34)

We also should note here that one striking result of the Bayesian predictive approach is the advantage in constructing a credibility interval for T(m+k) using the MCMC techniques.

7 - REAL-DATA APPLICATIONS

We examined the adjust of the proposed model on three real datasets to emphasize its flexibility and compare its performance with other distributions. A closer inspection of the following figures shows that the WH distribution exhibits best fit for each dataset when compared to commonly used distributions. In order to discriminate the best fit, we considered some key performance measures, namely, the Akaike information criterion (AIC) given by AIC=2l(ϑ̂;x)+2p, the Corrected Akaike information criterion (AICc) given by AICc=AIC+[2p(p+1)]/np1, and the Bayesian information criterion (BIC) given by BIC=plnn2l(ϑ̂;x), where p is the dimension of ϑ and ϑ̂ is the estimate of the parameters and l(ϑ̂;x) is the logarithm of the likelihood function given in (8) and (14).

7.1 - LIFE TEST IN ELECTRICAL APPLIANCES

Table I presents a well-known dataset available in Lawless (2011), the number of 1000s of cycles to failure for a group of 60 electrical appliances in a life test, whose data are uncensored.

TABLE I Failure times in 60 electrical appliances.

 0.014 0.034 0.059 0.061 0.069 0.080 0.123 0.142 0.381 0.464 0.479 0.556 0.574 0.839 0.917 0.969 1.088 1.091 1.174 1.27 1.275 1.355 1.397 1.477 1.702 1.893 1.932 2.001 2.161 2.292 2.326 2.337 2.811 2.886 2.993 3.122 3.248 3.715 3.790 3.857 4.106 4.116 4.315 4.51 4.580 5.267 5.299 5.583 0.165 0.21 0.991 1.064 1.578 1.649 2.628 2.785 3.912 4.1 6.065 9.701

The results summarized in Table II show that the WH distribution is the best model since resulted in the lowest values of the different discrimination methods.

TABLE II Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the failure time of 60 electrical appliances.

Criteria WH Weibull Gamma Lognormal Logistic
AIC 215.26 218.23 218.02 237.13 248.95
AICc 215.47 218.44 218.23 237.34 249.16
BIC 219.45 222.42 222.21 241.32 253.14

In order to discriminate the best fit, we considered the AIC, AICc, and BIC available in Table II. For the three criteria, we observed that the proposed model has smaller values, indicating a better fit for the WH distribution.

Table III provides the summary statistics: the MAP estimates, standard-deviations (SD) and 95% credibility intervals (CI) for α , λ and y* (future failure time). We can obtain the MAP estimates of the parameters λ and α from (26) and (28), respectively. In order to predict the future failures of an item, as described in Section 6, we considered the posterior predictive density of Tm+k from (34).

TABLE III MAPs, SDs and 95% CIs for α, λ and y* related to the WH distribution.

ϑ MAP SD CI95%ϑ
α 0.2132 0.0306 (0.1593; 0.2772)
λ 43.7261 15.7220 (28.3993; 87.6349)
y* 9.9075 0.6349 (9.7185; 12.1094)

As can be seen from Table I, the last failure was observed at 9.701 cycles. Then, the last line of Table III shows the prediction of the 61st failure will be at y*=9.908 cycles with 95% predictive interval given by (9.7185; 12.1094).

7.2 - FAILURE TIME DATA OF ELECTRICAL COMPONENTS

The following datasets motive the application of WH distribution in the presence of censored data. The first dataset concerns electrical failures in the components from agricultural machines. The follow-up time was a harvesting season. In this period, the electrical components have failed 29 times collected in 2017. The data are shown in Table IV.

TABLE IV Dataset related to the agricultural machines with failure in electrical components (+ indicates censoring).

 1 1 1 2 2 2 2 4 6 8 8 9 11 12 15 21 21 21 21 23 24 27 29 31 36 39 41 45 46 47+ 47+

In order to discriminate the best fit, we used the AIC, AICc, and BIC (see Table V). In all of them, the proposed model has smaller values, indicating a better fit.

TABLE V Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the agricultural machine’s components electrical problems.

Criteria WH Weibull Gamma Lognormal Logistic
AIC 236.57 238.01 237.80 241.34 255.41
AICc 237.03 238.47 238.26 241.80 255.87
BIC 239.31 240.74 240.53 244.00 258.14

Table VIprovides the summary statistics: the MAP estimates, SDs and 95% CIs for α , λ and y* (future failure time). As can be seen from the Table IV the last failure was observed on the 46th day. Then from the last line of Table VI we can see the prediction of the 47th failure that will be at 50.13 50.13 days, with a 95% predictive interval given by (47.27; 78.25).

TABLE VI MAPs, SDs and 95% CIs for α, λ and y* related to the WH distribution.

ϑ MAP SD CI95%ϑ
α 0.2239 0.0480 ( 0.1521; 0.3435)
λ 27733.5 7654.4 (12949.92; 42160.68)
y* 50.1313 8.3780 (47.2707; 78.2493)

Turning now to another type of failure, Table VII presents the time up to corrective maintenance of the agricultural machine. In this case, correctly predicting the next maintenance is of main interest in order to reduce costs.

TABLE VII Dataset related to the agricultural machine’s corrective maintenance (+ indicates censoring).

 1 1 1 1 1 1 1 2 2 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 11 11 11 11 11 11 11 11 13 13+ 13+

From the results presented in Table VIII we observe from the AIC, AICc and BIC that the WH distribution returned better fit for the dataset.

TABLE VIII Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the agricultural machine’s revision.

Criteria WH Weibull Gamma Lognormal Logistic
AIC 441.29 442.52 451.37 469.56 442.29
AICc 441.43 442.66 451.51 469.70 442.43
BIC 446.26 447.50 456.34 474.53 447.27

Table Table IXshows the MAPs, standard deviations, and 95% credible intervals for the parameters of the WH distribution. From Table VII the last failure was observed at 13 days. Applying the proposed methodology, the prediction of the 31st maintenance will be at 13.37 days with 95% predictive interval given by (13.03;16.83).

TABLE IX MAPs, SDs and 95% CIs for α, λ and y* related to agricultural machine’s revision dataset.

ϑ MAP SD CI95%ϑ
α 0.6309 0.0793 (0.4892; 0.7949)
λ 423.5057 60.1510 (332.7807; 564.9964)
y* 13.3713 1.0384 (13.0293; 16.8254)

Figure 5presents the reliability function fitted by different distributions and the Kaplan-Meier curve for the three data sets. In all cases, the WH distribution provides an improved fit for the datasets.

Therefore, the practical importance of the WH distribution is observed for these datasets, since it provides a better fitting in comparison with other essential distributions, allowing to conduct prevision for the next failures.

8 - CONCLUSIONS

In this paper, we revisit the WH distribution and presented mathematical properties of this important distribution, which can be used in situations where the data present bathtub or increasing hazard rate. Initially, the parameter estimators and their asymptotic intervals were explored using the maximum likelihood theory under complete and censored data. Further, we considered an objective Bayesian analysis with an overall reference prior in order to obtain improved estimators, which outperform the ones obtained via maximum likelihood. It is proved that the reference posterior distribution is proper when n1 and the resulting marginal posterior intervals have accurate frequentist coverage. Moreover, we have introduced MAP estimators for the parameters of WH distribution that have closed-form expressions. A simulation study revealed the superiority of the Bayes estimators.

A Bayes prediction of the future failure time was also presented based on observed order statistics. The proposed methodology was applied to three data sets associated with failure time. Many possible extensions of this current work can be considered. This distribution is a promising distribution to be used in studies involving recurrent event data. In this case, we can consider a rate model from a nonhomogeneous Poisson process, where the parametric baseline rate function is a WH rate function. Further, classical point prediction could be studied for this model, for example, maximum likelihood prediction, best-unbiased predictor, conditional median prediction along with prediction interval using Pivotal method and highest conditional density (HCD) method. Our approach should be investigated further in this context.

ACKNOWLEGMENTS

The authors are very grateful to the Editor and the three reviewers for their helpful and useful comments that improved the manuscript. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). Pedro L. Ramos is grateful to the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, Proc. 2017-25971-0).

REFERENCES

ASGHARZADEH A, VALIOLLAHI R and KUNDU D. 2015. Prediction for future failures in weibull distribution under hybrid censoring. J Stat Comput Simul 85(4): 824-838. [ Links ]

BERGER JO et al. 2015. Overall objective priors. Bayesian Anal 10(1): 189-221. [ Links ]

BERNARDO JM. 1979. Reference posterior distributions for bayesian inference. J Royal Stat Soc Series B, p. 113-147. [ Links ]

BERNARDO JM. 2005. Reference analysis. Handbook of Statistics 25: 17-90. [ Links ]

COX DR and REID N. 1987. Parameter orthogonality and approximate conditional inference. J Royal Stat Society Series B 49(1): 1-18. [ Links ]

CRAIN BR and MORGAN RL. 1975. Asymptotic normality of the posterior distribution for exponential models. Ann Stat, p. 223-227. [ Links ]

DEY S, MAZUCHELI J and NADARAJAH S. 2018. Kumaraswamy distribution: different methods of estimation. J Comput Appl Math 37(2): 2094-2111. [ Links ]

FOLLAND GB. 1999. Real analysis: modern techniques and their applications. Wiley, New York, 2nd edition. [ Links ]

ISHIKAWA T, TOTANI T, NISHIMICHI T, TAKAHASHI R, YOSHIDA N and TONEGAWA M. 2014. On the systematic errors of cosmological-scale gravity tests using redshift-space distortion: non-linear effects and the halo bias. Mon Notices Royal Astron Soc 443(4): 3359-3367. [ Links ]

KHODABIN M and AHMADABADI A. 2010. Some properties of generalized gamma distribution. Math Sci 4(1): 9-28. [ Links ]

KUNDU D and RAQAB MZ. 2012. Bayesian inference and prediction of order statistics for a type-ii censored Weibull distribution. J Stat Plan Inference 142(1): 41-47. [ Links ]

LAWLESS JF. 2011. Statistical models and methods for lifetime data. Volume 362, J Wiley \& Sons. [ Links ]

LOUZADA F, RAMOS PL and NASCIMENTO D. 2018. The inverse nakagami-m distribution: A novel approach in reliability. IEEE Trans Rel (99): 1-13. [ Links ]

LOUZADA F, RAMOS PL and RAMOS E. 2019. A note on bias of closed-form estimators for the gamma distribution derived from likelihood equations. Am Stat, p. 1-8. [ Links ]

PRADHAN B and KUNDU D. 2011. Bayes estimation and prediction of the two-parameter gamma distribution. J Stat Comp Simul 81(9): 1187-1198. [ Links ]

RAMOS PL, ACHCAR JA, MOALA FA, RAMOS E and LOUZADA F. 2017. Bayesian analysis of the generalized gamma distribution using non-informative priors. Statistics 51(4): 824-843. [ Links ]

RAMOS PL, LOUZADA F and RAMOS E. 2018. Posterior properties of the nakagami-m distribution using noninformative priors and applications in reliability. IEEE Trans Rel 67(1): 105-117. [ Links ]

STACY EW. 1962. A generalization of the gamma distribution. Ann Math Stat, p. 1187-1192. [ Links ]

TEIMOURI M, HOSEINI SM and NADARAJAH S. 2013. Comparison of estimation methods for the weibull distribution. Statistics 47(1): 93-109. [ Links ]

TIBSHIRANI R. 1989. Noninformative priors for one parameter of many. Biometrika 76(3): 604-608. [ Links ]

WILSON EB and HILFERTY MM. 1931. The distribution of chi-square. Proc Natl Acad Sci 17(12): 684-688. [ Links ]

ZEMZAMI M and BENAABIDATE L. 2016. Improvement of artificial neural networks to predict daily streamflow in a semi-arid area. Hydrol Sci J 61(10): 1801-1812. [ Links ]

Received: January 02, 2019; Accepted: June 10, 2019

Correspondence to: Pedro Luiz Ramos E-mail: pedrolramos@usp.br ORCid: https://orcid.org/0000-0002-5387-2457

MSC 2010 subject classifications: 62F15, 62F10.

AUTHOR CONTRIBUTIONS

PLR conceived the proposed idea and developed the theory, performed the computations, and contributed with the text writing. MPA developed the theory, deal with the applications to real data and collaborated with the text writing. VLDT developed the theory, performed the computations and contributed with the text writing. LF conceived the proposed idea, supervised the findings of this work and contributed with the text writing. All authors discussed the results and contributed to the final manuscript.