Improved Bayes estimators and prediction for the Wilson-Hilferty distribution

PEDRO L. RAMOS MARCO P. ALMEIDA VERA L.D. TOMAZELLA FRANCISCO LOUZADA About the authors

Abstract

Abstract: In this paper, we revisit the Wilson-Hilferty distribution and presented its mathematical properties such as the r-th moments and reliability properties. The parameters estimators are discussed using objective reference Bayesian analysis for both complete and censored data where the resulting marginal posterior intervals have accurate frequentist coverage. A simulation study is presented to compare the performance of the proposed estimators with the frequentist approach where it is observed a clear advantage for the Bayesian method. Finally, the proposed methodology is illustrated on three real datasets.

Key words
Bayesian analysis; Bayesian prediction; censored data; objective prior; Wilson-Hilferty distribution.


1 - INTRODUCTION

In this paper, we revisit the Wilson-Hilferty (WH) distribution, showing that this distribution is appropriate for modeling data with increasing and bathtub hazard rates. This model takes the name of a fundamental statistical technique given by Wilson and Hilferty (1931)WILSON EB and HILFERTY MM. 1931. The distribution of chi-square. Proc Natl Acad Sci 17(12): 684-688., the so-called Wilson-Hilferty transformation. This procedure gives a normal approximation to the cube root of a chi-squared variable, and it is essential for a wide range of scientific processes. Ishikawa et al. (2014)ISHIKAWA T, TOTANI T, NISHIMICHI T, TAKAHASHI R, YOSHIDA N and TONEGAWA M. 2014. On the systematic errors of cosmological-scale gravity tests using redshift-space distortion: non-linear effects and the halo bias. Mon Notices Royal Astron Soc 443(4): 3359-3367. considered such transformation to improve the accuracy of likelihood analysis in the cases where only a small number of modes are available in power spectrum measurements. Zemzami and Benaabidate (2016)ZEMZAMI M and BENAABIDATE L. 2016. Improvement of artificial neural networks to predict daily streamflow in a semi-arid area. Hydrol Sci J 61(10): 1801-1812. considered the WH transformation to reduces the effect of local variations in artificial neural networks. Here, we are not introducing a new distribution but providing common mathematical functions and inferential procedures related to this vital model to be used in different frameworks.

For the WH distribution a critical reparametrization is presented. This approach returns a simple probability density function (PDF) where its parameters are orthogonal in the sense discussed by Cox and Reid (1987)COX DR and REID N. 1987. Parameter orthogonality and approximate conditional inference. J Royal Stat Society Series B 49(1): 1-18.. Besides, we considered the presence of randomly censored data, since it has received particular attention in medical experiments and industrial lifetime testing. To the best of our knowledge, there is no evidence of the use of the WH distribution to describe time-to-event data and reliability applications so far. Therefore, these results are useful contributions to be used by reliability engineers and practitioners of statistical analysis of lifetime data in general.

Due to the limited number of data that are usually observed in reliability application, we consider an objective Bayesian analysis to achieve inference that does not depend on asymptotic results such as the frequentist inference. In this case, a Bayesian approach using an overall reference prior is presented. It is proved that such prior is also a matching prior, i.e., the resulting marginal posterior intervals have accurate frequentist coverage (Tibshirani 1989TIBSHIRANI R. 1989. Noninformative priors for one parameter of many. Biometrika 76(3): 604-608.). Moreover, the resulting posterior is proper and has interesting properties, such as one-to-one invariance, consistent marginalization, and consistent sampling properties.

Moreover, we also propose efficient closed-form maximum a posteriori probability (MAP) estimators for both parameters in the case of complete data, which is essential for practical purposes, since they can be applied for real-time computing estimators in embedded technology. An intensive simulation study is presented to show the usefulness of the Bayesian approach. Finally, a Bayes prediction of the future failure time is presented based on observed order statistics. These results are applied in three lifetime data to demonstrate how the WH distribution can be applied in real situations.

The remainder of this study is organized as follows. Section 2 presents some mathematical properties of the WH distribution. In Section 3, we present the parameter estimates for the model based on Maximum Likelihood Estimator (MLE) for complete and censored data. Section 4 presents the Bayesian estimators for the distribution. Section 5 is devoted to present simulation studies to compare the performance of the estimators. The predictive modeling for the WH distribution is discussed in Section 6. In Section 7, we prove the relevance of the distribution by considering three real lifetime datasets. Finally, Section 8 summarizes the present study.

2 - WILSON-HILFERTY DISTRIBUTION

Let T be a non-negative random variable with WH distribution, then its PDF is

f ( t | ϑ ) = 3 Γ ( α ) ( α λ ) α t 3 α 1 exp ( α λ t 3 ) , (1)

where α>0 and λ>0 are shape and scale parameters, respectively, and ϑ=(α,λ) , hereafter, we will assume that ϑ represents these parameters. This model is a particular case of a three-parameter generalized gamma (GG) distribution proposed by Stacy (1962)STACY EW. 1962. A generalization of the gamma distribution. Ann Math Stat, p. 1187-1192. (replacing 3 by p>0 in equation (1)) that unifies the WH distribution with other important models such as, Weibull distribution, gamma, Nakagami, and the lognormal distribution, to list a few.

The cumulative function of T is given by

F ( t | ϑ ) = 1 Γ ( α ) γ ( α , α λ t 3 ) ,

where γ(y,x)=0xwy1ewdw is known as the lower incomplete gamma function.

The raw moments of the model are given as

E ( T r | ϑ ) = Γ ( α + r / 3 ) Γ ( α ) ( λ α ) r / 3 , for r , (2)

which can also be obtained using a reparametrization in the r-moment of the GG distribution given in Khodabin and Ahmadabadi (2010)KHODABIN M and AHMADABADI A. 2010. Some properties of generalized gamma distribution. Math Sci 4(1): 9-28.. Hence, the mean and variance of (1) are, respectively, given by

E ( T | ϑ ) = Γ ( α + 1 / 3 ) Γ ( α ) ( λ α ) 1 3 and (3)
V a r ( T | ϑ ) = λ ( 1 ( Γ ( α + 1 / 2 ) Γ ( α ) ) 2 ) . (4)

2.1 - QUANTITATIVE METHODS FOR THE RELIABILITY

Throughout this subsection, we present some crucial measures in many areas of science, mainly in reliability engineering. Such quantitative measures for the reliability are: Reliability Function, Failure Rate Function and Mean Residual Life.

2.1.1 - Reliability function and hazard rate function

The reliability function of an observational unity is given by

R ( t | ϑ ) = 1 Γ ( α ) Γ ( α , α λ t 3 ) ,

where Γ(y,x)=xwy1ewdw is known as the upper incomplete gamma function. In general applications, R(t) means the probability that an observation does not fail in the time interval (0,t] .

When dF(t)dt=f(t) exists, the hazard function of a component is obtained by considering h(t|ϑ)=f(t|ϑ)/R(t|ϑ) . So, for the WH distribution, the hazard function is given by

h ( t | ϑ ) = 3 ( α λ ) α t 3 α 1 exp ( α λ t 3 ) Γ ( α , α λ t 3 ) 1 . (5)

A considerable attribute of the function (5) is that it presents three distinct phases, i.e., decreases first, then remains roughly constant and ultimately increases. This result is proved in Theorem 2.1.

Theorem 2.1. For the WH distribution the hazard function h(t|ϑ) is bathtub (increasing) shaped for 0<α<13 (α13) , for all λ>0 .

Proof. Consider that

ω ( t | ϑ ) = d d t log f ( t | ϑ ) = ( 3 α 1 ) t 3 α t λ . (6)

Note that for α13 and λ>0 , ω(t|ϑ) has a decreasing behavior, then from Lemma 2.1.2 presented in Ramos et al. (2018)RAMOS PL, LOUZADA F and RAMOS E. 2018. Posterior properties of the nakagami-m distribution using noninformative priors and applications in reliability. IEEE Trans Rel 67(1): 105-117., h(t|ϑ) is increasing. On the other hand, for 0<α<13 and λ>0 , we have that ω(t|ϑ) has bathtub shape property with a global minimum at t*=λλ3α . Hence, h(t|ϑ) is also bathtub shaped.

The behavior of the hazard function (5) when t0 and t is given, respectively, by

h ( 0 | ϑ ) = { , if α < 1 3 3 λ , if α = 1 3 0 , if α > 1 3 and h ( | ϑ ) = .

2.1.2 - Mean residual life function

The mean residual life (MRL) is an essential measure in reliability and survival analysis applications and represents the expected additional lifetime given that a component has survived until time tt , i.e., it summarizes the entire remaining life from components or systems. This measure gives substantial information for maintenance strategies involving repair and replacement.

Proposition 2.2. The mean residual life function r(t|ϑ) of the WH distribution is given by

r ( t | ϑ ) = 1 R ( t | ϑ ) t y f ( y | ϑ ) ) d y t = ( λ α ) 1 3 ( Γ ( α + 1 3 , α λ t 3 ) Γ ( α , α λ t 3 ) ) t . (7)

The behaviors of the MRL function in (7) when t0 and t are, respectively

r ( 0 | ϑ ) = ( λ α ) 1 3 ( Γ ( α + 1 3 ) Γ ( α ) ) and r ( | ϑ ) = 1 h ( | ϑ ) = 0 .

Theorem 2.3. The MRL function r(t|ϑ) of the WH distribution has either a unimodal or decreasing shape when 0<α<13 , or α13 , respectively, for all λ>0 .

Proof. If α13 and λ>0 , we have that h(t|ϑ) is increasing. In this case, using the Lemma II.4 presented by Ramos et al. (2018), r(t|ϑ) is decreasing. Now, if λ>0 and 0<α<13 , then h(t|ϑ) has a bathtub shape and since h(0)r(0)>1 , from the same Lemma II.4 used above, we have that r(t|ϑ) has a unimodal shape property.

Figure 1 presents different shapes of the hazard and the mean residual life function. The hazard function is characterized by three distinct regions according to a bathtub curve of the hazard rate. The assessment of the impact of each region it is essential to improve the reliability of the components and systems by preventive maintenance or to eliminate early-life failures.

Figure 1
Hazard and mean residual life function shapes for WH distribution considering different values of α and λ .

3 - CLASSICAL INFERENCE

3.1 - COMPLETE DATA

In this section, we use the maximum likelihood method to obtain the classical estimators of the unknown parameters of the WH distribution which may be obtained by direct maximization of the likelihood function. Let T1,,Tn be a random sample such that TiiidWH(λ,α) , then, the likelihood function from (1) is given by

L ( ϑ | 𝐭 ) = 3 n Γ ( α ) n ( α λ ) n α { i = 1 n t i 3 α 1 } exp ( α λ i = 1 n t i 3 ) . (8)

The estimates are obtained by maximizing the likelihood function. From the expressions λlogL(ϑ|𝐭)=0 and αlogL(ϑ|𝐭)=0 , the likelihood equations are given by

n ( 1 + log ( α λ ) ) n ψ ( α ) + 3 i = 1 n log ( t i ) = 1 λ i = 1 n t i 3 (9)

and

n α λ + α λ 3 i = 1 n t i 3 = 0 , (10)

where ψ(k)=klogΓ(k)=Γ(k)Γ(k) is the digamma function. The MLE of λ̂ is given by

λ ̂ = 1 n i = 1 n t i 3 . (11)

Substituting λ̂ in (9), the MLE of α̂ can be obtained by solving

ξ ( α ; 𝐭 ) = log ( 1 n i = 1 n t i 3 ) 3 n i = 1 n log ( t i ) log ( α ) + ψ ( α ) . (12)

Theorem 3.1. Suppose that λ̂=1ni=1nti3 is the MLE of λ . Hence, the root of ξ(α;𝐭)=0 , α̂ , is unique.

Proof. Since log(α)ψ(α) is strictly monotone and continuous with range in (,0), we have that for α>0 there is a unique solution in

log ( α ) ψ ( α ) = log ( 1 n i = 1 n t i 3 ) 1 n i = 1 n log ( t i 3 ) ,

and the proof is completed.

The MLEs are asymptotically normally distributed with a bivariate normal distribution given by

( α ̂ M L E , λ ̂ M L E ) N 2 [ ( α , λ ) , I 1 ( α , λ ) ) ] for n ,

where I(α,λ) is the Fisher information matrix

I ( α , λ ) = n [ ( α ψ ( α ) 1 ) α 0 0 α λ 2 ] , (13)

and ψ(k)=kψ(k) is the trigamma function. These results are useful to construct asymptotic confidence intervals for the parameters of the WH distribution.

3.2 - CENSORED DATA

A fundamental problem that one meets with lifetime data is that available data are a mixture of complete and incomplete observations due to many reasons. The case most easily encountered of incompleteness is random censoring (for more details see Lawless 2011LAWLESS JF. 2011. Statistical models and methods for lifetime data. Volume 362, J Wiley \& Sons.). In this type of censoring there are the follow random variables, the lifetime Ti related to i th individual and the censoring time Ci , such that data observed are (ti,δi) , where ti=min(Ti,Ci) and the censoring indicator δi=I(TiCi) , resulting 1 if the lifetime is observed and 0 otherwise. This type of censoring has as special cases the type I and II censoring mechanisms. We assume that the random censoring times Ci s are independent of Ti s. In this case, the likelihood function for ϑ is given by L(ϑ,𝐭)=i=1nf(ti|ϑ)δiR(ti|ϑ)1δi .

Letting the failure times T1,,Tn be a random sample of WH distribution, the likelihood function considering data with random censoring is given by

L ( α , λ | 𝐭 , 𝛅 ) = 3 d Γ ( α ) n ( α λ ) d α exp ( α λ i = 1 n δ i t i 3 ) × i = 1 n { t i ( 3 α + 1 ) δ i Γ ( α , α λ t i 3 ) 1 δ i } , (14)

where d=i=1nδi . The logarithm of the likelihood function (14) is given by

l ( α , λ | 𝐭 , 𝛅 ) = d log ( 3 ) + d α log ( α λ ) + ( 3 α 1 ) i = 1 n δ i log ( t i ) n log ( Γ ( α ) ) + i = 1 n ( 1 δ i ) log ( Γ ( α , α λ t i 3 ) ) α λ i = 1 n δ i t i 3 , (15)

From l(α,λ|𝐭,𝛅)/α=0 and l(α,λ|𝐭,𝛅)/λ=0 , the likelihood equations are given as follows

d log ( α λ ) + d + 3 i = 1 n δ i log ( t i ) + i = 1 n ( 1 δ i ) Ψ 1 ( α , λ t i 3 ) n ψ ( α ) = 1 λ i = 1 n δ i t i 3 (16)
i = 1 n ( 1 δ i ) Ψ 2 ( α , λ t i 3 ) + i = 1 n δ i α λ 3 t i 3 d α λ = 0 , (17)

where Ψ 1(a,b)= alog Γ (a,ab) and Ψ 2(a,b)= blog Γ (a,ab) can be computed numerically. Numerical methods are required to find the solution of these non-linear equations.

4 - BAYESIAN INFERENCE

In this section, we considered a Bayesian approach to obtain the parameter estimators of the WH distribution. Our interest is to find accurate estimates and prediction (condition/performance in the future). These problems are often challenging to deal with when using classical inference methods. In particular, our setting presents many interest quantities, and we are simultaneously interested in all the parameters of the model and several functions of them.

One may consider the Jeffreys prior, which is obtained by taking the square root from the determinant of the Fisher information matrix in (13), i.e.,

π J ( λ , α ) α ψ ( α ) 1 λ . (18)

On the other hand, the Jeffreys prior may not be a good choice in the multiparametric case Bernardo (2005)BERNARDO JM. 2005. Reference analysis. Handbook of Statistics 25: 17-90.. Moreover, as it will be discussed further, this prior is not a matching prior for both parameters, i.e., the marginal posterior intervals have not accurate frequentist coverage in a sense discussed by Tibshirani (1989). Another well-known class of non-informative priors is then considered, the so-called reference priors Bernardo (1979)BERNARDO JM. 1979. Reference posterior distributions for bayesian inference. J Royal Stat Soc Series B, p. 113-147.. The main idea for obtaining such prior is to maximize the expected Kullback-Leibler divergence between the posterior distribution and the prior distribution. The obtained reference prior provides posterior distributions with interesting properties such as invariance, consistency under marginalization and consistent sampling properties.

In cases where the Fisher information matrix has orthogonal parameters, the following Lemma (see Berger et al. 2015) can be used to easily obtain a one-at-a-time reference prior for any chosen parameter of interest and any ordering of the nuisance parameters in the derivation (hereafter referred to as overall reference prior).

Lemma 4.1. Berger et al. (2015)BERGER JO et al. 2015. Overall objective priors. Bayesian Anal 10(1): 189-221.. Considering the unknown parameters ϑ=(ϑ1,ϑ2) and the posterior distribution p (ϑ1,ϑ2|t) which is asymptotically normally distributed with dispersion matrix S(ϑ1,ϑ2)=I1(ϑ1,ϑ2) . If I(ϑ1,ϑ2) is of the form

I ( ϑ 1 , ϑ 2 ) = diag ( f 1 ( ϑ 1 ) g 1 ( ϑ 2 ) , f 2 ( ϑ 2 ) g 2 ( ϑ 1 ) ) ,

where fi() and gi() are positive functions of ϑi for i=1,2i=1,2 , then the overall reference prior is given by

π R ( ϑ 1 , ϑ 2 ) = f 1 ( ϑ 1 ) f 2 ( ϑ 2 ) .

Consider the posterior distribution with dispersion matrix S(α,λ)=I1(α,λ) is given by

p ( ϑ | 𝐭 ) 1 Γ ( α ) n ( α λ ) n α { i = 1 n t i 3 α } exp ( α λ i = 1 n t i 3 ) .

Since ti3,i=1,,n are constant terms supose that yi=ti3 . Then, we have that

p ( ϑ | 𝐭 ) 1 Γ ( α ) n ( α λ ) n α { i = 1 n y i α } exp ( α λ i = 1 n y i ) .

The posterior above has the same form of the posterior distribution of the gamma model reparametrized by its mean. Crain and Morgan (1975)CRAIN BR and MORGAN RL. 1975. Asymptotic normality of the posterior distribution for exponential models. Ann Stat, p. 223-227. proved that for the exponential family the posterior distribution tends asymptotically to the multivariate normal. Since our model is included in this case, Lemma 4.1 can be used to derive the overall referente prior.

The overall reference prior for the WH distribution is given by

π R ( α , λ ) 1 λ α ψ ( α ) 1 α . (19)

An interesting aspect of this overall reference prior is that such prior satisfies the solutions of both partial differential equations

{ λ { ι 12 π ι 22 ( ι 11 ι 12 2 / ι 22 ) 1 2 } α { π ( ι 11 ι 12 2 / ι 22 ) 1 2 } = 0 α { ι 12 π ι 11 ( ι 22 ι 12 2 / ι 11 ) 1 2 } λ { π ( ι 22 ι 12 2 / ι 11 ) 1 2 } = 0

where ιij(ϑ),i,j=1,2 , are the elements of the Fisher information matrix given in (13). This implies that the credible interval for ϑi,i=1,2 has a coverage error OO ( n^{-1}n1 ) in the frequentist sense, i.e.,

P [ ϑ i ϑ i 1 a ( π ; X ) | ϑ j ] = 1 a O ( n 1 ) ,

where ϑi1a(π;X)|ϑj refers to the (1a) th quantile of the posterior distribution of ϑi . It is important to point out that the Jeffreys prior only satisfies the second partial differential, therefore, such prior is not a matching prior for both parameters, which is undesirable.

4.1 - COMPLETE DATA

The joint posterior distribution for α and λ , produced by the overall reference prior is proportional to the product of the likelihood function (8) and the prior distribution (19), resulting in

p ( α , λ | 𝐭 ) α ψ ( α ) 1 α 1 2 λ Γ ( α ) n ( α λ ) n α { i = 1 n t i 3 α 1 } exp ( α λ i = 1 n t i 3 ) .

The normalizing constant is given by

d ( 𝐭 ) = 𝒜 α ψ ( α ) 1 α 1 2 λ Γ ( α ) n ( α λ ) n α { i = 1 n t i 3 α 1 } exp ( α λ i = 1 n t i 3 ) d ϑ ,

and 𝒜={(0,)×(0,)} is the parameter space for ϑ . The posterior distribution (20) will be proper if and only if d(𝐭)< .

Theorem 4.2. The posterior distribution (20) is proper if and only if n1 .

Proof. Since αψ(α)1α12λΓ(α)n(αλ)nα{i=1nti3α1}exp(αλi=1nti3)0 , by Tonelli theorem (see Folland 1999) we haveFOLLAND GB. 1999. Real analysis: modern techniques and their applications. Wiley, New York, 2nd edition.

d ( 𝐭 ) = 𝒜 α ψ ( α ) 1 α 1 2 λ Γ ( α ) n ( α λ ) n α { i = 1 n t i 3 α 1 } exp ( α λ i = 1 n t i 3 ) d ϑ = 0 0 α ψ ( α ) 1 α 1 2 λ Γ ( α ) n ( α λ ) n α { i = 1 n t i 3 α 1 } exp ( α λ i = 1 n t i 3 ) d λ d α = 0 α ψ ( α ) 1 α 1 2 Γ ( n α ) Γ ( α ) n ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α d α = s 1 ( 𝐭 ) + s 2 ( 𝐭 )

where

s 1 ( 𝐭 ) = 0 1 α ψ ( α ) 1 α 1 2 Γ ( n α ) Γ ( α ) n ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α d α

and

s 2 ( 𝐭 ) = 1 α ψ ( α ) 1 α 1 2 Γ ( n α ) Γ ( α ) n ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α d α .

Ramos et al. (2018) already proved in the context of the Nakagami-m distribution that

lim α 0 + α ψ ( α ) 1 α 1 2 = 1 , lim α α ψ ( α ) 1 α 1 2 = 1 2 and
Γ ( n α ) Γ ( α ) n α n 1 for α ( 0 , c ] and Γ ( n α ) Γ ( α ) n n n α α n 1 2

for α[c,) and for any c>0c>0 . Therefore, we have that

s 1 ( 𝐭 ) = 0 1 α ψ ( α ) 1 α 1 2 Γ ( n α ) Γ ( α ) n ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α d α 0 1 α 1 2 α 1 2 α n 1 ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α d α = 0 1 α n 2 e q ( t ) n α d α <

and

s 2 ( 𝐭 ) = 1 α ψ ( α ) 1 α 1 2 Γ ( n α ) Γ ( α ) n ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α d α 1 α 1 2 α 1 2 n n α α n 1 2 ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α d α = 1 α n 3 2 e p ( t ) n α < ,

where p(t)=log(1ni=1nti3i=1nti3n)>0 and q(t)=log(i=1nti3i=1nti3n)>0 by the inequality of the arithmetic and geometric means. Finally, we obtain d(𝐭)=s1(𝐭)+s2(𝐭)< .

The marginal posterior distribution for α is given by

p ( α | 𝐭 ) α ψ ( α ) 1 α Γ ( n α ) Γ ( α ) n ( i = 1 n t i ) 3 α ( i = 1 n t i 3 ) n α . (21)

The conditional posterior distribution for λ is given by

p ( λ | α , 𝐭 ) IG ( n α , α i = 1 n t i 3 ) , (22)

where IG (,) denotes the inverse gamma distribution with PDF f(y|a,b)=baya1exp(by1)/Γ(a) and shape parameter a and scale parameter b .

4.2 - MAXIMUM A POSTERIORI PROBABILITY (MAP) ESTIMATORS

Firstly, we considered the Bayesian approach to derive efficient closed-form estimators for parameters of the generalized gamma distribution when one of the parameters is known. Let W follows a GG distribution with α,λ and ϕ parameters, then, the likelihood function is given by

L ( ϑ 𝐰 ) = φ n Γ ( α ) n ( α λ ) n α { i = 1 n w i φ α 1 } exp ( α λ i = 1 n w i φ ) . (23)

Although different objective priors could be considered such as Jeffreys’ prior or reference priors (Ramos et al. 2017RAMOS PL, ACHCAR JA, MOALA FA, RAMOS E and LOUZADA F. 2017. Bayesian analysis of the generalized gamma distribution using non-informative priors. Statistics 51(4): 824-843.) to perform inference with this generalized model, they depend on polygamma functions which do not allow us to obtain closed-form estimators. On the other hand, they considered the following objective prior

π ( ϑ ) 1 λ c 1 α c 2 φ c 3 , (24)

where ci0,i=1,2,3 are known hyperparameters. From the likelihood function (23) and the prior distribution (24), the joint posterior distribution for ϑ is given by,

p ( ϑ | w ) = 1 d ( w ) φ n c 3 α c 2 λ c 1 Γ ( α ) n ( α λ ) n α { i = 1 n w i φ α } exp ( α λ i = 1 n w i φ ) , (25)

where d(𝐰) is a normalized constant with parameter space 𝒜={(0,)×(0,)×(ϵ,M)} , 0<ϵ<3 is a small constant and M>3 is a large constant. We select (ϵ,M) for the interval of φ as the interest is in the situation where φ =3 . Hence, any interval (ϵ,M) that contains φ =3 is adequate for our purposes.

To achieve the maximum a posteriori probability estimator of ϑ we consider ϑ ^ MAP=argmaxϑ log (π (ϑ |ω )) . Hence, we have that the MAP estimate of λ is given by

λ ̂ = α ̂ i = 1 n w i φ ̂ n α ̂ + c 1 (26)

As can be noted, (26) will be equal to the MLE (11) if and only if c1=0 , i.e, λ is unbiased when φ =3 . Hence we consider only that c1=0 . Moreover, if c1=0 , then (25) is a proper posterior distribution, i.e., d(𝐰)< (see Louzada et al. 2018LAWLESS JF. 2011. Statistical models and methods for lifetime data. Volume 362, J Wiley \& Sons.). The MAP estimate of α is given by

α ̂ = ( n c 3 ) ( 1 λ ̂ i = 1 n w i φ ̂ log ( w i φ ̂ ) + i = 1 n log ( w i φ ̂ ) ) , (27)

and the MAP estimate for φ is obtained by solving the non-linear equation

log ( α ̂ ) ψ ( α ̂ ) = log ( λ ̂ ) 1 n i = 1 n log ( w i φ ̂ ) + c 2 n α ̂ .

For the WH distribution ( ϕ=3 ), the MAP estimator for λ is λ̂MAP=1ni=1nti3 and the estimate of α is obtained from

α ^ MAP = ( n c 3 ) 1 n i = 1 n t i 3 ( 1 n i = 1 n t i 3 i = 1 n log ( t i 3 ) i = 1 n t i 3 log ( t i 3 ) ) (28)

It is possible to show that the asymptotic variance σ̂α2 and σ̂λ2 of α̂MAP and λ̂MAP are given by

σ ̂ α 2 = ( n c 3 ) 2 n 3 ( α 2 + α 3 ψ ( α + 1 ) ) (29)

and

σ ̂ λ 2 = λ 2 n α (30)

4.3 - CENSORED DATA

In the case of censored data, the joint posterior distribution for α and λ , produced by the overall reference prior is given by

p ( α , λ | 𝐭 , 𝛅 ) α ψ ( α ) 1 α 3 d λ Γ ( α ) n ( α λ ) d α i = 1 n { t i 3 α δ i } × × i = 1 n { Γ ( α , α λ t i 3 ) 1 δ i } exp ( α λ i = 1 n δ i t i 3 ) . (31)

To sample from the joint posterior distribution using the Markov Chain Monte Carlo (MCMC) methods it is necessary to consider the conditional distributions for the parameters. The conditional posterior distribution of α is given by

p ( α | λ , 𝐭 , 𝛅 ) 1 Γ ( α ) n α ψ ( α ) 1 α exp ( α λ i = 1 n δ i t i 3 ) × ( α λ ) d α i = 1 n { t i 3 α δ i Γ ( α , α λ t i 3 ) 1 δ i } .

The conditional posterior distribution of λ is given by

p ( λ | α , 𝐭 , 𝛅 ) ( 1 λ ) d α + 1 i = 1 n Γ ( α , α λ t i 3 ) 1 δ i exp ( α λ i = 1 n δ i t i 3 ) .

Since the marginal distributions do not belong to any known parametric family, the Metropolis-Hastings algorithm was considered in order to generate samples from the marginal posterior. The Gamma distribution was used as transition kernel q(ϑi(j)|ϑi(*),b),i=1,2 , for sampling values of ϑi , where b is the parameter that control the rate of acceptance of the algorithm. In our case, we choose b to be equal to one. However, other higher values can also be considered. To increase the convergence of the MCMC method, we could use the closed-form estimators as a good initial value for λ and α .

The Metropolis-Hastings algorithm operates as follows:

  1. 1. Start with an initial value α(1),λ(1) and set the iteration counter j=1;

  2. 2. Generate a random value α(*) from the proposal Gamma(α(j),b) with PDF given by f(ya,b)=baΓ(a)ya1expby and random values u1 and u2 from an independent uniform in (0,1) ;

  3. 3. Evaluate the acceptance probability

η 1 ( α ( j ) , α ( * ) ) = min ( 1 , p ( α ( * ) | λ ( j ) , 𝐭 , 𝛅 ) p ( α ( j ) | λ ( j ) , 𝐭 , 𝛅 ) q ( α ( j ) α ( * ) , b ) q ( α ( * ) | α ( j ) , b ) ) ;
  1. 4. If η1(α(j),α(*))u1 , then α(j+1)=α(*) . Otherwise, α(j+1)=α(j) ;

  2. 5. Generate a random value λ(*) from the proposal Gamma(λ(j),b) and a random value u from an independent uniform in (0,1) ;

  3. 6. Evaluate the acceptance probability

η 2 ( λ ( j ) , λ ( * ) ) = min ( 1 , p ( λ ( * ) | α ( j + 1 ) , 𝐭 , 𝛅 ) p ( λ ( j ) | α ( j + 1 ) , 𝐭 , 𝛅 ) q ( λ ( j ) | λ ( * ) , b ) q ( λ ( * ) | λ ( j ) , b ) ) ;
  1. 7. If η2(λ(j),λ(*))u2 , then λ(j+1)=λ(*) . Otherwise, λ(j+1)=λ(j) ;

  2. 8. Change the counter from j to j+1 and return to step 2 until convergence is reached.

5 - NUMERICAL ANALYSIS

In this section, the proposed estimation methods are investigated via Monte Carlo simulation in order to compare their efficiency. We used two of the most commonly used measures to assess the performance of the estimators, namely, the mean relative error (MRE) and the root-mean-square error (RMSE), given by MREϑi=1Nj=1Nϑ̂i,jϑi and RMSEϑi=1Nj=1N(ϑ̂i,jϑi)2 , for i=1,2, respectively, where N is the number of estimates and ϑ̂i,j,,j=1,,N is the estimates obtained through the MLE and MAP estimators. The MRE and the RMSE of the λ are the same for the different estimation procedures.

Taking into account this approach, the most efficient estimators are those ones yielding MREs closer to one with smaller RMSEs. It is also computed the 95% coverage probability ( CP95% ) of the credibility intervals (CI) obtained from the Bayesian estimators and the confidence intervals from the classical approach for α and λ . Under this approach, the estimators with best coverage probabilities will show the frequencies of intervals that cover the true values of ϑ closer to the nominal level. The samples of size n were generated by assuming TWH(λ,α) . The results were computed by using the software R. Overall, the main aim of this section is to compared the Bayes estimators with the classical approach in order to verify if our proposed estimators return more accurate results. This approach has been considered for other models such as Gamma (Louzada et al. 2019LOUZADA F, RAMOS PL and NASCIMENTO D. 2018. The inverse nakagami-m distribution: A novel approach in reliability. IEEE Trans Rel (99): 1-13.), Weibull (Teimouri et al. 2013TEIMOURI M, HOSEINI SM and NADARAJAH S. 2013. Comparison of estimation methods for the weibull distribution. Statistics 47(1): 93-109.), Kumaraswamy (Dey et al. 2018DEY S, MAZUCHELI J and NADARAJAH S. 2018. Kumaraswamy distribution: different methods of estimation. J Comput Appl Math 37(2): 2094-2111.) and the GG distribution (Ramos et al. 2017), to list a few.

5.1 - COMPLETE DATA

Recall that it was presented a class of MAP estimators for α that depends on c3 . Therefore, before going on with the simulation study, it is necessary to find a value for c3 in which the MRE is closer to one. Figure 2 presents the MREs for αMAP considering different values of c3 , for ϑ=(3,4) and ϑ=(10,4) and n=20 . We observed that a good choice is c3=2.9 . Therefore, we considered such value in (28), where n3 . Since both estimators obtained from the reference posterior and the closed-form Bayes estimators came from the Bayesian approach, they will be referred to as Bayes and CBayes estimators, respectively. To achieve the MLEs, numerical techniques must be used to solve the non-linear equation (12), the uniroot function available in R was considered to find the estimate, and as provided theoretically, we find a unique solution in all situations.

Figure 2
MREs for α considering c3=(2,2.1,,4) , α=3 , λ=4 and n=20 (left panel) α=10 , λ=4 and n=20 (right panel) for N=100,000 simulated samples.

Figure 3 shows the MREs, RMSEs and CPs for the estimates of α and λ obtained by using the Monte Carlo (MC) method where α=(4,0.5) , λ=2 and n=(15,20,,100) . The horizontal lines in the figures correspond to the minimum MREs and RMSEs, equal to one and zero, respectively. Notice that for λ we have the same estimator using the three different estimators for complete data. In this case, both MRE and RMSE are similar. On the other hand, the confidence and credibility intervals are computed differently for each method. We observe that the estimates of the parameters are asymptotically unbiased, i.e., the MREs tend to one when n increases and the RMSEs decrease to zero. However, the Bayes estimators, obtained from the reference posterior, outperform the other estimation procedures and present extremely efficient estimates for both parameters even for small sample sizes. It worth mentioning that if one needs fast computation, the CBayes returned similar estimates when compared to the Bayes estimators, obtained by considering reference priors.

Figure 3
MREs, RMSEs and CPs for α and \lambdaλ considering \alpha=4, \lambda=2α=4,λ=2 (upper panel) and \alpha=0.5, \lambda=2α=0.5,λ=2 (lower panel) for N=100,000 simulated samples and n=(15,20,\ldots,100)n=(15,20,,100) .

5.2 - CENSORED DATA

In this section, we considered the Bayes estimators in the presence of randomly censored data. The censored data were generated as follows. We first generated two random samples of size n , where XWH(α,λ) and WU(0,tc) , with tc as fixed value. Then, from the obtained samples, we took Tj=min(Xj,Wj),j=1,,n , and defined δj=0 if Tj=Xj or δj=1 if Tj=Wj .

Note that tc has to be selected in order to obtain the desired proportion of censoring. In order to obtain approximately 0.3 and 0.5 proportions of censored data, i.e., 30% and 50% of censorship we considered tc=9.3 and tc=4.8 , respectively. The simulation study is performed by considering ϑ=(2,4) , N=100,000 and n=(20,25, , 100) . To achieve the MLEs considering censored data the maxLik package to maximize the log-likelihood function (15). Different initial values were considered which returned the same estimates. Figure 4 shows the MREs, RMSEs and the coverage probabilities from the estimates of α and λ obtained by using the MC method where α=4 , λ=2 .

Figure 4
MREs, RMSEs and CPs for α and λ considering α=4,λ=2 for N=100,000 simulated samples, 30% (upper panel) and 50% (lower panel) of censoring and n=(20,25,,100) .

The results indicate that the MLE of α has a systematic bias, as shown in Figure 4. On the other hand, from the Bayes estimators, we have observed more accurate results. The coverage probabilities are also close to the nominal levels as the MLE. Therefore, the Bayes estimators are also recommended to perform inference for the parameters of the WH distribution in the presence of censoring.

6 - PREDICTION ANALYSIS

In many application, we may be interested in identifying the time to the next event (failure). In these contexts, predicting future observations plays an important role, especially, when costs are linked with the events. In order to provide a predictive analysis of future events, we consider the Bayesian prediction with WH distribution using observed order statistics.

Several authors have performed prediction analysis from the Bayesian point of view (see, for instance, Pradhan and Kundu (2011)PRADHAN B and KUNDU D. 2011. Bayes estimation and prediction of the two-parameter gamma distribution. J Stat Comp Simul 81(9): 1187-1198., Kundu and Raqab (2012)KUNDU D and RAQAB MZ. 2012. Bayesian inference and prediction of order statistics for a type-ii censored Weibull distribution. J Stat Plan Inference 142(1): 41-47., Asgharzadeh et al. (2015)ASGHARZADEH A, VALIOLLAHI R and KUNDU D. 2015. Prediction for future failures in weibull distribution under hybrid censoring. J Stat Comput Simul 85(4): 824-838.) for other distributions and the predictive distribution of future failure times and its respective credibility intervals.

We denote the order statistics of T1,,Tm by T(1)<<T(m) . Let T(m+1)<<T(n) be the unobserved future sample and t(m) denote the m -th order statistic. From the Markov property of the conditional order statistics, we have

f T m + k ( y | 𝐭 ) = f T m + k | T m ( y | 𝐭 ) = ( n m ) ! f ( y ) ( F ( y ) F ( t m ) ) k 1 ( 1 F ( y ) ) n m k ( k 1 ) ! ( n m k ) ! ( 1 F ( t m ) ) n m , (32)

for y>t(m) . After some algebra we have

f T m + k ( y | 𝐭 ) = 3 ( n m ) ! ( k 1 ) ! ( n m k ) ! y 3 α 1 ( α λ ) α exp ( α λ y 3 ) × ( γ ( α , α λ y 3 ) γ ( α , α λ t m 3 ) ) k 1 ( Γ ( α , α λ y 3 ) ) n m k ( Γ ( α , α λ t ( m ) 3 ) ) n m . (33)

The posterior predictive density of T(m+k) given 𝐭 is given by

p T m + k ( y | 𝐭 ) = 0 0 f T m + k ( y | 𝐭 ) p ( λ , α | 𝐭 , 𝛅 ) d λ d α ,

where p(λ,α|𝐭,𝛅) is given in (31). Consequently, the predictive density of T(m+k) according to y>t(m) is given by

f T m + k * ( y | 𝐭 ) = 0 0 f T m + k | T m ( y | 𝐭 ) p ( λ , α | 𝐭 , 𝛅 ) d λ d α . (34)

We also should note here that one striking result of the Bayesian predictive approach is the advantage in constructing a credibility interval for T(m+k) using the MCMC techniques.

7 - REAL-DATA APPLICATIONS

We examined the adjust of the proposed model on three real datasets to emphasize its flexibility and compare its performance with other distributions. A closer inspection of the following figures shows that the WH distribution exhibits best fit for each dataset when compared to commonly used distributions. In order to discriminate the best fit, we considered some key performance measures, namely, the Akaike information criterion (AIC) given by AIC=2l(ϑ̂;x)+2p, the Corrected Akaike information criterion (AICc) given by AICc=AIC+[2p(p+1)]/np1, and the Bayesian information criterion (BIC) given by BIC=plnn2l(ϑ̂;x), where p is the dimension of ϑ and ϑ̂ is the estimate of the parameters and l(ϑ̂;x) is the logarithm of the likelihood function given in (8) and (14).

7.1 - LIFE TEST IN ELECTRICAL APPLIANCES

Table I presents a well-known dataset available in Lawless (2011), the number of 1000s of cycles to failure for a group of 60 electrical appliances in a life test, whose data are uncensored.

TABLE I
Failure times in 60 electrical appliances.

The results summarized in Table II show that the WH distribution is the best model since resulted in the lowest values of the different discrimination methods.

TABLE II
Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the failure time of 60 electrical appliances.

In order to discriminate the best fit, we considered the AIC, AICc, and BIC available in Table II. For the three criteria, we observed that the proposed model has smaller values, indicating a better fit for the WH distribution.

Table III provides the summary statistics: the MAP estimates, standard-deviations (SD) and 95% credibility intervals (CI) for α , λ and y* (future failure time). We can obtain the MAP estimates of the parameters λ and α from (26) and (28), respectively. In order to predict the future failures of an item, as described in Section 6, we considered the posterior predictive density of Tm+k from (34).

TABLE III
MAPs, SDs and 95% CIs for α, λ and y* related to the WH distribution.

As can be seen from Table I, the last failure was observed at 9.701 cycles. Then, the last line of Table III shows the prediction of the 61st failure will be at y*=9.908 cycles with 95% predictive interval given by (9.7185; 12.1094).

7.2 - FAILURE TIME DATA OF ELECTRICAL COMPONENTS

The following datasets motive the application of WH distribution in the presence of censored data. The first dataset concerns electrical failures in the components from agricultural machines. The follow-up time was a harvesting season. In this period, the electrical components have failed 29 times collected in 2017. The data are shown in Table IV.

TABLE IV
Dataset related to the agricultural machines with failure in electrical components (+ indicates censoring).

In order to discriminate the best fit, we used the AIC, AICc, and BIC (see Table V). In all of them, the proposed model has smaller values, indicating a better fit.

TABLE V
Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the agricultural machine’s components electrical problems.

Table VIprovides the summary statistics: the MAP estimates, SDs and 95% CIs for α , λ and y* (future failure time). As can be seen from the Table IV the last failure was observed on the 46th day. Then from the last line of Table VI we can see the prediction of the 47th failure that will be at 50.1350.13 days, with a 95% predictive interval given by (47.27; 78.25).

TABLE VI
MAPs, SDs and 95% CIs for α, λ and y* related to the WH distribution.

Turning now to another type of failure, Table VII presents the time up to corrective maintenance of the agricultural machine. In this case, correctly predicting the next maintenance is of main interest in order to reduce costs.

TABLE VII
Dataset related to the agricultural machine’s corrective maintenance (+ indicates censoring).

From the results presented in Table VIII we observe from the AIC, AICc and BIC that the WH distribution returned better fit for the dataset.

TABLE VIII
Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the agricultural machine’s revision.

Table Table IXshows the MAPs, standard deviations, and 95% credible intervals for the parameters of the WH distribution. From Table VII the last failure was observed at 13 days. Applying the proposed methodology, the prediction of the 31st maintenance will be at 13.37 days with 95% predictive interval given by (13.03;16.83).

TABLE IX
MAPs, SDs and 95% CIs for α, λ and y* related to agricultural machine’s revision dataset.

Figure 5presents the reliability function fitted by different distributions and the Kaplan-Meier curve for the three data sets. In all cases, the WH distribution provides an improved fit for the datasets.

Figure 5
Reliability function adjusted by different distributions superimposed to the empirical reliability function. Right-hand panel: Estimated hazard function.

Therefore, the practical importance of the WH distribution is observed for these datasets, since it provides a better fitting in comparison with other essential distributions, allowing to conduct prevision for the next failures.

8 - CONCLUSIONS

In this paper, we revisit the WH distribution and presented mathematical properties of this important distribution, which can be used in situations where the data present bathtub or increasing hazard rate. Initially, the parameter estimators and their asymptotic intervals were explored using the maximum likelihood theory under complete and censored data. Further, we considered an objective Bayesian analysis with an overall reference prior in order to obtain improved estimators, which outperform the ones obtained via maximum likelihood. It is proved that the reference posterior distribution is proper when n1 and the resulting marginal posterior intervals have accurate frequentist coverage. Moreover, we have introduced MAP estimators for the parameters of WH distribution that have closed-form expressions. A simulation study revealed the superiority of the Bayes estimators.

A Bayes prediction of the future failure time was also presented based on observed order statistics. The proposed methodology was applied to three data sets associated with failure time. Many possible extensions of this current work can be considered. This distribution is a promising distribution to be used in studies involving recurrent event data. In this case, we can consider a rate model from a nonhomogeneous Poisson process, where the parametric baseline rate function is a WH rate function. Further, classical point prediction could be studied for this model, for example, maximum likelihood prediction, best-unbiased predictor, conditional median prediction along with prediction interval using Pivotal method and highest conditional density (HCD) method. Our approach should be investigated further in this context.

ACKNOWLEGMENTS

The authors are very grateful to the Editor and the three reviewers for their helpful and useful comments that improved the manuscript. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). Pedro L. Ramos is grateful to the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, Proc. 2017-25971-0).

REFERENCES

  • ASGHARZADEH A, VALIOLLAHI R and KUNDU D. 2015. Prediction for future failures in weibull distribution under hybrid censoring. J Stat Comput Simul 85(4): 824-838.
  • BERGER JO et al. 2015. Overall objective priors. Bayesian Anal 10(1): 189-221.
  • BERNARDO JM. 1979. Reference posterior distributions for bayesian inference. J Royal Stat Soc Series B, p. 113-147.
  • BERNARDO JM. 2005. Reference analysis. Handbook of Statistics 25: 17-90.
  • COX DR and REID N. 1987. Parameter orthogonality and approximate conditional inference. J Royal Stat Society Series B 49(1): 1-18.
  • CRAIN BR and MORGAN RL. 1975. Asymptotic normality of the posterior distribution for exponential models. Ann Stat, p. 223-227.
  • DEY S, MAZUCHELI J and NADARAJAH S. 2018. Kumaraswamy distribution: different methods of estimation. J Comput Appl Math 37(2): 2094-2111.
  • FOLLAND GB. 1999. Real analysis: modern techniques and their applications. Wiley, New York, 2nd edition.
  • ISHIKAWA T, TOTANI T, NISHIMICHI T, TAKAHASHI R, YOSHIDA N and TONEGAWA M. 2014. On the systematic errors of cosmological-scale gravity tests using redshift-space distortion: non-linear effects and the halo bias. Mon Notices Royal Astron Soc 443(4): 3359-3367.
  • KHODABIN M and AHMADABADI A. 2010. Some properties of generalized gamma distribution. Math Sci 4(1): 9-28.
  • KUNDU D and RAQAB MZ. 2012. Bayesian inference and prediction of order statistics for a type-ii censored Weibull distribution. J Stat Plan Inference 142(1): 41-47.
  • LAWLESS JF. 2011. Statistical models and methods for lifetime data. Volume 362, J Wiley \& Sons.
  • LOUZADA F, RAMOS PL and NASCIMENTO D. 2018. The inverse nakagami-m distribution: A novel approach in reliability. IEEE Trans Rel (99): 1-13.
  • LOUZADA F, RAMOS PL and RAMOS E. 2019. A note on bias of closed-form estimators for the gamma distribution derived from likelihood equations. Am Stat, p. 1-8.
  • PRADHAN B and KUNDU D. 2011. Bayes estimation and prediction of the two-parameter gamma distribution. J Stat Comp Simul 81(9): 1187-1198.
  • RAMOS PL, ACHCAR JA, MOALA FA, RAMOS E and LOUZADA F. 2017. Bayesian analysis of the generalized gamma distribution using non-informative priors. Statistics 51(4): 824-843.
  • RAMOS PL, LOUZADA F and RAMOS E. 2018. Posterior properties of the nakagami-m distribution using noninformative priors and applications in reliability. IEEE Trans Rel 67(1): 105-117.
  • STACY EW. 1962. A generalization of the gamma distribution. Ann Math Stat, p. 1187-1192.
  • TEIMOURI M, HOSEINI SM and NADARAJAH S. 2013. Comparison of estimation methods for the weibull distribution. Statistics 47(1): 93-109.
  • TIBSHIRANI R. 1989. Noninformative priors for one parameter of many. Biometrika 76(3): 604-608.
  • WILSON EB and HILFERTY MM. 1931. The distribution of chi-square. Proc Natl Acad Sci 17(12): 684-688.
  • ZEMZAMI M and BENAABIDATE L. 2016. Improvement of artificial neural networks to predict daily streamflow in a semi-arid area. Hydrol Sci J 61(10): 1801-1812.

Publication Dates

  • Publication in this collection
    19 Aug 2019
  • Date of issue
    2019

History

  • Received
    02 Jan 2019
  • Accepted
    10 June 2019
Academia Brasileira de Ciências Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro RJ Brasil, Tel: +55 21 2533-6274, +55 21 2532-0562 - Rio de Janeiro - RJ - Brazil
E-mail: aabc@abc.org.br