Improved Bayes estimators and prediction for the Wilson-Hilferty distribution

RAMOS, PEDRO L.; ALMEIDA, MARCO P.; TOMAZELLA, VERA L.D.; LOUZADA, FRANCISCO

doi:10.1590/0001-3765201920190002

Abstract

In this paper, we revisit the Wilson-Hilferty distribution and present its mathematical properties, such as the r-th moments and reliability properties. Parameter estimators are discussed using objective reference Bayesian analysis for both complete and censored data, where the resulting marginal posterior intervals have accurate frequentist coverage. A simulation study is presented to compare the performance of the proposed estimators with the frequentist approach, and a clear advantage is observed for the Bayesian method. Finally, the proposed methodology is illustrated on three real datasets.

Key words
Bayesian analysis; Bayesian prediction; censored data; objective prior; Wilson-Hilferty distribution.

1 - INTRODUCTION

In this paper, we revisit the Wilson-Hilferty (WH) distribution, showing that this distribution is appropriate for modeling data with increasing and bathtub-shaped hazard rates. The model takes its name from a fundamental statistical technique given by Wilson and Hilferty (1931), the so-called Wilson-Hilferty transformation. This procedure gives a normal approximation to the cube root of a chi-squared variable, and it is essential for a wide range of scientific processes. Ishikawa et al. (2014) considered this transformation to improve the accuracy of likelihood analysis in cases where only a small number of modes are available in power spectrum measurements. Zemzami and Benaabidate (2016) considered the WH transformation to reduce the effect of local variations in artificial neural networks. Here, we are not introducing a new distribution but providing common mathematical functions and inferential procedures related to this important model, to be used in different frameworks.

For the WH distribution, a critical reparametrization is presented. This approach returns a simple probability density function (PDF) whose parameters are orthogonal in the sense discussed by Cox and Reid (1987). In addition, we consider the presence of randomly censored data, since it has received particular attention in medical experiments and industrial lifetime testing. To the best of our knowledge, the WH distribution has not yet been used to describe time-to-event data or in reliability applications. Therefore, these results are useful contributions for reliability engineers and practitioners of statistical analysis of lifetime data in general.

Due to the limited amount of data usually observed in reliability applications, we consider an objective Bayesian analysis to achieve inference that, unlike frequentist inference, does not depend on asymptotic results. In this case, a Bayesian approach using an overall reference prior is presented. It is proved that this prior is also a matching prior, i.e., the resulting marginal posterior intervals have accurate frequentist coverage (Tibshirani 1989). Moreover, the resulting posterior is proper and has interesting properties, such as one-to-one invariance, consistent marginalization, and consistent sampling properties.

Moreover, we also propose efficient closed-form maximum a posteriori probability (MAP) estimators for both parameters in the case of complete data, which is essential for practical purposes, since they can be used to compute estimates in real time in embedded technology. An intensive simulation study is presented to show the usefulness of the Bayesian approach. Finally, a Bayes prediction of the future failure time is presented based on observed order statistics. These results are applied to three lifetime datasets to demonstrate how the WH distribution can be used in real situations.

The remainder of this study is organized as follows. Section 2 presents some mathematical properties of the WH distribution. In Section 3, we present the parameter estimates for the model based on the maximum likelihood estimator (MLE) for complete and censored data. Section 4 presents the Bayesian estimators for the distribution. Section 5 presents simulation studies to compare the performance of the estimators. The predictive modeling for the WH distribution is discussed in Section 6. In Section 7, we illustrate the relevance of the distribution by considering three real lifetime datasets. Finally, Section 8 summarizes the present study.

2 - WILSON-HILFERTY DISTRIBUTION

Let $T$ be a non-negative random variable with WH distribution, then its PDF is

f (t | ϑ) = \frac{3}{Γ (α)} {(\frac{α}{λ})}^{α} t^{3 α - 1} exp (- \frac{α}{λ} t^{3}),

(1)

where $α > 0$ and $λ > 0$ are shape and scale parameters, respectively, and $ϑ = (α, λ)$ ; hereafter, $ϑ$ denotes these parameters. This model is a particular case of the three-parameter generalized gamma (GG) distribution proposed by Stacy (1962) (obtained by replacing 3 by $p > 0$ in equation (1)), which unifies the WH distribution with other important models such as the Weibull, gamma, Nakagami, and lognormal distributions, to list a few.

The cumulative function of $T$ is given by

F (t | ϑ) = \frac{1}{Γ (α)} γ (α, \frac{α}{λ} t^{3}),

where $γ (y, x) = \int_{0}^{x} w^{y - 1} e^{- w} d w$ is known as the lower incomplete gamma function.

The raw moments of the model are given as

E (T^{r} | ϑ) = \frac{Γ (α + r / 3)}{Γ (α)} {(\frac{λ}{α})}^{r / 3}, for r \in ℕ,

(2)

which can also be obtained using a reparametrization in the r-th moment of the GG distribution given in Khodabin and Ahmadabadi (2010). Hence, the mean and variance of (1) are, respectively, given by

E (T | ϑ) = \frac{Γ (α + 1 / 3)}{Γ (α)} {(\frac{λ}{α})}^{\frac{1}{3}} and

(3)

Var (T | ϑ) = {(\frac{λ}{α})}^{\frac{2}{3}} (\frac{Γ (α + 2 / 3)}{Γ (α)} - {(\frac{Γ (α + 1 / 3)}{Γ (α)})}^{2}) .

(4)
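The moment formula (2) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the parameter values are arbitrary, and the variance is computed as $E (T^{2}) - E (T)^{2}$ directly from (2). It also uses the fact, which follows from (1) by a change of variable, that $T^{3}$ has a gamma distribution with shape $α$ and scale $λ / α$ .

```python
import math
import random


def wh_raw_moment(r, alpha, lam):
    """r-th raw moment E(T^r) from equation (2), computed on the log scale."""
    log_m = (math.lgamma(alpha + r / 3.0) - math.lgamma(alpha)
             + (r / 3.0) * math.log(lam / alpha))
    return math.exp(log_m)


def wh_mean_var(alpha, lam):
    """Mean and variance, the latter as E(T^2) - E(T)^2."""
    m1 = wh_raw_moment(1, alpha, lam)
    m2 = wh_raw_moment(2, alpha, lam)
    return m1, m2 - m1 * m1


def wh_sample(n, alpha, lam, rng=random):
    """Draw n values of T: T^3 ~ Gamma(shape=alpha, scale=lam/alpha)."""
    return [rng.gammavariate(alpha, lam / alpha) ** (1.0 / 3.0) for _ in range(n)]


if __name__ == "__main__":
    alpha, lam = 2.0, 1.5  # arbitrary illustrative values
    print(wh_mean_var(alpha, lam))
    print(wh_raw_moment(3, alpha, lam))  # E(T^3) reduces to lam
```

Setting $r = 3$ in (2) gives $E (T^{3}) = λ$ , which is why the sample mean of $t_{i}^{3}$ appears later as a natural estimator of $λ$ .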

2.1 - QUANTITATIVE METHODS FOR THE RELIABILITY

Throughout this subsection, we present some crucial measures in many areas of science, mainly in reliability engineering. Such quantitative measures for the reliability are: Reliability Function, Failure Rate Function and Mean Residual Life.

2.1.1 - Reliability function and hazard rate function

The reliability function of an observational unit is given by

R (t | ϑ) = \frac{1}{Γ (α)} Γ (α, \frac{α}{λ} t^{3}),

where $Γ (y, x) = \int_{x}^{\infty} w^{y - 1} e^{- w} d w$ is known as the upper incomplete gamma function. In general applications, $R (t)$ means the probability that an observation does not fail in the time interval $(0, t]$ .

When $\frac{d F (t)}{d t} = f (t)$ exists, the hazard function of a component is obtained by considering $h (t | ϑ) = f (t | ϑ) / R (t | ϑ)$ . So, for the WH distribution, the hazard function is given by

h (t | ϑ) = 3 {(\frac{α}{λ})}^{α} t^{3 α - 1} exp (- \frac{α}{λ} t^{3}) Γ {(α, \frac{α}{λ} t^{3})}^{- 1} .

(5)

A notable attribute of the function (5) is that it can present three distinct phases, i.e., it decreases first, then remains roughly constant and ultimately increases. This result is proved in Theorem 2.1.

Theorem 2.1. For the WH distribution the hazard function $h (t | ϑ)$ is bathtub (increasing) shaped for $0 < α < \frac{1}{3}$ $(α \geq \frac{1}{3})$ , for all $λ > 0$ .

Proof. Consider that

ω (t | ϑ) = - \frac{d}{d t} log f (t | ϑ) = - \frac{(3 α - 1)}{t} + \frac{3 α t^{2}}{λ} .

(6)

Note that for $α \geq \frac{1}{3}$ and $λ > 0$ , $ω (t | ϑ)$ is increasing, since $\frac{d}{d t} ω (t | ϑ) = \frac{3 α - 1}{t^{2}} + \frac{6 α t}{λ} > 0$ ; then, from Lemma 2.1.2 presented in Ramos et al. (2018), $h (t | ϑ)$ is increasing. On the other hand, for $0 < α < \frac{1}{3}$ and $λ > 0$ , $ω (t | ϑ)$ is bathtub shaped with a global minimum at $t^{*} = {(\frac{λ (1 - 3 α)}{6 α})}^{\frac{1}{3}}$ . Hence, $h (t | ϑ)$ is also bathtub shaped.

The behavior of the hazard function (5) when $t \to 0$ and $t \to \infty$ is given, respectively, by

h (0 | ϑ) = \begin{cases} \infty, & if α < \frac{1}{3} \\ \frac{3}{Γ (1 / 3)} {(3 λ)}^{- \frac{1}{3}}, & if α = \frac{1}{3} \\ 0, & if α > \frac{1}{3} \end{cases} and h (\infty | ϑ) = \infty .
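The shapes described in Theorem 2.1 can be inspected numerically. The following is a minimal sketch, not the authors' implementation: the helper `upper_reg_gamma` is a hypothetical name for a standard series/continued-fraction evaluation of the regularized upper incomplete gamma function, and `wh_hazard` evaluates (5) as $f / R$ .

```python
import math


def upper_reg_gamma(a, x):
    """Regularized upper incomplete gamma Q(a, x) = Γ(a, x) / Γ(a), for a > 0, x > 0."""
    log_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        k = a
        for _ in range(1000):
            k += 1.0
            term *= x / k
            total += term
            if term < total * 1e-16:
                break
        return 1.0 - total * math.exp(log_pref)
    # Continued fraction (modified Lentz) for Q(a, x) when x >= a + 1.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_pref)


def wh_hazard(t, alpha, lam):
    """Hazard h(t) = f(t) / R(t) of equation (5), for t > 0."""
    x = (alpha / lam) * t ** 3
    log_f = (math.log(3.0) - math.lgamma(alpha) + alpha * math.log(alpha / lam)
             + (3.0 * alpha - 1.0) * math.log(t) - x)
    return math.exp(log_f) / upper_reg_gamma(alpha, x)


if __name__ == "__main__":
    for alpha in (0.2, 1.0):  # below and above the 1/3 threshold of Theorem 2.1
        print(alpha, [round(wh_hazard(t, alpha, 1.0), 4) for t in (0.05, 0.5, 1.0, 2.0)])
```

For $α = 1$ the WH hazard reduces to $3 t^{2} / λ$ , which gives a convenient sanity check for the numerical routine.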

2.1.2 - Mean residual life function

The mean residual life (MRL) is an essential measure in reliability and survival analysis applications and represents the expected additional lifetime given that a component has survived until time $t$ , i.e., it summarizes the entire remaining life of components or systems. This measure gives substantial information for maintenance strategies involving repair and replacement.

Proposition 2.2. The mean residual life function $r (t | ϑ)$ of the WH distribution is given by

\begin{matrix} r (t | ϑ) & = \frac{1}{R (t | ϑ)} \int_{t}^{\infty} y f (y | ϑ) d y - t \\ & = {(\frac{λ}{α})}^{\frac{1}{3}} (\frac{Γ (α + \frac{1}{3}, \frac{α}{λ} t^{3})}{Γ (α, \frac{α}{λ} t^{3})}) - t . \end{matrix}

(7)

The behaviors of the MRL function in (7) when $t \to 0$ and $t \to \infty$ are, respectively

r (0 | ϑ) = {(\frac{λ}{α})}^{\frac{1}{3}} (\frac{Γ (α + \frac{1}{3})}{Γ (α)}) and r (\infty | ϑ) = \frac{1}{h (\infty | ϑ)} = 0 .

Theorem 2.3. The MRL function $r (t | ϑ)$ of the WH distribution has either a unimodal or decreasing shape when $0 < α < \frac{1}{3}$ , or $α \geq \frac{1}{3}$ , respectively, for all $λ > 0$ .

Proof. If $α \geq \frac{1}{3}$ and $λ > 0$ , we have that $h (t | ϑ)$ is increasing. In this case, using the Lemma II.4 presented by Ramos et al. (2018), $r (t | ϑ)$ is decreasing. Now, if $λ > 0$ and $0 < α < \frac{1}{3}$ , then $h (t | ϑ)$ has a bathtub shape and since $h (0) r (0) > 1$ , from the same Lemma II.4 used above, we have that $r (t | ϑ)$ has a unimodal shape property.

Figure 1 presents different shapes of the hazard and the mean residual life functions. The hazard function can be characterized by three distinct regions, following the bathtub curve of the hazard rate. Assessing the impact of each region is essential to improve the reliability of components and systems through preventive maintenance or by eliminating early-life failures.

Figure 1
Hazard and mean residual life function shapes for WH distribution considering different values of $α$ and $λ$ .

3 - CLASSICAL INFERENCE

3.1 - COMPLETE DATA

In this section, we use the maximum likelihood method to obtain the classical estimators of the unknown parameters of the WH distribution which may be obtained by direct maximization of the likelihood function. Let $T_{1}, \dots, T_{n}$ be a random sample such that $T_{i} \overset{i i d}{\sim} WH (λ, α)$ , then, the likelihood function from (1) is given by

L (ϑ | 𝐭) = \frac{3^{n}}{Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} t_{i}^{3 α - 1}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} t_{i}^{3}) .

(8)

The estimates are obtained by maximizing the likelihood function. From the expressions $\frac{\partial}{\partial α} log L (ϑ | 𝐭) = 0$ and $\frac{\partial}{\partial λ} log L (ϑ | 𝐭) = 0$ , the likelihood equations are given by

n (1 + log (\frac{α}{λ})) - n ψ (α) + 3 \sum_{i = 1}^{n} log (t_{i}) = \frac{1}{λ} \sum_{i = 1}^{n} t_{i}^{3}

(9)

and

- \frac{n α}{λ} + \frac{α}{λ^{2}} \sum_{i = 1}^{n} t_{i}^{3} = 0,

(10)

where $ψ (k) = \frac{\partial}{\partial k} log Γ (k) = \frac{Γ' (k)}{Γ (k)}$ is the digamma function. The MLE of $λ$ is given by

\hat{λ} = \frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3} .

(11)

Substituting $\hat{λ}$ in (9), the MLE of $α$ , denoted by $\hat{α}$ , can be obtained by solving $ξ (α; 𝐭) = 0$ , where

ξ (α; 𝐭) = log (\frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3}) - \frac{3}{n} \sum_{i = 1}^{n} log (t_{i}) - log (α) + ψ (α) .

(12)

Theorem 3.1. Suppose that $\hat{λ} = \frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3}$ is the MLE of $λ$ . Hence, the root of $ξ (α; 𝐭) = 0$ , $\hat{α}$ , is unique.

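Proof. Since $log (α) - ψ (α)$ is strictly decreasing and continuous with range $(0, \infty)$ , and the right-hand side below is positive by the inequality of the arithmetic and geometric means (provided the $t_{i}$ are not all equal), we have that for $α > 0$ there is a unique solution of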

log (α) - ψ (α) = log (\frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3}) - \frac{1}{n} \sum_{i = 1}^{n} log (t_{i}^{3}),

and the proof is completed.
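Because the root is unique and $log (α) - ψ (α)$ is monotone, a simple bracketing method is enough to compute $\hat{α}$ . The sketch below is one possible implementation under stated assumptions: `digamma` is a hypothetical helper based on the standard recurrence and asymptotic expansion, and bisection is chosen for robustness rather than speed.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def wh_mle(times):
    """MLE of (alpha, lam) for complete data: lam from (11), alpha as the root of (12)."""
    n = len(times)
    lam_hat = sum(t ** 3 for t in times) / n
    # Right-hand side of the equation in the proof of Theorem 3.1.
    rhs = math.log(lam_hat) - sum(math.log(t ** 3) for t in times) / n
    if rhs <= 0.0:
        raise ValueError("observations are all equal; the shape MLE is not finite")
    lo, hi = 1e-8, 1.0
    while math.log(hi) - digamma(hi) > rhs:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bisection on the decreasing map a -> log(a) - psi(a)
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), lam_hat


if __name__ == "__main__":
    print(wh_mle([0.8, 1.1, 1.3, 0.9, 1.6]))  # made-up data for illustration
```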

The MLEs are asymptotically normally distributed with a bivariate normal distribution given by

({\hat{α}}_{M L E}, {\hat{λ}}_{M L E}) \sim N_{2} [(α, λ), I^{- 1} (α, λ)] for n \to \infty,

where $I (α, λ)$ is the Fisher information matrix

I (α, λ) = n [\begin{matrix} \frac{(α ψ' (α) - 1)}{α} & 0 \\ 0 & \frac{α}{λ^{2}} \end{matrix}],

(13)

and $ψ' (k) = \frac{\partial}{\partial k} ψ (k)$ is the trigamma function. These results are useful to construct asymptotic confidence intervals for the parameters of the WH distribution.
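Since the Fisher information (13) is diagonal, the asymptotic standard errors are simply the inverse square roots of its diagonal entries evaluated at the MLEs. A minimal sketch follows; `trigamma` is a hypothetical helper using the standard recurrence and asymptotic series, and the default `z` is the 97.5% standard normal quantile.

```python
import math


def trigamma(x):
    """Trigamma function for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi'(x) = psi'(x + 1) + 1/x^2
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + inv + 0.5 * inv2
            + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0)))


def wh_asymptotic_ci(alpha_hat, lam_hat, n, z=1.959963984540054):
    """Wald-type intervals from the diagonal Fisher information (13)."""
    info_alpha = n * (alpha_hat * trigamma(alpha_hat) - 1.0) / alpha_hat
    info_lam = n * alpha_hat / lam_hat ** 2
    se_alpha = 1.0 / math.sqrt(info_alpha)
    se_lam = 1.0 / math.sqrt(info_lam)
    return ((alpha_hat - z * se_alpha, alpha_hat + z * se_alpha),
            (lam_hat - z * se_lam, lam_hat + z * se_lam))


if __name__ == "__main__":
    print(wh_asymptotic_ci(2.0, 1.5, 50))  # illustrative inputs, not fitted values
```

For small $n$ the lower limits of such intervals can fall below zero, which is one practical motivation for the Bayesian intervals discussed later.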

3.2 - CENSORED DATA

A fundamental problem with lifetime data is that the available data are often a mixture of complete and incomplete observations, for many reasons. The most commonly encountered form of incompleteness is random censoring (for more details see Lawless 2011). In this type of censoring, there are the following random variables: the lifetime $T_{i}$ of the $i$ th individual and the censoring time $C_{i}$ , such that the observed data are $(t_{i}, δ_{i})$ , where $t_{i} = min (T_{i}, C_{i})$ and the censoring indicator $δ_{i} = I (T_{i} \leq C_{i})$ equals 1 if the lifetime is observed and 0 otherwise. This type of censoring has as special cases the type I and II censoring mechanisms. We assume that the random censoring times $C_{i}$ are independent of the $T_{i}$ . In this case, the likelihood function for $ϑ$ is given by $L (ϑ, 𝐭) = \prod_{i = 1}^{n} f (t_{i} | ϑ)^{δ_{i}} R (t_{i} | ϑ)^{1 - δ_{i}}$ .

Letting the failure times $T_{1}, \dots, T_{n}$ be a random sample of WH distribution, the likelihood function considering data with random censoring is given by

\begin{matrix} L (α, λ | 𝐭, 𝛅) = & \frac{3^{d}}{Γ (α)^{n}} {(\frac{α}{λ})}^{d α} exp (- \frac{α}{λ} \sum_{i = 1}^{n} δ_{i} t_{i}^{3}) \times \\ & \prod_{i = 1}^{n} {t_{i}^{(3 α - 1) δ_{i}} Γ {(α, \frac{α}{λ} t_{i}^{3})}^{1 - δ_{i}}}, \end{matrix}

(14)

where $d = \sum_{i = 1}^{n} δ_{i}$ . The logarithm of the likelihood function (14) is given by

\begin{matrix} l (α, λ | 𝐭, 𝛅) = & d log (3) + d α log (\frac{α}{λ}) + (3 α - 1) \sum_{i = 1}^{n} δ_{i} log (t_{i}) - n log (Γ (α)) \\ + \sum_{i = 1}^{n} (1 - δ_{i}) log (Γ (α, \frac{α}{λ} t_{i}^{3})) - \frac{α}{λ} \sum_{i = 1}^{n} δ_{i} t_{i}^{3}, \end{matrix}

(15)

From $\partial l (α, λ | 𝐭, 𝛅) / \partial α = 0$ and $\partial l (α, λ | 𝐭, 𝛅) / \partial λ = 0$ , the likelihood equations are given as follows

\begin{matrix} d log (\frac{α}{λ}) + d + 3 \sum_{i = 1}^{n} δ_{i} log (t_{i}) + \sum_{i = 1}^{n} (1 - δ_{i}) Ψ_{1} (α, λ; t_{i}) - n ψ (α) = \frac{1}{λ} \sum_{i = 1}^{n} δ_{i} t_{i}^{3} \end{matrix}

(16)

\sum_{i = 1}^{n} (1 - δ_{i}) Ψ_{2} (α, λ; t_{i}) + \frac{α}{λ^{2}} \sum_{i = 1}^{n} δ_{i} t_{i}^{3} - \frac{d α}{λ} = 0,

(17)

where $Ψ_{1} (α, λ; t) = \frac{\partial}{\partial α} \log Γ (α, \frac{α}{λ} t^{3})$ and $Ψ_{2} (α, λ; t) = \frac{\partial}{\partial λ} \log Γ (α, \frac{α}{λ} t^{3})$ can be computed numerically. Numerical methods are required to find the solution of these non-linear equations.

4 - BAYESIAN INFERENCE

In this section, we consider a Bayesian approach to obtain the parameter estimators of the WH distribution. Our interest is in finding accurate estimates and predictions (condition/performance in the future). These problems are often challenging to deal with using classical inference methods. In particular, our setting presents many quantities of interest, and we are simultaneously interested in all the parameters of the model and several functions of them.

One may consider the Jeffreys prior, which is obtained by taking the square root from the determinant of the Fisher information matrix in (13), i.e.,

π_{J} (λ, α) \propto \frac{\sqrt{α ψ' (α) - 1}}{λ} .

(18)

On the other hand, the Jeffreys prior may not be a good choice in the multiparametric case (Bernardo 2005). Moreover, as will be discussed further, this prior is not a matching prior for both parameters, i.e., the marginal posterior intervals do not have accurate frequentist coverage in the sense discussed by Tibshirani (1989). Another well-known class of non-informative priors is then considered, the so-called reference priors (Bernardo 1979). The main idea for obtaining such a prior is to maximize the expected Kullback-Leibler divergence between the posterior distribution and the prior distribution. The resulting reference prior provides posterior distributions with interesting properties such as invariance, consistency under marginalization and consistent sampling properties.

In cases where the Fisher information matrix has orthogonal parameters, the following Lemma (see Berger et al. 2015) can be used to easily obtain a one-at-a-time reference prior for any chosen parameter of interest and any ordering of the nuisance parameters in the derivation (hereafter referred to as overall reference prior).

Lemma 4.1 (Berger et al. 2015). Consider the unknown parameters $ϑ = (ϑ_{1}, ϑ_{2})$ and the posterior distribution $p (ϑ_{1}, ϑ_{2} | 𝐭)$ , which is asymptotically normally distributed with dispersion matrix $S (ϑ_{1}, ϑ_{2}) = I^{- 1} (ϑ_{1}, ϑ_{2})$ . If $I (ϑ_{1}, ϑ_{2})$ is of the form

I (ϑ_{1}, ϑ_{2}) = diag (f_{1} (ϑ_{1}) g_{1} (ϑ_{2}), f_{2} (ϑ_{2}) g_{2} (ϑ_{1})),

where $f_{i} (\cdot)$ and $g_{i} (\cdot)$ are positive functions of $ϑ_{i}$ for $i = 1, 2$ , then the overall reference prior is given by

π_{R} (ϑ_{1}, ϑ_{2}) = \sqrt{f_{1} (ϑ_{1}) f_{2} (ϑ_{2})} .

Consider the posterior distribution with dispersion matrix $S (α, λ) = I^{- 1} (α, λ)$ , given by

p (ϑ | 𝐭) \propto \frac{1}{Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} t_{i}^{3 α}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} t_{i}^{3}) .

Since $t_{i}^{3}, i = 1, \dots, n$ are constant terms, let $y_{i} = t_{i}^{3}$ . Then, we have that

p (ϑ | 𝐭) \propto \frac{1}{Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} y_{i}^{α}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} y_{i}) .

The posterior above has the same form as the posterior distribution of the gamma model reparametrized by its mean. Crain and Morgan (1975) proved that, for the exponential family, the posterior distribution tends asymptotically to the multivariate normal. Since our model is included in this case, Lemma 4.1 can be used to derive the overall reference prior.

The overall reference prior for the WH distribution is given by

π_{R} (α, λ) \propto \frac{1}{λ} \sqrt{\frac{α ψ' (α) - 1}{α}} .

(19)
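As a worked check, the factorization required by Lemma 4.1 can be read directly from the Fisher information (13). The assignment of the constant $n$ to the $g_{i}$ terms below is only one convenient bookkeeping choice, since constants do not affect the prior.

```latex
I(\alpha,\lambda)=\operatorname{diag}\bigl(f_1(\alpha)\,g_1(\lambda),\; f_2(\lambda)\,g_2(\alpha)\bigr),
\qquad
f_1(\alpha)=\frac{\alpha\psi'(\alpha)-1}{\alpha},\quad g_1(\lambda)=n,\quad
f_2(\lambda)=\frac{1}{\lambda^{2}},\quad g_2(\alpha)=n\alpha,

\pi_R(\alpha,\lambda)\propto\sqrt{f_1(\alpha)\,f_2(\lambda)}
=\frac{1}{\lambda}\sqrt{\frac{\alpha\psi'(\alpha)-1}{\alpha}},
\qquad
\pi_J(\alpha,\lambda)\propto\sqrt{\det I(\alpha,\lambda)}
\propto\frac{\sqrt{\alpha\psi'(\alpha)-1}}{\lambda}
=\alpha^{1/2}\,\pi_R(\alpha,\lambda).
```

The two priors therefore differ only by the factor $α^{1 / 2}$ , which comes from $g_{2} (α)$ being kept in the Jeffreys prior (18) and dropped in the overall reference prior.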

An interesting aspect of this overall reference prior is that such prior satisfies the solutions of both partial differential equations

\begin{matrix} \frac{\partial}{\partial λ} {\frac{ι_{12} π}{ι_{22} {(ι_{11} - ι_{12}^{2} / ι_{22})}^{\frac{1}{2}}}} - \frac{\partial}{\partial α} {\frac{π}{{(ι_{11} - ι_{12}^{2} / ι_{22})}^{\frac{1}{2}}}} = 0 \\ \frac{\partial}{\partial α} {\frac{ι_{12} π}{ι_{11} {(ι_{22} - ι_{12}^{2} / ι_{11})}^{\frac{1}{2}}}} - \frac{\partial}{\partial λ} {\frac{π}{{(ι_{22} - ι_{12}^{2} / ι_{11})}^{\frac{1}{2}}}} = 0 \end{matrix}

where $ι_{i j} (ϑ), i, j = 1, 2$ , are the elements of the Fisher information matrix given in (13). This implies that the credible interval for $ϑ_{i}, i = 1, 2$ has a coverage error $O (n^{- 1})$ in the frequentist sense, i.e.,

P [ϑ_{i} \leq ϑ_{i}^{1 - a} (π; X) | ϑ_{- j}] = 1 - a - O (n^{- 1}),

where $ϑ_{i}^{1 - a} (π; X) | ϑ_{- j}$ refers to the $(1 - a)$ th quantile of the posterior distribution of $ϑ_{i}$ . It is important to point out that the Jeffreys prior only satisfies the second partial differential equation; therefore, it is not a matching prior for both parameters, which is undesirable.

4.1 - COMPLETE DATA

The joint posterior distribution for $α$ and $λ$ , produced by the overall reference prior is proportional to the product of the likelihood function (8) and the prior distribution (19), resulting in

\begin{matrix} p (α, λ | 𝐭) \propto & \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}} λ Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} t_{i}^{3 α - 1}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} t_{i}^{3}) . \end{matrix}

(20)

The normalizing constant is given by

\begin{matrix} d (𝐭) = & \int_{𝒜} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}} λ Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} t_{i}^{3 α - 1}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} t_{i}^{3}) d ϑ, \end{matrix}

where $𝒜 = {(0, \infty) \times (0, \infty)}$ is the parameter space for $ϑ$ . The posterior distribution (20) is proper if and only if $d (𝐭) < \infty$ .

Theorem 4.2. The posterior distribution (20) is proper if and only if $n \geq 2$ .

Proof. Since $\frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}} λ Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} t_{i}^{3 α - 1}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} t_{i}^{3}) \geq 0$ , by the Tonelli theorem (see Folland 1999) we have

\begin{matrix} d (𝐭) = & \int_{𝒜} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}} λ Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} t_{i}^{3 α - 1}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} t_{i}^{3}) d ϑ \\ = & \int_{0}^{\infty} \int_{0}^{\infty} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}} λ Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} t_{i}^{3 α - 1}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} t_{i}^{3}) d λ d α \\ = & \int_{0}^{\infty} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}}} \frac{Γ (n α)}{Γ (α)^{n}} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} d α \\ = & s_{1} (𝐭) + s_{2} (𝐭) \end{matrix}

where

s_{1} (𝐭) = \int_{0}^{1} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}}} \frac{Γ (n α)}{Γ (α)^{n}} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} d α

and

s_{2} (𝐭) = \int_{1}^{\infty} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}}} \frac{Γ (n α)}{Γ (α)^{n}} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} d α .

Ramos et al. (2018) already proved in the context of the Nakagami-m distribution that

\begin{matrix} lim_{α \to 0^{+}} \frac{\sqrt{α ψ' (α) - 1}}{α^{- \frac{1}{2}}} = 1, lim_{α \to \infty} \frac{\sqrt{α ψ' (α) - 1}}{α^{- \frac{1}{2}}} = \frac{1}{\sqrt{2}} and \end{matrix}

\frac{Γ (n α)}{Γ (α)^{n}} \propto α^{n - 1} for α \in (0, c] and \frac{Γ (n α)}{Γ (α)^{n}} \propto n^{n α} α^{\frac{n - 1}{2}}

for $α \in [c, \infty)$ and for any $c > 0$ . Therefore, we have that

\begin{matrix} s_{1} (𝐭) & = \int_{0}^{1} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}}} \frac{Γ (n α)}{Γ (α)^{n}} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} d α \\ \propto \int_{0}^{1} \frac{α^{- \frac{1}{2}}}{α^{\frac{1}{2}}} α^{n - 1} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} d α \\ = \int_{0}^{1} α^{n - 2} e^{- q (t) n α} d α < \infty \end{matrix}

and

\begin{matrix} s_{2} (𝐭) & = \int_{1}^{\infty} \frac{\sqrt{α ψ' (α) - 1}}{α^{\frac{1}{2}}} \frac{Γ (n α)}{Γ (α)^{n}} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} d α \\ & \propto \int_{1}^{\infty} \frac{α^{- \frac{1}{2}}}{α^{\frac{1}{2}}} n^{n α} α^{\frac{n - 1}{2}} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} d α \\ & = \int_{1}^{\infty} α^{\frac{n - 3}{2}} e^{- p (t) n α} d α < \infty, \end{matrix}

where $p (t) = log (\frac{\frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3}}{\sqrt[n]{\prod_{i = 1}^{n} t_{i}^{3}}}) > 0$ and $q (t) = log (\frac{\sum_{i = 1}^{n} t_{i}^{3}}{\sqrt[n]{\prod_{i = 1}^{n} t_{i}^{3}}}) > 0$ by the inequality of the arithmetic and geometric means. Finally, we obtain $d (𝐭) = s_{1} (𝐭) + s_{2} (𝐭) < \infty$ .

The marginal posterior distribution for $α$ is given by

p (α | 𝐭) \propto \sqrt{\frac{α ψ' (α) - 1}{α}} \frac{Γ (n α)}{Γ (α)^{n}} \frac{{(\prod_{i = 1}^{n} t_{i})}^{3 α}}{{(\sum_{i = 1}^{n} t_{i}^{3})}^{n α}} .

(21)

The conditional posterior distribution for $λ$ is given by

p (λ | α, 𝐭) \sim IG (n α, α \sum_{i = 1}^{n} t_{i}^{3}),

(22)

where IG $(\cdot, \cdot)$ denotes the inverse gamma distribution with PDF $f (y | a, b) = b^{a} y^{- a - 1} \exp (- b y^{- 1}) / Γ (a)$ , shape parameter $a$ and scale parameter $b$ .
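The factorization into (21) and (22) suggests a direct way to draw from the joint posterior without MCMC: sample $α$ from its marginal and then $λ$ from the inverse gamma conditional. The sketch below is an assumption-laden illustration, not the procedure used in the paper: the marginal (21) is approximated on a finite grid (`grid_max` and `grid_n` are hypothetical tuning values), and `trigamma` is a hypothetical helper.

```python
import math
import random


def trigamma(x):
    """Trigamma function for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + inv + 0.5 * inv2
            + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0)))


def log_marginal_alpha(alpha, n, sum_log_t3, log_sum_t3):
    """Log of the unnormalised marginal posterior (21)."""
    return (0.5 * math.log((alpha * trigamma(alpha) - 1.0) / alpha)
            + math.lgamma(n * alpha) - n * math.lgamma(alpha)
            + alpha * sum_log_t3 - n * alpha * log_sum_t3)


def sample_posterior(times, size, rng, grid_max=50.0, grid_n=2000):
    """Approximate joint draws (alpha, lam): grid on (21), then exact draws from (22)."""
    n = len(times)
    t3 = [t ** 3 for t in times]
    s = sum(t3)
    sum_log = sum(math.log(v) for v in t3)
    grid = [grid_max * (k + 0.5) / grid_n for k in range(grid_n)]  # midpoints
    logw = [log_marginal_alpha(a, n, sum_log, math.log(s)) for a in grid]
    top = max(logw)
    weights = [math.exp(v - top) for v in logw]
    draws = []
    for a in rng.choices(grid, weights=weights, k=size):
        # lam | alpha ~ IG(n*alpha, alpha*s): invert a Gamma(shape, scale=1/rate) draw.
        g = rng.gammavariate(n * a, 1.0 / (a * s))
        draws.append((a, 1.0 / g))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    draws = sample_posterior([0.8, 1.1, 1.3, 0.9, 1.6], 1000, rng)  # made-up data
    print(sum(a for a, _ in draws) / len(draws))
```

The grid truncates the support of $α$ at `grid_max`, so it should be widened when the data suggest a large shape parameter.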

4.2 - MAXIMUM A POSTERIORI PROBABILITY (MAP) ESTIMATORS

First, we consider the Bayesian approach to derive efficient closed-form estimators for the parameters of the generalized gamma distribution when one of the parameters is known. Let $W$ follow a GG distribution with parameters $α, λ$ and $φ$ ; then, the likelihood function is given by

L (ϑ ∣ 𝐰) = \frac{φ^{n}}{Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} w_{i}^{φ α - 1}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} w_{i}^{φ}) .

(23)

Although different objective priors, such as the Jeffreys prior or reference priors (Ramos et al. 2017), could be considered to perform inference with this generalized model, they depend on polygamma functions, which do not allow us to obtain closed-form estimators. Instead, the following objective prior is considered

π (ϑ) \propto \frac{1}{λ^{c_{1}} α^{c_{2}} φ^{c_{3}}},

(24)

where $c_{i} \geq 0, i = 1, 2, 3$ are known hyperparameters. From the likelihood function (23) and the prior distribution (24), the joint posterior distribution for $ϑ$ is given by,

p (ϑ | w) = \frac{1}{d (w)} \frac{φ^{n - c_{3}}}{α^{c_{2}} λ^{c_{1}} Γ (α)^{n}} {(\frac{α}{λ})}^{n α} {\prod_{i = 1}^{n} w_{i}^{φ α}} \exp (- \frac{α}{λ} \sum_{i = 1}^{n} w_{i}^{φ}),

(25)

where $d (𝐰)$ is a normalizing constant over the parameter space $𝒜 = {(0, \infty) \times (0, \infty) \times (ϵ, M)}$ , $0 < ϵ < 3$ is a small constant and $M > 3$ is a large constant. We select $(ϵ, M)$ as the interval for $φ$ because the interest is in the situation where $φ = 3$ . Hence, any interval $(ϵ, M)$ that contains $φ = 3$ is adequate for our purposes.

To obtain the maximum a posteriori probability estimator of $ϑ$ we consider ${\hat{ϑ}}_{M A P} = \underset{ϑ}{\arg \max} \log (p (ϑ | 𝐰))$ . Hence, the MAP estimate of $λ$ is given by

\hat{λ} = \frac{\hat{α} \sum_{i = 1}^{n} w_{i}^{\hat{φ}}}{n \hat{α} + c_{1}} .

(26)

As can be noted, (26) will be equal to the MLE (11) if and only if $c_{1} = 0$ , in which case $\hat{λ}$ is unbiased when $φ = 3$ . Hence, we consider only $c_{1} = 0$ . Moreover, if $c_{1} = 0$ , then (25) is a proper posterior distribution, i.e., $d (𝐰) < \infty$ (see Louzada et al. 2018). The MAP estimate of $α$ is given by

\hat{α} = \frac{(n - c_{3})}{(\frac{1}{\hat{λ}} \sum_{i = 1}^{n} w_{i}^{\hat{φ}} log (w_{i}^{\hat{φ}}) - \sum_{i = 1}^{n} log (w_{i}^{\hat{φ}}))},

(27)

and the MAP estimate for $φ$ is obtained by solving the non-linear equation

log (\hat{α}) - ψ (\hat{α}) = log (\hat{λ}) - \frac{1}{n} \sum_{i = 1}^{n} log (w_{i}^{\hat{φ}}) + \frac{c_{2}}{n \hat{α}} .

For the WH distribution ( $φ = 3$ ), the MAP estimator for $λ$ is ${\hat{λ}}_{M A P} = \frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3}$ and the estimate of $α$ is obtained from

{\hat{α}}_{MAP} = \frac{(n - c_{3}) \frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3}}{(\sum_{i = 1}^{n} t_{i}^{3} \log (t_{i}^{3}) - \frac{1}{n} \sum_{i = 1}^{n} t_{i}^{3} \sum_{i = 1}^{n} \log (t_{i}^{3}))} .

(28)
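Because both estimators are closed-form, they can be computed in a single pass over the data. The sketch below is illustrative: the function name is hypothetical, the default `c3=0.0` is an arbitrary placeholder rather than a recommended value, and the denominator is arranged as $\sum t_{i}^{3} \log (t_{i}^{3}) - \hat{λ} \sum \log (t_{i}^{3})$ , which is non-negative, so that the shape estimate is positive.

```python
import math


def wh_map_closed_form(times, c3=0.0):
    """Closed-form estimates (alpha_hat, lam_hat) for complete WH data."""
    n = len(times)
    t3 = [t ** 3 for t in times]
    lam_hat = sum(t3) / n
    # sum(y*log y) - mean(y)*sum(log y) >= 0 for positive y, with equality
    # only when all observations coincide.
    denom = (sum(v * math.log(v) for v in t3)
             - lam_hat * sum(math.log(v) for v in t3))
    if denom <= 0.0:
        raise ValueError("observations are all equal; alpha cannot be estimated")
    alpha_hat = (n - c3) * lam_hat / denom
    return alpha_hat, lam_hat


if __name__ == "__main__":
    print(wh_map_closed_form([0.8, 1.1, 1.3, 0.9, 1.6]))  # made-up data
```

No iterative solver or special function is needed, which is what makes these estimators attractive as starting values for numerical procedures.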

It is possible to show that the asymptotic variances ${\hat{σ}}_{α}^{2}$ and ${\hat{σ}}_{λ}^{2}$ of ${\hat{α}}_{M A P}$ and ${\hat{λ}}_{M A P}$ are given by

\begin{matrix} {\hat{σ}}_{α}^{2} = \frac{(n - c_{3})^{2}}{n^{3}} (α^{2} + α^{3} ψ^{'} (α + 1)) \end{matrix}

(29)

and

\begin{matrix} {\hat{σ}}_{λ}^{2} = \frac{λ^{2}}{n α} . \end{matrix}

(30)

4.3 - CENSORED DATA

In the case of censored data, the joint posterior distribution for $α$ and $λ$ , produced by the overall reference prior is given by

\begin{matrix} p (α, λ | 𝐭, 𝛅) \propto & \sqrt{\frac{α ψ' (α) - 1}{α}} \frac{3^{d}}{λ Γ (α)^{n}} {(\frac{α}{λ})}^{d α} \prod_{i = 1}^{n} {t_{i}^{3 α δ_{i}}} \times \\ & \prod_{i = 1}^{n} {Γ {(α, \frac{α}{λ} t_{i}^{3})}^{1 - δ_{i}}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} δ_{i} t_{i}^{3}) . \end{matrix}

(31)

To sample from the joint posterior distribution using the Markov Chain Monte Carlo (MCMC) methods it is necessary to consider the conditional distributions for the parameters. The conditional posterior distribution of $α$ is given by

\begin{matrix} p (α | λ, 𝐭, 𝛅) \propto & \frac{1}{Γ (α)^{n}} \sqrt{\frac{α ψ' (α) - 1}{α}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} δ_{i} t_{i}^{3}) \times \\ & {(\frac{α}{λ})}^{d α} \prod_{i = 1}^{n} {t_{i}^{3 α δ_{i}} Γ {(α, \frac{α}{λ} t_{i}^{3})}^{1 - δ_{i}}} . \end{matrix}

The conditional posterior distribution of $λ$ is given by

\begin{matrix} p (λ | α, 𝐭, 𝛅) \propto & {(\frac{1}{λ})}^{d α + 1} \prod_{i = 1}^{n} Γ {(α, \frac{α}{λ} t_{i}^{3})}^{1 - δ_{i}} exp (- \frac{α}{λ} \sum_{i = 1}^{n} δ_{i} t_{i}^{3}) . \end{matrix}

Since the conditional distributions do not belong to any known parametric family, the Metropolis-Hastings algorithm was used to generate samples from the marginal posterior distributions. The gamma distribution was used as the transition kernel $q (ϑ_{i}^{(j)} | ϑ_{i}^{(*)}, b), i = 1, 2$ , for sampling values of $ϑ_{i}$ , where $b$ is the parameter that controls the acceptance rate of the algorithm. In our case, we choose $b$ equal to one; however, other values can also be considered. To speed up the convergence of the MCMC method, the closed-form estimators can be used as good initial values for $λ$ and $α$ .

The Metropolis-Hastings algorithm operates as follows:

1. Start with an initial value $α^{(1)}, λ^{(1)}$ and set the iteration counter $j = 1;$
2. Generate a random value $α^{(*)}$ from the proposal $Gamma (α^{(j)}, b)$ with PDF given by $f (y ∣ a, b) = \frac{b^{a}}{Γ (a)} y^{a - 1} \exp (- b y)$ and random values $u_{1}$ and $u_{2}$ from independent uniform distributions on $(0, 1)$ ;
3. Evaluate the acceptance probability

η_{1} (α^{(j)}, α^{(*)}) = min (1, \frac{p (α^{(*)} | λ^{(j)}, 𝐭, 𝛅)}{p (α^{(j)} | λ^{(j)}, 𝐭, 𝛅)} \frac{q (α^{(j)} ∣ α^{(*)}, b)}{q (α^{(*)} | α^{(j)}, b)});

4. If $η_{1} (α^{(j)}, α^{(*)}) \geq u_{1}$ , then $α^{(j + 1)} = α^{(*)}$ . Otherwise, $α^{(j + 1)} = α^{(j)}$ ;
5. Generate a random value $λ^{(*)}$ from the proposal $Gamma (λ^{(j)}, b)$ ;
6. Evaluate the acceptance probability

η_{2} (λ^{(j)}, λ^{(*)}) = min (1, \frac{p (λ^{(*)} | α^{(j + 1)}, 𝐭, 𝛅)}{p (λ^{(j)} | α^{(j + 1)}, 𝐭, 𝛅)} \frac{q (λ^{(j)} | λ^{(*)}, b)}{q (λ^{(*)} | λ^{(j)}, b)});

7. If $η_{2} (λ^{(j)}, λ^{(*)}) \geq u_{2}$ , then $λ^{(j + 1)} = λ^{(*)}$ . Otherwise, $λ^{(j + 1)} = λ^{(j)}$ ;
8. Change the counter from $j$ to $j + 1$ and return to step 2 until convergence is reached.
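The steps above can be sketched in code. The following is a minimal Python illustration, not the authors' R implementation, restricted to the uncensored special case ($δ_{i} = 1$ for all $i$, so that $d = n$ and the incomplete gamma factors drop out). The log-posterior is assembled from the WH likelihood and the prior factor $\frac{1}{λ} \sqrt{(α ψ' (α) - 1) / α}$ read off the conditional posteriors; the helper names (`trigamma`, `log_posterior`, `mh_step`, `mh_sampler`), the starting values and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def trigamma(x):
    """psi'(x) for x > 0: shift x upward, then use the asymptotic series."""
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv, inv2 = 1.0 / x, 1.0 / (x * x)
    series = 1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0))
    return total + inv + 0.5 * inv2 + inv * inv2 * series


def log_posterior(alpha, lam, n, sum_t3, sum_log_t):
    """Log joint posterior up to a constant, uncensored data (all delta_i = 1)."""
    log_prior = 0.5 * math.log((alpha * trigamma(alpha) - 1.0) / alpha) - math.log(lam)
    log_lik = (n * alpha * math.log(alpha / lam) - n * math.lgamma(alpha)
               + 3.0 * alpha * sum_log_t - alpha * sum_t3 / lam)
    return log_prior + log_lik


def log_gamma_pdf(y, a, b):
    """Log of the proposal density f(y | a, b) with shape a and rate b."""
    return a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(y) - b * y


def mh_step(current, log_target, b, rng):
    """One update with a Gamma(current, b) proposal and the Hastings correction."""
    proposal = rng.gammavariate(current, 1.0 / b)  # stdlib takes (shape, scale)
    if proposal < 1e-12:  # numerical guard only: reject values that underflow
        return current
    log_ratio = (log_target(proposal) - log_target(current)
                 + log_gamma_pdf(current, proposal, b)
                 - log_gamma_pdf(proposal, current, b))
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return proposal if math.log(u) <= log_ratio else current


def mh_sampler(t, n_iter=3000, b=1.0, seed=1):
    """Alternate the alpha and lambda updates (steps 2 to 8 of the algorithm)."""
    rng = random.Random(seed)
    n = len(t)
    sum_t3 = sum(ti ** 3 for ti in t)
    sum_log_t = sum(math.log(ti) for ti in t)
    alpha, lam = 1.0, sum_t3 / n  # illustrative starting values (assumption)
    draws = []
    for _ in range(n_iter):
        alpha = mh_step(alpha, lambda a: log_posterior(a, lam, n, sum_t3, sum_log_t), b, rng)
        lam = mh_step(lam, lambda l: log_posterior(alpha, l, n, sum_t3, sum_log_t), b, rng)
        draws.append((alpha, lam))
    return draws


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Synthetic data: T**3 ~ Gamma(shape=alpha, scale=lambda/alpha), alpha=4, lambda=2
    data = [data_rng.gammavariate(4.0, 0.5) ** (1.0 / 3.0) for _ in range(50)]
    kept = mh_sampler(data)[500:]  # discard a burn-in of 500 draws
    print(sum(a for a, _ in kept) / len(kept), sum(l for _, l in kept) / len(kept))
```

Working on the log scale avoids overflow in $Γ (α)^{n}$, and the same joint log-posterior can be used for both updates because each conditional posterior is proportional to it. Handling censored observations would additionally require the upper incomplete gamma function, which is not part of the Python standard library.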

5 - NUMERICAL ANALYSIS

In this section, the proposed estimation methods are compared via Monte Carlo simulation. We used two common measures to assess the performance of the estimators, namely the mean relative error (MRE) and the root-mean-square error (RMSE), given by ${MRE}_{ϑ_{i}} = \frac{1}{N} \sum_{j = 1}^{N} \frac{{\hat{ϑ}}_{i, j}}{ϑ_{i}}$ and ${RMSE}_{ϑ_{i}} = \sqrt{\frac{1}{N} \sum_{j = 1}^{N} ({\hat{ϑ}}_{i, j} - ϑ_{i})^{2}}$ , for $i = 1, 2,$ respectively, where $N$ is the number of estimates and ${\hat{ϑ}}_{i, j}, j = 1, \dots, N$ , are the estimates obtained through the MLE and MAP estimators. The MRE and the RMSE of $λ$ are the same for the different estimation procedures.
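As a small illustration of the two measures, the sketch below computes the MRE and the RMSE for a list of estimates of a single parameter. The RMSE is taken in its usual form, the square root of the mean squared error; the function names and the four example estimates are hypothetical and are not results from the study.

```python
import math


def mre(estimates, true_value):
    """Mean relative error: average of estimate / true value (ideal value: 1)."""
    return sum(e / true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root-mean-square error: sqrt of the mean squared deviation (ideal value: 0)."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


if __name__ == "__main__":
    alpha_hats = [3.8, 4.1, 4.3, 3.9]  # hypothetical estimates of alpha = 4
    print(mre(alpha_hats, 4.0), rmse(alpha_hats, 4.0))
```

An MRE above one indicates overestimation on average, while the RMSE combines bias and variance in a single number.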

Under this approach, the most efficient estimators are those yielding MREs closer to one and smaller RMSEs. We also computed the $95 %$ coverage probability ( $C P_{95 %}$ ) of the credibility intervals (CI) obtained from the Bayesian estimators and of the confidence intervals from the classical approach for $α$ and $λ$ . The estimators with the best coverage probabilities are those whose intervals cover the true values of $ϑ$ with a frequency closer to the nominal level. The samples of size $n$ were generated by assuming $T \sim WH (λ, α)$ . The results were computed using the software R. Overall, the main aim of this section is to compare the Bayes estimators with the classical approach in order to verify whether the proposed estimators return more accurate results. This approach has been considered for other models such as the Gamma (Louzada et al. 2019), Weibull (Teimouri et al. 2013), Kumaraswamy (Dey et al. 2018) and GG distributions (Ramos et al. 2017), to list a few.
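The coverage probability can be estimated as the fraction of simulated intervals that contain the true parameter value. The sketch below shows this, together with the binomial standard error of such an estimate, which indicates how far an empirical coverage may drift from the nominal level by Monte Carlo noise alone. The function names and the example intervals are hypothetical.

```python
import math


def coverage_probability(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


def mc_standard_error(p, n_replicates):
    """Binomial standard error of an empirical coverage estimated from N replicates."""
    return math.sqrt(p * (1.0 - p) / n_replicates)


if __name__ == "__main__":
    # Hypothetical intervals for alpha = 4 from four simulated samples
    intervals = [(3.1, 5.2), (4.2, 6.0), (2.9, 4.4), (3.5, 4.9)]
    print(coverage_probability(intervals, 4.0))
    print(mc_standard_error(0.95, 100000))
```

With $N = 100, 000$ replicates and a nominal level of 0.95, the standard error is about 0.0007, so departures from the nominal level larger than a few thousandths are unlikely to be simulation noise.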

5.1 - COMPLETE DATA

Recall that a class of MAP estimators for $α$ that depends on $c_{3}$ was presented. Therefore, before proceeding with the simulation study, it is necessary to find a value of $c_{3}$ for which the MRE is closest to one. Figure 2 presents the MREs for $α_{MAP}$ considering different values of $c_{3}$ , for $ϑ = (3, 4)$ and $ϑ = (10, 4)$ and $n = 20$ . We observed that a good choice is $c_{3} = 2.9$ . Therefore, we considered this value in (28), where $n \geq 3$ . Since both the estimators obtained from the reference posterior and the closed-form Bayes estimators come from the Bayesian approach, they will be referred to as Bayes and CBayes estimators, respectively. To obtain the MLEs, numerical techniques must be used to solve the non-linear equation (12). The uniroot function available in R was used to find the estimate and, as established theoretically, a unique solution was found in all situations.
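For readers who want to see the root-finding step in code, the sketch below is a rough Python analogue of the uniroot call, under a stated assumption: since $T^{3}$ follows a gamma distribution with shape $α$ and mean $λ$ under the WH model, equation (12) is assumed here to reduce to the usual gamma score equation $\log α - ψ (α) = \log \bar{y} - \overline{\log y}$ with $y_{i} = t_{i}^{3}$ and $\hat{λ} = \bar{y}$. The names `digamma` and `wh_mle` are hypothetical, and plain bisection replaces the Brent-type method used by uniroot.

```python
import math
import random


def digamma(x):
    """psi(x) for x > 0: shift x upward, then use the asymptotic series."""
    total = 0.0
    while x < 8.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return total + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))


def wh_mle(t):
    """MLEs (alpha_hat, lambda_hat) for uncensored data, assuming the gamma-type score."""
    y = [ti ** 3 for ti in t]
    n = len(y)
    lam_hat = sum(y) / n
    c = math.log(lam_hat) - sum(math.log(yi) for yi in y) / n  # >= 0 by Jensen

    def score(a):
        # Decreasing in a, from +infinity (a -> 0) down to -c (a -> infinity)
        return math.log(a) - digamma(a) - c

    lo, hi = 1e-8, 1.0
    while score(hi) > 0.0 and hi < 1e12:  # expand the bracket to the right
        hi *= 2.0
    for _ in range(200):  # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), lam_hat


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic sample with alpha = 4, lambda = 2
    sample = [rng.gammavariate(4.0, 0.5) ** (1.0 / 3.0) for _ in range(2000)]
    print(wh_mle(sample))
```

Because the score function is strictly decreasing, the bracket contains exactly one root, which agrees with the uniqueness noted in the text.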

Figure 2
MREs for $α$ considering $c_{3} = (2, 2.1, \dots, 4)$ , with $α = 3$ , $λ = 4$ and $n = 20$ (left panel) and $α = 10$ , $λ = 4$ and $n = 20$ (right panel), for $N = 100, 000$ simulated samples.

Figure 3 shows the MREs, RMSEs and CPs for the estimates of $α$ and $λ$ obtained by using the Monte Carlo (MC) method where $α = (4, 0.5)$ , $λ = 2$ and $n = (15, 20, \dots, 100)$ . The horizontal lines in the figures correspond to the ideal MRE and RMSE values, equal to one and zero, respectively. Notice that, for complete data, the three methods yield the same estimator of $λ$ , so its MRE and RMSE coincide across methods. On the other hand, the confidence and credibility intervals are computed differently for each method. We observe that the estimates of the parameters are asymptotically unbiased, i.e., the MREs tend to one when $n$ increases and the RMSEs decrease to zero. However, the Bayes estimators, obtained from the reference posterior, outperform the other estimation procedures and present efficient estimates for both parameters even for small sample sizes. It is worth mentioning that, if fast computation is needed, the CBayes estimators returned estimates similar to those of the Bayes estimators obtained from the reference priors.

Figure 3
MREs, RMSEs and CPs for $α$ and $λ$ considering $α = 4, λ = 2$ (upper panel) and $α = 0.5, λ = 2$ (lower panel) for $N = 100, 000$ simulated samples and $n = (15, 20, \dots, 100)$ .

5.2 - CENSORED DATA

In this section, we consider the Bayes estimators in the presence of randomly censored data. The censored data were generated as follows. We first generated two random samples of size $n$ , where $X \sim WH (α, λ)$ and $W \sim U (0, t_{c})$ , with $t_{c}$ a fixed value. Then, from the obtained samples, we took $T_{j} = min (X_{j}, W_{j}), j = 1, \dots, n$ , and defined $δ_{j} = 1$ if $T_{j} = X_{j}$ (observed failure) or $δ_{j} = 0$ if $T_{j} = W_{j}$ (censored observation), in agreement with the role of $δ_{i}$ in the posterior distributions.
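The generation scheme just described can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study; it assumes that a WH variate can be drawn as the cube root of a gamma variate with shape $α$ and mean $λ$, and it uses $δ_{j} = 1$ for an observed failure, the convention implied by the likelihood terms $δ_{i} t_{i}^{3}$ and $Γ (\cdot)^{1 - δ_{i}}$. The function names are hypothetical.

```python
import random


def rwh(alpha, lam, rng):
    """One WH(alpha, lam) draw: T**3 ~ Gamma(shape=alpha, scale=lam/alpha)."""
    return rng.gammavariate(alpha, lam / alpha) ** (1.0 / 3.0)


def simulate_censored(n, alpha, lam, t_c, seed=0):
    """Randomly censored sample: T = min(X, W) with W ~ U(0, t_c)."""
    rng = random.Random(seed)
    times, delta = [], []
    for _ in range(n):
        x = rwh(alpha, lam, rng)       # latent failure time
        w = rng.uniform(0.0, t_c)      # censoring time
        times.append(min(x, w))
        delta.append(1 if x <= w else 0)  # 1 = failure observed, 0 = censored
    return times, delta


if __name__ == "__main__":
    times, delta = simulate_censored(n=1000, alpha=4.0, lam=2.0, t_c=4.8)
    # Empirical proportion of censored observations for this choice of t_c
    print(1.0 - sum(delta) / len(delta))
```

Printing the empirical censoring proportion is a simple way to check whether a candidate $t_{c}$ yields the intended level of censoring before running the full simulation.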

Note that $t_{c}$ has to be selected in order to obtain the desired proportion of censoring. To obtain approximately $0.3$ and $0.5$ proportions of censored data, i.e., $30 %$ and $50 %$ of censoring, we considered $t_{c} = 9.3$ and $t_{c} = 4.8$ , respectively. The simulation study was performed by considering $α = 4$ , $λ = 2$ , $N = 100, 000$ and $n = (20, 25, \dots, 100)$ . To obtain the MLEs for censored data, we used the maxLik package to maximize the log-likelihood function (15). Different initial values were considered, all of which returned the same estimates. Figure 4 shows the MREs, RMSEs and the coverage probabilities for the estimates of $α$ and $λ$ obtained by using the MC method.

Figure 4
MREs, RMSEs and CPs for $α$ and $λ$ considering $α = 4, λ = 2$ for $N = 100, 000$ simulated samples, 30% (upper panel) and 50% (lower panel) of censoring and $n = (20, 25, \dots, 100)$ .

The results indicate that the MLE of $α$ has a systematic bias, as shown in Figure 4. The Bayes estimators, on the other hand, returned more accurate results. Their coverage probabilities are also close to the nominal level, as are those of the MLE. Therefore, the Bayes estimators are also recommended to perform inference for the parameters of the WH distribution in the presence of censoring.

6 - PREDICTION ANALYSIS

In many applications, we may be interested in identifying the time to the next event (failure). In these contexts, predicting future observations plays an important role, especially when costs are linked to the events. In order to provide a predictive analysis of future events, we consider Bayesian prediction with the WH distribution using observed order statistics.

Several authors have performed prediction analysis from the Bayesian point of view for other distributions, deriving the predictive distribution of future failure times and the respective credibility intervals (see, for instance, Pradhan and Kundu (2011), Kundu and Raqab (2012) and Asgharzadeh et al. (2015)).

We denote the order statistics of $T_{1}, \dots, T_{m}$ by $T_{(1)} < \dots < T_{(m)}$ . Let $T_{(m + 1)} < \dots < T_{(n)}$ be the unobserved future sample and $t_{(m)}$ denote the $m$ -th order statistic. From the Markov property of the conditional order statistics, we have

\begin{matrix} f_{T_{m + k}} (y | 𝐭) & = f_{T_{m + k} | T_{m}} (y | 𝐭) \\ & = \frac{(n - m)! f (y) {(F (y) - F (t_{(m)}))}^{k - 1} (1 - F (y))^{n - m - k}}{(k - 1)! (n - m - k)! (1 - F (t_{(m)}))^{n - m}}, \end{matrix}

(32)

for $y > t_{(m)}$ . After some algebra we have

\begin{matrix} f_{T_{m + k}} (y | 𝐭) = & \frac{3 (n - m)!}{(k - 1)! (n - m - k)!} y^{3 α - 1} {(\frac{α}{λ})}^{α} \exp (- \frac{α}{λ} y^{3}) \\ & \times \frac{{(γ (α, \frac{α}{λ} y^{3}) - γ (α, \frac{α}{λ} t_{(m)}^{3}))}^{k - 1} {(Γ (α, \frac{α}{λ} y^{3}))}^{n - m - k}}{{(Γ (α, \frac{α}{λ} t_{(m)}^{3}))}^{n - m}} . \end{matrix}

(33)
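The algebra leading from (32) to (33) can be sketched as follows, assuming the WH density is written with a decaying exponential factor and that the distribution and reliability functions are the regularized lower and upper incomplete gamma functions, which is the form implied by the terms appearing in (33). Substituting these three expressions into (32), every factor carries a power of $Γ (α)$, and these powers cancel exactly.

```latex
f(y) = \frac{3}{Γ(α)} {(\frac{α}{λ})}^{α} y^{3α - 1} \exp(- \frac{α}{λ} y^{3}), \qquad
F(y) = \frac{γ(α, \frac{α}{λ} y^{3})}{Γ(α)}, \qquad
1 - F(y) = \frac{Γ(α, \frac{α}{λ} y^{3})}{Γ(α)} .

% Powers of Γ(α) contributed by each factor of (32):
%   f(y):                              -1
%   (F(y) - F(t_{(m)}))^{k-1}:         -(k - 1)
%   (1 - F(y))^{n-m-k}:                -(n - m - k)
%   (1 - F(t_{(m)}))^{-(n-m)}:         +(n - m)
- 1 - (k - 1) - (n - m - k) + (n - m) = 0 .
```

Since the exponents sum to zero, no $Γ (α)$ term remains, and only the incomplete gamma functions, the power $y^{3 α - 1}$, the factor ${(α / λ)}^{α}$ and the exponential survive.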

The posterior predictive density of $T_{(m + k)}$ given $𝐭$ is given by

p_{T_{m + k}} (y | 𝐭) = \int_{0}^{\infty} \int_{0}^{\infty} f_{T_{m + k}} (y | 𝐭) p (λ, α | 𝐭, 𝛅) d λ d α,

where $p (λ, α | 𝐭, 𝛅)$ is given in (31). Consequently, the predictive density of $T_{(m + k)}$ for $y > t_{(m)}$ is given by

f_{T_{m + k}}^{*} (y | 𝐭) = \int_{0}^{\infty} \int_{0}^{\infty} f_{T_{m + k} | T_{m}} (y | 𝐭) p (λ, α | 𝐭, 𝛅) d λ d α .

(34)

It is also worth noting that an advantage of the Bayesian predictive approach is that a credibility interval for $T_{(m + k)}$ can be constructed using the MCMC techniques.
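One way to obtain such an interval from MCMC output is by simulation, sketched below in Python. It relies on the standard fact behind (32): given $T_{(m)} = t_{(m)}$, the remaining $n - m$ failure times behave as an i.i.d. sample from the parent distribution truncated to $(t_{(m)}, \infty)$, so $T_{(m + k)}$ is the $k$-th smallest of them. The function names, the simple rejection sampler and the constant placeholder standing in for posterior draws are illustrative assumptions, not the authors' procedure.

```python
import random


def rwh_truncated(alpha, lam, t_m, rng, max_tries=100000):
    """Rejection sampler for WH(alpha, lam) conditioned on T > t_m."""
    for _ in range(max_tries):
        x = rng.gammavariate(alpha, lam / alpha) ** (1.0 / 3.0)
        if x > t_m:
            return x
    raise RuntimeError("t_m is too far in the tail for plain rejection sampling")


def predictive_draws(posterior_draws, t_m, n, m, k, seed=0):
    """One draw of T_(m+k) per posterior draw (alpha, lam)."""
    rng = random.Random(seed)
    out = []
    for alpha, lam in posterior_draws:
        future = sorted(rwh_truncated(alpha, lam, t_m, rng) for _ in range(n - m))
        out.append(future[k - 1])  # k-th smallest of the n - m future times
    return out


def equal_tail_interval(draws, level=0.95):
    """Empirical equal-tail interval based on sorted draws."""
    s = sorted(draws)
    lower = s[int((1.0 - level) / 2.0 * (len(s) - 1))]
    upper = s[int((1.0 + level) / 2.0 * (len(s) - 1))]
    return lower, upper


if __name__ == "__main__":
    # Placeholder standing in for MCMC output: 500 copies of (alpha, lam) = (4, 2)
    posterior = [(4.0, 2.0)] * 500
    y_star = predictive_draws(posterior, t_m=1.3, n=12, m=10, k=1)
    print(equal_tail_interval(y_star))
```

By construction every predictive draw exceeds $t_{(m)}$, so the lower limit of the interval always lies above the last observed failure time. Rejection sampling becomes slow when $t_{(m)}$ lies far in the upper tail; inverting the truncated distribution function would be preferable in that case.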

7 - REAL-DATA APPLICATIONS

We examined the fit of the proposed model to three real datasets to emphasize its flexibility and compare its performance with other distributions. A closer inspection of the following figures shows that the WH distribution exhibits the best fit for each dataset when compared to commonly used distributions. In order to discriminate the best fit, we considered some key performance measures, namely, the Akaike information criterion (AIC) given by $A I C = - 2 l (\hat{ϑ}; x) + 2 p,$ the corrected Akaike information criterion (AICc) given by $A I C c = A I C + [2 p (p + 1)] / (n - p - 1),$ and the Bayesian information criterion (BIC) given by $B I C = p ln n - 2 l (\hat{ϑ}; x),$ where $p$ is the dimension of $ϑ$ , $\hat{ϑ}$ is the estimate of the parameters and $l (\hat{ϑ}; x)$ is the logarithm of the likelihood function given in (8) and (14).
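The three criteria are simple functions of the maximized log-likelihood, as the short Python sketch below illustrates. The function name is hypothetical, and the log-likelihood value $-105.63$ is not reported in the paper: it is back-computed here from the AIC of the WH model in Table II ($215.26$) under the assumption $p = 2$ and $n = 60$.

```python
import math


def information_criteria(loglik, p, n):
    """Return (AIC, AICc, BIC) for a model with p parameters fitted to n observations."""
    aic = -2.0 * loglik + 2.0 * p
    aicc = aic + 2.0 * p * (p + 1) / (n - p - 1)
    bic = p * math.log(n) - 2.0 * loglik
    return aic, aicc, bic


if __name__ == "__main__":
    # Back-computed log-likelihood (assumption), p = 2 parameters, n = 60 observations
    aic, aicc, bic = information_criteria(-105.63, p=2, n=60)
    print(round(aic, 2), round(aicc, 2), round(bic, 2))
```

With these inputs the function returns values that round to 215.26, 215.47 and 219.45, matching the WH column of Table II, which supports reading the AICc correction term as $2 p (p + 1) / (n - p - 1)$.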

7.1 - LIFE TEST IN ELECTRICAL APPLIANCES

Table I presents a well-known uncensored dataset available in Lawless (2011): the number of cycles to failure (in thousands) for a group of 60 electrical appliances in a life test.

TABLE I
Failure times in 60 electrical appliances.

The results summarized in Table II show that the WH distribution is the best model, since it resulted in the lowest values for all discrimination criteria.

TABLE II
Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the failure time of 60 electrical appliances.

In order to discriminate the best fit, we considered the AIC, AICc, and BIC available in Table II. For the three criteria, we observed that the proposed model has smaller values, indicating a better fit for the WH distribution.

Table III provides the summary statistics: the MAP estimates, standard deviations (SD) and 95% credibility intervals (CI) for $α$ , $λ$ and $y^{*}$ (future failure time). The MAP estimates of the parameters $λ$ and $α$ can be obtained from (26) and (28), respectively. In order to predict the future failures of an item, as described in Section 6, we considered the posterior predictive density of $T_{(m + k)}$ from (34).

TABLE III
MAPs, SDs and 95% CIs for α, λ and y^* related to the WH distribution.

As can be seen from Table I, the last failure was observed at 9.701 thousand cycles. The last line of Table III shows that the 61st failure is predicted at $y^{*} = 9.908$ thousand cycles, with a 95% predictive interval given by (9.7185; 12.1094).

7.2 - FAILURE TIME DATA OF ELECTRICAL COMPONENTS

The following datasets motivate the application of the WH distribution in the presence of censored data. The first dataset concerns electrical failures in the components of agricultural machines. The follow-up time was one harvesting season. During this period, the electrical components failed 29 times; the data were collected in 2017 and are shown in Table IV.

TABLE IV
Dataset related to the agricultural machines with failure in electrical components (+ indicates censoring).

In order to discriminate the best fit, we used the AIC, AICc, and BIC (see Table V). In all of them, the proposed model has smaller values, indicating a better fit.

TABLE V
Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the agricultural machine’s components electrical problems.

Table VI provides the summary statistics: the MAP estimates, SDs and 95% CIs for $α$ , $λ$ and $y^{*}$ (future failure time). As can be seen from Table IV, the last failure was observed on the $46^{th}$ day. From the last line of Table VI, we can see the prediction of the $47^{th}$ failure, which will be at $50.13$ days, with a 95% predictive interval given by (47.27; 78.25).

TABLE VI
MAPs, SDs and 95% CIs for α, λ and y^* related to the WH distribution.

Turning now to another type of failure, Table VII presents the time up to corrective maintenance of the agricultural machine. In this case, correctly predicting the next maintenance is of main interest in order to reduce costs.

TABLE VII
Dataset related to the agricultural machine’s corrective maintenance (+ indicates censoring).

From the results presented in Table VIII, the AIC, AICc and BIC indicate that the WH distribution returned a better fit for the dataset.

TABLE VIII
Results of AIC, AICc and BIC criteria for all fitted distributions considering the dataset related to the agricultural machine’s revision.

Table IX shows the MAPs, standard deviations, and 95% credible intervals for the parameters of the WH distribution. From Table VII, the last failure was observed at 13 days. Applying the proposed methodology, the 31st maintenance is predicted at 13.37 days, with a 95% predictive interval given by (13.03; 16.83).

TABLE IX
MAPs, SDs and 95% CIs for α, λ and y^* related to agricultural machine’s revision dataset.

Figure 5 presents the reliability function fitted by different distributions and the Kaplan-Meier curve for the three datasets. In all cases, the WH distribution provides an improved fit for the datasets.

Figure 5
Reliability function adjusted by different distributions superimposed to the empirical reliability function. Right-hand panel: Estimated hazard function.

Therefore, the practical importance of the WH distribution is observed for these datasets, since it provides a better fit in comparison with other commonly used distributions and allows the prediction of the next failures.

8 - CONCLUSIONS

In this paper, we revisited the WH distribution and presented mathematical properties of this important distribution, which can be used in situations where the data present a bathtub or increasing hazard rate. Initially, the parameter estimators and their asymptotic intervals were explored using maximum likelihood theory under complete and censored data. Further, we considered an objective Bayesian analysis with an overall reference prior in order to obtain improved estimators, which outperform the ones obtained via maximum likelihood. It was proved that the reference posterior distribution is proper when $n \geq 1$ and that the resulting marginal posterior intervals have accurate frequentist coverage. Moreover, we introduced MAP estimators for the parameters of the WH distribution that have closed-form expressions. A simulation study revealed the superiority of the Bayes estimators.

A Bayes prediction of the future failure time was also presented based on observed order statistics. The proposed methodology was applied to three data sets associated with failure time. Many possible extensions of this current work can be considered. This distribution is a promising distribution to be used in studies involving recurrent event data. In this case, we can consider a rate model from a nonhomogeneous Poisson process, where the parametric baseline rate function is a WH rate function. Further, classical point prediction could be studied for this model, for example, maximum likelihood prediction, best-unbiased predictor, conditional median prediction along with prediction interval using Pivotal method and highest conditional density (HCD) method. Our approach should be investigated further in this context.

ACKNOWLEDGMENTS

The authors are very grateful to the Editor and the three reviewers for their helpful and useful comments that improved the manuscript. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). Pedro L. Ramos is grateful to the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, Proc. 2017-25971-0).

REFERENCES

ASGHARZADEH A, VALIOLLAHI R and KUNDU D. 2015. Prediction for future failures in weibull distribution under hybrid censoring. J Stat Comput Simul 85(4): 824-838.
BERGER JO et al. 2015. Overall objective priors. Bayesian Anal 10(1): 189-221.
BERNARDO JM. 1979. Reference posterior distributions for bayesian inference. J Royal Stat Soc Series B, p. 113-147.
BERNARDO JM. 2005. Reference analysis. Handbook of Statistics 25: 17-90.
COX DR and REID N. 1987. Parameter orthogonality and approximate conditional inference. J Royal Stat Society Series B 49(1): 1-18.
CRAIN BR and MORGAN RL. 1975. Asymptotic normality of the posterior distribution for exponential models. Ann Stat, p. 223-227.
DEY S, MAZUCHELI J and NADARAJAH S. 2018. Kumaraswamy distribution: different methods of estimation. J Comput Appl Math 37(2): 2094-2111.
FOLLAND GB. 1999. Real analysis: modern techniques and their applications. Wiley, New York, 2nd edition.
ISHIKAWA T, TOTANI T, NISHIMICHI T, TAKAHASHI R, YOSHIDA N and TONEGAWA M. 2014. On the systematic errors of cosmological-scale gravity tests using redshift-space distortion: non-linear effects and the halo bias. Mon Notices Royal Astron Soc 443(4): 3359-3367.
KHODABIN M and AHMADABADI A. 2010. Some properties of generalized gamma distribution. Math Sci 4(1): 9-28.
KUNDU D and RAQAB MZ. 2012. Bayesian inference and prediction of order statistics for a type-ii censored Weibull distribution. J Stat Plan Inference 142(1): 41-47.
LAWLESS JF. 2011. Statistical models and methods for lifetime data. Volume 362, J Wiley & Sons.
LOUZADA F, RAMOS PL and NASCIMENTO D. 2018. The inverse nakagami-m distribution: A novel approach in reliability. IEEE Trans Rel (99): 1-13.
LOUZADA F, RAMOS PL and RAMOS E. 2019. A note on bias of closed-form estimators for the gamma distribution derived from likelihood equations. Am Stat, p. 1-8.
PRADHAN B and KUNDU D. 2011. Bayes estimation and prediction of the two-parameter gamma distribution. J Stat Comp Simul 81(9): 1187-1198.
RAMOS PL, ACHCAR JA, MOALA FA, RAMOS E and LOUZADA F. 2017. Bayesian analysis of the generalized gamma distribution using non-informative priors. Statistics 51(4): 824-843.
RAMOS PL, LOUZADA F and RAMOS E. 2018. Posterior properties of the nakagami-m distribution using noninformative priors and applications in reliability. IEEE Trans Rel 67(1): 105-117.
STACY EW. 1962. A generalization of the gamma distribution. Ann Math Stat, p. 1187-1192.
TEIMOURI M, HOSEINI SM and NADARAJAH S. 2013. Comparison of estimation methods for the weibull distribution. Statistics 47(1): 93-109.
TIBSHIRANI R. 1989. Noninformative priors for one parameter of many. Biometrika 76(3): 604-608.
WILSON EB and HILFERTY MM. 1931. The distribution of chi-square. Proc Natl Acad Sci 17(12): 684-688.
ZEMZAMI M and BENAABIDATE L. 2016. Improvement of artificial neural networks to predict daily streamflow in a semi-arid area. Hydrol Sci J 61(10): 1801-1812.

Publication Dates

Publication in this collection
19 Aug 2019
Date of issue
2019

History

Received
02 Jan 2019
Accepted
10 June 2019

This is an open-access article distributed under the terms of the Creative Commons Attribution License

TABLE I (data)
0.014	0.034	0.059	0.061	0.069	0.080	0.123	0.142
0.381	0.464	0.479	0.556	0.574	0.839	0.917	0.969
1.088	1.091	1.174	1.270	1.275	1.355	1.397	1.477
1.702	1.893	1.932	2.001	2.161	2.292	2.326	2.337
2.811	2.886	2.993	3.122	3.248	3.715	3.790	3.857
4.106	4.116	4.315	4.510	4.580	5.267	5.299	5.583
0.165	0.210	0.991	1.064	1.578	1.649	2.628	2.785
3.912	4.100	6.065	9.701

TABLE II (data)
Criteria	WH	Weibull	Gamma	Lognormal	Logistic
AIC	215.26	218.23	218.02	237.13	248.95
AICc	215.47	218.44	218.23	237.34	249.16
BIC	219.45	222.42	222.21	241.32	253.14

TABLE III (data)
ϑ	MAP	SD	CI_{95%} (ϑ)
α	0.2132	0.0306	(0.1593; 0.2772)
λ	43.7261	15.7220	(28.3993; 87.6349)
y^*	9.9075	0.6349	(9.7185; 12.1094)

TABLE V (data)
Criteria	WH	Weibull	Gamma	Lognormal	Logistic
AIC	236.57	238.01	237.80	241.34	255.41
AICc	237.03	238.47	238.26	241.80	255.87
BIC	239.31	240.74	240.53	244.00	258.14

TABLE VI (data)
ϑ	MAP	SD	CI_{95%} (ϑ)
α	0.2239	0.0480	( 0.1521; 0.3435)
λ	27733.5	7654.4	(12949.92; 42160.68)
y^*	50.1313	8.3780	(47.2707; 78.2493)

TABLE VII (data; + indicates censoring)
1	1	1	1	1	1	1	2	2	3	3	3
3	3	4	4	4	4	4	4	4	5	5	5
5	5	5	5	5	5	6	6	6	6	6	6
6	6	6	6	6	6	6	6	6	6	6	6
7	7	7	7	7	7	7	7	7	7	7	7
7	7	8	8	8	8	8	8	8	8	8	8
8	9	9	9	9	9	11	11	11	11	11	11
11	11	13	13+	13+

TABLE VIII (data)
Criteria	WH	Weibull	Gamma	Lognormal	Logistic
AIC	441.29	442.52	451.37	469.56	442.29
AICc	441.43	442.66	451.51	469.70	442.43
BIC	446.26	447.50	456.34	474.53	447.27

TABLE IX (data)
ϑ	MAP	SD	CI_{95%} (ϑ)
α	0.6309	0.0793	(0.4892; 0.7949)
λ	423.5057	60.1510	(332.7807; 564.9964)
y^*	13.3713	1.0384	(13.0293; 16.8254)

TABLE IV (data; + indicates censoring)
1	1	1	2	2	2	2	4	6	8	8
9	11	12	15	21	21	21	21	23	24	27
29	31	36	39	41	45	46	47+	47+
