Acessibilidade / Reportar erro

A BIVARIATE GENERALIZED EXPONENTIAL DISTRIBUTION DERIVED FROM COPULA FUNCTIONS IN THE PRESENCE OF CENSORED DATA AND COVARIATES

Abstract

In this paper, we introduce a Bayesian analysis for a bivariate generalized exponential distribution in the presence of censored data and covariates derived from Copula functions. The generalized exponential distribution could be a good alternative to analyze lifetime data in comparison to usual existing parametric lifetime distributions as Weibull or Gamma distributions. We have being using standard existing MCMC (Markov Chain Monte Carlo) methods to simulate samples for the joint posterior of interest. Two examples are introduced to illustrate the proposed methodology: an example with simulated bivariate lifetime data and an example with a real lifetime data set.

Bivariate generalized exponential distribution; copula function; Bayesian analysis; censored data; covariates


1 INTRODUCTION

In medical, engineering or other lifetime data applications, we could have more than one lifetime associated to each unit. A special situation is the presence of two lifetimes T1 and T2 associated to each unit. In this situation, we could consider some existing bivariate lifetime distribution that has been introduced in the literature (see for example, Gumbel, 1960GUMBEL E. 1960. Bivariate exponential distributions. Journal of the American Statistical Association, 55: 698-707.; Freund, 1961FREUND J. 1961. A bivariate extension of the exponential distribution. Journal of the American Statistical Association, 56: 971-977.; Marshall & Olkin, 1967; Downton, 1972; Block & Basu, 1974BLOCK H & BASU A. 1974. A Continuous bivariate exponential extension. Journal of the American Statistical Association, 69: 1031-1037.; Sarkar, 198722 SARKAR S. 1987. A continuous bivariate exponential distribution. Journal of the American Statistical Association, 82: 667-675.; Hawkes, 1988).

Usually these bivariate lifetime distributions generalize some popular existing univariate lifetime distributions as exponential, Weibull, Gamma or a log-normal distribution (see for example, Lawless, 198214 LAWLESS J. 1982. Statistical model and methods for lifetime data. John Wiley. New York.).

Other parametrical lifetime distributions could be generalized for the bivariate case. One of these models is given by the generalized exponential distribution (see for example, Gupta & Kundu, 199910 GUPTA RD & KUNDU D. 1999. Generalized exponential distribution. Australian and New Zealand Journal of Statistics, 41: 173-188.; Raqab & Ahsanullah, 200120 RAQAB MZ & AHSARULLAH M. 2001. Estimation of the location and scale parameters of generalized exponential distribution based on order statistics. Journal of Statistics Computation and Simulation, 69: 109-124.; Raqab, 200219 RAQAB MZ. 2002. Inferences for generalized exponential distribution based on record statistics. Journal of Statistics Planning and Inference, 104: 339-350.; Zheng, 200228 ZHENG G. 2002. Fisher information matrix in type II censored data from exponentiated exponential family. Biometrical Journal, 44: 353-357.; Gupta & Kundu, 200711 GUPTA RD & KUNDU D. 2007. Generalized exponential distribution: existing results and same recent developments, Journal of Statistical Planning and Inference, DOI: 10.1016/j.jspi. 2007.03.030.
https://doi.org/10.1016/j.jspi. 2007.03....
; Sarhan, 200721 SARHAN AM. 2007. Analysis of incomplete, censored data in competing risks models with generalized exponential distribution. IEEE Transactions on Reliability, 56: 132-138.). Kundu & Gupta (2009) introduced a singular bivariate Generalized Exponential (BVGE) distribution to analyze lifetime data.

An alternative and flexible way to derive different bivariate lifetime distributions could be given by copula functions (see, for example, Trived & Zimmer, 2005a24 TRIVEDI PK & ZIMMER DH. 2005a. Copula Modelling, New Publishes.,b25 TRIVEDI PK & ZIMMER DH. 2005b. Copula modeling: an introduction to practicioneis. Foundations and trends in econometrics, 1(1): 1-111.; Nelsen, 2006). There are several copula functions in the literature for construction of multivariate liftetime distributions where the most used are Farlie-Gumbel-Morgensten (Morgensten, 195617 MORGENSTERN D. 1956. Einfache Beispiele Zweidimensionaler Verteilungen. Mitteilingsblatt fur Matgematische Statistik, 8: 234-253.), and the Archimedean copulas as Clayton (1978), Gumbel (1960)GUMBEL E. 1960. Bivariate exponential distributions. Journal of the American Statistical Association, 55: 698-707. and Frank (1979).

Recently, new copulas have been proposed as the class of Archimedean copulas in d-dimension given in McNeil & Neslehová (2009). McNeil & Neslehová (2010) also generalise the Archimedean copulas to obtain the Liouville copula. Marshall-Olkin Copulas (see Li, 2012) and the Generalized Farlie-Gumbel-Morgenstern copula (Bekrizadeh et al., 2012) are other examples.

Achcar & Santos (2010) construct bivariate Weibull distributions suitable for survival analysis by using different copula functions. The bivariate Birnbaum-Saunders distribution proposed by Kundu et al. (2010) show that it can be obtained as a Gaussian copula. More recently, Kunduintroduced the bivariate Sinh-normal distribution, which can be obtained as a bivariate Gaussian copula.

Kundu & Gupta (2011) also derived an absolute continuous bivariate Generalized Exponential distribution based on the Clayton copula.

In this paper, we introduce other bivariate generalized exponential distributions derived from the Farlie-Gumbel-Morgensten to analyze lifetime data. We also investigate the performance of this new distribution.

Inferences for these different versions of bivariate lifetime models could present some difficulties using standard classical inference methods, especially in the presence of censored data and covariates, a usual situation in applications.

In this way, we consider the use of Bayesian methods where the samples for the joint posterior distribution of interest are simulated using MCMC (Markov Chain Monte Carlo) methods as the popular Gibbs sampling algorithm (see for example, Gelfand & Smith, 1990GELFAND AE & SMITH AFM. 1990. Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85: 398-409.; or Casela & George, 1992CASELLA G & GEORGE E. 1992. Explaining the Gibbs samples. The American Statistician, 46: 167-174.) or the Metropolis-Hastings algorithm (see for example, Chib & Greenberg, 1995).

The paper is organized as follows: in Section 2, we introduce some concepts of copula functions; in Section 3, we present the generalized exponential distribution; in Section 4, we derive a bivariate generalized exponential distribution from a Copula function; in Section 5, we introduce a Bayesian analysis in the presence of censored data; in Section 6, we consider the presence of covariates and censored data; in Section 7, we introduce two examples; finally in Section 8, we present some concluding remarks.

2 COPULA FUNCTIONS

Copula functions can be used to link marginal distributions with a joint distribution. For specified univariate marginal distribution functions C, the function,

which is defined using a copula function C, results in a multivariate distribution function with univariate marginal distributions specified as F 1(t 1), F 2(t 2), ... F m(t m), (see for example, Frees (1998)FREES EW & VALDEZ EA. 1998. Understanding Relationships Using Copulas. North American Actuarial Journal, 2: 1-25. or Nelsen (1999)18 NELSEN RB. 1999. An introduction to copulas. Springer-Verlag, New York.).

It is important to point out that any multivariate distribution function F can be written in the form of a copula function (Sklar, 195923 SKLAR A. 1959. Fonctions de repartition à n-dimensions et leurs marges. Inst. Stat. University Paris, 8: 229-231.); that is, if F(t 1, t 2, ..., t m) is a joint multivariate distribution function with univariate marginal distribution functions F 1(t 1), F 2(t 2), ... F m(t m), thus there exists a copula function C(U 1, U 2, ..., U m) such that (1) occurs.

If every F i is continuous, then C is unique.

In the bivariate cases, let T 1 and T 2 be two random variables with continuous distribution functions F 1 and F 2.

The probability integral transform can be applied separately to the two random variables to define U = F 1(t 1) and V = F 2(t 2), where U and V have uniform (0,1) distributions, but are usually dependent if T 1 and T 2 are dependent (T 1 and T 2 independent, implies that U and V are independent).

Specifying dependence between T 1 and T 2 is the same as specifying dependence between U and V.

With U and V uniform random variables, the problem reduces to specifying a bivariate distribution between two uniforms, that is, a copula.

3 GENERALIZED EXPONENTIAL DISTRIBUTION

A generalized exponential distribution (see Gupta & Kundu, 199910 GUPTA RD & KUNDU D. 1999. Generalized exponential distribution. Australian and New Zealand Journal of Statistics, 41: 173-188.) can be a good alternative to the usual Gamma and Weibull distributions commonly used to analyse lifetime data (see also, Raqab & Ahsanullah, 200120 RAQAB MZ & AHSARULLAH M. 2001. Estimation of the location and scale parameters of generalized exponential distribution based on order statistics. Journal of Statistics Computation and Simulation, 69: 109-124.; Raqab, 200219 RAQAB MZ. 2002. Inferences for generalized exponential distribution based on record statistics. Journal of Statistics Planning and Inference, 104: 339-350.; Zheng, 200228 ZHENG G. 2002. Fisher information matrix in type II censored data from exponentiated exponential family. Biometrical Journal, 44: 353-357.; Gupta & Kundu, 200711 GUPTA RD & KUNDU D. 2007. Generalized exponential distribution: existing results and same recent developments, Journal of Statistical Planning and Inference, DOI: 10.1016/j.jspi. 2007.03.030.
https://doi.org/10.1016/j.jspi. 2007.03....
, 200812  GUPTA RD & KUNDU D. 2008. Generalized exponential distribution: Bayesian inference. Computational Statistics and Data Analysis, 52(4): 1873-1883.; Sarhan, 200721 SARHAN AM. 2007. Analysis of incomplete, censored data in competing risks models with generalized exponential distribution. IEEE Transactions on Reliability, 56: 132-138.).

The generalized exponential distribution with two parameters has density given by,

where t > 0, α > 0, and λ > 0, are respectively, shape and scale parameters.

The density (2) has great flexibility of fit depending on the parameter α: if α < 1, we have a decreasing function and if α > 1, we have a unimodal function with mode given by λ-1log α.

Observe that if α = 1, we have an exponential distribution with parameter λ.

The survival and hazard functions associated to (2), are given, respectively, by,

and

Observe that the hazard function h(t;α,λ) is increasing from 0 to λ if α > 1; decreasing if α < 1 and constant if α = 1.

This behavior of the hazard function is similar to the behavior of the hazard function of the gamma distribution.

The moment generation function for a random variable T with a generalized exponential distribution with density (2) is given (see Gupta & Kundu, 200812  GUPTA RD & KUNDU D. 2008. Generalized exponential distribution: Bayesian inference. Computational Statistics and Data Analysis, 52(4): 1873-1883.) by,

for s < λ; Γ(x) is the gamma function.

From (5), we get the moments of interest. The mean and variance of T are given, respectively, by,

where ψ(·) is the digamma function given by

4 A BIVARIATE GENERALIZED EXPONENTIAL DISTRIBUTION DERIVED FROM THE FARLIE-GUMBEL-MORGENSTERN COPULA

Different copula functions introduced in the literature could be used to obtain a bivariate generalized exponential distribution.

A special case is given by the Farlie-Gumbel-Morgensten Copula (see Morgensten, 195617 MORGENSTERN D. 1956. Einfache Beispiele Zweidimensionaler Verteilungen. Mitteilingsblatt fur Matgematische Statistik, 8: 234-253.) given by,

where -1 < θ < 1, u = F 1(t 1) (marginal distribution for the random variable T 1) and ν = F 2(t 2) (marginal distribution for the random variable T 2).

Observe that θ is a parameter associated to the dependence between the random variables T 1 and T 2 and related to the Spearman's rank correlation ρ S(T 1, T 2) ("Spearman's rho") and Kendall's rank correlation ρ τ(T 1, T 2) ("Kendall's tau") by the relations,

From (7), we get ρ S(T 1, T 2) = θ/3 and ρ τ(T 1, T 2) = 2θ/9 (see for example, Nelsen, 199918 NELSEN RB. 1999. An introduction to copulas. Springer-Verlag, New York.).

From (8), we could use the information of experts to directly elicit a prior distribution for the correlation (a value between -1 and 1) between T 1 and T 2; another possibility is to use empirical Bayesian methods to choose a prior distribution for the dependence parameter θ using the relations given by (8). Some authors introduce different methods for eliciting a prior distribution for the correlation (see for example, Clemen, Fischer & Winkler, 2000CLEMEN RT, FISCHER GW & WINKLER RL. 2000. Assessing dependence: Some experimental results. Management Science, 46: 1100-1115.).

Let us assume the marginal generalized exponential distributions (see (2)), given by,

From (7), the joint distribution function for T 1 and T 2 is given by,

where t 1 > 0 and t 2 > 0.

The joint density function for T 1 and T 2 is given by,

That is, from (11), we have,

where ƒ1(t 1) and ƒ2(t 2) are the marginal densities for T 1 and T 2 with parameters (α 1, λ1) and (α 2, λ2), respectively (see (2)), and F 1(t 1) and F 2(t 2) are given by (9).

Figure 1 displays the joint density ƒ(t 1, t 2 | λ1, λ2, α 1, α 2, θ) given in (12) for five different levels of dependence, assuming the parameters λ1 = 1, λ2 = 1.5, α 1 = 2 and α 2 = 3. Figures 1a and 1b show a three-dimensional plot and a contour plot of the bivariate density, respectivelly, when θ is 0.5. Figures 1c, 1d, 1e and 1f illustrate contour plots for different levels of dependence parameter, θ = 0.01, 0.9, -0.5 and -0.8, respectively.

Figure 1
Contour plots of bivariate density with generalized exponential marginal densities.

Through the Figure 1 we can verify if the dependence condition is confirmed by θ, looking at the contours of bivariate density ƒ(t 1, t 2 | λ1, λ2, α 1, α 2, θ). A variable T 1 is highly correlated (positively or negatively) with other T 2, then it is expected the high density values are along a diagonal axis (see Figures 1b, d, e and f).

Also observe that the joint bivariate survival function for the lifetimes T 1 and T 2 is given by,

where F 1(t 1) and F 2(t 2) are given by (9) and F(t 1, t 2) is given by (10). That is,

5 A BAYESIAN ANALYSIS IN THE PRESENCE OF CENSORED DATA

Suppose either T 1 or T 2 can be censored and that censoring is independent of the lifetimes. Let us subdivide the n observations into four classes:

  • C1: both t1i and t 2i are observed lifetimes, i = 1, ..., n;

  • C2: t1i is a lifetime and t 2i is a censoring time (that is, we only know that T 1i > t 2i);

  • C3: t1i is a censoring time and t 2i is a lifetime;

  • C4: both t1i and t 2i are censoring times.

The likelihood function for a continuous model (see for example, Lawless, 198214 LAWLESS J. 1982. Statistical model and methods for lifetime data. John Wiley. New York., page 479) is given by,

where ƒ(t 1i , t 2i) is the joint probability density function for T 1i and T 2i; S(t 1i , t 2i) is the joint survival function; and are the partial derivatives of S(t 1i , t 2i) with respect to t 1i and t 2i, respectively.

Let us define the indicator variables δ 1i and δ 2i, by,

for j = 1,2; i = 1, ...n, where n is the number of observations.

In this way, we rewrite the likelihood function (15) as,

Observe that if we do not have censored data, the likelihood function (17) reduces to,

In (17), we replace S(t 1i, t 2i) by (14), ƒ(t 1i , t 2i) by (12), and F 1(t 1i) and F 2(t 2i) are given by (9), that is,

The first derivatives of S(t 1i, t 2i) with respect to t 1i and t 2i are given by,

that is,

For a Bayesian analysis, let us assume the following prior distribution for λ1, λ2, α 1, α 2 and θ:

for j = 1,2; U[a,b] denotes an uniform distribution in the interval (a,b); a j, b j, c j and c j are known hyperparameters. We further assume prior independence among the parameters.

Other prior distributions also could be considered, as gamma priors for α j and λj, j = 1,2.

The joint posterior distribution for v = (λ1, λ2, α 1, α 2, θ)' is given by,

where π(v) is the joint prior distribution for v;L(v | t) is the likelihood function (17) and t = (t 1, ..., t n), t i = (t 1i, t 2i), i = 1, ..., n is a vector of observed lifetime data.

To get the posterior summaries of interest, we simulate samples for the joint posterior distribution (23) using MCMC methods.

In this way, we could simulate samples for the joint posterior distribution (23) from the conditional distributions π1 | λ2, α 1, α 2, θ, t), π2 | λ1, α 1, α 2, θ, t), π (α 1 | λ1, λ2, α 2, θ, t), π (α 2 | λ1, λ2, α 2, θ, t) and π (θ | λ1, λ1, α 1, α 2, t) using the Gibbs sampling algorithm or the Metropolis-Hastings algorithm, when the conditional distributions are not identified as known distributions that are easy to simulate.

Considering the presence of a vector X = (X 1, X 2, ..., X p)' of covariates associated to each bivariate lifetime T 1 and T 2, let us consider the following regression model:

where β j = (β j1, β j2, ..., β jp)' is the regression parameter vector, j = 1,2, associated to the covariate vector x i = (x 1i, x 2i, ..., x pi)', i=1,2, ..., n.

We also assume the presence of censored observations.

For a Bayesian analysis, we assume the following prior distributions for γ j, α j, β jk and θ:

for j = 1,2; k = 1, ..., p and a j, b j, c j, d j and g are known hyperparameters and N(0, g 2) denotes a normal distribution with mean zero and variance g 2. We further assume prior independence among the parameters.

6 APPLICATIONS

6.1 Simulated data sets

As a first application, let us consider four simulated data sets from the bivariate generalized exponential distribution (10) in the presence of censored data with sample sizes n=10 (d 1 = 10, d 2 = 9); n = 20 (d 1 = 8, d 2 = 20); n = 30 (d 1 = 27, d 2 = 30) and n = 50 (d 1 = 44, d 2 = 47) where d 1 is the number of uncensored lifetimes t 1i and d 2 is the number of uncensored lifetimes t 2i, i = 1,2, ..., n, and the following values for the parameters of the model: λ1 = 0.001; λ2 = 0.005; α 1 = 1; α 2 = 0.5 and θ = 0.5.

We assume a reparametrization for λ1 and λ2 given by = log(λ1) and = log(λ2) to get better convergence for the Gibbs sampling algorithm using the software Winbugs (Spiegelhalter et al., 200327 SPIEGELHALTER DJ, THOMAS A, BEST NG & LUNN D. 2003. Winbugs version 1.4 user manual, MRC Biostatistics Unit, Institute of Public Helath and department of epidemiology and public health, Imperial College, School of Medicine (http://www.mrc_bsu.com.ac.uk/bugs).
http://www.mrc_bsu.com.ac.uk/bugs...
) which only requires the specification of the joint distribution for the data and prior distributions for the parameters. For all considered sample sizes, we assume the following prior distributions:

~ U(-10, 0), ~ U(-10, 0), α 1 ~ U(0, 3), α 2 ~ U(0, 2) and θ ~ U(0, 1).

The selection of the hyperparameters is made so that the prior expectation of the parameters is close to their true values while the variances assume very small values in order to haveinformative prior distributions. Thus, a sensitivity analysis was performed to determine a final specification of the parameters. Here we present only the best result we obtained for these priors.

In Table 1, we have the posterior summaries of interest obtained from a final simulated Gibbs sample of size 2000 of the joint posterior distribution for the parameters of the model after discarding the first 5000 Gibbs samples to eliminate the effect of the initial values for the Gibbs sampling algorithm. In Table 1, we also have the maximum likelihood estimates (MLE) and the 95% confidence intervals based on the asymptotical normality of the MLE.

Table 1
Posterior summaries (simulated data sets) and MLE estimates.

From the results of Table 1, we observe that we get accurate Bayesian inferences for the parameters of the model, especially considering large samples sizes. The only exception is the parameter θ, where the 95% credible intervals are very large considering the four simulated samples (n = 10, n = 20, n = 30 or n = 50).

Figure 2 illustrates the performance of marginal posterior densities for each parameter α 1, α 2, λ1, λ2 and θ when the sample size increase, that is, for n = 10, 20, 50 and 100. From the plots of Figure 2, we observe that as the sample size increases, we have more accurate Bayesian inferences (see also Table 1).

Figure 2
Marginal posterior densities for each parameter with different sample sizes, n= 10, 20, 50 and 100.

To improve the Bayesian inferences for θ, we could assume a transformation, , considering θ positive an assumption usually observed for the data (0 < θ < 1), the same prior distributions for , , α 1 and α 2 assumed for the results of Table 1 and a normal prior N(0, 1) distribution for , we have in Table 2, the posterior summaries of interest.

Table 2
Posterior summaries (simulated data set, logit transformation for θ).

From the results of Table 2, we observe similar posterior summaries for the parameters α 1, α 2, , , λ1 and λ2 as obtained in Table 1, but better accuracy for the Bayesian estimator for θ as observed in the 95% credible intervals. We also observe that the accuracy of the Bayesian intervals for θ have some improvement as the data sample sizes increases, an indication ofidentifiability using the reparametrized form of θ.

The Bayesian estimation also performs well in applications with a large number of censored data. Table 3 shows the results of simulation studies for different sample sizes as n=10 (d 1 = 10, d 2 = 5); n = 20 (d 1 = 12, d 2 = 20); n = 50 (d 1 = 30, d 2 = 20) and n = 100 (d 1 = 30, d 2 = 90).

Table 3
Posterior summaries of simulated data set with high level of censoring.

In summary, we observe good inferences considering the posterior mean for all the random quantities, especially when the samples size increases in comparison to the real value of the parameters (λ1 = 0.001; λ2 = 0.005; α 1 = 1; α 2 = 0.5 and θ = 0.5) used to simulate the lifetimes T 1 and T 2.

Figure 3 shows a comparison of the posterior distributions of each parameter in order to analyze the sensitivity of the estimation when the number of censored lifetime data is increased. We consider samples of size n = 50 with (d 1 = 44, d 2 = 47) and (d 1 = 30, d 2 = 20). Despite the high degree of censoring, the performance of the estimates are quite similar for the parameters α 1, λ1 and θ of the bivariate generalized exponential distribution. We note that the parameters α 2 and λ2 are more sensitive to censorship certainly due to the higher percentage of censorship occurred for T 2 variable, hence requiring a larger number of observations.

Figure 3
Marginal posterior densities for each parameter with different sample sizes, n = 10, 20, 50 and 100 with high level of censoring.

6.2 Recurrence times of infection for kidney patients

In this application, we consider a survival time data set introduced by McGilchrist & Aisbett (1991)16 MCGILCHIST C & AISBETT C. 1991. Regression with frailty in survival analysis. Bimetrics, 47: 461-466. related to kidney infection where the recurrence of infection for 38 Kidney patients, using portable dialysis machines, are recorded. The time recorded, called infection time, is either the lifetime (in days) of the patient until an infection occurred and the catheter had to be removed, or the censoring time, where the catheter was removed by other reasons. The catheter is reinserted after some time and the second infection time is again observe or censored (data set in Table 4).

Table 4
Recurrence times of infections in 38 kidney patients.

As a first analysis, let us assume the bivariate generalized exponential distribution with density (12) not considering the presence of the covariate sex. Let us devote this model as "model 1".

For a Bayesian analysis of "model 1", not considering the presence of the covariate sex, let us assume the same reparametrization for λ1, λ2 and θ considered in Section 6.1, that is, = log(λ1), = log(λ2) and (from a preliminary data analysis, the sample correlation between the lifetimes considering only the uncensored observations is given by 0.181; that is, we are assuming 0 < θ < 1), and the following prior distributions: ~ U(-10, 0), ~ U(-10, 0), α 1 ~ U(0, 3), α 2 ~ U(0, 2) and ~ U(-1, 1).

Using the WinBugs software, we simulated 2000 Gibbs samples from the joint posterior distribution of interest taking every 30th simulated sample after a "burn-in-sample" period of size 5000.

Convergence of the Gibbs sampling algorithm was monitored using standard existing methods as traceplots for the simulated samples. The posterior densities for each parameter are shown in Figure 4.

Figure 4
Marginal posterior densities for each parameter of "model 1".

In Table 5, we have the posterior means, the posterior standard-deviations and 95% Bayesian intervals for the parameters of the model. We also have the MLE estimates and their asymptotical confidence intervals.

Table 5
Bayesian summaries ("model 1" not considering the covariate sex).

Often concern focusses on the survival function S(t 1, t 2) given in (14). For instance, by using the priors proposed in this paper we have P(T 1 > 8 days, T 2 > 12 days) = 0.833 with a credible interval (0.737, 0.914).

Figure 5 shows graphically the posterior distribution of the probability P(T 1 > 8 days, T 2 > 12 days).

Figure 5
Posterior density of survival S(8,12).

Finally, we derive the joint density and survival functions from the (12) and (14) respectively, for kidney infection in patients data and a plot of both functions is provided in Figure 6.

Figure 6
Plots and contours of bivariate functions for the "model 1".

In the presence of the covariate sex, denoted by X, we assume the regression model introduced in Section 5, that is,

where I = 1, 2, ..., 38; X i = 1 (male) and X i = 0 (female) (see (24)). Let us denote this model as "model 2".

Also assuming the reparametrization , and the prior distributions γ j ~ U(0, 1), β j ~ N(0, 1), α j ~ U(0, 3), j = 1, 2 and ~ U(-1, 1), we have in Table 5, the posterior summaries of interest based on 2000 simulated Gibbs samples (every 30th sample and a "burn-in-sample" of size 5000).

From the results of Table 6, we observe that zero is included in the 95% credible intervals for β 1 and β 2, but zero is very close to the inferior credible limit for β 1, that is, we have an indication that the lifetime T 1 is affected by the covariate sex.

Table 6
Bayesian summaries ("model 2" in the presence of the covariate sex).

To compare the proposed models, we could use some existing Bayesian discrimination criteria, as for example, DIC (Deviance information criterion) introduced by Spiegelhalter et al. (2002)26 SPIEGELHALTER DJ, BEST N, CARLIN B & VAN DER LINDE A. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, B, 64: 583-639. (see appendix APPENDIX DEVIANCE INFORMATION CRITERION (DIC) The Deviance Information Criterion (DIC) is a criterion specifically useful for selection model under the Bayesian approach where samples of the posterior distributions for the parameters of the model are obtained using MCMC methods. The deviance is defined by, where θ is a vector of unknown parameters of the model, is the likelihood function of the model and c is a constant that does not need to be known when the comparison between models is made. The DIC criterion defined by Spirgelhalter (2002) is given by, where D( ) is the deviance evaluated at the posterior mean =E(θ | data) and nD is the effective number of parameters of the model given by nD = - D( ), where = E(D(θ) | data) is the posterior deviance measuring the quality of the data fit for the model. Smaller values of DIC indicates better models. Note that these values could be negative. ) and given automatically by the WinBugs software. Assuming "model 1" (not considering the presence of the covariate sex), we obtained a MonteCarlo estimate for DIC given by the value 686.33; assuming "model 2" (presence of the covariate sex), the DIC value is given by 677.695; that is, "model 2" gives better fit for the data set introduced in Table 3 (smaller value for DIC).

7 CONCLUDING REMARKS

The use of bivariate generalized exponential distributions derived from copula functions couldbe a good alternative to analyze bivariate lifetime data, usually in the presence of censored observations and covariates. Other copula functions also could be used to derive bivariate lifetime distributions. Bayesian inference for this class of models using standard existing MCMC methods is a good alternative to get accurate inference results. It is important to point out that the dependence parameter θ can present some identifiability problems and this problem could be solved considering an informative prior from θ based on expert opinion or considering a reparametrization, as it was observed in this paper.

The use of existing Bayesian softwares like the WinBugs software gives a great simplification to get the posterior summaries of interest.

  • 1
    BLOCK H & BASU A. 1974. A Continuous bivariate exponential extension. Journal of the American Statistical Association, 69: 1031-1037.
  • 2
    CASELLA G & GEORGE E. 1992. Explaining the Gibbs samples. The American Statistician, 46: 167-174.
  • 3
    CHIB SE. 1995. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49: 327-335.
  • 4
    CLEMEN RT, FISCHER GW & WINKLER RL. 2000. Assessing dependence: Some experimental results. Management Science, 46: 1100-1115.
  • 5
    DOWTON F. 1970. Bivariate exponential distributions in reliability theory. Journal of Royal Statistical Society, B, 32: 408-417.
  • 6
    FREES EW & VALDEZ EA. 1998. Understanding Relationships Using Copulas. North American Actuarial Journal, 2: 1-25.
  • 7
    FREUND J. 1961. A bivariate extension of the exponential distribution. Journal of the American Statistical Association, 56: 971-977.
  • 8
    GELFAND AE & SMITH AFM. 1990. Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85: 398-409.
  • 9
    GUMBEL E. 1960. Bivariate exponential distributions. Journal of the American Statistical Association, 55: 698-707.
  • 10
    GUPTA RD & KUNDU D. 1999. Generalized exponential distribution. Australian and New Zealand Journal of Statistics, 41: 173-188.
  • 11
    GUPTA RD & KUNDU D. 2007. Generalized exponential distribution: existing results and same recent developments, Journal of Statistical Planning and Inference, DOI: 10.1016/j.jspi. 2007.03.030.
  • 12
    GUPTA RD & KUNDU D. 2008. Generalized exponential distribution: Bayesian inference. Computational Statistics and Data Analysis, 52(4): 1873-1883.
  • 13
    HAWKES A. 1972. A bivariate exponential distribution with application to reliability. Journal of the Royal Statistical Society, B, 34: 129-131.
  • 14
    LAWLESS J. 1982. Statistical model and methods for lifetime data. John Wiley. New York.
  • 15
    MARSHALL A & OLKIN I. 1976. A generalized bivariate exponential distribution. Journal of Applied Probability, 4: 291-302.
  • 16
    MCGILCHIST C & AISBETT C. 1991. Regression with frailty in survival analysis. Bimetrics, 47: 461-466.
  • 17
    MORGENSTERN D. 1956. Einfache Beispiele Zweidimensionaler Verteilungen. Mitteilingsblatt fur Matgematische Statistik, 8: 234-253.
  • 18
    NELSEN RB. 1999. An introduction to copulas. Springer-Verlag, New York.
  • 19
    RAQAB MZ. 2002. Inferences for generalized exponential distribution based on record statistics. Journal of Statistics Planning and Inference, 104: 339-350.
  • 20
    RAQAB MZ & AHSARULLAH M. 2001. Estimation of the location and scale parameters of generalized exponential distribution based on order statistics. Journal of Statistics Computation and Simulation, 69: 109-124.
  • 21
    SARHAN AM. 2007. Analysis of incomplete, censored data in competing risks models with generalized exponential distribution. IEEE Transactions on Reliability, 56: 132-138.
  • 22
    SARKAR S. 1987. A continuous bivariate exponential distribution. Journal of the American Statistical Association, 82: 667-675.
  • 23
    SKLAR A. 1959. Fonctions de repartition à n-dimensions et leurs marges. Inst. Stat. University Paris, 8: 229-231.
  • 24
    TRIVEDI PK & ZIMMER DH. 2005a. Copula Modelling, New Publishes.
  • 25
    TRIVEDI PK & ZIMMER DH. 2005b. Copula modeling: an introduction to practicioneis. Foundations and trends in econometrics, 1(1): 1-111.
  • 26
    SPIEGELHALTER DJ, BEST N, CARLIN B & VAN DER LINDE A. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, B, 64: 583-639.
  • 27
    SPIEGELHALTER DJ, THOMAS A, BEST NG & LUNN D. 2003. Winbugs version 1.4 user manual, MRC Biostatistics Unit, Institute of Public Helath and department of epidemiology and public health, Imperial College, School of Medicine (http://www.mrc_bsu.com.ac.uk/bugs).
    » http://www.mrc_bsu.com.ac.uk/bugs
  • 28
    ZHENG G. 2002. Fisher information matrix in type II censored data from exponentiated exponential family. Biometrical Journal, 44: 353-357.

APPENDIX

DEVIANCE INFORMATION CRITERION (DIC)

The Deviance Information Criterion (DIC) is a criterion specifically useful for selection model under the Bayesian approach where samples of the posterior distributions for the parameters of the model are obtained using MCMC methods.

The deviance is defined by,

where θ is a vector of unknown parameters of the model, is the likelihood function of the model and c is a constant that does not need to be known when the comparison between models is made.

The DIC criterion defined by Spirgelhalter (2002)26 SPIEGELHALTER DJ, BEST N, CARLIN B & VAN DER LINDE A. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, B, 64: 583-639. is given by,

where D( ) is the deviance evaluated at the posterior mean =E(θ | data) and nD is the effective number of parameters of the model given by nD = - D( ), where = E(D(θ) | data) is the posterior deviance measuring the quality of the data fit for the model. Smaller values of DIC indicates better models. Note that these values could be negative.

Publication Dates

  • Publication in this collection
    Jan-Apr 2015

History

  • Received
    04 Nov 2012
  • Accepted
    11 Apr 2014
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br