
The long-term exponentiated complementary exponential geometric distribution under a latent complementary causes framework

Abstracts

In this paper we propose a new long-term distribution derived from the exponentiated complementary exponential geometric (ECEG) distribution, which we call the LECEG distribution. The LECEG distribution is obtained straightforwardly from the ECEG distribution and accommodates decreasing and unimodal hazard functions in a latent complementary causes scenario, where only the maximum lifetime among all causes is observed. We derive the density, quantile, survival and failure rate functions of the proposed distribution, as well as some properties such as the characteristic function, mean, variance and r-th order statistics. Estimation is based on the maximum likelihood approach. A simulation study is performed to assess the performance of the maximum likelihood estimates. The practical importance of the new distribution is demonstrated on three real datasets.

Keywords: exponentiated complementary exponential geometric distribution; latent competing risks; long-term survivors





F. Louzada (I, *); C.Y. Yamachi (II); V.A.A. Marchi (II); M.A.P. Franco (II)

* Corresponding author: Francisco Louzada

(I) Departamento de Matemática Aplicada e Estatística, ICMC, USP, 13566-590 São Carlos, SP, Brazil. E-mail: louzada@icmc.usp.br

(II) Departamento de Estatística, UFSCar, 13565-905 São Carlos, SP, Brazil. E-mails: cintiayurie@yahoo.com.br; vitor.marchi@gmail.com; mapfranco@ufscar.br


1 INTRODUCTION

Survival analysis is widely used in areas such as public health, actuarial science and industrial reliability, with the exponential distribution as the usual starting point. The exponential distribution requires a constant failure rate function. When this assumption does not hold, distributions with non-constant failure rates are needed. The literature is vast and growing; interested readers can refer to Adamidis & Loukas (1998), Chahkandi & Ganjali (2009), Louzada et al. (2011), Morais & Barreto-Souza (2010), Cancho et al. (2011), Hemmati et al. (2011), Bakouch et al. (2011) and Louzada et al. (2013), amongst others.

Adamidis & Loukas (1998) proposed the exponential geometric (EG) distribution, a compound of the geometric and exponential distributions. The EG distribution arises in a latent competing risks scenario (Louzada-Neto, 1999), where the competing causes are unknown and only the minimum failure lifetime is observed, and it accommodates decreasing failure rates. Louzada et al. (2011) proposed the complementary exponential geometric (CEG) distribution. The CEG distribution arises in a latent complementary risks scenario, where the causes are unknown and only the maximum failure lifetime is observed, and it accommodates increasing failure rates. Yamachi et al. (2013) proposed the exponentiated complementary exponential geometric (ECEG) distribution, a generalization of the CEG distribution that accommodates increasing and decreasing failure rates.

In some cases part of the population is not susceptible to the event of interest. According to Maller & Zhou (1996), it is then appropriate to consider a two-component mixture model, in which one component represents the survival times of individuals susceptible to the event (in risk, IR), while the other represents the survival times of individuals not susceptible to the event (out of risk, OR), allowing infinite survival times. An individual belongs to one group or the other with a certain probability.

The two-component mixture model has been used in medicine, especially for the analysis of cancer clinical trials, where we observe the time to death or the time until the outbreak of a disease in the presence of a significant proportion of cured or immune patients (Boag, 1949; Berkson & Gage, 1952). Farewell (1982) worked with some distributions with long-term survivors, Maller & Zhou (1995) discussed tests for models with individuals immune to the event, Chen & Ibrahim (2001) analysed the maximum likelihood method for cure fraction models with missing covariates, Cancho et al. (2009) introduced the log-exponentiated-Weibull regression models with cure rate, and Perdona & Louzada (2011) proposed a failure rate model in the presence of immune patients.

The long-term model based on the two-component mixture is formulated as follows. Let Y be a random variable representing the time until the occurrence of an event of interest, and let p be the probability that an individual belongs to the OR group. Allowing for the possibility that an individual is not susceptible to the event of interest, the improper population survival function is given by (Maller & Zhou, 1996) $S(y) = p\,S_{OR}(y) + (1 - p)\,S_{IR}(y)$, where $S_{OR}(y)$ and $S_{IR}(y)$ are the survival functions of the OR and IR individuals, respectively. An individual in OR never experiences the event of interest, so $S_{OR}(y) = 1$. Then S(y) can be rewritten as

$$S(y) = p + (1 - p)\,S_{IR}(y). \tag{1.1}$$

From (1.1), $\lim_{y \to \infty} S(y) = p$; the survival function is therefore improper and its limit corresponds to the OR proportion.
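The mixture behind (1.1) can be illustrated with a minimal sketch. The function names are hypothetical, and an exponential survival function is used only as a stand-in for the IR component; any proper survival function could be passed instead.

```python
import math


def population_survival(y, p, s_ir):
    """Improper survival S(y) = p + (1 - p) * S_IR(y) of the two-component mixture."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    return p + (1.0 - p) * s_ir(y)


def exp_survival(y, rate=1.0):
    # Stand-in for S_IR (an assumption made for illustration only).
    return math.exp(-rate * y)


if __name__ == "__main__":
    # The curve starts at 1 and levels off at the OR proportion p.
    for y in (0.0, 1.0, 5.0, 50.0):
        print(y, population_survival(y, 0.1, exp_survival))
```

The plateau at p is what distinguishes an improper survival function from a proper one, which would decay to zero.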

Considering that the event of interest may be caused by an unknown number of competing/complementary causes (Louzada-Neto, 1999), we have a scenario with latent competing/complementary failure causes in the presence of long-term survivors. We can use the EG or CEG distributions, amongst others, for $S_{IR}(y)$. For example, Roman et al. (2012) presented the LEG distribution, which uses the EG distribution for $S_{IR}(y)$.

Assuming that $S_{IR}(y)$ is given by the ECEG distribution, we have

Using (1.2) in (1.1) we have the survival function of the long-term exponentiated complementary exponential geometric (LECEG) distribution.
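As a hedged sketch of what (1.2) and the resulting LECEG survival function may look like: the display below assumes that the ECEG cdf is the CEG cdf of Louzada et al. (2011) raised to the power α. This parametrization is an assumption of the sketch and should be checked against Yamachi et al. (2013).

```latex
\begin{align*}
S_{IR}(y) &= 1 - \left[\frac{\theta\,(1 - e^{-\lambda y})}
                            {\theta + (1-\theta)\,e^{-\lambda y}}\right]^{\alpha},\\[4pt]
S(y) &= p + (1-p)\,S_{IR}(y)
      = 1 - (1-p)\left[\frac{\theta\,(1 - e^{-\lambda y})}
                            {\theta + (1-\theta)\,e^{-\lambda y}}\right]^{\alpha},
\qquad y > 0 .
\end{align*}
```

Under this assumed form, S(0) = 1 and S(y) tends to p as y grows, in agreement with the limit stated after (1.1).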

The paper is organized as follows. Section 2 presents the density, quantile, survival and failure rate functions of the proposed distribution and some properties such as the characteristic function, mean, variance and k-th order statistics. Section 3 presents inference based on the maximum likelihood approach. Section 4 presents the results of a simulation study to assess the performance of the maximum likelihood estimates. Section 5 presents the LECEG distribution in the presence of covariates. Section 6 illustrates the application of the proposed distribution to three real datasets. Section 7 provides some concluding remarks.

2 MODEL FORMULATION

Let Y be a nonnegative random variable denoting the lifetime of a component. The random variable Y is said to have a LECEG distribution with parameters α > 0, λ > 0, 0 < θ < 1 and 0 < p < 1 if its probability density function (pdf) is given by

where λ is the scale parameter, α and θ are shape parameters and p is the long-term parameter. Figure 1 shows the LECEG pdf and survival function for p = 0.1, α = 0.1, 0.4, 0.7, 1, 2, 3.5, θ = 0.2, 0.7, 0.8 and λ = 2, 3.


The improper survival function of the LECEG distribution is given by,

where y > 0, θ ∈ (0,1), α, λ > 0, p ∈ (0,1).

The quantile function of (2.1) is given by

where u ∈ (0,1 - p).
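The quantile function can be sketched by inverting the improper cdf in closed form. The code assumes the parametrization F(y) = (1 − p) G(y)^α with G(y) = θ(1 − e^{−λy}) / (θ + (1 − θ)e^{−λy}), which is an assumption of this sketch rather than a formula taken from the text; the function names are hypothetical.

```python
import math
import random


def leceg_cdf(y, alpha, lam, theta, p):
    """Improper cdf F(y) = (1 - p) * G(y)**alpha (assumed parametrization)."""
    e = math.exp(-lam * y)
    g = theta * (1.0 - e) / (theta + (1.0 - theta) * e)
    return (1.0 - p) * g ** alpha


def leceg_quantile(u, alpha, lam, theta, p):
    """Invert the assumed improper cdf for u in [0, 1 - p)."""
    if not 0.0 <= u < 1.0 - p:
        raise ValueError("u must lie in [0, 1 - p)")
    w = (u / (1.0 - p)) ** (1.0 / alpha)
    if w >= 1.0:  # guards against floating-point rounding at the upper edge
        return math.inf
    return -math.log(theta * (1.0 - w) / (theta + (1.0 - theta) * w)) / lam


def leceg_sample(n, alpha, lam, theta, p, rng=random):
    """Inverse-transform sampling; math.inf marks a long-term survivor."""
    draws = []
    for _ in range(n):
        u = rng.random()
        draws.append(leceg_quantile(u, alpha, lam, theta, p) if u < 1.0 - p else math.inf)
    return draws
```

Uniform draws at or above 1 − p have no finite quantile, so the sampler returns infinity for them; this mirrors the restriction u ∈ (0, 1 − p).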

From (2.1) and (2.2) it is easy to verify that the failure rate function of the LECEG distribution is given by

Figure 2 shows the failure rate function for α = 0.1, 0.4, 0.7, 1, 2, 3.5, λ = 2, 3, p = 0.1 and θ = 0.2, 0.7, 0.8. The failure rate function (2.4) is decreasing or unimodal: for the parameter values considered, it is decreasing if 0 < α < 1 and unimodal if α > 1.


From Figure 2 we also note that the failure rate function decreases as the proportion of cured patients increases.
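The relation between the density, the improper survival function and the failure rate can be sketched numerically. The closed forms below rely on an assumed parametrization (the CEG cdf raised to the power α, mixed with a cured fraction p); they are not copied from (2.1), (2.2) or (2.4), and the function names are hypothetical.

```python
import math


def _ceg_parts(y, lam, theta):
    """Assumed CEG cdf and pdf at y > 0."""
    if y <= 0:
        raise ValueError("y must be positive")
    e = math.exp(-lam * y)
    d = theta + (1.0 - theta) * e
    return theta * (1.0 - e) / d, lam * theta * e / d ** 2


def leceg_pdf(y, alpha, lam, theta, p):
    g_cdf, g_pdf = _ceg_parts(y, lam, theta)
    return (1.0 - p) * alpha * g_cdf ** (alpha - 1.0) * g_pdf


def leceg_survival(y, alpha, lam, theta, p):
    g_cdf, _ = _ceg_parts(y, lam, theta)
    return 1.0 - (1.0 - p) * g_cdf ** alpha


def leceg_hazard(y, alpha, lam, theta, p):
    """Failure rate h(y) = f(y) / S(y) of the improper distribution."""
    return leceg_pdf(y, alpha, lam, theta, p) / leceg_survival(y, alpha, lam, theta, p)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, leceg_hazard(1.0, 1.3, 1.1, 0.9, p))
```

Since h(y) = f_IR(y) / (p/(1 − p) + S_IR(y)), a larger cured fraction lowers the failure rate at every y, which agrees with the remark about Figure 2.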

Proposition 2.1. The characteristic function of a random variable Y with LECEG distribution is given by

Proof. Using the known formula $\Phi_Y(y) = \int_0^\infty e^{i t y} f(t)\,dt$,

where $u = 1 - e^{-\lambda t}$. Using (7.2) to solve the integral in (2.6), the characteristic function is obtained.

Proposition 2.2. The mean and variance of a random variable Y with LECEG distribution are given by

respectively, where ${}_2F_1^{(0,0,1,0)}(a, b, c; z)$ and ${}_2F_1^{(0,0,2,0)}(a, b, c; z)$ denote the first and second partial derivatives of the Gauss hypergeometric function ${}_2F_1(a, b, c; z)$ with respect to c, γ ≈ 0.5772 is Euler's constant, $\psi^{(0)}$ is the digamma function $\psi(x) = \Gamma'(x) / \Gamma(x)$, $\psi^{(1)}$ is the first derivative of the digamma function, and $(x)_k = \Gamma(x + k) / \Gamma(x)$ is the Pochhammer symbol.

Proof. Considering that $E(Y^r) = \Phi_Y^{(r)}(0) / i^r$, we can obtain the first and second moments, E(Y) and E(Y²), respectively. Using the first and second moments, the variance is obtained as V(Y) = E(Y²) − [E(Y)]². The results were obtained using the Mathematica software (2010).

Order statistics play an important role in quality control testing and reliability, where it is necessary to predict the failure of a future item based on the times of a few early failures. These predictors are often based on moments of order statistics.
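For reference, the density of the k-th order statistic of an i.i.d. sample with cdf F and pdf f has the standard form below, together with its binomial expansion. This is the general textbook identity, written here as a sketch; for the LECEG case F would be the improper cdf F(y) = 1 − S(y), and the formula applies to finite y.

```latex
\begin{align*}
f_{k:n}(y)
  &= \frac{n!}{(k-1)!\,(n-k)!}\; f(y)\,[F(y)]^{k-1}\,[1 - F(y)]^{n-k} \\
  &= \frac{n!}{(k-1)!\,(n-k)!}\,
     \sum_{j=0}^{n-k} (-1)^{j} \binom{n-k}{j}\, f(y)\,[F(y)]^{k-1+j}.
\end{align*}
```

The second line follows from expanding $[1 - F(y)]^{n-k}$ with the binomial theorem, which turns the density into a finite sum of terms of the form $f(y)[F(y)]^{m}$.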

Proposition 2.3. Let Y1, Y2, ..., Yn be iid random variables such that Yj follows a LECEG distribution for j = 1, 2, ..., n. The pdf of the k-th order statistic, say Yk:n, is given by (for y > 0)

where

Proof. Consider a random sample of size n from the LECEG distribution. It is well known that

where

Using (2.7), we have,

Proposition 2.4. The characteristic function of the k-th order statistic from a random variable Y with LECEG distribution is given by

where

Proof. Using the known formula $\Phi_{Y_{k:n}}(y) = \int_0^\infty e^{i t y} f_{k:n}(t)\,dt$ and (7.1) from Appendix A,

where

Proposition 2.5. The mean of the k-th order statistic from a LECEG distribution is given by

Proof. Considering that $E(Y_{k:n}) = \Phi'_{Y_{k:n}}(0) / i$, we have the first moment.

In reliability, the ratio of two consecutive moments of the reversed residual life characterizes the distribution uniquely. The reversed failure rate function is given by

3 INFERENCE

In this section we consider maximum likelihood estimation (MLE). Assuming that the lifetimes Yi, i = 1, ..., n, from the LECEG distribution are independent and identically distributed and independent of the censoring mechanism ci, i = 1, ..., n, the maximum likelihood estimates (MLEs) of the parameters are obtained by direct maximization of the log-likelihood function given by

where ci is a censoring indicator, equal to 0 if the observation is censored and 1 if it is observed.
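A minimal sketch of a censored log-likelihood of this kind is given below. It uses the standard right-censoring form, in which observed lifetimes (ci = 1) contribute log f(yi) and censored ones (ci = 0) contribute log S(yi), together with an assumed LECEG parametrization (CEG cdf raised to the power α, cured fraction p). The function names are hypothetical and the formula is not copied from (3.1).

```python
import math


def _log_ceg_parts(y, lam, theta):
    """Log of the assumed CEG cdf and pdf at y > 0."""
    e = math.exp(-lam * y)
    d = theta + (1.0 - theta) * e
    log_cdf = math.log(theta) + math.log1p(-e) - math.log(d)
    log_pdf = math.log(lam) + math.log(theta) - lam * y - 2.0 * math.log(d)
    return log_cdf, log_pdf


def leceg_loglik(params, times, observed):
    """Sum of c_i * log f(y_i) + (1 - c_i) * log S(y_i) over the sample."""
    alpha, lam, theta, p = params
    if alpha <= 0 or lam <= 0 or not 0 < theta < 1 or not 0 < p < 1:
        return -math.inf  # outside the parameter space
    total = 0.0
    for y, c in zip(times, observed):
        log_cdf, log_pdf = _log_ceg_parts(y, lam, theta)
        if c:
            total += math.log(1.0 - p) + math.log(alpha) + (alpha - 1.0) * log_cdf + log_pdf
        else:
            total += math.log(1.0 - (1.0 - p) * math.exp(alpha * log_cdf))
    return total
```

The negative of this function could be handed to a general-purpose optimizer, in the same spirit as the optim routine mentioned in the text; returning minus infinity outside the parameter space is one simple way to keep an unconstrained optimizer inside it.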

The advantage of the MLE procedure is that it can be run directly in existing statistical packages. We used the optim routine of R (R Development Core Team, 2008). Large-sample inference for the parameters is based on the MLEs and their estimated standard errors, using asymptotic normality.

Asymptotic normality is also useful for testing hypotheses about the parameters of the distribution and for comparing the LECEG distribution with some of its special sub-models, the long-term complementary exponential geometric (LCEG) and the long-term exponential (LE) distributions, via the likelihood ratio (LR) statistic.

4 SIMULATION STUDY

In this section we carry out a simulation study to assess the asymptotic performance of the MLEs in the LECEG model. Lifetime samples were generated from the LECEG distribution with parameter values α = 1.3, θ = 0.9, λ = 1.1 and cured fraction p = 0.10 through a rejection algorithm, using the Weibull distribution with scale parameter 1.2 and shape parameter 1.1 as the auxiliary distribution. The samples were generated with censoring levels of 20%, 30% and 40% for sample sizes n = 30, 50, 100, 200 and 500. For each sample size, we conducted 1000 simulations and calculated the average bias (AB) and the mean squared error (MSE) of the MLEs. Table 1 shows the bias and MSE of the MLEs for sample sizes n = 30, 50, 100, 200 and 500. From Table 1 we observe that the AB and MSE are close to zero, that the MSEs decrease as the sample size increases, and that both AB and MSE increase with the censoring level. Similar results are observed for other combinations of α, λ, θ and p.
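The two summary measures can be sketched as follows. To keep the example short and dependency-free, the replications use the MLE of an exponential rate as a stand-in estimator; this is an assumption for illustration only and does not reproduce the LECEG fits or the rejection sampler of the study. All function names are hypothetical.

```python
import random
import statistics


def bias_and_mse(estimates, true_value):
    """Average bias and mean squared error of a list of estimates."""
    errors = [est - true_value for est in estimates]
    return statistics.fmean(errors), statistics.fmean([err ** 2 for err in errors])


def simulate_rate_mle(n, rate, replications, seed=0):
    """Stand-in simulation: MLE n / sum(y) of an exponential rate, repeated."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [rng.expovariate(rate) for _ in range(n)]
        estimates.append(n / sum(sample))
    return estimates


if __name__ == "__main__":
    for n in (30, 50, 100, 200, 500):
        ab, mse = bias_and_mse(simulate_rate_mle(n, 2.0, 1000), 2.0)
        print(n, round(ab, 4), round(mse, 4))
```

With a consistent estimator, both columns shrink as n grows, which is the pattern the study reports for the LECEG MLEs.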

5 THE LECEG REGRESSION MODEL

In this section the LECEG model is presented for the case in which the scale parameter is affected by covariates. Such models are considered when the survival time depends on characteristics (covariates) of the individual.

Considering covariates for the scale parameter λ in the LECEG model, the survival function (2.2) for individual i, i = 1, ..., n, is given by

where $\lambda_i = \exp(x_i'\beta)$, $x_i' = (1, x_{i1}, \ldots, x_{ip})$ is the covariate vector and β' = (β0, ..., βp) is the respective coefficient vector.

From (3.1), the log-likelihood is given by

where $\lambda_i = \exp(x_i'\beta)$, $x_i' = (1, x_{i1}, \ldots, x_{ip})$ is the covariate vector and β' = (β0, ..., βp) is the respective coefficient vector.

In modeling, covariate selection procedures such as forward and backward stepwise selection are commonly used to determine which potential covariates influence the response variable. Based on the covariate selection process described in Collett (1994), we consider the following steps:

Step 1: Fit the model with only the intercept and record the log-likelihood. Then fit all possible models with one covariate and perform the likelihood ratio test to identify which covariates give a significant improvement. If more than one model is significant, keep the one with the lowest -ℓ(.), AIC and BIC values.

Step 2: Fit the models with two covariates that include the covariate selected in Step 1, and keep the one with the lowest -ℓ(.), AIC and BIC values.

Step 3 and so on: Fit models with one additional covariate, always keeping the covariates selected in the previous steps, until no further covariate remains, each time keeping the model with the lowest -ℓ(.), AIC and BIC values.
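The steps above can be sketched as a greedy forward search. The sketch is a simplification: it ranks candidate models by a single user-supplied criterion (for instance AIC) and stops when no addition improves it, whereas the procedure in the text also uses the likelihood ratio test, -ℓ(.) and BIC. The names `forward_select`, `criterion` and `toy_aic` are hypothetical.

```python
def forward_select(candidates, criterion):
    """Greedy forward selection.

    `criterion` maps a tuple of covariate names to a score where lower is
    better (e.g. the AIC of the model fitted with those covariates).
    """
    selected = []
    best = criterion(tuple(selected))  # intercept-only model
    remaining = list(candidates)
    while remaining:
        scored = [(criterion(tuple(selected + [c])), c) for c in remaining]
        score, choice = min(scored)
        if score >= best:  # no candidate improves the criterion
            break
        selected.append(choice)
        remaining.remove(choice)
        best = score
    return selected, best


def toy_aic(covariates):
    # Made-up scores for illustration: each covariate lowers -2*loglik by a
    # fixed amount and costs 2 units of AIC penalty.
    gains = {"x1": 10.0, "x2": 0.5, "x3": 6.0}
    return 100.0 - sum(gains[c] for c in covariates) + 2.0 * len(covariates)


if __name__ == "__main__":
    print(forward_select(["x1", "x2", "x3"], toy_aic))
```

In practice `criterion` would refit the regression model for each candidate set, so the cost of the search is dominated by the number of fits.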

6 APPLICATIONS

In this section, we compare the LECEG distribution with its particular cases (the LCEG and LE distributions), as well as with the long-term exponentiated Gumbel (LEGU) and long-term Lindley (LLI) distributions, on three datasets.

LLI is the Lindley distribution (Lindley, 1958) with the long-term parameter p added, and LEGU is the exponentiated Gumbel distribution (Nadarajah & Kotz, 2006) with the long-term parameter p added.

The datasets come from the medical field, which is related to the genesis of the proposed model. The main idea is to show the applicability of the new distribution and the possibility of choosing directly between it and its particular cases, as well as its competitiveness in terms of fit relative to usual survival distributions.

As our first dataset, we consider 25 lifetimes with approximately 32% censoring from Allison (1995). The second dataset consists of 40 lifetimes of patients undergoing treatment, with 7.5% censoring (Prentice, 1973).

First, in order to verify the shape of the failure rate function, we follow a standard graphical methodology for data analysis, the total time on test (TTT) plot, described by Maller & Zhou (1983). According to Aarset (1987), the empirical version of the TTT plot is given by $G(r/n) = \left[\sum_{i=1}^{r} Y_{i:n} + (n - r)\,Y_{r:n}\right] \big/ \sum_{i=1}^{n} Y_{i:n}$, where r = 1, ..., n and $Y_{i:n}$ are the order statistics of the sample. It has been shown that the failure rate function is increasing (decreasing) if the TTT plot is concave (convex). Figure 3 shows the TTT plots for the considered datasets, which indicate decreasing failure rate functions. We therefore fitted five distributions to the datasets: the LECEG, LCEG, LE, LEGU and LLI. Table 2 shows the MLEs and their variances (in parentheses) for the five fitted distributions.
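The empirical TTT transform can be computed in a few lines. The sketch uses the usual scaled form, in which the partial sum of the first r order statistics is added to (n − r) times the r-th order statistic and divided by the total; the function name is hypothetical and censoring is ignored.

```python
def ttt_transform(times):
    """Return the points (r/n, G(r/n)) of the empirical scaled TTT plot."""
    y = sorted(times)
    n = len(y)
    total = sum(y)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive lifetime")
    points = []
    running = 0.0
    for r, value in enumerate(y, start=1):
        running += value  # sum of the r smallest lifetimes
        points.append((r / n, (running + (n - r) * value) / total))
    return points


if __name__ == "__main__":
    for point in ttt_transform([3.0, 1.0, 2.0]):
        print(point)
```

Plotting these points against the diagonal shows the shape: a curve above the diagonal (concave) suggests an increasing failure rate and a curve below it (convex) a decreasing one.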


We compare the fitted distributions by −ℓ, the negative log-likelihood evaluated at the MLEs of (α, λ, θ, p), the AIC (Akaike's information criterion, −2ℓ + 2k, where k is the number of parameters in the model) and the BIC (Schwarz's Bayesian information criterion, −2ℓ + k log(n), where n is the sample size). The preferred model is the one with the smallest value of each criterion. Table 3 shows the values of −ℓ, AIC and BIC, together with the Kolmogorov-Smirnov (KS) statistic and its p-value, for the fitted distributions, including the long-term exponentiated Gumbel (LEGU) and long-term Lindley (LLI) distributions. The LECEG distribution outperforms the competing distributions on all criteria for the two datasets. These results are corroborated by the plots in Figure 4, which show the fitted survival functions of all fitted distributions and the empirical Kaplan-Meier survival function.
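The two information criteria can be sketched directly from their standard definitions, AIC = −2ℓ + 2k and BIC = −2ℓ + k log(n). The log-likelihood value in the usage lines is made up for illustration and is not taken from Table 3.

```python
import math


def aic(loglik, k):
    """Akaike's information criterion for maximized log-likelihood `loglik`."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Schwarz's Bayesian information criterion for a sample of size n."""
    return -2.0 * loglik + k * math.log(n)


if __name__ == "__main__":
    # Hypothetical fit: 4 parameters, 25 observations, log-likelihood -100.
    print(aic(-100.0, 4), bic(-100.0, 4, 25))
```

Because log(n) exceeds 2 once n is at least 8, the BIC penalizes additional parameters more heavily than the AIC for all three sample sizes considered here.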


As our third dataset, we consider the data from Kalbfleisch & Prentice (1980). The dataset consists of the following variables: acceptance years (X1), years (X2), status (alive or dead) (c), transplant (yes or no) (X3), surgery (whether the patient has previously undergone some kind of surgery) (X4) and the lifetime (Y). Table 4 shows the coefficients and/or interaction terms, with their standard deviations in parentheses, obtained from the MLE procedure using the variable selection described in Section 5. The -ℓ(.), AIC and BIC values are also presented. From Table 4, the LECEG model was considered the best fitted model.

According to Cox & Snell (1968), if the model fits the data well, then the estimated cumulative hazard function evaluated at the observed times, conditional on the covariate vector, behaves like a sample from an exponential distribution with hazard rate one; that is, we can check whether the residuals follow an exponential distribution with parameter one. Figure 5 shows the Kaplan-Meier curve of the Cox-Snell residuals for the fitted distributions, superimposed on an exponential distribution with hazard rate one, indicating a reasonable fit of the LECEG model compared with the LCEG and LE models.
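Cox-Snell residuals are the fitted cumulative hazards at the observed times, r_i = −log S(y_i | x_i). A minimal sketch follows; the function name is hypothetical and an exponential survival function stands in for the fitted model.

```python
import math


def cox_snell_residuals(times, survival):
    """Residuals r_i = -log S(y_i) for a fitted survival function `survival`."""
    return [-math.log(survival(y)) for y in times]


if __name__ == "__main__":
    # Stand-in fitted model (an assumption): exponential with rate 2.
    fitted = lambda y: math.exp(-2.0 * y)
    print(cox_snell_residuals([0.1, 0.5, 1.2], fitted))
```

The residuals inherit the censoring indicators of the original lifetimes, which is why their distribution is examined through a Kaplan-Meier curve rather than a plain histogram.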


Considering the LECEG model with λi = exp(8.71 − 32.33 X3 − 1.06 X4 − 0.17 X1 + 0.44 X1 X3), the average risk of dying for patients who did not undergo surgery is 2.89 times the average risk of dying for patients who had already undergone surgery.
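The factor 2.89 appears to correspond to exp(1.06), the ratio of λi between a patient without previous surgery (X4 = 0) and an otherwise identical patient with previous surgery (X4 = 1). The sketch below checks this arithmetic with the reported coefficients; reading the factor as a ratio of λi values is an interpretation, and the covariate values used are arbitrary.

```python
import math


def fitted_lambda(x1, x3, x4):
    """lambda_i from the reported coefficients of the fitted LECEG model."""
    return math.exp(8.71 - 32.33 * x3 - 1.06 * x4 - 0.17 * x1 + 0.44 * x1 * x3)


def surgery_ratio(x1, x3):
    """Ratio of lambda_i without surgery (X4 = 0) to with surgery (X4 = 1)."""
    return fitted_lambda(x1, x3, 0) / fitted_lambda(x1, x3, 1)


if __name__ == "__main__":
    # The ratio does not depend on X1 or X3 because X4 enters without interaction.
    print(round(surgery_ratio(50.0, 0), 2))
```

Since X4 has no interaction term in the fitted linear predictor, the same ratio holds for every combination of the other covariates.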

7 CONCLUDING REMARKS

In this paper we presented the LECEG distribution, an extension of the ECEG distribution with a long-term parameter that belongs to the latent complementary risks scenario, i.e., the lifetime associated with a particular risk is not observable and only the maximum lifetime among all risks is observed. The properties of the proposed distribution were discussed, including its probability density function, the quantile, survival and failure rate functions, the characteristic function and r-th order statistics. Maximum likelihood inference is implemented straightforwardly. The practical importance of the LECEG distribution was demonstrated in three applications.

ACKNOWLEDGMENT

We thank the Editorial Board and the anonymous referees for their comments and criticisms on earlier versions of the manuscript. The research is partially supported by CNPq, CAPES and FAPESP.

Received on January 13, 2013

Accepted on March 11, 2014

Appendix A

The following binomial series expansion is used in the present paper

where $(r)_k = r(r + 1) \cdots (r + k - 1)$ is the Pochhammer symbol, $(-r)_k = (-1)^k (r - k + 1)_k$, and the series converges if |x| < 1.

Throughout the paper we use the following relationship:

  • [1] K. Adamidis & S. Loukas. A lifetime distribution with decreasing failure rate. Statistics & Probability Letters, 39 (1998), 35-42.
  • [2] M. Chahkandi & M. Ganjali. On some lifetime distributions with decreasing failure rate. Computational Statistics & Data Analysis, 53(1) (2009), 4433-4440.
  • [3] F. Louzada, M. Roman & V.G. Cancho. The complementary exponential geometric distribution: Model, properties, and a comparison with its counterpart. Computational Statistics & Data Analysis, 55(8) (2011), 2516-2524.
  • [4] A.L. Morais & W. Barreto-Souza. A compound class of Weibull and power series distributions. Computational Statistics & Data Analysis, 55(1) (2010), 1410-1425.
  • [5] V.G. Cancho, F. Louzada-Neto & G.D.C. Barriga. The Poisson-exponential lifetime distribution. Computational Statistics & Data Analysis, 55(1) (2011), 677-686.
  • [6] F. Hemmati, E. Khorram & S. Rezakhan. A new three-parameter ageing distribution. Journal of Statistical Planning & Inference, 141(1) (2011), 2266-2275.
  • [7] H.S. Bakouch, B.M. Al-Zahrani, A.A. Al-Shomrani, V.A.A. Marchi & F. Louzada. An extended Lindley distribution. Journal of the Korean Statistical Society, (2011).
  • [8] F. Louzada, V. Marchi & J. Carpenter. The complementary exponentiated exponential geometric lifetime distribution. Journal of Probability and Statistics, 12, (2013).
  • [9] F. Louzada-Neto. Poly-hazard regression models for lifetime data. Biometrics, 55 (1999), 1121-1125.
  • [10] C.Y. Yamachi, M. Roman, F. Louzada, M.A.P. Franco & V.G. Cancho. The exponentiated complementary exponential geometric distribution. Journal of Modern Mathematics Frontier, 2(1), (2013).
  • [11] R. Maller & X. Zhou. Survival Analysis with Long-Term Survivors. John Wiley and Sons, Chichester, (1996).
  • [12] J.W. Boag. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. Journal of the Royal Statistical Society, (1949), 15-44.
  • [13] J. Berkson & R.P. Gage. Survival curve for cancer patients following treatment. Journal of the American Statistical Association, 47 (1952), 501-515.
  • [14] V.T. Farewell. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 38(4) (1982), 1041-1046.
  • [15] R.A. Maller & S. Zhou. Testing for the presence of immune or cured individuals in censored survival data. Biometrics, 51 (1995), 1197-1205.
  • [16] M.H. Chen & J.G. Ibrahim. Maximum likelihood methods for cure rate models with missing covariates. Biometrics, 57 (2001), 43-52.
  • [17] V.G. Cancho, E.M.M. Ortega & H. Bolfarine. The log-exponentiated-Weibull regression models with cure rate: Local influence and residual analysis. Journal of Data Science, 7 (2009), 433-458.
  • [18] G.S.C. Perdona & F. Louzada. A general hazard model for lifetime data in the presence of cure rate. Journal of Applied Statistics, 38 (2011), 1395-1405.
  • [19] M. Roman, F. Louzada, V.G. Cancho & J.G. Leite. A new long-term survival distribution for cancer data. Journal of Data Science, 10 (2012), 241-258.
  • [20] Mathematica Software. Wolfram Mathematica: Technical computing software. Wolfram Research, Inc., (2010).
  • [21] R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, (2008).
  • [22] D. Collett. Modelling Survival Data in Medical Research. Chapman and Hall, (1994).
  • [23] D.V. Lindley. Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, 20(1) (1958), 102-107.
  • [24] S. Nadarajah & S. Kotz. The exponentiated type distributions. Acta Applicandae Mathematicae, 92(2) (2006), 97-111.
  • [25] P. Allison. Survival Analysis Using SAS: A Practical Guide. SAS, (1995).
  • [26] R.L. Prentice. Exponential survivals with censoring and explanatory variables. Biometrika, 60 (1973), 279-288.
  • [27] R. Maller & X. Zhou. Graphical Methods for Data Analysis. Chapman and Hall, (1983).
  • [28] M.V. Aarset. How to identify a bathtub hazard rate. IEEE Transactions on Reliability, 36(1) (1987), 106-108.
  • [29] J.D. Kalbfleisch & R.L. Prentice. The Statistical Analysis of Failure Time Data. Wiley & Sons, (1980).
  • [30] D.R. Cox & E.J. Snell. A general definition of residuals. Journal of the Royal Statistical Society, Series B, 30(2) (1968), 248-275.

Publication dates

Publication in this collection: 10 June 2014
Date of issue: Apr 2014

Sociedade Brasileira de Matemática Aplicada e Computacional, Rua Maestro João Seppe, nº. 900, 16º. andar - Sala 163, 13561-120 São Carlos - SP, Brazil. Tel. / Fax: (55 16) 3412-9752. E-mail: sbmac@sbmac.org.br