A new class of distributions as a finite functional mixture using functional weights

Abstract

In this paper, we introduce a new family of distributions whose probability density function is defined as a weighted sum of two probability density functions, one of which is a warped version of the other. We focus on a three-parameter special case based on the exponential distribution, a dilation transformation and a weight with polynomial decay, leading to a new lifetime distribution. Explicit expressions for the moment generating function, moments and quantile function of the proposed distribution are provided. The parameters are estimated by the method of maximum likelihood. Two applications to practical data sets are given.

Key words
finite mixture; weighted distribution; exponential distribution; moment generating function; maximum likelihood estimation

INTRODUCTION

Mixture distributions arise in a wide variety of applications, including the distribution of children’s heights, discussed by Everitt & Hand (1981), and the plasma concentration of beta-carotene given in Schlattmann (2009). A natural application of mixture distributions is the modelling of heterogeneous data, where each component of the mixture corresponds to a cluster of the data. Since mixture distributions can model a wide variety of random phenomena, they have received increasing attention in the literature and have been explored by many researchers in various contexts; see McLachlan & Peel (2000). The \(k\)-component mixture distribution is defined via the following probability density function (pdf):

\[h(x)=\sum\limits_{i=1}^{k} p_{i} f_{i}(x),\](1)
where \(p_{i}\in [0,1]\), \(\sum\limits_{i=1}^{k} p_{i}=1\), and \(f_{i}(x)\) is the pdf of the \(i\)th cluster of the population. The mixture distribution in (1) is expressed as a weighted sum of pdfs and is flexible enough to model heterogeneous data having multiple modes; see Elmahdy & Aboutahoun (2013), Elmahdy (2017), Frühwirth-Schnatter (2006), Seidel (2010) and the monographs cited above. On the other hand, a given distribution can be generalized by employing a composite function. To be more specific, let \(f(x)\) be a pdf with support on \((a,b)\), and let \(G(x)\) be a differentiable increasing function on \((a,b)\) with \(\lim\limits_{x\rightarrow a^{+}}G(x)=a\), \(\lim\limits_{x\rightarrow b^{-}}G(x)=b\) and \(g(x)=G'(x)\). Then one can show that \(g(x)f(G(x))\) is also a pdf, obtained by warping \(f(x)\). For instance, when \(f(x)\) is the pdf of an exponential distribution, \(g(x)f(G(x))\) is the pdf of the Weibull distribution when \(G(x)\) is a power function. Further developments and examples can be found in, e.g., Al-Hussaini (2012), Alzaatreh et al. (2012, 2015), Sharma et al. (2017), and the references therein.
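As a small numerical illustration of the warping construction, the following Python sketch builds \(g(x)f(G(x))\) from the unit-rate exponential pdf with \(G(x)=x^{k}\) and compares it with the Weibull pdf \(k x^{k-1}e^{-x^{k}}\). The function names and the shape value \(k=2.5\) are illustrative assumptions, not part of the original derivation.

```python
import math

def exp_pdf(x, rate=1.0):
    """Exponential pdf f(x) = rate * exp(-rate * x) for x > 0."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0

def warped_pdf(x, k):
    """g(x) f(G(x)) for the power warping G(x) = x**k, so g(x) = k * x**(k - 1)."""
    if x <= 0:
        return 0.0
    return k * x ** (k - 1) * exp_pdf(x ** k)

def weibull_pdf(x, k):
    """Weibull pdf with shape k and unit scale."""
    if x <= 0:
        return 0.0
    return k * x ** (k - 1) * math.exp(-(x ** k))

# Largest discrepancy between the warped exponential and the Weibull pdf on a few points.
max_gap = max(abs(warped_pdf(x, 2.5) - weibull_pdf(x, 2.5)) for x in (0.1, 0.5, 1.0, 2.0))
```

The two functions agree up to floating-point rounding, which is the exponential-to-Weibull example mentioned in the text.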

Combining the two approaches mentioned above, a natural way to increase the flexibility of \(f(x)\) and \(g(x)f(G(x))\) is to consider a mixture of these two pdfs defined by \(h(x)=p f(x)+(1-p)g(x)f(G(x))\) with \(p\in [0,1]\). The parameter \(p\) operates a compromise between \(f(x)\) and \(g(x)f(G(x))\), with \(h(x)=f(x)\) if \(p=1\) and \(h(x)=g(x)f(G(x))\) if \(p=0\). It is important to note that the proportion \(p\) is assumed to be a fixed constant regardless of the support of the random variable, although this can seem impractical in certain cases. Keeping this in mind, we introduce a new generator of distributions which generalizes the finite mixture of the pdfs \(f(x)\) and \(g(x)f(G(x))\) by introducing a Lebesgue measurable and monotonic function \(w(x)\) with \(w(x)\in [0,1]\) for any \(x\in (a,b)\). The new family of distributions is characterized by the following pdf:

\[h(x)=w(x)f(x)+ \left[1-w\left( G(x)\right)\right] g(x)f\left(G(x)\right),\](2)
(further details are given in Proposition 1 below). Thus, \(h(x)\) can be viewed as a two-component “functional mixture” of \(f(x)\) and \(g(x)f(G(x))\) with the “functional weights” \(\left\lbrace w(x), 1-w(G(x))\right\rbrace\). It provides a compromise between \(f(x)\) and \(g(x)f(G(x))\), using a monotonic weight function that varies with \(x\). The introduction of such a functional weight in a finite mixture of distributions is also motivated by the usefulness of weighted distributions for efficient modeling and prediction from data; see Saghir et al. (2017), and the references therein.

The rest of the paper is organized as follows. The second section (The proposed family of distributions and some of its properties) presents the fundamentals of the proposed family of distributions. Some special cases are discussed. We also stress the significance of the family, highlighting its connections with other well-known families of distributions. Expressions for the ordinary moments are derived. The third section (The FWE distribution and its properties) is devoted to a special case, providing a new lifetime distribution with three parameters. It is based on the exponential distribution for \(f(x)\), a dilation transformation for \(G(x)\), and a weight with polynomial decay for \(w(x)\). The moments of this distribution are also provided. Maximum likelihood estimation of the parameters is considered in the fourth section (Applications), where two real-life applications demonstrate the applicability of the distribution. Concluding remarks are given in the last section (Concluding remarks).

THE PROPOSED FAMILY OF DISTRIBUTIONS AND SOME OF ITS PROPERTIES

In this section, we formally define the proposed family of distributions along with some particular cases. Expressions of ordinary moments are also investigated in the general case.

Construction of the family

The pdf of the family is defined in the next proposition.

Proposition 1. Let \(f(x)\) be a pdf with support on \((a,b)\) with \((a,b)\in \mathbb{R}^2\cup\{-\infty, \infty\}^2\) (including the semi-infinite intervals \((0,\infty)\) and \((-\infty,0)\), and the real line \(\mathbb{R}\)), \(G(x)\) a differentiable increasing function on \((a,b)\) with \(\lim\limits_{x\rightarrow a^{+}}G(x)=a\), \(\lim\limits_{x\rightarrow b^{-}}G(x)=b\), \(g(x)=G'(x)\) and \(w(x)\) a measurable function on \((a,b)\) with \(w(x)\in [0,1]\) for any \(x\in (a,b)\). Then the following function is a pdf:

\begin{aligned} \label{dens} h(x)=w(x)f(x)+ \left[1-w\left( G(x)\right)\right] g(x)f\left(G(x)\right).\end{aligned}(3)

Proof of Proposition 1. Since \(f(x)\ge 0\), \(w(x)\in [0,1]\) and \(g(x)\ge 0\) (because \(G(x)\) is increasing), we have \(h(x)\ge 0\). It remains to show that \(\int_{a}^{b}h(x)dx=1\). Applying the change of variables \(y=G(x)\) to the second integral, which maps \((a,b)\) onto \((a,b)\), we obtain

\begin{aligned} \int_{a}^{b}h(x)dx&=\int_{a}^{b}w(x)f(x)dx+\int_{a}^{b} \left[1-w\left( G(x)\right)\right] g(x)f\left(G(x)\right)dx\\ & = \int_{a}^{b}w(x)f(x)dx+ \int_{a}^{b}\left[1-w(y)\right]f(y)dy=\int_{a}^{b}f(x)dx=1.\end{aligned}

This completes the proof. \(\square\)
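Proposition 1 can also be checked numerically for a concrete choice of ingredients. The sketch below assumes \(f(x)=e^{-x}\), \(w(x)=e^{-x}\) and \(G(x)=x^{2}\) on \((0,\infty)\); these choices, the truncation of the range at 40 and the helper `simpson` are illustrative assumptions only.

```python
import math

def simpson(func, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3

def f(x):   # base pdf: unit exponential on (0, inf)
    return math.exp(-x)

def w(x):   # weight function with values in [0, 1]
    return math.exp(-x)

def G(x):   # increasing warping with G(0) = 0 and G(x) -> inf as x -> inf
    return x * x

def g(x):   # derivative of G
    return 2.0 * x

def h(x):
    """Functional mixture pdf (3) for the choices above."""
    return w(x) * f(x) + (1.0 - w(G(x))) * g(x) * f(G(x))

# Total mass on (0, 40); the neglected tail is of order exp(-40).
total_mass = simpson(h, 0.0, 40.0)
```

For this example each of the two components contributes mass \(1/2\), so the total is 1 up to quadrature error.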

Table I lists some special cases of the pdf (3) for various choices of \(f(x)\), \(w(x)\) and \(G(x)\). The pdfs given in Table I are new to the statistics literature. The particular distribution based on the exponential density with \(w(x)= \frac{1}{1+\alpha x^2}\) and \(G(x)=\frac{x}{\sigma}\) is discussed in detail in the next section (The FWE distribution and its properties); the motivation for this choice will become clear there. Readers may explore the other cases for their properties and potential applications in future studies.

Table I
Some new pdfs \(\pmb{h(x)}\), defined by (3), with various choices of \(\pmb{f(x)}\), \(\pmb{w(x)}\) and \(\pmb{G(x)}\) \(^{\star}\).

It is important to note that the pdf \(h(x)\) is very flexible, since it is a sum of functions of different natures and shapes. Because of this structure, the cumulative distribution function (cdf) associated with \(h(x)\) does not necessarily have a closed form. However, if we take \(w(x)=F(x)\), where \(F(x)\) is the cdf associated with \(f(x)\), the cdf of the family has a simple closed-form expression and a probabilistic interpretation. The associated pdf and cdf are given by

\[h(x)=F(x)f(x)+ \left[1-F\left( G(x)\right)\right] g(x)f\left(G(x)\right)\]

and

\begin{aligned} H(x)=\frac{1}{2}[F(x)]^2+\frac{1}{2}\left[ 1- \left[ 1-F\left(G(x)\right)\right]^2\right], \label{eq23}\end{aligned} (4)

respectively. A random variable with cdf (4) admits the following stochastic representation:

\begin{aligned} Z=\epsilon \sup(X,Y)+(1-\epsilon) \inf(G^{-1}(X), G^{-1}(Y)),\end{aligned}

where \(\epsilon\) is a random variable following the Bernoulli distribution with parameter \(\frac{1}{2}\), \(X\) and \(Y\) are independent and identically distributed random variables with common pdf \(f(x)\), independent of \(\epsilon\), and \(G^{-1}(x)\) denotes the inverse function of \(G(x)\). Therefore, the proposed family of distributions includes finite mixtures of \(\sup\) and \(\inf\) of random variables.
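The representation of \(Z\) can be checked by simulation. The sketch below assumes a unit-rate exponential \(f(x)\) and the dilation \(G(x)=x/2\); the seed, sample size and evaluation point are arbitrary illustrative choices.

```python
import math
import random

def F(x):
    """Exponential cdf with unit rate."""
    return 1.0 - math.exp(-x)

def G(x):
    """Dilation with sigma = 2 (illustrative)."""
    return x / 2.0

def G_inv(x):
    """Inverse of G."""
    return 2.0 * x

def H(x):
    """Closed-form cdf (4) for the choice w = F."""
    return 0.5 * F(x) ** 2 + 0.5 * (1.0 - (1.0 - F(G(x))) ** 2)

def draw_z(rng):
    """One draw of Z = eps * sup(X, Y) + (1 - eps) * inf(G^{-1}(X), G^{-1}(Y))."""
    x, y = rng.expovariate(1.0), rng.expovariate(1.0)
    if rng.random() < 0.5:          # eps = 1 with probability 1/2
        return max(x, y)
    return min(G_inv(x), G_inv(y))

rng = random.Random(12345)
sample = [draw_z(rng) for _ in range(200_000)]
point = 1.0
empirical = sum(z <= point for z in sample) / len(sample)
gap = abs(empirical - H(point))     # Monte Carlo error only
```

With this sample size the empirical cdf at the chosen point should differ from (4) only by Monte Carlo noise of order \(10^{-3}\).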

Another point is the connection between the proposed family and skewed distributions. Consider the case where \(a=-\infty\), \(b=\infty\), \(G(0)=0\), \(f(x)\) is a pdf symmetric around \(0\) and \(w(x)={\bf 1}_{\{x\le 0\}}\) is the indicator function of the set \(\{x\le 0\}\). Since \(G(x)\) is increasing with \(G(0)=0\), we have \(G(x)\le 0\) if and only if \(x\le 0\), so the pdf \(h(x)\) becomes a skewed version of \(f(x)\) as follows

\begin{aligned} h(x)= \left\{ \begin{array}{ll} \displaystyle f(x)& \mbox{ if } x\le 0,\\ \displaystyle g(x) f(G(x))& \mbox{ if } x> 0.\\ \end{array} \right.\end{aligned}

The proposed family of distributions can be used to generate another skewed family of distributions investigated by Huang & Chen (2007). It can be defined as follows. Let \(k(x)\) be a pdf symmetric around \(0\) and \(m(x)\in [0,1]\) be a Lebesgue measurable function satisfying \(m(x)+m(-x)=1\), \(x\in\mathbb{R}\), almost everywhere. Then the function \(f(x)=2k(x)m(x)\), \(x\in\mathbb{R}\), is a proper pdf.

Using pdf (3) with \(f(x)=2k(x)m(x)\) and \(w(x)=m(x)\), we can derive a new skewed family of distributions; for a given \(k(x)\), we get

\begin{aligned} h(x)=2\left(k(x) (m(x))^2+ \left[1-m\left( G(x)\right)\right] g(x)k\left( G(x)\right)m\left(G(x)\right)\right), \quad x\in \mathbb{R}. \end{aligned}

Based on this idea, it is possible to develop new skewed families of distributions.

Some properties of the family

We now present the ordinary moments of the proposed family of distributions specified by (3). Let \(X\) be a random variable with pdf \(h(x)\), defined by (3), and \(Y\) be a random variable with pdf \(f(x)\). The \(r\)th non-central moment of \(X\) is

\begin{aligned} \mu^{\prime}_{r}& =E(X^r)=\int_{a}^{b}x^{r}h(x)dx = \int_{a}^{b}x^{r}\left[w(x)f(x)+\left[1-w(G(x))\right]g(x)f(G(x))\right]dx\\ &= \int_{a}^{b}x^{r}w(x)f(x)dx+\int_{a}^{b}x^{r}\left[1-w(G(x))\right]g(x)f(G(x))dx\\ &= \int_{a}^{b}x^{r}w(x)f(x)dx+\int_{a}^{b}(G^{-1}(x))^{r}(1-w(x))f(x)dx\\ &= E(Y^{r}w(Y))+E\left[(G^{-1}(Y))^{r}(1-w(Y))\right]. \end{aligned}

Upon rearranging this equality, an alternative formula is

\[\mu^{\prime}_{r}=E((G^{-1}(Y))^{r})+E\left[w(Y)(Y^{r}-(G^{-1}(Y))^{r})\right].\]

In particular, the mean of \(X\) is given by \(\mu^{\prime}_{1}=E(G^{-1}(Y))+E\left[w(Y)(Y-G^{-1}(Y))\right]\) and the variance of \(X\) is given by \(V(X)=\mu^{\prime}_{2}-(\mu^{\prime}_{1})^2\).

The moment generating function is

\begin{aligned} M(t) &=E(e^{t X})=\int_{a}^{b}e^{tx}h(x)dx=\int_{a}^{b}e^{tx}\left[w(x)f(x)+\left[1-w(G(x))\right]g(x)f(G(x))\right]dx\\ &= E(e^{tY}w(Y))+E\left[e^{tG^{-1}(Y)}(1-w(Y))\right].\end{aligned}

One can verify that \(\mu^{\prime}_{r}=M^{(r)}(t) \mid_{t=0}\). Using the same arguments, with the change of variables \(y=G(x)\) mapping \((a,t)\) onto \((a,G(t))\), the \(r\)th non-central incomplete moment of \(X\) is given by, for \(t\in (a,b)\),

\begin{aligned} \mu^{\prime}_{r}(t)& =\int_{a}^{t}x^r h(x)dx=E(Y^r w(Y) {\bf 1}_{\{Y\le t\}})+E\left[ (G^{-1}(Y))^r(1-w(Y)){\bf 1}_{\{Y\le G(t)\}}\right]. \end{aligned}

The mean deviation about the median \(M\) can be written as

\begin{aligned} \delta &=E(|X-M|)= \int_{a}^{b}|x-M|h(x)dx\\ &= E(Yw(Y))+E\left[(G^{-1}(Y))(1-w(Y))\right]-2 \mu^{\prime}_{1}(M).\end{aligned}

All the expectations above can be calculated or approximated for specific functions \(f(x)\), \(w(x)\) and \(G(x)\).
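To illustrate how such expectations can be approximated, the sketch below evaluates the second moment in two ways: directly as \(\int x^{r}h(x)dx\), and through the formula \(E(Y^{r}w(Y))+E\left[(G^{-1}(Y))^{r}(1-w(Y))\right]\). It assumes the illustrative choices \(f(x)=e^{-x}\), \(w(x)=e^{-x}\) and \(G(x)=x^{2}\) on \((0,\infty)\), for which \(\mu^{\prime}_{2}=1\) can be computed by hand; the helper names are hypothetical.

```python
import math

def simpson(func, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3

def f(x):       # base pdf: unit exponential
    return math.exp(-x)

def w(x):       # weight function in [0, 1]
    return math.exp(-x)

def G(x):       # warping function
    return x * x

def g(x):       # derivative of G
    return 2.0 * x

def G_inv(y):   # inverse of G on (0, inf)
    return math.sqrt(y)

def h(x):
    """Functional mixture pdf (3)."""
    return w(x) * f(x) + (1.0 - w(G(x))) * g(x) * f(G(x))

r = 2
# Direct definition of the r-th moment of X.
direct = simpson(lambda x: x ** r * h(x), 0.0, 40.0)
# Same moment written as expectations with respect to the base pdf f.
via_base = simpson(lambda y: (y ** r * w(y) + G_inv(y) ** r * (1.0 - w(y))) * f(y), 0.0, 40.0)
```

Both quantities approximate the same value, here \(2/8 + (1-1/4) = 1\).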

In the next section, we focus on a submodel of the family with three parameters based on the exponential distribution, a dilation transformation and a weight with polynomial decay, called the functional weighted exponential distribution.

THE FWE DISTRIBUTION AND ITS PROPERTIES

In this section, we consider a special submodel of the proposed family based on the exponential distribution and discuss some of its properties.

Definition

We now consider the exponential pdf \(f(x)=\theta e^{-\theta x},\) \(\theta>0,\) \(x>0\), the weight function \(w(x)= \frac{1}{1+\alpha x^2}\), \(\alpha\ge 0\), and \(G(x)=\frac{x}{\sigma}\), \(\sigma> 0\). Note that \(w(x)\in [0,1]\) and that \(G(x)\) is a standard dilation function satisfying \(G(0)=0\) and \(\lim_{x\rightarrow \infty}G(x)=\infty\). The pdf (3) with \(a=0\) and \(b=\infty\) becomes

\begin{aligned} \label{densFWE} h(x) = \frac{\theta e^{-\theta x}}{1+\alpha x^2} + \frac{ \alpha \theta x^{2} e^{-\theta \frac{x}{\sigma}}}{\sigma (\sigma^2 +\alpha x^2)}, \qquad x>0. \end{aligned} (5)

We call the distribution with pdf (5) the functional weighted exponential distribution, FWE for short. It is of interest because of the compromise made between the functions with exponential decay, \(f(x)\) and \(g(x)f\left(G(x)\right)\), and a function with polynomial decay, \(w(x)\). This ensures greater flexibility in terms of the rates of decay, which is an advantage for modeling a wide variety of lifetime data. Also, note that the FWE distribution reduces to the exponential distribution when \(\alpha=0\) or \(\sigma=1\). Thus the proposed distribution can be considered as an extension of the exponential distribution. Figure 1 shows the pdf plots of the FWE distribution for selected values of the parameters. The pdf of the FWE distribution takes decreasing or unimodal shapes depending on the choice of the parameters. The first derivative of \(h(x)\) is

\[h^{\prime}(x)=-\theta\left( \frac{\alpha x e^{-\frac{\theta x}{\sigma}} (\theta \sigma^2 x +\alpha \theta x^3-2 \sigma^3)}{\sigma^2(\sigma^2+\alpha x^2)^2}+ \frac{e^{-\theta x} (\theta+\alpha \theta x^2+2\alpha x)}{(1+\alpha x^2)^2}\right),\]

and, when it exists, the mode of the distribution, \(x_0\), satisfies the following equation:

\[\alpha x_0 e^{-\frac{\theta x_0}{\sigma}} (\theta \sigma^2 x_0 +\alpha \theta x_0^3-2 \sigma^3)(1+\alpha x_0^2)^2+e^{-\theta x_0} (\theta+\alpha \theta x_0^2+2\alpha x_0)\sigma^2(\sigma^2+\alpha x_0^2)^2=0.\] (6)
Figure 1
Various shapes of the pdf of the FWE distribution.
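The pdf (5) and its derivative are straightforward to evaluate numerically. The sketch below compares the closed-form derivative with a central finite difference; the parameter values and function names are illustrative assumptions.

```python
import math

def fwe_pdf(x, theta, alpha, sigma):
    """FWE pdf (5) for x > 0."""
    first = theta * math.exp(-theta * x) / (1 + alpha * x * x)
    second = (alpha * theta * x * x * math.exp(-theta * x / sigma)
              / (sigma * (sigma ** 2 + alpha * x * x)))
    return first + second

def fwe_pdf_derivative(x, theta, alpha, sigma):
    """Closed-form first derivative h'(x) as given in the text."""
    a = (alpha * x * math.exp(-theta * x / sigma)
         * (theta * sigma ** 2 * x + alpha * theta * x ** 3 - 2 * sigma ** 3)
         / (sigma ** 2 * (sigma ** 2 + alpha * x * x) ** 2))
    b = (math.exp(-theta * x) * (theta + alpha * theta * x * x + 2 * alpha * x)
         / (1 + alpha * x * x) ** 2)
    return -theta * (a + b)

theta, alpha, sigma = 1.0, 2.0, 3.0     # illustrative parameter values
x0, eps = 1.3, 1e-6
# Central finite difference approximation of h'(x0).
finite_diff = (fwe_pdf(x0 + eps, theta, alpha, sigma)
               - fwe_pdf(x0 - eps, theta, alpha, sigma)) / (2 * eps)
```

A root finder applied to `fwe_pdf_derivative` would give the mode \(x_0\) when it exists; the choice of root finder is left open here.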

Also, observe that \(h(x)=\theta -\theta^2 x + x^2 \left[\frac{\theta^3}{2}+\alpha \theta \left( \frac{1}{\sigma^3}-1\right)\right] + O(x^3)\) and \(\lim\limits_{x\rightarrow 0} h(x)=\theta>0\). Thus, we see the role of the parameters in the curvature of the pdf around \(x=0\), mainly for the polynomial term \(x^2\). Also, we have \(\lim\limits_{x\rightarrow \infty} h(x)=0\) as

\[h(x)=e^{-\theta x}\left[ \frac{\theta}{\alpha x^2}-\frac{\theta}{\alpha^2 x^4}+O\left( \frac{1}{x^6}\right)\right]+ e^{-\frac{\theta x}{\sigma}} \left[ \frac{\theta}{\sigma}-\frac{\sigma \theta}{\alpha x^2}+\frac{\sigma^3 \theta}{\alpha^2 x^4}+O\left( \frac{1}{x^6}\right)\right].\]

From a probabilistic point of view, the FWE distribution comes from the simple stochastic representation \(Y=S_X X\), where \(X\) is a random variable with pdf \(f(x)\), \(\sigma\not =1\), and \(S_X\) is a random variable such that

\[P(S_X = 1 \mid X=x)= \frac{1}{1+\alpha x^2}, \qquad P(S_X = \sigma \mid X=x)= \frac{\alpha x^2}{1+\alpha x^2}.\]
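This representation gives a direct way to simulate from the FWE distribution. The sketch below draws \(X\) from the exponential distribution, keeps it with probability \(1/(1+\alpha x^{2})\) and otherwise returns \(\sigma x\); the sample mean is then compared with \(E\left[Xw(X)\right]+\sigma E\left[X(1-w(X))\right]\) computed by a midpoint rule. The parameter values, seed and truncation point are illustrative assumptions.

```python
import math
import random

def fwe_sample(theta, alpha, sigma, rng):
    """Draw Y = S_X * X with X ~ Exp(theta) and S_X taking values in {1, sigma}."""
    x = rng.expovariate(theta)
    keep_prob = 1.0 / (1.0 + alpha * x * x)     # P(S_X = 1 | X = x)
    return x if rng.random() < keep_prob else sigma * x

def fwe_mean_quadrature(theta, alpha, sigma, upper=60.0, n=60000):
    """Mean as theta * int_0^upper x (1 + alpha*sigma*x^2) / (1 + alpha*x^2) e^{-theta x} dx.

    Midpoint rule; the default upper limit is only adequate for theta of order 1.
    """
    step = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        total += x * (1 + alpha * sigma * x * x) / (1 + alpha * x * x) * math.exp(-theta * x)
    return theta * total * step

rng = random.Random(2024)
theta, alpha, sigma = 1.0, 2.0, 3.0             # illustrative parameter values
draws = [fwe_sample(theta, alpha, sigma, rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
exact_mean = fwe_mean_quadrature(theta, alpha, sigma)
```

The two means should agree up to Monte Carlo error, which supports the interpretation of \(\sigma\) as a random rescaling applied more often to large values of \(X\).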

We can observe that \(h(x)\) is a weighted exponential pdf since it can be written as

\[h(x)=W(x) \theta e^{-\theta x}, \qquad W(x)=\frac{1}{1+\alpha x^2}+ \frac{\alpha x^2}{\sigma(\sigma^2+\alpha x^2)}e^{-\theta \left(\frac{1}{\sigma}-1\right)x}.\]

Note that \(W(0)=1\) and \(\lim_{x\rightarrow \infty}W(x)=0\) if \(\sigma<1\), \(\lim_{x\rightarrow \infty}W(x)=1\) if \(\sigma=1\) and \(\lim_{x\rightarrow \infty}W(x)=\infty\) if \(\sigma>1\). For more information on weighted distributions, see Saghir et al. (2017). The practical aspects of the FWE distribution are studied in the applications section.

Moments of the FWE distribution

Let \(X\) be a random variable with pdf \(h(x)\), defined by (5). Then the \(r\)th non-central moment of \(X\) is given by

\begin{aligned} \mu^{\prime}_{r} & = E(X^{r})=\int_{0}^{\infty}x^{r}h(x)dx= \theta \int_{0}^{\infty} \frac{x^{r}}{1+\alpha x^{2}} e^{-\theta x}dx + \frac{\alpha \theta}{\sigma} \int_{0}^{\infty} \frac{x^{r+2}}{\sigma^{2}+\alpha x^{2}} e^{-\theta\frac{x}{\sigma}}dx \\ & = \theta \int_{0}^{\infty} \frac{x^{r}}{1+\alpha x^{2}} e^{-\theta x}dx + \alpha \theta \sigma^{r} \int_{0}^{\infty} \frac{x^{r+2}}{1+\alpha x^{2}} e^{-\theta x}dx\\ & = \theta \int_{0}^{\infty} \frac{x^{r}}{1+\alpha x^{2}} (1+\alpha \sigma^{r} x^2) e^{-\theta x}dx=\frac{1}{\theta^r}\int_{0}^{\infty} \frac{1}{1+\alpha (x/\theta)^{2}} \left(x^r e^{-x}+\frac{\alpha}{\theta^2} \sigma^{r} x^{r+2}e^{-x}\right) dx. \end{aligned}

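Splitting the integral at \(x=\frac{\theta}{\sqrt{\alpha}}\), applying the geometric series to \(\frac{1}{1+\alpha (x/\theta)^2}\) on \(\left(0,\frac{\theta}{\sqrt{\alpha}}\right)\) and to the expression \(\frac{1}{1+\alpha (x/\theta)^2}=\frac{1}{\alpha (x/\theta)^2} \frac{1}{1+(\alpha (x/\theta)^2)^{-1}}\) on \(\left(\frac{\theta}{\sqrt{\alpha}},\infty\right)\), we obtain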

\begin{aligned} \mu^{\prime}_{r} & = \frac{1}{\theta^r}\bigg( \int_{0}^{\frac{\theta}{\sqrt{\alpha}}} \sum_{i=0}^{\infty}(-1)^{i} \frac{\alpha^i}{\theta^{2i}} \left(x^{r+2i}e^{- x}+\frac{\alpha}{\theta^2} \sigma^r x^{r+ 2i+2}e^{- x}\right) dx \\ &+ \int_{\frac{\theta}{\sqrt{\alpha}}}^{\infty}\sum_{i=0}^{\infty}(-1)^{i} \frac{\alpha^{-(i+1)}}{\theta^{-2(i+1)}} \left(x^{r-2(i+1)}e^{- x}+\frac{\alpha}{\theta^2} \sigma^r x^{r-2i}e^{- x}\right) dx \bigg)\\ & = \frac{1}{\theta^r}\bigg(\sum_{i=0}^{\infty}(-1)^{i} \frac{\alpha^i}{\theta^{2i}} \left[\gamma\left(r+2i+1,\frac{\theta}{\sqrt{\alpha}}\right)+\frac{\alpha}{\theta^2} \sigma^r \gamma\left(r+2i+3,\frac{\theta}{\sqrt{\alpha}}\right)\right] \\ &+ \sum_{i=0}^{\infty}(-1)^{i} \frac{\alpha^{-(i+1)}}{\theta^{-2(i+1)}} \left[\Gamma\left(r-2i-1,\frac{\theta}{\sqrt{\alpha}}\right)+\frac{\alpha}{\theta^2} \sigma^r \Gamma\left(r-2i+1,\frac{\theta}{\sqrt{\alpha}}\right)\right] \bigg),\end{aligned}

where \(\gamma(a,x) = \int_{0}^{x}s^{a-1}e^{-s}ds\), \(a, \, x>0\) and \(\Gamma(a,x) = \int_{x}^{\infty}s^{a-1}e^{-s}ds\), \(a\in\mathbb{R}, x>0\), are the lower and upper incomplete gamma functions, respectively. In particular, the mean of \(X\) is given by

\begin{aligned} \mu^{\prime}_{1} & = \frac{1}{\theta}\bigg(\sum_{i=0}^{\infty}(-1)^{i} \frac{\alpha^i}{\theta^{2i}} \left[\gamma\left(2(1+i),\frac{\theta}{\sqrt{\alpha}}\right)+\frac{\alpha}{\theta^2} \sigma \gamma\left(2(i+2),\frac{\theta}{\sqrt{\alpha}}\right)\right] \\ &+ \sum_{i=0}^{\infty}(-1)^{i} \frac{\alpha^{-(i+1)}}{\theta^{-2(i+1)}} \left[\Gamma\left(-2i,\frac{\theta}{\sqrt{\alpha}}\right)+\frac{\alpha}{\theta^2} \sigma \Gamma\left(2(1-i),\frac{\theta}{\sqrt{\alpha}}\right)\right] \bigg).\end{aligned} (7)

The variance of \(X\) can be obtained as \(V(X)=\mu^{\prime}_2-(\mu^{\prime}_{1})^2\).
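As a numerical cross-check of the moment formulas, one can evaluate the reduced integral \(\theta \int_{0}^{\infty} \frac{x^{r}}{1+\alpha x^{2}} (1+\alpha \sigma^{r} x^2) e^{-\theta x}dx\) directly instead of summing the series. The sketch below uses a midpoint rule on a truncated range; the truncation point, grid size and parameter values are illustrative assumptions.

```python
import math

def fwe_moment(r, theta, alpha, sigma, n=100000):
    """r-th raw moment from the reduced integral, by the midpoint rule."""
    upper = 60.0 / theta            # truncation point; the neglected tail is of order e^{-60}
    step = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        total += (x ** r * (1 + alpha * sigma ** r * x * x)
                  / (1 + alpha * x * x) * math.exp(-theta * x))
    return theta * total * step

theta, alpha, sigma = 1.5, 2.0, 3.0     # illustrative parameter values
mean = fwe_moment(1, theta, alpha, sigma)
variance = fwe_moment(2, theta, alpha, sigma) - mean ** 2
```

Two sanity checks follow from the formulas above: the case \(r=0\) returns 1, and for \(\sigma=1\) the result equals the exponential moment \(r!/\theta^{r}\).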

Using similar mathematical arguments, the moment generating function can be expressed as, for \(t <\theta/\max(1, \sigma)\),

\begin{aligned} M_X (t) &= E(e^{tX})=\int_{0}^{\infty}e^{tx}h(x)dx= \int_{0}^{\infty} e^{tx} \left(\frac{\theta}{1+\alpha x^{2}} e^{-\theta x} + \frac{\alpha \theta}{\sigma}\frac{x^2}{\sigma^{2}+\alpha x^{2}} e^{-\theta \frac{x}{\sigma}}\right)dx \\ &= \theta \left( \int_{0}^{\infty}\frac{1}{1+\alpha x^{2}}e^{-(\theta-t)x}dx + \alpha \int_{0}^{\infty}\frac{x^{2}}{1+\alpha x^{2}} e^{-(\theta-\sigma t)x}dx \right)\\ &= \theta \bigg(\int_{0}^{\frac{1}{\sqrt{\alpha}}} \sum_{i=0}^{\infty}(-1)^{i}(\alpha x^{2})^{i} e^{-(\theta-t)x}dx + \alpha \int_{0}^{\frac{1}{\sqrt{\alpha}}} \sum_{i=0}^{\infty}(-1)^{i}(\alpha x^{2})^{i}x^{2}e^{-(\theta-\sigma t)x}dx\\ &+ \int_{\frac{1}{\sqrt{\alpha}}}^{\infty} \sum_{i=0}^{\infty}(-1)^{i}(\alpha x^{2})^{-(i+1)} e^{-(\theta-t)x}dx + \alpha \int_{\frac{1}{\sqrt{\alpha}}}^{\infty}\sum_{i=0}^{\infty}(-1)^{i}(\alpha x^{2})^{-(i+1)}x^{2}e^{-(\theta-\sigma t)x}dx\bigg)\\ & =\theta \bigg( \sum_{i=0}^{\infty} (-1)^{i}\alpha^{i}\bigg[\frac{1}{(\theta-t)^{2i+1}}\gamma\left(2i+1,\frac{\theta-t}{\sqrt{\alpha}}\right) + \frac{\alpha}{(\theta-\sigma t)^{2i+3}}\gamma\left(2i+3,\frac{\theta -\sigma t}{\sqrt{\alpha}}\right)\bigg] \\ & + \sum_{i=0}^{\infty} (-1)^{i}\alpha^{-(i+1)}\bigg[\frac{1}{(\theta-t)^{-2i-1}}\Gamma\left(-2i-1,\frac{\theta -t}{\sqrt{\alpha}}\right)\\ & + \frac{\alpha}{(\theta-\sigma t)^{-2i+1}}\Gamma\left(-2i+1,\frac{\theta -\sigma t}{\sqrt{\alpha}}\right)\bigg] \bigg).\end{aligned} (8)
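The moment generating function can also be evaluated from the reduced integral in the second line of (8), which avoids the incomplete gamma series. The sketch below is one such evaluation by a midpoint rule; the truncation rule and grid size are illustrative assumptions.

```python
import math

def fwe_mgf(t, theta, alpha, sigma, n=100000):
    """M_X(t) from the reduced integral; finite only for t < theta / max(1, sigma)."""
    if t >= theta / max(1.0, sigma):
        raise ValueError("mgf is finite only for t < theta / max(1, sigma)")
    slowest = min(theta - t, theta - sigma * t)   # slowest exponential decay rate
    upper = 60.0 / slowest                        # truncation point of the integral
    step = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step                      # midpoint rule
        denom = 1 + alpha * x * x
        total += (math.exp(-(theta - t) * x) / denom
                  + alpha * x * x * math.exp(-(theta - sigma * t) * x) / denom)
    return theta * total * step

mgf_at_zero = fwe_mgf(0.0, 1.5, 2.0, 3.0)         # illustrative parameter values
```

By construction \(M_X(0)=1\), and for \(\sigma=1\) the function reduces to the exponential moment generating function \(\theta/(\theta-t)\).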

The mean deviation about the median \(M\) is given by

\begin{aligned} \delta & = E(|X-M|)=\int_{0}^{\infty}|x-M|h(x)dx=\mu^{\prime}_1-2\int_{0}^{M}xh(x)dx\\ & = \mu^{\prime}_1 -2\theta\left(\int_{0}^{M}\frac{x}{1+\alpha x^{2}}e^{-\theta x}dx+\alpha\sigma\int_{0}^{M/\sigma}\frac{x^{3}}{1+\alpha x^{2}}e^{-\theta x}dx\right),\end{aligned}

where the upper limit \(M/\sigma\) of the second integral results from the change of variables \(y=x/\sigma\). The integrals above can be written in terms of sums as done in (8), by distinguishing the cases \(M>\frac{1}{\sqrt{\alpha}}\), \(M\le\frac{1}{\sqrt{\alpha}}\), \(\frac{M}{\sigma}>\frac{1}{\sqrt{\alpha}}\) and \(\frac{M}{\sigma}\le\frac{1}{\sqrt{\alpha}}\). For the sake of brevity, we omit the details.

The Shannon entropy is defined by \(S\left[ h(X) \right]= -E\left[\log h(X)\right]\). We have

\begin{aligned} S\left[ h(X) \right] &= -E \left[ \log \left( \frac{\theta e^{-\theta X}}{1+\alpha X^2} + \frac{\alpha \theta X^{2} e^{-\theta \frac{X}{\sigma}}}{\sigma(\sigma^{2}+\alpha X^2)} \right) \right]\\ &= -E \left[ \log \left( \frac{1 + \alpha \sigma^{-1} X^{2} (1+\alpha X^2)(\sigma^{2}+\alpha X^{2})^{-1} e^{\frac{\sigma -1}{\sigma}\theta X}}{\theta^{-1} (1+\alpha X^{2}) e^{\theta X}} \right) \right]\\ &= -E \left[ \log \left( 1 + \alpha \sigma^{-1} X^{2} (1+\alpha X^2)(\sigma^{2}+\alpha X^{2})^{-1} e^{\frac{\sigma -1}{\sigma}\theta X} \right) \right] + E \left[ \log \left( {\theta^{-1} (1+\alpha X^{2}) e^{\theta X}} \right) \right].\end{aligned}

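Using the series expansion \(\log(1+u)=-\sum\limits_{i=1}^{\infty} \frac{(-1)^{i}}{i}u^{i}\), valid for \(|u|<1\), formally term by term, so that \(E \left[ \log (1+X^{n}) \right] = - \sum\limits_{i=1}^{\infty} \frac{(-1)^{i}}{i} E(X^{in})\), we arrive at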

\begin{aligned} S\left[ h(X) \right] = \sum_{i=1}^{\infty} \frac{(-\alpha)^i}{i \sigma^{i}} E \left[ \left( \frac{X^{2}(1+\alpha X^2)}{\sigma^{2} +\alpha X^{2}} e^{\frac{\sigma -1}{\sigma}\theta X} \right)^{i} \right] - \sum_{i=1}^{\infty} \frac{(-\alpha)^i}{i} E(X^{2i}) + \theta E(X) - \log\left(\theta\right).\end{aligned} (9)

The expansions of the expectations can be done via similar mathematical arguments used for the moments.
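Since the series in (9) is formal, a direct numerical evaluation of \(-\int_{0}^{\infty}h(x)\log h(x)dx\) is a useful cross-check. The sketch below does this with a midpoint rule; the truncation rule, grid size and function names are illustrative assumptions.

```python
import math

def fwe_pdf(x, theta, alpha, sigma):
    """FWE pdf (5) for x > 0."""
    first = theta * math.exp(-theta * x) / (1 + alpha * x * x)
    second = (alpha * theta * x * x * math.exp(-theta * x / sigma)
              / (sigma * (sigma ** 2 + alpha * x * x)))
    return first + second

def fwe_entropy(theta, alpha, sigma, n=100000):
    """Shannon entropy -int h log h by the midpoint rule on a truncated range."""
    upper = 60.0 * max(1.0, sigma) / theta      # covers the slower of the two decays
    step = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        density = fwe_pdf(x, theta, alpha, sigma)
        if density > 0.0:                       # skip points where the pdf underflows to 0
            total -= density * math.log(density)
    return total * step

entropy_example = fwe_entropy(1.0, 2.0, 3.0)    # illustrative parameter values
```

For \(\sigma=1\) the FWE distribution is exponential, so the result should match the exponential entropy \(1-\log\theta\).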

APPLICATIONS

This section presents two real-life applications of distribution fitting. We assess the goodness-of-fit of the FWE distribution, with parameters estimated by the standard maximum likelihood method. We compare the fit of the FWE distribution with that of five other three-parameter distributions, which are among the most popular generalizations of the Weibull and gamma distributions. The distributions considered for comparison are:

  • Modified Weibull (MW) distribution by Sarhan & Zaindin (2009) with cdf

    \[F(x)=1-e^{- \beta x^\lambda -\alpha x}, \quad \lambda>0, \beta, \alpha \geq 0 \text{ with } \beta+\alpha>0.\]

  • Exponentiated Weibull (EW) distribution by Mudholkar & Srivastava (1993) with cdf

    \[F(x)=\left( 1-e^{- \beta x^\lambda}\right)^{\alpha}, \quad \beta>0, \lambda>0, \alpha>0.\]

  • Exponentiated gamma (EG) distribution by Cordeiro et al. (2011) with cdf

    \[F(x)=\left[GammaCDF(\alpha,\beta) \right] ^\theta, \quad \beta>0, \theta>0, \alpha>0,\]
    where \(GammaCDF(\alpha,\beta)\) is the cdf of the gamma distribution with shape parameter \(\alpha\) and scale parameter \(\beta\).

  • Weighted Weibull (WtW) distribution by Shahbaz et al. (2010) with pdf

    \[f(x)=\frac{\beta+1}{\beta}\alpha\lambda x^{\alpha-1}e^{-\lambda x^\alpha} \left(1-e^{-\beta\lambda x^\alpha} \right) , \quad \beta>0, \lambda>0, \alpha>0.\]

  • Extended generalized gamma (EGG) distribution used by Lee & Wang (2013) with pdf

    \[f(x)=\frac{\lambda \alpha^\alpha \beta^{\lambda\alpha}}{\Gamma (\alpha)} x^{\lambda\alpha-1} e^{-\alpha(\beta x)^\lambda}, \quad \beta>0, \lambda>0, \alpha>0.\]
    The EGG distribution is reduced to the Weibull distribution for \(\alpha=1\).
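For readers who wish to reproduce the comparison, three of the competing models can be coded directly from the expressions in the list above. The sketch below is only an illustration: the function names and argument order are assumptions, and the EG and EGG models are omitted for brevity.

```python
import math

def mw_cdf(x, alpha, beta, lam):
    """Modified Weibull cdf: 1 - exp(-beta * x^lam - alpha * x)."""
    return 1.0 - math.exp(-beta * x ** lam - alpha * x)

def ew_cdf(x, alpha, beta, lam):
    """Exponentiated Weibull cdf: (1 - exp(-beta * x^lam))^alpha."""
    return (1.0 - math.exp(-beta * x ** lam)) ** alpha

def wtw_pdf(x, alpha, beta, lam):
    """Weighted Weibull pdf as written in the list above."""
    core = alpha * lam * x ** (alpha - 1) * math.exp(-lam * x ** alpha)
    return (beta + 1) / beta * core * (1.0 - math.exp(-beta * lam * x ** alpha))

# MW with alpha = 0 and EW with alpha = 1 both collapse to the Weibull cdf 1 - exp(-beta * x^lam).
weibull_gap = abs(mw_cdf(1.7, 0.0, 0.8, 1.5) - ew_cdf(1.7, 1.0, 0.8, 1.5))
```

The common Weibull special case gives a quick consistency check on the two cdfs.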

The fitting results are compared using two practical data sets related to reliability and survival analysis. Data sets are discussed in the following subsections.

Failure times data

The goodness-of-fit of the FWE distribution is first assessed on the lifetimes of fatigue fracture of Kevlar 373/epoxy specimens subjected to constant pressure at a stress level of 90% until all had failed. The data set was reported by Barlow et al. (1984) and studied by Andrews & Herzberg (2012). Descriptive statistics are presented in Table II. Since the skewness is \(1.980\) and the kurtosis is \(2.16\), the data are positively skewed and their frequency distribution has a lower peak than the normal curve.

Table II
Descriptive statistics of both data sets.

In full generality, the shape of the empirical failure (hazard) rate of a data set can be identified through the total time on test (TTT) plot of Aarset (1987). The scaled TTT transform is

\[g(u)=\frac{H^{-1}(u)}{H^{-1}(1)},\]
with \(H^{-1}(u)=\int_{0}^{F^{-1}(u)}[1-F(y)]dy\), \(u\in (0,1)\), and its empirical version is
\[g_{n}\left(\frac{r}{n}\right)=\frac{1}{\sum\limits_{i=1}^{n}x_{i:n}}\left[ \sum\limits_{i=1}^{r}x_{i:n}+(n-r)x_{r:n}\right],\]
where \(r=1,2,\ldots ,n\) and \(x_{i:n}\), \(i=1,2,\ldots ,n\), represent the order statistics of the sample. It has been shown that the scaled TTT transform is convex (concave) if the hazard rate is decreasing (increasing), and for bathtub (unimodal) hazard rates, the scaled TTT transform is first convex (concave) and then concave (convex). Figure 2 indicates that the failure times data set has an increasing hazard rate.

Figure 2
TTT plots for both data sets.
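The empirical scaled TTT transform \(g_{n}(r/n)\) is simple to compute from the order statistics. The sketch below returns the points of the TTT plot; the function name and the four-value toy sample are illustrative assumptions and are not taken from the data sets analysed here.

```python
def scaled_ttt(sample):
    """Return the points (r/n, g_n(r/n)), r = 1..n, of the empirical scaled TTT transform."""
    ordered = sorted(sample)
    n = len(ordered)
    total = sum(ordered)
    points = []
    running = 0.0
    for r, value in enumerate(ordered, start=1):
        running += value                    # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * value) / total))
    return points

ttt = scaled_ttt([3.0, 1.0, 2.0, 4.0])      # toy data, for illustration only
```

Plotting these points against the diagonal shows whether the curve is concave, convex or changes shape, which is how the hazard rate behaviour is read off.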

The maximum likelihood estimates (MLEs) of the distribution parameters, along with their standard errors (SEs), are shown in Table III for this data set. From this table, the SEs of the parameter estimates of the FWE distribution are smaller than those of the other distributions.

Table III
The MLEs and SEs of the parameters of the distributions for failure times data.
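The MLEs are obtained by maximizing the log-likelihood built from the pdf (5). The sketch below codes only that objective function; the numerical optimizer used to maximize it is not specified here, and the toy sample is an illustrative assumption rather than one of the data sets.

```python
import math

def fwe_log_likelihood(data, theta, alpha, sigma):
    """Log-likelihood of an i.i.d. positive sample under the FWE pdf (5)."""
    total = 0.0
    for x in data:
        first = theta * math.exp(-theta * x) / (1 + alpha * x * x)
        second = (alpha * theta * x * x * math.exp(-theta * x / sigma)
                  / (sigma * (sigma ** 2 + alpha * x * x)))
        total += math.log(first + second)
    return total

toy = [0.4, 1.1, 2.3, 0.9]                  # toy values, for illustration only
loglik = fwe_log_likelihood(toy, 1.0, 2.0, 3.0)
```

When \(\alpha=0\) the function reduces to the exponential log-likelihood \(n\log\theta-\theta\sum_{i}x_{i}\), which is a convenient check before passing it to an optimizer.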

We now apply formal goodness-of-fit criteria to determine which distribution fits the given data set best. We consider the Akaike Information Criterion (AIC \(= 2p - 2 \widehat{\ln(\ell)}\)), the Bayesian Information Criterion (BIC \(= p \ln(n) - 2 \widehat{\ln(\ell)}\)), \(-\widehat{\ln(\ell)}\) and the Kolmogorov-Smirnov (KS) statistic along with its p-value, where \(\widehat{\ln(\ell)}\) is the value of the log-likelihood function evaluated at the parameter estimates, \(n\) is the number of observations, and \(p\) is the number of estimated parameters. For a given data set, a smaller AIC or BIC indicates a better fit. These statistics are computed using the MLEs of the parameters and are presented in Table IV. We can note from this table that the FWE distribution has the smallest values of the AIC, BIC and KS statistics among the fitted distributions. Therefore, we can conclude that the FWE distribution fits this data set better than the other distributions considered. The Probability-Probability (PP) plots of the distributions are given in Figure 3 for the failure times data set. Figures 4 and 5 show the fitted pdf and cdf of the FWE distribution for this data set, respectively. The fitted and empirical estimates are extremely close. These figures indicate that the FWE distribution can provide good estimates of the probabilities associated with lifetimes of fatigue fracture of Kevlar 373/epoxy, e.g., \(q=P\left(2<X<3 \right)= 0.184\), and its estimate is \(\hat{q}= 0.176\).
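The criteria above follow directly from their definitions. The sketch below computes the AIC, the BIC and the KS distance for a generic fitted cdf; the function names and the toy inputs are illustrative assumptions, and the KS p-value is not computed.

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion: 2p - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: p * ln(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2 * log_lik

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf and a fitted cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        fitted = cdf(x)
        # Compare the fitted cdf with the empirical cdf just after and just before x.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance

# Toy check against the uniform cdf on (0, 1).
ks_toy = ks_statistic([0.25, 0.5, 0.75], lambda x: x)
```

For the toy sample the largest gap is \(0.25\), attained at the first and last observations.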

Table IV
The log-likelihood, AIC, BIC, KS and p-values for the fitted distributions for failure times data.
Figure 3
Probability-Probability (PP) plots of the distributions for failure times data.
Figure 4
Fitted density plot of the FWE distribution for failure times data.

Figure 5
Fitted cdf plot of the FWE distribution for failure times data.

Survival times data

In this subsection, we model the survival times of 33 patients who died from acute myelogenous leukaemia. The survival times are recorded in weeks. The data set is obtained from Feigl & Zelen (1965) and is also available in the “MASS” package of the R software. The frequency distribution of the data is heavy tailed and right skewed; see Table II. From Figure 2, we can see that the TTT plot for the survival times data set is first convex and then concave, which means the data set has a bathtub-shaped hazard rate.

We compute the MLEs of the parameters of all distributions for the survival times data, along with their respective SEs; they are presented in Table V. For each distribution, the log-likelihood, AIC, BIC, KS statistic and p-value are obtained using the MLEs and are shown in Table VI. From this table, we see that the FWE distribution has the smallest AIC, BIC and KS values among all the distributions, and that the p-value of the KS test is largest for the FWE distribution. Therefore, we conclude that the FWE distribution is a better model for the survival times than the EG, EM, EGG, WtW and MW distributions. PP plots of the distributions for the survival times data set are given in Figure 6. Figures 7 and 8 show the fitted pdf and cdf of the FWE distribution for this data set, respectively. Since the fitted and empirical estimates are very close to each other, we can say that the FWE distribution fits this data set well.
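To make the KS statistic and the PP plot concrete, the sketch below computes both for a sample and a fitted cdf. The FWE cdf is not reproduced here, so an exponential cdf with a hypothetical rate stands in for the fitted model; the function names, the plotting positions \((i-0.5)/n\), the toy sample and the rate are all assumptions made for illustration.

```python
import math


def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov distance between the empirical cdf
    of `data` and the model cdf `cdf`."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        # The empirical cdf jumps from (i-1)/n to i/n at xi,
        # so both sides of the jump are checked.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def pp_points(data, cdf):
    """Pairs (empirical probability, fitted probability) for a PP plot,
    using the plotting positions (i - 0.5)/n (one common convention)."""
    x = sorted(data)
    n = len(x)
    return [((i - 0.5) / n, cdf(xi)) for i, xi in enumerate(x, start=1)]


if __name__ == "__main__":
    rate = 0.5  # hypothetical rate of the stand-in exponential model
    model_cdf = lambda t: 1.0 - math.exp(-rate * t)
    sample = [0.3, 0.9, 1.4, 2.2, 3.5, 5.1]  # toy data, not from the paper
    print("KS distance:", ks_statistic(sample, model_cdf))
    for emp, fit in pp_points(sample, model_cdf):
        print(f"{emp:.3f}  {fit:.3f}")
```

A PP plot whose points lie close to the diagonal, together with a small KS distance, corresponds to the close agreement between fitted and empirical estimates described above.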

Table V
The MLEs and SEs of the parameters of the distributions for survival times data.
Table VI
The log-likelihood, AIC, BIC, KS and p-values for the fitted distributions for survival times data.
Figure 6
Probability-Probability (PP) plots of the distributions for survival times data.
Figure 7
Fitted density plot of the FWE distribution for survival times data.
Figure 8
Fitted cdf plot of the FWE distribution for survival times data.

CONCLUDING REMARKS

We have introduced and studied a new family of distributions based on a finite functional mixture using functional weights. Some mathematical properties of the new family have been investigated. A special case based on polynomial weights and the exponential distribution, called the functional weighted exponential (FWE) distribution, has been studied in detail. The unknown parameters of the FWE distribution are estimated by the maximum likelihood method. The usefulness of the proposed submodel, FWE, is demonstrated on two real-life data sets.

ACKNOWLEDGMENTS

The authors express their gratitude to the two reviewers and the associate editor for their informative comments on the paper. Dr. Sharma gratefully acknowledges the financial support from the Science and Engineering Research Board, Department of Science & Technology, Govt. of India, under the Early Career Research Award scheme (file no.: ECR/2017/002416).

REFERENCES

  • AARSET MV. 1987. How to identify bathtub hazard rate. IEEE Trans Reliab 36: 106-108.
  • AL-HUSSAINI EK. 2012. Composition of cumulative distribution functions. J Statist Th Appl 11: 323-336.
  • ALZAATREH A, FAMOYE F & LEE C. 2015. A new method for generating families of continuous distributions. Metron 71: 63-79.
  • ALZAATREH A, LEE C & FAMOYE F. 2012. Gamma-Pareto distribution and its applications. J Mod Appl Stat Methods 11: 78-94.
  • ANDREWS DF & HERZBERG AM. 2012. Data: a collection of problems from many fields for the student and research worker, Springer, Berlin, p. 442.
  • BARLOW RE, TOLAND RH & FREEMAN T. 1984. A Bayesian analysis of stress rupture life of kevlar 49/epoxy spherical pressure vessels. In: Proceedings of the conference on applications of statistics. Marcel Dekker, New York, NY, USA.
  • CORDEIRO GM, ORTEGA EMM & SILVA GO. 2011. The exponentiated generalized gamma distribution with application to lifetime data. J Stat Comput Simul 81: 827-842.
  • ELMAHDY EE. 2017. Modelling reliability data with finite Weibull or lognormal mixture distributions. Appl Math Inf Sci 11: 1081-1089.
  • ELMAHDY EE & ABOUTAHOUN AW. 2013. A new approach for parameter estimation of finite Weibull mixture distributions for reliability modeling. Appl Math Model 37(4): 1800-1810.
  • EVERITT BS & HAND DJ. 1981. Finite mixture distributions. Chapman and Hall London, New York, p. 143.
  • FEIGL P & ZELEN M. 1965. Estimation of exponential survival probabilities with concomitant information. Biometrics 21: 826-838.
  • FRÜHWIRTH-SCHNATTER S. 2006. Finite mixture and Markov switching models. Springer Science and Business Media, p. 167.
  • HUANG WJ & CHEN Y. 2007. Generalized skew-Cauchy distribution. Stat Probab Lett 77: 1137-1147.
  • LEE ET & WANG JW. 2013. Statistical methods for survival data analysis. 3rd edn. Wiley, Hoboken, p. 521.
  • MCLACHLAN G & PEEL D. 2000. Finite mixture models, Wiley Series in Probability and Statistics, New York, p. 456.
  • MUDHOLKAR GS & SRIVASTAVA DK. 1993. Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans Reliab 42: 299-302.
  • SAGHIR A, HAMEDANI GG, TAZEEM S & KHADIM A. 2017. Weighted distributions: A brief review, perspective and characterizations. Intern J of Stat and Prob 6: 109-131.
  • SARHAN AM & ZAINDIN M. 2009. Modified Weibull distribution. Appl Sci 11: 23-36.
  • SCHLATTMANN P. 2009. Medical applications of finite mixture models. Springer-Verlag Berlin Heidelberg, p. 246.
  • SEIDEL W. 2010. Mixture models. In: Lovric M (Ed), International Encyclopedia of Statistical Science, Springer-Verlag Berlin Heidelberg.
  • SHAHBAZ S, SHAHBAZ MQ & BUTT NS. 2010. A class of weighted Weibull distribution. Pakistan J Stat Oper Res 6: 53-59.
  • SHARMA VK, BAKOUCH HS & SUTHAR K. 2017. An extended Maxwell distribution: Properties and applications. Commun Stat Simul Comput 46: 6982-7007.

Publication Dates

  • Publication in this collection
    28 June 2021
  • Date of issue
    2021

History

  • Received
    28 Sept 2018
  • Accepted
    14 Feb 2019
Academia Brasileira de Ciências, Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro, RJ, Brazil. Tel: +55 21 3907-8100
E-mail: aabc@abc.org.br