Acessibilidade / Reportar erro

A New Class of Exponentiated Beta-Skew-Laplace Distribution

Abstract

This paper introduce two new families of distributions that allow fitting unimodal, bimodal or trimodal data sets. Statistical properties such as distribution function, moments, moment generating function and stochastic representation of these new families are studied in details. The problem of estimating parameters is addressed by considering the maximum likelihood method and Fisher information matrices are derived. A small Monte Carlo simulation study is conducted to examine the performance of the obtained estimators. The methodology developed is illustrated with three real data applications.

Key words
Asymmetry; beta-skew-Laplace distribution; information matrix; maximum likelihood estimation; multimodality

Introduction

Skew-symmetric distributions for modeling skew and bimodal data sets have been proposed by many researchers. Some of these proposals are obtained from generalizations of the standard normal distribution by incorporating additional parameters. Elal-Olivero (2010)ELAL-OLIVERO D. 2010. Alpha-skew-normal distribution. Proyecciones (Antofagasta) 29(12): 224-240. for example, introduces the bimodal-normal (BN) distribution and defines the alpha-skew-normal (ASN) density of families which has sufficient flexibility to fit unimodal as well as bimodal type data. Sobhan et al. (2016)SOBHAN S, DOOSTPARAST M & JAMALIZADEH A. 2016. The alpha-beta skew normal distribution: properties and applications. Statistics 50(2): 338-349. extended the ASN model by adding a new parameter to the distribution. This extension, which is called alpha-beta-skew-normal (ABSN) allows to fit data sets of up to four modes and the admissible intervals for asymmetry and kurtosis parameters are wider than those of the skew-normal (SN) and ASN distributions. Bolfarine et al. (2018)BOLFARINE H, MARTÍNEZ-FLÓREZ G & SALINAS HS. 2018. Bimodal symmetric-asymmetric power-normal families. Commun Stat Theory Methods 47(2): 259-279. proposed a new family of distributions using methods analogous to those of Kim (2005)KIM HJ. 2005. On a class of two-piece skew-normal distribution. Statistics 39(6): 537-553. and Arnold et al. (2009)ARNOLD BC, GÓMEZ HW & SALINAS HS. 2009. On multiple constraint skewed models. Statistics 43(3): 279-293. with the main characteristic that their normalization constant has a closed and simple form, in addition, the Fisher information matrix is non-singular, which guarantees the asymptotic properties of the maximum likelihood estimators of the parameters when large samples are available. Martínez-Flórez et al. (2018)MARTÍNEZ-FLÓREZ G, BOLFARINE H & GÓMEZ HW. 2018. Censored bimodal symmetric - asymmetric families. Stat Interface 11: 237-249. present a flexible-normal model, which extends the SN model and is suitable for fitting censored symmetric and asymmetric uni-bimodal data. One of the main advantages of this model is that involves fewer parameters to be estimated than the mixture of normal distributions.

Other proposals of bimodal distributions have been considered in the statistical literature. Ma & Genton (2004)MA Y & GENTON MG. 2004. Flexible class of skew-symmetric distributions. Scand J Stat 31(3): 459-468. proposed a flexible class of skew-symmetric distributions for which the probability density function has the form of a product of a symmetric density and a skewing function. The new class of distribution is able to capture skewness, heavy tails and multimodality systematically. Gómez et al. (2009)GÓMEZ HW, ELAL-OLIVERO D, SALINAS H & BOLFARINE H. 2009. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics 22(1): 50-62. consider an extension to the skew‐normal model through the inclusion of an additional parameter which can lead to both uni- and bi-modal distributions. Elal-Olivero et al. (2009)ELAL-OLIVERO D, GÓMEZ HW & QUINTANA FA. 2009. Bayesian modeling using a class of bimodal skew-elliptical distributions. J Stat Plan Inference 139(4): 1484-1492. consider an extension of the family of skew-elliptical distributions studied by Azzalini (1985)AZZALINI A. 1985. A class of distributions which includes the normal ones. Scand J Stat 12(2): 171-178. by using Bayesian approach. The new class is referred to as bimodal skew-elliptical (BSE) distributions and the model is able to adopt both uni- and bimodal shapes.

Although some proposals in statistical literature has been developed to deal with the problem of uni-bimodal, symmetric or asymmetric data analysis, there are few works in the statistical literature able of fitting multimodal data (three or more modes), among these works stand out the works of Sobhan et al. (2016) and Ma & Genton (2004). Therefore, it is important to introduce new distribution functions that manage to model data sets that have multiple modes and can fit to different degrees of asymmetry and kurtosis.

The main objective of this work is to introduce two new distributions that allow fitting symmetric data with up to three modes and it is obtained by considering the Laplace distribution. The Laplace distribution has been considered previously by Shams-Harandi & Alamatsaz (2013)SHAMS-HARANDI S & ALAMATSAZ MH. 2013. Alpha–skew–Laplace distribution. Stat Probab Lett 83(3): 774-782. to introduce a new class of skew distributions of Elal-Olivero’s type. This class of distributions has been shown to be flexible enough to support both unimodal and bimodal shapes with increasing and U-shaped hazard function for the truncated case at the origin.

Probability distributions such as those introduced by Elal-Olivero et al. (2009), which is obtained by incorporating an additional parameter to the normal distribution have shown to be very useful in the statistical literature, since it allows to expand the range of possibilities to fit data sets, especially with more than a mode, in this work, we also consider the alpha-power family of distributions introduced by Durrans (1992)DURRANS SR. 1992. Distributions of fractional order statistics in hydrology. Water Resour Res 28(6): 1649-1655., which has shown to be a useful alternative to the asymmetric distribution of Azzalini (1985) since it has a range of kurtosis higher, see Pewsey et al. (2012)PEWSEY A, GÓMEZ HW & BOLFARINE H. 2012. Likelihood-based inference for power distributions. Test 21(4): 775-789.. The two new distributions introduced in this paper are capable of adjusting data with two or more modes and a higher degree of kurtosis compared to some competing models and In addition, data with a high degree of asymmetry can also be fitted.

The rest of the article is organized as follows: Section “The Beta-Skew-Laplace Distribution“ introduces the BSL distribution which we will denote it by BSL(β). We discuss some of its important features and properties and consider its truncated situations as well. Further, we consider a location-scale family and obtain moments of this new family. We provide a stochastic representation and a method of generating data from this distribution. Maximum likelihood estimation of parameters of the location-scale BSL(β) distribution is also considered. In Section “Exponentiated Beta-Skew-Laplace Distribution“ we extend the BSL distribution to the exponentiated case and discuss some of its important features and properties. We consider a location-scale family and the extension to censored data is also presented. Section “Simulation Study“ presents a small Monte Carlo simulation study. Section “Illustrations“ describes fitting of the proposed model to some real data sets and compares it with several rival models.

THE BETA-SKEW-LAPLACE DISTRIBUTION

In order to introduce the new class of skew-Laplace distribution, in this section we define the Bimodal-Laplace distribution of order k. First, we recall that a random variable Y is said to have a Laplace distribution with location parameter μ and scale parameter σ if its probability density function (pdf) is given by

f ( y ; μ , σ ) = 1 2 σ exp ( | y μ | σ ) , y ,

which is denoted by L(μ,σ). We consider the standard Laplace distribution for simplicity, i.e., the case μ=0 and σ=1, and we define the bimodal-Laplace (BL) distribution of order k as follows:

Definition 1. The random variable X is said to have a beta-skew-Laplace distribution with parameter β, denoted by XBSL(β), if X has pdf given by

f ( x ; β ) = ( ( 1 β x 3 ) 2 + 1 ) 2 ( 1 + 360 β 2 ) 1 2 exp ( | x | ) , x , (1)

where β is a parameter of asymmetry.

Notice that the function (1) for given β is a proper pdf since f(x;β)0 for all (x,β)2, and by the moments of standard Laplace distribution, we can see that

[ ( 1 β x 3 ) 2 + 1 ] 1 2 exp ( | x | ) , d x = 2 + 720 β 2 , β .

Figure 1 presents the shape of the density function of the BSL distribution for some selected values of the parameter β. From Figure 1 can be seen that the proposed model is very flexible and the parameter β have significant effects on the possible number of modes of the distribution.

Figure 1
Probability density function of 𝐁𝐒𝐋(𝛃) for some selected values of 𝛃.

The following properties of the BSL distribution are deducted immediately from the definition:

  1. If β=0, then XL(0,1).

  2. If β±, then XdBL(3).

  3. XBSL(β).

Proposition 1. The density function of the BSL(β) distribution has at most three modes.

Proof. Differentiating (1) with respect to x we obtain

f ( x ; β ) = exp ( | x | ) 4 ( 1 + 360 β 2 ) [ β 2 x 5 | x | + 6 β 2 x 5 + 2 β x 2 | x | 6 β x 2 2 x | x | ] , x 0 = 1 4 ( 1 + 360 β 2 ) { ( β 2 x 6 + 6 β 2 x 5 + 2 β x 3 6 β x 2 2 ) exp ( x ) , if  x > 0 ( β 2 x 6 + 6 β 2 x 5 2 β x 3 6 β x 2 + 2 ) exp ( x ) , if  x < 0 (2)

Clearly, each expression in (2) has at most six zeros. Thus, the function f(x;β) can have at most three modes. ◻

Proposition 2. Let XBSL(β), then for k=1,2,3

𝔼 [ X 2 k ] = 1 2 ( 1 + 360 β 2 ) ( 2 Γ ( 2 k + 1 ) + β 2 Γ ( 2 k + 7 ) ) , (3)

and

𝔼 [ X 2 k 1 ] = β Γ ( 2 k + 3 ) 2 ( 1 + 360 β 2 ) . (4)

where Γ() is the Gamma function.

Proof. Knowing that, for k=1,2,3,, 𝔼[Z2k]=Γ(2k+1) and 𝔼[Z2k1]=0, when ZL(0,1), we obtain

𝔼 [ X 2 k ] = 1 2 ( 1 + 360 β 2 ) ( 2 𝔼 [ Z 2 k ] 2 β 𝔼 [ Z 2 ( k + 2 ) 1 ] + β 2 𝔼 [ Z 2 ( k + 3 ) ] ) = 1 2 ( 1 + 360 β 2 ) ( 2 Γ ( 2 k + 1 ) + β 2 Γ ( 2 k + 7 ) ) .

on the other hand

𝔼 [ X 2 k 1 ] = 1 2 ( 1 + 360 β 2 ) ( 2 𝔼 [ Z 2 k 1 ] 2 β 𝔼 [ Z 2 ( k + 1 ) ] + β 2 𝔼 [ Z 2 ( k + 3 ) 1 ] ) = β Γ ( 2 k + 3 ) 2 ( 1 + 360 β 2 ) .◻

Proposition 3. Let XBSL(β), then the mean and the variance of X, and the indices of skewness and kurtosis, which are denoted by β1 and β2, respectively, are given by

  1. 𝔼[X]=24β1+360β2.

  2. Var[X]=2(1+10152β2+3628800β4)1+720β2+129600β4.

  3. β1=576β(11662β2745200β3)[2(1+10152β2+3628800β4)]3/2.

  4. β2=24(1+216β2(343+24β2(15997+2028600β2+680400029β4)))[2(1+10152β2+3628800β4)]2.

Figure 2 presents the behavior of the skewness and kurtosis indices as a function of the β parameter. Notice that, the length of the admissible intervals for the kurtosis parameter of the BSL distribution are larger than the corresponding intervals of the alpha-skew-normal (ASN) model of Elal-Olivero (2010), the skew-normal (SN) model by Azzalini (1985), the power-normal (PN) model, see Pewsey et al. (2012); and the alpha-power skew normal (SNAP) distribution by Martínez-Flórez et al. (2014)MARTÍNEZ-FLÓREZ G, BOLFARINE H & GÓMEZ HW. 2014. Skew-normal alpha-power model. Statistics 48(6): 1414-1428. which are (1.700,3.7489), (3.000,3.869), (1.717,4.3556) and (4.5328,5.4386), respectively.

Figure 2
The graphs of skewness (on the left) and kurtosis (on the right) of 𝐁𝐒𝐋(𝛃) as function of 𝛃.

Proposition 4. If (x;β) is the cumulative distribution function of XBSL(β) then

( x ; β ) = { 2 exp ( x ) + 2 β Γ ( 4 , x ) + β 2 Γ ( 7 , x ) 4 ( 1 + 360 β 2 ) , if x < 0 , 1 2 exp ( x ) 2 β Γ ( 4 , x ) + β 2 Γ ( 7 , x ) 4 ( 1 + 360 β 2 ) , if x 0 ,

where Γ(n,x) is the upper incomplete gamma function defined as:

Γ ( n , x ) = x + t n 1 e t d t ; for x , n

Proof. For x<0, we obtain

( x ; β ) = P ( X x ) = 1 4 ( 1 + 360 β 2 ) x ( 2 2 β t 3 + β 2 t 6 ) exp ( t ) d t = 1 4 ( 1 + 360 β 2 ) [ 2 exp ( x ) + 2 β x + t 3 exp ( t ) d t + β 2 x + t 6 exp ( t ) d t ] = 1 4 ( 1 + 360 β 2 ) [ 2 exp ( x ) + 2 β Γ ( 4 , x ) + β 2 Γ ( 7 , x ) ]

On the other hand, for x0, we obtain

( x ) = 1 P ( X > x ) = 1 1 4 ( 1 + 360 β 2 ) x + ( 2 2 β t 3 + β 2 t 6 ) exp ( t ) d t = 1 1 4 ( 1 + 360 β 2 ) [ 2 x + exp ( t ) d t 2 β x + t 3 exp ( t ) d t + β 2 x + t 6 exp ( t ) d t ] = 1 1 4 ( 1 + 360 β 2 ) [ 2 exp ( x ) 2 β Γ ( 4 , x ) + β 2 Γ ( 7 , x ) ]◻

Remark 1. We consider a truncated BSL(β) distribution to obtain a new and useful lifetime distribution. A random variable T has a truncated beta-skew Laplace distribution (at zero), denoted by TBSL(β), if its pdf is given by

f ( t ) = 1 + ( 1 β t 3 ) 2 2 ( 1 + 6 β + 360 β 2 ) exp ( t ) , t > 0 , (5)

We consider a truncated Note that, if β=0, (5) reduces to the standard exponential distribution.

The survival function and hazard rate of a random variable T following a TBSL(β) distribution is given by

S T ( t ) = P ( T > t ) = 2 exp ( t ) 2 β Γ ( 4 , t ) + β 2 Γ ( 7 , t ) 2 ( 1 + 6 β + 360 β 2 ) , t > 0 ,

and

h T ( t ) = 2 2 β t 3 + β 2 t 6 2 exp ( t ) 2 β Γ ( 4 , t ) + β 2 Γ ( 7 , t ) exp ( t ) , t > 0 ,

respectively. Some shapes of the hazard function are presented in Figure 3.

Figure 3
Illustrations of the hazard rate of 𝐁𝐒𝐋(𝛃) for selected values of 𝛃.

Proposition 5. If MX(t) is the moment generating function (MGF) of XBSL(β) then

M X ( t ) = [ ( M 1 ( t ) + M 1 ( t ) ) 6 β ( M 4 ( t ) M 4 ( t ) ) + 360 β 2 ( M 7 ( t ) + M 7 ( t ) ) ] 2 ( 1 + 360 β 2 )

where Mi(t) is the MGF of a random variable Gamma(i,1) given by:

M i ( t ) = ( 1 t ) i

Proof. From Equation (1), the MGF of the BSL distribution is expressed as

M X ( t ) = + e t x ( 2 2 β x 3 + β 2 x 6 ) exp ( | x | ) d x 4 ( 1 + 360 β 2 ) = 0 + ( 2 + 2 β x 3 + β 2 x 6 ) e ( 1 + t ) x d x + 0 + ( 2 2 β x 3 + β 2 x 6 ) exp ( ( 1 t ) x ) d x 4 ( 1 + 360 β 2 ) = ( M 1 ( t ) + M 1 ( t ) ) 6 β ( M 4 ( t ) M 4 ( t ) ) + 360 β 2 β ( M 7 ( t ) + M 7 ( t ) ) 2 ( 1 + 360 β 2 )

where Mi(t)=(1t)i.

Remark 2. Let XBSL(β). The beta-skew-Laplace family of distributions with location and scale parameters can be defined as the distribution of Z=μ+σX, which has pdf given by

f B S L ( z ; μ , σ , β ) = ( 1 β ( z μ σ ) 3 ) 2 + 1 4 σ ( 1 + 360 β 2 ) exp ( | z μ | σ ) , z , (6)

and it is denoted by ZBSL(μ,σ,β).

The distribution function associated to the density function (6) is given by

B S L ( z ; μ , σ , β ) = { 2 exp ( z μ σ ) + 2 β Γ ( 4 , z μ σ ) + β 2 Γ ( 7 , z μ σ ) 4 ( 1 + 360 β 2 ) , if z < μ , 1 2 exp ( z μ σ ) 2 β Γ ( 4 , z μ σ ) + β 2 Γ ( 7 , z μ σ ) 4 ( 1 + 360 β 2 ) , if z μ , (7)

and the moment of order r by

𝔼 [ Z r ] = i = 0 r ( r i ) σ i μ r i 𝔼 [ X i ] , r = 1 , 2 ,

where 𝔼[Xi] is the i-th moment of XBSL(β).

Maximum likelihood estimation

We consider a random sample z=(z1,,zn) of size n from BSL(μ,σ,β) distribution. The log-likelihood distribution is given by

( μ , σ , β ) = n log σ n log 4 ( 1 + 360 β 2 ) + i = 1 n log ( 1 + ( 1 β x i 3 ) 2 ) i = 1 n | x i | (8)

where xi=(ziμ)/σ. Note that (μ,σ,β) is a continuous function in each parameter, however, it is not differentiable at zi=μ, i=1,,n. We assume ziμ, i=1,,n, then, the score functions can be obtained by taking the derivative of the likelihood function with respect to parameters μ, σ and β. By equating the score functions to zero and solving the system equations, we provide the maximum likelihood estimators (MLEs) of μ, σ and β, which do not have a closed form. The MLEs can be obtained by numerical method such as Newton-Rapshon type procedure. We use the maxLike function of the statistical software R Development Core Team (2018)R DEVELOPMENT CORE TEAM. 2018. R: A Language and Environment for Statistical Computing. URL http://www.R-project.org. ISBN 3-900051-07-0.
http://www.R-project.org...
for maximizing the likelihood function (μ,σ,β) which use the Newton-Rapshon method. The Fisher information matrix, which is denoted by Iy(Θ), where Θ=(μ,σ,β), has entries given by iθkθk=𝔼[2/θkθk], for k,k=1,2,3, with (θ1,θ2,θ3)=(μ,σ,β).

We can observe that the BSL(μ,σ,β) family does not satisfy the regularity conditions, because the Laplace distribution, as particular case of the BSL family density, is not differentiable at μ. However, under a weaker assumption that the density is absolutely continuous, which is the case for the Laplace density (see Kotz et al. (2001)KOTZ S, KOZUBOWSKI T & PODGORKI K. 2001. The Laplace distribution and generalizations: A revisits with applications to communications, economics, Engineering and finance. Springer.), we can obtain the observed Fisher information matrix Jy(Θ^), with entries jθkθk=2/θkθk, for k,k=1,2,3. Thus, we can find a lower bound for the standard errors (SE) of the MLEs as the square root of the diagonal elements of the observed Fisher information matrix.

EXPONENTIATED BETA-SKEW-LAPLACE DISTRIBUTION

Durrans (1992) introduced an asymmetric version of the normal distribution known as exponentiated normal or power-normal model. Its pdf is given by

f P N ( x ; α ) = α ϕ ( x ) ( Φ ( x ) ) α 1 , x , α + , (9)

The power-normal is denoted shortly by PN(α) and the α parameter controls the asymmetry and kurtosis of the distribution. The PN model is an alternative to the skew-normal model of Azzalini (1985) to fit asymmetric data and it was studied widely by other authors, see for example Pewsey et al. (2012) and Gupta & Gupta (2008)GUPTA RD & GUPTA RC. 2008. Analyzing skewed data by power-normal model. Test 17(1): 197-210.. Based on the general structure of the model in (11), which is known as the alpha-power (AP) family of distributions with pdf given by

fAP(x;α)=αf(x)((x))α1,x,α+,(10)
where is an absolutely continuous cdf, and f=d/dx is the respective pdf, in this section we introduce an extension of the BSL distribution to exponentiated families (10) by considering the α parameter.

Definition 2. The random variable X is said to have an exponentiated beta-skew-Laplace distribution, which we will denote by XEBSL(μ.σ,α,β), if X has pdf given by

f E B S L ( x ; μ , σ , β ) = α f B S L ( x ; μ , σ , β ) ( B S L ( x ; μ , σ , β ) ) α 1 , x , (11)

where fBSL() and BSL() are the pdf and the cdf of the BSL distribution, respectively.

The graph in Figure 4 shows the behavior of EBSL distribution for some selected values of α and β for the standard case, that is, μ=0 and σ=1 which is denoted by EBSL(α,β). It is observed from the graph that, the distribution has bimodal and trimodal behavior.

Figure 4
Probability density function of EBSL(𝛂,𝛃) for selected values of 𝛂 and 𝛃.

Properties of the EBSL distribution

Let XEBSL(μ.σ,α,β),

  1. If α=1, then XBSL(μ,σ,β).

  2. If β=0, the pdf (11) reduces to the following pdf

    fEL(x;μ,σ,α)=αfL(x;μ,σ)(L(x;μ,σ))α1,x,(12)
    where fL() and L() are the pdf and cdf of the Laplace distribution. In sequel Equation (12) is referred as the exponentiated Laplace (EL) distribution, XEL(μ,σ,α).

  3. If α=1 and β=0, then \(X\sim{\textrm L}(\mu,\sigma)\).

  4. If β±, then XdBL(μ,σ,3), that is, the bimodal-symmetric Laplace of order 3 distribution of scale-location follows.

Lemma 1. The density function (11) has at most three modes.

Lemma 2. Let YEBSL(α,β), then the k-th moment of Y is given by

𝔼 [ Y k ] = α 4 ( 1 + 360 β 2 ) + ( 2 y k 2 β y k + 3 + β 2 y k + 6 ) exp ( | y | ) [ B S L ( y ; α , β ) ] α 1 d y . (13)

Maximum likelihood estimation

We consider a random sample z=(z1,,zn) of size n from EBSL(μ,σ,α,β) distribution. The log-likelihood distribution is given by

( μ , σ , α , β ) a m p ; = n log α n log σ n log 4 ( 1 + 360 β 2 ) i = 1 n | x i | a m p ; + i = 1 n log { 1 + ( 1 β x i 3 ) 2 } + ( α 1 ) i = 1 n log F B S L ( x i ; μ , σ , β )

where xi=(ziμ)/σ. The function (μ,σ,α,β) is continuous in each parameter, but it is not differentiable at zi=μ, i=1,,n. To obtain the MLEs of μ, σ, α and β and their corresponding standard errors, we proceed similarly as in BSL model.

Extension to Censored Data

Based on the goodness of the EBSL distribution to fit asymmetric uni-multimodal data, in this section we introduce the censored EBSL model which we will be denote by CEBSL.

Definition 3. Suppose that the random variable X follows EBSL distribution and consider a random sample 𝐗=(X1,,Xn), where only the Xi values greater than a constant k are recorded. In addition, for values Xik only the value of k is recorded. Therefore, for i=1,2,,n, the observed values Xio, can be written as

X i o = { k , if X i k , X i , if X i > k .

The resulting sample is said to be a censored EBSL and we say that X is a censored random variable EBSL. We will use the notation XCEBSL(μ,σ,α,β).

From Definition 3 it follows that P(Xio=k)=P(Xik)=(BSL((kμ)/σ))α and for the observations Xio=Xi the distribution of Xio is the same of Xi, i.e., XioEBSL(μ,σ,α,β). For convenience, we choose to work with the case of left-censored data, however, the followings results can be extended to other types of censorship.

Properties of the CEBSL Model

Let XCEBSL(μ,σ,α,β),

  1. If α=1, then XCBSL(μ,σ,β), where CBSL indicates the censored beta-skew-Laplace model.

  2. If β=0, then XCEL(μ,σ,α,ν), where CEL indicates the censored exponentiated Laplace model.

  3. If α=1 and β=0, then XCL(μ,σ,ν), that is, the censored Laplace model follows.

The estimates of the parameters of the model can be obtained by using maximum likelihood method, where the log-likelihood function is given by

( θ θ ; X ) a m p ; = α 0 log F B S L ( y k ; α , β ) + n 1 log α n 1 log σ n 1 log 4 ( 1 + 360 β 2 ) a m p ; + 1 log { 1 + ( 1 β y i 3 ) 2 } 1 | y i | + ( α 1 ) 1 log F B S L ( y i ; μ , σ , β )

where yk=(kμ)/σ, yi=(xiμ)/σ, and 1 and 0 are the sum over censored individuals and uncensored individuals, respectively, and n1 is the number of uncensored individuals.

In order to identify atypical observations and/or model misspecification, we analyzed the transformation of the martingale residual, rMTi, proposed by Barros et al. (2010)BARROS M, GALEA M, GONZÁLEZ M & LEIVA V. 2010. Influence diagnostics in the tobit censored response model. Stat Methods Appl 19(3): 379-397. doi:10.1007/ s10260-010-0135-y.. These residuals are defined by

r M T i = s g n ( r M i ) 2 [ r M i + ρ i log ( ρ i r M i ) ] , i = 1 , , n

where rMi=ρi+logS(xi;𝛉̂) is the martingal residual proposed by Ortega et al. (2003)ORTEGA EM, BOLFARINE H & PAULA GA. 2003. Influence diagnostics in generalized log-gamma regression models. Comput Stat Data Anal 42(1-2): 165-186., where ρi=0,1 indicates whether the ith observation is censored or not, respectively, sgn(rMi) denotes the sign of rMi and S(yi;𝛉̂)=P𝛉̂(Yi>yi) represents the survival function evaluated at yi, where 𝛉̂ are the MLE for 𝛉.

SIMULATION STUDY

In this section, a small Monte Carlo simulation is conducted to evaluate the maximum likelihood estimation of the BSL(μ,σ,β) distribution parameters. We considered samples sizes of n=100, 200, 500 and 1,000; the true values of the parameter were β=1.5, 0.15, 0, 0.15 and 2.0 while, without loss of generality we considered values for the location and scale parameter set to μ=0 and σ=1, respectively. We generated 2,000 artificial samples and to examine the performance of the MLEs, for each sample size and for each estimate, which we denote by θ̂j, we computed the bias, denoted by 𝔼[θ̂j]θj and the mean square error (MSE), defined by 𝔼[(θ̂jθ)2], for j=1,2,3. Table I presents the bias and the MSE of the MLEs of the parameters μ, σ and β.

Table I
Bias and MSE of the MLE of unknonw parameter for the BSL distribution.

We can see from Table I that MSE of the MLEs are decreasing in n and become zero as n goes to infinity. Also, we can see that the absolute value of the bias of the MLEs are decreasing and tends to zero, and so the estimates are asymptotically unbiased. For β>0 (β<0) the MLE β̂ underestimates (overestimates) the parameter β. The MLE μ̂ underestimates (overestimates) the parameter μ when β <0 (β>0). For β>0 (β <0) the MLE σ̂ overestimates (underestimates) the parameter σ.

ILLUSTRATIONS

In this section, we apply the proposed models to some real data sets. We use the statistical software R version 3.5.3 with the package maxLike for maximizing the corresponding likelihood functions (R Development Core Team 2018). For comparing purposes of various models, the AIC (Akaike 1974AKAIKE H. 1974. A new look at statistical model identification. IEEE Trans Automat Contr 19(6): 716-723.), BIC (Schwarz 1978SCHWARZ G. 1978. Estimating the dimension of a model. Ann Stat 6(2): 461-464.) and CAIC (Bozdogan 1987BOZDOGAN H. 1987. Model selection and akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika 52(3): 345-370.) information criteria were used.

Illustration 1: Old faithful geyser data

The second data set concerns to measurements about wait times between the eruptions (in minutes) of the Old Faithful geyser in Yellowstone National Park, Wyoming, U.S. More information of these data can be seen in Azzalini & Bowman (1990)AZZALINI A & BOWMAN AW. 1990. A Look at some data on the old faithful geyser. J R Stat Soc Ser C Appl Stat 39(3): 357-365. who takes a look at this set of data. In addition to the EBSL and BSL models, we also fit the ASL and the ABSL models. The results of the fit of these models can be seen in Table II. The standard errors of the estimators were obtained by using the observed information matrix, and again, to compare the fitted models, we use the AIC, BIC and CAIC criteria. According to any of these criteria, the best model is EBSL. Figure 5 shows the histogram of the data together with the different fitted models.

Table II
Parameter estimates (S.E) of the fitted models to the old faithful geyser data set.

Illustration 2: Stellar abundances data

The third data set is related to measurements for 68 solar-type stars4, which are available in the package astrodatR (Feigelson 2014FEIGELSON ED. 2014. astrodatR: Astronomical Data. URL https://cran.r-project.org/web/packages/astrodatR/. R package v. 0.1.
https://cran.r-project.org/web/packages/...
) of the software R Development Core Team (2018) under the name Stellar abundances. These data were previously analyzed by Mattos et al. (2018)MATTOS T, GARAY AM & LACHOS VH. 2018. Likelihood-based inference for censored linear regression models with scale mixtures of skew-normal distributions. J Appl Stat 45(11): 2039-2066. using the Scale Mixture of Skew Normal Censored Regression (SMSNCR) models. We take only the response variable: log N(Be), which represents the log of the abundance of beryllium scaled to Sun’s abundance (i.e. the Sun has logN(Be)=0.0).

In astronomical research, a previously identified sample of objects (stars, galaxies, quasars, X-ray sources, etc.) is observed at some new wavebands. According to Feigelson (2014), due to limited sensitivities, some objects may be undetected, leading to upper limits in their derived luminosities. For this dataset we have 12 left-censored data points, i.e. 12 undetected beryllium measurement, that represents 19.35% of observations. Table III presents the ML estimates for the parameters of the censored alpha-skew-Laplace (CASL), censored beta-skew-Laplace (CBSL), censored alpha-beta-skew-Laplace (CABSL) and censored exponentiated beta-skew-Laplace (CEBSL) models, together with their corresponding standard errors. Table III also compares the fit of the four models using the model selection criteria (AIC, BIC and CAIC). Note that the CEBSL model has better fit than the CASL, CBSL and CABSL models. The plots of rMTi with generated confidence envelopes are presented in Figure 6. From this figure, we can see clearly that the CBSL, CABSL and CEBSL models fit better the data than the CASL model, since, in that cases, there are not observations which lie outside the envelopes.

Table III
Parameter estimates (S.E) of the fitted models to the Stellar abundances data set.
Figure 5
Old faithful geyser data: Density function of the fitted models.
Figure 6
Stellar abundances data. Envelopes of transformed martingale residuals for CASL, CBSL, CABSL and CEBSL models.

CONCLUSIONS

In this paper, two new class of unimodal, bimodal as well as trimodal skew distribution is proposed. The main statistical properties and the problem of estimation of the parameters are studied in details by using maximum likelihood method. The models extend the Laplace distribution to trimodal symmetric and asymmetric case. An extension to censored data is also considered. Furthermore, we have shown that such distributions are more flexible than certain rival models and they fit better to some real data sets.

REFERENCES

  • AKAIKE H. 1974. A new look at statistical model identification. IEEE Trans Automat Contr 19(6): 716-723.
  • ARNOLD BC, GÓMEZ HW & SALINAS HS. 2009. On multiple constraint skewed models. Statistics 43(3): 279-293.
  • AZZALINI A. 1985. A class of distributions which includes the normal ones. Scand J Stat 12(2): 171-178.
  • AZZALINI A & BOWMAN AW. 1990. A Look at some data on the old faithful geyser. J R Stat Soc Ser C Appl Stat 39(3): 357-365.
  • BARROS M, GALEA M, GONZÁLEZ M & LEIVA V. 2010. Influence diagnostics in the tobit censored response model. Stat Methods Appl 19(3): 379-397. doi:10.1007/ s10260-010-0135-y.
  • BOLFARINE H, MARTÍNEZ-FLÓREZ G & SALINAS HS. 2018. Bimodal symmetric-asymmetric power-normal families. Commun Stat Theory Methods 47(2): 259-279.
  • BOZDOGAN H. 1987. Model selection and akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika 52(3): 345-370.
  • DURRANS SR. 1992. Distributions of fractional order statistics in hydrology. Water Resour Res 28(6): 1649-1655.
  • ELAL-OLIVERO D. 2010. Alpha-skew-normal distribution. Proyecciones (Antofagasta) 29(12): 224-240.
  • ELAL-OLIVERO D, GÓMEZ HW & QUINTANA FA. 2009. Bayesian modeling using a class of bimodal skew-elliptical distributions. J Stat Plan Inference 139(4): 1484-1492.
  • FEIGELSON ED. 2014. astrodatR: Astronomical Data. URL https://cran.r-project.org/web/packages/astrodatR/ R package v. 0.1.
    » https://cran.r-project.org/web/packages/astrodatR/
  • GÓMEZ HW, ELAL-OLIVERO D, SALINAS H & BOLFARINE H. 2009. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics 22(1): 50-62.
  • GUPTA RD & GUPTA RC. 2008. Analyzing skewed data by power-normal model. Test 17(1): 197-210.
  • KIM HJ. 2005. On a class of two-piece skew-normal distribution. Statistics 39(6): 537-553.
  • KOTZ S, KOZUBOWSKI T & PODGORKI K. 2001. The Laplace distribution and generalizations: A revisits with applications to communications, economics, Engineering and finance. Springer.
  • MA Y & GENTON MG. 2004. Flexible class of skew-symmetric distributions. Scand J Stat 31(3): 459-468.
  • MARTÍNEZ-FLÓREZ G, BOLFARINE H & GÓMEZ HW. 2014. Skew-normal alpha-power model. Statistics 48(6): 1414-1428.
  • MARTÍNEZ-FLÓREZ G, BOLFARINE H & GÓMEZ HW. 2018. Censored bimodal symmetric - asymmetric families. Stat Interface 11: 237-249.
  • MATTOS T, GARAY AM & LACHOS VH. 2018. Likelihood-based inference for censored linear regression models with scale mixtures of skew-normal distributions. J Appl Stat 45(11): 2039-2066.
  • ORTEGA EM, BOLFARINE H & PAULA GA. 2003. Influence diagnostics in generalized log-gamma regression models. Comput Stat Data Anal 42(1-2): 165-186.
  • PEWSEY A, GÓMEZ HW & BOLFARINE H. 2012. Likelihood-based inference for power distributions. Test 21(4): 775-789.
  • R DEVELOPMENT CORE TEAM. 2018. R: A Language and Environment for Statistical Computing. URL http://www.R-project.org ISBN 3-900051-07-0.
    » http://www.R-project.org
  • SCHWARZ G. 1978. Estimating the dimension of a model. Ann Stat 6(2): 461-464.
  • SHAMS-HARANDI S & ALAMATSAZ MH. 2013. Alpha–skew–Laplace distribution. Stat Probab Lett 83(3): 774-782.
  • SOBHAN S, DOOSTPARAST M & JAMALIZADEH A. 2016. The alpha-beta skew normal distribution: properties and applications. Statistics 50(2): 338-349.

Publication Dates

  • Publication in this collection
    24 Oct 2022
  • Date of issue
    2022

History

  • Received
    22 Dec 2019
  • Accepted
    2 June 2020
Academia Brasileira de Ciências Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro RJ Brasil, Tel: +55 21 3907-8100 - Rio de Janeiro - RJ - Brazil
E-mail: aabc@abc.org.br