THE TRANSMUTED HALF-NORMAL DISTRIBUTION WITH APPLICATION TO PRECIPITATION DATA

The Half-Normal distribution has been extended intensively in recent years. A review of the literature shows that at least 10 extensions of the Half-Normal distribution were introduced between 2008 and 2016. These extensions generalize the behavior of the density and hazard functions, which for the Half-Normal distribution are restricted to being monotonically decreasing and monotonically increasing, respectively. In this paper we propose a new extension, called the transmuted Half-Normal distribution, using the quadratic rank transmutation map introduced by Shaw & Buckley (2009). A comprehensive account of the mathematical properties of the new distribution is presented. We provide explicit expressions for the moments, moment-generating function, Shannon's entropy, mean deviations, Bonferroni and Lorenz curves, order statistics, and reliability. The parameters are estimated by the maximum likelihood method. The bias and accuracy of the estimators are assessed by Monte Carlo simulations. The proposed distribution allows covariates to be incorporated directly in the mean and, consequently, their influence on the average of the response variable to be quantified. Experiments with two real data sets show its usefulness and its value as a good alternative to several extensions of the Half-Normal distribution in data modeling with and without covariates.


INTRODUCTION
Underlying any parametric inference procedure is a probability distribution used to describe the behavior of a random variable in the population. Over the last years, a very large number of probability distributions, mostly with support on the positive real numbers, have been proposed in the literature.
In Section 2 we present the Half-Normal (HN) distribution, used as the baseline distribution to be extended. Its transmuted version, the transmuted Half-Normal (THN) distribution, is presented in Section 3. Various statistical and reliability properties of the THN distribution are explored and discussed in Section 4. A few characterizations are considered in Section 5. Estimation by the maximum likelihood method is presented in Section 6. In Section 7, Monte Carlo simulations are conducted to investigate the bias and accuracy of the maximum likelihood estimators. Two applications of the proposed distribution are presented in Section 8. Section 9, with some concluding remarks, closes the paper.

THE HALF-NORMAL DISTRIBUTION
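If a nonnegative random variable X has the HN distribution with scale parameter θ > 0, then its probability density function (p.d.f.) and cumulative distribution function (c.d.f.) are written, respectively, as:
$$g(x \mid \theta) = \frac{2}{\theta}\,\phi\!\left(\frac{x}{\theta}\right) = \frac{1}{\theta}\sqrt{\frac{2}{\pi}}\,\exp\!\left(-\frac{x^{2}}{2\theta^{2}}\right), \qquad G(x \mid \theta) = 2\,\Phi\!\left(\frac{x}{\theta}\right) - 1, \qquad x \ge 0,$$
where φ(·) and Φ(·) denote, respectively, the p.d.f. and c.d.f. of the standard Normal distribution. The corresponding hazard rate function (h.r.f.) is written as:
$$h(x \mid \theta) = \frac{g(x \mid \theta)}{1 - G(x \mid \theta)} = \frac{\phi(x/\theta)}{\theta\,[1 - \Phi(x/\theta)]},$$
which is monotonically increasing for every θ. Recall that the p.d.f. is monotonically decreasing for every θ.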
Formally, if a random variable Z is normally distributed with mean zero and variance one, then the HN distribution is the distribution of X = θ |Z|.
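The kth moment of X is
$$E(X^{k}) = \frac{2^{k/2}\,\theta^{k}}{\sqrt{\pi}}\,\Gamma\!\left(\frac{k+1}{2}\right),$$
such that for k = 1 we have $E(X) = \sqrt{2/\pi}\,\theta$ and for k = 2, $E(X^{2}) = \theta^{2}$. The third and fourth standardized moments (asymmetry and kurtosis, respectively) do not depend on θ and are written as:
$$\text{asymmetry} = \frac{\sqrt{2}\,(4-\pi)}{(\pi-2)^{3/2}} \approx 0.995, \qquad \text{kurtosis} = 3 + \frac{8\,(\pi-3)}{(\pi-2)^{2}} \approx 3.869.$$
The quantile function can be written in the form:
$$Q(p \mid \theta) = \theta\,\Phi^{-1}\!\left(\frac{1+p}{2}\right), \qquad (2)$$
where 0 < p < 1 and Φ^{-1}(·) is the quantile function of the standard Normal distribution. Note that from (2) we can generate pseudo-random values of X. An alternative strategy can be found in Singh (1994) as a consequence of the application of the Box-Muller transformation (Singh, 1995).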
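The inverse-transform generation of HN values from the quantile function can be sketched as follows. This is a minimal illustration and not the authors' code: the names `hn_quantile` and `hn_sample` are hypothetical, and the standard Normal quantile is taken from Python's standard-library `statistics.NormalDist`. The sketch evaluates $\theta\,\Phi^{-1}((1+p)/2)$ through the equivalent symmetric form $-\theta\,\Phi^{-1}((1-p)/2)$, a numerical choice made here so that the argument cannot round up to 1.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, variance 1


def hn_quantile(p, theta):
    """Quantile of HN(theta) at 0 < p < 1.

    Equals theta * Phi^{-1}((1 + p) / 2); it is evaluated through the
    equivalent form -theta * Phi^{-1}((1 - p) / 2), whose argument stays
    inside (0, 1/2] in floating point.
    """
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -theta * STD_NORMAL.inv_cdf((1.0 - p) / 2.0)


def hn_sample(n, theta, rng):
    """Draw n pseudo-random HN(theta) values by inverse-transform sampling."""
    draws = []
    while len(draws) < n:
        u = rng.random()      # uniform on [0, 1); skip the endpoint 0
        if u > 0.0:
            draws.append(hn_quantile(u, theta))
    return draws


rng = random.Random(2024)     # fixed seed so the run is reproducible
draws = hn_sample(20000, 2.0, rng)
sample_mean = sum(draws) / len(draws)
theory_mean = 2.0 * math.sqrt(2.0 / math.pi)   # E(X) = sqrt(2/pi) * theta
print(f"sample mean {sample_mean:.4f} vs theoretical mean {theory_mean:.4f}")
```

With a sample of this size the empirical mean should sit close to $\sqrt{2/\pi}\,\theta$, which gives a quick sanity check on the sampler.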

THE TRANSMUTED HALF-NORMAL DISTRIBUTION
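Motivated by the need for more versatile density and hazard functions, Shaw & Buckley (2009) proposed a strategy that has proved useful in extending a baseline distribution. The distribution obtained by this process is called the transmuted-baseline distribution, for example, the transmuted Weibull distribution. For a baseline c.d.f. G(x), the quadratic rank transmutation map yields the c.d.f. $F(x) = (1+\lambda)\,G(x) - \lambda\,G(x)^{2}$, with |λ| ≤ 1. In this section, from the transmutation procedure, we introduce the transmuted HN (THN) distribution. Taking the HN c.d.f. $G(x \mid \theta) = 2\Phi(x/\theta) - 1$ as the baseline, the p.d.f. and c.d.f. of the THN(θ, λ) distribution are, respectively,
$$f(x \mid \theta, \lambda) = \frac{2}{\theta}\,\phi\!\left(\frac{x}{\theta}\right)\left[1 + 3\lambda - 4\lambda\,\Phi\!\left(\frac{x}{\theta}\right)\right], \qquad (3)$$
$$F(x \mid \theta, \lambda) = \left[2\,\Phi\!\left(\frac{x}{\theta}\right) - 1\right]\left[1 + 2\lambda - 2\lambda\,\Phi\!\left(\frac{x}{\theta}\right)\right], \qquad (4)$$
for x ≥ 0, θ > 0 and −1 ≤ λ ≤ 1. For λ = 0 the HN distribution is recovered.
Figure 1 illustrates how the parameter λ influences the behavior of (3). Since θ is a scale parameter, it is set to θ = 1.
It can be checked that the c.d.f. F(x | θ, λ) of THN(θ, λ) is a convex combination (finite mixture) of the c.d.f.s of the minimum and the maximum of two i.i.d. HN(θ) random variables by writing F(x | θ, λ) as
$$F(x \mid \theta, \lambda) = \frac{1+\lambda}{2}\left\{1 - \left[1 - G(x \mid \theta)\right]^{2}\right\} + \frac{1-\lambda}{2}\,G(x \mid \theta)^{2}.$$
The survival function factorizes as $S(x \mid \theta, \lambda) = [1 - G(x \mid \theta)]\,[1 - \lambda\,G(x \mid \theta)]$, so the ratio of the THN survival function to the HN survival function, $1 - \lambda\,G(x \mid \theta)$, is decreasing in x for 0 ≤ λ ≤ 1 and increasing in x for −1 ≤ λ ≤ 0. That is, HN ≥ THN in the hazard rate order for all 0 ≤ λ ≤ 1, and HN ≤ THN in the hazard rate order for all −1 ≤ λ ≤ 0.
In fact, it can be shown that THN ≤ HN in the likelihood ratio order for all 0 ≤ λ ≤ 1 and THN ≥ HN in the likelihood ratio order for all −1 ≤ λ ≤ 0, since the density ratio $f(x \mid \theta, \lambda)/g(x \mid \theta) = 1 + \lambda - 2\lambda\,G(x \mid \theta)$ is decreasing in x in the first case and increasing in x in the second.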
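The following Python sketch illustrates the transmutation construction numerically. It assumes the quadratic rank transmutation map $F = (1+\lambda)G - \lambda G^{2}$ applied to the HN c.d.f. $G(x \mid \theta) = 2\Phi(x/\theta) - 1$; all function names are hypothetical. The quantile is obtained by solving the quadratic $\lambda u^{2} - (1+\lambda)u + p = 0$ for $u = G(x \mid \theta)$, written in a cancellation-free form chosen for this sketch, and the last function encodes the mixture of minimum and maximum of two i.i.d. HN variables.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def hn_cdf(x, theta):
    """Baseline HN c.d.f. G(x | theta) = 2 * Phi(x / theta) - 1 for x >= 0."""
    return 2.0 * STD_NORMAL.cdf(x / theta) - 1.0


def hn_pdf(x, theta):
    """Baseline HN p.d.f. g(x | theta) = (2 / theta) * phi(x / theta)."""
    return 2.0 / theta * STD_NORMAL.pdf(x / theta)


def thn_pdf(x, theta, lam):
    """THN p.d.f.: g(x) * [1 + lam - 2 * lam * G(x)]."""
    return hn_pdf(x, theta) * (1.0 + lam - 2.0 * lam * hn_cdf(x, theta))


def thn_cdf(x, theta, lam):
    """THN c.d.f.: (1 + lam) * G(x) - lam * G(x)^2."""
    g_val = hn_cdf(x, theta)
    return (1.0 + lam) * g_val - lam * g_val * g_val


def thn_quantile(p, theta, lam):
    """Invert the THN c.d.f. at 0 < p < 1.

    The root of lam*u^2 - (1+lam)*u + p = 0 lying in (0, 1) is
    u = 2p / ((1+lam) + sqrt((1+lam)^2 - 4*lam*p)); then x solves G(x) = u.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    disc = (1.0 + lam) ** 2 - 4.0 * lam * p
    u = 2.0 * p / ((1.0 + lam) + math.sqrt(disc))
    u = min(u, 1.0 - 2.0 ** -53)            # guard against rounding up to 1
    return -theta * STD_NORMAL.inv_cdf((1.0 - u) / 2.0)


def mixture_cdf(x, theta, lam):
    """Same c.d.f. written as a mixture of min and max of two i.i.d. HN."""
    g_val = hn_cdf(x, theta)
    cdf_min = 1.0 - (1.0 - g_val) ** 2
    cdf_max = g_val ** 2
    return 0.5 * (1.0 + lam) * cdf_min + 0.5 * (1.0 - lam) * cdf_max


x_med = thn_quantile(0.5, 1.0, 0.5)
print(f"median of THN(1, 0.5): {x_med:.4f}, c.d.f. there: {thn_cdf(x_med, 1.0, 0.5):.4f}")
```

Feeding uniform draws on (0, 1) into `thn_quantile` gives a simple way to simulate THN data, which is the approach the later sketches in this rewrite rely on.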
The mean residual life for individuals at age t is the average remaining lifetime and corresponds to the ratio of the area under the survival curve to the right of t to the survival function evaluated at t, that is, $\frac{1}{S(t \mid \theta, \lambda)}\int_{t}^{\infty} S(x \mid \theta, \lambda)\,dx$. From (5) we have:
Theorem 4.1. The mean residual life of the THN distribution is given by: which can be written as: From (8) we have: and
The expressions of the coefficient of variation, skewness and kurtosis are obtained from the following relations:

Moment generating function
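Theorem 4.3. If X has the THN distribution, then the moment generating function of X, $M_X(t)$, is given by:
Proof. Let X be a random variable with the THN distribution; then the moment generating function of X is given by:
$$M_X(t) = E\left(e^{tX}\right) = \int_{0}^{\infty} e^{tx} f(x \mid \theta, \lambda)\,dx. \qquad (10)$$
From the Taylor series expansion of the function $e^{tx}$, namely $e^{tx} = \sum_{k=0}^{\infty} (tx)^{k}/k!$, we have that (10) can be written in the form:
$$M_X(t) = \sum_{k=0}^{\infty} \frac{t^{k}}{k!}\,E\left(X^{k}\right).$$
Solving the integrals, we get: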

Differential entropy
Here we investigate the differential entropy of a continuous random variable. Entropy is a measure of uncertainty, and a large entropy value indicates greater uncertainty in the data. One of the most popular measures is the Shannon entropy (Shannon, 1951).
The concept of Shannon's entropy refers to the uncertainty of a probability distribution. An important fact is that the entropy $H_{Sh}$ is not a function of the random variable X, but rather of the probability distribution of that variable.
Definition 4.1. The differential entropy $H_{Sh}$ of a continuous random variable X with probability density function f(x) is defined as
$$H_{Sh}(X) = -\int_{S} f(x)\,\log f(x)\,dx,$$
where S is the support set of the random variable.
Theorem 4.4. Shannon's entropy for a continuous random variable X with the THN distribution is given by:
Proof. In fact, it follows directly from (3) that Using the properties of the logarithm we have: Since the functions in question are all integrable, we have Solving the integrals, we have that Shannon's entropy for the THN distribution is given by:
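As a numerical companion to the entropy definition, the sketch below approximates $-\int f \log f$ for the THN density by composite Simpson quadrature. It is only an illustration under stated assumptions: the THN density is taken as $(2/\theta)\,\phi(x/\theta)\,[1 + 3\lambda - 4\lambda\Phi(x/\theta)]$, the integral is truncated at $12\theta$ (an arbitrary cut-off chosen here), and the names `simpson` and `thn_entropy` are hypothetical. For λ = 0 the result can be compared with the known HN entropy $\tfrac{1}{2}\log(\pi\theta^{2}/2) + \tfrac{1}{2}$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def thn_pdf(x, theta, lam):
    """THN density (2/theta) * phi(x/theta) * [1 + 3*lam - 4*lam*Phi(x/theta)]."""
    z = x / theta
    weight = 1.0 + 3.0 * lam - 4.0 * lam * STD_NORMAL.cdf(z)
    return 2.0 / theta * STD_NORMAL.pdf(z) * weight


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


def thn_entropy(theta, lam, upper_mult=12.0):
    """Numerical Shannon (differential) entropy of THN(theta, lam)."""
    def integrand(x):
        f = thn_pdf(x, theta, lam)
        # f * log(f) tends to 0 as f tends to 0, so zero density contributes 0
        return -f * math.log(f) if f > 0.0 else 0.0
    return simpson(integrand, 0.0, upper_mult * theta)


hn_closed_form = 0.5 * math.log(math.pi / 2.0) + 0.5   # HN entropy, theta = 1
print(f"numerical {thn_entropy(1.0, 0.0):.6f} vs closed form {hn_closed_form:.6f}")
```

Because θ is a scale parameter, the entropy at scale θ equals the entropy at scale 1 plus log θ, which offers a second check on any closed-form expression.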

Mean deviations
The amount of dispersion in a population can be measured by the mean deviations about the mean and about the median, defined by:
$$\delta_{1}(X) = \int_{0}^{\infty} |x - \mu|\, f(x)\,dx, \qquad \delta_{2}(X) = \int_{0}^{\infty} |x - M|\, f(x)\,dx,$$
where µ and M denote the mean and median, respectively.
To calculate these measures we can use the following relations presented in Nadarajah & Kotz (2006a): where F(µ) and F(M) can be calculated according to equation (4). Taking f(x) as the p.d.f. of the THN distribution, we have: In an analogous way we obtain $\int_{0}^{M} x\, f(x \mid \theta, \lambda)\,dx$.

Bonferroni and Lorenz curves
The Bonferroni curve (Bonferroni, 1930) and the Lorenz curve, commonly used in areas such as reliability, demography, economics, medicine and insurance, are applications of the mean deviations and are regarded by economists as measures of social inequality, since they relate the accumulated percentages of income and population. Suppose that X is a nonnegative random variable with probability density function f(x) and cumulative distribution function F(x). The Bonferroni and Lorenz curves, denoted by B(p) and L(p), respectively, are defined as:
$$B(p) = \frac{1}{p\,\mu}\int_{0}^{q} x\, f(x)\,dx, \qquad L(p) = \frac{1}{\mu}\int_{0}^{q} x\, f(x)\,dx,$$
where µ = E(X) and $q = F^{-1}(p)$, with 0 < p ≤ 1. In particular, for the THN distribution we have:
Theorem 4.5. Let X be a nonnegative continuous random variable that has the THN distribution. The Bonferroni and Lorenz curves are given by:
For the first order statistic, we set j = 1 in the general formula. The result is the p.d.f. of the minimum of the THN distribution, given by:
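The Bonferroni and Lorenz curves discussed in this subsection can be evaluated numerically as in the sketch below. It assumes the usual definitions $L(p) = \mu^{-1}\int_{0}^{q} x f(x)\,dx$ and $B(p) = L(p)/p$ with $q = F^{-1}(p)$, and it integrates the THN density by Simpson's rule instead of using a closed-form expression. The function names and the truncation point $12\theta$ for the mean are choices made for this illustration only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def thn_pdf(x, theta, lam):
    """THN density (2/theta) * phi(x/theta) * [1 + 3*lam - 4*lam*Phi(x/theta)]."""
    z = x / theta
    weight = 1.0 + 3.0 * lam - 4.0 * lam * STD_NORMAL.cdf(z)
    return 2.0 / theta * STD_NORMAL.pdf(z) * weight


def thn_quantile(p, theta, lam):
    """Inverse of F = (1+lam)*G - lam*G^2 with G the HN c.d.f., 0 < p < 1."""
    disc = (1.0 + lam) ** 2 - 4.0 * lam * p
    u = 2.0 * p / ((1.0 + lam) + math.sqrt(disc))
    u = min(u, 1.0 - 2.0 ** -53)
    return -theta * STD_NORMAL.inv_cdf((1.0 - u) / 2.0)


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


def bonferroni_lorenz(p, theta, lam):
    """Return (B(p), L(p)) for THN(theta, lam) by numerical integration."""
    def first_moment_integrand(x):
        return x * thn_pdf(x, theta, lam)
    mean = simpson(first_moment_integrand, 0.0, 12.0 * theta)
    q = thn_quantile(p, theta, lam)
    lorenz = simpson(first_moment_integrand, 0.0, q) / mean
    return lorenz / p, lorenz


for p in (0.25, 0.5, 0.75):
    b_val, l_val = bonferroni_lorenz(p, 1.0, 0.5)
    print(f"p = {p:.2f}: B(p) = {b_val:.4f}, L(p) = {l_val:.4f}")
```

Since L(p) ≤ p for any nonnegative variable, values above the diagonal would signal an error in a closed-form implementation.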

Extreme Values
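The HN(θ) distribution belongs to the max domain of attraction of the Gumbel extreme value distribution. Hence, there must exist a strictly positive function, say h(t), such that
$$\lim_{t \to \infty} \frac{1 - G(t + x\,h(t) \mid \theta)}{1 - G(t \mid \theta)} = e^{-x}$$
for every x ∈ (0, ∞) (Leadbetter et al., 1987), where G(· | θ) denotes the HN c.d.f.
Then it can be shown that the survival function of THN(θ, λ) satisfies a limit of the same form for every x ∈ (0, ∞). Therefore THN(θ, λ) belongs to the max domain of attraction of the Gumbel extreme value distribution, with
$$\lim_{n \to \infty} P\left[a_{n}\left(X_{n:n} - b_{n}\right) \le x\right] = \exp\left(-e^{-x}\right)$$
for some suitable norming constants $a_{n} > 0$ and $b_{n} \in \mathbb{R}$, where $X_{n:n}$ is the maximum order statistic.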

Stress strength reliability
Stress-strength reliability estimation is of great interest in engineering, being used in stress-strength models or as a measure of performance in electrical and electronic systems. However, it can also be applied in other areas, as it provides a general measure of the difference between two populations (Asgharzadeh et al., 2011).
Theorem 4.6. Let X1 ∼ THN(θ1, λ1) and X2 ∼ THN(θ2, λ2) be independent. Then
Proof. Using the p.d.f. and c.d.f. defined in (3) and (4) we have: Applying the distributive law, using properties of integration and isolating the terms independent of x, we get: On solving the integrals we arrive at the desired result.
Note that for the case where θ 1 = θ 2 , it immediately follows that
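The stress-strength quantity can be checked numerically with the sketch below. It assumes the convention $R = P(X_{2} < X_{1}) = \int_{0}^{\infty} f(x \mid \theta_{1}, \lambda_{1})\,F(x \mid \theta_{2}, \lambda_{2})\,dx$, which may differ from the convention of Theorem 4.6 by swapping the roles of $X_{1}$ and $X_{2}$. Under this assumed convention, integrating in the variable $u = G(x \mid \theta)$ gives $R = 1/2 + (\lambda_{2} - \lambda_{1})/6$ when $\theta_{1} = \theta_{2}$; that expression was worked out for this illustration and is not quoted from the source. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def thn_pdf(x, theta, lam):
    """THN density (2/theta) * phi(x/theta) * [1 + 3*lam - 4*lam*Phi(x/theta)]."""
    z = x / theta
    weight = 1.0 + 3.0 * lam - 4.0 * lam * STD_NORMAL.cdf(z)
    return 2.0 / theta * STD_NORMAL.pdf(z) * weight


def thn_cdf(x, theta, lam):
    """THN c.d.f. (1+lam)*G - lam*G^2 with G = 2*Phi(x/theta) - 1."""
    g_val = 2.0 * STD_NORMAL.cdf(x / theta) - 1.0
    return (1.0 + lam) * g_val - lam * g_val * g_val


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


def stress_strength(theta1, lam1, theta2, lam2):
    """Numerical R = P(X2 < X1) for independent THN variables (assumed convention)."""
    upper = 12.0 * max(theta1, theta2)      # both densities are negligible beyond this
    def integrand(x):
        return thn_pdf(x, theta1, lam1) * thn_cdf(x, theta2, lam2)
    return simpson(integrand, 0.0, upper)


r_num = stress_strength(2.0, 0.3, 2.0, -0.6)
r_equal_scale = 0.5 + (-0.6 - 0.3) / 6.0    # equal-scale expression derived for this sketch
print(f"numerical R = {r_num:.6f}, equal-scale expression = {r_equal_scale:.6f}")
```

A useful sanity check is that swapping the two variables gives the complementary probability, so the two values add to one.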

CHARACTERIZATION
Here we discuss characterizations of THN(θ , λ ) distribution based on: (i) simple relationship between two truncated moments and (ii) maximum and minimum order statistics.

Characterizations by two truncated moments
Here we present a characterization of the THN(θ, λ) distribution based on a simple relationship between two truncated moments, using the following theorem due to Glänzel (1987). This theorem also holds if the interval I is not closed and also when the cdf F does not have a closed form. This characterization is stable in the sense of weak convergence.
Theorem 5.1. Let (Ω, F, P) be a probability space and I = [l, u] be an interval for some l < u (l = −∞, u = ∞ might as well be allowed). Let X : Ω → I be a continuous random variable with the cdf F and let $w_{1}$ and $w_{2}$ be two real-valued functions defined on I such that
$$E\left[w_{2}(X) \mid X \ge x\right] = E\left[w_{1}(X) \mid X \ge x\right]\,\xi(x), \qquad x \in I,$$
is defined for some real function ξ. Assume that $w_{1}, w_{2} \in C^{1}(I)$, $\xi \in C^{2}(I)$ and F is a twice continuously differentiable and strictly monotone function on I. Moreover, assume that the equation $\xi w_{1} = w_{2}$ has no real solution in the interior of I. Then F is uniquely determined by the functions $w_{1}$, $w_{2}$ and ξ, particularly
$$F(x) = \int_{l}^{x} C \left| \frac{\xi'(t)}{\xi(t)\,w_{1}(t) - w_{2}(t)} \right| \exp\left(-s(t)\right) dt,$$
where the function s is a solution of the differential equation $s'(x) = \dfrac{\xi'(x)\,w_{1}(x)}{\xi(x)\,w_{1}(x) - w_{2}(x)}$ and C is the normalization constant such that $\int_{I} dF = 1$.

Remark:
The general solution of the differential equation in Corollary 5.1 is where C is a constant. The set of functions given in Proposition 5.1 satisfies this differential equation when C = 0. It should be noted that there are other triplets ($w_{1}$, $w_{2}$, ξ) satisfying the conditions of Theorem 5.1.

Characterizations by order statistics
Here we use two results from Hamedani et al. (2017), stated in the theorem below, to characterize the THN(θ, λ) distribution by the maximum ($X_{n:n}$) and minimum ($X_{1:n}$) order statistics from a sample of size n from the THN(θ, λ) distribution.

MAXIMUM LIKELIHOOD ESTIMATOR
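Let $x = (x_{1}, \ldots, x_{n})$ be a random sample from the THN distribution with p.d.f. given by (3). Then the log-likelihood function, apart from constant terms, can be written as:
$$\ell(\theta, \lambda \mid x) = -n\log\theta - \frac{1}{2\theta^{2}}\sum_{i=1}^{n} x_{i}^{2} + \sum_{i=1}^{n} \log\left[1 + 3\lambda - 4\lambda\,\Phi\!\left(\frac{x_{i}}{\theta}\right)\right]. \qquad (13)$$
The MLEs $\hat{\theta}$ and $\hat{\lambda}$ of θ and λ, respectively, may be obtained by maximizing (13) or by solving the following likelihood equations:
$$\frac{\partial \ell}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^{3}}\sum_{i=1}^{n} x_{i}^{2} + \frac{4\lambda}{\theta^{2}}\sum_{i=1}^{n} \frac{x_{i}\,\phi(x_{i}/\theta)}{1 + 3\lambda - 4\lambda\,\Phi(x_{i}/\theta)} = 0,$$
$$\frac{\partial \ell}{\partial \lambda} = \sum_{i=1}^{n} \frac{3 - 4\,\Phi(x_{i}/\theta)}{1 + 3\lambda - 4\lambda\,\Phi(x_{i}/\theta)} = 0.$$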
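A direct numerical maximization of the log-likelihood can be sketched as follows. The sketch assumes the log-likelihood $-n\log\theta - \sum x_{i}^{2}/(2\theta^{2}) + \sum \log[1 + 3\lambda - 4\lambda\Phi(x_{i}/\theta)]$ and uses a simple derivative-free compass search over (log θ, λ), started at the HN estimate $\hat{\theta}_{0} = (\sum x_{i}^{2}/n)^{1/2}$ with λ = 0. This search is a stand-in chosen to keep the example dependency-free; it is not the Newton-Raphson procedure used for the fits reported in the applications. All function names are hypothetical and the data are simulated.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def thn_quantile(p, theta, lam):
    """Inverse THN c.d.f., used here only to simulate data."""
    disc = (1.0 + lam) ** 2 - 4.0 * lam * p
    u = min(2.0 * p / ((1.0 + lam) + math.sqrt(disc)), 1.0 - 2.0 ** -53)
    return -theta * STD_NORMAL.inv_cdf((1.0 - u) / 2.0)


def thn_loglik(theta, lam, data):
    """THN log-likelihood without constants; -inf outside the parameter space."""
    if theta <= 0.0 or not -1.0 <= lam <= 1.0:
        return -math.inf
    total = -len(data) * math.log(theta)
    for x in data:
        z = x / theta
        weight = 1.0 + 3.0 * lam - 4.0 * lam * STD_NORMAL.cdf(z)
        if weight <= 0.0:
            return -math.inf
        total += -0.5 * z * z + math.log(weight)
    return total


def thn_mle(data, tol=1e-6, max_iter=500):
    """Compass search on (log theta, lam); returns (theta_hat, lam_hat, loglik)."""
    theta0 = math.sqrt(sum(x * x for x in data) / len(data))   # HN estimate
    point = [math.log(theta0), 0.0]
    best = thn_loglik(theta0, 0.0, data)
    step = 0.25
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for axis in (0, 1):
            for sign in (1.0, -1.0):
                trial = list(point)
                trial[axis] += sign * step
                trial[1] = min(1.0, max(-1.0, trial[1]))       # keep |lam| <= 1
                value = thn_loglik(math.exp(trial[0]), trial[1], data)
                if value > best:
                    point, best, improved = trial, value, True
        if not improved:
            step *= 0.5                                         # refine the mesh
    return math.exp(point[0]), point[1], best


rng = random.Random(7)
data = [thn_quantile(rng.uniform(1e-12, 1.0 - 1e-12), 2.0, -0.5) for _ in range(500)]
theta_hat, lam_hat, loglik_hat = thn_mle(data)
print(f"theta_hat = {theta_hat:.3f}, lam_hat = {lam_hat:.3f}, loglik = {loglik_hat:.2f}")
```

Searching on log θ keeps the scale parameter positive without an explicit constraint, and a gradient-based optimizer could replace the compass search where one is available.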

SIMULATION STUDY
In this section we present the results of a Monte Carlo simulation used to evaluate the bias and mean square error of the estimates obtained by the maximum likelihood method. In analyzing the bias of $\hat{\theta}$, we note that for λ = −0.9, −0.7, −0.5, −0.3, 0, 0.3, 0.5 the estimator performs very well, with its bias converging to zero even for small samples. Although for λ = 0.7 and 0.9 the convergence of $\hat{\theta}$ is slower, the estimated bias becomes small as the sample size increases, around −0.04 and −0.08, respectively.
We also note that for λ = −0.9, −0.7, −0.5, −0.3, 0, 0.3, 0.5, λ is estimated accurately. As in the case of $\hat{\theta}$, for λ = 0.7 and 0.9 the convergence of $\hat{\lambda}$ is slower, with the estimated bias close to −0.11 and −0.02, respectively. Although the biases are low for both estimated parameters, they are comparatively higher for $\hat{\lambda}$. Moreover, the larger the magnitude of λ, the higher the bias in the estimates of both parameters, showing that the transmutation parameter λ exerts influence on the estimation of the scale parameter θ.
The mean square errors of $\hat{\theta}$ are extremely low, and the positive values of λ show the most precise convergence. Although for positive λ the errors do not converge directly to zero, the graph shows that they are below 0.03. Thus we can conclude that the errors in the estimates are negligible in practice.
Finally, the mean square error of $\hat{\lambda}$ shows that θ influences its estimation. As before, in all scenarios the errors tend to zero, the largest being close to 0.1.
In general, we can conclude that the estimators are asymptotically unbiased, since the bias tends to zero as n increases, and the trend in the mean squared error indicates consistency, since the errors tend to zero as n increases.
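The structure of such a simulation study can be sketched in a few lines. The example below is illustrative only: it simulates THN samples by inversion, fits each one with a simple compass search on the log-likelihood (a stand-in for the optimizer actually used), and reports the empirical bias and mean square error. The function names, the small number of replicates and the chosen scenario (θ = 1, λ = −0.5, n = 50) are assumptions made for this sketch and do not reproduce the design of the study.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def thn_quantile(p, theta, lam):
    """Inverse THN c.d.f. for 0 < p < 1."""
    disc = (1.0 + lam) ** 2 - 4.0 * lam * p
    u = min(2.0 * p / ((1.0 + lam) + math.sqrt(disc)), 1.0 - 2.0 ** -53)
    return -theta * STD_NORMAL.inv_cdf((1.0 - u) / 2.0)


def thn_loglik(theta, lam, data):
    """THN log-likelihood without constants; -inf for inadmissible values."""
    total = -len(data) * math.log(theta)
    for x in data:
        z = x / theta
        weight = 1.0 + 3.0 * lam - 4.0 * lam * STD_NORMAL.cdf(z)
        if weight <= 0.0:
            return -math.inf
        total += -0.5 * z * z + math.log(weight)
    return total


def thn_mle(data, tol=1e-5, max_iter=500):
    """Compass search on (log theta, lam) started at the HN estimate."""
    point = [0.5 * math.log(sum(x * x for x in data) / len(data)), 0.0]
    best = thn_loglik(math.exp(point[0]), 0.0, data)
    step = 0.25
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for axis in (0, 1):
            for sign in (1.0, -1.0):
                trial = list(point)
                trial[axis] += sign * step
                trial[1] = min(1.0, max(-1.0, trial[1]))
                value = thn_loglik(math.exp(trial[0]), trial[1], data)
                if value > best:
                    point, best, improved = trial, value, True
        if not improved:
            step *= 0.5
    return math.exp(point[0]), point[1]


def monte_carlo(theta, lam, n, reps, seed=0):
    """Empirical bias and mean square error of the MLEs over `reps` samples."""
    rng = random.Random(seed)
    fits = []
    for _ in range(reps):
        sample = [thn_quantile(rng.uniform(1e-12, 1.0 - 1e-12), theta, lam)
                  for _ in range(n)]
        fits.append(thn_mle(sample))
    return {
        "bias_theta": sum(t - theta for t, _ in fits) / reps,
        "mse_theta": sum((t - theta) ** 2 for t, _ in fits) / reps,
        "bias_lam": sum(l - lam for _, l in fits) / reps,
        "mse_lam": sum((l - lam) ** 2 for _, l in fits) / reps,
    }


result = monte_carlo(theta=1.0, lam=-0.5, n=50, reps=20, seed=1)
print({name: round(value, 4) for name, value in result.items()})
```

Repeating the call over a grid of n and λ values, with many more replicates, would give bias and mean square error curves of the kind discussed above.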

REAL DATA ANALYSIS
In this section, we illustrate the applicability of the THN distribution using two real data sets that have not been analyzed before in the literature. Our objective is to evaluate its fit relative to other distributions already presented in the literature.
In both applications the data were compiled from the daily precipitation series obtained from the portal of the National Institute of Meteorology (http://www.inmet.gov.br). In fitting the distributions, the cumulative total for the month was taken as the response. Cumulative totals over a month, a quarter, and so forth are widely used in the calculation of the Standardized Precipitation Index (SPI).

Monthly Precipitation
For the first application, we consider the data for the month of February of each year, from 1974 to 2016, for the city of Chapecó, located in the state of Santa Catarina. It is important to note that, owing to recording failures, no measurement is available for February in some years of the period considered. Table 1 shows the summary measures of the data used. Looking at Figure 5, it is possible to note that the hazard rate is increasing, an indication that the THN distribution may be an appropriate model for the data.
11. Beta Generalized Half-Normal Geometric (BGHNG), due to Ramires et al. (2013).
Table 2 shows the maximum likelihood estimates (standard errors) of the fitted distributions. All estimates were obtained with the SAS/NLMIXED procedure (SAS, 2010), applying the Newton-Raphson optimization technique. Although, for all distributions, the resulting variance-covariance matrices were positive definite and $\max \left| \partial \ell(\Theta \mid x)/\partial \Theta \right|_{\Theta = \hat{\Theta}} < 0.000001$, we observed atypical standard errors for some parameters in the GMHN, BGHN, EGHN and KGHN distributions. Our guess is that the algorithm converged to a local minimum or, most likely, that the parameters are linear functions of each other (or almost collinear) for the data in question. In fact, from the correlation matrices we obtain corr( θ , α) = 0.
To compare the distributions, we consider the likelihood-based statistics −2×Log-Like, AIC, AICc and BIC and the goodness-of-fit measures KS (Kolmogorov-Smirnov), AD (Anderson-Darling) and CvM (Cramér-von Mises). The best model is the one with the minimum values of these criteria. Table 3 shows these values, and the index indicates the rank obtained by each distribution. The last column gives the total rating (sum of ranks) for each distribution. Since there are large uncertainties in the estimates of the GMHN, BGHN, EGHN, KGHN and BGHNG distributions, they are not considered. From the table, we can see that the THN distribution ranked first, followed by the TGHN and PHN distributions. Both THN and PHN have two parameters. The TGHN distribution has three parameters, and the confidence interval of its additional parameter (λ) contains zero. In view of this, we can infer that the models with the best fit were THN (17) and PHN (24).
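For readers who want to recompute the comparison criteria, the sketch below gives the standard textbook definitions of −2×Log-Like, AIC, AICc and BIC together with the Kolmogorov-Smirnov distance. These formulas are assumed to match the ones behind Table 3, which the text does not spell out; the function names are hypothetical, and the numbers in the usage lines are arbitrary placeholders, not values from the paper.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return -2*loglik, AIC, AICc and BIC for a fitted model."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    minus2ll = -2.0 * loglik
    aic = minus2ll + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1.0) / (n_obs - n_params - 1.0)
    bic = minus2ll + n_params * math.log(n_obs)
    return {"-2logL": minus2ll, "AIC": aic, "AICc": aicc, "BIC": bic}


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical c.d.f. and `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        fx = cdf(x)
        # compare the fitted c.d.f. with the empirical c.d.f. just after
        # and just before each ordered observation
        distance = max(distance, i / n - fx, fx - (i - 1) / n)
    return distance


# Placeholder inputs: a log-likelihood of -100 with 2 parameters and 40 observations.
criteria = information_criteria(-100.0, 2, 40)
print({name: round(value, 3) for name, value in criteria.items()})

# Placeholder sample checked against the Uniform(0, 1) c.d.f.
print(round(ks_statistic([0.7, 0.1, 0.4], lambda x: x), 3))
```

Ranking each model on every criterion and summing the ranks gives a total rating of the kind shown in the last column of Table 3.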

Ten-Day Precipitation
The data consist of the ten-day accumulated precipitation between April and June at the station WMO 83498, located in the state of Bahia, Brazil. We considered the historical series from 1961 to 2017. In this application we adopted a regression structure for the scale parameter, that is,
$$\log(\theta_{i}) = \beta_{0} + \beta_{1} x_{i}, \qquad (14)$$
where $x_{i}$ denotes the ith observation associated with the ith ten-day period. The periods of ten days were considered for the months of April, May, June and July. We considered the HN, GHN II and THN distributions, since only they have closed-form analytical expressions for the mean.
Table 4 reports the parameter estimates and standard errors. Since $\beta_{1}$ has a negative sign for all distributions, a decrease in the accumulated precipitation over time is indicated. The empirical and estimated means (95% confidence intervals) are presented in Table 5 and Figure 6. Table 6 presents several criteria to discriminate between the HN, GHN II and THN distributions. From these results we can conclude that GHN II and THN provide a better fit than the HN distribution. It is also observed that THN has the lowest values of those criteria.
Figure 6 - The empirical and estimated means.
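The regression structure on the scale parameter can be sketched as follows. The closed-form mean used below, $E(X) = \theta\sqrt{2/\pi}\,(1 + \lambda - \sqrt{2}\,\lambda)$, was derived for this illustration from the representation of the THN distribution as a mixture of the minimum and maximum of two i.i.d. HN variables; it should be treated as an assumption and checked against the moment expressions of the paper. The function names are hypothetical, and the coefficients and responses in the usage lines are toy values, not the precipitation data or the fitted estimates.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def thn_mean(theta, lam):
    """Assumed THN mean: theta * sqrt(2/pi) * (1 + lam - sqrt(2) * lam)."""
    return theta * math.sqrt(2.0 / math.pi) * (1.0 + lam - math.sqrt(2.0) * lam)


def thn_logpdf(x, theta, lam):
    """Log of the THN density (2/theta)*phi(x/theta)*[1 + 3*lam - 4*lam*Phi(x/theta)]."""
    z = x / theta
    weight = 1.0 + 3.0 * lam - 4.0 * lam * STD_NORMAL.cdf(z)
    if weight <= 0.0:
        return -math.inf
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    return math.log(2.0 / theta) + log_phi + math.log(weight)


def regression_loglik(beta0, beta1, lam, periods, responses):
    """Log-likelihood with log(theta_i) = beta0 + beta1 * x_i."""
    total = 0.0
    for x_i, y_i in zip(periods, responses):
        theta_i = math.exp(beta0 + beta1 * x_i)
        total += thn_logpdf(y_i, theta_i, lam)
    return total


def fitted_means(beta0, beta1, lam, periods):
    """Model means implied by the regression structure for each period."""
    return [thn_mean(math.exp(beta0 + beta1 * x_i), lam) for x_i in periods]


# Toy inputs for illustration only.
toy_periods = [1, 2, 3, 4, 5, 6]
toy_responses = [60.0, 35.0, 48.0, 20.0, 27.0, 12.0]
print(round(regression_loglik(4.0, -0.1, 0.3, toy_periods, toy_responses), 3))
print([round(m, 2) for m in fitted_means(4.0, -0.1, 0.3, toy_periods)])
```

Because the mean is proportional to $\theta_{i}$, a negative $\beta_{1}$ translates directly into fitted means that decrease across the ten-day periods.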

CONCLUSION
In this paper, we presented the THN distribution, formulated from the quadratic rank transmutation map proposed by Shaw & Buckley (2009). Some characteristics and mathematical properties of the proposed distribution were studied. It is important to note that the moment generating function, the kth moment, mean, variance, asymmetry and kurtosis have explicit analytic expressions, which depend only on the parameters of the THN distribution. Owing to the simplicity of the distribution, it was possible to calculate uncertainty measures such as the Shannon entropy and the mean deviations. The Bonferroni and Lorenz curves, as well as the reliability characteristics, were also presented, giving areas such as engineering the opportunity to benefit from the use of the proposed distribution. The distribution of the order statistics was derived for the THN distribution, together with the expressions for the densities of the maximum and the minimum. A Monte Carlo simulation study showed that the parameters are efficiently estimated by the maximum likelihood method, with low bias and low mean squared error even for small sample sizes, which indicates the potential of the new distribution for modeling. In the first application, using daily precipitation data, 11 models proposed in the literature (derived from the HN distribution) were placed in competition with the distribution proposed in this work. Five models were withdrawn from the analysis because of their inconsistent estimates and, according to the likelihood-based statistics −2×Log-Like, AIC, AICc and BIC, as well as KS, AD and CvM, the THN distribution showed a better fit than the competing models. Since the distribution has an explicit and simple expression for the mean, it was possible to use it in an application with a regression structure.
When observing the −2×Log-Like, AIC, BIC and SSR criteria, we noticed that the THN distribution presented a good fit, reinforcing its advantage over the models used here.
It is important to mention that during the peer review process the THN distribution was considered by Balaswamy (2018). Although both works propose the same distribution, we emphasize that our paper presents a more comprehensive account of the mathematical properties of the new distribution (survival, hazard rate and residual life functions and their properties; asymptotic behavior of the tails; moments, associated measures and moment generating function; differential entropy; mean deviations; Bonferroni and Lorenz curves; order statistics; extreme values; stress-strength reliability; characterizations by two truncated moments; characterizations by order statistics). In addition, we studied the bias and accuracy of the parameters estimated by the maximum likelihood method and illustrated the applicability of the THN distribution using two real data sets that have not been analyzed before in the literature.