A competitive family to the Beta and Kumaraswamy generators: Properties, Regressions and Applications

We define two new flexible families of continuous distributions to fit real data by compounding the Marshall–Olkin class and the power series distribution. These families are very competitive with the popular beta and Kumaraswamy generators. Their densities have linear representations in terms of exponentiated densities. Since the main properties of thirty five exponentiated distributions are well known, we can easily obtain several properties of about three hundred fifty distributions using the references of this article and five special cases of the power series distribution. We provide a package implemented in R that shows numerically the precision of one of the linear representations. This package is useful for calculating numerical values of some statistical measures of the generated distributions. We estimate the parameters by maximum likelihood. We define a regression based on one of the two families. The usefulness of a generated distribution and the associated regression is demonstrated empirically.

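The cdf H(z) and survival function H̄(z) = 1 − H(z) of the MO class with baseline G(z; τ) are $H(z) = G(z; \tau) / [1 - \bar\alpha\, \bar G(z; \tau)]$ and $\bar H(z) = \alpha\, \bar G(z; \tau) / [1 - \bar\alpha\, \bar G(z; \tau)]$, respectively, where $\bar G(z; \tau) = 1 - G(z; \tau)$ and ᾱ = 1 − α.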
Five important distributions are special cases of (4): the zero-truncated Poisson, logarithmic, negative binomial, geometric and zero-truncated binomial distributions.
The cdf of X = max{Z_1, ..., Z_N} conditional on N = n is $H(x)^n$, and the unconditional cdf of X, Equation (5), follows by averaging this expression over the distribution of N in (4). The conditional cdf of Y = min{Z_1, ..., Z_N} given N = n is $1 - [1 - H(y)]^n$, and the unconditional cdf of Y, Equation (6), follows from (4) in the same way. Equations (5) and (6) define two Marshall-Olkin Power Series-G (MOPS-G) families under baseline G. They provide a strong motivation for explaining the failure time of any mechanism formed by an unknown number N of identical and independent (parallel or serial) components. The densities of X and Y are obtained by differentiating (5) and (6). We emphasize that these equations can generate many MOPS models. For each baseline G, we can generate ten (2 × 5) associated models from the five discrete distributions in Equation (4). For α = 1, we have the Power Series-G (PS-G) classes under baseline G.
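The compounding step can be checked by simulation. The sketch below assumes the zero-truncated Poisson case with pmf P(N = n) = θ^n/[n!(e^θ − 1)], n = 1, 2, ..., for which averaging the conditional cdfs over N gives F_X(x) = (e^{θH(x)} − 1)/(e^θ − 1) and F_Y(y) = 1 − (e^{θ[1 − H(y)]} − 1)/(e^θ − 1). The unit-exponential baseline, the parameter values and all function names are illustrative choices and are not taken from the paper or its package.

```r
set.seed(1)
alpha <- 1.2; theta <- 1.5            # illustrative parameter values

# MO cdf with unit-exponential baseline G(z) = 1 - exp(-z)
H <- function(z) {
  G <- 1 - exp(-z)
  G / (1 - (1 - alpha) * (1 - G))
}

# Draws from H by inversion: solve H = G / (alpha + (1 - alpha) * G) for G
rH <- function(n) {
  p <- runif(n)
  G <- alpha * p / (1 - (1 - alpha) * p)
  -log(1 - G)
}

# Zero-truncated Poisson draws by inverting the cdf restricted to N >= 1
rztpois <- function(n, theta) {
  u <- runif(n, min = dpois(0, theta), max = 1)
  qpois(u, theta)
}

N <- rztpois(20000, theta)
X <- vapply(N, function(n) max(rH(n)), numeric(1))   # parallel-type system
Y <- vapply(N, function(n) min(rH(n)), numeric(1))   # series-type system

# Closed-form unconditional cdfs under the assumed pmf
FX <- function(x) expm1(theta * H(x)) / expm1(theta)
FY <- function(y) 1 - expm1(theta * (1 - H(y))) / expm1(theta)

x0 <- 1
rbind(maximum = c(empirical = mean(X <= x0), closed_form = FX(x0)),
      minimum = c(empirical = mean(Y <= x0), closed_form = FY(x0)))
```

The empirical and closed-form values should agree up to Monte Carlo error, which is about 0.004 for 20,000 replications.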
The minimum (Y) and maximum (X) statistics can be applied in several series and parallel systems with identical components and have many industrial and biological applications. In parallel systems, the random variable Y models the time of the first component to fail, while X models the time to breakdown of the system. A dual interpretation can be given for systems with serial components. These random variables are also very useful in oncology. For example, suppose we are studying the recurrence of a certain type of cancerous tumor in an individual after some kind of treatment. Then, the time for the first cell to activate and produce cancer cells can be modeled by the generated distribution of Y, while the disease manifestation (if it occurs only after an unknown number of factors have been active) can be modeled by the generated distribution of X.
Four new distributions based on the MOPS construction are introduced for illustrative purposes in Section Four special models. We derive linear representations for the densities of X and Y in Section Expansions. A package in R is presented in Section Numerical evaluation to calculate numerically several mathematical properties of the generated distributions based on the linear representations. General structural properties of the two families are addressed in Section Properties. In Section Estimation, we estimate the parameters for one of the families. We introduce in Section Regression the Marshall-Olkin Truncated Poisson Weibull regression defined from one of the families. In Section Two simulation studies, some simulations examine the accuracy of the maximum likelihood estimates (MLEs) and the quantile residuals (qrs). Two applications illustrate the utility of our findings in Section Applications. Finally, we offer concluding remarks in Section Conclusions.

FOUR SPECIAL MODELS
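First, consider the zero-truncated Poisson distribution in (4). The cdfs of the Marshall-Olkin Zero-Truncated Poisson-G (MOTP-G) distributions, Equations (7) and (8), are determined from Equations (5) and (6), respectively. The Weibull cdf with scale parameter λ > 0 and shape parameter β > 0 is (for x ≥ 0) $G(x) = 1 - \exp\{-(\lambda x)^{\beta}\}$. Then, the cdf and survival function of the MO-Weibull (MOW) distribution are $H(x) = [1 - e^{-(\lambda x)^{\beta}}] / [1 - \bar\alpha\, e^{-(\lambda x)^{\beta}}]$ and $\bar H(x) = \alpha\, e^{-(\lambda x)^{\beta}} / [1 - \bar\alpha\, e^{-(\lambda x)^{\beta}}]$, respectively, with ᾱ = 1 − α. By inserting the last two formulae in Equations (7) and (8) and differentiating the resulting expressions, we obtain the MOTP-Weibull (MOTPW) densities f_X(x) and f_Y(y), Equations (9) and (10), respectively, which are written in terms of u = u(x) = (λx)^β in f_X(x) and u = u(y) = (λy)^β in f_Y(y).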
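For reference, the MOTPW cdfs and densities can be written out under one assumption that is not visible in this section: that the zero-truncated Poisson case of (4) has pmf P(N = n) = θ^n/[n!(e^θ − 1)], n = 1, 2, .... Under that assumption, with u = (λx)^β or u = (λy)^β as appropriate, the construction gives the expressions below. They are a worked derivation and may be arranged differently from Equations (7) to (10).

```latex
\begin{align*}
F_X(x) &= \frac{\exp\{\theta H(x)\} - 1}{e^{\theta} - 1},
\qquad H(x) = \frac{1 - e^{-u}}{1 - \bar\alpha\, e^{-u}}, \\[4pt]
f_X(x) &= \frac{\theta\,\alpha\,\beta\,\lambda^{\beta} x^{\beta-1} e^{-u}}
               {(e^{\theta} - 1)\,(1 - \bar\alpha\, e^{-u})^{2}}
          \exp\left\{ \frac{\theta\,(1 - e^{-u})}{1 - \bar\alpha\, e^{-u}} \right\}, \\[4pt]
F_Y(y) &= 1 - \frac{\exp\{\theta \bar H(y)\} - 1}{e^{\theta} - 1},
\qquad \bar H(y) = \frac{\alpha\, e^{-u}}{1 - \bar\alpha\, e^{-u}}, \\[4pt]
f_Y(y) &= \frac{\theta\,\alpha\,\beta\,\lambda^{\beta} y^{\beta-1} e^{-u}}
               {(e^{\theta} - 1)\,(1 - \bar\alpha\, e^{-u})^{2}}
          \exp\left\{ \frac{\theta\,\alpha\, e^{-u}}{1 - \bar\alpha\, e^{-u}} \right\}.
\end{align*}
```

Both densities use the MO-Weibull density $h(x) = \alpha\,\beta\,\lambda^{\beta} x^{\beta-1} e^{-u} / (1 - \bar\alpha\, e^{-u})^{2}$, obtained by differentiating H(x).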
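Second, consider the geometric distribution in (4). The cdfs of the Marshall-Olkin Geometric-G (MOG-G) classes, Equations (11) and (12), follow from Equations (5) and (6), respectively. The Burr XII (BXII) cdf is (for x > 0) $G(x) = 1 - (1 + x^{\beta})^{-\lambda}$, Equation (13), where β > 0 and λ > 0 are shape parameters. For λ = 1 and β = 1 in Equation (13), we have the log-logistic (LL) and Lomax distributions, respectively. Hence, the cdf and survival function of the Marshall-Olkin Burr XII (MOBXII) distribution are $H(x) = [1 - (1 + x^{\beta})^{-\lambda}] / [1 - \bar\alpha\, (1 + x^{\beta})^{-\lambda}]$ and $\bar H(x) = \alpha\, (1 + x^{\beta})^{-\lambda} / [1 - \bar\alpha\, (1 + x^{\beta})^{-\lambda}]$, respectively.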
By inserting the last two formulae in Equations (11) and (12) and differentiating the resulting expressions with respect to x and y, respectively, we obtain the MOG-Burr XII (MOGBXII) densities. For the MOTPW and MOGBXII distributions of the maximum X, given in (9) and (14), some plots of the densities and cumulative functions are displayed in Figures 1 and 2, respectively. The variety of density shapes indicates more flexibility than the parent distributions.
We note increasing, decreasing, and unimodal shapes for the hrf of the MOTPW distribution in Figure 3. We also see an hrf that is increasing, then decreasing, and then increasing again.
We now obtain the density of Y when α > 1. By changing the denominator in Equation (20), applying expansion (17) (with k = 0, 1, ...) and using the binomial theorem, we can rewrite F_Y(y) as a linear combination, and the corresponding density is expressed in terms of π_{i+k}(y), the exp-G density with power parameter i + k ≥ 1.
Equations (18), (19), (21) and (22) are the main results of this section. These linear representations have great utility for deriving structural properties of the maximum X and minimum Y from well-known exp-G properties. More than thirty five exp-G models have been studied so far, so it is possible to construct at least three hundred fifty (70 × 5) MOPS-G models with properties determined from those exp-G properties. In statistical platforms, truncating the sums at about ten terms is enough to obtain precise results.
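The building block of these representations is the exp-G density. As a minimal sketch, assuming the usual definition in which the exp-G distribution with power a > 0 has cdf G(x)^a and density π_a(x) = a g(x) G(x)^{a−1}, the code below checks numerically that π_a integrates to one and reproduces G^a for a Weibull baseline. The helper name dexpG and the parameter values are illustrative.

```r
# Exp-G density with power a: pi_a(x) = a * g(x) * G(x)^(a - 1),
# where g and G are the baseline density and cdf passed as functions
dexpG <- function(x, a, g, G, ...) a * g(x, ...) * G(x, ...)^(a - 1)

# Weibull baseline G(x) = 1 - exp(-(lambda * x)^beta); R's scale is 1 / lambda
lambda <- 2; beta <- 1.33; a <- 3

total <- integrate(dexpG, 0, Inf, a = a, g = dweibull, G = pweibull,
                   shape = beta, scale = 1 / lambda)$value
partial <- integrate(dexpG, 0, 0.5, a = a, g = dweibull, G = pweibull,
                     shape = beta, scale = 1 / lambda)$value

c(total_mass = total,
  integral_to_0.5 = partial,
  cdf_power = pweibull(0.5, shape = beta, scale = 1 / lambda)^a)
```

Because any baseline density and cdf can be passed as g and G, the same helper applies to other exp-G models.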

NUMERICAL EVALUATION
In order to evaluate the analytical results presented in the previous sections, a package was implemented using the R programming language (R Core Team 2022). The MarshallOlkinPSG package was constructed in a generic way, that is, its most important functions can be applied to any baseline G distribution and even allow the user to specify a zero-truncated PS distribution.
The library code can be obtained from GitHub at https://github.com/prdm0/MarshallOlkinPSG. The library's website (see https://prdm0.github.io/MarshallOlkinPSG) gives more information on the implemented functions through the documentation and usage examples.
To install the package hosted and maintained on GitHub, the remotes library must be installed first. With this prerequisite met, the package MarshallOlkinPSG can be installed as:

    # Install the remotes package:
    # install.packages("remotes")
    remotes::install_github("prdm0/MarshallOlkinPSG", force = TRUE)

The function eq_19() implements Equation (19), which can be compared, for example, with the exact MOTPW density in Equation (9). To facilitate the comparison, the function pdf_theorical() implements this density function. Calling help(eq_19) gives access to an example comparing the two equations. Note that Equation (19) approximates (9) very well when finite sums are taken in applied problems. In other words, the results from the function eq_19() approximate very well those from pdf_theorical(). The function eq_19() also accepts any baseline cdf G(x) as an argument.
The function eval_plot_moptw() allows Equation (19) to be validated numerically by means of plots. The true parameters for the MOTPW density are: α = 1.20, θ = 1.50, β = 1.33, and λ = 2. In addition, only a few terms in the sums are required to obtain a reasonable level of precision, as shown in the plots in Figure 5, where six or eight terms provide very accurate approximations.

PROPERTIES
We now provide some mathematical properties of T_s that can be easily utilized in the linear representations of the previous section to find the corresponding properties of X and Y.
The nth ordinary moment of T_s, Equation (23), is expressed in terms of Q_G(u; τ) = G^{-1}(u; τ), the qf of G.
Explicit expressions for several exp-G moments can be determined from (23).
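A quantile-based moment formula of this kind can be checked numerically. The sketch below assumes that T_s denotes the exp-G random variable with power s, that is, with cdf G^s; the substitution u = G(t) then gives E(T_s^n) = s ∫_0^1 Q_G(u)^n u^{s−1} du, which may differ in presentation from Equation (23). The Weibull baseline and the values of s and n are illustrative.

```r
lambda <- 2; beta <- 1.33   # Weibull baseline, G(x) = 1 - exp(-(lambda * x)^beta)
s <- 3; n <- 2              # illustrative power parameter and moment order

# Quantile-based form: s * integral over (0, 1) of Q_G(u)^n * u^(s - 1)
mom_qf <- integrate(function(u) {
  s * qweibull(u, shape = beta, scale = 1 / lambda)^n * u^(s - 1)
}, 0, 1)$value

# Density-based form: integral over (0, Inf) of t^n * s * g(t) * G(t)^(s - 1)
mom_pdf <- integrate(function(t) {
  t^n * s * dweibull(t, shape = beta, scale = 1 / lambda) *
    pweibull(t, shape = beta, scale = 1 / lambda)^(s - 1)
}, 0, Inf)$value

c(quantile_form = mom_qf, density_form = mom_pdf)
```

The two values should agree to within the tolerance of the numerical integration. The quantile form is convenient whenever Q_G is available in closed form.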
The nth incomplete moment of T_s, Equation (24), follows from the same algebra, and the integral involved can be calculated for the great majority of G distributions. The first incomplete moment m_1(y) is the most important case of (24), since it is used to find mean deviations and Lorenz and Bonferroni curves.
The moment generating function (mgf) of T_s is given by Equation (25), from which the mgfs of exp-G distributions can be determined.
An Acad Bras Cienc (2022) 94(2) e20201972

The log-likelihood function for ψ from a random sample x_1, ..., x_n of X is $\ell(\psi) = \sum_{i=1}^{n} \log f_X(x_i; \psi)$, which is written out in Equation (27). A similar development can be conducted for the random variable Y defined from Equation (6) for any baseline G.
We can find the MLE ψ̂ by maximizing Equation (27) using the MaxBFGS sub-routine (Ox program), the optim function (R), or PROC NLMIXED (SAS). The AdequacyModel package can also maximize (27), either by the PSO (particle swarm optimization) approach, which does not require initial values, or by the quasi-Newton BFGS, Nelder-Mead and simulated-annealing methods. Details are available in Marinho et al. (2019) and at https://github.com/prdm0/AdequacyModel. These scripts can be executed for a wide range of initial values and may lead to more than one maximum. In these cases, we take the MLEs corresponding to the largest value of the maximized log-likelihood. There are sufficient conditions for the existence of these estimates, such as compactness of the parameter space and concavity of the log-likelihood function, but the estimates can exist even when these conditions are not satisfied. In general, there is no explicit solution for the estimates from maximizing (27), but we can establish theoretical conditions on their existence and uniqueness for very special models by examining the ranges of the score components.
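As a minimal sketch of the optim route, the code below fits the MOTPW distribution of the maximum X to simulated data. It assumes the zero-truncated Poisson pmf θ^n/[n!(e^θ − 1)], so that F(x) = (e^{θH(x)} − 1)/(e^θ − 1) with H the MO-Weibull cdf. The function names dmotpw_log, qmotpw and nll, the log-scale parametrization, the starting values and the choice of Nelder-Mead are illustrative and are not the authors' script.

```r
# Log-density of the MOTPW distribution (maximum case) under the assumed cdf
dmotpw_log <- function(x, alpha, theta, lambda, beta) {
  G <- pweibull(x, shape = beta, scale = 1 / lambda)
  D <- alpha + (1 - alpha) * G          # equals 1 - (1 - alpha) * (1 - G)
  log(theta) + log(alpha) +
    dweibull(x, shape = beta, scale = 1 / lambda, log = TRUE) -
    2 * log(D) + theta * G / D - log(expm1(theta))
}

# Quantile function from inverting F, used here only to simulate data
qmotpw <- function(u, alpha, theta, lambda, beta) {
  p <- log1p(u * expm1(theta)) / theta  # value of H at the quantile
  G <- alpha * p / (1 - (1 - alpha) * p)
  qweibull(G, shape = beta, scale = 1 / lambda)
}

set.seed(123)
x <- qmotpw(runif(300), alpha = 1.2, theta = 1.5, lambda = 2, beta = 1.33)

# Negative log-likelihood on the log scale, so every parameter stays positive;
# non-finite values are replaced by a large penalty
nll <- function(par) {
  p <- exp(par)
  val <- -sum(dmotpw_log(x, p[1], p[2], p[3], p[4]))
  if (is.finite(val)) val else 1e10
}

start <- rep(0, 4)                      # alpha = theta = lambda = beta = 1
fit <- optim(start, nll, method = "Nelder-Mead", control = list(maxit = 2000))
setNames(exp(fit$par), c("alpha", "theta", "lambda", "beta"))
```

Since the log-likelihood may have more than one local maximum, it is prudent to rerun the fit from several starting points and keep the solution with the smallest fit$value.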
In a similar manner, we can construct many other regressions based on other MOPS-G distributions defined from Equations ( 5) and ( 6).
The log-likelihood function for the vector ψ = (α, θ, η_1^T, η_2^T)^T from the MOTPW regression is given by Equation (29). We obtain the MOTPW distribution for λ_i = λ and β_i = β.
Let ψ̂ be the MLE of ψ. Equation (29) can also be maximized using the gamlss regression framework (Stasinopoulos & Rigby 2008) in R.

GAUSS M. CORDEIRO et al.
A COMPETITIVE FAMILY TO THE BETA AND KUMARASWAMY G

TWO SIMULATION STUDIES
We perform two simulation studies. The first one examines the accuracy of the MLEs of the parameters of the MOTPW distribution. The second one does the same for the MOTPW regression.

The MOTPW distribution
First, we evaluate the precision of the estimates in the MOTPW distribution based on 1,000 Monte Carlo simulations using the R software. The simulation procedure is as follows: (i) obtain the inverse function Q(u) = F^{-1}(u) from (7); (ii) generate u ∼ U(0, 1) and obtain the values x = Q(u) of the MOTPW distribution.
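The two steps can be sketched as follows, assuming that (7) with a Weibull baseline takes the form F(x) = (e^{θH(x)} − 1)/(e^θ − 1), where H is the MO-Weibull cdf. Solving F(x) = u gives H = log{1 + u(e^θ − 1)}/θ, then G = αH/(1 − ᾱH), and finally x as the Weibull quantile of G. The function names pmotpw and qmotpw are hypothetical, and the parameter values are those quoted in Section Numerical evaluation.

```r
# cdf of the MOTPW distribution (maximum case) under the assumed form of (7)
pmotpw <- function(x, alpha, theta, lambda, beta) {
  G <- pweibull(x, shape = beta, scale = 1 / lambda)
  H <- G / (alpha + (1 - alpha) * G)
  expm1(theta * H) / expm1(theta)
}

# Step (i): closed-form inverse Q(u) of the cdf above
qmotpw <- function(u, alpha, theta, lambda, beta) {
  H <- log1p(u * expm1(theta)) / theta
  G <- alpha * H / (1 - (1 - alpha) * H)
  qweibull(G, shape = beta, scale = 1 / lambda)
}

# Step (ii): generate u ~ U(0, 1) and set x = Q(u)
set.seed(2022)
u <- runif(1000)
x <- qmotpw(u, alpha = 1.2, theta = 1.5, lambda = 2, beta = 1.33)

# Round-trip check: F(Q(u)) should return u up to floating-point error
max(abs(pmotpw(x, alpha = 1.2, theta = 1.5, lambda = 2, beta = 1.33) - u))
```

A closed-form quantile function avoids numerical root finding inside the Monte Carlo loop, which matters when one thousand samples are generated per scenario.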
For each scenario and value of n, one thousand samples are generated from the MOTPW regression, which is then fitted to each generated data set. The quantities reported in Table II are in good agreement with the asymptotic results for the MLEs.

Residual analysis
We investigate the quantile residuals (qrs) to verify the adequacy of the response distribution and to detect outliers in the MOTPW regression. The same approach can be adopted for many other regressions defined from the distributions in (5) and (6). The qrs (Dunn & Smyth 1996), given in Equation (31), are obtained by applying the inverse Φ^{-1}(·) of the standard normal cdf Φ(·) to the fitted MOTPW cdf evaluated at each response, with λ_i and β_i defined in Equation (28). We consider the same scenarios as in the simulations of Section Two Simulation Studies. For each fitted regression, the qrs are calculated from Equation (31). Figures 6, 7, 8, and 9 display QQ plots which show that the empirical distribution of these residuals is close to the standard normal distribution.
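The computation behind a quantile residual is short enough to sketch generically. In the code below, the helper qres, the covariate v, the log link for λ_i and its coefficients are all hypothetical, and a plain Weibull cdf evaluated at the true parameters stands in for the fitted MOTPW regression cdf, so the residuals are standard normal by construction.

```r
# Quantile residual: standard normal quantile of the cdf at each response
qres <- function(y, cdf, ...) qnorm(cdf(y, ...))

set.seed(10)
n <- 500
v <- runif(n)                         # hypothetical covariate
lambda_i <- exp(0.5 + 0.8 * v)        # hypothetical log link for lambda_i
y <- rweibull(n, shape = 1.33, scale = 1 / lambda_i)

# Weibull cdf used as a stand-in for the fitted regression cdf
r <- qres(y, pweibull, shape = 1.33, scale = 1 / lambda_i)
c(mean = mean(r), sd = sd(r))
```

When the response distribution is adequate, the residuals have mean near zero and standard deviation near one, and qqnorm(r) followed by qqline(r) gives the kind of QQ plot used to assess them.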

APPLICATIONS
The beta Weibull (BW) and Kumaraswamy Weibull (KwW) distributions have been widely used to fit real data in the last ten years or so. We compare the MOTPW distribution with the BW and KwW distributions since all of them have four parameters. The BW density was pioneered by Lee et al. (2007) and the KwW density was introduced by Cordeiro & de Castro (2011); in both densities, all parameters are positive.

Application 1: Hourly dollar wage data
The first application refers to hourly dollar wages for n = 534 US workers. These data are obtained from the SemiPar package (Wand et al. 2005). Table III lists the estimates, standard errors (SEs) in parentheses, and three classical statistics. The lowest values of these measures reveal that the MOTPW is the best model. Next, the likelihood ratio (LR) statistic for comparing the MOTPW and TPW models is 6.159 (p-value ≈ 0.013), which supports the wider distribution.
Figure 10a shows the histogram and the estimated MOTPW density. Figure 10b provides the empirical function and estimated MOTPW cdf, thus revealing that this distribution is appropriate for these data.

We note that the covariate d_i1 is significant and d_i2 is not. So, there is a real difference between the normal and chemical diabetes groups in relation to relative weight and no difference between the normal and overt diabetes groups in relation to relative weight. The same findings can be seen in Figure 12.
The LR statistic to compare the MOTPW and TPW regressions is w = 4.590 (p-value = 0.032), which indicates that the first regression is superior to the second for these data in terms of model fitting.
The plot of the residuals reported in Figure 11a reveals neither outliers nor departures from the general assumptions. The worm plot (Buuren & Fredriks 2001) of the residuals in Figure 11b and the QQ plot displayed in Figure 11c show the adequacy of the MOTPW regression for the current data.
A graphical comparison of the estimated cdfs in Figure 12 also supports the regression analysis.
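The p-values attached to the two LR statistics can be recomputed from a chi-squared reference distribution. The sketch below assumes one degree of freedom, on the grounds that the TPW model is the MOTPW model restricted to α = 1; the helper name lr_pvalue is illustrative.

```r
# Upper-tail chi-squared probability of an LR statistic w
lr_pvalue <- function(w, df = 1) pchisq(w, df = df, lower.tail = FALSE)

p_distribution <- lr_pvalue(6.159)   # MOTPW versus TPW distributions (wage data)
p_regression   <- lr_pvalue(4.590)   # MOTPW versus TPW regressions (diabetes data)
round(c(distribution = p_distribution, regression = p_regression), 3)
```

Both values fall below 0.05, in line with the conclusion that the extra parameter α improves the fit in each application.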

CONCLUSIONS
We define two flexible Marshall-Olkin-Power-Series (MOPS) families of continuous distributions which can be very useful to fit real data. They are obtained by combining the Marshall-Olkin class (Marshall & Olkin 1997) and the power series distribution. Hundreds of continuous distributions can be easily formulated from the two families. We discuss some special distributions and maximum likelihood estimation. We introduce the Marshall-Olkin Truncated Poisson Weibull regression associated with one of the families. Some mathematical properties of these families are presented. We provide a package implemented in R which can be used to determine numerically some mathematical properties of any distribution in the new families. The utility of the proposed models is demonstrated empirically in two applications.

Figure 3. Plots of the hrf of the MOTPW model.

By differentiating F_Y(y), we obtain the density of Y.

Figure 5. Numerical evaluation of (19) with finite sums, where N and K denote the upper limits of the related sums with running indices n and k, respectively.

Table I. Simulation results for the MOTPW distribution.

Table II. Simulation results for the MOTPW regression.

Table IV. Measures for diabetes data.

Table V. Results for diabetes data.