THE EM ALGORITHM FOR STANDARD STOCHASTIC FRONTIER MODELS

Andrade, Bernardo B. de; Souza, Geraldo S.

doi:10.1590/0101-7438.2019.039.03.0361

ABSTRACT

The Expectation-Maximization (EM) algorithm is developed for the stochastic frontier models most used in practice with cross-section data. The resulting algorithms can be easily programmed into a computer and are shown to be worthy alternatives to general-purpose optimization routines currently used. The algorithms for the half normal and the exponential models have closed-form expressions whereas those for the truncated normal and gamma models will require the numerical solution of a nonlinear equation. Implementations of the EM algorithm either as a stand-alone routine or in accelerated form and also combined with Newton-like methods are discussed. We provide illustrations, along with R tools, for cost and production frontiers.

Keywords:
efficiency; EM acceleration; gamma; maximum likelihood

1 INTRODUCTION

In this paper we consider the Expectation-Maximization (EM) algorithm (^{Dempster et al., 1977}6 DEMPSTER AP, LAIRD NM & RUBIN DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1-38.; ^{Wu, 1983}23 WU CFJ. 1983. On the Convergence Properties of the EM Algorithm. Ann. Statist., 11(1): 95-103.) for doing stochastic frontier (SF) analysis, a technique for measuring economic efficiency. In statistical terminology, an SF model is a linear mixed-effects model, making it natural to consider the EM algorithm for maximum likelihood (ML) estimation.

The regression part of SF models represents the production frontier of the i-th firm: the response y is (possibly) some transformation of measured output and x is a vector of (possibly transformed) inputs. The relation between input and outputs is stochastic: the error ν is the conventional Gaussian noise, $ν_{i} ~ N (0, σ_{ν}^{2})$ , representing stochastic elements outside the control of the firm. Each firm possesses an inefficiency term $u_{i} \geq 0$ which makes the observed output smaller than its (stochastic) potential. Thus,

y_{i} = x_{i}^{'} β + ν_{i} - u_{i} .

(1)

We assume cross-sectional data for which the νi’s and the ui’s are independent of each other and across observations.

In most applications, u is assumed to be either half-normally or exponentially distributed, $u_{i} ~ N^{+} (0, σ_{u}^{2})$ or $u_{i} ~ E x p (λ)$ , respectively. These canonical specifications for u have been generalized by letting $u_{i} ~ N^{+} (μ, σ_{u}^{2})$ (truncated-normal) and $u_{i} ~ G (α, λ)$ (Gamma). These standard models are detailed in the comprehensive work of ^{Kumbhakar & Lovell (2003}16 KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis. Cambridge University Press.).

The estimation method of choice for SF analysis is ML but variations of least squares, nonparametric and Bayesian estimation have also been developed and are surveyed by ^{Greene (2008}10 GREENE WH. 2008. The Econometric Approach to Efficiency Analysis. In: FRIED HO, LOVELL CAK & SCHMIDT SS (Eds.), The Measurement of Productive Efficiency and Productivity Growth. chap. 2. Oxford University Press.). SF models were preceded by (non-stochastic) frontier models whose estimates were computed by linear and quadratic programming (^{Aigner & Chu, 1968}1 AIGNER D & CHU DS. 1968. On estimating the industry production function. American Economic Review, 58: 826-839.). Current software for SF analysis inspected by the authors (^{Bogetoft & Otto, 2015}4 BOGETOFT P & OTTO L. 2015. Benchmarking with DEA and SFA. R package version 0.26.; ^{Coelli & Henningsen, 2013}5 COELLI T & HENNINGSEN A. 2013. frontier: Stochastic Frontier Analysis. Available at: http://CRAN.R-Project.org/package=frontier. R package version 1.1-0.
http://CRAN.R-Project.org/package=fronti... ; ^{SAS Institute Inc., 2003}20 SAS INSTITUTE INC. 2003. proc qlim. SAS/STAT Software , Version 9.1. Cary, NC. Available at: http://www.sas.com/.
http://www.sas.com/... ; ^{StataCorp., 2015}21 STATACORP. 2015. frontier. Stata Statistical Software : Release 14. College Station, TX. Available at: http://www.stata.com/.
http://www.stata.com/... ; ^{Greene, 2012}11 GREENE WH. 2012. LIMDEP 10. Available at: http://www.limdep.com/. Econometric Software, Inc.
http://www.limdep.com/... ) has been written around Newton-like routines based on either user-supplied derivatives or numerical approximations. Strictly speaking, the ML problem of SF analysis is a constrained optimization problem since there are parameters restricted to the positive axis. With luck and good starting values this may be of no practical effect in a given application but the usual practice with Newton-like methods is to reparameterize the model and retrieve standard errors by means of the delta method. EM updates will keep the iterates for the positive parameters positive and in our experience with SF estimation EM seems to work with more liberal starting values.

The use of EM in SF estimation is not new. Instead of using Newton, ^{Greene (1982}8 GREENE WH. 1982. Maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 18(2): 285-289.) devised a simple iterative scheme by manipulating the likelihood equations of the half normal SF model. This method was later improved by ^{Lee (1983}17 LEE LF. 1983. On maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 23(2): 269-274.) who showed that Greene’s scheme did not necessarily solve the likelihood equations and, in addition, did not restrict variance estimates to be positive. Lee devised a new scheme based on solving a set of rewritten equations which made explicit use of the conditional expectations $E (u | ν + u)$ and $E (u^{2} | ν + u)$ . The new scheme actually solved the likelihood equations and preserved positivity of variance estimates. Lee did not mention that his scheme was in fact the EM algorithm for the half normal SF model. ^{Huang (1984}12 HUANG CJ. 1984. Estimation of Stochastic Frontier Production Function and Technical Inefficiency via the EM Algorithm. Southern Economic Journal, 50(3): 847-856.) devised the same algorithm, this time explicitly invoking EM, apparently independently of Lee. In Section 3.1 we will revisit (with different notation) the EM algorithm devised by ^{Lee (1983)}17 LEE LF. 1983. On maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 23(2): 269-274. and ^{Huang (1984)}12 HUANG CJ. 1984. Estimation of Stochastic Frontier Production Function and Technical Inefficiency via the EM Algorithm. Southern Economic Journal, 50(3): 847-856. for the half normal model. In later sections we develop the EM algorithm for the other three standard models.

EM does not directly yield standard errors of estimates. Lee was not explicit about calculation of standard errors and Huang used least-squares errors for the regression coefficients. We will use more general methods (Section 3.5). Two illustrations are given in Section 4: estimation of a production frontier with the half normal model using data from the Brazilian Census of Agriculture and estimation of a cost function with exponential and gamma inefficiencies. Concluding remarks and some practical considerations are given in Section 5.

2 STOCHASTIC FRONTIER MODELS IN EM FORM

The SF model (1) can be written as a hierarchy in terms of the total error $ε = y - x^{'} β$ and the inefficiencies u,

ε_{i} | u_{i} ~ N (- u_{i}, σ_{ν}^{2}), u_{i} ~ f_{u},

where fu denotes the density for the inefficiency term. At times, we will make the dependence of ε on β explicit by writing ε(β). The resulting joint density is

f_{θ} (u, ε) = f_{u} (u) \frac{1}{σ_{ν} \sqrt{2 π}} \exp [- \frac{{(ε (β) + u)}^{2}}{2 σ_{ν}^{2}}],

where the vector θ comprises β, $σ_{ν}^{2}$ and any parameters from fu.

The loglikelihood of a set of n independent observations is thus given by

l (θ) = \sum_{i = 1}^{n} \log \int_{0}^{\infty} f_{θ} (u_{i}, ε_{i}) d u_{i},

(2)

In order to maximize the likelihood, instead of directly using (2), the EM algorithm uses the complete-data likelihood (^{Dempster et al., 1977}6 DEMPSTER AP, LAIRD NM & RUBIN DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1-38.; ^{Wu, 1983}23 WU CFJ. 1983. On the Convergence Properties of the EM Algorithm. Ann. Statist., 11(1): 95-103.),

l_{c} (θ) = \sum_{i = 1}^{n} \log f_{θ} (u_{i}, ε_{i}) .

(3)

The iterations of the algorithm result from two steps: Let θ _t be the estimate at iteration t of the algorithm and let $E_{t} (\cdot) \equiv E_{θ_{t}} (\cdot)$ . Then, the E-step is the evaluation of

E_{t} (l_{c} (θ) | ε) = \int f_{θ} (u, ε), f_{θ_{t}} (u | ε) d u,

(4)

and the M-step is the maximization, with respect to θ, of the function

Q (θ | θ_{t}) = E_{t} (l_{c} (θ) | ε) .

(5)

It has been shown that the sequence generated by iteratively solving (4) and (5) converges to a stationary point of ℓ. Furthermore, under regularity conditions (^{Wu, 1983}23 WU CFJ. 1983. On the Convergence Properties of the EM Algorithm. Ann. Statist., 11(1): 95-103.), the ascent property holds: upward movements in Q imply upward movements in ℓ,

l (θ_{t + 1}) - l (θ_{t}) \geq Q (θ_{t + 1} | θ_{t}) - Q (θ_{t} | θ_{t}) .

In order to present the steps of the EM algorithm in the context of SF models we start by denoting the first two conditional moments of (u|ε) by

M_{i} (θ) = E_{θ} (u_{i} | ε_{i}) and K_{i} (θ) = E_{θ} (u_{i}^{2} | ε_{i}) .

(6)

Making explicit the dependence of the total error on the coefficients, ε _i (β), we define

R i (θ_{t}, β) = E_{t} [(u_{i} + ε_{i} {(β))}^{2} | ε_{i}] = K_{i} (θ_{t}) + 2 ε_{i} (β) M_{i} (θ_{t}) + ε_{i}^{2} (β) .

(7)

Note that it is crucial to distinguish β inside ε from β _t as part of θ _t .

Now let

{\bar{M}}_{t} = \frac{1}{n} \sum_{i} M_{i} (θ_{t}), {\bar{K}}_{t} = \frac{1}{n} \sum_{i} K_{i} (θ_{t}),

(8)

and

{\bar{R}}_{t} (β) = \frac{1}{n} \sum_{i} R_{i} (θ_{t}, β) .

(9)

Finally, we define the “working” response w,

w_{i} (θ_{t}) = y_{i} + M_{i} (θ_{t}) .

(10)

The above definitions will be used in the expressions for the Q function and its derivatives in the following sections. We close this section by noting that the integrals in the E-step can be carried out explicitly except in the gamma model (Section 3.4).

3 EM ALGORITHMS FOR THE STANDARD SF MODELS

The original models used in SF analysis were the half normal and exponential models, the former being the default option in most software for SF analysis. EM is developed for these two models in Sections 3.1 and 3.2 resulting in extremely simple algorithms with closed-form expressions and intuitive iterative schemes. The econometric literature on SF models has considered generalizations of those two models to allow for a nonzero mode for u leading to the truncated normal (Section 3.3) and gamma (Section 3.4) SF models. Another direction of generalization is to make some (or all) of the parameters in the distribution of the noise dependent on covariates. In any case, the resulting EM algorithm will not render explicit updates as with the simple half normal and exponential specifications.

3.1 Half Normal Model

We now consider implementation of the EM scheme with the half normal model, $u_{i} ~ N^{+} (0, σ_{u}^{2})$ , that is, for $u \geq 0$ ,

f_{u} (u) = \frac{1}{2 σ_{u}} φ (\frac{u}{σ_{u}}),

where φ denotes the standard normal density. The EM algorithm for the half normal model was introduced by ^{Huang (1984}12 HUANG CJ. 1984. Estimation of Stochastic Frontier Production Function and Technical Inefficiency via the EM Algorithm. Southern Economic Journal, 50(3): 847-856.) but ^{Lee (1983}17 LEE LF. 1983. On maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 23(2): 269-274.) arrived at the same algorithm from a different standpoint.

The Q function, after some simplifications using expressions (6)–(9), becomes (constant term omitted)

Q (θ | θ_{t}) = - n [\log (σ_{u} σ_{ν}) + \frac{1}{2 σ_{u}^{2}} {\bar{K}}_{t} + \frac{1}{2 σ_{ν}^{2}} {\bar{R}}_{t} (β)],

and thus,

\frac{\partial Q}{\partial β} = 0 \Leftrightarrow \sum_{i} x_{i} [M_{i} (θ_{t}) + ε_{i} (β)] = 0, \frac{\partial Q}{\partial σ_{ν}^{2}} = 0 \Leftrightarrow σ_{ν}^{2} = {\bar{R}}_{t} (β), \frac{\partial Q}{\partial σ_{u}^{2}} = 0 \Leftrightarrow σ_{u}^{2} = {\bar{K}}_{t} .

We finally obtain the iterative scheme in Algorithm A below with w given by (10).

Algorithm A:
EM for the half-normal SF model

Some remarks are due. All of the conditional expectations in the calculations above are available in closed form since the conditional distribution of u given ε is truncated normal (^{Kumbhakar & Lovell, 2003}16 KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis. Cambridge University Press., Chapter 3),

(u_{i} | ε_{i}) ~ N^{+} ({\tilde{μ}}_{i}, b^{2}) w i t h {\tilde{μ}}_{i} = \frac{- ε_{i} σ_{u}^{2}}{σ_{u}^{2} + σ_{ν}^{2}}, \frac{σ_{ν}^{2} σ_{u}^{2}}{σ_{u}^{2} + σ_{ν}^{2}} .

Therefore, the conditional moments in (6) are given by

M_{i} (θ) = {\tilde{μ}}_{i} + b δ_{i}

and

K_{i} (θ) = M_{i}^{2} (θ) + b^{2} [1 - δ_{i} (δ_{i} + {\tilde{μ}}_{i} / b)]

where $δ_{i} = \frac{φ (- {\tilde{μ}}_{i} / b)}{1 - Φ (- {\tilde{μ}}_{i} / b)}$ and Φ is the cumulative distribution function of the standard normal density φ.

Note that Algorithm A has remarkably simple and intuitive updates. The regression coefficients are updated in A1 by OLS with the working response (10), which is a version of $y + E (u | ε)$ , a sort of “correction” for the presence of u in the overall error $ε = ν - u$ . In A2 the noise variance is updated as a mean square based on the squared “residuals” ${(u + ε)}^{2}$ given by (7). The inefficiency variance is updated last by the mean of the Ki’s. Since Ki is the conditional mean of $u_{i}^{2}$ and its model mean is zero, the variance update A3 is simply a version of Var(u|ε).

3.2 Exponential Model

We now extend the EM algorithm to the other models starting with the exponential SF model. The notation developed in Section 2 and used in Section 3.1 remains useful. The inefficiencies are assumed to have an exponential density with mean 1/λ.

Similar to Section 3.1, using definitions (6)–(9), the Q function for the exponential model is given by

Q (θ | θ_{t}) = n [\log (\frac{λ}{σ_{ν}}) - λ {\bar{M}}_{t} - \frac{1}{2 σ_{ν}^{2}} {\bar{R}}_{t} (β)] .

As before, the EM iterations are obtained by setting $\frac{\partial Q}{\partial θ} = 0$ , resulting in Algorithm B below.

Algorithm B:
EM for the exponential SF model

Remarks similar to those for the half normal model can be made. Again, all expectations involved are known analytically since the distribution of (u|ε) is also truncated normal (^{Kumbhakar & Lovell, 2003}16 KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis. Cambridge University Press., Chapter 3),

(u_{i} | ε_{i}) ~ N^{+} ({\tilde{μ}}_{i}, σ_{ν}^{2}), {\tilde{μ}}_{i} = - (ε_{i} + λ σ_{ν}^{2}) .

Therefore, the conditional moments in (6) are given by

M_{i} (θ) = {\tilde{μ}}_{i} + σ_{ν} δ_{i}

and

K_{i} (θ) = M_{i}^{2} (θ) + σ_{ν}^{2} [1 - δ_{i} (δ_{i} + {\tilde{μ}}_{i} / σ_{ν})]

where $δ_{i} = \frac{φ (- {\tilde{μ}}_{i} / σ_{ν})}{1 - Φ (- {\tilde{μ}}_{i} / σ_{ν})} .$

The updates in Algorithm B mirror those of the half normal case. B1 and B2 are analogous to A1 and A2 (allowing, of course, for different expressions for Mi and Ki for each model). In B3, λ is updated by the inverse of the estimated mean of the inefficiencies, $M^{- 1}$ , which is the conditional version of E−1(u|ε).

3.3 Truncated Normal Model

We now present the EM scheme with the truncated normal model, $u_{i} ~ N^{+} (µ, σ_{u}^{2})$ , that is, for $u \geq 0$ ,

f_{u} (u) = \frac{\frac{1}{σ_{u}} φ (\frac{u - μ}{σ_{μ}})}{Φ (\frac{μ}{σ_{μ}})} .

The Q function is, after some simplifications using (6)–(9),

Q (θ | θ_{t}) = - n [\log (σ_{u} σ_{ν}) + \log (Φ (\frac{- μ}{σ_{u}})) + \frac{1}{2 σ_{u}^{2}} (K_{t} - 2 μ M_{t} + \frac{μ^{2}}{n}) + \frac{1}{2 σ_{ν}^{2}} R_{t} (β)] .

After setting ∂Q/∂ θ to zero, we arrive at the iterative scheme in Algorithm C below, using, once again, definitions (6)–(10).

Algorithm C:
EM for the truncated normal SF model

As before, all conditional moments involved in Algorithm C have simple closed forms since (^{Kumbhakar & Lovell, 2003}16 KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis. Cambridge University Press., Chapter 3),

(u_{i} | ε_{i}) ~ N^{+} ({\tilde{μ}}_{i}, b^{2}), {\tilde{μ}}_{i} = \frac{μ σ_{ν}^{2} - ε_{i} σ_{u}^{2}}{σ_{u}^{2} + σ_{ν}^{2}}, b^{2} = \frac{σ_{ν}^{2} σ_{u}^{2}}{σ_{u}^{2} + σ_{ν}^{2}},

which leads to

M_{i} (θ) = {\tilde{μ}}_{i} + b δ_{i},

and

K_{i} (θ) = M_{i}^{2} (θ) + b^{2} [1 - δ_{i} (δ_{i} + {\tilde{μ}}_{i} / b)],

where $δ_{i} = \frac{φ ({\tilde{μ}}_{i} / b)}{1 - Φ (- {\tilde{μ}}_{i} / b)} .$

The updates C1 and C2 are analogous to steps A1 and A2 in Algorithm A. Iteration C3 does not give an explicit update for µ since it requires the solution of a nonlinear equation with an upper bound which guarantees positivity in C4. The inefficiency variance is updated last by a term of the form $K - μ M$ which is a version of $E (u^{2} | ε) - E^{2} (u | ε)$ with µM in the place of M ².

3.4 Gamma Model

In the gamma model the inefficiencies are assumed to have a gamma density with mean α/λ. We will adopt the same definitions used so far in expressions (6)–(10) and in addition we define

L_{i} (θ) = E_{θ} (\log (u_{i}) | ε_{i})

(11)

and

L_{t} = \frac{1}{n} \sum_{i} L_{i} (θ_{t}) .

(12)

The Q function is thus (constant term omitted)

Q (θ | θ_{t}) = n [\log (\frac{λ^{α}}{Γ (α) σ_{ν}}) + (α - 1) L_{t} - λ M_{t} - \frac{1}{2 σ_{ν}^{2}} {\bar{R}}_{t} (β)] .

As opposed to the previous models, the conditional moments involved in the Q function for the gamma SF model do not have simple closed-form expressions. The conditional density f (u|ε) has an intractable normalizing constant (^{Kumbhakar & Lovell, 2003}16 KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis. Cambridge University Press., Chapter 3) which affects not only EM calculations but also the loglikelihood and its derivatives. More specifically, recalling (6), we have for the gamma model,

M_{i} (θ) = \frac{h_{i} (α)}{h_{i} (α - 1)} and K_{i} (θ) = \frac{h_{i} (α + 1)}{h_{i} (α - 1)}

where

h_{i} (r) = \int_{0}^{\infty} u^{r} φ_{i} (u) d u,

(13)

and φ_i (·) is the density function of a normal random variable with mean $- (ε_{i} + λ σ_{ν}^{2})$ and variance $σ_{ν}^{2}$ .

Evaluation of (13) was initially addressed by simulation-based methods (^{Greene, 2003}9 GREENE WH. 2003. Simulated Likelihood Estimation of the Normal-Gamma Stochastic Frontier Function. Journal of Productivity Analysis, 19(2): 179-190.; ^{Kozumi & Zhang, 2005}15 KOZUMI H & ZHANG X. 2005. Bayesian and non-bayesian analysis of gamma stochastic frontier models by Markov Chain Monte Carlo methods. Computational Statistics, 20(4): 575-593.) but, recently, we showed that quadrature and Fourier methods can be used with higher accuracy (^{Andrade & Souza, 2018}2 ANDRADE BB & SOUZA GS. 2018. Likelihood computation in the normal-gamma stochastic frontier model. Computational Statistics, 33(2): 967-982.) and this is adopted in the implementations of Section Similarly, $L_{i} = \int_{0}^{\infty} \log (u) φ_{i} (u) d u$ may be computed via numerical quadrature.

Setting ∂Q/∂θ to zero yields Algorithm D below where we make use of the digamma function $ψ (a) = Γ^{'} (a) / Γ (a)$ .

Algorithm D:
EM for the gamma SF model

Similar to the truncated normal model, Algorithm D requires the solution of a nonlinear equation in step D3. All other updates are straightforward. We also note that Algorithms A–D have analogous updates for β and $σ_{ν}^{2}$ . The differences occur only in the updates of the parameters related to the distribution of u, steps 3 and 4 in Algorithms A–D.

3.5 Standard Errors and Hybrid EM

EM does not directly yield standard errors. In addition, EM can be extremely slow to converge. Acceleration schemes have been proposed alongside ways of estimating standard errors (^{Jamshidian & Jennrich, 1997}14 JAMSHIDIAN M & JENNRICH RI. 1997. Acceleration of the EM Algorithm by Using Quasi-Newton Methods. Journal of the Royal Statistical Society. Series B, 59(3): 569- 587.). We will call these methods hybrid EM since in order to obtain standard errors, the likelihood (not necessary for the updates) must be provided.

In the illustrations that follow, we have used different EM implementations:

I. Straightforward implementation of the EM updates (Algorithms A, B and C) with the variance of ML estimates, $\hat{θ}$ , obtained with the outer product of gradients (OPG),

\hat{Var} (\hat{θ}) = {({\hat{G}}^{'} \hat{G})}^{- 1},

(14)

where $\hat{G}$ is the $n \times p$ matrix of derivatives of the loglikelihood, ${\hat{G}}_{i j} = [\partial l_{i} / \partial θ_{j}] |_{\hat{θ}}$ . Here, derivatives were computed via numerical differentiation of the loglikelihood (2). Explicit computation of $\partial l_{i} / \partial θ_{j}$ is possible for the half-normal, exponential and truncated normal SF models and may also be used to obtain $\hat{G}$ instead of numerical differentiation.
II. Hybridization schemes:
(II.1) Simple follow-up of EM by any Newton-like method: Running EM in the beginning then switching to a quasi-Newton method is a well established strategy (^{McLachlan & Krishnan, 2007}18 MCLACHLAN G & KRISHNAN T. 2007. The EM Algorithm and Extensions. Wiley.). The idea is to take advantage of the global convergence properties of EM at first and then bypass EM’s slow convergence by switching to a Newton-like method such as DFP or BFGS, which features fast, superlinear, local convergence (^{Dennis & Moré, 1974}7 DENNIS JE & MORÉ JJ. 1974. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp., 28: 549-560.). Such a hybrid algorithm is possible as long as the the log-likelihood (2) is available (at least numerically). In addition, regardless of whether the chosen Newton method uses user-supplied derivatives or some built-in numerical derivative routine, a Hessian matrix evaluated at the optimum, $\hat{H}$ , will be generated at the final iteration which is then used for variance estimation,

\hat{V a r} (\hat{θ}) = - {\hat{H}}^{- 1} .

(15)

(II.2) Squared extrapolation method: Next we consider a hybridization scheme known as SQUAREM (^{Varadhan & Roland, 2008}22 VARADHAN R & ROLAND C. 2008. Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm. Scandinavian Journal of Statistics, 35(2): 335-353.). It belongs to the non-monotone (partially monotone) class of acceleration schemes (^{Varadhan & Roland, 2008}22 VARADHAN R & ROLAND C. 2008. Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm. Scandinavian Journal of Statistics, 35(2): 335-353.; ^{Zhou et al., 2011}24 ZHOU H, ALEXANDER D & LANGE K. 2011. A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2): 261-273.) and along with the more elaborate scheme of ^{Zhou et al. (2011)}24 ZHOU H, ALEXANDER D & LANGE K. 2011. A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2): 261-273. is considered the gold standard among acceleration schemes (^{Zhou et al., 2011}24 ZHOU H, ALEXANDER D & LANGE K. 2011. A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2): 261-273.). The general form of this accelerated EM algorithm is given is pseudo code form below. Details about the calculation of the steplength are given in Varadhan and Roland. In the R implementation we used (^{Bobb & Varadhan, 2018}3 BOBB JF & VARADHAN R. 2018. turboEM: A Suite of Convergence Acceleration Schemes for EM, MM and Other Fixed-Point Algorithms. Available at: https://CRAN. R-project.org/package=turboEM. R package version 2018.1.
https://CRAN. R-project.org/package=turb... ), standard errors are obtained by a numerically computed hessian if the loglikelihood is provided.

Algorithm SQUAREM:
Pseudo code (^{Varadhan & Roland, 2008}22 VARADHAN R & ROLAND C. 2008. Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm. Scandinavian Journal of Statistics, 35(2): 335-353.)

(II.3) Quasi-newton acceleration of EM: the method developed by ^{Zhou et al. (2011}24 ZHOU H, ALEXANDER D & LANGE K. 2011. A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2): 261-273.) surpasses previous quasi-Newton acceleration methods in that it does not handle the observed information matrix or the Hessian of the algorithm map (EMupdate), a property shared with the above squared extrapolation method. The general form of the update is given by

θ_{t + 1} = EMupdate (θ_{t}) - ∆ [θ_{t} - EMupdate (θ_{t})] .

The matrix Δ is related to a low-rank approximation of the differential of the EM update. Its construction is detailed in ^{Zhou et al. (2011}24 ZHOU H, ALEXANDER D & LANGE K. 2011. A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2): 261-273.).
In both SQUAREM and quasi-Newton acceleration the Hessian of the loglikelihood evaluated at the estimate is obtained numerically thus providing standard errors, according to (15). Note that the likelihood is not needed in the updates, just for Hessian computation.

4 ILLUSTRATIONS

In the following illustrations we have monitored the relative change in estimates,

RC = \frac{||θ_{t} - θ_{t - 1}||}{ζ + ||θ_{t - 1}||},

with stopping rule $RC < ζ$ . We have set $ζ = 10^{- 6}$ . The hybrid scheme (II.1) runs EM until RC reaches a less stringent level (say $ζ = 10^{- 3}$ ), followed by a built-in BFGS routine (which computes the necessary derivatives of the loglikelihood numerically).

Starting values for all schemes are obtained by adapting ordinary least squares (OLS) and the results for half normal and exponential models were checked against those provided by Stata14’s rfrontier ^{StataCorp. (2015)}21 STATACORP. 2015. frontier. Stata Statistical Software : Release 14. College Station, TX. Available at: http://www.stata.com/.
http://www.stata.com/... which uses a Newton routine.

Versioned code in R language for the applications below are available from the authors’ website as illustrated in the ^Appendix APPENDIX: R TOOLS .

4.1 Agricultural Production Frontier

As a first illustration, we use data from the Brazilian Census of Agriculture (2006) (^{IBGE, 2006}13 IBGE. 2006. Census of Agriculture 2006. Available at: Available at: http://www.ibge.gov.br/english . Accessed: 2016-04-18.
http://www.ibge.gov.br/english... ) including 219 municipalities in the Brazilian Midwest region to fit the half normal and exponential models using EM. The specific technology considered is Cobb-Douglas,

\log (Y_{i}) = β_{0} + β_{1} \log (K_{i}) + β_{2} \log (L_{i}) + ν_{i} - u_{i},

where Y is gross output of farms, ranches and agricultural enterprises in the municipality and K and L are aggregate measures of capital and labor, respectively.

For the half normal model, a combination of method of moments and OLS, known as modified OLS (MOLS), is available (with closed-form expressions) (^{Kumbhakar & Lovell, 2003}16 KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis. Cambridge University Press.) and the parameter estimates could be used as (excellent) starting values for likelihood maximization.

A rougher starting point is simply $β_{0} = OLS$ estimates and $σ_{ν, 0}^{2} = σ_{u, 0}^{2}$ = half of the residual variance from OLS. As expected, with either of these sets of starting values we obtained convergence with different generic optimization routines available in R and Stata14. EM converged in about 370 iterations when started from MOLS and it took close to 600 iterations when started from the rougher point. The acceleration scheme SQUAREM (II.2) used close to 20 iterations with either starting point. Computing times for all scenarios considered were less than a second on a personal computer running Linux OS (i5-6500 CPU 3.20GHz) with no parallelization. Regardless of method and software, results for point estimates agreed within reasonable numerical accuracy. However, standard errors differ slightly for the technology coefficients and more so for the parameters of the random components. The results are presented in Table 1.

Thumbnail

Table 1
– Estimation results from half normal SF model with 219 agricultural DMUs and different routines.

l (\hat{θ}) = - 170.45

.

For the exponential model, starting our routines at $β_{0} = OLS$ estimates and $σ_{ν, 0}^{2} = 1 / λ_{0}^{2}$ = half of the residual variance from OLS, the EM algorithm took far less iterations (around 100) to reach the desired relative precision with SQUAREM requiring only 10 iterations. As opposed to the half normal model, standard errors for the parameters of the random components were less variable across methods. In passing, we note that the likelihood-ratio test of $σ_{u} = 0$ rejected the null (as opposed to the half normal case). The results are presented in Table 2.

Thumbnail

Table 2
– Estimation results from exponential SF model with 219 agricultural DMUs and different routines.

l (\hat{θ}) = - 167.94

.

4.2 Electricity Cost Function

Here we use the EM algorithm to fit the gamma model to electricity data from 1970 ( $n = 158$ firms and holding companies) which has appeared in ^{Greene (2003}9 GREENE WH. 2003. Simulated Likelihood Estimation of the Normal-Gamma Stochastic Frontier Function. Journal of Productivity Analysis, 19(2): 179-190.) and other places. The cost function is Cobb-Douglas with a quadratic term in log output (log Q). The other variables involved are total cost of generation (C) and the unit prices of fuel (F), capital (K) and labor (L),

\log {(C / F)}_{i} = β_{0} + β_{1} \log {(L / F)}_{i} + β_{2} \log {(K / F)}_{i} + β_{3} \log Q_{i} + β_{4} \log^{2} Q_{i} + ν_{i} + u_{i} .

We note that software regularly used by us (R, Stata and SAS) do not fit the SF gamma model, most likely due to issues in integration and identifiability. LIMDEP (^{Greene, 2012}11 GREENE WH. 2012. LIMDEP 10. Available at: http://www.limdep.com/. Econometric Software, Inc.
http://www.limdep.com/... ) uses a simulated likelihood method ^{Greene (2003)}9 GREENE WH. 2003. Simulated Likelihood Estimation of the Normal-Gamma Stochastic Frontier Function. Journal of Productivity Analysis, 19(2): 179-190. to estimate the gamma SF model. Our use of the gamma model (Algorithm D) has been operationalized by numerical integration which is the focus of (^{Andrade & Souza, 2018}2 ANDRADE BB & SOUZA GS. 2018. Likelihood computation in the normal-gamma stochastic frontier model. Computational Statistics, 33(2): 967-982.). The estimates of the gamma SF model produced by EM (Algorithm D with the appropriate sign changes due to this being a cost function and therefore $ε = ν + u$ ) are shown in Table 3.

Thumbnail

Table 3
– Estimation results from gamma SF model with 158 DMUs.

The starting point used was the OLS estimates for the β’s, $α_{0} = 1$ (exponential model) and $σ_{ν, 0}^{2}$ and $1 / λ_{0}^{2}$ = half of the residual variance from OLS.

We note that different choices of initial values have led to other stationary points with lower likelihood. In fact, several local maxima may be the reason for the identifiability questions raised by ^{Ritter & Simar (1997}19 RITTER C & SIMAR L. 1997. Pitfalls of Normal-Gamma Stochastic Frontier Models. Journal of Productivity Analysis, 8(2): 167-182.) with respect to the gamma SF model. Simulated maximum likelihood (SML) estimates reported by ^{Greene (2003}9 GREENE WH. 2003. Simulated Likelihood Estimation of the Normal-Gamma Stochastic Frontier Function. Journal of Productivity Analysis, 19(2): 179-190.) are shown in the last line of Table 3. The SML estimates suggest an exponential model and have lower likelihood than that achieved by the EM estimates. We have not succeeded in estimating standard errors by numerical computation of the hessian due to issues involving numerical integration. The OPG method yields the errors reported in Table 3; the standard error associated with $\hat{λ}$ is extremely low relative to the estimate and could be the result of numerical inaccuracy.

Running time differences were irrelevant in the model fits of the previous section. Here, we must note that, because numerical integration and root finding are involved in the EM steps, the computing time becomes an issue. EM took approximately 2 minutes to converge and the gain from using SQUAREM is noticeable, with running time of 36 seconds.

5 FINAL REMARKS AND PRACTICAL CONSIDERATIONS

The hierarchical structure of SF models motivated our study of the EM algorithm. We have given the necessary information to implement the EM algorithm in four standard SF models. EM calculations resulted in simple algorithms with closed-form expressions for the half normal and exponential models and more elaborate versions for the truncated normal and gamma models.

This paper does not bring any comparative study of optimization methods for SF models. In general, the user’s needs, programming environment, programming ability and objectives, access to built-in routines (for optimization and nonlinear equation solving) and data characteristics will dictate the methods and criteria for useful and specific comparisons. We offer the following general remarks. Since the loglikelihood is available for the standard models (and at least numerically for the gamma model), several general-purpose optimization methods may be used to compute ML estimates. Such methods may require up to second derivatives or be derivative-free, including trust regions, Newton, quasi-Newton, conjugate gradient, Nelder-Mead, pattern search, etc. Specific implementations may require user-supplied derivatives or be automated to compute them numerically. EM is another tool in this toolbox. It does not require derivatives in the estimation but we have used numerically calculated first and second derivatives in the computation of standard errors by means of the hessian and the outer product of gradients estimate (14). It is well known that EM is relatively slow and Newton is very fast when close to the solution (^{Dennis & Moré, 1974}7 DENNIS JE & MORÉ JJ. 1974. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp., 28: 549-560.). Differences in running time may be negligible depending on the data size and model used. Yet, small differences may matter if one is performing large simulations. The ascent property (Section 2) gives EM more room for choosing initial values than most methods. Implementing Algorithms A–D in most languages should be straightforward whereas general-purpose methods are much more elaborate. Yet, built-in versions of quasi-Newton are readily available in systems like R and Matlab. EM’s computer storage and sensitivity to data sparsity have not been studied by us relative to any other method. Since units may vary widely for different kinds of inputs (hours, tons, dollars, counts) we suggest rescaling x for better numerical stability.

Acceleration schemes for EM have been discussed and the examples showed that SQUAREM accelaration substantially reduces the computing time. We would certainly consider them in implementations of EM for more complex models with covariates and heterocedasticity. The above considerations, as usual, call for further research, especially with regards to the gamma SF model.

References

¹
AIGNER D & CHU DS. 1968. On estimating the industry production function. American Economic Review, 58: 826-839.
²
ANDRADE BB & SOUZA GS. 2018. Likelihood computation in the normal-gamma stochastic frontier model. Computational Statistics, 33(2): 967-982.
³
BOBB JF & VARADHAN R. 2018. turboEM: A Suite of Convergence Acceleration Schemes for EM, MM and Other Fixed-Point Algorithms Available at: https://CRAN. R-project.org/package=turboEM R package version 2018.1.
» https://CRAN. R-project.org/package=turboEM
⁴
BOGETOFT P & OTTO L. 2015. Benchmarking with DEA and SFA R package version 0.26.
⁵
COELLI T & HENNINGSEN A. 2013. frontier: Stochastic Frontier Analysis Available at: http://CRAN.R-Project.org/package=frontier R package version 1.1-0.
» http://CRAN.R-Project.org/package=frontier
⁶
DEMPSTER AP, LAIRD NM & RUBIN DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1-38.
⁷
DENNIS JE & MORÉ JJ. 1974. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp., 28: 549-560.
⁸
GREENE WH. 1982. Maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 18(2): 285-289.
⁹
GREENE WH. 2003. Simulated Likelihood Estimation of the Normal-Gamma Stochastic Frontier Function. Journal of Productivity Analysis, 19(2): 179-190.
¹⁰
GREENE WH. 2008. The Econometric Approach to Efficiency Analysis. In: FRIED HO, LOVELL CAK & SCHMIDT SS (Eds.), The Measurement of Productive Efficiency and Productivity Growth chap. 2. Oxford University Press.
¹¹
GREENE WH. 2012. LIMDEP 10 Available at: http://www.limdep.com/ Econometric Software, Inc.
» http://www.limdep.com/
¹²
HUANG CJ. 1984. Estimation of Stochastic Frontier Production Function and Technical Inefficiency via the EM Algorithm. Southern Economic Journal, 50(3): 847-856.
¹³
IBGE. 2006. Census of Agriculture 2006 Available at: Available at: http://www.ibge.gov.br/english Accessed: 2016-04-18.
» http://www.ibge.gov.br/english
¹⁴
JAMSHIDIAN M & JENNRICH RI. 1997. Acceleration of the EM Algorithm by Using Quasi-Newton Methods. Journal of the Royal Statistical Society. Series B, 59(3): 569- 587.
¹⁵
KOZUMI H & ZHANG X. 2005. Bayesian and non-bayesian analysis of gamma stochastic frontier models by Markov Chain Monte Carlo methods. Computational Statistics, 20(4): 575-593.
¹⁶
KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis Cambridge University Press.
¹⁷
LEE LF. 1983. On maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 23(2): 269-274.
¹⁸
MCLACHLAN G & KRISHNAN T. 2007. The EM Algorithm and Extensions Wiley.
¹⁹
RITTER C & SIMAR L. 1997. Pitfalls of Normal-Gamma Stochastic Frontier Models. Journal of Productivity Analysis, 8(2): 167-182.
²⁰
SAS INSTITUTE INC. 2003. proc qlim. SAS/STAT Software , Version 9.1 Cary, NC. Available at: http://www.sas.com/
» http://www.sas.com/
²¹
STATACORP. 2015. frontier. Stata Statistical Software : Release 14 College Station, TX. Available at: http://www.stata.com/
» http://www.stata.com/
²²
VARADHAN R & ROLAND C. 2008. Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm. Scandinavian Journal of Statistics, 35(2): 335-353.
²³
WU CFJ. 1983. On the Convergence Properties of the EM Algorithm. Ann. Statist., 11(1): 95-103.
²⁴
ZHOU H, ALEXANDER D & LANGE K. 2011. A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2): 261-273.

APPENDIX: R TOOLS

Publication Dates

Publication in this collection
2 Dec 2019
Date of issue
Sep-Dec 2019

History

Received
28 Nov 2018
Accepted
11 Sept 2019

This is an open-access article distributed under the terms of the Creative Commons Attribution License

[1] ¹
AIGNER D & CHU DS. 1968. On estimating the industry production function. American Economic Review, 58: 826-839.

[2] ²
ANDRADE BB & SOUZA GS. 2018. Likelihood computation in the normal-gamma stochastic frontier model. Computational Statistics, 33(2): 967-982.

[3] ³
BOBB JF & VARADHAN R. 2018. turboEM: A Suite of Convergence Acceleration Schemes for EM, MM and Other Fixed-Point Algorithms Available at: https://CRAN. R-project.org/package=turboEM R package version 2018.1.
» https://CRAN. R-project.org/package=turboEM

[4] ⁴
BOGETOFT P & OTTO L. 2015. Benchmarking with DEA and SFA R package version 0.26.

[5] ⁵
COELLI T & HENNINGSEN A. 2013. frontier: Stochastic Frontier Analysis Available at: http://CRAN.R-Project.org/package=frontier R package version 1.1-0.
» http://CRAN.R-Project.org/package=frontier

[6] ⁶
DEMPSTER AP, LAIRD NM & RUBIN DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1-38.

[7] ⁷
DENNIS JE & MORÉ JJ. 1974. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp., 28: 549-560.

[8] ⁸
GREENE WH. 1982. Maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 18(2): 285-289.

[9] ⁹
GREENE WH. 2003. Simulated Likelihood Estimation of the Normal-Gamma Stochastic Frontier Function. Journal of Productivity Analysis, 19(2): 179-190.

[10] ¹⁰
GREENE WH. 2008. The Econometric Approach to Efficiency Analysis. In: FRIED HO, LOVELL CAK & SCHMIDT SS (Eds.), The Measurement of Productive Efficiency and Productivity Growth chap. 2. Oxford University Press.

[11] ¹¹
GREENE WH. 2012. LIMDEP 10 Available at: http://www.limdep.com/ Econometric Software, Inc.
» http://www.limdep.com/

[12] ¹²
HUANG CJ. 1984. Estimation of Stochastic Frontier Production Function and Technical Inefficiency via the EM Algorithm. Southern Economic Journal, 50(3): 847-856.

[13] ¹³
IBGE. 2006. Census of Agriculture 2006 Available at: Available at: http://www.ibge.gov.br/english Accessed: 2016-04-18.
» http://www.ibge.gov.br/english

[14] ¹⁴
JAMSHIDIAN M & JENNRICH RI. 1997. Acceleration of the EM Algorithm by Using Quasi-Newton Methods. Journal of the Royal Statistical Society. Series B, 59(3): 569- 587.

[15] ¹⁵
KOZUMI H & ZHANG X. 2005. Bayesian and non-bayesian analysis of gamma stochastic frontier models by Markov Chain Monte Carlo methods. Computational Statistics, 20(4): 575-593.

[16] ¹⁶
KUMBHAKAR S & LOVELL C. 2003. Stochastic Frontier Analysis Cambridge University Press.

[17] ¹⁷
LEE LF. 1983. On maximum likelihood estimation of stochastic frontier production models. Journal of Econometrics, 23(2): 269-274.

[18] ¹⁸
MCLACHLAN G & KRISHNAN T. 2007. The EM Algorithm and Extensions Wiley.

[19] ¹⁹
RITTER C & SIMAR L. 1997. Pitfalls of Normal-Gamma Stochastic Frontier Models. Journal of Productivity Analysis, 8(2): 167-182.

[20] ²⁰
SAS INSTITUTE INC. 2003. proc qlim. SAS/STAT Software , Version 9.1 Cary, NC. Available at: http://www.sas.com/
» http://www.sas.com/

[21] ²¹
STATACORP. 2015. frontier. Stata Statistical Software : Release 14 College Station, TX. Available at: http://www.stata.com/
» http://www.stata.com/

[22] ²²
VARADHAN R & ROLAND C. 2008. Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm. Scandinavian Journal of Statistics, 35(2): 335-353.

[23] ²³
WU CFJ. 1983. On the Convergence Properties of the EM Algorithm. Ann. Statist., 11(1): 95-103.

[24] ²⁴
ZHOU H, ALEXANDER D & LANGE K. 2011. A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2): 261-273.

	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{σ}}_{u}^{2}$	${\hat{σ}}_{v}^{2}$
MLE	1.18	0.73	0.24	0.15	0.22
Standard Errors
OPG (see eq. (14))	(.59)	(.03)	(.03)	(.10)	(.04)
Hybrid EM (see II.1)	(.60)	(.06)	(.05)	(.12)	(.05)
SQUAREM (see II.2)	(.56)	(.04)	(.03)	(.14)	(.05)
Stata14	(.60)	(.06)	(.05)	(.12)	(.05)

	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{σ}}_{u}^{2}$ †	${\hat{σ}}_{v}^{2}$
MLE	1.59	0.70	0.26	0.07	0.20
Standard Errors
OPG (see eq. (14))	(.58)	(.04)	(.03)	(.02)	0.20	(.03)
Hybrid EM (see II.1)	(.60)	(.06)	(.05)	(.03)	(.03)
SQUAREM (see II.2)	(.58)	(.02)	(.03)	(.03)	(.03)
Stata14	(.60)	(.06)	(.05)	(.03)	(.03)

	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{β}}_{3}$	${\hat{β}}_{4}$	${\hat{σ}}_{v}^{2}$	$\hat{α}$	$\hat{λ}$	$\hat{l}$
EM	-7.044	0.146	0.135	0.455	0.028	0.012	0.258	5.876	93.4
	(.22)	(.04)	(.04)	(.04)	(.002)	(.002)	(.15)	(.01)
SML	-7.034	0.145	0.138	0.445	0.028	0.011	1.082	9.577	91.6

Brasil

Brasil

THE EM ALGORITHM FOR STANDARD STOCHASTIC FRONTIER MODELS

ABSTRACT

1 INTRODUCTION

2 STOCHASTIC FRONTIER MODELS IN EM FORM

3 EM ALGORITHMS FOR THE STANDARD SF MODELS

3.1 Half Normal Model

3.2 Exponential Model

3.3 Truncated Normal Model

3.4 Gamma Model

3.5 Standard Errors and Hybrid EM

4 ILLUSTRATIONS

4.1 Agricultural Production Frontier

4.2 Electricity Cost Function

5 FINAL REMARKS AND PRACTICAL CONSIDERATIONS

References

APPENDIX: R TOOLS

Publication Dates

History