On-line version ISSN 1807-0302

Comput. Appl. Math. vol.31 no.2 São Carlos  2012

http://dx.doi.org/10.1590/S1807-03022012000200006

Integrating Ridge-type regularization in fuzzy nonlinear regression

R. Farnoosh (I, *); J. Ghasemian (I); O. Solaymani Fard (II)

(I) School of Mathematics, Iran University of Science and Technology, Narmak, Tehran 16844, Iran. E-mails: rfarnoosh@iust.ac.ir / jghasemian@iust.ac.ir
(II) School of Mathematics and Computer Science, Damghan University, Damghan, Iran. E-mail: osfard@du.ac.ir

ABSTRACT

In this paper, we deal with the ridge-type estimator for fuzzy nonlinear regression models using fuzzy numbers and Gaussian basis functions. Shrinkage regularization methods are used in linear and nonlinear regression models to yield consistent estimators. Here, we propose a weighted ridge penalty for a fuzzy nonlinear regression model and then select the number of basis functions and the smoothing parameter. To select the tuning parameters of the regularization method, we use the Hausdorff distance for fuzzy numbers, which was first suggested by Dubois and Prade [8]. The cross-validation procedure for selecting the optimal values of the smoothing parameter and the number of basis functions is fuzzified to fit the presented model. The simulation results show that our fuzzy nonlinear modelling performs well in various situations.

Mathematical subject classification: Primary: 62J86; Secondary: 62J07.

Key words: fuzzy nonlinear regression, regularization method, Monte Carlo method, Gaussian basis expansion.

1 Introduction

Finding the relationships, if any, that exist among a set of variables when at least one of them is random is an important task in statistics. On the one hand, regression analysis, especially nonlinear regression, is an essential tool for analyzing data. Many researchers use nonlinear regression more than any other statistical tool. Nonlinear models have been applied to a wide range of situations, even to finite populations. These models tend to be used either when they are suggested by theoretical considerations or to build known nonlinear behavior into a model (see, for example, [22, 27, 30]).

On the other hand, in most statistical practice, particularly in biology, business or government, the underlying processes are generally complex and not well understood, so we often have no idea about the form of the relationship, and much applied work using linear models distorts the underlying subject matter (see [12, 27]). Also, a model obtained as the solution of a differential equation arising in engineering, chemistry or physics is usually nonlinear. One of the major advantages of nonlinear regression is the wide range of functions that can be fitted.

In many practical situations, it may be unrealistic to prespecify a parametric form for fuzzy regression, especially for a large data set with a complicated underlying variation trend. For this reason, other approaches have been developed to handle fuzzy regression problems either without predefining a specific form of the underlying regression relationship or with a nonlinear form of that relationship.

Multicollinearity [21] among the independent variables increases the error in estimating the regression coefficients. Shrinkage estimators have been developed for many situations, including linear and nonlinear regression models. In nonlinear models, instrumental variables estimation often seems necessary to obtain consistent estimators, and for this reason the extension of shrinkage estimation to nonlinear settings has been recommended. The Poisson regression model [25], the probit model [1] and the Box-Cox transformation [18] are examples of such settings. These papers provide theoretical results showing that, in terms of risk, certain shrinkage estimators outperform unrestricted estimation [10].

One of the most commonly used regularization methods is ridge regularization, which was first used in the context of least squares regression in [17]. We apply this regularization method to nonlinear regression models involving basis expansions, which are useful tools for analyzing data with a complex structure. In a basis expansion, the regression function is expressed as a linear combination of known nonlinear functions called basis functions. Natural cubic splines, B-splines, Fourier series and radial basis functions are widely used as basis functions [24]. In the current study, we use Gaussian basis functions [4] because they have a simple form and are easy to implement. When Gaussian basis functions are constructed for unequally spaced data, narrow basis functions with very small dispersions may be produced, and employing these bases can lead to unsmooth or unstable results. To overcome this problem, a shrinkage estimator can be used that avoids these effects by estimating the coefficients of the narrow basis functions as exactly zero [29].

The paper is structured as follows. Section 2 explains the basic concepts of fuzzy numbers used in this paper. In Section 3, the multivariate fuzzy nonlinear regression model based on ridge estimation is presented. In Section 4, we discuss the selection of the number of basis functions and the smoothing parameter. Finally, numerical examples and concluding remarks are given in Sections 5 and 6.

2 Preliminaries

In this section, we recall some definitions and introduce the notation used throughout the paper.

Definition 2.1. ([3]) Let X be a nonempty set. A fuzzy set ũ in X is characterized by its membership function ũ : X → [0, 1]. For each x ∈ X, ũ(x) is interpreted as the degree of membership of the element x in the fuzzy set ũ.

Let us denote by ℝ_F the class of fuzzy subsets of the real axis ℝ (i.e. ũ : ℝ → [0, 1]) satisfying the following properties:

(i) ũ is normal, i.e., there exists s0 ∈ ℝ such that ũ(s0) = 1,
(ii) ũ is a convex fuzzy set, i.e., ũ(θs + (1 − θ)t) ≥ min{ũ(s), ũ(t)} for all s, t ∈ ℝ and θ ∈ [0, 1],
(iii) ũ is upper semi-continuous on ℝ,
(iv) cl{s ∈ ℝ | ũ(s) > 0} is compact, where cl denotes the closure of a subset.

ℝ_F is called the space of fuzzy numbers, and obviously ℝ ⊂ ℝ_F.

For 0 < α ≤ 1, denote [ũ]^α = {s ∈ ℝ | ũ(s) ≥ α} and [ũ]^0 = cl{s ∈ ℝ | ũ(s) > 0}. It is clear that the α-level set of a fuzzy number is a closed and bounded interval [u_α, ū_α], where u_α denotes the left-hand endpoint of [ũ]^α and ū_α denotes the right-hand endpoint of [ũ]^α.

Another definition for a fuzzy number is as follows.

Definition 2.2. ([3]) A fuzzy number ũ in parametric form is a pair (u, ū) of functions u(α) and ū(α), 0 ≤ α ≤ 1, which satisfy the following requirements:

(i) u(α) is a bounded non-decreasing left continuous function in (0, 1], and right continuous at 0,
(ii) ū(α) is a bounded non-increasing left continuous function in (0, 1], and right continuous at 0,
(iii) u(α) ≤ ū(α), 0 ≤ α ≤ 1.

A crisp number a is simply represented by u(α) = ū(α) = a, 0 ≤ α ≤ 1. We recall that for a < b < c, where a, b, c ∈ ℝ, the triangular fuzzy number ũ = (a, b, c) determined by a, b, c is such that u(α) = a + (b − a)α and ū(α) = c − (c − b)α are the endpoints of the α-level sets, for all α ∈ [0, 1].
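As a small illustration of the parametric form, the following Python sketch evaluates the α-level endpoints of a triangular fuzzy number (a, b, c) from the two formulas above. The function name `triangular_alpha_cut` is hypothetical, and the sketch also accepts the degenerate cases a = b or b = c, so that a crisp number a = b = c is covered.

```python
def triangular_alpha_cut(a: float, b: float, c: float, alpha: float) -> tuple:
    """Return (u(alpha), u_bar(alpha)) for the triangular fuzzy number (a, b, c)."""
    if not a <= b <= c:
        raise ValueError("a triangular fuzzy number needs a <= b <= c")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lower = a + (b - a) * alpha  # left endpoint: rises from a (alpha = 0) to b (alpha = 1)
    upper = c - (c - b) * alpha  # right endpoint: falls from c (alpha = 0) to b (alpha = 1)
    return lower, upper


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, triangular_alpha_cut(1.0, 2.0, 4.0, alpha))
    # 0.0 (1.0, 4.0)
    # 0.5 (1.5, 3.0)
    # 1.0 (2.0, 2.0)
```

At α = 0 the pair is the support [a, c], and at α = 1 it collapses to the peak b, as required by properties (i) to (iii) of Definition 2.2.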

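For arbitrary ũ = (u(α), ū(α)) and ṽ = (v(α), v̄(α)), we define the following operations:

(i) (ũ ⊕ ṽ)(α) = (u(α) + v(α), ū(α) + v̄(α)), α ∈ [0, 1],
(ii) (ũ − ṽ)(α) = (u(α) − v̄(α), ū(α) − v(α)), α ∈ [0, 1],
(iii) (k ⊙ ũ)(α) is given by

$$(k \odot \tilde u)(\alpha) = \begin{cases} \big(k\,u(\alpha),\ k\,\bar u(\alpha)\big), & k \ge 0,\\ \big(k\,\bar u(\alpha),\ k\,u(\alpha)\big), & k < 0, \end{cases}$$

where k is a scalar. Moreover, when k = −1, we have k ⊙ ũ = −ũ.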
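To make the three operations concrete, the sketch below stores a fuzzy number as its α-level endpoint pairs on a shared, finite grid of α values. The grid is an assumption made for computation (the definitions hold for every α ∈ [0, 1]), and all function names are hypothetical.

```python
from typing import List, Tuple

Cut = Tuple[float, float]  # (lower, upper) endpoints at one alpha level
Fuzzy = List[Cut]          # one cut per alpha on a shared grid


def triangular(a: float, b: float, c: float, alphas: List[float]) -> Fuzzy:
    """Discretize the triangular fuzzy number (a, b, c) on the given alpha grid."""
    return [(a + (b - a) * al, c - (c - b) * al) for al in alphas]


def f_add(u: Fuzzy, v: Fuzzy) -> Fuzzy:
    """Operation (i): add lower endpoints and upper endpoints separately."""
    return [(ul + vl, uu + vu) for (ul, uu), (vl, vu) in zip(u, v)]


def f_sub(u: Fuzzy, v: Fuzzy) -> Fuzzy:
    """Operation (ii): lower = u - v_bar, upper = u_bar - v."""
    return [(ul - vu, uu - vl) for (ul, uu), (vl, vu) in zip(u, v)]


def f_scale(k: float, u: Fuzzy) -> Fuzzy:
    """Operation (iii): a negative scalar swaps the roles of the two endpoints."""
    if k >= 0:
        return [(k * ul, k * uu) for ul, uu in u]
    return [(k * uu, k * ul) for ul, uu in u]


if __name__ == "__main__":
    alphas = [0.0, 0.5, 1.0]
    u = triangular(1.0, 2.0, 4.0, alphas)
    v = triangular(0.0, 1.0, 2.0, alphas)
    print(f_add(u, v))       # [(1.0, 6.0), (2.0, 4.5), (3.0, 3.0)]
    print(f_sub(u, v))       # [(-1.0, 4.0), (0.0, 2.5), (1.0, 1.0)]
    print(f_scale(-1.0, u))  # [(-4.0, -1.0), (-3.0, -1.5), (-2.0, -2.0)]
```

Note that with operation (ii), ũ − ũ is not the crisp zero unless ũ is crisp: for ũ = (1, 2, 4) the α = 0 cut of ũ − ũ is (1 − 4, 4 − 1) = (−3, 3).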

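The Hausdorff distance between fuzzy numbers is given by D : ℝ_F × ℝ_F → ℝ_+ ∪ {0},

$$D(\tilde u, \tilde v) = \sup_{\alpha \in [0,1]} \max\big\{\, |u(\alpha) - v(\alpha)|,\ |\bar u(\alpha) - \bar v(\alpha)| \,\big\},$$

where ũ = (u(α), ū(α)), ṽ = (v(α), v̄(α)) ∈ ℝ_F.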
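The supremum over α cannot be evaluated directly on a computer; a common approximation, assumed in the sketch below, is to take the maximum over a finite α grid on which both fuzzy numbers are sampled. For triangular fuzzy numbers both endpoint functions are affine in α, so each absolute endpoint difference is largest at α = 0 or α = 1, and any grid containing 0 and 1 then gives the exact value. The function name is hypothetical.

```python
from typing import List, Tuple

Cut = Tuple[float, float]  # (lower, upper) endpoints at one alpha level


def hausdorff_distance(u: List[Cut], v: List[Cut]) -> float:
    """Grid approximation of D(u, v) = sup_alpha max(|u - v|, |u_bar - v_bar|).

    Both fuzzy numbers must be sampled on the same alpha grid.
    """
    if len(u) != len(v) or not u:
        raise ValueError("u and v must be non-empty and share one alpha grid")
    return max(max(abs(ul - vl), abs(uu - vu)) for (ul, uu), (vl, vu) in zip(u, v))


if __name__ == "__main__":
    alphas = [i / 10 for i in range(11)]
    u = [(1 + al, 4 - 2 * al) for al in alphas]  # triangular (1, 2, 4)
    v = [(0 + al, 2 - al) for al in alphas]      # triangular (0, 1, 2)
    print(hausdorff_distance(u, v))  # 2.0, attained at alpha = 0 by the upper endpoints
```

Because the maximum is taken endpoint by endpoint, properties (i) and (ii) listed below can be checked numerically with this function for any pair of sampled fuzzy numbers.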

Note that (ℝ_F, D) is a complete metric space (see [3, 11, 13]) and D has the following properties:

(i) D(ũ ⊕ w̃, ṽ ⊕ w̃) = D(ũ, ṽ), ∀ ũ, ṽ, w̃ ∈ ℝ_F,
(ii) D(k ⊙ ũ, k ⊙ ṽ) = |k| D(ũ, ṽ), ∀ k ∈ ℝ, ũ, ṽ ∈ ℝ_F,
(iii) D(ũ ⊕ ṽ, w̃ ⊕ ẽ) ≤ D(ũ, w̃) + D(ṽ, ẽ), ∀ ũ, ṽ, w̃, ẽ ∈ ℝ_F.

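Definition 2.3. ([3]) A mapping f : ℝ_F → ℝ_F is called a fuzzy process. Its α-level set can then be written as follows:

$$[f(\tilde x)]^\alpha = \big[\, f(\tilde x, \alpha),\ \bar f(\tilde x, \alpha) \,\big], \qquad \tilde x \in \mathbb{R}_F,\ \alpha \in [0,1].$$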

Definition 2.4. ([11]) A mapping f : ℝ_F → ℝ_F is called continuous at the point x̃0 ∈ ℝ_F provided that, for any fixed α ∈ [0, 1] and arbitrary ε > 0, there exists δ(ε, α) such that D([f(x̃)]^α, [f(x̃0)]^α) < ε whenever D([x̃]^α, [x̃0]^α) < δ(ε, α), for all x̃ ∈ ℝ_F.

3 Multivariate Fuzzy Nonlinear Regression Model Based on Ridge Estimation

In this section, we consider a fuzzy nonlinear regression model with multivariate fuzzy input and fuzzy output, and propose a fitting procedure for this model.

Suppose that we have n independent observations (x̃_i, ỹ_i), i = 1, 2, ..., n. We consider the following fuzzy nonlinear regression model:

$$\tilde y_i = m(\tilde x_i) \oplus \tilde\varepsilon_i, \qquad i = 1, 2, \dots, n. \qquad (1)$$

In this model, x̃_i = (x̃_i1, ..., x̃_ip) ∈ (ℝ_F)^p, i = 1, 2, ..., n, are vectors of p-dimensional independent variables (inputs) and ỹ_i ∈ ℝ_F is a univariate output. The function m(·), a mapping from (ℝ_F)^p to ℝ_F, is an unknown smooth fuzzy function. Moreover, ε̃_i ∈ ℝ_F with [ε̃_i]^α = (ε_i(α), ε̄_i(α)), where ε_i(α) and ε̄_i(α) are independent and normally distributed with mean zero and variance σ²(α). In view of Definition 2.3, the fuzzy function m(·) can be expressed through its α-level endpoints as

$$[m(\tilde x)]^\alpha = \big(m(\tilde x, \alpha),\ \bar m(\tilde x, \alpha)\big), \qquad \alpha \in [0,1].$$

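So we can write the model ỹ = m(x̃) ⊕ ε̃ in the following form:

$$\big(y(\alpha),\ \bar y(\alpha)\big) = \big(m(\tilde x, \alpha),\ \bar m(\tilde x, \alpha)\big) \oplus \big(\varepsilon(\alpha),\ \bar\varepsilon(\alpha)\big),$$

which is equivalent to

$$y(\alpha) = m(\tilde x, \alpha) + \varepsilon(\alpha), \qquad \bar y(\alpha) = \bar m(\tilde x, \alpha) + \bar\varepsilon(\alpha).$$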

We assume that each of the components m(·) and m̄(·) is a linear combination of basis functions ϕ_l(x, x̄, α), l = 1, ..., k, in the form

and

where

are unknown fuzzy coefficient vectors and, for X_α = (x, x̄, α), the Gaussian basis functions are given by

where µ_l(α) and µ̄_l(α) are p-dimensional vectors determining the centers of the basis functions, the corresponding lower and upper dispersions are the width parameters, and ||·|| is the Euclidean norm. The unknown parameters in models (4) and (5) are the coefficient vectors b(α) = (b(α), b̄(α)), together with the centers µ_l(α), µ̄_l(α) and the width parameters required by the Gaussian basis functions. To avoid local minima and identification problems, these parameters are generally determined in a two-stage procedure. In the first stage, the centers µ_l(α), µ̄_l(α) and the dispersions are determined by the k-means clustering algorithm: the observations of the explanatory variables (x_1(α), ..., x_n(α)) and (x̄_1(α), ..., x̄_n(α)) are partitioned into k clusters (C_1(α), ..., C_k(α)) and (C̄_1(α), ..., C̄_k(α)), respectively, and the centers and dispersions are determined by

where n_l is the number of observations in the l-th cluster C_l or C̄_l. Replacing the center pair (µ_l(α), µ̄_l(α)) and the corresponding pair of width parameters in (6) by the values in (7), we obtain a set of 2k basis functions
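The first stage can be sketched for one endpoint, for example the lower endpoints x_i(α) at a fixed α; the same code is then applied to the upper endpoints x̄_i(α). Since the display defining the centers and dispersions is not reproduced here, the sketch assumes the usual choices for k-means-based Gaussian bases: each center is the mean of its cluster, each dispersion s_l² is the mean squared distance of the cluster members to that center, and ϕ_l(x) = exp(−||x − µ_l||²/(2 s_l²)). The plain Lloyd iteration, its random initialization, the leading intercept column and all function names are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(x: Point, y: Point) -> float:
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points: List[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd k-means; centers start at k randomly chosen data points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: every center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the old center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def dispersions(points: List[Point], centers: List[List[float]], labels: List[int],
                floor: float = 1e-12) -> List[float]:
    """Mean squared distance of each cluster's members to its center.

    The floor only prevents division by zero; a tiny value still yields the very
    narrow basis functions that the introduction warns about.
    """
    out = []
    for j, mu in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        s2 = sum(sq_dist(p, mu) for p in members) / len(members) if members else 0.0
        out.append(max(s2, floor))
    return out


def design_row(x: Point, centers: List[List[float]], s2: List[float]) -> List[float]:
    """[1, phi_1(x), ..., phi_k(x)] with phi_j(x) = exp(-||x - mu_j||^2 / (2 * s2_j))."""
    return [1.0] + [math.exp(-sq_dist(x, mu) / (2.0 * s)) for mu, s in zip(centers, s2)]


if __name__ == "__main__":
    # Toy lower endpoints x_i(alpha) at one fixed alpha (one input variable).
    lower_x = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
    centers, labels = kmeans(lower_x, k=2)
    s2 = dispersions(lower_x, centers, labels)
    print(sorted(c[0] for c in centers))  # approximately [0.1, 5.1]
    print(design_row((0.1,), centers, s2))
```

Stacking `design_row` over all observations gives an n × (k + 1) matrix for each endpoint; k-means is used only to fix the centers and widths, so the remaining coefficient estimation is linear in b(α) and b̄(α).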

For n independent observations (x̃_i, ỹ_i), i = 1, 2, ..., n, the fuzzy nonlinear regression model based on the Gaussian basis functions

given in (6) is expressed as

where

and ε_i(α), ε̄_i(α) are error terms. If, for all α ∈ [0, 1], the error terms are independent and normally distributed with mean 0 and variance h²(α) = (h²(α), h̄²(α)), the nonlinear regression model (9) has the probability density function

Then the maximum likelihood estimates of the coefficient vectors b(α), b̄(α) and of the variance h²(α) = (h²(α), h̄²(α)) are, respectively, given by

where

We estimate b(α) = (b(α), b̄(α)) and h²(α) by the regularization method, because the maximum likelihood method often yields unstable estimates when a nonlinear model is fitted to data with a complex structure. Instead of the log-likelihood function, we maximize the penalized log-likelihood function

where

and H(b(α)) = [H(b(α)), H(b̄(α))] is a penalty function for b(α), and λ (> 0) is a smoothing parameter that controls the smoothness of the fitted model. For the ridge penalty, based on the ℓ2 norm, (H(b(α)), H(b̄(α))) is given by

Then the maximum penalized likelihood estimates of b(α) = (b(α), b̄(α)) and h²(α) = (h²(α), h̄²(α)) are, respectively, given by

Here I_{l+1} is the (l + 1)-dimensional identity matrix. Note that these estimators depend on each other. Therefore, we first provide an initial value for the variance, and then the coefficient and variance estimates are updated alternately until convergence. This estimation method is maximum penalized likelihood with a quadratic penalty (see [9, 23]). The consistency and the rate of convergence of such estimators are proved in [5, 15, 20].
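The alternating scheme just described can be sketched for one endpoint at a fixed α. The sketch assumes that the penalized log-likelihood is the Gaussian log-likelihood minus (nλ/2) bᵀb; setting its derivatives to zero gives b̂ = (BᵀB + nλĥ²I)⁻¹Bᵀy and ĥ² = ||y − Bb̂||²/n, but the constants in the paper's own estimator may differ. Here B is a design matrix of basis values with a leading column of ones and y holds the endpoints y_i(α) (or ȳ_i(α)); the intercept is penalized like the other coefficients, matching a full identity matrix. All names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented copy of [A | rhs]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(B: Matrix, y: List[float], lam: float, h2: float = 1.0,
              tol: float = 1e-10, max_iter: int = 500) -> Tuple[List[float], float]:
    """Alternate b = (B'B + n*lam*h2*I)^(-1) B'y and h2 = ||y - Bb||^2 / n until h2 settles."""
    n, q = len(B), len(B[0])
    BtB = [[sum(B[i][r] * B[i][c] for i in range(n)) for c in range(q)] for r in range(q)]
    Bty = [sum(B[i][r] * y[i] for i in range(n)) for r in range(q)]
    b = [0.0] * q
    for _ in range(max_iter):
        # Coefficient step with the current variance estimate.
        A = [[BtB[r][c] + (n * lam * h2 if r == c else 0.0) for c in range(q)] for r in range(q)]
        b = solve(A, Bty)
        # Variance step with the new coefficients.
        resid = [y[i] - sum(B[i][c] * b[c] for c in range(q)) for i in range(n)]
        new_h2 = sum(e * e for e in resid) / n
        converged = abs(new_h2 - h2) < tol
        h2 = new_h2
        if converged:
            break
    return b, h2


if __name__ == "__main__":
    B = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # toy design: intercept + one basis value
    y = [1.0, 3.0, 2.0, 4.0]
    print(ridge_fit(B, y, lam=0.0))  # least squares: b ~ [1.3, 0.8], h2 ~ 0.45
    print(ridge_fit(B, y, lam=0.1))  # coefficients shrunk toward zero
```

With λ = 0 the coefficient step no longer depends on the variance, and the scheme reduces to the maximum likelihood (least squares) fit; for λ > 0 the coefficient vector is shrunk and the residual variance grows accordingly.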

4 Selection of the number of basis functions and the smoothing parameter

The estimates obtained in (14) by the regularization method depend on the number of basis functions 2k and on the value of the smoothing parameter λ, so determining these values appropriately is crucial. The fuzzified cross-validation procedure, based on the Hausdorff distance between fuzzy numbers, can be described as follows. Let

be the predicted fuzzy ridge nonlinear regression function at the input x̃, computed by our method. Let

The CV(k, λ) quantity gives an overall measure of the difference between the observed values of the dependent variable and their estimates. However, because of the error term in model (1), CV(k, λ) cannot efficiently reflect the closeness between the underlying fuzzy nonlinear regression function m(x̃) and its estimate. Therefore, we define the following quantity to measure the bias between the objective function and its estimate:

BIAS(k, λ) is useful for examining the performance of different methods by simulation, where the true regression function m(x̃) is known. Both CV(k, λ) and BIAS(k, λ) are reported in our simulations to evaluate the proposed method numerically. We choose k0 and λ0 as the optimal values such that

In practice, we may compute CV(k, λ) for a series of values of k and λ to obtain k0 and λ0. A smoother regression function generally corresponds to a larger value of k, while a more fluctuating regression function tends to select a smaller value of k.
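A grid search over (k, λ) driven by a fuzzified cross-validation score can be sketched as below. Because the displayed definition of CV(k, λ) is not reproduced here, the sketch assumes a leave-one-out score equal to the mean squared Hausdorff distance between each observed output ỹ_i and the prediction made without the i-th observation. `fit_predict` is a hypothetical callback standing for any fitting routine (for instance a ridge fit with 2k Gaussian basis functions applied to both endpoints), and fuzzy numbers are stored as endpoint pairs on a shared α grid.

```python
from typing import Callable, List, Sequence, Tuple

Cut = Tuple[float, float]  # (lower, upper) endpoints at one alpha level
Fuzzy = List[Cut]          # one cut per alpha on a shared grid
FitPredict = Callable[..., Fuzzy]  # (train_x, train_y, x, k, lam) -> predicted fuzzy output


def hausdorff(u: Fuzzy, v: Fuzzy) -> float:
    """Grid approximation of D(u, v) = sup_alpha max(|u - v|, |u_bar - v_bar|)."""
    return max(max(abs(ul - vl), abs(uu - vu)) for (ul, uu), (vl, vu) in zip(u, v))


def loo_cv(xs: list, ys: List[Fuzzy], k: int, lam: float, fit_predict: FitPredict) -> float:
    """Assumed CV(k, lam): mean squared Hausdorff distance under leave-one-out."""
    n = len(ys)
    total = 0.0
    for i in range(n):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = fit_predict(train_x, train_y, xs[i], k, lam)  # fitted without observation i
        total += hausdorff(ys[i], pred) ** 2
    return total / n


def select_k_lambda(xs: list, ys: List[Fuzzy], ks: Sequence[int], lams: Sequence[float],
                    fit_predict: FitPredict) -> Tuple[int, float, float]:
    """Return (k0, lam0, score) minimizing the CV score over the grid ks x lams."""
    best = None
    for k in ks:
        for lam in lams:
            score = loo_cv(xs, ys, k, lam, fit_predict)
            if best is None or score < best[2]:  # ties keep the first grid point
                best = (k, lam, score)
    return best
```

With an actual fitting routine plugged in as `fit_predict`, `select_k_lambda` returns (k0, λ0) together with the attained score; a BIAS(k, λ) counterpart would replace the observed ỹ_i by the known m(x̃_i) in the distance.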

5 Numerical results

In this section, we use Monte Carlo simulations to investigate the performance of the proposed model by computing CV(k, λ) and BIAS(k, λ). Two simulation examples are considered: curve fitting and surface fitting. The results are obtained for several values of the sample size, k and λ.

Example 5.1. Curve fitting. In this simulation, repeated random samples (x̃_i, ỹ_i), i = 1, 2, ..., n, with n = 100 or 120, were generated from a true nonlinear regression model ỹ_i = m(x̃_i) ⊕ ε̃_i, where

The design points x̃_i are uniformly distributed in [0, 1]_F ⊂ ℝ_F, and the errors ε̃_i, with [ε̃_i]^α = (ε_i(α), ε̄_i(α)), are such that ε_i(α) and ε̄_i(α) are independent and normally distributed with mean zero and standard deviations τ = 0.1 R_m and τ̄ = 0.1 R̄_m, respectively, where R_m and R̄_m are the ranges of m(x̃) and m̄(x̃) over x̃_i ∈ [0, 1]_F.

CV(k, λ) and BIAS(k, λ) are used to evaluate the performance of the method numerically, and the results are summarized in Table 1. In this example, the results make clear that ridge estimation must be used.

Example 5.2. Surface fitting. Next, we applied the modeling strategy to surface data. We generated random samples (ỹ_i, x̃_1i, x̃_2i), i = 1, 2, ..., n, with n = 100 or 120, from a true model ỹ_i = m(x̃_1i, x̃_2i) ⊕ ε̃_i, where

The design points (x̃_1i, x̃_2i) are uniformly distributed in [0, 1]_F × [0, 1]_F ⊂ (ℝ_F)². The errors ε̃_i, with [ε̃_i]^α = (ε_i(α), ε̄_i(α)), are such that ε_i(α) and ε̄_i(α) are independent and normally distributed with mean zero and standard deviations τ = 0.1 R_m and τ̄ = 0.1 R̄_m, respectively, where R_m and R̄_m are the ranges of m(x̃_1, x̃_2) and m̄(x̃_1, x̃_2) over (x̃_1, x̃_2) ∈ [0, 1]_F × [0, 1]_F.

Table 2 shows the results for this dataset. They show that the ridge method still produces a quite satisfactory estimate of the fuzzy nonlinear regression function in the case of two-dimensional input.

Example 5.3. Consider the following function from [3].

Let

The observed fuzzy outputs are

Table 3 shows the results for this dataset.

In this example, LLS and KS stand for the local linear smoothing and kernel smoothing methods, respectively.

6 Conclusions

In this study, we applied ridge regularization to fuzzy nonlinear regression models for data with multivariate fuzzy input and fuzzy output. We proposed ridge estimation of fuzzy nonlinear regression models based on Gaussian basis functions, together with a cross-validation procedure for selecting the optimal values of the smoothing parameter and the number of basis functions. Simulation experiments were conducted to assess the performance of the method. By computing CV(k, λ) and BIAS(k, λ), we found that the proposed method performs quite well in reducing the error and produces a satisfactory estimate of the nonlinear regression function.

As demonstrated by the numerical experiments, increasing k, the number of basis functions, increases the accuracy of the model; in this case, the smoothing parameter λ can be decreased. The small values of CV(k, λ) and BIAS(k, λ) indicate that the proposed method tends to produce estimates that are close to the true values and gives less biased estimates of the true function.

Acknowledgements. The authors are grateful to the associate editor and two referees for their accurate reading and their helpful suggestions and remarks.

References

[1] L.C. Adkins and R.C. Hill, Risk characteristics of a Stein-like estimator of the probit regression model. Economics Letters, 30 (1989), 19-26.

[2] D.M. Bates and D.G. Watts, Nonlinear Regression Analysis and its Applications. Wiley, New York (1988).

[3] B. Bede and S.G. Gal, Generalizations of the differentiability of fuzzy-number-valued functions with applications to fuzzy differential equations. Fuzzy Sets and Systems, 151 (2005), 581-599.

[4] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995).

[5] D.D. Cox and F. O'Sullivan, Asymptotic Analysis of Penalized Likelihood and Related Estimators. Ann. Statist., 18 (1990), 1676-1695.

[6] P. Diamond, Fuzzy least squares. Information Sciences, 46 (1988), 141-157.

[7] N.R. Draper and H. Smith, Applied Regression Analysis. Wiley, New York (1980).

[8] D. Dubois and H. Prade, Fuzzy Numbers: An Overview, Analysis of Fuzzy Information. Mathematical Logic, CRC Press, Boca Raton, FL, 1 (1987), 3-39.

[9] P.P.B. Eggermont and V.N. LaRiccia, Maximum Penalized Likelihood Estimation. Springer (2001).

[10] S. Ejaz Ahmed and C.J. Nicol, An application of shrinkage estimation to the nonlinear regression model. Journal of Computational Statistics and Data Analysis (to appear).

[11] M. Friedman, M. Ma and A. Kandel, Numerical solutions of fuzzy differential and integral equations. Fuzzy Sets and Systems, 106 (1999), 35-48.

[12] A.R. Gallant, Nonlinear Statistical Models. Wiley, New York (1987).

[13] R. Goetschel and W. Voxman, Topological properties of fuzzy numbers. Fuzzy Sets and Systems, 10 (1983), 87-99.

[14] R. Goetschel and W. Voxman, Elementary fuzzy calculus. Fuzzy Sets and Systems, 18 (1986), 31-43.

[15] C. Gu and C. Qiu, Penalized Likelihood regression: A Simple Asymptotic Analysis. Statistica Sinica, 4 (1994), 297-304.

[16] T.J. Hastie and R.J. Tibshirani, Generalized Additive Models. Chapman & Hall, London (1990).

[17] A.E. Hoerl and R.W. Kennard, Ridge regression: biased estimates for nonorthogonal problems. Technometrics, 12 (1970), 55-67.

[18] M. Kim and R.C. Hill, Shrinkage estimation in nonlinear models: the Box-Cox transformation. Journal of Econometrics, 66 (1995), 1-33.

[19] B. Kim and R.R. Bishu, Evaluation of fuzzy linear regression models by comparing membership functions. Fuzzy Sets and Systems, 100 (1998), 343-352.

[20] G.A. Latham and S. Yu, N-Stage Splitting for Maximum Penalized Likelihood Estimation with Gaussian Data and Stationary Linear Iterative Methods. J. Statist. Comput. and Simul., 62 (1999), 375-393.

[21] R.X. Liu, J. Kuang, Q. Gong and X.L. Hou, Principal component regression analysis with SPSS. Computer Methods and Programs in Biomedicine, 71 (2003), 141-147.

[22] H. Motulsky and A. Christopoulos, Fitting Models to Biological Data using Linear and Nonlinear Regression. GraphPad Software, Inc., San Diego (2003).

[23] T. Poggio and F. Girosi, Networks for approximation and learning. Proc. IEEE, 78 (1990), 1484-1487.

[24] D. Ruppert, M.P. Wand and R.J. Carroll, Semiparametric Regression. Cambridge University Press (2003).

[25] S.K. Sapra, Pre-test estimation in Poisson regression model. Applied Economics Letters, 10 (2003), 541-543.

[26] C. Saunders, A. Gammerman and V. Vovk, Ridge regression learning algorithm in dual variables. Proceedings of the 15th International Conference on Machine Learning, (1998), 515-521.

[27] G.A.F. Seber and C.J. Wild, Nonlinear Regression. Wiley, New York (1989).

[28] H. Tanaka and H. Ishibuchi, Identification of possibilistic linear systems by quadratic membership functions of fuzzy parameters. Fuzzy Sets and Systems, 41 (1991), 145-160.

[29] S. Tateishi, H. Matsui and S. Konishi, Nonlinear regression modeling via the lasso-type regularization. Journal of Statistical Planning and Inference, 140 (2010), 1125-1134.

[30] R. Valliant, Nonlinear prediction theory and the estimation of proportions in a finite population. J. Am. Stat. Assoc., 80 (1985), 631-641.

[31] N. Wang, W.X. Zhang and C.L. Mei, Fuzzy nonparametric regression based on local linear smoothing technique. Information Sciences, 177 (2007), 3882-3900.

[32] M. Yang and H. Liu, Fuzzy least squares algorithms for interactive fuzzy linear regression models. Fuzzy Sets and Systems, 135 (2003), 305-316.