A fuzzy/Bayesian approach for the time series change point detection problem

D'Angelo, Marcos Flávio S.V.; Palhares, Reinaldo M.; Takahashi, Ricardo H.C.; H. Loschi, Rosangela

doi:10.1590/S0101-74382011000200002

Abstract

This paper addresses the change point detection problem in time series. A methodology based on the Metropolis-Hastings algorithm, applied to time series modeled as a process with Beta distribution, is discussed. To make this methodology useful in practice, a fuzzy clustering technique is first applied to the initial time series, generating a new data set with Beta distribution. Bayesian procedures are considered for inference, and the Metropolis-Hastings algorithm is used to sample from the posteriors. In the clustering process, a Kohonen neural network is used to find the best centers of the time series for the fuzzification process. Finally, simulation results are presented for the series of electric energy consumption in Brazil between January 1976 and December 2000, five months before the blackout that occurred in 2001. These results illustrate the efficiency of the proposed methodology for change point detection in time series.

Keywords: change point; fuzzy clustering; Metropolis-Hastings.

Marcos Flávio S.V. D'Angelo^I,* (corresponding author); Reinaldo M. Palhares^II; Ricardo H.C. Takahashi^III; Rosangela H. Loschi^IV

^IUniversidade Estadual de Montes Claros - UNIMONTES, Departamento de Ciência da Computação, Av. Rui Braga, s/n, 39401-089 Montes Claros, MG, Brazil. E-mails: marcos.dangelo@unimontes.br / dangelo@cpdee.ufmg.br

^IIUniversidade Federal de Minas Gerais - UFMG, Departamento de Engenharia Eletrônica, Av. Antônio Carlos, 6627, Pampulha, 31270-901 Belo Horizonte, MG, Brazil. E-mail: palhares@cpdee.ufmg.br

^IIIUniversidade Federal de Minas Gerais - UFMG, Departamento de Matemática, Av. Antônio Carlos, 6627, Pampulha, 31270-901 Belo Horizonte, MG, Brazil. E-mail: taka@mat.ufmg.br

^IVUniversidade Federal de Minas Gerais - UFMG, Departamento de Estatística, Av. Antônio Carlos, 6627, Pampulha, 31270-901 Belo Horizonte, MG, Brazil. E-mail: loschi@est.ufmg.br

1 INTRODUCTION

Traditional methods of describing the behavior of a time series are based on models that satisfactorily explain its dynamics. Some fundamental questions must be answered, however, in order to obtain a model that fits the data reasonably well: Are there switching regimes in the series? Can a single model fit the dynamic behavior over the whole period? If there are significant changes in the underlying time series, it seems natural to find those change points or change periods before modeling the whole process.

Change point detection is a problem of interest in many different areas, and it has been used to analyze financial time series (Oh KJ, Roh TH & Moon MS, 2005), ecological time series (Beckage, Joseph, Belisle, Wolfson & Platt, 2007), crime count series (Loschi RH, Gonçalves FB & Cruz FBR, 2005), and many others. The main techniques for change point detection presented in the literature are based on sequential statistical tests and Bayesian analysis. The most common statistical test for the change point detection problem is the CUSUM (Cumulative Sum) test. The CUSUM test proposed by (Hinkley, 1971) is widely used in change point detection, and applications of this method, as well as its modifications and extensions, can be found in (Hadjiliadis & Moustakides, 2006), (Lee, Nishiyama & Yoshida, 2006), and (Lee S, Park S, Maekawa K & Kawai K, 2006). In the Bayesian approach, inference about the changes is based on the posterior distribution of the positions of the change points. In this setting, MCMC (Markov chain Monte Carlo) methods are commonly used to approximate the posteriors. In the Bayesian context, the reference work for multiple change point detection is the paper of (Barry & Hartigan, 1993), which introduced the product partition model (PPM) to identify change points in the mean of data having a Gaussian distribution. In (Loschi RH & Cruz FRB, 2005b) the posterior probability of any instant being a change point is used as an evidence measure that the behavior of a sequence of data changes at that instant. The effectiveness of this technique in identifying changes in a Poisson distribution, compared with the measure proposed by (Hartigan, 1990), is presented in (Loschi RH & Cruz FRB, 2005a).

However, all the previous approaches require some prior knowledge about the time series, namely the type of distribution that models the data. In contrast, the method proposed in this paper does not require any prior knowledge about the statistical properties of the time series before the MCMC procedure is applied. This is achieved by introducing a first step, in which a fuzzy set technique is applied to cluster and transform the initial time series, whose distribution is not known a priori, into a time series that can be approximated by a Beta distribution. Specifically, in this first step the Kohonen network is used to find the centers of the clusters, and then the fuzzy membership degree is computed for each point of the initial time series, generating a new time series with Beta distribution. This new time series makes it possible to apply the same strategy systematically to detect the change, via a Bayesian change point model with a fixed reference distribution: the Beta distribution. Thus the Metropolis-Hastings algorithm (Gamerman, 1997) can be used to sample from the posteriors, including the one that indicates the position of the change point.

Let p₁ be the first operation point, p₂ the second operation point, ε(t) the noise signal with distribution π(·), and m the position of the change point. Throughout the paper, the following time series is considered:

$$
y(t) = \begin{cases} p_1 + \varepsilon(t), & t \le m, \\ p_2 + \varepsilon(t), & t > m. \end{cases} \tag{1}
$$

Figure 1 shows the behavior of this time series for p₁ = 1, p₂ = 2, m = 30, ε(t) ~ U(0, 1) and 60 samples.
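As a minimal sketch of the setting of Figure 1, the series can be simulated as follows. The piecewise form used here (level p₁ up to and including m, level p₂ afterwards) is an assumption consistent with the description of the two operation points, and the function name `simulate_series` and the seed are hypothetical.

```python
import random

def simulate_series(p1, p2, m, n, noise, seed=0):
    """Return y(1..n): p1 + noise for t <= m, p2 + noise for t > m (assumed form)."""
    rng = random.Random(seed)
    return [(p1 if t <= m else p2) + noise(rng) for t in range(1, n + 1)]

# Setting of Figure 1: p1 = 1, p2 = 2, m = 30, noise ~ U(0, 1), 60 samples.
y = simulate_series(1.0, 2.0, m=30, n=60, noise=lambda rng: rng.random())

print(min(y[:30]), max(y[:30]))  # first regime stays inside [1, 2)
print(min(y[30:]), max(y[30:]))  # second regime stays inside [2, 3)
```

Other noise laws can be tried by swapping the `noise` argument, for example `lambda rng: rng.gauss(0.0, 1.0)` for N(0, 1).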

To establish notation, throughout this paper U(0, 1) denotes the uniform distribution on the interval [0, 1], N(0, 1) denotes the standard Normal distribution with zero mean and unit variance, t(5) denotes the Student's t distribution with 5 degrees of freedom, gamma(a, b) denotes the Gamma distribution with scale parameter a and shape parameter b, and Γ(a) denotes the gamma function evaluated at a. All time series, e.g., y(t), denote discrete-time sequences.

2 TIME SERIES TRANSFORMATION BY FUZZY SETS

One of the main applications of fuzzy set theory (Zadeh, 1965) is cluster detection, in which the problem of grouping the data into k separate categories is solved by assigning to each element a membership degree for each category.

Thus, in this section, the first step of the proposed formulation for the change point detection problem is detailed, namely the fuzzy clustering of the initial time series.

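Definition. Let y(t) be a time series, and consider a positive integer k. The set

C = {C_i | min{y(t)} < C_i < max{y(t)}, i = 1, . . . , k}

that solves the minimization problem

$$
\min_{C} \sum_{i=1}^{k} \sum_{y(t) \in C_i} \left\| y(t) - C_i \right\|^2 \tag{2}
$$

is called the fuzzy cluster centers set for the time series y(t). The fuzzy membership degree or fuzzy membership function value of y(t) ∈ C_i (which means y(t) belongs to the cluster C_i) is given by

$$
\mu_i(t) = \left[ \sum_{j=1}^{k} \frac{\left\| y(t) - C_i \right\|^2}{\left\| y(t) - C_j \right\|^2} \right]^{-1} \tag{3}
$$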

Given the set C of cluster centers, it is an easy task to measure the distance between each point in the time series y(t) and each center C_i. The problem of finding the centers can be solved, for instance, via k-means (Kaufman & Rousseeuw, 1990), C-means (Bezdek, 1981), and Kohonen network (Kohonen, 2001). In this paper, the Kohonen network technique is used.

The proposed fuzzy clustering to transform a given time series into a new one is described below:

1. Input the time series y(t);

2. Find the set of k centers C = {C_i | min{y(t)} < C_i < max{y(t)}, i = 1, 2} that minimizes the Euclidean distance given in (2), as illustrated in Figure 2 for the time series (1). For the purposes of this paper, k = 2 centers are considered.

3. Compute the fuzzy membership degree given in (3) for each sample of the time series y(t) with respect to each center C_i, as illustrated in Figure 3 for the time series (1).

Since the idea is to find just one change point, only two centers are found. The function µ₁(t) defines a new time series, in which the change point is detected.
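The three steps above can be sketched as follows. Two choices here are assumptions rather than the authors' implementation: the centers are found with a plain one-dimensional 2-means iteration (one of the alternatives to the Kohonen network mentioned above), and the membership degree is taken in the fuzzy c-means form with exponent 2, which for two centers reads µ₁ = d₂² / (d₁² + d₂²) and gives µ₁(t) → 1 as y(t) → C₁ and µ₁(t) → 0 as y(t) → C₂. The function names are hypothetical.

```python
import random

def two_centers(y, iters=100):
    """1-D 2-means started from the extremes of y (stand-in for the Kohonen step)."""
    c1, c2 = min(y), max(y)
    if c1 == c2:
        raise ValueError("the series is constant; two centers are not defined")
    for _ in range(iters):
        g1 = [v for v in y if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in y if abs(v - c1) > abs(v - c2)]
        n1, n2 = sum(g1) / len(g1), sum(g2) / len(g2)
        if (n1, n2) == (c1, c2):  # assignments no longer change
            break
        c1, c2 = n1, n2
    return c1, c2

def membership(y, c1, c2):
    """Membership of each sample in the cluster centered at c1 (assumed form)."""
    out = []
    for v in y:
        d1, d2 = (v - c1) ** 2, (v - c2) ** 2
        out.append(d2 / (d1 + d2))
    return out

# Step 1: input series (p1 = 1, p2 = 2, m = 30, uniform noise, 60 samples).
rng = random.Random(0)
y = [(1.0 if t <= 30 else 2.0) + rng.random() for t in range(1, 61)]

c1, c2 = two_centers(y)        # step 2
mu1 = membership(y, c1, c2)    # step 3

print("centers:", c1, c2)
print("mean membership before / after:", sum(mu1[:30]) / 30, sum(mu1[30:]) / 30)
```

The transformed series `mu1` takes values in [0, 1] and switches from high to low membership at the change point, which is the property the Bayesian step relies on.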

From Figure 3, it is noticeable that the fuzzy membership degree is concentrated around one for the first 30 values, observed before the change point (see also the histogram in Figure 4), and is concentrated around zero for the last 30 values, observed after the change point (see also the histogram in Figure 5). Furthermore, the distribution of µ₁(t), defined according to (3), is confined to the interval [0, 1], as shown below:

if y(t) → C₁ then µ₁(t) → 1⁻ ;

if y(t) → C₂ then µ₁(t) → 0⁺ ;

if C₁ → C₂ then µ₁(t) → 1/2 ;

if y(t) → [C₁, C₂] then µ₁(t) → [0, 1] .

Using the Kullback-Leibler divergence (Kullback & Leibler, 1951), one may conclude that the distributions of µ₁(t) before and after the change point can be approximated by Beta distributions with different parameters: for t ≤ m a beta(a, b) distribution is assumed, and for t > m a beta(c, d) distribution is assumed. For example, when there is a change point in the time series, the parameter a is greater than the parameter b in the Beta distribution of µ₁(t) for t ≤ m, and the parameter c is smaller than the parameter d in the Beta distribution of µ₁(t) for t > m. Figures 6-8 illustrate the distributions of µ₁(t) for the time series in (1) with sample size n = 200, assuming that the change point took place at position m = 100. This empirical test was performed on several time series with different probability distributions for ε(t): U(0, 1) in Figure 6, N(0, 1) in Figure 7, and t(5) in Figure 8. In all these cases, the transformed time series obtained by applying the clustering technique was well described by Beta distributions with similar shapes.
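A quick way to see the a > b and c < d pattern is a method-of-moments fit of a Beta distribution to each half of the transformed series. This is only an illustrative check, not the Kullback-Leibler comparison used in the paper. The centers 1.5 and 2.5 are assumed values (the regime means for p₁ = 1, p₂ = 2 and U(0, 1) noise), the membership uses the assumed two-center form d₂² / (d₁² + d₂²), and `beta_moments` is a hypothetical helper.

```python
import random

def beta_moments(x):
    """Method-of-moments estimates (a, b) of a Beta law for data in (0, 1)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

rng = random.Random(0)
n, m = 200, 100                      # sample size and change point of Figures 6-8
y = [(1.0 if t <= m else 2.0) + rng.random() for t in range(1, n + 1)]

c1, c2 = 1.5, 2.5                    # assumed cluster centers
mu1 = [(v - c2) ** 2 / ((v - c1) ** 2 + (v - c2) ** 2) for v in y]

a, b = beta_moments(mu1[:m])         # before the change point
c, d = beta_moments(mu1[m:])         # after the change point
print(a > b, c < d)                  # True True
```

Because a / b equals mean / (1 - mean) under this fit, a > b simply reflects memberships concentrated above 1/2 before the change, and c < d the opposite afterwards.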

Since the clustering technique assures the same type of probability distribution function for the new time series µ₁(t), this fixed statistical model can be assumed in the Bayesian formulation to detect the change point in the transformed time series. In this paper, the Metropolis-Hastings algorithm is used to sample from the posteriors, since it is a powerful and simple strategy for doing so.

3 METROPOLIS-HASTINGS FORMULATION

The goal of the Metropolis-Hastings algorithm (Gamerman, 1997) is to construct a Markov chain that has a specified equilibrium distribution π, which in this paper is the posterior distribution of all parameters in the change point model.

Define a Markov chain as follows. If X_{i−1} = x_{i−1}, then generate a candidate value x* = y from a reference distribution with density f_{X*|X}(y) = q(x_{i−1}, x*). The candidate value x* is then accepted or rejected as a sample of π. The acceptance probability is

$$
\alpha(x_{i-1}, x^*) = \min\left\{ 1, \frac{\pi(x^*)\, q(x^*, x_{i-1})}{\pi(x_{i-1})\, q(x_{i-1}, x^*)} \right\} \tag{4}
$$

which defines the transition kernel of the chain. If the candidate is accepted, then set X_i = x*; otherwise, set X_i = x_{i−1}. It is possible to show that, under general conditions, the sequence X₀, X₁, X₂, ... is a Markov chain with equilibrium distribution π.

In practical terms, the Metropolis-Hastings algorithm can be specified by the following steps:

1. Choose a starting value x₀ and the number of iterations R, and set the iteration counter r = 1;

2. Generate a candidate value y ~ q(x_{r−1}, ·);

3. Calculate the acceptance probability α(x_{r−1}, y) given in (4) and generate u ~ U(0, 1);

4. Compute the new value for the current state: x_r = y if u ≤ α(x_{r−1}, y), and x_r = x_{r−1} otherwise;

5. If r < R, set r = r + 1 and return to step 2. Otherwise stop.
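The five steps above can be sketched generically as follows. The target used in the example, an unnormalized Beta(2, 5) density sampled with a U(0, 1) reference distribution, is an illustrative choice and not taken from the paper. The function and argument names are hypothetical.

```python
import math
import random

def metropolis_hastings(log_target, propose, log_q, x0, R, seed=0):
    """Generic sampler; log_q(x, y) is the log density of proposing y from x."""
    rng = random.Random(seed)
    x, chain = x0, []                              # step 1
    for _ in range(R):
        y = propose(x, rng)                        # step 2
        log_alpha = min(0.0, log_target(y) + log_q(y, x)
                        - log_target(x) - log_q(x, y))   # step 3, in log form
        u = rng.random()
        if u < math.exp(log_alpha):                # step 4
            x = y
        chain.append(x)                            # step 5: continue until R draws
    return chain

def log_beta25(x):
    """Unnormalized log density of Beta(2, 5): x * (1 - x)**4 on (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return float("-inf")
    return math.log(x) + 4.0 * math.log(1.0 - x)

chain = metropolis_hastings(
    log_target=log_beta25,
    propose=lambda x, rng: rng.random(),   # reference distribution U(0, 1)
    log_q=lambda x, y: 0.0,                # uniform density on (0, 1)
    x0=0.5,
    R=20000,
)
print(sum(chain) / len(chain))  # should be close to the Beta(2, 5) mean, 2/7
```

Working with logarithms avoids underflow when the target is a product of many likelihood terms, as happens in the change point model.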

As discussed in the previous section, the clustering technique generates a transformed time series with the following distribution

$$
y(t) \sim \begin{cases} \mathrm{beta}(a, b), & t = 1, \ldots, m, \\ \mathrm{beta}(c, d), & t = m + 1, \ldots, n, \end{cases}
$$

where a, b, c, d and the change point m are the parameters to be estimated. To complete the model specification, prior distributions are elicited for the parameters. As usually done whenever there is no prior information available, in this paper the following non-informative prior distributions are assumed:

a ~ gamma (0.1, 0.1)

b ~ gamma (0.1, 0.1)

c ~ gamma (0.1, 0.1)

d ~ gamma (0.1, 0.1)

m ~ U[1, ... , n] , with p(m) = 1/n .

These non-informative gamma prior distributions, with parameters 0.1, have been chosen so as to spread the prior mass over the whole parameter space.

Since the posteriors do not have a closed form, the Metropolis-Hastings algorithm is used to sample from them. Its formulation is given by the pseudo-code in Table 1, and the reference distributions are presented in Appendix A. The change point m is obtained as the mode of its posterior distribution, which is approximated by the maximum of the histogram of the R simulations of the pseudo-code in Table 1, excluding the border points of the distribution. Figure 9 illustrates the convergence of the parameter m for p₁ = 1, p₂ = 2, ε(t) ~ U(0, 1), R = 100 and m = 30.
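The following is a minimal sketch of a sampler for this model, not a transcription of Table 1. Several choices are assumptions: gamma(0.1, 0.1) is read as shape 0.1 and rate 0.1; the parameters a, b, c, d are updated one at a time with a multiplicative random walk (a substitute for the prior-based reference distributions of the paper, chosen here because it mixes more easily); m is proposed from its uniform prior; and the transformed series is built with assumed centers 1.5 and 2.5. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def seg_loglik(a, b, cnt, s_log, s_log1m):
    """Beta(a, b) log-likelihood of a segment from its sufficient statistics."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return cnt * log_norm + (a - 1.0) * s_log + (b - 1.0) * s_log1m

def log_prior(v, shape=0.1, rate=0.1):
    """Unnormalized gamma log density (assumption: shape 0.1, rate 0.1)."""
    return (shape - 1.0) * math.log(v) - rate * v

def sample_change_point(x, R=3000, burn=1000, step=0.3, seed=1):
    rng = random.Random(seed)
    n = len(x)
    cs_log, cs_log1m = [0.0], [0.0]       # prefix sums of log x and log(1 - x)
    for v in x:
        cs_log.append(cs_log[-1] + math.log(v))
        cs_log1m.append(cs_log1m[-1] + math.log(1.0 - v))

    def stats(m, first):
        if first:                          # segment t = 1, ..., m
            return m, cs_log[m], cs_log1m[m]
        return n - m, cs_log[n] - cs_log[m], cs_log1m[n] - cs_log1m[m]

    def accept(log_ratio):
        return rng.random() < math.exp(min(0.0, log_ratio))

    theta = [1.0, 1.0, 1.0, 1.0]           # a, b, c, d
    m = max(1, n // 6)                     # deliberately poor starting value

    def total(mm):
        return (seg_loglik(theta[0], theta[1], *stats(mm, True))
                + seg_loglik(theta[2], theta[3], *stats(mm, False)))

    draws = []
    for r in range(R):
        for j in range(4):
            first = j < 2                  # a, b use t <= m; c, d use t > m
            k = 0 if first else 2
            new = theta[:]
            new[j] = theta[j] * math.exp(step * rng.gauss(0.0, 1.0))
            s = stats(m, first)
            log_ratio = (seg_loglik(new[k], new[k + 1], *s)
                         - seg_loglik(theta[k], theta[k + 1], *s)
                         + log_prior(new[j]) - log_prior(theta[j])
                         + math.log(new[j]) - math.log(theta[j]))  # proposal correction
            if accept(log_ratio):
                theta = new
        m_new = rng.randint(1, n)          # proposal equal to the uniform prior
        if accept(total(m_new) - total(m)):
            m = m_new
        if r >= burn:
            draws.append(m)
    return Counter(draws).most_common(1)[0][0]   # posterior mode of m

rng = random.Random(0)
y = [(1.0 if t <= 30 else 2.0) + rng.random() for t in range(1, 61)]
c1, c2, eps = 1.5, 2.5, 1e-6               # assumed centers; eps keeps x inside (0, 1)
mu1 = [min(max((v - c2) ** 2 / ((v - c1) ** 2 + (v - c2) ** 2), eps), 1.0 - eps)
       for v in y]
m_hat = sample_change_point(mu1)
print(m_hat)  # expected to fall at or next to the true change point, 30
```

The prefix sums make each likelihood evaluation constant-time, so the cost of a sweep does not depend on the length of the series.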

Table 2 shows the results for 100 replications of the time series, assuming different values of the parameter p₂ and different distributions for ε(t). It is clear from Table 2 that the proposed methodology detected the change point correctly in 100% of the replications in almost all situations. However, when the time series changes from p₁ = 1 to p₂ = 1.1, the proposed approach detected points in the neighborhood of m = 30, including m = 30 itself. In Table 2, the sign 'XX' indicates that the percentage of correct detections was not conclusive. This is illustrated in Figure 10, where a test was performed assuming p₁ = 1, p₂ = 1.1, ε(t) ~ U(0, 1), R = 1000 and m = 30. This behavior is due to the fact that both the noise and the signal change have the same amplitude.

4 A CASE STUDY

The energy crisis faced by Brazil in early 2001 triggered a process of energy rationing in the country. This phenomenon modified habits and behaviors that until then had been taken as standard in electricity consumption in Brazil. Since then, more useful forecasting studies for the electricity sector have become necessary. In this case study, the proposed methodology is used to assess the change in the trend of a particular time series, which characterizes the energy consumption in Southeast Brazil only (industry trade), in the period from January 1976 to December 2000 (five months before the blackout in 2001), as presented in Figure 11; the data are available at (IPEA).

Since this time series may have experienced a change in its trend over the analyzed period, it is important to find this change point before modeling the entire process more precisely, as discussed in the introduction. Before applying the proposed methodology, the series was decomposed into a trend time series and a cycle time series (Hodrick & Prescott, 1997). The trend time series is illustrated in Figure 12. One can notice changes in the growth rate of consumption between points 200 and 250, and between points 250 and 300. To detect change points in the trend of Figure 12, its time derivative was taken, as illustrated in Figure 13. The result of the proposed methodology for change point detection in the series of Figure 13 is illustrated in Figure 14, which shows the effectiveness of the technique. Notice that the results in Figure 14 were obtained after 1000 simulations of the pseudo-code in Table 1.
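The preprocessing described above can be sketched as follows: a Hodrick-Prescott trend obtained by solving the linear system (I + λDᵀD)τ = y, where D is the second-difference operator, followed by a first difference as a discrete time derivative. The smoothing parameter λ = 14400 is a common convention for monthly data and is an assumption here, since its value is not stated in the text. The synthetic kinked series only stands in for the consumption data, and the function name is hypothetical.

```python
def hp_trend(y, lam=14400.0):
    """Hodrick-Prescott trend: solve (I + lam * D'D) tau = y, D = second difference."""
    n = len(y)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    w = (1.0, -2.0, 1.0)
    for i in range(n - 2):                 # accumulate lam * D'D row by row of D
        for p in range(3):
            for q in range(3):
                A[i + p][i + q] += lam * w[p] * w[q]
    b = list(y)
    # Gaussian elimination without pivoting; the matrix is symmetric positive
    # definite with bandwidth 2, so only two rows below the pivot are touched.
    for i in range(n):
        for r in range(i + 1, min(i + 3, n)):
            f = A[r][i] / A[i][i]
            if f != 0.0:
                for c in range(i, min(i + 3, n)):
                    A[r][c] -= f * A[i][c]
                b[r] -= f * b[i]
    tau = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution
        s = b[i] - sum(A[i][c] * tau[c] for c in range(i + 1, min(i + 3, n)))
        tau[i] = s / A[i][i]
    return tau

# Synthetic stand-in: growth rate 1.0 up to t = 50, then 0.2.
y = [1.0 * t if t < 50 else 50.0 + 0.2 * (t - 50) for t in range(100)]
trend = hp_trend(y)
slope = [trend[t + 1] - trend[t] for t in range(len(trend) - 1)]
print(slope[5], slope[-5])  # growth rate early in the series vs. late in the series
```

Differencing turns a change in growth rate into a change in level, which is the two-operation-point form that the detection method assumes.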

Furthermore, this methodology cannot be used as a "blackout detector", but rather as an auxiliary tool in the forecasting process for this time series, to be used in planning the expansion of the energy sector.

5 CONCLUSIONS

In this paper, a novel fuzzy/Bayesian methodology for change point detection in time series was proposed. The methodology is based on a two-step formulation. In the first step, a fuzzy clustering generates a transformed time series with Beta distribution. In the second step, Bayesian modeling is used for change point detection in the transformed time series. The Metropolis-Hastings algorithm is used to sample from the posteriors, and the position of the change point is estimated by the posterior mode.

ACKNOWLEDGEMENTS

This work has been supported by Brazilian agencies FAPEMIG and CNPq.

Received February 4, 2009 / Accepted September 8, 2010

APPENDIX A

This appendix presents the acceptance probabilities for the posterior distributions of the parameters a, b, c, d and m described in the Metropolis-Hastings algorithm. The reference distributions used in this work are the prior distributions gamma(0.1, 0.1).
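As a sketch of why the likelihood alone drives the acceptance step in this setting, consider the parameter a with b and m held fixed. The symbols p(·) for the prior density and L₁(a, b) for the likelihood of the first segment are notation introduced here for illustration. When the reference distribution equals the prior, the prior terms in the general Metropolis-Hastings acceptance probability cancel:

```latex
% Conditional posterior of a: \pi(a \mid \cdot) \propto L_1(a, b)\, p(a),
% reference distribution: q(a, a^*) = p(a^*).
\alpha(a, a^*)
  = \min\left\{ 1,\;
      \frac{L_1(a^*, b)\, p(a^*)\, p(a)}{L_1(a, b)\, p(a)\, p(a^*)} \right\}
  = \min\left\{ 1,\; \frac{L_1(a^*, b)}{L_1(a, b)} \right\},
\qquad
L_1(a, b) = \prod_{t=1}^{m}
  \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\;
  y(t)^{a-1}\,\bigl(1 - y(t)\bigr)^{b-1}.
```

The same cancellation applies to b, to c and d with the product taken over t = m + 1, ..., n, and to m when it is proposed from its uniform prior.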

The acceptance probability in the Metropolis-Hastings algorithm is computed from the likelihood function:

1. For the parameter a:

2. For the parameter b:

3. For the parameter c:

4. For the parameter d:

5. For the parameter m:

Since π(m^(·)) ~ U(0, 1), then:

[1] BARRY D & HARTIGAN JA. 1993. A Bayesian Analysis for Change Point Problems. Journal of the American Statistical Association, 88(421): 309-319.
[2] BECKAGE B, JOSEPH L, BELISLE P, WOLFSON DB & PLATT WJ. 2007. Bayesian change-point analyses in ecology. New Phytologist, 174: 456-467.
[3] BEZDEK JC. 1981. Pattern recognition with fuzzy objective function algorithms. Plenum Press.
[4] GAMERMAN D. 1997. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. Chapman & Hall.
[5] HADJILIADIS O & MOUSTAKIDES V. 2006. Optimal and Asymptotically Optimal CUSUM Rules for Change Point Detection in the Brownian Motion Model with Multiple Alternatives. Theory of Probability and its Applications, 50(1): 75-85.
[6] HARTIGAN JA. 1990. Partition Models. Communications in Statistics - Theory and Methods, 19(8): 2745-2756.
[7] HINKLEY DV. 1971. Inference About the Change Point from Cumulative Sum Test, 26: 279-284.
[8] HODRICK R & PRESCOTT EC. 1997. Postwar U.S. Business Cycles: An Empirical Investigation. Journal of Money, Credit, and Banking, 29: 1-16.
[9] IPEA - INSTITUTO DE PESQUISA ECONÔMICA APLICADA. (n.d.). Retrieved May 10, 2008, from http://www.ipeadata.gov.br
[10] KAUFMAN L & ROUSSEEUW PJ. 1990. Finding groups in data: An introduction to cluster analysis. John Wiley & Sons.
[11] KOHONEN T. 2001. Self-organizing maps. Springer.
[12] KULLBACK S & LEIBLER RA. 1951. On Information and Sufficiency. Annals of Mathematical Statistics, 22(1): 79-86.
[13] LEE S, NISHIYAMA Y & YOSHIDA N. 2006. Test for Parameter Change in Diffusion Processes by Cusum Statistics Based on One-step Estimators. Annals of the Institute of Statistical Mathematics, 58(2): 211-222.
[14] LEE S, PARK S, MAEKAWA K & KAWAI K. 2006. Test for Parameter Change in ARIMA Models. Communications in Statistics: Simulation and Computation, 35(2): 429-439.
[15] LOSCHI RH & CRUZ FRB. 2005a. Bayesian Identification of Multiple Change Points in Poisson Data. Advances in Complex Systems, 8: 465-482.
[16] LOSCHI RH & CRUZ FRB. 2005b. Extension to the product partition model: computing the probability of a change. Computational Statistics and Data Analysis, 48(2): 255-268.
[17] LOSCHI RH, GONÇALVES FB & CRUZ FBR. 2005. Avaliação de uma medida de evidência de um ponto de mudança e sua utilização na identificação de mudanças na taxa de criminalidade em Belo Horizonte. Pesquisa Operacional, 25(3): 459-463.
[18] OH KJ, ROH TH & MOON MS. 2005. Developing time-based clustering neural networks to use change-point detection: Application to financial time series. Asia-Pacific Journal of Operational Research, 22(1): 51-70.
[19] ZADEH LA. 1965. Fuzzy Sets. Information and Control, 8(3): 338-353.

Publication in this collection: 05 Aug 2011. Date of issue: Aug 2011.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
