
Introducing the q-Theil index

Abstract

Starting from the idea of Tsallis on non-extensive statistical mechanics and the q-entropy notion, we recall the Theil index Th and transform it into the Th_q index. Both indices can be used to map any time series onto a new one in a nonlinear way. We develop an application of the Th_q index to the GDP evolution of 20 rich countries in the time interval [1950-2003] and search for a proof of globalization of their economies. First we calculate the distances between the "new" time series and to their mean, from which simple networks are constructed. We emphasize that it is useful to, and we do, take into account different time "parameters": (i) the moving average time window for the raw time series to calculate the Th_q index; (ii) the moving average time window for calculating the time series distances; (iii) a correlation time lag. This allows us to deduce optimal conditions to measure the features of the network, i.e. the appearance in 1970 of a globalization process in the economy of such countries and the present beginning of deviations. The q value hereby used is that which measures the overall data distribution and is equal to 1.8125.

Keywords: Econophysics, Time series analysis, Entropy



M. Ausloos (I),* and J. Miśkiewicz (II),†

* Electronic address: marcel.ausloos@ulg.ac.be
† Electronic address: jamis@ift.uni.wroc.pl

(I) GRAPES, ULG, B5a, Sart Tilman, B-4000 Liege, Euroland

(II) Institute of Theoretical Physics, Wrocław University, pl. M. Borna 9, 50-204 Wrocław, Poland


1. INTRODUCTION

Since the fundamental work of Boltzmann [1], the entropy concept has been developed and applied to a range of subjects, from elementary thermodynamics and statistical mechanics through quantum physics, e.g. [2], and information theory [3], up to applications in biology, e.g. [4-6], and economics [7-9]. Recently Tsallis and many others following his path have shaken up the usual considerations on the entropy concept, in particular within Shannon information theory.

In fact, complex nonequilibrium systems can often be described by a superstatistics, which results from a superposition of two statistics associated with two different time scales [7, 10-14]. Methods of extracting superstatistics parameters from time series are discussed in [15]. In that line of thought, special attention can be paid to the entropy of a time series.

On the other hand, the Theil index [16] is often used in economics and finance. It is defined through

$$ Th = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\langle x \rangle} \ln \frac{x_i}{\langle x \rangle} , \qquad (1) $$

where the average ⟨x⟩ is taken over the N members of the population. It looks like the Shannon entropy but was invented to consider the event values themselves, in particular the income x_i of agent i in a population of N agents, rather than their probability of occurrence. One peculiarity is that it measures each individual's share of income relative to the mean income ⟨x⟩ of the population. With reference to information theory, Theil's measure is the difference between the maximum entropy of the data and their actual entropy at that time. Thus from the Theil index one can look at correlations between data sets, distances, hierarchies, and other usual features, through various techniques of data analysis, like those resulting from network constructions.
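The definition of Eq.(1) can be illustrated with a minimal Python sketch, assuming strictly positive values x_i. The function name `theil_index` and the income values are hypothetical and serve only as an illustration.

```python
import math
from typing import Sequence


def theil_index(x: Sequence[float]) -> float:
    """Theil index of strictly positive values x_i, in the form of Eq.(1)."""
    n = len(x)
    mean = sum(x) / n
    # Each term weighs the share x_i/<x> by the logarithm of that share.
    return sum((xi / mean) * math.log(xi / mean) for xi in x) / n


# Made-up incomes: everyone at the mean, then one agent holding most of the total.
equal = theil_index([10.0, 10.0, 10.0, 10.0])
unequal = theil_index([1.0, 1.0, 1.0, 37.0])
```

When every agent earns the mean income the index vanishes; it grows as the distribution becomes more unequal and stays below ln N.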

An interesting development is to consider that the x_i quantity in Eq.(1) is time dependent. Thus one can generalize the Theil index in order to remap in a nonlinear way a time series x(t) into a Th(t), as done in Sect. 2, which recalls considerations outlined in [17]. Moreover, in the spirit of non-extensive statistical entropy, following Tsallis considerations, one can propose a q-Theil index, as also done in Sect. 2. The first application is made here to macroeconomic time series, in particular to the GDP of the richest countries. Following up on studies of correlations between GDPs of rich countries [17-25], we have analyzed web-downloaded data on GDP, used as individual wealth signatures of a country's economic state ("status"). We have calculated the fluctuations of the GDP and looked for correlations and "distances", as reported in Sect. 3.

Usually, a system is represented by a network, nodes being scalar agents, here the countries, while links are weights, i.e. here measures of distances between two Th(t) representing GDP fluctuation correlations between two countries. In order to extract structures from the networks, we have averaged the time correlations in different windows. This allows more robustness in the subsequent network properties and reveals evolving statistical distances. In line with our previous works [17-20] we have examined three different network constructions. A discussion on economy globalization follows with a conclusion in Sect. 4. It is found that such a measure of collective habits does fit the usual expectations defined by politicians or economists, i.e. common factors are to be searched for.

2. THEIL INDEX AND TSALLIS ENTROPY

The original definition of the Theil index, see Eq.(1), allows for a peculiar mapping of a 1-D data set, such as a time series, i.e. the Theil index nonlinearly maps the "original" time series A(t) into a new one through

$$ Th_A(t, T_1) = \frac{1}{T_1} \sum_{j \in [t,\, t+T_1]} \frac{A_j}{\langle A \rangle_{(t,T_1)}} \ln \frac{A_j}{\langle A \rangle_{(t,T_1)}} , \qquad (2) $$

where the average ⟨A⟩(t, T1) is made over the ensemble of points j in a time window of size T1, placed between t and t + T1:

$$ \langle A \rangle_{(t,T_1)} = \frac{1}{T_1} \sum_{j \in [t,\, t+T_1]} A_j . \qquad (3) $$

Thus the Theil index is calculated for the interval [t, t + T1]. Applications of the Theil index notion can be found in other papers [17, 20], where the Theil index was applied to measure the economy globalization process.
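The moving-window mapping of a series A(t) onto Th(t, T1) can be sketched as follows. The window convention is an assumption: each window is taken to hold the T1 consecutive points starting at t. The names `theil_window` and `theil_series` and the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def theil_window(a: Sequence[float]) -> float:
    """Theil index of the points of one window (strictly positive values)."""
    mean = sum(a) / len(a)
    return sum((v / mean) * math.log(v / mean) for v in a) / len(a)


def theil_series(a: Sequence[float], t1: int) -> List[float]:
    """Map a series onto Th(t, T1), one value per window start t."""
    # ASSUMPTION: a window holds the t1 consecutive points a[t], ..., a[t + t1 - 1].
    return [theil_window(a[t:t + t1]) for t in range(len(a) - t1 + 1)]


gdp = [100.0, 103.0, 107.0, 106.0, 112.0, 118.0, 121.0]  # made-up values
th = theil_series(gdp, t1=5)
```

With this convention a series of n points yields n - T1 + 1 mapped values; a window convention that includes both end points would shorten the output by one.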

In order to connect with Tsallis non-extensive statistics we introduce the q-Theil index for a time series A(t)

in the interval [t, t + T1]. Eq.(4) corresponds to Eq.(2) when q →1.
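One form consistent with the stated q → 1 limit replaces the natural logarithm of Eq.(2) by the Tsallis q-logarithm, ln_q(x) = (x^(1-q) - 1)/(1 - q). That choice is an assumption of the sketch below and is not taken from the source. The names `q_log` and `q_theil_window` and the window values are hypothetical; the q value is the one quoted in the abstract.

```python
import math
from typing import Sequence


def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm; reduces to the natural logarithm as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_theil_window(a: Sequence[float], q: float) -> float:
    """q-Theil index of one window of strictly positive values."""
    # ASSUMPTION: the logarithm of the Theil index is replaced by the q-logarithm.
    mean = sum(a) / len(a)
    return sum((v / mean) * q_log(v / mean, q) for v in a) / len(a)


window = [100.0, 103.0, 107.0, 106.0, 112.0]  # made-up values
th_1 = q_theil_window(window, 1.0)             # ordinary Theil index
th_near_1 = q_theil_window(window, 1.0 + 1e-6)
th_q = q_theil_window(window, 1.8125)
```

Under this assumed form the index stays non-negative for q < 2, since x ln_q(x) is then convex and vanishes at x = 1.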

In order to compare time series, their distance can be immediately introduced. Moreover the mean and standard deviations (std) of an ensemble of such distances can be used in further considerations. The distance between two time series (here the Theil-mapped time series) is hereby defined as the absolute value of the difference between mean values in the interval [t, t + T2]. Moreover the elements of the time series can be taken at equal times or with the time lag τ, a possibility which we also take into consideration for generality purposes. Thus we define

In Eq.(5) the mean value, denoted by angle brackets ⟨...⟩, is defined as in Eq.(3).

As a result we have three different time parameters:

1. the T1 time window used while calculating the Th_q index,

2. the time lag τ, and

3. the correlation window T2.

Note: in the analysis both time windows (Th_q and correlation) are applied one after the other, so the total size of the time window is equal to the sum of the Th_q and correlation time windows. Therefore the number of generated networks is equal to the time series length minus the total time window size.
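The verbal definition of the distance (absolute difference of the window means of two mapped series, optionally with a lag) can be sketched as below. Two conventions are assumptions: a window holds T2 consecutive points, and the lag τ shifts the second series. The function names and the input values are hypothetical.

```python
from typing import List, Sequence


def window_mean(s: Sequence[float], t: int, t2: int) -> float:
    """Mean of the t2 consecutive points starting at index t."""
    return sum(s[t:t + t2]) / t2


def distance_series(th_a: Sequence[float], th_b: Sequence[float],
                    t2: int, tau: int) -> List[float]:
    """Distance between two mapped series for every admissible start time t."""
    # ASSUMPTIONS: windows of t2 consecutive points; the lag tau shifts th_b only.
    n = min(len(th_a), len(th_b) - tau) - t2 + 1
    return [abs(window_mean(th_a, t, t2) - window_mean(th_b, t + tau, t2))
            for t in range(n)]


th_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # made-up mapped series
th_b = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
d = distance_series(th_a, th_b, t2=2, tau=1)
```

Each additional unit of window size or lag removes one admissible start time, which is why the number of networks shrinks with T1, T2 and τ.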

3. MACROECONOMY INDEX INPUT AND NETWORK CONSTRUCTION

3.1. GDP data

GDP data sets of the richest OECD countries were used, i.e. Austria, Belgium, Canada, Denmark, Finland, France, Greece, Ireland, Italy, Japan, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, Turkey, U.K., U.S.A., and Germany, allowing for a linear superposition of the data before the reunification in 1991 in the latter case; an "All" country is also introduced as in previous works [17-20]. (The set deviates somewhat from previous works [17, 19, 20] since there is neither Iceland nor Luxembourg but there is Turkey in the present paper.) Thus N = 21. The data start in 1950 and end in 2003, so there are 54 data points in every time series.

3.2. Networks

The distance matrices obtained from Eq.(5) are analysed by constructing three network structures and analysing statistical properties of the distances between nodes. The following networks are considered: unidirectional minimal length path (UMLP), bidirectional minimal length path (BMLP) and locally minimal spanning tree (LMST). The algorithms generating the mentioned networks are:

UMLP: The network begins with an arbitrarily chosen country, here the All country; then the closest neighbouring country is attached and becomes the end of the network. The country closest to the end of the network is then searched for and attached. The process continues until all countries are attached to the network.

BMLP: The network begins with the pair of countries with the smallest distance between them. Then the countries closest to each end of the network are searched for, and the one with the shorter distance is attached to the appropriate end. The algorithm continues until all countries become nodes of the network.

LMST: The root of the network is the pair of closest neighbouring countries. Then the country closest to any node of the network is searched for and attached. The algorithm continues until all countries are attached to the network.
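The two chain constructions can be sketched in Python as follows, taking a symmetric distance matrix as input. Node labels are integers; the tie-breaking rule (lowest index first) and the toy matrix are assumptions made for the illustration, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def umlp(dist: Matrix, start: int) -> List[int]:
    """Unidirectional chain: always attach the free node closest to the current end."""
    chain = [start]
    free = set(range(len(dist))) - {start}
    while free:
        end = chain[-1]
        nearest = min(free, key=lambda k: (dist[end][k], k))
        chain.append(nearest)
        free.remove(nearest)
    return chain


def bmlp(dist: Matrix) -> List[int]:
    """Bidirectional chain: seed with the closest pair, then grow at either end."""
    n = len(dist)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    i, j = min(pairs, key=lambda p: dist[p[0]][p[1]])
    chain = [i, j]
    free = set(range(n)) - {i, j}
    while free:
        head, tail = chain[0], chain[-1]
        to_head = min(free, key=lambda k: (dist[head][k], k))
        to_tail = min(free, key=lambda k: (dist[tail][k], k))
        if dist[head][to_head] < dist[tail][to_tail]:
            chain.insert(0, to_head)
            free.remove(to_head)
        else:
            chain.append(to_tail)
            free.remove(to_tail)
    return chain


# Toy symmetric distance matrix for four nodes (made-up values).
D = [
    [0.0, 1.0, 3.0, 1.5],
    [1.0, 0.0, 2.0, 2.5],
    [3.0, 2.0, 0.0, 4.5],
    [1.5, 2.5, 4.5, 0.0],
]
u_chain = umlp(D, start=0)
b_chain = bmlp(D)
```

On this toy matrix the two chains differ: the unidirectional one can only grow at its end, whereas the bidirectional one attaches node 3 at the head because it lies closer to node 0 than any free node lies to the tail.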

Notice that in the UMLP construction, All is at the beginning of the chain, while in the other two constructions, All is treated as a "normal" country. The BMLP and LMST network seeds are the pairs of closest countries according to the appropriate distance matrix. The first two networks are linear, and essentially robust against a "perturbation", as when removing or adding a country or in the case of a regrettable mathematical error, since they are based on a measure relative to a statistical mean, while the LMST is obviously a tree, rather compact when only 21 data points, thus very few branching levels, are involved in the construction. It is known that such a tree is far from robust.
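The tree construction can be sketched in the same spirit: starting from the closest pair, the free node closest to any node already in the tree is attached, which is a Prim-like growth. The function name `lmst` and the toy matrix are hypothetical, and ties are not handled in any special way.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def lmst(dist: Matrix) -> List[Tuple[int, int]]:
    """Grow a tree from the closest pair; return links as (tree node, new node)."""
    n = len(dist)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    i, j = min(pairs, key=lambda p: dist[p[0]][p[1]])
    links = [(i, j)]
    in_tree = {i, j}
    while len(in_tree) < n:
        # Shortest link between a node of the tree and a node still outside it.
        candidates = [(u, v) for u in sorted(in_tree)
                      for v in range(n) if v not in in_tree]
        u, v = min(candidates, key=lambda e: dist[e[0]][e[1]])
        links.append((u, v))
        in_tree.add(v)
    return links


# Toy symmetric distance matrix with node 0 close to all others (made-up values).
D = [
    [0.0, 1.0, 1.2, 1.3],
    [1.0, 0.0, 2.0, 2.1],
    [1.2, 2.0, 0.0, 2.2],
    [1.3, 2.1, 2.2, 0.0],
]
tree = lmst(D)
total_length = sum(D[u][v] for u, v in tree)
```

Here every node attaches directly to node 0, giving a star with a single branching level, the kind of compact structure expected when few nodes are involved.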

4. RESULTS

First let us report that a q value must be given for the considered data set. It could be left as a free parameter, and one could find some optimal value according to one or several criteria. Here, i.e. for the GDP time series in 1950-2003 for the countries defined in Sec. 3.1, we have calculated the q value by the maximum likelihood estimator [26], as for the Tsallis entropy, and found q = 1.8315, hereby used to calculate distances through Eq.(4) and Eq.(5). It is fair to recall that Borges [27] calculated the (Tsallis entropy) q value for the GDP of the USA, Brazil, Germany and the UK. He found a q value varying from 1.4 (UK) up to 2.1 (Brazil), and q = 1.7 for the USA.
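For orientation, the continuous power-law maximum likelihood estimator given in [26] can be sketched as below. How the fitted distribution is converted into a q value is not specified in the text, so no such conversion is attempted; the function name `power_law_exponent`, the threshold `x_min` and the sample are hypothetical.

```python
import math
from typing import Sequence


def power_law_exponent(x: Sequence[float], x_min: float) -> float:
    """Maximum likelihood exponent of a continuous power law for x >= x_min."""
    tail = [v for v in x if v >= x_min]
    # alpha = 1 + n / sum(ln(x_i / x_min)), with n the number of tail points.
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


sample = [0.5, 1.0, math.e, math.e ** 2]  # made-up values; 0.5 lies below x_min
alpha = power_law_exponent(sample, x_min=1.0)
```

The estimate depends on the chosen lower cutoff, which is part of why an "optimal" q cannot be read off the data without stating the fitting procedure.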

4.1. q-Theil distance statistics

In our analysis UMLP, BMLP and LMST networks were constructed for all time windows ranging from T1 = 5 yrs, T2 = 1 yr, moving along the time axis by a one-year step. Eleven time lag values were considered: τ ∈ {0, 1, ..., 10}. The T1, T2 and τ parameters satisfy the inequality T1 + T2 + τ < 54 yrs, so the number of generated networks (Net) depends on the time window sizes and is equal to N_Net = 54 - T1 - T2 - τ for a given triplet, times 3 because of the three types of network considered. In total this is a huge number of networks, so only some cases are extracted for the present report (all cases are available from the authors upon request). Different presentations can be made in a three-dimensional time coordinate space. We propose a visualisation of the data through a spectrogram method, using for the x and y axes the time windows T2 and T1 respectively, for a given τ. The data values are represented by a colored pixel in a convenient order (the results are presented in grey tones, but online figures are available in color). The mean values and the corresponding standard deviations are presented below.
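The count N_Net = 54 - T1 - T2 - τ can be checked with a few lines of Python. The function name `n_networks` is hypothetical, and the triplets are those discussed later in the text.

```python
def n_networks(t1: int, t2: int, tau: int, n_points: int = 54) -> int:
    """Number of networks of one type for a (T1, T2, tau) triplet."""
    return max(0, n_points - t1 - t2 - tau)


shortest = n_networks(5, 1, 0)       # smallest windows, no lag
medium = n_networks(10, 10, 10)      # T1 = T2 = 10 yrs, lag of 10 yrs
longest = n_networks(15, 15, 10)     # T1 = T2 = 15 yrs, lag of 10 yrs
all_types = 3 * medium               # UMLP, BMLP and LMST together
```

Long windows combined with long lags therefore leave only a short stretch of years over which the network evolution can be followed.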

The mean value and standard deviation of the distances between nodes as a function of the T1 and T2 are presented in Figs. 1 - 9 for the time lag τ = 0, 5, 10 yrs. The largest value of the mean distance, the minimum mean distance, the maximum and minimum standard deviations as a function of the time windows T1, T2 and time lags are presented in Table 5.



It can first be observed that, in general, the mean distance between countries and the corresponding standard deviation are the largest for UMLP networks and the smallest for LMST networks. It is also worth noticing that the mean distance depends on the time lag value: if the time lag is large, the mean distance is large as well. The maximum of the mean distance occurs for the longest T1 and the shortest T2 window sizes. The minimum mean distance is found with the opposite combination of time window sizes, i.e. small T1 and large T2. The standard deviations increase with the time lag and are the largest in the case of the longest considered time lag.
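The two statistics discussed here can be computed from the link lengths of a network, as in the sketch below. Two points are assumptions: the "distances between nodes" are read as the lengths of the links actually present in the network, and the population standard deviation is used. The chain, the matrix and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def chain_links(chain: Sequence[int]) -> List[Tuple[int, int]]:
    """Consecutive pairs of a linear network."""
    return list(zip(chain[:-1], chain[1:]))


def link_lengths(dist: Matrix, links: Sequence[Tuple[int, int]]) -> List[float]:
    """Lengths of the given links, read from the distance matrix."""
    return [dist[i][j] for i, j in links]


# Toy symmetric distance matrix and a chain built on it (made-up values).
D = [
    [0.0, 1.0, 3.0, 1.5],
    [1.0, 0.0, 2.0, 2.5],
    [3.0, 2.0, 0.0, 4.5],
    [1.5, 2.5, 4.5, 0.0],
]
lengths = link_lengths(D, chain_links([3, 0, 1, 2]))
mean_distance = statistics.mean(lengths)
# ASSUMPTION: population standard deviation; the sample version would be statistics.stdev.
std_distance = statistics.pstdev(lengths)
```

Repeating this for every start time t gives the time evolution of the mean distance and of its standard deviation for a chosen (T1, T2, τ).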

4.2. q-Theil network evolution

For further discussion the following time window sizes were chosen, i.e. (T1 = 5 yrs, T2 = 10 yrs), (T1 = 10 yrs, T2 = 5 yrs), (T1 = 10 yrs, T2 = 10 yrs), (T1 = 15 yrs, T2 = 15 yrs), for the three time lags τ = 0, 5, 10 yrs. The evolutions of mean distance between countries and the corresponding standard deviations for these chosen time windows sizes are presented in Figs. 10-21. Arrows and straight lines indicate remarkable features.


The general observations to be made at this stage are the following:

• In all considered networks (UMLP, BMLP and LMST) and for all window sizes, three types of evolution can be distinguished: increasing, decreasing and relatively stable mean distance between countries.

• These three types of evolution are better seen for long lag times (τ > 5 yrs). Therefore the lag time seems to be crucial in any analysis and discussion of the globalization process. This might suggest that some countries play the role of leaders while others follow their way.

• It is worth noticing that for the very long lag time τ = 10 yrs and time windows [(T1 = 5 yrs, T2 = 10 yrs), (T1 = 10 yrs, T2 = 5 yrs), (T1 = 10 yrs, T2 = 10 yrs)] the maximum of the mean distance occurs around 1960; since then the size of the network(s) decreases fast over a decade, up to 1970, thereafter remains small and relatively stable up to 2000 or so, when the mean size seems to increase again.

5. CONCLUSIONS

In conclusion, the most interesting results of this analysis are

• The analysis shows the existence of a globalization process from 1960 till 1970, observed as a decrease of the network size, its stabilisation thereafter, and a destabilisation after 2000.

• The observation of the globalization process does not depend on the type of network constructed.

• The mean distance between countries and the corresponding std are the largest for the UMLP networks and the smallest for the corresponding LMST networks.

• With increasing time lag the Theil mapping window size T1 at which the maximum of the network size is found is always decreasing.

• The globalization process is better seen if the lag time is greater than 5 yrs, -which might be considered as the time needed for some synchronization process, but is also in fact commensurate with most government life times and election time intervals. These conjectures suggest further investigations.

• Even though for large time lags the mean values are large, the globalization evolution is the same as for short time lags (greater than 5 yrs); thus a large time lag magnifies the globalization process feature which is, therefore, easier to observe.

Table I

Of course much more work is in order to connect the above to some non-extensive thermostatistics ideas. The search for a robust ("optimal") q value and the significance of the q-Theil index are open questions. Finally let us stress the interest of studying graphs, in particular of deriving weighted networks such as in this paper, in order to have some comparative coherence in data organisation.

The ordinary q = 1 case has been presented and discussed at the 2009 Medyfinol congress [28], where several figures illustrate the development. In short, there is not much difference between the q = 1 and q ≠ 1 cases, neither in the numerical results nor in the main macroeconomic conclusions. However we have found that the q = 1 development leads to less "stability" or "coherence" in the results, in particular when they are observed as a function of the time window. Such visual features cannot be simply quantified through statistical tests. Therefore we let readers compare for themselves the data in both publications.

5.1. Acknowledgements

Thanks to the organizers of NEXT2008 in Foz do Iguaçu, Paraná, in particular Luis C. Malacarne and Reino S. Mendes, plus kind thanks to Thais Pedreira, for their welcome and careful attention to MA's needs at the meeting. Let C. Tsallis be congratulated for his great insight, for what he brought to modern equilibrium thermostatistics, and for suggesting MA's participation at such a meeting.

[16] H. Theil was a Dutch econometrician who was born on 13 October 1924 in Amsterdam, graduated from the University of Amsterdam, succeeded Jan Tinbergen at the Erasmus University Rotterdam, and later moved to teach in Chicago and at the University of Florida. He died in 2000.

(Received on 04 December, 2008)

  • [1] L. Boltzmann, Vorlesungen über Gastheorie: 2 Volumes (Leipzig, 1895/98).
  • [2] S. Gheorghiu-Svirschevski, Phys. Rev. A 63, 022105 (2001).
  • [3] C. E. Shannon, Bell Syst. Tech. J. 30, 50 (1950).
  • [4] J. L. Ruiz de la Torre, A. Velarde, and X. Manteca, Anim. Behav. 59, 269 (2000).
  • [5] D. R. Brooks and E. O.Wiley, Evolution As Entropy: Toward a Unified Theory of Biology (University of Chicago Press, 1988).
  • [6] M. Gell-Mann and C. Tsallis (eds.), Nonextensive Entropy: Interdisciplinary Applications (Oxford University Press, 2004).
  • [7] M. Ausloos and K. Ivanova, Phys. Rev. E 68, 046122 (2003).
  • [8] J. A. Duro and J. Esteban, Econom. Lett. 60, 269 (1998).
  • [9] J. A. James and M. Thomas, J. Income Distrib. 9, 39 (2000).
  • [10] C. Beck and E. G. D. Cohen, Physica A 322, 267 (2003).
  • [11] A. Mathai and H. Haubold, Physica A 385, 493 (2007).
  • [12] H. Touchette and C. Beck, Phys. Rev. E 71, 016131 (2005).
  • [13] C. Tsallis and A. M. C. Souza, Phys. Rev. E 67, 026106 (2003).
  • [14] J.-P. Bouchaud and M. Potters, Theory of Financial Risk and Derivative Pricing (Cambridge University Press, 2003).
  • [15] C. Beck, E. G. D. Cohen, and H. L. Swinney, Phys. Rev. E 72, 056133 (2005).
  • [17] J. Miskiewicz, Physica A 387, 6595 (2008).
  • [18] J. Miskiewicz and M. Ausloos, Acta Phys. Pol. B 36, 2477 (2005).
  • [19] J. Miskiewicz and M. Ausloos, Int. J. Mod. Phys. C 17, 317 (2006).
  • [20] J. Miskiewicz and M. Ausloos, Physica A 387, 6584 (2008).
  • [21] M. Ausloos and R. Lambiotte, Physica A 382, 16 (2007).
  • [22] M. Ausloos and M. Gligor, Eur. Phys. J. B 57, 139 (2007).
  • [23] M. Gligor and M. Ausloos, J. Econ. Integration 23, 297 (2008).
  • [24] M. Ausloos and M. Gligor, Physica A 114, 491 (2008).
  • [25] M. Gligor and M. Ausloos, Eur. Phys. J. B 63, 533 (2008).
  • [26] A. Clauset, C. R. Shalizi, and M. E. J. Newman, Power-law distributions in empirical data, E-print, arXiv:0706.1062 (2007).
  • [27] E. P. Borges, Physica A 334, 255 (2004).
  • [28] M. Ausloos and J. Miskiewicz, Medyfinol 2009, submitted for publication, arXiv:0906.2379
  • Publication Dates

    • Publication in this collection
      10 Sept 2009
    • Date of issue
      Aug 2009

    History

    • Accepted
      04 Dec 2009
    Sociedade Brasileira de Física Caixa Postal 66328, 05315-970 São Paulo SP - Brazil, Tel.: +55 11 3091-6922, Fax: (55 11) 3816-2063 - São Paulo - SP - Brazil
    E-mail: sbfisica@sbfisica.org.br