USE OF STOCHASTIC VOLATILITY MODELS IN THE VARIABILITY OF PASSENGERS AND CARGO TRANSPORT IN SOME AIRPORTS IN SÃO PAULO STATE, BRAZIL

The proposed study is related to the identification of important factors that determine the feasibility of regional airport hubs in São Paulo State, Brazil. This way, new perspectives are created to map airports in this region based on economic criteria of operations and volumes. The study is based on statistical analysis of time series of operations and of passenger and cargo volumes over a fixed period. Stochastic volatility models are applied to the log-transformed count data of the four largest airports, selected from a group of 32 airports in São Paulo State by their importance in passenger and cargo volume, for the period from 2008 to 2014. This study is a new approach to the analysis of air transport time series.


INTRODUCTION
The literature presents many studies on hub air transport networks associated with the transport of passengers and with hub location problems (the so-called p-hubs), and few studies associated with the transport of cargo. The goal of the present study is to relate some important economic factors to the monthly count of passengers and the volume of air cargo transported in some important airports located in an industrialized region of Brazil.
This way, a new statistical modeling approach is proposed for the analysis of airport operations in relation to some economic covariates. To this end, we propose the use of stochastic volatility models for the logarithm of the count data, applied to the four most important airports selected from the 32 airports located in São Paulo State. The data were obtained from the statistical handling reports of the São Paulo Airway Department (DAESP, http://www.daesp.sp.gov.br/), Brazil, an office linked to the air transport of São Paulo State.
Observe that standard count models based on Poisson regression (generalized linear models; see, for example, Nelder & Wedderburn, 1972) could also be used in the analysis of the data set, but the use of stochastic volatility models on the logarithmic scale of the count data gives more flexibility to model the mean and the variance (volatility) of the time series simultaneously.
The use of stochastic volatility models is becoming very popular in the analysis of financial time series, but is rarely used to analyze time series in transport applications.
The seasonality and volatility of passenger and air cargo transportation in regional hubs are studied in terms of airport economic importance using stochastic volatility models that link the responses (passenger and cargo counts) to observed economic factors (covariates) and also to non-observed factors linked to the volatilities. From the 32 airports available in the DAESP report, we consider the four most important airports in our study, which gives a good view of the sector.
The customization of p-hub designs for air loads (Morrel & Pilon, 1999). Gardiner & Ison (2008) reported three classes of important factors in an air cargo operator's choice of the airport where it will operate: the geographical location of the airport, financial return, and airport security operations. Wu et al. (2001) reported the growth of hub-and-spoke networking, which allows major airports to limit the size of passenger demand in air transport and to become the main hubs for cargo transport in their respective regions. Scholz & Cosel (2011) point out three important points that could be assumed as a motivation for this study:
• The growth of air freight tonnages is changing the relationship between passengers and cargo, and cargo has become a significant source of revenue for airlines and airports;
• The importance of individual airports in a network is usually assessed by measures linked to passenger counts, cargo and operating numbers;
• Combined passenger and cargo services are operated by the major airlines.
(2015) describe that freight transport is more complex than passenger transport, since freight transport involves more professionals and more sophisticated processes combining volume, varying priority services, integration strategies and various itineraries in a transmission system (see also Huang & Lu, 2015).
The paper is organized as follows: Section 2 presents the goals of the research and the data set; Section 3 presents the statistical analysis of the São Paulo State airport data (cargo and passengers) using stochastic volatility models; finally, Section 4 presents some concluding remarks.

DESCRIPTION OF THE GOALS AND THE DATA SET
A study to determine strategic hubs associated with the flow of passengers and the volume of air cargo transported in a region is a starting point for research that is still scarce in Brazil, where this approach could generate important strategic contributions to air transport networks and have impacts on the local and regional economies around these airports. The statistical method used in this work analyzes time series data obtained from the statistical reports of DAESP for the period from 2008 to 2014, extracted from a set of 10,572 observations covering 32 airports and two response variables (monthly counts of passengers and cargo), together with the economic covariates dollar exchange rate and unemployment rate obtained from the economic office ACSP (http://www.acsp.com.br/).
The data set refers to the monthly flow of passengers and cargo in the period from January 1, 2008 to December 31, 2014 at four airports in the state of São Paulo (Ribeirão Preto, São José do Rio Preto, Bauru/Arealva and Presidente Prudente), the most important airports of the data set. These airports were chosen for their importance and size regarding the amount of passengers and cargo in São Paulo State. In addition to the counts (passenger/cargo), we also consider some covariates that may be correlated with the responses, namely the monthly dollar exchange rate and the monthly unemployment rate. Figure 1 presents the graphs of the time series of passenger and cargo counts for the chosen airports. From Figure 1, it is observed that:
• Apparently, the number of passengers increases over time for all airports; then, after a given change point, there is a small decrease.
• Apparently, the amount of cargo increases over time at the Ribeirão Preto and Bauru airports; the São José do Rio Preto airport has a decreasing amount of cargo during the period. The behavior of the Presidente Prudente airport is similar to that of the Ribeirão Preto airport.

The main goals of the study are:
• For the statistical analysis of the data set, we propose the use of a stochastic volatility model for the logarithm of the count data, which gives great flexibility to model the mean and the variance (volatility) of the time series simultaneously, that is, a new statistical approach to analyze transportation data.
• The mean of the time series is modeled by a multiple linear regression model relating the mean to the selected covariates (dollar exchange, monthly unemployment rate, years and months) for the two responses (count passenger/cargo) considered in the logarithm scale.
• The variances of the time series are modeled assuming volatility stochastic models, which incorporate seasonal effects given by latent or non-observed variables.
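The two modeling layers listed above (a regression for the mean and a latent process for the variance) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_log_counts`, the second-order autoregressive form of the latent log-variance, and all numeric values in the usage lines are hypothetical and are not taken from the study.

```python
import math
import random

def simulate_log_counts(covariates, beta, mu, phi1, phi2, sd_zeta, seed=0):
    """Simulate one log-scale series Y(t) = g(t) + exp(h(t) / 2) * eps(t).

    covariates: list of rows [x1, ..., xp], one row per month.
    beta: [beta0, beta1, ..., betap], regression coefficients for the mean.
    h(t) is a latent log-variance following an ASSUMED AR(2) around mu.
    """
    rng = random.Random(seed)
    h, y = [], []
    for t, row in enumerate(covariates):
        # Latent log-variance: autoregressive deviation from mu (assumed form).
        dev = 0.0
        if t >= 1:
            dev += phi1 * (h[t - 1] - mu)
        if t >= 2:
            dev += phi2 * (h[t - 2] - mu)
        h_t = mu + dev + rng.gauss(0.0, sd_zeta)
        h.append(h_t)
        # Mean on the log scale: linear in the covariates.
        g_t = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        # Observation noise has standard deviation exp(h_t / 2).
        y.append(g_t + math.exp(h_t / 2.0) * rng.gauss(0.0, 1.0))
    return y, h

# Hypothetical covariates for 84 months: dollar rate, unemployment, month, year index.
rows = [[2.0, 6.0, (t % 12) + 1, t // 12] for t in range(84)]
y, h = simulate_log_counts(rows, [10.0, -0.3, 0.0, 0.02, 0.15],
                           mu=-2.0, phi1=0.5, phi2=0.05, sd_zeta=0.3)
```

Separating `g_t` from `h_t` in the code mirrors the point made in the list: covariates move the mean, while the latent process alone drives how noisy each month is.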

STATISTICAL ANALYSIS OF THE SÃO PAULO STATE AIRPORT DATA USING STOCHASTIC VOLATILITY MODELS
To study the relationship between the variables and to find the most important factors affecting the variability of passenger count/load in the period from January 1, 2008 to December 31, 2014, we consider a statistical analysis assuming multiple linear regression models relating the average response in logarithmic scale (passenger and cargo counting) to the monthly covariates (dollar exchange, monthly unemployment rate, years and months) and simultaneously modeling the variances of the responses assuming volatility stochastic models.

Use of stochastic volatility models under a Bayesian approach
Stochastic volatility (SV) models have been extensively used to analyze financial time series (see Danielsson, 1994; Yu, 2002) as a powerful alternative to existing autoregressive models such as the ARCH (autoregressive conditional heteroscedastic) models introduced by Engle (1982) and the GARCH (generalized autoregressive conditional heteroscedastic) models introduced by Bollerslev (1986), but they are rarely used in transportation or other engineering applications (see also Ghysels, 1996; Kim & Shephard, 1998; or Meyer & Yu, 2000).
For the setting of the models, let $N \geq 1$ be a fixed integer that records the amount of observed data (in our case, the number of months in which the counts of passengers or cargo are recorded for the airports of São Paulo State). Also, let $Y_j(t)$, $t = 1, 2, \ldots, N$, $j = 1, 2, \ldots, K$, denote the time series, on the logarithmic scale, of the count of passengers or cargo in the $t$th month and $j$th airport. Here $N = 84$ months and $K = 4$ ($j = 1$ for Ribeirão Preto; $j = 2$ for São José do Rio Preto; $j = 3$ for Bauru/Arealva; $j = 4$ for Presidente Prudente).
In the presence of heteroscedasticity, that is, variances depending on the time $t$, assume that the time series $Y_j(t)$, $t = 1, 2, \ldots, N$; $j = 1, 2, 3, 4$, can be written as

$$Y_j(t) = g_j + \sigma_j(t)\,\varepsilon_j(t), \tag{1}$$

where $g_j$ is the regression mean specified below, $\varepsilon_j(t)$ is a noise considered to be independent and identically distributed with a normal distribution $N(0, \sigma^2)$, and $\sigma_j(t)$ is the square root of the variance of (1) (for simplicity, it is assumed that $\sigma^2 = 1$, since in our case the obtained inference results do not change significantly). The variance of $Y_j(t)$ is assumed to be given by the model $\sigma^2 e^{h_j(t)}$, where $h_j(t)$ depends on a latent or unobserved variable.
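The explicit form of the latent process is not shown at this point in the text. As a sketch only, and assuming a second-order autoregressive structure (suggested by the parameters $\mu_j$, $\phi_{1j}$, $\phi_{2j}$ and $\sigma^2_{j\zeta}$ that receive prior distributions in the Bayesian analysis), the log-variance $h_j(t)$ could be written as:

```latex
\begin{align*}
h_j(1) &= \mu_j + \zeta_j(1), \\
h_j(2) &= \mu_j + \phi_{1j}\,[h_j(1) - \mu_j] + \zeta_j(2), \\
h_j(t) &= \mu_j + \phi_{1j}\,[h_j(t-1) - \mu_j] + \phi_{2j}\,[h_j(t-2) - \mu_j] + \zeta_j(t),
          \quad t = 3, \ldots, N, \\
\zeta_j(t) &\sim N(0, \sigma^2_{j\zeta}) \quad \text{independently}.
\end{align*}
```

The initial conditions for $t = 1, 2$ are an assumption of this sketch; other choices are possible.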
It is interesting to observe that a stochastic volatility process $Y_j(t)$ in finance applications is usually given by a special case of equation (1), that is, by the model $Y_j(t) = \sigma_j(t)\,\varepsilon_j(t)$, where $Y_j(t)$ is the logarithm of returns and $\sigma_j(t)$ is a strictly stationary sequence of positive random variables which is independent of the independent and identically distributed noise sequence $\varepsilon_j(t)$. The independence of the noise $\varepsilon_j(t)$ and the volatility $\sigma_j(t)$ allows for a much simpler probabilistic structure than that of a GARCH process, which includes explicit feedback of the current volatility with previous volatilities and observations. This is one of the advantages of stochastic volatility models. In this case, the mutual independence of the sequences $\varepsilon_j(t)$ and $\sigma_j(t)$ and their strict stationarity immediately imply that $Y_j(t)$ is strictly stationary. Conditions for the existence of a stationary GARCH process are much more difficult to establish (see, for example, Nelson, 1990 and Bougerol & Picard, 1992).
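In this way, $Y_j(t)$ has a normal distribution given by

$$Y_j(t) \mid h_j(t) \sim N\!\left(g_j,\ \sigma^2 e^{h_j(t)}\right),$$

where $g_j = \beta_{j0} + \beta_{j1}\,\text{dollar.rate} + \beta_{j2}\,\text{unemployment.rate} + \beta_{j3}\,\text{months} + \beta_{j4}\,\text{years}$. The likelihood function of the SV model defined by (2), given $h_j(t)$, which depends on a latent or unobserved variable, is given for each $j = 1, 2, 3, 4$ by

$$L = \prod_{t=1}^{N} f\big(y_j(t) \mid h_j(t)\big),$$

where $f\big(y_j(t) \mid h_j(t)\big)$ is the density function of a normal distribution $N\!\left(g_j,\ \sigma^2 e^{h_j(t)}\right)$.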
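As an illustration of the conditional likelihood described above, the following sketch evaluates its logarithm for one airport, treating each observation as normal with mean g and variance σ² exp(h). The function name `sv_log_likelihood` and its argument layout are hypothetical; the mean is passed as a per-month list because the covariates change over time.

```python
import math

def sv_log_likelihood(y, g, h, sigma2=1.0):
    """Conditional log-likelihood of one series given the latent values h.

    y: observed log counts; g: regression means; h: latent log-variances.
    Each y[t] is treated as Normal(g[t], sigma2 * exp(h[t])).
    """
    if not (len(y) == len(g) == len(h)):
        raise ValueError("y, g and h must have the same length")
    total = 0.0
    for y_t, g_t, h_t in zip(y, g, h):
        var_t = sigma2 * math.exp(h_t)
        # Log of the normal density with mean g_t and variance var_t.
        total += -0.5 * (math.log(2.0 * math.pi * var_t) + (y_t - g_t) ** 2 / var_t)
    return total
```

Because the latent values `h` are arguments here, this is the likelihood conditional on them; a classical analysis would have to integrate them out, which is the difficulty that motivates the MCMC approach.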
Bayesian inference based on Markov chain Monte Carlo (MCMC) methods (see, for example, Gelfand & Smith, 1990 or Smith & Roberts, 1993) has been considered to analyze SV models.
The main reason for the use of Bayesian methods is that a standard classical inference approach may lead to great difficulties. Those difficulties may appear, among other factors, in the form of high dimensionality and of a likelihood function with no closed form (observe that under a classical approach, the latent variables in $h_j(t)$ would have to be eliminated by integration).
For a Bayesian analysis of model (1), prior distributions are assumed for the parameters, where Gamma(d, e) denotes a Gamma distribution with mean $d/e$ and variance $d/e^2$. The hyperparameters $a_j$, $b_j$, $c_j$, $d_j$ and $e_j$ are considered to be known and are specified later. These prior distributions were chosen by observing the range of values of each parameter.
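The Gamma(d, e) notation above uses a shape d and a rate e, which is worth checking when moving between software packages. The sketch below is a simulation check only: Python's `random.gammavariate` takes a shape and a scale, so the rate is inverted. The helper name `sample_gamma_rate` and the values d = 2, e = 4 are hypothetical.

```python
import random
import statistics

def sample_gamma_rate(d, e, size, seed=0):
    """Draw from Gamma(d, e) with shape d and rate e (mean d/e, variance d/e**2)."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so the rate e becomes scale 1/e.
    return [rng.gammavariate(d, 1.0 / e) for _ in range(size)]

draws = sample_gamma_rate(d=2.0, e=4.0, size=50_000)
# Sample moments should be close to d/e = 0.5 and d/e**2 = 0.125.
print(statistics.fmean(draws), statistics.variance(draws))
```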

Bayesian analysis for the response log(passengers)
Let us assume Beta(1, 1) prior distributions for $\phi_{1j}$, uniform U(0, 0.1) prior distributions for $\phi_{2j}$, Gamma(1, 1) prior distributions for $\sigma^2_{j\zeta}$, normal N(0, 1) prior distributions for $\mu_j$, normal N(10, 1) prior distributions for $\beta_{j0}$ and normal N(0, 1) prior distributions for $\beta_{jl}$, $j = 1, 2, 3, 4$; $l = 1, 2, 3, 4$. A burn-in of 21,000 samples is discarded to eliminate the effect of the initial values of the iterative method; after that, another 90,000 samples are generated, of which every 10th is kept, giving a final sample of size 9,000 from which the posterior summaries of interest are obtained (see Table 1). The samples of the joint posterior distribution of interest are simulated with the OpenBUGS software (Spiegelhalter et al., 2003). Convergence of the Gibbs sampling algorithm was verified from standard trace plots of the generated Gibbs samples.
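The sampling schedule described above can be checked with a few lines of arithmetic. This is a bookkeeping sketch only; the function name `mcmc_schedule` is hypothetical and the code does not run any sampler.

```python
def mcmc_schedule(burn_in, kept_iterations, thin):
    """Return (total iterations run, draws retained after thinning)."""
    # Keep every thin-th draw among the post-burn-in iterations.
    retained = len(range(thin - 1, kept_iterations, thin))
    return burn_in + kept_iterations, retained

total, retained = mcmc_schedule(burn_in=21_000, kept_iterations=90_000, thin=10)
print(total, retained)  # 111000 9000
```

Thinning reduces the autocorrelation between successive retained draws at the cost of discarding nine out of every ten iterations.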
From the results in Table 1, some important observations are reported in the following subsections.

Passengers -Ribeirão Preto
Years, months and the exchange rate affect the number of passengers at the Ribeirão Preto airport: the 95% credible intervals of the corresponding regression coefficients do not contain zero, that is, the coefficients are statistically different from zero. Observe that a 95% credible interval plays a role analogous to that of a 95% confidence interval under a classical inference approach, that is, to a test at the 5% significance level of whether each regression parameter is equal to zero against the alternative that it is different from zero. Under a Bayesian approach, the inferences are obtained from the credible intervals.
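The decision rule just described (a coefficient is taken as relevant when its 95% interval excludes zero) can be sketched from a set of posterior draws. The draws below are simulated placeholders, not the study's MCMC output, and the helper names `equal_tail_interval` and `excludes_zero` are hypothetical.

```python
import random

def equal_tail_interval(draws, level=0.95):
    """Equal-tail credible interval from posterior draws (linear interpolation)."""
    ordered = sorted(draws)
    n = len(ordered)

    def quantile(p):
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)

def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0

# Hypothetical draws standing in for 9,000 retained MCMC samples of one coefficient.
rng = random.Random(1)
fake_draws = [rng.gauss(0.17, 0.03) for _ in range(9_000)]
interval = equal_tail_interval(fake_draws)
print(interval, excludes_zero(interval))
```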
The estimated regression coefficient for year is positive (0.1752), which implies a significant increase in the logarithm of the number of passengers at the Ribeirão Preto airport over the years (2008-2014). The estimated regression coefficient for the exchange rate is negative (−0.3385), which implies a significant decrease in the logarithm of the number of passengers at the Ribeirão Preto airport as the dollar rises (January 1, 2008 to December 31, 2014).
The estimated regression coefficient for month is positive (0.02243), which implies a significant increase in the logarithm of the number of passengers over the course of the months. The end of the year leads to a significant increase in the number of passengers.
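Because the responses are modeled on the logarithmic scale, a coefficient can be read as an approximate percentage change in the count for a one-unit increase in the covariate, with the other covariates held fixed. The sketch below assumes natural logarithms and a unit step in the covariate as coded in the model (the coding of years and months is not stated in the text); the function name `percent_change` is hypothetical.

```python
import math

def percent_change(coefficient, delta=1.0):
    """Multiplicative effect, in percent, of raising a covariate by `delta`
    when the response is modeled on the natural-log scale."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0

print(round(percent_change(0.1752), 1))   # year coefficient, Ribeirão Preto passengers
print(round(percent_change(-0.3385), 1))  # exchange-rate coefficient, same airport
```

Under these assumptions the year coefficient corresponds to roughly a 19% increase per unit step and the exchange-rate coefficient to roughly a 29% decrease per unit increase in the dollar rate.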

Passengers -São José do Rio Preto
Years, the exchange rate and the unemployment rate affect the number of passengers at the São José do Rio Preto airport (the 95% credible intervals do not contain zero), that is, the regression coefficients are statistically different from zero.

Passengers -Bauru/Arealva
Years, months and the exchange rate affect the number of passengers at the Bauru airport (the 95% credible intervals do not contain zero), that is, the regression coefficients are statistically different from zero.
The estimated regression coefficient for year is positive (0.2378), which implies a significant increase in the logarithm of the number of passengers at the Bauru airport over the years (2008-2014).
The estimated regression coefficient for the exchange rate is negative (−0.3248), which implies a significant decrease in the logarithm of the number of passengers at the Bauru airport as the dollar rate increases (January 1, 2008 to December 31, 2014).
The estimated regression coefficient for month is positive (0.02598), which implies a significant increase in the logarithm of the number of passengers over the course of the months. The end of the year leads to a significant increase in the number of passengers.

Passengers -Presidente Prudente
Years, months and the exchange rate affect the number of passengers at the Presidente Prudente airport (the 95% credible intervals do not contain zero), that is, the regression coefficients are statistically different from zero.
The estimated regression coefficient for year is positive (0.1445), which implies a significant increase in the logarithm of the number of passengers at the Presidente Prudente airport over the years (2008-2014).
The estimated regression coefficient for the exchange rate is negative (−0.2200), which implies a significant decrease in the logarithm of the number of passengers at the Presidente Prudente airport as the dollar rate increases (January 1, 2008 to December 31, 2014).
The estimated regression coefficient for month is positive (0.1445), which implies a significant increase in the logarithm of the number of passengers with respect to months at the Presidente Prudente airport (the end of the year leads to an increased number of passengers).

Bayesian analysis for the response log (cargo)
For the cargo volume in the four airports, the same prior distributions as in the passenger case are assumed, that is, Beta(1, 1) prior distributions for $\phi_{1j}$, uniform U(0, 0.1) prior distributions for $\phi_{2j}$, Gamma(1, 1) prior distributions for $\sigma^2_{j\zeta}$, normal N(0, 1) prior distributions for $\mu_j$, normal N(10, 1) prior distributions for $\beta_{j0}$ and normal N(0, 1) prior distributions for $\beta_{jl}$, $j = 1, 2, 3, 4$; $l = 1, 2, 3, 4$. Also using the OpenBUGS software, a burn-in of 21,000 samples is discarded; after that, another 90,000 samples are generated, of which every 10th is kept, giving a final sample of size 9,000 from which the posterior summaries of interest are obtained (see Table 2).
From the analyses of the results in Table 2, some important observations are reported in the following subsections.

Cargo -Ribeirão Preto
Years, months and the exchange rate affect the transportation of cargo at the Ribeirão Preto airport (the 95% credible intervals do not contain zero), that is, the regression coefficients are statistically different from zero.
The estimated regression coefficient for the dollar exchange rate is negative (−0.1753), which implies a significant decrease in the logarithm of the freight transportation at the Ribeirão Preto airport as the dollar rate rises.
The estimated regression coefficient for month is positive (0.03134), which implies a significant increase in the logarithm of the amount of cargo at the Ribeirão Preto airport (the end of the year leads to an increased amount of cargo).
The estimated regression coefficient for year is positive (0.09188), which implies a significant increase in the logarithm of the transport of cargo at the Ribeirão Preto airport over the years.
(sd: standard deviation; LL2.5%: lower limit; UL97.5%: upper limit).

Cargo -São José do Rio Preto
Years and the exchange rate affect the transportation of cargo at the São José do Rio Preto airport (the 95% credible intervals do not contain zero), that is, the regression coefficients are statistically different from zero. The other factors are not significant, possibly due to the presence of some outlier that could affect the obtained inference.
The estimated regression coefficient for the dollar exchange rate is negative (−0.3780), which implies a significant decrease in the logarithm of the cargo transportation at the São José do Rio Preto airport as the dollar rate rises.
The estimated regression coefficient for year is negative (−0.06743), which implies a significant decrease in the logarithm of the transport of cargo at the São José do Rio Preto airport over the years.

Cargo -Bauru/Arealva
Years, months and the exchange rate affect the transportation of cargo at the Bauru airport (the 95% credible intervals do not contain zero), that is, the regression coefficients are statistically different from zero.
The estimated regression coefficient for the dollar exchange rate is negative (−0.3563), which implies a significant decrease in the logarithm of the cargo transportation at the Bauru airport as the dollar rate rises.
The estimated regression coefficient for month is positive (0.02059), which implies a significant increase in the logarithm of the amount of cargo transported at the Bauru airport with respect to months (the end of the year leads to an increased amount of cargo).
The estimated regression coefficient for year is positive (0.1550), which implies a significant increase in the logarithm of the cargo at the Bauru airport over the years.

Cargo -Presidente Prudente
Years, months, the unemployment rate and the exchange rate affect the transportation of cargo at the Presidente Prudente airport (the 95% credible intervals do not contain zero), that is, the regression coefficients are statistically different from zero.
The estimated regression coefficient for the dollar exchange rate is negative (−0.5819), which implies a significant decrease in the logarithm of the cargo transportation at the Presidente Prudente airport as the dollar rate rises.
The estimated regression coefficient for month is positive (0.0363), which implies a significant increase in the logarithm of the amount of cargo at the Presidente Prudente airport with respect to months (the end of the year leads to an increased amount of cargo).
The estimated regression coefficient for year is positive (0.1883), which implies a significant increase in the logarithm of the cargo transportation at the Presidente Prudente airport over the years.
The estimated regression coefficient for the unemployment rate is positive (0.1567), which implies a significant increase in the logarithm of the cargo transportation at the Presidente Prudente airport as the unemployment rate rises.
Figure 6 presents the graphs of the square roots of the volatilities for the data sets of the four airports (monthly counts of passengers and cargo).

Model fit and the estimated volatilities
From the plots of Figures 5 and 6, we get some important interpretations:
Passengers:
• There is great volatility in the number of passengers at the Ribeirão Preto airport between month 40 (April, 2011) and month 50 (February, 2012). Outside of this period, the volatility is small.
• Similar behavior is observed for the number of passengers at the São José do Rio Preto airport; the volatility increases at the end of the observed period, that is, close to the end of 2014 (the year in which major economic problems began in Brazil).
• The Bauru airport has great volatility in the number of passengers close to month 25 (January, 2010) and month 55 (end of 2012).
• The Presidente Prudente airport shows volatility in the number of passengers similar to that observed for the Bauru airport.

Cargo:
• The Ribeirão Preto airport has a seasonal behavior regarding the volatilities in the transport of cargo (in some periods of the year there is greater volatility), with a large volatility occurring close to month 65, which falls in 2013.
• Similar behavior is observed regarding the cargo transportation at the São José do Rio Preto airport.
• The volatility of the data shows that the effective use of the air transport mode depends on the variability of passenger demand (tourism and corporate business) and of cargo demand (the potential of local production that requires air transport for the flow of products), directing efforts toward the economic and financial viability of national air transportation, with emphasis on the best practices inherent in the concept of a hub.
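The month indices quoted in the observations above count months from January 2008 (month 1) to December 2014 (month 84). A small helper makes the conversion to calendar months explicit; the function name `month_index_to_date` is hypothetical, and the start date is taken from the study period stated in the text.

```python
import datetime

def month_index_to_date(index, start_year=2008, start_month=1):
    """Map a 1-based month index to the first day of the calendar month."""
    if index < 1:
        raise ValueError("index must be >= 1")
    offset = (start_month - 1) + (index - 1)
    return datetime.date(start_year + offset // 12, offset % 12 + 1, 1)

for k in (25, 40, 50, 65, 84):
    print(k, month_index_to_date(k).strftime("%B %Y"))
```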

CONCLUDING REMARKS
From the results of this study, it was possible to observe that some periods of the year show great volatility in the air transport of passengers and cargo for the four studied airports in São Paulo State in the period of study. It was also possible to identify important factors affecting the amount of passengers and cargo for these airports.
Another important result is that the fitted models used in this study can be of great importance in the prediction of the number of passengers and the amount of cargo in relation to the economic factors considered. These results are of great interest to airport managers planning to build new large hubs for passengers or cargo as alternatives to the large airports located in the cities of São Paulo or Campinas.
Other structures of SV models could also be considered to analyze the data set, for example autoregressive AR(L) structures of order larger than 2, to get a better fit of the model to the data. In addition, it is important to point out that the use of a Bayesian approach with MCMC (Markov chain Monte Carlo) methods is facilitated by freely available software such as OpenBUGS, which greatly simplifies the computational work needed to get the posterior summaries of interest.
The new model proposed in this manuscript was very important in discovering the most important factors affecting the means of the time series in each of the four considered airports located in São Paulo State, and also in modeling the volatilities associated with the flow of passengers and the volume of air cargo during the time period assumed in this study. As observed in the obtained results, some airports are more dependent on unemployment rates than others, which are located in regions with more economic stability. In future work, it would be possible to consider other volatility structures and to compare the obtained results with results from other statistical approaches, for example, generalized linear models for the count data on the original scale and not on the logarithmic scale considered in this study.
It is also important to point out that, in future work, the authors could apply the proposed statistical model to new data sets from the Brazilian office ANAC (National Civil Aviation Agency) linked to the major Brazilian passenger and cargo airports, in order to identify the level of dependence of the demand at these airports on the volatility of each airport, as a criterion for planning the logistics of air transportation in Brazil.