Linking activity participation, socioeconomic characteristics, land use and travel patterns: a comparison of industry and commerce sector workers

The objective of this work is to analyze the travel behavior of industry and commerce sector workers in terms of three variable groups: activity participation, socioeconomic characteristics and land use. This work is based on the Origin-Destination survey carried out in the São Paulo Metropolitan Area (SPMA) in 1997. Relationships were found between the variables concerned (Decision Tree), and the statistical significance of the independent variables was assessed (Multiple Linear Regression). We analyzed the influence of the three variable groups on travel pattern choices: (A) socioeconomic variables (Household Income, Transit Pass Ownership and Car Ownership) affect the travel mode sequence; (B) activity participation (Study, Work) has an effect on the trip purpose sequence; and (C) land use variables (accumulated proportion of jobs by distance buffers starting from the home traffic zone centroid) influence the sequence of destinations chosen, especially in the case of industry sector workers. The different spatial distributions of economic activities (commercial and industrial) in the urban environment influence the travel of workers. The main contributions of this paper are the proposed land use variable, derived from the intervening opportunities model, and a methodology that combines exploratory and confirmatory techniques of multivariate data analysis.
B T P S Brazilian Transportation Planning Society www.transport-literature.org JTL|RELIT


Introduction
The objective of this work is to find relationships between travel patterns (dependent variable) and the three independent variable groups mentioned before: (A) activity participation; (B) socioeconomic characteristics; and (C) environmental factors (distribution and degree of activities, defined as land use variables) considering groups of individuals characterized as either industry or commerce sector workers.
People generate complex urban travel patterns while engaging in out-of-home activities, which vary based on individual characteristics, household attributes and environmental characteristics. One of the objects of the Activity Based Travel Approach is to investigate the variables that affect travel patterns.
Several authors (Bhat and Singh, 2000; Bowman and Ben-Akiva, 2000; Lu and Pas, 1999; Strathman and Dueker, 1990) suggested that the journeys of individuals can be influenced by three main factors: (A) activity participation, (B) socioeconomic characteristics and (C) environmental factors (urban densities, distances between localities, spatial coverage of the transportation network, and so on).
The idea that socioeconomic characteristics and activity participation can influence travel behavior has been studied previously (Arentze et al., 2000; Balasubramaniam and Goulias, 1999; Golob and McNally, 1997; Bhat and Koppelman, 1991; Kwan, 2000). The conclusions in the literature suggest that personal travel can be determined by gender, car ownership (Strambi et al., 2004; McGuckin and Murakami, 1999), the role of the person in the family, household task allocation (Simma and Axhausen, 2001; Srinivasan and Athuru, 2005) and activity participation (classified as subsistence, leisure, medical, shopping, and so on).
However, socioeconomic characteristics and individual activities are only part of the set of variables that can allow for the prediction of travel behavior. Not only car ownership, but also the distribution of activities in the urban environment and accessibility to opportunities affect modal choice and destination choice. The locations where people live or work exert strong influence over urban trips.
Environmental factors, such as road infrastructure networks, urban configuration, location of activity centers in the cities, density and land use, influence travel behavior. In the present work, environmental factors (land use variables) are defined as the spatial distribution and degree of activity for each economic sector (industry and commerce).
The assumption that land use policy can influence travel behavior follows from the correlation between the variables considered. Diverse works confirm that land use (characterized by urban densities, cities being compact or spread-out, the distribution of economic activities in urban environments and the presence of Traffic Zones with mixed activities) is strongly related to travel behavior, especially modal choice (Cervero and Radisch, 1996; Ewing, 1995; Kitamura et al., 1997). The proposal of the land use variables is one contribution described in this paper. In addition, this work shows the influence of this variable group on travel behavior, especially for workers in the industry sector.
Six steps were performed and are described in the following sections. Section 1 describes the main characteristics of the study area and the data preprocessing; Section 2 presents the representation of the independent variables (categorical and numeric); Section 3 presents the representation of the categorical dependent variable; Section 4, Part 1 (Exploratory Analysis), applies the Classification and Regression Tree (CART) algorithm to find a priori unknown relations; Section 5, Part 2 (Confirmatory Analysis), applies Multiple Linear Regression (MLR) to verify the statistical significance of the variables and to corroborate the results previously obtained through the CART application; finally, the conclusions are described in the last section.

To analyze the behaviors of both industry and commerce sector workers and consider the hypothesis that the spatial distribution of economic activities in the urban environment affects the travel of individuals, two different analyses were performed: (1) analysis of the spatial distribution of industrial and commercial jobs in the study area and (2) analysis of the main changes in the job market of the SPMA, considering the two economic sectors.
Both analyses are presented in the next two sub-sections. After that, the data treatment and the resulting sub-samples are described.

The spatial distribution of industrial and commercial jobs (SPMA)
Industry sector jobs (Figure 1) are concentrated in some specific regions: (1) the TZs of Pimentas, Rodovia Presidente Dutra and Guarulhos; (2) the ABCD region; and (3) the counties of Barueri and Osasco. The SPMA thus has three main industrial centers, although there is also some concentration of industrial jobs in the central TZs. Commercial jobs, by contrast, are scattered over the entire region. The economic sectors were therefore analyzed separately to obtain more appropriate results.
The purpose was to reveal the possible influence of land use by comparing two specific cases (industry and commerce sector workers). It was anticipated that land use variables would exert a greater influence in the case of industry sector workers due to the concentration of industrial jobs in particular traffic zones.
This analysis can indicate future changes in the travel of individuals, as there is an increase of jobs in the tertiary sector and a decrease in industrial jobs (sub-section 1.2).

There were considerable changes in the economic activities of the region, mainly in the 1980s. These include: (1) the reduction of industrial jobs; (2) the expansion and consolidation of service and commerce activities; (3) the deterioration of the job market; and (4) the increase in unemployment rates and in the number of autonomous workers.
As indicated by the IBGE information, the SPMA had an unemployment rate of 3.9% in [...] workers in the job market increased from 19% to 29%, and the participation of employees fell from 58% to 48% between 1991 and 2002.
The advance of technology is pointed to as a main reason for the elimination of industrial jobs. New technologies such as informatics, communications and robotics caused the disappearance of some occupational categories. Moreover, in the 1980s and 1990s, a widespread process of industrial decentralization occurred in the state of São Paulo. A great number of industrial establishments moved from the capital (and its periphery) to other cities, promoting the growth of medium-sized urban centers and generating new transformations in the SPMA.

Data preprocessing
In this stage, incomplete data or those that were not relevant to the objectives were eliminated.
The initial sample was composed of the records for 98,780 individuals, aggregate data of the home traffic zones of each individual and the Euclidean distances of all the trips. The following individuals were selected for analysis:
- individuals who made 2, 3 or 4 trips on the day prior to the survey;
- individuals who made a home-based tour; and
- individuals who work in the industry or commerce sectors.
Finally, considering the differences in the geographic distribution of jobs in the SPMA and the research hypothesis that the different spatial distributions of economic activities (commercial and industrial) in the urban environment influence the travel of commerce or industry sector workers, two sub-samples were identified: (a) industry sector workers and (b) commerce sector workers.

Independent variables representation
The independent variables (categorical or numerical) used in this work belong to the three main variable groups: (A) socioeconomic characteristics, (B) activity participation and (C) land use characteristics.
The independent variables selected here are related to travel patterns. The variables of the first and the second groups were chosen based on the literature and data availability. The third variable group is proposed based on the principles of the intervening opportunities model (Schneider, 1959). It is discussed in detail in the following subsection.

Land use variables
To construct the land use variables, the principle of the intervening opportunities model was adopted. This principle assumes that, in the urban area, all trips are as short as possible. Individuals make long trips only to reach the next acceptable destination, where their purpose is satisfied. In this work, land use variables are represented in terms of the degree of "accumulated opportunities" within each distance buffer.
In this work, the term "opportunities" refers to the job supply in the economic sector.
Considering the two sub-samples, "opportunities" represents "the proportion (%) of the total jobs in the industry sector" (for the case of industrial workers) and "the proportion (%) of the total jobs in the commerce sector" (for the case of commercial workers). These measurements were collected for distance buffers of 5, 10, 15 and 20 km starting from the home TZ centroid, and were used to generate the "accumulated opportunities" measurements.
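The accumulated opportunities measure described above can be illustrated with a minimal sketch. It assumes that each traffic zone is reduced to a centroid with a job count, that straight-line distances are used, and that a zone counts toward a buffer when its centroid lies at or within the buffer radius; the function name, the data layout and the example zones are hypothetical and are not taken from the survey.

```python
from math import hypot

# Distance buffers (km) named in the text; everything else is illustrative.
BUFFERS_KM = (5, 10, 15, 20)


def accumulated_opportunities(home, zones, buffers_km=BUFFERS_KM):
    """Return {buffer_km: % of all sector jobs within that distance of home}.

    home  -- (x, y) centroid of the home traffic zone, in km
    zones -- list of (x, y, jobs) tuples, one per traffic zone
    """
    total_jobs = sum(jobs for _, _, jobs in zones)
    shares = {}
    for limit in buffers_km:
        # Assumption: a zone belongs to a buffer if its centroid is within it.
        jobs_within = sum(
            jobs
            for x, y, jobs in zones
            if hypot(x - home[0], y - home[1]) <= limit
        )
        shares[limit] = 100.0 * jobs_within / total_jobs
    return shares


# Hypothetical zones: (x_km, y_km, jobs in the sector)
zones = [(0, 0, 100), (3, 4, 300), (6, 8, 200), (12, 0, 400)]
print(accumulated_opportunities((0, 0), zones))
```

Because each buffer contains the previous one, the resulting percentages never decrease with distance, which is what makes them "accumulated" measures.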

Coding the categories of the dependent variable
In this study, the trip-linking representation was based on the activity sequence, travel mode and destinations chosen by the individuals over a period of one day. The dependent variable coding follows the scheme below. The travel attributes (activities, travel mode and destinations) were grouped and represented by letters and numbers, as shown in Table 1. The 128 categories of the dependent variable are represented by alphanumeric codes that correspond to the sequence of the adopted travel attributes. As mentioned before, it was assumed that the individual's residence was the initial origin and final destination for those who took two, three or four trips.

Table 1: Step A, activity sequence (letters); Step B, travel mode sequence (letters); Step C, destination, as distance to the home Traffic Zone (TZ) centroid (numbers, e.g., up to 5 kilometers from the home TZ centroid; 5-10 kilometers from the home TZ centroid).

In the first step, the travel patterns were represented by a sequence of letters (H, W, S and A) indicating the sequence of activities performed by the individuals during the day (HSH: home-school-home). In the subsequent stage, the letters (P, T and N) indicate the travel mode sequence used (NN: non-motorized, non-motorized).
In the third step, the sequence of destinations was characterized by numbers (1, 2, 3 and 4).
Each one of the numbers represents distance buffers starting from the home zone centroid (Table 1).The first and last numbers always represent the initial and final destination, the individual's home (represented by number 1).For example, pattern 141 indicates the accomplishment of two trips: (first trip) starts at home (number 1) and continues to a destination located more than 15 km away (number 4); (second trip) denotes the return from destination (4) to the home (number 1).
According to the coding defined in the previous stages, the final travel patterns are represented by three sets of characters: The first set refers to the activity sequence (e.g., HWAH), the second set corresponds to the travel mode sequence (TTT, for example) and the final set describes the destination sequence (e.g., 1121).The resulting travel pattern HWAH TTT 1121, for example, is illustrated in Figure 4.
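The three coding steps can be sketched as a small encoder. This is only an illustration: the buffer limits (code 1 up to 5 km, 2 for 5-10 km, 3 for 10-15 km, 4 above 15 km) follow the examples given in the text, while the treatment of exact boundary values, the function names and the input layout are assumptions.

```python
def distance_code(distance_km):
    """Map a straight-line distance from the home TZ centroid to a buffer code."""
    if distance_km <= 5:
        return "1"
    if distance_km <= 10:
        return "2"
    if distance_km <= 15:
        return "3"
    return "4"


def encode_pattern(activities, modes, distances_km):
    """Build a code such as 'HWAH TTT 1121' for a home-based tour.

    activities   -- activity letters, including the initial and final 'H'
    modes        -- one mode letter per trip (P, T or N)
    distances_km -- distance from home of each intermediate destination
    """
    if activities[0] != "H" or activities[-1] != "H":
        raise ValueError("tour must start and end at home")
    if len(modes) != len(activities) - 1:
        raise ValueError("one travel mode is needed per trip")
    if len(distances_km) != len(activities) - 2:
        raise ValueError("one distance is needed per intermediate stop")
    # The first and last codes are always 1 (the individual's home).
    destinations = "1" + "".join(distance_code(d) for d in distances_km) + "1"
    return f"{''.join(activities)} {''.join(modes)} {destinations}"


# Work within 5 km, then another activity 5-10 km from home, all by transit.
print(encode_pattern("HWAH", "TTT", [3.0, 7.5]))
```

An intermediate return home (as in HWHSH) is handled by passing a distance of 0 for that stop, which yields code 1.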

Exploratory analysis
The CART (Classification and Regression Tree) algorithm was used in the current research.
This technique allows successive division of a population into binary splits. The resultant subgroups are homogeneous with respect to the dependent variable, yielding a hierarchical tree of decision rules useful for prediction or classification (Breiman et al., 1984).
CART is a segmentation modeling technique that satisfies the following properties: (a) the hierarchy is called a tree and each segment is a node; (b) the root node contains the complete database; (c) the root node is divided sequentially, generating child nodes; (d) when no further data subdivision is possible, the final subgroups are considered terminal nodes or leaves; (e) for construction of the CART, three main elements should be determined: a set of questions delimiting data division, a criterion for evaluation of the best division and a rule for termination of further subdivisions (stop-splitting rule); and (f) each division depends on the value of a single independent variable. Thus, amongst the set of independent variables, the one that gives the best split is chosen (greatest segregation of the data). The procedure continues until no significant splits remain. Because the independent variables are analyzed separately, multicollinearity does not cause any problems for the algorithm (Piramuthu, 2008).
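The split-selection step described above can be sketched for a single independent variable as follows. The sketch uses Gini impurity as the criterion for the best division, which is an assumption made for illustration (the study reports a deviance-based stopping rule in S-PLUS); the function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels, min_leaf=1):
    """Find the threshold on one variable that most reduces impurity.

    Returns (threshold, impurity_reduction); rows with value <= threshold
    go to the left child. Returns (None, 0.0) if no admissible split exists.
    """
    n = len(labels)
    parent = gini(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Skip splits that would create a child smaller than min_leaf.
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[1]:
            best = (threshold, parent - child)
    return best


# Hypothetical data: cars in the household vs. observed mode sequence
cars = [0, 0, 0, 1, 1, 2]
modes = ["TT", "TT", "NN", "PP", "PP", "PP"]
print(best_binary_split(cars, modes))
```

A full tree would repeat this search over every independent variable at each node, keep the variable with the largest reduction, and recurse on the two children until the stop-splitting rule is met.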
Using this technique, two trees were generated from the final sub-samples of industrial and commercial workers, 128 categories of dependent variable (travel patterns) and numerical and categorical independent variables (three variable sets mentioned in the work).The trees, illustrated in Figures 5-6, were generated using at least 50 observations for each leaf node, using 0.15 for global node deviance (stop-splitting rule) and resulting in 10 leaves for industry sector workers and 8 leaves for commerce sector workers.
The figures display the CART with the two most frequent travel patterns at each terminal node (leaf), obtained using S-PLUS 6.1. The figures show stages of the construction of the trees, representing the influence of each variable on travel pattern choices in each sub-sample and differences in travel behavior between industrial and commercial workers. The subsection below describes the influence of each selected independent variable on the travel behavior of industry and commerce sector workers.
In the analysis of the commerce sector workers sample, "car ownership" was selected for the second split: (1) AUT = 0 (terminal nodes 21, 20 and 8); and (2) AUT ≥ 1 (terminal nodes 19, 18 and 10). Specific cases were observed, in both samples, of individuals who prefer to use transit for long trips (141) even if there is a car in their household. In these cases, taking into account the distance to be covered (exceeding 15 km), travel costs will probably be lower than with car usage (the transit fare is fixed, specifically in the SPMA). These cases are shown in terminal nodes 12 (industry) and 10 (commerce).

Only one of the land use variables was selected in the sample of commercial workers (COM 5 km, Figure 6 (b)). In this case, the land use variable does not intervene predominantly in the choice of destination sequence. This result is due to the geographic distribution of commercial jobs, which are widely dispersed and show no demarcated concentrations like those of the industry sector jobs.
Moreover, commercial workers more frequently perform short trips (111), which can be explained by the geographic dispersion of commercial activities in the SPMA. Individuals working in the commerce sector probably live near their work locations.
For commercial workers, the variable "Work" (Figure 6 (c)) influences the choice of the trip purpose sequence. This variable is important due to sample characteristics: there is a greater number of autonomous workers in the commerce sector. There is a prevalence of the HAH sequence, for example, in the autonomous group.
In the exploratory stage of this work, the results obtained with the CART algorithm were analyzed.
The influences of some independent variables on trip sequences (activities, travel modes or destinations) were verified. The important influence of the land use variable set in the industrial worker sample was noted. This is accounted for by the geographic concentration of industry sector jobs in the SPMA.
Considering public policy implications, it is probably possible to change work-related trip choices by modifying the spatial arrangement of activities in the industrial and tertiary sectors.

The models with transit usage (HWH TT 141, HWH TT 131, HWH TT 121 and HWH TT 111) show that the negative contribution of the variable "AUT" decreases (in absolute value) as travel distances increase ("111": larger contribution of the variable "AUT"; "141": smaller contribution of the variable "AUT"). This result corroborates the previous observation that individuals with cars in the household often choose transit for longer trips ("141"). In the sample of commerce sector workers, smaller influences of the land use variables were observed. The largest value of the parameter associated with a land use variable was 0.044, for the variable "COM 5 km" in model HWH TT 141. This confirms that the influence of land use is not strong in the case of commerce sector workers.
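How a coefficient such as the contribution of "AUT" to one travel pattern might be estimated can be sketched with ordinary least squares. The sketch assumes one regression per travel pattern, with a 0/1 dependent variable marking whether the individual followed that pattern and 0/1 independent variables; this setup, the function name and the toy data are assumptions for illustration, and the significance tests reported in the study are not reproduced here.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X -- list of rows of explanatory values (no intercept column)
    y -- list of 0/1 indicators: 1 if the person followed the pattern
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical sample: AUT = 1 if the household has a car;
# y = 1 if the person followed a given transit pattern (e.g., HWH TT 111).
aut = [[0], [0], [0], [0], [1], [1], [1], [1]]
followed = [1, 1, 0, 1, 0, 0, 1, 0]
intercept, b_aut = ols(aut, followed)
print(intercept, b_aut)
```

With a single binary regressor, the slope equals the difference between the share of car-owning and non-car-owning individuals who follow the pattern, so a negative value has the same reading as the negative "AUT" contributions discussed above.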

Conclusion
This research confirmed two main hypotheses: (1) it is possible to find relationships between urban travel patterns (dependent variable) and socioeconomic characteristics, land use and out-of-home activity participation; and (2) the different spatial distributions of economic activities (commercial and industrial) in the urban environment influence the travel of commerce or industry sector workers.
In the first and second parts of this article (exploratory and confirmatory analysis, respectively), relations were found between variables using the CART algorithm and confirmed by measuring the statistical significance of the independent variables using Multiple Linear Regression (MLR). This study has some limitations related to the application of the methodology, such as: (1) the constraint on the number of travel patterns imposed by the statistical package used (S-PLUS 6.1); (2) the lack of data on the spatial distribution of activities other than industrial and commercial; and (3) the use of centroid distances of Traffic Zones.

The SPMA has experienced an increase in jobs in the tertiary sector in recent years, to the detriment of jobs in the industry sector. These changes in the job market can modify travel patterns. In this work, travel behavior differences between industry and commerce sector workers, governed by the geographic distribution of such activities, are investigated.

Figure 3
Figure 3 (a, b and c) shows the land use variables. The zone of origin is the shaded area "A". At stage (a), the values of the "opportunities" are represented in (%) for each zone together with straight-line distances from the centroid of "A". In (b), the "accumulated opportunities" for each of the four distance buffers are shown. Finally, at stage (c), the land use variables from TZ "A" are illustrated.
Figure 3 (a, b, c) - Land use variables

Figure 4 - Travel patterns (activity, travel mode and destination): (a) activity sequence; (b) travel mode sequence; (c) sequence of trip destinations; (d) final travel pattern, a combination of the previous stages

When no further data subdivision is possible, the final subgroups are considered terminal nodes or leaves.

Figure 5
Figure 5 (a) shows the "car ownership" influence. In both samples, individuals who had cars in the household most frequently used an automobile in their travel sequences. According to the terminal nodes and the two predominant travel patterns, people who do not have cars in the household predominantly use transit or non-motorized travel modes (TT, NN, NNNN, NNN). Car usage predominates among individuals who have at least one car in the household (PP, PPP).

Figure 5 - CART construction (a) Car Ownership influence

Figure 5 - CART construction (c) Transit Pass Ownership influence

Figure 6 - CART construction (b) Land Use influence
For automobile usage (HWH PP 141, HWH PP 131, HWH PP 121 and HWH PP 111), the contribution of the variable "AUT" is greater for shorter travel distances, such as HWH PP 111. People frequently choose to drive short distances instead of walking when they have cars in the household.

The results obtained through the CART application suggest that individuals who are studying (and are workers in the industry or commerce sector) perform activity sequences related to work and school (variable "Study"). Thus, individuals who are studying and working (variable "Study" equal to unity) accomplish travel patterns with trip purposes related to Work and School (HWHSH). A positive contribution of this variable was observed in both samples in the model HWHSH NNNN 11111 (industry: 0.106; commerce: 0.155).

People who do not use a Transit Pass (TP) (value one for the variable "Transit Pass Ownership") predominantly do not use transit in the travel mode sequence. Thus, the coefficients for this variable are highly negative in the models with transit usage: HWH TT 141 (industry: -0.109; commerce: -0.132); HWH TT 131 (industry: -0.089; commerce: -0.074); HWH TT 121 (industry: -0.099; commerce: -0.107); HWH TT 111 (industry: -0.090; commerce: -0.077). The importance of this variable in the choice of travel mode, especially related to transit usage, is evident. The estimated coefficient values for this variable confirm the previous results obtained using the CART application. Positive estimated values of the coefficients are found in the models associated with automobile or walking choices (PP and NN): HWH PP 141 (industry: 0.024; commerce: 0.025); HWH PP 131 (industry: 0.019; commerce: 0.020); HWH PP 121 (industry: 0.027; commerce: 0.039); HWH PP 111 (industry: 0.047; commerce: 0.065); HWH NN 111 (industry: 0.175; commerce: 0.149).

As noted previously, individuals with high Household Incomes frequently use cars in their trip sequences. Thus, Household Incomes greater than or equal to R$ 2970 (industry sector workers) or R$ 2260 (commerce sector workers), that is, the variable "Household Income" (HI) equal to unity, contribute negatively to the fulfillment of travel patterns involving transit and non-motorized usage (TT or NN): HWH TT 141 (industry: -0.025; commerce: -0.019); HWH TT 131 (industry: -0.030; commerce: -0.021); HWH TT 121 (industry: -0.055; commerce: -0.026); HWH TT 111 (industry: -0.058; commerce: -0.033); HWH NN 111 (industry: -0.770; commerce: -0.026). Conversely, positive values were observed for the coefficients associated with the models (travel patterns) with car usage (PP): HWH PP 141 (industry: 0.099; commerce: 0.045); HWH PP 131 (industry: 0.039; commerce: 0.034); HWH PP 121 (industry: 0.031; commerce: 0.030); HWH PP 111 (industry: 0.029; commerce: 0.029). Moreover, it is also evident that the contribution of the variable HI increases with travel distance, being largest for long travel distances such as "141". This result confirms the conclusion that people with high incomes not only predominantly use automobiles but also carry out longer trips.

The high value for the "Land Use" variable in the model HWH TT 141 (industry sector workers) indicates the importance of this variable in destination choices. Individuals who live in TZs with high accumulated industry opportunities (IND 5 km > 3.22%) are more disposed to carry out short trips. Thus, if the variable "IND 5 km" assumes the value one (IND 5 km > 3.22%), this strongly and negatively influences the accomplishment of the travel pattern HWH TT 141 (travel distances above 15 km; destination sequence 141). In the model for pattern HWH TT 131 (industry sector workers), if the variable "IND 10 km" has a value of 1 (IND 10 km > 15.6%), there is a negative influence (-0.046) on the fulfillment of the travel pattern HWH TT 131. Individuals who live in TZs with high accumulated industry opportunities up to 10 km from the centroid predominantly travel distances shorter than 10 km rather than between 10 and 15 km ("131").
(2) lack of data related to the spatial distribution of activities other than industrial and commercial; (3) use of centroid distances of Traffic Zones.

The results obtained allowed for detailed analysis of the influence of the three variable groups on travel patterns: (1) socioeconomic variables (Household Income, Transit Pass Ownership, Car Ownership) appear to affect the travel mode sequence used for trips; (2) activity participation (Study, Work) is important for the trip purpose sequence; and (3) land use variables (accumulated proportion of jobs by distance buffers starting from the residence zone centroid) have a significant influence on the sequence of chosen destinations. It is expected that the results of this research can be used to represent the level of activities and their geographic distribution in the urban configuration (land use variables) and to determine the influence of such variables on the travel of commerce and industry sector workers.