OPTIMIZATION APPROACHES TO MPI AND AREA MERGING-BASED PARALLEL BUFFER ALGORITHM

In buffer zone construction, the rasterization-based dilation method inevitably introduces errors, and the double-sided parallel line method involves a series of complex operations. In this paper, we propose a parallel buffer algorithm based on area merging and MPI (Message Passing Interface) to improve the performance of buffer analysis on large datasets. Experimental results reveal three major performance bottlenecks that significantly affect the efficiency of serial and parallel buffer construction: the area merging strategy, the task load balancing method, and the MPI inter-process results merging strategy. Corresponding optimization approaches, namely a tree-like area merging strategy, a vertex number oriented parallel task partition method, and an inter-process results merging strategy, are suggested to overcome these bottlenecks. Experiments were carried out to examine the performance of the optimized parallel algorithm. The results suggest that the optimization approaches can provide high performance and processing capacity for buffer construction in a cluster parallel environment. Our method could provide insights into the parallelization of spatial analysis algorithms.

Bol. Ciênc. Geod., sec. Artigos, Curitiba, v. 20, nº 2, p. 237-256, abr-jun, 2014.


INTRODUCTION
Rapidly expanding spatial datasets have brought unprecedented pressure on existing computational resources and on traditional analysis algorithms in geoscience. With the advancement of computer hardware technology and the development of parallel programming models, high performance computation has become an important issue for analyzing, processing, and visualizing massive geospatial data (TURTON and OPENSHAW, 1998 and 2000; CLARKE, 2003; HAWICK et al., 2003). High-performance computation can be implemented in many ways, such as multi-core parallelization, cluster parallelization, graphic processing unit (GPU) acceleration, and hybrid parallelization (LIN and SNYDER, 2009; McKENNEY et al., 2011), which all depend on the paradigm of parallelism (BARNEY, 2012). In a cluster computing environment, most of the proposed parallelized spatial analytical algorithms are based on the framework of the message passing interface (MPI). The two major modes for the realization of parallel computation are data decomposition and task partition (GRAMA et al., 2003), which correspond to spatial data division and pipeline parallel processing, respectively. Data division strategies for parallel spatial analysis algorithms with topological relations have been extensively discussed (SLOAN et al., 1999; MINETER and DOWERS, 1999; DARLING et al., 2000). For instance, a parallel task partition approach based on the partition of geometries has been designed and implemented (MINETER and DOWERS, 2000), and the idea of low-level software layering has been proposed to encapsulate the complexity and reuse the code of parallel algorithms (MINETER and DOWERS, 2000). Furthermore, Mineter (2003) presented a parallel vector spatial analysis platform called the TSO (Topology-Stitching-Output) software framework. This approach was based on the NTF data model, which contains topological information that allows complex topology to be created and checked during parallel task partition and result stitching. However, the processing and maintenance of topological information are time consuming for large datasets.
In GIS (Geographical Information System), a buffer is defined as a zone around a map feature measured in units of distance or time (ESRI, 2013). As an important function in map information retrieval, comprehensive spatial analysis, and processing in GIS, buffer analysis solves the problem of proximity and represents an extent of influence or service (WU, 1997). Buffer analysis algorithms are widely used in many geo-spatial fields, such as spatial data query, hybrid overlay analysis of vector and raster data, and thematic mapping.
Most previous studies of buffer algorithms focused on two approaches: the double-sided parallel line generation algorithm with a self-intersected polygon processing model, and the dilation method based on rasterization and vector boundary extraction. Zalik et al. (2003) elaborated an algorithm for asymmetric segment buffer generation based on the sweep line idea through the following steps: creating basic geometric outlines, identifying intersection points between them, constructing rings, and determining spatial relationships among the rings. Based on Zalik's algorithm, an algorithm for buffer creation and result area merging has been implemented using the sweep line approach and vector algebra (BHATIA et al., 2013). Despite their precision, these algorithms require complex computation and spatial relation identification, and their implementation is complicated. In response, Li and Du (2005) proposed a buffer creation algorithm based on dilation. Essentially, rasterization is used to simplify vector buffer creation, and the target buffer is created by extracting the boundaries of rasterized geometries, which are expanded according to certain window sizes and rules. However, algorithms based on the extraction of rasterized boundaries introduce many errors. In research and engineering applications, buffer analysis algorithms also face efficiency limitations caused by large datasets.
Area merging can be achieved through polygon clipping algorithms, which have been intensively studied, and many algorithms have been proposed (SUTHERLAND and HODGMAN, 1974; WEILER and ATHERTON, 1977; LIANG and BARSKY, 1983). The currently recognized efficient algorithms that can process arbitrary polygon clipping within a limited amount of time include Vatti's algorithm (VATTI, 1992) and Greiner-Hormann's algorithm (GREINER and HORMANN, 1998), which have similar performance. Vatti's algorithm supports clipping between polygons with any number of edges and of any shape (e.g., self-intersecting polygons with islands and/or holes). Murta (1998) then modified Vatti's algorithm to overcome the problem that horizontal edges could not be processed properly. Based on Vatti's algorithm, we implemented area merging and avoided its performance bottleneck by using a divide-and-conquer method. Moreover, area merging was introduced into buffer creation to replace the complex ring construction and spatial relationship processing. Thus, the buffer creation algorithm was simplified, and a parallel buffer analytical algorithm was implemented.
At present, there is little research on optimization approaches to parallel buffer algorithms under high-performance computation, and the buffer analytical tools provided by GIS software do not have satisfactory efficiency. Therefore, it is valuable to further explore parallel buffer algorithms and their optimization approaches in the context of big data. Firstly, a serial buffer construction algorithm based on area merging was proposed, together with an optimization approach based on area merging and the divide-and-conquer method. The efficiencies of the optimized algorithm and the ArcGIS™ Buffer tool were compared with and without dissolving of the buffer result polygons. Secondly, a parallel buffer analysis algorithm was developed on the basis of data parallelism, and its acceleration under the above two conditions was analyzed. Thirdly, the operation of the parallel buffer analysis algorithm was analyzed to identify the possible performance bottlenecks, and corresponding optimization solutions were then proposed.
In this paper, all experiments were performed under the same hardware conditions. The results showed that the buffer creation algorithm based on area merging and optimized using the divide-and-conquer method was feasible and had some advantages over the general buffer algorithm. The optimized algorithm effectively improved efficiency in buffer creation and result dissolving, and an ideal speedup ratio was obtained. Therefore, the optimization approaches are a feasible pathway to improve area merging-based serial and parallel buffer algorithms.

AREA-MERGING BASED BUFFER ALGORITHM
This section introduces the area-merging buffer construction algorithm and then describes a buffer zone merge strategy based on the divide-and-conquer method that improves the buffer algorithm. Based on this work, comparison experiments between our serial buffer algorithm and the ArcGIS™ Buffer tool are conducted. The vertex accumulation effect in the process of polygon merging, and the optimization solution to it, are also discussed in this section.

Construction of Buffering Zone
Buffer creation algorithms based on vector geometry are generally implemented through the following steps: creation of parallel lines, construction of rings, and processing of spatial relationships between intersected rings. The latter two steps involve numerous complex numerical computations (e.g., intersection calculation, vector computation, identification of included angles, and processing of self-intersection) and are difficult to implement because many types of special cases must be handled. Therefore, we propose a new method that replaces these two steps and simplifies buffer creation by introducing a mature polygon clipping algorithm into vector buffer construction. Vatti's algorithm supports polygon clipping and Boolean operations (e.g., union and difference) in polygon overlay, and it is recognized as being able to process arbitrary polygon overlay within a limited time. In this study, area merging was realized on the basis of Vatti's algorithm. With the area merging approach, a unilateral buffer can be easily realized, and an asymmetric buffer can be realized using bilateral distinction and endpoint arc center translation. In this paper, only the most typical case, the bilateral and symmetric buffer, is discussed. The buffer creation algorithm based on area merging for the three basic types of geometries (point, polyline and polygon) is described below.

(1) Point object. When the radius ($r$) of a buffer is known, buffer creation for a point is the simplest case, because one only needs to draw a closed ring with $P_0$ as the center and $r$ as the radius. A point $P(x, y)$ on the ring and the center $P_0(x_P, y_P)$ satisfy the following equation:

$$(x - x_P)^2 + (y - y_P)^2 = r^2 \qquad (1)$$

Points on the ring can be computed from Eq. (1) and connected one by one to form a closed ring, namely a buffer zone with $P_0$ as the center and $r$ as the radius (Figure 1-(a)). For a multi-point geometry, the results may overlap if the buffer of each point is created according to the rule for point geometry (Figure 1-(b)). Area merging can then be called to dissolve the overlapping areas, and the final result is shown in Figure 1-(c).

(2) Polyline object. A polyline geometry can be regarded as a group of end-to-end segments, and each segment is composed of a starting point and an ending point.

(3) Polygon object. A polygon denotes a planar area enclosed by a group of closed polylines. As shown in Figure 3-(a), an enclosed polyline is also called a ring, which can be classified as an interior ring or an exterior ring according to the direction of the points constituting the ring. A simple polygon contains only one exterior ring and possibly several interior rings, and a polygon that contains several exterior rings is called a multi-polygon. Buffer creation based on area merging for a simple polygon includes the following steps: decomposition, ring construction, dissolving, and deletion. First, the rings of a polygon are decomposed into a group of polylines, and independent buffer polygons are then constructed for each polyline according to the method in Figure 2. The polyline buffers are then dissolved, and the rings are selected and deleted. For instance, in the creation of a bilateral buffer, the rules are as follows: the exterior ring created from the input polygon's exterior ring is retained; the interior ring created from the polygon's interior ring is retained; and all other rings are deleted. Figure 3-(b) shows an input polygon composed of an exterior ring ($R_0$) and an interior ring ($R_1$), with all rings split at their starting/end point. A buffer was created for each ring by using the buffer creation algorithm for polylines, resulting in 4 rings ($R_0'$, $R_0''$, $R_1'$ and $R_1''$) (Figure 3-(c)).

Based on the retention rule for result buffer polygon rings, the exterior ring $R_0'$ created from the exterior ring $R_0$ and the interior ring $R_1''$ created from the interior ring $R_1$ were retained, while $R_0''$ and $R_1'$ were deleted. Finally, a result polygon was created, as indicated by the shaded region enclosed by the solid line (Figure 3-(d)). The buffer of a multi-polygon can be created by dissolving the buffers of its simple polygons. When the inside buffer of an interior ring is created, the disappearance of the buffer polygon's interior ring after dissolving indicates that the buffer radius exceeds the range that the interior ring can hold; in that case, all rings created from this interior ring should be discarded. The buffer created from an interior ring will never extend beyond the buffer created from the exterior ring that contains it.
Figure 3 - Buffer zone construction of polygon geometry.
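The point-buffer construction described above can be sketched as follows. This is a minimal illustration, assuming the ring satisfies the circle equation $(x - x_P)^2 + (y - y_P)^2 = r^2$ and is approximated by a fixed number of vertices sampled at equal angles. The function name `point_buffer_ring` and the `segments` parameter are hypothetical and do not come from the paper.

```python
import math


def point_buffer_ring(x0, y0, r, segments=36):
    """Approximate the circular buffer of a point as a closed ring of vertices.

    (x0, y0) is the buffered point, r the buffer radius, and `segments`
    (an illustrative parameter) the number of edges used for the circle.
    """
    if r <= 0 or segments < 3:
        raise ValueError("r must be positive and segments must be >= 3")
    ring = []
    for k in range(segments):
        theta = 2.0 * math.pi * k / segments
        # Each vertex satisfies (x - x0)^2 + (y - y0)^2 = r^2.
        ring.append((x0 + r * math.cos(theta), y0 + r * math.sin(theta)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


ring = point_buffer_ring(10.0, 5.0, 2.0, segments=8)
```

For a multi-point geometry, one such ring would be produced per point and the overlapping rings would then be passed to the area merging (UNION) step, which is not reproduced here.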

Divide-and-Conquer Method for Area Merging
In this study, the widely validated Vatti's algorithm was used for area merging, and the relation between time cost and data volume in area merging was statistically analyzed. Table 1 shows that the time cost of the UNION operation of Vatti's algorithm increased with the number of polygons, and the performance was unsatisfactory. Thus, we statistically analyzed how the time cost of a single UNION operation changed with the number of vertices, using regression analysis. Figure 4 shows that, as the number of vertices increased, the time cost of Vatti's algorithm grew rapidly, similar to a power function. In dissolving a polygon set, we initially used a one-by-one 'snowball' strategy. As dissolving progresses, the number of vertices contained in the polygons in each operation inevitably increases in most cases. In Figure 5, $U_{AB}$ is the dissolved result of polygons A and B, and the number of vertices in $U_{AB}$ is obviously larger than that of A or B.

When Vatti's algorithm is used, an operation on $U_{AB}$ consumes more time than one on A or B. This is called the vertex accumulation effect in area merging, and it is the major cause of the low efficiency shown in Table 1. A tree-like merging strategy for polygon sets was designed on the basis of the divide-and-conquer method (Figure 6), which avoided the vertex accumulation effect and effectively shortened the time cost of area merging. In the divide-and-conquer method, the original problem is divided into smaller sub-problems (n) with a structure similar to that of the original problem. The sub-problems are then solved recursively, and their results are combined to obtain the solution to the original problem (Thomas et al., 2011). This method is the foundation of many efficient algorithms, such as sorting algorithms (quicksort and merge sort) and the Fast Fourier Transform (FFT). The divide-and-conquer method has long been applied to geo-spatial problems, such as the divide-and-conquer algorithm for finding the closest pair of points proposed by Bentley and Shamos (1976) and the improved divide-and-conquer algorithm for computing the Delaunay triangulation of a planar point set (Dwyer, 1987). In recursive mode, each recursive operation contains three steps: divide, solve, and dissolve. The final result of area merging does not depend on the internal dissolving order of the polygon set. Though the divide-and-conquer method is used mainly for recursive problems, it can also be used to dissolve a polygon set with a fixed number of polygons because the process and target of dissolving are explicit. In tree-like merging, the polygons are first paired and dissolved, and the results of two adjacent pairs are then dissolved, until only one polygon remains. Compared to 'snowball' merging, tree-like merging neither increases nor reduces the number of calls of the UNION operator, but it accelerates computation by avoiding the vertex accumulation effect.

The data in Table 1 were also used for tree-like merging, and the time costs are listed in Table 2. Tree-like merging showed an obvious efficiency improvement because it effectively reduced the average number of vertices contained in the polygons in each call of the UNION operator and thus avoided the vertex accumulation effect hidden in the process of polygon set dissolving. The tree-like merging method was used not only in the creation of the single-feature buffer polygon but also in the dissolving of intersected multi-feature buffer polygons.
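The difference between 'snowball' and tree-like merging can be illustrated with a small sketch. It does not implement Vatti's algorithm: as a stated simplification, each "polygon" is represented only by its vertex count, and the hypothetical `CountingUnion` stand-in assumes that a union keeps all vertices of both operands. The names `snowball_merge`, `tree_merge`, and `CountingUnion` are illustrative.

```python
def snowball_merge(items, union):
    """Merge one by one: ((a U b) U c) U d ..."""
    if not items:
        raise ValueError("items must not be empty")
    result = items[0]
    for item in items[1:]:
        result = union(result, item)
    return result


def tree_merge(items, union):
    """Merge adjacent pairs level by level until a single item remains."""
    if not items:
        raise ValueError("items must not be empty")
    level = list(items)
    while len(level) > 1:
        merged = [union(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # unpaired item moves up unchanged
        level = merged
    return level[0]


class CountingUnion:
    """Toy UNION operator: a 'polygon' is just its vertex count (assumption)."""

    def __init__(self):
        self.calls = 0
        self.vertices_processed = 0

    def __call__(self, a, b):
        self.calls += 1
        self.vertices_processed += a + b  # operand size of this call
        return a + b  # simplification: no vertices vanish in the union


polygons = [100] * 8  # eight polygons with 100 vertices each
snow = CountingUnion()
snowball_merge(polygons, snow)
tree = CountingUnion()
tree_merge(polygons, tree)
```

Under these assumptions both strategies issue the same number of UNION calls, but the tree-like order feeds fewer vertices in total to the operator. Because the measured cost of a single UNION grows faster than linearly with the number of vertices, the gap in run time would be expected to be wider than the gap in vertex counts.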

Performance Analysis of Serial Algorithm
Among the three major types of geometries (point, polyline, and polygon), the proposed algorithm is most representative in processing polyline geometries. To analyze the performance of the serial buffer algorithm with different data volumes, experiments were conducted using real road network polyline datasets, and comparisons were made with the serial Buffer tool of ArcGIS™ 9.3/10.1 SP1 on the same hardware platform.

Table 3 shows that when the intersected multi-feature buffer result polygons were not dissolved, the proposed algorithm showed a lower efficiency (0.7-1 times lower) than the ArcGIS™ Buffer tool. However, if the buffer result polygons were dissolved, ArcGIS™ 10.1 SP1 failed to produce results even after more than 10 h. Although some results were obtained with ArcGIS™ 9.3, its time cost was considerably greater than that of the proposed algorithm, and the advantage of the proposed algorithm grew with the data volume. Therefore, the algorithm proposed in this study was feasible and could effectively overcome the severe performance bottleneck faced by GIS software during buffer analysis and result dissolving.

Notes to Table 3: (b) ArcGIS™ 10.1 with SP1 and background geo-processing switched off. (c) Task did not produce results within 10 hours and was canceled.

Some abnormalities were observed during the experiments. In ArcGIS™ 10.1, when the Parallel Processing Factor under ArcToolBox was set to 0, 1 or below 10% and the created buffer result polygons were not dissolved, the CPU utilization rate of the Buffer tool still remained at 25-27%, whereas this rate was only 12% for the proposed algorithm. The computer's CPU was an Intel i7-2600, a quad-core CPU with hyper-threading. More than one physical or virtual core was therefore engaged in computation whenever the CPU utilization rate exceeded 12%. Thus, ArcGIS™ may apply hyper-threading optimization or multi-core parallelization to its Buffer tool code, which would also explain why the proposed algorithm showed approximately doubled time costs compared to the ArcGIS™ Buffer tool.

PARALLEL BUFFER ALGORITHM BASED ON MPI
This section describes the logical flow of the MPI-based parallel buffer algorithm and studies its performance in detail through several parallel experiments. The experimental results reveal that task load balancing and MPI inter-process results merging are the two main bottlenecks of the parallel buffer algorithm.

Logical Flow of Parallel Buffer Algorithm
In practice, buffer analysis tools also face the challenge of large data volumes. Therefore, it is necessary to use high performance computation technology to design a parallel buffer algorithm and to study its optimization in order to overcome the problem of massive data in buffer analysis. The logic flow of parallel buffer analysis algorithms includes the following 4 stages: task division, parallel creation of buffer polygons, buffer area merging, and output of result data. These stages correspond to decompose, compute, dissolve, and output in Figure 7.

Figure 7 - Logic flow of parallel buffer algorithm.

Data decomposition for parallelization based on a simple feature model uses the feature identifier (FID) as its foundation: it allocates the vector features to the computation nodes, which concurrently create a series of buffer polygon sets. The outputs from all processes are then delivered to the main process for dissolving. The first three stages are the core of parallel buffer analysis algorithms, and the optimization of the proposed algorithm is discussed with respect to these three stages. Data output may depend on several details of the application environment (e.g., vector data model, parallel file system, and parallel database). These details differ widely from each other and are not core procedures of the parallel buffer analysis algorithm. Data output also consumes little time compared with the other stages; therefore, it is not discussed in this paper.

Performance Analysis of Parallel Buffer Algorithm
Based on the above logic flow, a parallel buffer analysis algorithm was implemented on the basis of the MPI programming model and data parallelism, and it was tested using several groups of real linear road network data. Table 4 shows that the proposed algorithm could improve efficiency to a certain level. When buffer result polygons were not dissolved, the 4-process parallel computation could achieve efficiency as high as that of ArcGIS™. However, MPI and data parallelism alone did not give the buffer analysis algorithm an ideal speedup ratio. With more processes, parallel computation efficiency was reduced, indicating that parallel algorithms based on plain parallelism can be optimized further and that their bottlenecks should be analyzed carefully and eliminated.

Table 4 - Time costs of area merging-based parallel buffer algorithm.

Bottlenecks of Parallel Buffer Algorithm
In achieving high performance using parallel computation, one inevitable problem is how to balance the loads among parallel tasks, because all computation tasks can be completed within a similar time only under load balance. This is extremely important for MPI-based parallel buffer algorithms in cluster parallel environments: the cluster system's overall utilization rate can be improved only when the time that early-finishing MPI processes spend waiting for the dissolve step is reduced. We performed two parallel buffer analysis experiments with parallel task distribution using the FID-based data decomposition strategy, and we statistically analyzed the time costs of buffer zone generation and dissolving, the two procedures with the largest differences in speed.

Table 5 shows that the numbers of features were evenly distributed among MPI processes and that some parallel acceleration was achieved, but the numbers of vertices contained in the vector features differed among processes. Area merging based on Vatti's algorithm is sensitive to the number of vertices, and the buffer algorithm based on this operation is inevitably affected, which causes large differences in computation time among processes. The slowest process had a time cost that was 2.2 times that of the fastest process and a dissolving time that was 11.6 times that of the fastest one. Unreasonable data decomposition results in a potential performance bottleneck for MPI algorithms; therefore, the premise for MPI inter-process load balance is to decompose the parallel tasks homogeneously under the data parallelism mode, which is also an important direction for the optimization of parallel algorithms.

Following the principle of reducing the mutual waiting time among MPI processes, there is also room for optimization and acceleration in result set merging after all processes are completed, which usually requires redesigning the strategy for merging the MPI inter-process result sets. Table 5 shows the differences in computation time among processes, which are especially large when load balance cannot be achieved. As a result, the first finished process has to wait for the other unfinished processes. If the task of inter-process result merging is assigned to a single process (e.g., the main process in Figure 7), that process can continue only after all processes are finished, which obviously reduces the parallel computation efficiency and thus becomes a performance bottleneck. The principle of MPI inter-process result merging is similar to that of tree-like area merging: the final result does not depend on the order of merging between processes, and its process and target are explicit. Thus, inter-process merging can also be optimized using the divide-and-conquer method. Therefore, at the process level, a tree-like merging strategy can be designed for MPI inter-process result sets to reduce the waiting time in merging and to accelerate the parallel buffer algorithm.

APPROACHES TO OPTIMIZING THE PARALLEL BUFFER ALGORITHM
To overcome the bottlenecks introduced in section 3, a vertex number-based parallel task partition strategy and a tree-like inter-process results merging method are proposed and described in this section.

Parallel Task Partition
The most straightforward method to process vector spatial data based on a simple feature model is to realize a parallel task partition through dataset division by the number of features. The principle of this method is simple. Suppose that the input data have $F$ features and that the parallel environment contains $n$ MPI processes; the number ($m$) of features distributed to each process is then

$$m = \frac{F}{n}$$

This method yields balanced loads when the features in the dataset are uniform, but this situation rarely occurs. Furthermore, the low-level algorithm is sensitive to the number of vertices contained in the features, not to the number of features. In most cases, this method cannot achieve load balance; therefore, new data decomposition methods should be developed.

In response to this defect, we propose a parallel task data decomposition method based on vertex number statistics, because the UNION operator for parallel vector buffer results is sensitive to the number of vertices in polygons. For data decomposition, this method depends on the number of vertices contained in the geometries. Suppose that a group of input data contains $N$ vertices and the parallel environment contains $n$ MPI processes; each process is then expected to be assigned a group of vector features with $P$ vertices, where

$$P = \frac{N}{n}$$

The number of features distributed to a process is no longer constant. However, the geometries cannot be split, so the total numbers of vertices $P_i$ ($i = 1, 2, 3, \ldots, n$) can only be values close to $P$. The data decomposition can be completed by scanning the numbers of vertices of all vector feature geometries. This method is more time-consuming than the task partition method based on the FID of features, but the experiments revealed that the additional time cost of counting vertices is acceptable considering the performance improvement. The number of MPI processes was fixed at 4. With the other experimental conditions held constant, each of the 7 groups of road network data with different data volumes was divided based on the number of features and on the number of vertices. The comparative experimental results are listed in Table 6.

In Table 6, $T_{FIDs}$ is the total time cost of parallel computation based on the number of features, and $T_{points}$ is the total time cost of parallel computation based on the number of vertices. Moreover, $T_{DP}$ is the time cost of data division based on the number of vertices, which is already contained in $T_{points}$. The results indicate that the partition method based on the number of vertices achieved a 10% higher performance at the expense of a 0.43% increase in time consumption. Therefore, this method can improve computation efficiency for the parallel vector buffer algorithm.
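The two decomposition strategies can be contrasted with a short sketch. The paper does not specify how features are grouped so that each process receives close to $P = N/n$ vertices, so the midpoint rule used below (a feature goes to the block in which the midpoint of its cumulative vertex range falls) is an assumption chosen for illustration. The function names and the sample vertex counts are hypothetical.

```python
def partition_by_feature_count(vertex_counts, n):
    """FID-based split: contiguous blocks of roughly F / n features each."""
    f = len(vertex_counts)
    bounds = [i * f // n for i in range(n + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(n)]


def partition_by_vertex_count(vertex_counts, n):
    """Contiguous split aiming at P = N / n vertices per process.

    Whole features are never split. Each feature is assigned according to
    the midpoint of its cumulative vertex range (illustrative heuristic).
    """
    total = sum(vertex_counts)
    parts = [[] for _ in range(n)]
    running = 0
    for fid, count in enumerate(vertex_counts):
        if total == 0:
            idx = 0
        else:
            mid = running + count / 2.0
            idx = min(int(mid * n / total), n - 1)
        parts[idx].append(fid)
        running += count
    return parts


def loads(parts, vertex_counts):
    """Total number of vertices assigned to each process."""
    return [sum(vertex_counts[fid] for fid in part) for part in parts]


# Hypothetical vertex counts for 10 features, indexed by FID.
counts = [120, 15, 15, 20, 30, 40, 10, 50, 60, 40]
by_fid = loads(partition_by_feature_count(counts, 4), counts)
by_vertex = loads(partition_by_vertex_count(counts, 4), counts)
```

In this made-up example the feature-count split gives per-process vertex loads that differ by a factor of three, whereas the vertex-based split keeps every load within 20 vertices of $P = 100$. The single scan over the vertex counts corresponds to the extra cost denoted $T_{DP}$ in Table 6.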

Tree-Like Merging Between MPI Processes
When the parallel MPI processes are finished, the polygon result sets derived from all processes must also be tested for intersection and dissolved. A simple method is to send all results to a single process (e.g., the main process shown in Figure 7) for area merging and output. The operation flow of this method is shown in Figure 8. One evident defect of this method is that the single process responsible for results merging has to wait until all processes are finished before it can continue and complete the final merge. Given the significant effect of tree-like merging, we designed a new strategy for merging inter-process result sets, which accelerates computation by decreasing the inter-process waiting time. This strategy is called the MPI inter-process tree-like merging optimization strategy, and its work flow is shown in Figure 9. Taking the 4 MPI processes in Figure 9 as an example, when the results of the 1st and 2nd processes are merged, the merged result is kept and processed by the 1st process. When the results of the 3rd and 4th processes are merged, the merged result is kept and processed by the 3rd process, after which the results of the 1st and 3rd processes are merged. In this way, the difficulty of developing the MPI program can be reduced by providing a predesigned tree-like merging pathway for the MPI parallel processes. The parallel buffer algorithm with the above merging flow was implemented and compared with the parallel buffer algorithm using a single-process merging strategy. Seven groups of road network data with different data volumes were used, and the other conditions were kept constant.
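The predesigned merging pathway can be sketched as a schedule of pairwise merges. The code below only computes which process receives from which in each round; it does not perform any MPI communication or polygon merging. Ranks are zero-based here, so the 1st to 4th processes of the example correspond to ranks 0 to 3. The function name `tree_merge_schedule` is hypothetical.

```python
def tree_merge_schedule(n_procs):
    """Return merge rounds as lists of (receiver, sender) rank pairs.

    In each round the stride doubles: the receiver keeps the merged result,
    and the sender takes no further part in merging.
    """
    rounds = []
    stride = 1
    while stride < n_procs:
        pairs = [(r, r + stride)
                 for r in range(0, n_procs, 2 * stride)
                 if r + stride < n_procs]
        rounds.append(pairs)
        stride *= 2
    return rounds


schedule = tree_merge_schedule(4)
# schedule == [[(0, 1), (2, 3)], [(0, 2)]]
```

In an MPI program, each pair would typically map to one send on the sender rank and one matching receive on the receiver rank, followed by a local area merge on the receiver. Pairs within the same round are independent, so a pair whose members finish early does not have to wait for the slowest process before it starts merging.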
Table 7 shows that tree-like merging between MPI processes improved the efficiency of the parallel buffer analysis algorithm by 46.6% on average. With 4 MPI processes, the parallel speedup ratio increased from 1.411 to 2.708, which is a significant effect. This result suggests that the tree-like merging of MPI inter-process polygon sets is an effective optimization for parallel buffer analysis algorithms and has practical value. The logic flow of the parallel buffer analysis algorithm based on this optimization strategy is presented in Figure 10.
Table 7 - Time costs of parallel buffer algorithm optimized by tree-like merge strategy between MPI processes.
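As a rough aid to reading the reported speedup ratios, they can be converted to parallel efficiency. The paper does not state its definitions explicitly, so the conventional ones are assumed here: the speedup $S$ is the serial time divided by the parallel time, and the efficiency for $p$ processes is $E = S/p$. The values below are simple arithmetic on the reported speedups for $p = 4$, not additional measurements.

```latex
E = \frac{S}{p}, \qquad
E_{\text{single-process merge}} = \frac{1.411}{4} \approx 0.353, \qquad
E_{\text{tree-like merge}} = \frac{2.708}{4} \approx 0.677
```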
The MPI inter-process tree-like merging can still be improved. For instance, instead of a preset merging order, a 'first finish first merge' mode could be used. An evolution coefficient can be defined for each process, and the two earliest finishing processes are merged first. After each merge, the evolution coefficient of one process is increased by 1, and the other process is ended. In each merging step, only processes with the same evolution coefficient are merged, unless only one process carries a given evolution coefficient. After all processes are merged, the results are finally merged and output by the process with the highest evolution coefficient. However, this method would greatly increase the complexity of inter-process communication and programming, and thus the difficulty of developing MPI parallel programs. Appropriate trade-offs are therefore necessary in practical applications, and this topic should be studied further.

CONCLUSIONS AND FUTURE ISSUES
In this paper, a vector buffer generation algorithm based on the traditional segment buffer zone construction algorithm and the area merging approach was proposed.The algorithm simplified the process of buffer zone construction by introducing a mature polygon clipping algorithm to dissolve the buffer results of a single feature or several features, and the processing of complex spatial relationships during feature buffer creation was avoided.Moreover, the code complexity and coupling degree were reduced.For optimization of the buffer result dissolving, a divide-and-conquer method was used to overcome the bottleneck of the vertex accumulation effect in serial buffer algorithms.The efficiency of this method was lower than that of mature commercial GIS software when the buffer results of different features were not dissolved, but numerous experiments revealed that this method could finish buffer construction for a massive dataset with arbitrary geometries in a reasonable amount of time.In creating intersected buffer zones that should be dissolved, the proposed algorithm was far more efficient in serial computation than the ArcGIS TM Buffer tool.Therefore, this buffer creation algorithm based on area merging has certain practical values.
Parallel computation is a feasible way to cope with increasingly large spatial data volumes. Although the development of parallel algorithms is important, their optimization is equally important for accelerating computation and scaling up the problems to be solved. In this paper, parallel buffer construction and a dissolving algorithm were implemented on the basis of a serial buffer algorithm and the MPI parallel programming model. We elaborated the two possible performance bottlenecks in the parallel buffer algorithm that caused low efficiency, and we proposed specific solutions, including the parallel task partition approach based on the number of vertices for parallel task load balancing and the tree-like merging approach for MPI inter-process result polygon sets. In the case of 4 MPI processes, the results showed that the new parallel task partition strategy improved performance by 10% at a 0.43% time cost increase. Moreover, the inter-process tree-like merging method improved efficiency by 46.6%, and the parallel speedup ratio increased from 1.4 to 2.7, which indicated a significant effect. Therefore, we suggest that the two optimization approaches mentioned above could effectively improve performance for buffer construction and are feasible for the parallel optimization of buffer analytical algorithms. The two approaches also provide a useful reference for the parallelization and optimization of other vector analysis algorithms in GIS.
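A vertex-number-oriented task partition can be sketched as follows. The greedy heuristic used here (assign the largest remaining feature to the currently least loaded process) is an assumption chosen for illustration and is not necessarily the partition rule used in the experiments; `partition_by_vertices` and its parameters are hypothetical names.

```python
import heapq


def partition_by_vertices(vertex_counts, n_procs):
    """Assign feature IDs to processes so that vertex loads are balanced.

    vertex_counts: list where index = feature ID (FID), value = vertex count.
    n_procs:       number of MPI processes (only simulated here).
    Returns a list of FID lists, one per process.
    """
    heap = [(0, p) for p in range(n_procs)]   # (current load, process id)
    heapq.heapify(heap)
    parts = [[] for _ in range(n_procs)]
    # Largest features first, each to the least loaded process.
    by_size = sorted(range(len(vertex_counts)),
                     key=lambda fid: -vertex_counts[fid])
    for fid in by_size:
        load, p = heapq.heappop(heap)
        parts[p].append(fid)
        heapq.heappush(heap, (load + vertex_counts[fid], p))
    return parts


def loads(parts, vertex_counts):
    """Total vertex count handled by each process."""
    return [sum(vertex_counts[fid] for fid in part) for part in parts]
```

For the vertex counts `[900, 500, 450, 50, 40, 30, 20, 10]` and two processes, splitting by FID ranges gives loads of 1900 and 100 vertices, whereas the sketch above gives 1000 and 1000.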
In addition, the more reasonable 'first finish first merge' mode can be used in merging MPI inter-process result sets. Considering the hypothesis that buffer result polygons of adjacent vector features are more likely to intersect, the relationships between adjacent vector features should be considered in parallel task division. Other rules (e.g., Hilbert space-filling curves coordinated with the number of vertices of features) can be used to obtain a better optimization approach. These problems were not discussed in this paper and will be studied further.
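As a hedged illustration of the Hilbert-curve idea, the sketch below maps grid cells to their position along a Hilbert curve using the well-known iterative conversion. How features would be assigned to grid cells (for example, by centroid) and how the ordering would be combined with vertex counts are left open here; `hilbert_index` is a hypothetical helper name.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two, and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(cells, n):
    """Sort (x, y) grid cells by their position along the curve."""
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
```

Features sorted this way tend to be neighbours in space when they are neighbours in the list, so contiguous chunks of the sorted list (cut at roughly equal cumulative vertex counts) would group features whose buffers are likely to intersect.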
shows that the buffer for segment L is created in 3 steps as follows: 1) find L's two parallel lines (L_left and L_right) on its two sides at a distance of r; 2) draw two semi-arcs (C_s and C_e) with the starting point (P_s) and the end point (P_e) as the centers, respectively; and 3) connect L_left, C_s, L_right, and C_e successively to construct a polygon (Figure 2-(c)), namely a buffer zone of segment L with r as the radius. If a polyline consists of several segments (Figure 2-(b)), area merging is used to dissolve all segment buffer zones into a final buffer polygon for it (Figure 2-(d)). A linear geometry composed of several independent polylines is called a multi-polyline, and its buffer zone can be created by dissolving the buffer zones of all its polylines.
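The three construction steps for a single segment can be sketched in a few lines. This is a minimal illustration under assumptions: the semi-arcs are approximated by `arc_steps` straight chords, the ring is returned counter-clockwise without a repeated closing vertex, and the names `segment_buffer` and `ring_area` are hypothetical.

```python
import math


def segment_buffer(ps, pe, r, arc_steps=16):
    """Buffer polygon of segment Ps-Pe with radius r.

    The ring runs: semi-arc C_e around Pe, L_left back to Ps,
    semi-arc C_s around Ps, and L_right (the implicit closing edge).
    """
    (xs, ys), (xe, ye) = ps, pe
    theta = math.atan2(ye - ys, xe - xs)      # direction of the segment
    ring = []
    arcs = ((xe, ye, theta - math.pi / 2),    # C_e: from right side to left
            (xs, ys, theta + math.pi / 2))    # C_s: from left side to right
    for cx, cy, start in arcs:
        for k in range(arc_steps + 1):
            a = start + math.pi * k / arc_steps
            ring.append((cx + r * math.cos(a), cy + r * math.sin(a)))
    return ring


def ring_area(ring):
    """Signed shoelace area (positive for counter-clockwise rings)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0
```

For a segment of length 10 and r = 1, the area of the resulting ring approaches 2·r·10 + π·r² as `arc_steps` grows. The buffer of a polyline would then be obtained by merging the rings of its segments with a polygon clipping library, which is outside this sketch.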

Figure 4 - Fitting curve of time costs of the UNION operator of Vatti's algorithm.

Figure 5 - Vertex accumulation effect in the polygon merging process.

Figure 6 - Process of tree-like merging of polygons.

Figure 8 - Single-process merging flow of buffer results of 4 MPI processes.

Figure 9 - Tree-like merging flow of buffer results of 4 MPI processes.

Figure 10 - Logic flow of the optimized parallel buffer algorithm.

Table 1 - Time costs of polygon merging implemented by Vatti's algorithm.

Table 2 - Time costs of tree-like merging of polygons.

Table 3 - Time costs of the serial polygon merging-based buffer algorithm (a).
(a) Experiments were carried out on Windows 7 Ultimate (x64).
In general, multiple

Table 5 - Differences in time costs between MPI processes (data partition by FIDs).

Table 6 - Improvements achieved by the point number-based data partition method.