Optimum Positioning of Base Station for Cellular Service Devices Using Discrete Knowledge Model

A good wireless network design depends on technical and financial viability, among other criteria that must be met. With the emergence of new technologies and services, such as 5G transmission and frequency reuse, new work is being carried out to ensure a better design for a given area. This study examines a discrete radio propagation model that employs the K nearest neighbors (KNN) classifier and takes into account the different characteristics of the environment. The article presents a case study on the optimum positioning of base stations at the Federal University of Pará (Belém – Brazil), which represents a typical Amazon environment. The scenario is heterogeneous, with buildings and a considerable forest area. Measurement campaigns were conducted at three frequencies to obtain the design features of the model: 521 MHz (Brazilian digital TV system), 2100 MHz (Enhanced Data Rates for GSM Evolution), and 2600 MHz (Long Term Evolution). The fading behavior at these frequencies was studied in order to generalize the propagation loss model to other frequencies. Once the model was ready, computer simulations were run for two scenarios to optimize the positioning of the radio base stations under study.

On the premises of the Federal University of Pará (UFPA), and in several towns and cities in the Amazon, one often comes across shadow zones at various frequencies used by telecommunication services, especially cell phone services (3G, 4G, etc.) and digital television. These shortcomings create both the need and the opportunity for tools designed to improve this situation.
This article sets out an empirical model for outdoor mobile radio propagation applied to the optimum positioning of transmitter towers in an Amazon scenario. The main contributions of this article are as follows: 1) prediction of the received signal strength in any area of a map through a generalized model adapted to Amazon environments; 2) a methodology for finding the best positioning for transmitters with a view to maximizing the signal strength received by the network users.

II. RELATED WORKS
Several modelling techniques have emerged with the aim of representing the phenomenon of electromagnetic wave propagation with its various nuances. It is worth citing some of the key studies among these, [3], [4], [5], [6] and [7], which are related to this study and examine environments with a good deal of vegetation.
Ribeiro et al. [3] study the influence of vegetation in the 700 MHz band for outdoor-to-indoor paths. This work shows a signal loss of about 10 dB and a larger mean delay spread in environments whose path has more vegetation. The authors identify spreading and absorption of the signal as the main causes.
Article [4] presents a three-layer deterministic model using Dyadic Green's Functions. In this, the authors emphasize differences in the received power for densely wooded urban scenarios and different climatic conditions in the UHF range.
The authors of [5] describe the main topics for planning the installation of new cellular technologies, in particular 5G. For the planning stage, they underline the importance of changing the large RBS (Radio Base Station) to a small cell format. They also highlight the need to optimize the positioning of the new RBS for better coverage, as well as propagation models designed for city maps and the reuse of frequencies (refarming), among other factors.
In [6], the authors designed a propagation model for the Brazilian digital TV band by taking account of mixed pathways, such as the land-fresh water-land type, in different seasons of the year, in this case the Amazon summer and winter. The authors use a machine learning model to characterize the losses. The proposed model is a hybrid algorithm that combines the K nearest neighbors (KNN) classifier with knowledge-based agents and uses attributes for each point of the scenario, according to the environment.
The study by [7] makes a comparison between the Okumura-Hata and COST231-Hata models when planning the wireless network frequencies that are often used in Long Term Evolution (LTE).
In study [8], the authors modelled electromagnetic propagation at a frequency of 600 MHz by means of a hybrid ARIMA-ANN model. The received signal power was calculated as a function of the distance to the transmitter. The work analyzed part of the Brazilian DTV frequency range in a densely urbanized Amazon city with an equatorial climate (hot and wet). The results suggest that the proposed modelling can be refined and thus applied widely in situations such as the one analyzed in the study.
In [9], the authors hold a discussion (review) about ray-tracing methods, their use in the world today and the opportunities for their use in the future by creating new algorithms. The authors recommend hybrid modelling by employing the ray-tracing method, together with empirical models and numerical methods in the planning of high frequency telecommunication services in "complicated propagation" conditions.
In [10], the authors calculate the value of the Signal to Interference Ratio (SIR) in LTE air-to-ground networks for low-altitude flights. The results suggest that, at low altitudes, SIR is greater for macrocells than for microcells. At higher altitudes, the opposite holds.
In [11], a model is put forward for planning LTE networks in maritime regions by means of combinations of transmitters that are no further than 100 km from each other.
The study carried out in [12] shows correlations between several propagation models found in the literature and applied to LTE networks in the city of Bogotá (Colombia). The results of this study show that geometric models represent the best propagation in the environment that is being analyzed.
The study carried out in [13] compares human and machine methods of learning based on a non-Markovian decision algorithm, that is, one that depends on more than the immediately preceding state. The results show that an algorithm that follows these preconditions is able to faithfully represent the human learning curve.
In [14], the authors set out ways of improving precision and lowering the computational cost of this last category of algorithms by comparing the results of statistical methods with methods based on the theory of machine learning.

III. MATERIALS AND METHODS
Initially, measurement campaigns were carried out in the 500 MHz, 2100 MHz and 2600 MHz bands with the aim of creating a database for the designed model. The three data sets were used in a parabolic fitting to obtain the estimated curves for other frequencies (700 MHz, 1800 MHz and 2400 MHz). These data were then used as inputs to the model, which accepts heterogeneous routes. Finally, the optimum positioning of the transmission towers was obtained through an algorithm based on the KNN classifier [15] to solve a complete combinatorial optimization problem [16]. The methodology adopted in this study is represented as a flowchart in Fig. 1. Each stage of the flowchart is described in detail in the following subsections.

Belém-PA at 521.14 MHz. Table I shows the main characteristics of the transmitter antenna. The spectrum analyzer provided 500 samples for each measured point. During data processing, samples more than 2 standard deviations from the mean were excluded, thus removing the outliers.
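The two-standard-deviation filter described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `remove_outliers`, the sample readings, and the choice of the sample (rather than population) standard deviation are hypothetical and do not come from the study.

```python
from statistics import mean, stdev

def remove_outliers(samples, k=2.0):
    """Keep only the samples within k standard deviations of the mean."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation (an assumption)
    return [s for s in samples if abs(s - mu) <= k * sigma]

# Hypothetical received-power readings (dBm) for one measured point.
readings = [-70.0, -71.0, -69.5, -70.5, -70.2,
            -69.8, -70.1, -70.4, -69.9, -40.0]
cleaned = remove_outliers(readings)
```

In this made-up data set the isolated -40.0 dBm reading lies more than two standard deviations from the mean and is discarded, while the remaining readings are kept.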

B. Measurement Campaign at 2100 MHz and 2600 MHz:
Measurements at 2100 MHz and 2600 MHz were made on the premises of UFPA for two reasons: 1) the transmission antennas of mobile phone systems radiate much less power than towers operating at lower frequencies, such as those for digital television; 2) the sites chosen for the installation of new towers must be within UFPA or in a surrounding area situated less than 2 km from UFPA. Table II shows the main characteristics of the transmitter antenna used as a reference point for these measurements. Measurements were made at 24 points with the receiver placed at two heights, 1 m and 2 m. Measurements were taken for 2 minutes at each point and for each frequency, giving a total of 60 samples per point. The average values of the received signal strength samples were used for the estimates.
The measurements were made with the aid of the Cell Signal Monitor® software, which obtains the received signal strength at configurable intervals of time. This software also identifies the network and monitors the upload-download speed.
The measurement points were not spread out in radials since the area of UFPA makes this kind of deployment impossible because it would involve the existence of points outside the premises of the university. Two identical mobile devices connected to different networks (EDGE and LTE) were used to obtain the cellular signal. Figure 2 shows a satellite image of the UFPA area and immediate vicinity, which are duly indicated.
All the information regarding the transmission tower is available in [17].

V. ELECTROMAGNETIC WAVE PROPAGATION MODEL
The planned model takes account of the frequency of transmission. A study of the pattern of frequencies was needed with a view to determining a polynomial so that the model could be used for the frequencies that had not been measured.
Propagation trend curves were obtained for the frequencies that were measured. The graph in Fig. 3 shows these curves. It should be noted that the signal degrades more at higher frequencies.
On the basis of three trend curves, a curve can be found for a fourth frequency. The trend curves are of the form expressed in (1):

$$P_i = a_i + b_i \log(d), \quad i = 1, 2, 3 \qquad (1)$$

where $P_i$ is the trendline of the received power for each frequency, $d$ is the distance to the reference point, and $a_i$, $b_i$, $i = 1, 2, 3$, are parameters to be determined by the linear least squares method. A fourth curve $P_4 = a_4 + b_4 \log(d)$ can be determined using parabolic fitting. Using the three parameters $a_i$, $i = 1, 2, 3$, $a_4$ was calculated. Then, using a second parabolic fitting with the parameters $b_i$, $i = 1, 2, 3$, $b_4$ was calculated.

The curve for the 521 MHz trend was divided in two parts. The first one, in black, represents the estimated trend up to 1 km from the transmitter. There were no measured data at this frequency at distances shorter than 1 km from the transmitter. From this distance on, the red curve represents the trend obtained from the actual data measured at 521 MHz. Where the curve turns red, its decay is as expected, since it lies above the 700 MHz curve, which confirms the quality of the measurements in this study. The degradation of the signal at 1800 MHz is also predictable, since the curve lies between the curves for 2100 MHz and 521 MHz. The 2400 MHz test was conducted to find out whether the curve would fall between the curves for 2100 MHz and 2600 MHz. The estimated curves thus follow the expected pattern, and it is possible to proceed to the stage of applying the trendline equation when calculating the propagation loss.
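The parabolic fitting across frequencies can be illustrated with a short sketch. It assumes that each trend line is described by an intercept and a slope, and that the parabola is fitted directly in frequency (in MHz); the text does not state the fitting variable, so this is an assumption. All names and numerical values below are hypothetical and are not the parameters obtained in the study.

```python
def parabola_through(points):
    """Return a function evaluating the unique parabola through three
    (x, y) points with distinct x values, written in Lagrange form."""
    (x1, y1), (x2, y2), (x3, y3) = points

    def evaluate(x):
        return (y1 * (x - x2) * (x - x3) / ((x1 - x2) * (x1 - x3))
                + y2 * (x - x1) * (x - x3) / ((x2 - x1) * (x2 - x3))
                + y3 * (x - x1) * (x - x2) / ((x3 - x1) * (x3 - x2)))

    return evaluate

# Hypothetical (intercept, slope) of the trend line at each measured
# frequency in MHz; illustrative values only.
measured = {521.0: (-40.0, -25.0),
            2100.0: (-55.0, -30.0),
            2600.0: (-60.0, -33.0)}

def estimate_parameters(f_new):
    """Estimate intercept and slope at an unmeasured frequency."""
    fit_intercept = parabola_through([(f, p[0]) for f, p in measured.items()])
    fit_slope = parabola_through([(f, p[1]) for f, p in measured.items()])
    return fit_intercept(f_new), fit_slope(f_new)

a4, b4 = estimate_parameters(1800.0)
```

With exactly three measured frequencies the parabola interpolates the three points, so the fit reproduces the measured parameters at 521, 2100 and 2600 MHz and fills in the values in between.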

A. Environment Discretization and Classification:
To design the model, the UFPA map was divided into squares with 50 m sides. The discretization of the scenario was necessary so that the positions of the measuring point(s) and transmitter(s) could be handled in matrix form. Fig. 5 displays a satellite image with the mesh (discretization), transmission tower (red circle), measurement sites (blue squares) and candidate sites for the new towers (yellow triangles). The region of the map that was examined in this study is bordered by diagonal sections (straight lines in a light purplish red). characteristic. This is one of the aspects of the planned methodology that is susceptible to refinement.
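The mapping from map positions to the 50 m mesh can be sketched as follows. The sketch assumes that positions are given in local metric coordinates measured from one corner of the map; this coordinate convention and the function names are assumptions made for illustration.

```python
CELL_SIZE_M = 50.0  # side of each square, as stated in the text

def to_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map metric coordinates (origin at a map corner, an assumption)
    to the (row, col) index of the square that contains them."""
    return int(y_m // cell_size), int(x_m // cell_size)

def cell_center(row, col, cell_size=CELL_SIZE_M):
    """Return the metric (x, y) coordinates of the center of a square."""
    return (col + 0.5) * cell_size, (row + 0.5) * cell_size
```

For instance, a point at x = 120 m, y = 49.9 m falls in row 0, column 2, whose center lies at (125 m, 25 m).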
The list of attributes can be modified because the knowledge model allows other categories to be included or those already included to be replaced. However, it should be noted that including many attributes reduces the reliability of the calculations. Figure 6a shows an aerial picture of the analyzed region (and its outskirts) with the square grid used for the logical handling. Figure 6b shows the same discretized logical map, but with a different color for the attribute of each square of the analyzed region: black for "bare", green for "afforestation", ash gray for "buildings" and blue for "afforestation + buildings". The map in Fig. 6b

The loss function, or received strength, depends on the number of squares crossed and hence on the distance. This function is expressed by the formula in (2). In this study, the function F used was the identity function. The model can be improved by using another function that depends explicitly on the environment or even on the distance. In other words, only the losses suffered in each square are taken into account.
The Euclidean distance was used to measure the distance from the transmission tower to the measured points. Since the scenario consists of a map with a discretization mesh, a straight line between two arbitrary points on the map appears as a jagged (aliased) line in the shape of a stairway. What matters for the purposes of this study is to count how many squares, and of which types, are crossed by this line, which connects the transmitter to the measured point.
In Fig. 7, there is a transmission tower and three receivers at different points in a way that shows how the squares are counted, as well as how the "straight line" formed by the squares will appear.
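The counting of squares along the "stairway" line can be sketched with a standard line-rasterization routine. Bresenham's algorithm is used here as an assumption, since the text does not name the rasterization procedure; the small grid, the choice to count both end squares, and the function names are likewise hypothetical.

```python
from collections import Counter

def cells_on_line(start, end):
    """Grid squares visited by Bresenham's line from start to end,
    both end squares included. Squares are (row, col) pairs."""
    (r0, c0), (r1, c1) = start, end
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    step_r = 1 if r1 >= r0 else -1
    step_c = 1 if c1 >= c0 else -1
    err = dc - dr
    r, c = r0, c0
    cells = []
    while True:
        cells.append((r, c))
        if (r, c) == (r1, c1):
            break
        e2 = 2 * err
        if e2 > -dr:   # advance along the columns
            err -= dr
            c += step_c
        if e2 < dc:    # advance along the rows
            err += dc
            r += step_r
    return cells

def count_square_types(grid, tx, rx):
    """Count how many squares of each type lie on the line tx -> rx."""
    return Counter(grid[r][c] for r, c in cells_on_line(tx, rx))

# Hypothetical 3 x 5 classified map.
grid = [
    ["bare", "bare", "buildings", "buildings", "afforestation"],
    ["bare", "afforestation", "afforestation", "buildings", "afforestation"],
    ["bare", "bare", "afforestation", "afforestation",
     "afforestation + buildings"],
]
counts = count_square_types(grid, (0, 0), (2, 4))
```

Bresenham's line moves diagonally in a single step, so it may skip squares that the geometric segment only clips at a corner; a supercover traversal would count those as well. Which convention matches the study is not stated.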
Each color along the straight line indicates the pathway followed by the signal to the different receivers. The "stairway" appearance should be noted in some of the straight lines. The losses were calculated from the reference power close to the transmission tower, together with the trend (1). The system shown is inconsistent and has full column rank. For this kind of system, a Linear Least Squares (LLS) method can be applied to solve it. The formal solution is given by (4). Table III shows the results calculated for each α. The value for the "Afforestation" type was the highest, followed by the "Buildings" and "Bare" types. The value found for "Afforestation + Buildings" was the lowest of all because, in the area under study, the buildings made up a lower fraction.
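A least-squares estimate of the loss per square type can be sketched as follows. The sketch assumes that each measurement contributes one equation in which the number of squares of each type crossed multiplies the unknown loss of that type, and it solves the normal equations directly. The matrix layout, the function name and every number below are hypothetical; they are not the values reported in Table III.

```python
def solve_least_squares(A, b):
    """Minimize ||A x - b||^2 through the normal equations
    (A^T A) x = A^T b. Assumes A has full column rank."""
    n = len(A[0])
    # Augmented matrix [A^T A | A^T b].
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * bi for row, bi in zip(A, b))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Rows: measurements. Columns: squares crossed of each type
# (bare, afforestation, buildings). Hypothetical counts.
counts = [[2, 1, 0],
          [1, 0, 2],
          [0, 3, 1],
          [1, 1, 1],
          [3, 0, 1]]
# Hypothetical path losses (dB), built here from per-type losses 1, 3, 2.
losses = [5.0, 5.0, 11.0, 6.0, 5.0]
alpha = solve_least_squares(counts, losses)
```

The toy data are exactly consistent, so the per-type losses are recovered exactly. With real measurements the system has no exact solution and the same routine returns the values that minimize the squared residual.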
The measurements were used to calculate the intrinsic losses of each type of square crossed. To improve the accuracy of the values obtained, the KNN was used to classify the adjacent squares and to corroborate the visual classification. With the KNN ready, the same model can be used for different regions, as long as a domain specialist and a data set are available to feed the model, since KNN is a supervised learning technique.
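A K nearest neighbors classification of squares can be sketched as follows. The text does not list the features used, so the two features here (fraction of each square covered by vegetation and by built area), the training labels and the function name are assumptions chosen only to illustrate the technique.

```python
from collections import Counter
from math import dist

def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest training
    samples, using the Euclidean distance between feature vectors."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled squares: (vegetation fraction, built fraction).
training = [
    ((0.05, 0.05), "bare"),
    ((0.10, 0.00), "bare"),
    ((0.90, 0.05), "afforestation"),
    ((0.80, 0.10), "afforestation"),
    ((0.10, 0.85), "buildings"),
    ((0.05, 0.70), "buildings"),
    ((0.50, 0.50), "afforestation + buildings"),
    ((0.60, 0.40), "afforestation + buildings"),
]
label = knn_classify(training, (0.85, 0.08))
```

Because the method is supervised, the labelled squares play the role of the specialist's knowledge: moving to another region means supplying a new labelled set, not changing the classifier.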
The source code was written in MATLAB with a view to flexibility when addressing different scenarios for planning a wireless network. This allowed the following attributes to be aggregated: height of the transmitter, height of the receiver, operating frequency, transmission strength and number of candidate sites for the positioning of new transmitter towers.
All the solutions found make use of the collected measurements.  This phenomenon is noted in the colour tones of the graph that are predominantly darker than the previous one.

VII. RESULTS AND DISCUSSION
This section discusses the results obtained from optimizing the position of the transmitter towers within the territorial region of UFPA so as to maximize the received signal strength in the map being considered. Two scenarios were simulated for the frequencies of 2100 MHz and 2600 MHz. The scenarios examined are as follows: 1) two transmission towers: in this scenario, no account is taken of the influence of the existing tower near UFPA; 2) three transmission towers: one of them is in the immediate vicinity of UFPA and is the tower from which the measurement data were collected. Maps of received signal strength were produced at 2100 MHz and 2600 MHz, which were the frequencies used in the measurement campaigns. Fig. 9a shows the result of the optimization for two towers at 2100 MHz, and Fig. 9b shows it for 2600 MHz.
The centralization of at least one of the towers was expected, but it would not necessarily be the best configuration, because the model takes account of the losses for different environments. It is natural for a tower to be positioned close to the set of squares that leads to greater losses, because if it were positioned far from this set, the signal would be weak for this group.
The map obtained for 2600 MHz has the same positioning of the towers as the map for 2100 MHz, but as the fading of the signal is greater at higher frequencies, the power received ends up being lower.

Fig. 9. Two-tower optimization: (a) 2100 MHz; (b) 2600 MHz.
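The search over candidate sites can be illustrated with a brute-force sketch. It is not the optimization procedure of the study, which is not detailed here: the exhaustive search, the placeholder log-distance power model, the rule that each square is served by its strongest tower, the arithmetic averaging of dBm values, and all names and coordinates are assumptions. The optional `fixed` argument represents a tower that already exists and cannot be moved, as in the three-tower scenario.

```python
from itertools import combinations
from math import hypot, log10

def received_power(site, square):
    """Placeholder model (an assumption): log-distance decay in dBm,
    with distance measured in grid units and clamped at 1."""
    d = max(hypot(site[0] - square[0], site[1] - square[1]), 1.0)
    return -50.0 - 30.0 * log10(d)

def mean_coverage(towers, squares):
    """Average over all squares of the strongest received signal."""
    best = [max(received_power(t, s) for t in towers) for s in squares]
    return sum(best) / len(best)

def best_placement(candidates, squares, n_new, fixed=()):
    """Evaluate every combination of n_new candidate sites and return
    the one with the highest mean coverage; fixed towers always count."""
    return max(combinations(candidates, n_new),
               key=lambda combo: mean_coverage(list(combo) + list(fixed),
                                               squares))

# Hypothetical 6 x 6 map and candidate sites, as (row, col) squares.
squares = [(r, c) for r in range(6) for c in range(6)]
candidates = [(0, 0), (1, 4), (4, 1), (5, 5), (2, 2)]

two_towers = best_placement(candidates, squares, n_new=2)
three_towers = best_placement(candidates, squares, n_new=2, fixed=[(0, 5)])
```

Exhaustive search is only practical for a small number of candidate sites, since the number of combinations grows quickly with the number of sites and towers.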

B. Scenario 2 - Three Towers:
The same criterion of maximum coverage was applied to the three towers. In this scenario, the tool optimizes the positioning but does not necessarily ensure the maximum coverage; this is mainly due to the existing fixed tower (the same transmission tower from which the measurement data were collected). Ensuring that the maximum coverage area is in fact found would require a multi-objective optimization, which is beyond the scope of this study. Even so, owing to the existence of "shadow zones" in UFPA, where it is not possible to achieve connectivity at the frequencies examined (and often even at lower frequencies), it was decided to display this scenario, in which the number of "shadow zones" would be reduced. In this scenario, for 2100 MHz, the towers were not centralized. This is owing to the presence of the third tower, which provides coverage for the north-eastern part of UFPA and displaces the other two. As there is no kind of priority of service, the solution found for this scenario was to serve the needs of a public outside UFPA.

The tool has several innovative features. The first of these is its hybrid character and the fact that it employs supervised machine learning in a theoretical/computational approach, which makes it possible to handle different attributes of an environment.
Another innovative feature is the inclusion of a generalization for the use of frequencies based on a parabolic fitting of received signal strength in terms of the distance to the transmitter. Although the planned tool does not have a single defined mathematical formula, the algorithm that was created draws on information from formulations that were well defined in their development.
The propagation loss model used in this tool is proving suitable for predicting the received signal strength at every point of the scenario, which distinguishes it from most propagation models, which have a point-by-point characteristic. Our intention in future studies is to include the following in the tool: 1) important qualitative attributes of coverage, in which the mesh squares will be given priority, leading to a multi-objective optimization methodology that ensures a greater coverage area and takes account not only of the average received signal strength but also of the importance of the squares on the terrain in question; 2) measurement campaigns at different frequencies, particularly the possible 5G frequencies, to obtain data that improve the received strength model with regard to the generalization of the application; 3) implementation of the Nelder-Mead method (in its binary variation) in the optimization stage and calculation of the results, with a view to refining the techniques for obtaining results; 4) finally, association of the data throughput with the power in the coverage maps.