A Hybrid Path Loss Prediction Model based on Artificial Neural Networks using Empirical Models for LTE And LTE-A at 800 MHz and 2600 MHz

Abstract: This article presents the analysis of a hybrid, error correction-based, neural network model to predict the path loss for suburban areas at 800 MHz and 2600 MHz, obtained by combining empirical propagation models, ECC-33, Ericsson 9999, Okumura Hata, and 3GPP’s TR 36.942, with a feedforward Artificial Neural Network (ANN). The performance of the hybrid model was compared against regular versions of the empirical models and a simple neural network fed with input parameters commonly used in related works. Results were compared with data obtained by measurements performed in the vicinity of the Federal University of Rio Grande do Norte (UFRN), in the city of Natal, Brazil. In the end, the hybrid neural network obtained the lowest RMSE indexes, besides almost equalizing the distribution of simulated and experimental data, indicating greater similarity with measurements.


I. INTRODUCTION
4G networks came to fulfil the demands created by a new communications landscape, where smartphones make use of a large number of online applications, requiring improvements in the quality and coverage of cellular networks, as well as higher data rates, which in turn require more bandwidth. In this context, Long Term Evolution (LTE) and LTE Advanced (LTE-A) represent the last step of 3G networks towards the fourth generation. Both technologies work at the same frequency. To reach the conditions for fulfilling the LTE and LTE-A requisites [2], efficient and accurate network planning is necessary during the preliminary system deployment, for which accurate propagation characteristics of the environment should be known.
Path loss models are important for predicting coverage area, interference analysis, frequency assignments, and cell parameters, which are basic components of the network-planning process in the project of a mobile communications system [4]. Understanding the radio channel is of utmost importance for network deployment, and modelling the radio channel with the most appropriate path loss model is an essential factor.
Propagation models can be classified [5,6] as deterministic, empirical, and physical/statistical. The first ones can be considered the most accurate. They are based on the behavior of radio waves propagated in space, calculating propagation losses mathematically, based on theoretical formulation.
For this, accurate information is necessary, not only about buildings and terrain, but also about the reflection and diffraction coefficients of the surfaces in the propagation path.
Meanwhile, empirical models do not accurately predict the behavior of radio waves; they depend instead on field strength measurements from a specific environment to give an approximation. Lastly, physical/statistical models combine empirical and statistical information about the environment, aiming to decrease computational cost.
In order to make communications systems more accurate and their planning more efficient, many efforts have been made towards the development of coverage prediction simulation methods and tools able to produce accurate estimates based on measured data. In this sense, some techniques can help to provide more efficient simulation methods, reducing errors and providing more trustworthy results.
Artificial Neural Networks, also known as ANNs, are computational techniques that present a mathematical model inspired by the neural structure of intelligent organisms and their ability to acquire knowledge through experience. ANNs have developed greatly in recent years, with a large number of applications: signal processing, forecasting, data mining, data clustering, pattern classification, pattern recognition, image generation, and process control, among others [7][8][9].
The neural network performs a nonlinear mapping of a given set of input values to a set of output values by means of layers of neurons, where the input values are multiplied by the respective synaptic weights of each layer and summed to produce an appropriate output according to the inputs [10].
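The mapping described above can be illustrated with a minimal sketch of a forward pass through a network with one hidden layer and a single output. The function name `mlp_forward`, the hyperbolic tangent activation, and the weight values are illustrative assumptions, not taken from [10].

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron with a single output."""
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        # Each hidden neuron forms a weighted sum of its inputs plus a bias
        s = sum(w * xi for w, xi in zip(weights, x)) + bias
        # and passes it through a nonlinear activation (tanh is assumed here).
        hidden.append(math.tanh(s))
    # Linear output neuron: weighted sum of the hidden activations plus a bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Two inputs, three hidden neurons; all weights are arbitrary illustrative values.
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [1.0, -1.0, 0.5]
b_out = 0.2
y = mlp_forward([0.0, 0.0], w_hidden, b_hidden, w_out, b_out)
```

Training consists of adjusting the weights and biases so that the output approaches the desired target for each input vector.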
The problem of path loss prediction between two points can be interpreted as that of obtaining a function of several inputs and a single output, where the inputs contain information such as the locations of the transmitter and receiver, the frequency, and the surrounding buildings.
Thus, the prediction of path loss can be described as the transformation of an input vector containing topographical and morphological information about the environment to the desired output value [11]. Since neural networks can be effectively employed in solving nonlinear function approximation problems, they are fit for path loss prediction.
Plenty of works involving ANN approaches to predict path loss can be found in the literature. Most of them differ in the type and architecture of the ANN, but mainly in the parameters used as inputs of the neural network. This information can vary from a single input involving the distance from the transmitter to the receiver [12], to robust data about the environment and propagation features, such as construction heights, land cover, clearance angle, and street widths [13][14][15].
In [13], measurements performed in rural Australia were used to train an artificial neural network model used for the prediction of macro cell radio wave propagation. The inputs of the network were the distance to base station, transmitting/base antenna height, terrain clearance angle, and portion through terrain. The network performance was compared against the ITU-R P.1546 model. In the end, the ANN presented, in general, better predictions than P.1546. The authors discovered that larger feedforward networks are more sensitive to training data and obtained less accurate predictions when fed with inputs outside the training parameter space. They also noted that, when the networks are fed with data similar to the training set, the predictions are more accurate.
Years later, the same authors continued the study [14], using the same experiment to evaluate networks now with different numbers of hidden layers and neurons, and other training algorithms (gradient descent and Levenberg-Marquardt). The objective was to obtain statistics regarding their training time, prediction accuracy, and generalization properties. Input parameters remained the same as in the previous work.
In [15], researchers evaluated the viability of a neural network-based path loss prediction model as an alternative to physical and empirical models. The network has the particularity of employing simulation data based on the Longley-Rice model for the ANN training, instead of using actual path loss measurements at different receiver locations. Three inputs are required: the distance to the transmitter, the direction bearing (azimuth) from the transmitter to the receiver, and the elevation above sea level at the receiver location. The performance was compared against a physical propagation model, Free Space Loss (FSL), and the empirical Egli model. The authors concluded that the ANN-based path loss prediction model performed very well in comparison to commonly used propagation models.
In [16], the results of the application of a General Regression Neural Network (GRNN) in the modeling of path loss in urban and suburban areas are presented. Several neural network models were tested for both environments, differing only in the input parameters. The main inputs considered were the distance between transmitter and receiver, width of the streets, building separation, and building height. Measured data collected in the city of Kavala and on Santorini Island, in Greece, was used for training. The GRNN-based model was compared against Walfisch-Bertoni (WB) and a modified version of COST231-Walfisch-Ikegami (CWI). The proposed neural network-based model obtained significant improvement in the prediction due to its generalization property. Results, in terms of Root Mean Squared Error (RMSE), varied from 5.35 dB to 8.66 dB and from 3.68 dB to 5.23 dB in urban and suburban scenarios, respectively.
The paper also presented a hybrid error-correction model, based on the combination of the deterministic model COST-Walfisch-Ikegami (CWI) and a neural network. This approach was later expanded in [11]. The GRNN-based model, just like the Multilayer Perceptron Neural Network (MLP-NN) model, is built in two variants: a simple NN model with five inputs (distance between transmitter and receiver, width of the streets, height of the buildings, building separation, and street orientation), and the hybrid error correction NN model, using COST-Walfisch-Ikegami.
CWI is considered a physical/statistical (or semi-empirical) model, requiring information about the terrain profile, such as the distance between transmitter and receiver, rooftop heights, and space between buildings.
In the end, there was no significant difference between the predictions made by the simple and hybrid models. For urban environments, simple RBF and MLP obtained an RMSE of 5.35 dB and 6.55 dB, respectively, while an RMSE of 5.30 dB and 6.07 dB was computed for hybrid RBF and MLP.
Regarding suburban areas, hybrid RBF and MLP obtained an RMSE of 3.71 dB and 3.77 dB, while simple RBF and MLP obtained an RMSE of 3.68 dB and 3.74 dB, respectively.
This paper simplifies the approach from [11], also developing a hybrid error-based model, but using empirical propagation models instead. As a result, only the basic elements used by the models, such as the assigned frequency and the distance between transmitter and receiver, are necessary to feed the network. In order to test the hybrid model performance against other neural network approaches presented in related works, comparisons were made for the same data using a simple neural network model, with terrain and propagation features as inputs. The main terrain/propagation characteristics present in [13][14][15][16] were chosen as inputs: distance from transmitter to receiver, transmitting/base antenna height, terrain clearance angle, direction bearing from the transmitter to the receiver, and street width. The output node is the measured path loss.
From now on, the two ANN-based approaches presented in this paper will be referred to as the Hybrid Neural-Network (HNN) model and the Simple Neural-Network (SNN) model. A comparison is also made with the regular versions of the empirical propagation models. This paper aims to identify the method whose results are most similar to experimental data.
The methodology is based on the comparison of the versions, looking for which models, regular or ANN-based, achieve simulation values closer to measurements. For this, a measurement campaign was conducted, comprising two different routes in the district of Lagoa Nova, in the city of Natal, Brazil. MATLAB (R2011a, version 7.12, The MathWorks) software was used to implement the computational methods. For benchmarking the performance of each technique, two metrics are applied: the root mean squared error, which estimates the difference error, in dB, between the datasets, and the Wilcoxon rank-sum test, which provides a similarity test between the distributions of the datasets.
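The two metrics named above can be sketched as follows. This is a minimal pure-Python illustration, not the MATLAB implementation used in the study: the function names are hypothetical, and the rank-sum test uses the large-sample normal approximation without tie correction, which is an assumption made here for brevity.

```python
import math

def rmse(predicted, measured):
    """Root mean squared error between two equally long sequences (in dB)."""
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

def _average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0        # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Illustrative values only, not measurement data from the campaign.
error_db = rmse([102.0, 103.0, 112.0, 113.0], [100.0, 105.0, 110.0, 115.0])
z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A large p value gives no evidence that the two samples come from different distributions, which is the sense in which the test is used as a similarity check between predictions and measurements.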
The remainder of this paper is organized as follows. In Section 2, we provide some principles of the path loss models applied in this article, while in Section 3 the measurement campaign is detailed.
In Section 4, more information about the hybrid ANN model is provided. A comparative test between simulations and the experimental data collected is reported in Section 5. Finally, in Section 6, we present the conclusions of the study and give guidelines for further work.

II. PREDICTION PROPAGATION MODELS
For this study, path loss is calculated using four different propagation models. ECC-33 (for small and medium cities) will be analyzed for the frequency of 2600 MHz, Free Space will be applied for 800 MHz, while Ericsson and TR 36.942 will cover both frequency bands. Table I presents the main equations from these path loss models, where $d$ is the distance between base station and UE (User Equipment), $f$ is the frequency (MHz), $h_t$ is the transmission antenna height (m), and $h_r$ is the reception antenna height (m). Regarding the ECC equations, $G_r$ is the receiver antenna gain, given in (9) for small/medium cities and in (10) for big cities.
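As an illustration of how one of these models is evaluated, the sketch below computes the Free Space loss. Since the equations of Table I are not reproduced in this text, the standard textbook form with distance in km and frequency in MHz is assumed, and the function name is hypothetical.

```python
import math

def free_space_path_loss_db(distance_km, frequency_mhz):
    """Free-space path loss in dB (assumed standard form: d in km, f in MHz)."""
    if distance_km <= 0 or frequency_mhz <= 0:
        raise ValueError("distance and frequency must be positive")
    return 32.45 + 20.0 * math.log10(distance_km) + 20.0 * math.log10(frequency_mhz)

# Loss at 1 km for the two frequencies considered in this study.
loss_800 = free_space_path_loss_db(1.0, 800.0)
loss_2600 = free_space_path_loss_db(1.0, 2600.0)
```

At equal distance, the difference between the two bands is 20 log10(2600/800), roughly 10.2 dB, which shows how strongly the operating frequency alone affects the predicted loss.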

III. MEASUREMENT CAMPAIGN SCENARIO
The campaign took place in the district of Lagoa Nova, in the city of Natal, Brazil. Measurements were performed between the Federal University of Rio Grande do Norte (Universidade Federal do Rio Grande do Norte, UFRN) and streets near the campus. The site presents a regular density of vegetation and medium-sized buildings, which characterizes the environment as suburban (Fig. 1).

IV. HYBRID ERROR CORRECTION-BASED MODEL
In this research, an error correction-based ANN model, based on [11,16] and using empirical models, is applied in the prediction of path loss. The ANN is trained to learn the error between measured values and the ones calculated by the propagation models. The difference error, E, is obtained as the difference between the measured path loss and the path loss calculated by the propagation model. The two input vectors comprise the distance between the transmitting antenna and the receiving station, and the difference error, E, for each point. The output vector, also known as the target of the neural network, comprises the corrected path loss, given by (12). The training phase of the neural network structure is represented in Fig. 4, while the network architecture is depicted in Fig. 5.
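The construction of the training data for the error correction scheme can be sketched as below. The sign convention is an assumption, since the equations are not reproduced in this text: the error is taken as measured minus model-calculated loss, and the corrected loss as the model value plus that error. All function names are hypothetical.

```python
def difference_error(measured_db, model_db):
    """Point-wise error E, assumed here to be measured minus model path loss."""
    return [m - p for m, p in zip(measured_db, model_db)]

def corrected_path_loss(model_db, error_db):
    """Corrected path loss, assumed here to be the model value plus the error."""
    return [p + e for p, e in zip(model_db, error_db)]

def build_training_set(distance_m, measured_db, model_db):
    """Return (inputs, targets): two inputs per point and one target per point."""
    errors = difference_error(measured_db, model_db)
    inputs = list(zip(distance_m, errors))        # (distance, E) pairs
    targets = corrected_path_loss(model_db, errors)
    return inputs, targets

# Illustrative values only, not data from the measurement campaign.
inputs, targets = build_training_set(
    distance_m=[100.0, 200.0],
    measured_db=[95.0, 104.0],
    model_db=[90.0, 100.0],
)
```

Under this assumed convention the targets coincide with the measured path loss, so the network learns to map the distance and the empirical model's error onto the measured values.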
The implemented ANN is a feedforward Multilayer Perceptron. Its architecture consists of 2 inputs and 1 output, with 1 hidden layer. The input set consists of two vectors with 455 elements each in the 2600 MHz scenario, and 450 elements each in the 800 MHz case.
The transfer functions used for the hidden and output layers were the tangent-sigmoid and linear functions, respectively, while the algorithm chosen to train the network was Levenberg-Marquardt backpropagation [22,23]. Seeking to avoid overfitting and to give the network the generalization property, data was split into three sets (Table II). The first one was used in the training of the network for weight adjustment. The second set, used as a validation set, checks the network's generalization capability, also serving as a stopping criterion (using a cross-validation strategy [24]).
Meanwhile, the third set, defined as the testing set, gives a realistic estimate of the performance of the learned network on new data. Mean Square Error (MSE) was used as the performance function by the algorithm to evaluate the convergence rate. A performance progress plot with each set's curve for the ECC model at 2600 MHz is illustrated in Fig. 6. There is no indication of any major problems with the process, since the training, validation, and test curves are very similar. If the test curve had increased before the validation curve, it would indicate an overfitting problem [14].
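The three-way split and the use of the validation set as a stopping criterion can be sketched as follows. The proportions of Table II are not reproduced in this text, so a 70/15/15 split is assumed purely for illustration, as are the patience value and the function names.

```python
import random

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly split sample indices into training, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # whatever remains
    return train, val, test

def should_stop(val_mse_history, patience=6):
    """True once the last `patience` epochs brought no improvement in validation MSE."""
    if len(val_mse_history) <= patience:
        return False
    best_before = min(val_mse_history[:-patience])
    return min(val_mse_history[-patience:]) >= best_before

# 455 points, as in the 2600 MHz scenario.
train, val, test = split_indices(455)
stalled = should_stop([5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6])
improving = should_stop([5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.4, 0.3, 0.2])
```

Only the training indices drive the weight updates; the validation error decides when to stop, and the test indices are held back for the final performance estimate.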
The network was designed with a single hidden layer. To find the optimum configuration, a convergence test was performed, based on a trial-and-error procedure, to select the appropriate number of hidden nodes for each scenario. The MSE values for hidden layers with 5, 10, 20, 30, and 40 neurons were compared. Each case was executed 30 times, aiming to obtain the average values. For the best configuration obtained for each model, the neural network was run another 20 times (50 in total) and the RMSE was computed. The results are presented in Section V.
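The trial-and-error selection described above can be sketched as a loop over candidate hidden layer sizes. The training routine is replaced here by a hypothetical stand-in, `toy_train_and_validate`, which returns a made-up validation MSE; in practice it would train the network and return the MSE on the validation set.

```python
import random
from statistics import mean

def select_hidden_nodes(train_and_validate, candidates=(5, 10, 20, 30, 40), runs=30):
    """Average the validation MSE over repeated runs and pick the best size."""
    avg_mse = {}
    for n_hidden in candidates:
        # Repeat training with different seeds to average out random initialization.
        scores = [train_and_validate(n_hidden, seed) for seed in range(runs)]
        avg_mse[n_hidden] = mean(scores)
    best = min(avg_mse, key=avg_mse.get)
    return best, avg_mse

def toy_train_and_validate(n_hidden, seed):
    """Stand-in for real training: returns a made-up validation MSE."""
    rng = random.Random(seed * 1000 + n_hidden)
    return (n_hidden - 20) ** 2 / 100.0 + 1.0 + rng.uniform(0.0, 0.1)

best_size, mse_table = select_hidden_nodes(toy_train_and_validate)
```

Averaging over several runs matters because a single run depends on the random initial weights and on the random data split.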
A trial-and-error procedure was also conducted to find an efficient configuration for the SNN model. A configuration with 30 neurons for 800 MHz and 20 neurons for 2600 MHz was then selected. Data was split using the same proportions as in the HNN model, presented in Table II.

V. ANALYSIS OF RESULTS
The performance of the HNN model, along with the SNN, was assessed by comparing RMSE values and applying the Wilcoxon rank-sum test. A box-and-whisker plot compares the datasets obtained by each approach against data gathered from the measurements. The simulation scenarios were performed for the operating frequencies of 2600 MHz and 800 MHz, transmitted in the campaign. Table V emphasizes the difference in performance between the evaluated methods, considering all models, concerning the RMSE and Wilcoxon rank-sum test results. The p value will determine the

Prediction data is calculated by the models Ericsson 9999, Free Space, ECC-33, and TR 36.942. The experiment was set in suburban areas, at the frequencies of 800 MHz and 2600 MHz. The ECC-33 model was applied at 2600 MHz, while the Free Space model was employed at 800 MHz; Ericsson and TR 36.942 covered both bands. The frequency of 800 MHz is present in bands 20 (791 MHz to 821 MHz), 28 (758 MHz to 823 MHz), and 44 (703 MHz to 803 MHz) of LTE, being deployed in countries like France, Germany, Italy, Morocco, and Tunisia. The frequency of 2600 MHz is present in bands 7 (2620 MHz to 2690 MHz), 38 (2570 MHz to 2620 MHz), and 69 (2570 MHz to 2620 MHz), adopted by, among other countries, Ghana, Canada, Colombia, Chile, and Brazil.
The set of equipment used for the transmission and reception of signals comprised a Rohde & Schwarz broadband amplifier, R&S BBA150 (9 kHz to 6 GHz), and an Anritsu radio transmitter, model MG3700A (50 Hz to 6 GHz). A power of 15 W was used to transmit the signal. Two pairs of directive antennas from Pasternack were used: a panel antenna (2.5 GHz to 2.7 GHz) with a nominal gain of 14 dBi was employed at 2600 MHz, and a dual-band panel antenna (806-960 MHz and 1710-2500 MHz) with 7 dBi of nominal gain was used in the 800 MHz transmission. The same antennas were used in the reception of signals. The transmitted signal was a Continuous Wave (CW). The radio transmitter and the broadband amplifier are shown in Fig. 2a.
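For reference, a measured path loss value can be derived from a received power reading through a simple link budget. The sketch below is an assumption about that step, which the text does not detail: cable and connector losses are lumped into a single hypothetical parameter, and the received power used in the example is illustrative.

```python
import math

def watts_to_dbm(power_w):
    """Convert a power in watts to dBm."""
    return 10.0 * math.log10(power_w * 1000.0)

def measured_path_loss_db(p_rx_dbm, p_tx_w, g_tx_dbi, g_rx_dbi, cable_loss_db=0.0):
    """Path loss from a received power reading (assumed simple link budget)."""
    return watts_to_dbm(p_tx_w) + g_tx_dbi + g_rx_dbi - cable_loss_db - p_rx_dbm

# 15 W transmitter and 14 dBi antennas at both ends (2600 MHz setup);
# the -60 dBm reading is an illustrative value, not a measurement.
tx_dbm = watts_to_dbm(15.0)
loss = measured_path_loss_db(-60.0, 15.0, 14.0, 14.0)
```

Any cable loss between amplifier and antenna would reduce the computed path loss by the same amount, so it should be included when it is known.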

Fig. 1. One of the streets from Lagoa Nova district, highlighting the panel antenna used in reception.

The signal was measured with an Anritsu spectrum master, model MS2721B (illustrated in Fig. 2b), featuring an integrated GPS, which gives the precise location of the measured points. The transmitter antenna was installed on the rooftop of the Engineering Technological Complex (Complexo Tecnológico de Engenharia, ECT) building, on the UFRN campus (Fig. 3a), at a height of 20 meters. A high-grade coaxial cable was used to connect the antenna to the broadband amplifier, which in turn was connected to the digital transmitter. In order to cover the different points along the site, a mobile laboratory was set up: a car, granted by UFRN, equipped with a receiving antenna installed on top of the vehicle, at a total height of 3.6 meters measured from the ground, and connected to the spectrum master. The vehicle travelled along two different routes near the university (Fig. 3b) at a constant speed of 20 km/h.

Fig. 4. Flowchart of the training process.

The goal of path loss prediction is not only to produce small errors for the set of training examples, but also to present good results when dealing with examples not employed in the training process [11]. This property is called generalization, and it is very important for predicting path loss and properly determining estimated coverage areas in different projects within a similar environment. A relevant problem that can occur during ANN training is over-adaptation, or overfitting, where the network memorizes the training examples and does not learn how to deal with new situations [15].

Fig. 6. One of the performance plots for the training, validation, and test sets.

The average MSE for the validation set versus the number of hidden nodes for each propagation model applied is depicted in Table III and Table IV for 800 MHz and 2600 MHz, respectively.

Figure 7 depicts the results for a) the ECC model at the frequency of 2600 MHz (route 1), b) the Ericsson model at 800 MHz in route 2, and c) the TR 36.942 model at 800 MHz in route 1. In all scenarios, both ANN-based models showed a well-defined pattern in terms of performance. The regular variants, although performing satisfactorily, proved to be the least accurate method. From the beginning of the course of measurements in route 1, up to 500 meters, the ECC model predictions, which obtained an RMSE of 10.33 dB, were positioned far from most measured points. A less efficient performance can be observed in the regular Ericsson model in route 2, which achieved 15.58 dB of RMSE, presenting a displacement in relation to measured points from 800 meters until the end of the route. The TR 36.942 model presented the best performance among the three models analyzed, since its path loss curve was located near experimental data along almost the entire course, obtaining an RMSE of 8.68 dB.

The SNN model obtained a trustworthy performance, following experimental data closely along the entire course in all scenarios. This is reflected in the obtained RMSE: the method obtained 4.61 dB, 5.48 dB, and 4.51 dB for the cases presented in Fig. 7a, Fig. 7b, and Fig. 7c, respectively. However, the technique that achieved the best performance was the hybrid neural network, since its predictions were virtually equal to measured data. The RMSE obtained by the HNN model was close to 0 dB for all scenarios.

Fig. 8 depicts a box-and-whisker plot with all five datasets from the scenario presented in Fig. 7b. It can be noted that the data distribution from the Ericsson model occupied most of the range above the interquartile range of experimental data, which indicates higher predicted values of propagation loss; these values deviate from the measurements average, represented by the red line, and only partially match its adjacent quartiles. The SNN model in this scenario, although slightly flatter, almost matched the interquartile range, presenting a high similarity with measured means. However, it showed lower fidelity in data distribution, since its span occupied only part of the measured data sampling space. The hybrid model corrected these issues, obtaining a data distribution as close as possible to measured values, also reproducing the outliers, which are represented by the red crosses.
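The quantities displayed in a box-and-whisker plot can be computed as in the sketch below. The quartile interpolation method and the 1.5 times interquartile range rule for flagging outliers are common conventions assumed here; they may differ in detail from the plotting routine used to produce Fig. 8, and the function name is hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data, whisker=1.5):
    """Quartiles, interquartile range, and outliers by the whisker * IQR rule."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Points beyond the fences are the ones drawn as individual markers.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Illustrative values only, not path loss data from the campaign.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

Comparing these summaries between a predicted dataset and the measured one shows whether the prediction reproduces the spread and the extreme values, not only the central tendency.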

TABLE I. PROPAGATION MODELS EQUATIONS

TABLE II. HYBRID ANN PARAMETERS FROM BASIC CONFIGURATION

In Table III and Table IV, for 800 MHz and 2600 MHz, respectively, best values are marked in bold, while the worst outcomes are presented in italic.

TABLE III. MSE OF VALIDATION SET DATA FOR ANNS VERSUS THE NUMBER OF HIDDEN LAYER NEURONS FOR 800 MHZ.

TABLE IV. MSE VERSUS THE NUMBER OF HIDDEN LAYER NEURONS FOR 2600 MHZ.