An evaluation of machine learning methods for speed-bump detection on a GoPro dataset

Every day, new applications that rely on high-resolution road maps arise in both academic and industrial environments. Autonomous vehicles rely on digital maps to navigate when optical sensors cannot be trusted, such as in heavy rainfall, snow, fog, and other adverse conditions. These situations increase the risk of accidents and limit the potential of real-time mapping sensors. To tackle these problems, we present a methodology to automatically map anomalies on the road, namely speed bumps in this study, using an off-the-shelf camera (GoPro) and Machine Learning (ML) algorithms. We acquired data over a series of differently shaped speed bumps and applied three classification techniques: Naive Bayes, Multi-Layer Perceptron (MLP), and Random Forest (RF). With over 96% classification accuracy, RF was able to identify speed bumps on a GoPro dataset automatically. The results show the potential of the proposed methodology to be developed for surveying vehicles to produce highly detailed maps of vertical road anomalies with a fast and accurate update rate.


INTRODUCTION
According to the 2018 Revision of World Urbanization Prospects delivered by the United Nations, the majority of the world population (approximately 55%) lives in urban areas. However, this percentage is even higher in some parts of the globe, such as Northern America (82%), Latin America (81%), Europe (74%) and Oceania (68%). Because road systems are an essential part of urban mobility projects, governments spend a considerable amount of resources to create and especially maintain the road system as cities grow.
The road network must have its structure well surveyed, due to its importance for traffic management. These documents are crucial for city planning, road inventory, safe driving, and al. 2016). These systems are composed of a set of sensors such as cameras, Global Navigation Satellite System (GNSS) receivers/antennas, inertial measurement units, and Light Detection And Ranging (LiDAR) systems. Recently, mobile mapping systems (MMS) have been assembled on platforms with low-cost sensors such as GoPro cameras. The GoPro includes a GNSS receiver, gyroscopes, accelerometers, an image sensor, and a thermometer.
The dataset acquired during an MMS mission is usually large and requires lengthy post-processing with a considerable amount of human intervention. Several studies have sought to improve the final product quality and decrease production time, and most research in the field uses computational methods to improve aspects of map production.
The contribution of this research is the evaluation of speed bump detection using Machine Learning (ML) techniques applied to a dataset acquired with a GoPro camera in a low-cost mobile mapping system. The remainder of this paper is organized as follows: Related Work discusses prior work on the detection of speed bumps and traffic features. Materials and Methods presents the proposed methodology. Results describes the experimental setup, the results, and their analysis. Lastly, Conclusions summarizes the work and suggests directions for future work.

RELATED WORK
Several types of data and sensors have been used to address the problems mentioned in the Introduction, such as accelerometer measurements, satellite positioning, gyroscopes, LiDAR, and images. The methods used include fuzzy inference, unsupervised learning, and genetic algorithms.
In Bello-Salau et al. (2018), the authors present an algorithm for detecting potholes and bumps from acceleration signals measured by an accelerometer. A wavelet filter was applied to decompose the signals into scales. Then, road anomalies were identified by a threshold-based system using features extracted from the wavelet coefficients. The system detects road anomalies with high accuracy and precision and a low false alarm rate.
The work of Soilán et al. (2018) describes a methodology for the extraction of semantic information from public data aiming at urban parameterization. First, buildings are segmented using voxel data. Next, heuristic rules and unsupervised learning are applied to the ground surface data to distinguish sidewalk points from pavement points. The system achieved an F-score close to 95% for detecting pavement and path.
Zhao et al. (2011) propose the extraction of urban roads from airborne LiDAR data. The methodology consists of terrain separation from the Digital Surface Model (DSM), road centerline extraction, and verification of road networks. The elevated objects are removed from the depth image. Afterwards, missing roads are inferred on the road centerline vector map according to gestalt laws. A direction-based voting technique is developed to evaluate the reliability of each path. Results show that the road features are successfully projected onto the depth image.
In Wang et al. (2016), the authors analyze different road extraction methods. First, the extraction of road features is investigated. Second, the authors present the advantages and disadvantages of the various approaches. Then, to improve the classification results, road extraction combines multiple methods according to the real application. Results show that the methodology can successfully recognize roads using different road features, but has difficulty when objects such as water, buildings, trees, grass, and cars occlude the road.
Aljaafreh et al. (2017) propose a speed bump detection method based on a fuzzy inference system (FIS). The FIS detects speed bumps from the vertical acceleration and the speed of the vehicle, using the accelerometer sensor of a smartphone. The methodology is evaluated at different speed levels. Results show that the system is promising for bump detection.
Varma et al. (2018) propose an image-based speed hump and bump detection method. The technique uses deep learning and stereo cameras to detect speed humps and bumps and to calculate the distance to the detected objects. Results show an accuracy of 97.44% for tagged objects and 93.83% for unlabeled objects.
Devapriya et al. (2015) propose a methodology that uses images. The captured video is converted into images, which are then pre-processed. After that, morphological operations are performed and the object is projected in the image. The methodology was evaluated on five different types of speed bumps: the accuracy was 90% for type 1, 85% for type 2, 83% for type 3, 80% for type 4, and 4% for type 5.
Fernández et al. (2012) present a system for detecting free space and spines, the latter as defined by the Spanish government. Free space detection is performed using a low-cost handle, and spine detection is performed with images captured by a camera, taking into account the zebra-like painting applied to these spines. The image is processed so that only the regions of interest go through an edge and contour extraction phase, and then the lines in the image are analyzed. Accuracy was 100% for spines and 94% for the detection of zebra bands.
Considering the related work presented, the proposed methodology can be regarded as original for two reasons. First, none of the ML algorithms used in this work had previously been applied to identify this type of feature (speed bumps). Second, only one of the related works uses gyroscope, accelerometer, and GNSS data, and it does so with a significantly different methodology.

MATERIALS AND METHODS
Mapping the road centerline is an essential task for land traffic control. It supports informed decisions because it makes road anomalies easier to find, which speeds up repairs and the management of areas that require speed reduction.
A centerline map has the potential to be integrated with multi-sensor navigation route applications, so that drivers and autonomous systems can be aware of upcoming road anomalies. This warning can assist drivers when visibility is poor due to rain, snow, or lack of illumination, as well as when traffic signals are wrong or outdated.
In this context, this paper proposes a methodology organized in four main steps, as presented in Fig. 1, for automatic speed bump mapping using data from a GoPro device mounted on a vehicle, with a post-processing ML-based algorithm. The purpose of this article is to present a method that detects the coordinates of road locations where anomalies are present and classifies them into two main categories: 1) vertically low and horizontally long platforms; 2) vertically high and horizontally short bumps. For each category, the driver's action may vary.
The main idea of the proposed methodology is to evaluate different ML algorithms for extracting the geographic coordinates that represent speed bumps. The steps presented in Fig. 1 can be described as follows: a low-cost set of cameras is assembled on a vehicle, which is driven along a specific trajectory. The camera, which contains a set of sensors, records data during this trajectory. As the vehicle approaches a speed bump, the driver tends to slow down (varying the acceleration in the x-axis). While passing over the speed bump, the vehicle goes up and down (varying the acceleration in the z-axis), and its front part points up and down (varying the gyro in the y-axis). Therefore, these observations, combined with a training dataset and an ML algorithm, may allow speed bumps to be mapped automatically.

Video capture
The information regarding the vertical profile of the road centerline is captured by a vehicle fitted with a device for mounting the GoPro© cameras. Fig. 2 presents the vehicle, the platform, and the mentioned device. In this work, only one of the cameras (the top one) is used. However, the same approach can be applied to all of them, and a more precise result can be obtained thanks to the redundancy of the measurements.

Sensors data extraction
After capturing the video, a pre-processing step is performed to extract information such as acceleration, rotation, and coordinates, which the camera stores as metadata in the videos. For this task, we used a Python script and the FFmpeg (2018) tool to extract the metadata from the video, creating a ".bin" file. With this file, we call the "gopro-utils" library (2018), which takes the ".bin" file as input and generates four "csv" files containing the three-axial acceleration and rotation, as well as the three-dimensional coordinates and the respective timestamps.
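As a rough illustration of the extraction step, the sketch below builds an FFmpeg command that copies the telemetry stream of a video into a ".bin" file. The file name `GOPR0001.MP4`, the helper name `build_metadata_extract_cmd`, and the stream index `3` are assumptions made for this example; the position of the metadata stream differs between camera models and should be checked before use. It is not the script used in this work.

```python
from pathlib import Path


def build_metadata_extract_cmd(video_path, stream_index=3):
    """Build the FFmpeg argument list that copies the telemetry stream to a .bin file.

    stream_index is an assumption: the position of the metadata stream varies
    between camera models and should be checked (e.g. with ffprobe) first.
    """
    video = Path(video_path)
    out_bin = video.with_suffix(".bin")
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-codec", "copy",             # copy the stream without re-encoding
        "-map", f"0:{stream_index}",  # select only the metadata stream
        "-f", "rawvideo",             # write the raw payload
        str(out_bin),
    ]


# Hypothetical file name, used only for illustration.
cmd = build_metadata_extract_cmd("GOPR0001.MP4")
print(" ".join(cmd))
# To actually run it (requires FFmpeg on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.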

Time synchronization
Since the sensors (gyroscope, GPS, camera, and accelerometer) collect data at different rates, it is necessary to synchronize the acquired data to produce a single file with coordinates, acceleration, and gyro readings for each specific moment.
The first part of the script converts the GPS time to local time. After this conversion, we interpolate between the files with rotation information (gyro), acceleration, and coordinates, since they are collected at different rates. The interpolation is linear, which provides the expected accuracy of the position, and it is based on the following equation:
$$y_i = y_1 + (y_2 - y_1)\,\frac{t_i - t_1}{t_2 - t_1} \qquad (1)$$
where $y_i$ is the parameter value at the $i$th time $t_i$, with $t_i$ between timestamps $t_1$ and $t_2$; $y_n$ is the parameter value of observation $n$; and $t_n$ is the timestamp of observation $n$.
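The linear interpolation described above can be sketched in a few lines of Python. The function names `lerp` and `resample` and the sample values are illustrative assumptions, not part of the original script; the sketch only shows how a series sampled at one rate can be evaluated at the timestamps of another.

```python
from bisect import bisect_right


def lerp(t, t1, y1, t2, y2):
    """Linearly interpolate the value at time t between (t1, y1) and (t2, y2)."""
    return y1 + (y2 - y1) * (t - t1) / (t2 - t1)


def resample(times, values, target_times):
    """Interpolate a sensor series onto target_times (times must be sorted)."""
    out = []
    for t in target_times:
        # Pick the pair of samples that brackets t; targets outside the
        # recorded range reuse the first or last pair (linear extrapolation).
        j = min(max(bisect_right(times, t), 1), len(times) - 1)
        out.append(lerp(t, times[j - 1], values[j - 1], times[j], values[j]))
    return out


# Toy gyro series (made-up values) resampled at GPS timestamps.
gyro_t = [0.0, 1.0, 2.0]
gyro_y = [0.0, 10.0, 0.0]
gps_t = [0.5, 1.0, 1.5]
print(resample(gyro_t, gyro_y, gps_t))  # [5.0, 10.0, 5.0]
```

Applying the same resampling to the acceleration and coordinate series yields one row per common timestamp, which is the single synchronized file the text refers to.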
With the synchronized data, it is possible to extract signal features, such as the peak within a particular period, the time during which the vehicle was raised when crossing a speed bump, the average acceleration, and the average speed. Fig. 3 shows the sample types that we feed into the training portion of the ML algorithms:
1) Type 1: long platforms.
2) Type 2: short bumps.
3) Type 3: small irregularities on the road or flat road that are neither Type 1 nor Type 2.
Although the purpose of this application is to detect long platforms and short bumps along the road, an ML classifier needs classes that can represent false positives. In this case, Type 3 represents readings that should not be classified as Type 1 or 2. We then load the interpolated file as an array and iterate over it. Each record that exceeds the threshold, as shown in Fig. 3, is placed in an auxiliary matrix. The parameters of interest are:
• t∆ is the amount of time that the vehicle remained above the indicated threshold.
• Ax is the average acceleration in x during the period in which the vehicle remained above the indicated threshold.
• Ay is the average acceleration in y during the period in which the vehicle remained above the indicated threshold.
• Az is the average acceleration in z during the period in which the vehicle remained above the indicated threshold.
• V is the average speed during the period in which the vehicle remained above the indicated threshold.
• P is the highest value of rotation in the y-axis (transverse to the driving direction, which describes the camera's up and down movement) in the period in which the vehicle remained above the indicated threshold.
After the original files are synchronized, the threshold is applied to the data. Thus, a dataset with the attributes presented above is generated, which is used in training and later in the classification step, as discussed in the next section.
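The thresholding and attribute computation described above might look like the following sketch. The text does not state which signal is compared with the threshold, so this example assumes the magnitude of the y-axis rotation; the record keys (`t`, `ax`, `ay`, `az`, `speed`, `gyro_y`), the function names, and the sample values are likewise hypothetical.

```python
from statistics import mean


def extract_segments(records, threshold):
    """Group consecutive records whose |gyro_y| exceeds the threshold.

    Using gyro_y as the thresholded signal is an assumption of this sketch.
    """
    segments, current = [], []
    for rec in records:
        if abs(rec["gyro_y"]) > threshold:
            current.append(rec)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def segment_features(segment):
    """Compute the six attributes for one above-threshold segment."""
    return {
        "t_delta": segment[-1]["t"] - segment[0]["t"],
        "Ax": mean(r["ax"] for r in segment),
        "Ay": mean(r["ay"] for r in segment),
        "Az": mean(r["az"] for r in segment),
        "V": mean(r["speed"] for r in segment),
        "P": max(r["gyro_y"] for r in segment),
    }


# Made-up synchronized records, for illustration only.
records = [
    {"t": 1.0, "ax": 0.0, "ay": 0.0, "az": 9.8, "speed": 10.0, "gyro_y": 0.0},
    {"t": 1.5, "ax": -1.0, "ay": 0.0, "az": 11.0, "speed": 6.0, "gyro_y": 0.5},
    {"t": 2.0, "ax": -1.0, "ay": 0.0, "az": 9.0, "speed": 4.0, "gyro_y": 0.75},
    {"t": 2.5, "ax": 0.0, "ay": 0.0, "az": 9.8, "speed": 5.0, "gyro_y": 0.0},
]
for seg in extract_segments(records, threshold=0.25):
    print(segment_features(seg))
```

Each resulting dictionary corresponds to one instance of the dataset, to which a Type 1, 2, or 3 label is attached for training.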

ML
Machine learning algorithms operate by constructing a model from sample inputs in order to make predictions or decisions guided by the data, rather than by merely following inflexible and static instructions. The application of ML in this work aims to build models capable of correctly detecting and classifying speed bumps on the road. Three different algorithms available in the Weka (2018) tool were used: Naive Bayes, Multilayer Perceptron (MLP), and Random Forest (RF).

Naive Bayes
It is a learning technique based on the Bayes theorem of conditional probability. Although it is a simple model, it is surprisingly effective when used as a classification algorithm (Julia 2018). Equation 2 shows the Bayes Theorem, used as a basis for this ML technique:
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \qquad (2)$$
where $c$ is a class and $x$ is the vector of observed attributes. In simple terms, and applying the explanation to this study, this classifier estimates the probability that the data collected from the sensors belong to data collected over a speed bump. The posterior is then maximized to classify a specific feature as defined in the training phase.
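To make the posterior maximization concrete, the following is a minimal Naive Bayes sketch in plain Python. It assumes Gaussian likelihoods for the numeric attributes, which is a modeling choice of this example; it is not the Weka implementation used in the experiments, and the two-attribute training data are made up.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-feature Gaussian parameters."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_posterior(prior, means, variances):
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=lambda c: log_posterior(*model[c]))


# Toy samples: [time above threshold, peak rotation]; labels are sample types.
X = [[0.4, 0.9], [0.5, 1.1], [1.5, 0.2], [1.6, 0.3]]
y = [2, 2, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.45, 1.0]))   # 2
print(predict(model, [1.55, 0.25]))  # 1
```

The denominator of the Bayes theorem is the same for every class, so it is omitted; working with log-probabilities avoids numerical underflow when many attributes are multiplied.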

Multilayer Perceptron
MLP networks use at least one intermediate layer between the input data and the features to be classified. Potential areas of application of this technique are (Silva et al. 2010): universal function approximation, pattern recognition, process identification and control, time series forecasting, and system optimization. In our work, MLP is evaluated for its pattern recognition capabilities on a multidimensional dataset.

Random Forest
It is a supervised learning algorithm that builds multiple decision trees and merges these trees to achieve an accurate prediction. The predictions of the individual trees go through a voting mechanism, which chooses the most voted classification (Dantas 2015). In this study, we explore the classification capabilities of the RF method to classify, similarly to the MLP network, a multidimensional dataset into features of interest. The "branches" of each tree are defined by the information provided by each parameter used as an input (Fig. 3).
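The voting mechanism can be illustrated with a small sketch. The one-split "stumps" below are hypothetical stand-ins for fully grown decision trees, and their attributes, cut values, and labels are invented for this example; only the majority vote itself reflects the description above.

```python
from collections import Counter


def stump(feature, cut, below, above):
    """Build a one-split decision tree (a stand-in for a full tree)."""
    return lambda sample: below if sample[feature] < cut else above


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical forest of three stumps over the attributes t_delta, P, and V.
forest = [
    stump("t_delta", 1.0, 2, 1),
    stump("P", 0.5, 3, 2),
    stump("V", 3.0, 1, 3),
]
sample = {"t_delta": 0.4, "P": 0.9, "V": 5.0}
votes = [tree(sample) for tree in forest]
print(votes, "->", majority_vote(votes))  # [2, 2, 3] -> 2
```

In a real forest each tree is trained on a random subset of instances and attributes, so individual trees disagree; the vote smooths out their individual errors.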

RESULTS
To assess the proposed methodology, a set of experiments was performed on the premises of the University of Campinas main campus. Three videos were used, two of them for training and another to test the classification models. After pre-processing the training videos, 7500 instances were generated. Most of these instances belong to Type 3, which is more common on a regular road. Therefore, to avoid overfitting our classifiers, we first applied a technique to balance the dataset.
A Class Balancer filter was used to reweight the instances in the dataset so that each class has the same total weight, while the total sum of weights across all instances is maintained. For the last video, used to test our models, the pre-processing step produced 32 instances. In this case, they were classified without balancing. A video that included all the gyroscope, accelerometer, and positional parameters, as well as the required metadata, was collected along the 5 km route represented in Fig. 4.
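The reweighting rule stated above (equal total weight per class, unchanged overall sum) can be written out as a short sketch. The function name `class_balance_weights` and the toy labels are assumptions for illustration; this is not the Weka filter itself.

```python
from collections import Counter


def class_balance_weights(labels):
    """Weight each instance so that all classes carry equal total weight.

    The sum of all weights equals the number of instances.
    """
    counts = Counter(labels)
    per_class_total = len(labels) / len(counts)
    return [per_class_total / counts[label] for label in labels]


# Toy label list dominated by Type 3, as in a regular road.
labels = [3, 3, 3, 3, 1, 2]
weights = class_balance_weights(labels)
print(weights)       # [0.5, 0.5, 0.5, 0.5, 2.0, 2.0]
print(sum(weights))  # 6.0
```

With six instances and three classes, each class ends up with a total weight of 2.0, so the four Type 3 readings no longer dominate the training signal.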
For the ML process, we applied k-fold cross-validation with k=10, i.e., the original sample is randomly partitioned into ten subsamples. Of the ten subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nine subsamples are used as training data. The cross-validation process is then repeated ten times (the folds), with each of the ten subsamples used exactly once as the validation data. The ten results from the folds can then be averaged (or otherwise combined) to produce an estimate. Fig. 5 shows the accuracy of each classifier after the training phase. Naive Bayes reached 64.83% accuracy, followed by RF with 63.23% and MLP with 33.33%.
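The partitioning scheme described above is sketched below in plain Python. The function name `k_fold_indices`, the seed, and the sample count are illustrative assumptions, and no stratification by class is applied here, which a toolkit may do by default.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition, reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Training data is the union of the other k - 1 folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(50, k=10))
for train, validation in splits[:2]:
    print(len(train), len(validation))  # 45 5
```

Every index appears in exactly one validation fold, so averaging the ten per-fold accuracies uses each instance once for testing and nine times for training.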
Once the models were trained, we used them to classify the data from the last video. This time, RF performed best, as depicted in Fig. 6. It achieved 96.84% accuracy with only 1 false positive. The second best was Naive Bayes with 13 correct classifications and 19 false positives, which corresponds to 40.62%. As in the training phase, the worst classifier was MLP, with 26 false positives against 6 correct classifications (18.70%). Table I, Table II, and Table III present the confusion matrix of each classifier on the test data. Rows and columns report the number of false positives, false negatives, true positives, and true negatives. This allows a more detailed analysis than the mere proportion of correct classifications (accuracy). Fig. 7, Fig. 8, and Fig. 9 illustrate the classified instances along the map for each ML approach. Green markers show correct classifications, while red markers show wrong ones. The labels next to the markers represent the resulting classification, where the left number is the actual type of the instance and the right number is the predicted type.
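For readers who want to reproduce this kind of analysis, the sketch below shows how a confusion matrix and the accuracy derived from it can be computed. The label lists are toy values, not the experimental data, and the function names are chosen for this example.

```python
def confusion_matrix(actual, predicted, classes):
    """Count instances per (actual, predicted) pair; rows are actual classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for the three sample types.
actual = [1, 1, 2, 2, 3, 3, 3, 3]
predicted = [1, 3, 2, 2, 3, 3, 1, 3]
cm = confusion_matrix(actual, predicted, classes=[1, 2, 3])
for row in cm:
    print(row)
print(accuracy(cm))  # 0.75
```

Off-diagonal entries show which types are confused with each other, for example a Type 1 platform predicted as Type 3, which a single accuracy figure hides.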
Based on the presented results, we observed that imperfections and anomalies along the streets affect the metrics used, as does the lack of maintenance of both the pavement and the speed bumps.
Speed bumps in Brazil are regulated by the National Traffic Code Resolution nº 600/2016, but there are cases where these features do not meet the technical requirements, which can even cause accidents and, for the problem addressed in this work, could lead to false positives.
In this work, as pointed out previously, only the data generated by one of the cameras were used. The results can be refined using the data generated by the other four cameras, as well as by applying deep learning to the images captured by the front cameras.

CONCLUSIONS
The digital maps used in autonomous vehicles need a high, collaboratively maintained update rate. This work shows that, using low-cost mapping tools such as a base-mounted GoPro camera together with an ML technique, it is possible to generate road map features accurately and efficiently. It is important to highlight that speed bump identification is useful for several applications, such as routing problems that involve cost function parameterization, alert systems for car navigation, traffic sign inventory, and checking whether speed bumps comply with local legislation.
These results are an essential step towards the improvement of ground transportation systems as a reliable way of alerting drivers and autonomous vehicles on the road via a collaborative approach (V2V, a vehicle-to-vehicle information network). Future work will consider other ML algorithms, improvements to the pre-processing stage, and reading GoPro data as a stream so that a pre-trained static model generated by an ML algorithm can classify speed bumps in real time.
Although our data acquisition base, mounted on top of a vehicle, is composed of multiple GoPro cameras, we do not use their images in this work. However, they will be used in the future to detect other features such as potholes and traffic signs (horizontal and vertical), allowing the generation of maps with all relevant components of a road by using a photogrammetric approach.