Using Multiple Deep Neural Networks Platform to Detect Different Types of Potential Faults in Unmanned Aerial Vehicles

Many researchers have developed new algorithms to predict faults of unmanned aerial vehicles (UAV). These algorithms detect anomalies in the streamed data of the UAV and label them as potential faults. Most of these algorithms consider neither the complex relationships among the UAV variables nor the temporal patterns of the previous instances, which leaves room for new ideas. This paper presents a new method that analyzes the relationships and the temporal patterns of every two variables to detect potentially defective sensors. The proposed method depends on a new platform composed of multiple deep neural networks. The method starts by building and training this platform. The training step requires reshaping the dataset into a set of subdatasets, each of which is used to train one deep neural network. In the testing phase, the method reads new instances of the UAV testing dataset, and the output of the algorithm is the predicted potential faults. The proposed approach was evaluated and compared with other well-known algorithms, and it showed promising results in predicting different kinds of faults.


INTRODUCTION
In complex systems such as unmanned aerial vehicles (UAV), the chances of failure are hazardously high. The data streamed during flight missions contain vast knowledge that can be used to define and predict potential faults. The data are stored at a fixed rate in data rows, and each row contains the values of the UAV variables. The variables are either commands (e.g., the elevator, rudder, and aileron commands) or sensor readings (e.g., altitude, longitude, latitude, and airspeed). To foresee system failure, anomaly detection algorithms are used; these algorithms find patterns in data that do not follow an expected behavior. They are either supervised or unsupervised. Supervised algorithms are trained using datasets in which each instance carries a (normal/abnormal) label. Unsupervised algorithms, by contrast, do not involve any labeling of the datasets, and they assume that abnormal patterns are less frequent than normal ones (Chandola et al. 2009).

LITERATURE REVIEW
Over recent years, detecting system faults has become the interest of many researchers, who have designed many algorithms that extract data anomalies to predict faults. Casas et al. (2016) used decision trees to detect anomalies in cellular network data. Their approach correctly recognized more than 80% of the abnormal instances with no false positives. He et al. (2019) presented an anomaly detection and mitigation algorithm based on online subspace tracking ("ADMOST"). Their method detected the outlier instances in a multivariate heterogeneous data stream with high accuracy and mitigated them with low error.
The progress in neural network architectures and machine learning led to a leap forward in performance and efficient processing. Most of the existing approaches used neural networks to learn the behavior of a training dataset. The learned neural network predicts the data online and uses the prediction error to extract the outliers during flight missions (Hundman et al. 2018; Saurav et al. 2018). Vinayakumar et al. (2018) demonstrated the efficiency of the deep learning approach and long short-term memory (LSTM) networks for Android malware detection. Their approach obtained a 0.987 detection rate and 0.939 accuracy using an LSTM network with six layers. Wang et al. (2019) built a time series prediction model based on the LSTM network. They estimated the uncertainty interval to conduct point anomaly detection, and their method scored a recall rate close to one on two test datasets. Munir et al. (2019) used a deep learning-based approach for anomaly detection in time-series data. Their technique was accurate even for small deviations in time series cycles, and it was capable of detecting point and contextual anomalies in time series with periodic and seasonality characteristics. Althubiti et al. (2019) applied an optimized model of LSTM to implement an anomaly detection system. They claimed that the optimized model of LSTM obtained an accuracy of 0.8483. Also, they found that LSTM performed better than support vector machine, multiple perceptron networks, and Naïve Bayes techniques. Despite the good results, these algorithms do not achieve the desired objective, as they specifically catch neither contextual faults nor the context of a fault; this gap motivates the proposed approach.
Multiple neural networks have been utilized previously for different applications, and they give more reliable results on the occurrence of faults than a single neural network does. Zhang (2006) proposed a fully connected architecture for a multiple-neural-network platform, in which the outputs of the multiple DNNs were combined to give a single overall result. Karjol et al. (2018) used MDNN for speech enhancement and estimated the speech spectrum as a weighted average of outputs from multiple DNNs. The platform used here is not fully connected like those previous platforms, because a fully connected architecture may mix relationships among variables, which could affect the results of the presented model.

METHODOLOGY
The presented method uses a platform of MDNN. The DNN is an artificial neural network (ANN) with multiple hidden layers that are stacked together (Singh 2017). The ANN is an algorithm used in solving a wide range of problems, including classification, clustering, and pattern recognition (Ullah et al. 2019). It is a computational system that learns to perform tasks by training. The ANN is based on a collection of connected nodes called neurons. The neurons of the neural network are aggregated into layers. Each neuron in a given layer is connected to every neuron in the next layer. The input layer receives data from outside the network. The output layer generates the output results, and single or multiple hidden layers transform and transfer the data from the input layers to the output layers. The job of a neuron can be represented mathematically by Eq. 1, which is adapted from (Singh 2017). Each neuron receives some input signals; then, it multiplies the received inputs (element-wise multiplication) with corresponding values called weights. Next, the neuron sums the result with a bias value; then, it applies an activation function; for example, the sigmoid σ function (see Eq. 2).
where is the input vector, is the weight vector, is the neu element-wise multiplication, is the activation function, is the ne is the input of the activation functions. The neuron sends the results where x is the input vector, w is the weight vector, v is the neuron bias, • is the element-wise multiplication, f is the activation function, y is the neuron output, and z is the input of the activation functions. The neuron sends the results to the next neurons that are connected to it in the subsequent layers. The weights values are adjusted through the training phase. Training the network is accomplished in an organized and efficient technique, such as the error back-propagation method, which is widely used in most ANN prototypes (Wang et al. 2013), and it is explained briefly in (Yu and Wilamowski 2016). Figure 1 shows a block diagram of the MDNN method, where it consists of two phases: (1) the building and training phase and (2) the testing phase. In the first phase, the algorithm collects the variables of the UAV, and form an array of elements. Each element of this array is a possible (dependent, tested) couple. Then, it builds the MDNN platform by defining one DNN for each (dependent, tested) couple. Accordingly, the algorithm reshapes the training dataset into a set of subdatasets. The rows of each subdataset are the values of a sliding window. Each sliding window consists of the previous instances of the dependent variable, and the differential values of the tested one. The algorithm trains the MDNN platform using the new subdatasets. In the testing phase, the algorithm reads the instances at each time step, constructs the sliding window for the (dependent, tested) couple, and uses the trained MDNN platform to detect the abnormal instances; consequently, it predicts the potential faults. The set of m tested variables is represented by P = {p j : j < m}, and the set of n dependent variables by Q = {q k : k < n}. 
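The single-neuron computation of Eqs. 1 and 2 can be sketched as follows; the input, weight, and bias values are hypothetical, chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, sigma(z) = 1 / (1 + e^(-z)) (Eq. 2)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, v):
    """Single-neuron forward pass (Eq. 1): the neuron multiplies the inputs
    element-wise with the weights, sums the result with the bias v to get z,
    then applies the activation function f to obtain the output y."""
    z = np.sum(w * x) + v
    return sigmoid(z)

# Hypothetical numbers for illustration
x = np.array([0.5, -1.0, 2.0])   # input vector
w = np.array([0.1, 0.4, -0.2])   # weight vector
v = 0.05                         # neuron bias
y = neuron_output(x, w, v)       # y lies in (0, 1)
```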
The values of each couple (q k , p j ) at time step t are (x t qk , x t pj ). The goal is to predict potential faults by extracting the abnormal temporal patterns of the two variables (q k , p j ). To build the new platform of MDNNs, Algorithm 1 is used. This algorithm starts by declaring NN as an empty matrix whose elements will be the new (m × n) DNNs (see Fig. 2). The algorithm creates a neural network NN (qk,pj) for each (q k , p j ) couple. These couples can be nominated with the help of an expert in the field. Then, the algorithm creates a subdataset D qk,pj from the training dataset; D qk,pj is used for training NN (qk,pj) . Each row of D qk,pj consists of the input and the output of the neural network NN (qk,pj) . The input W t,h qk,pj at time step t is a sequential list of the current and the previous instances of the two variables (q k , p j ), and it consists of 2h elements (Eq. 3), where h is the size of the sliding window.

Figure: Multiple deep neural network (MDNN).
In Eq. 3, the differential values Δx t pj = x t pj − x t-1 pj are used to increase the sensitivity of the algorithm to the occurrence of faults in the tested variable p j , as suggested by Khalastchi and Kalech (2018).
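The sliding-window input of Eq. 3, including the differential values above, can be sketched as follows; the exact window layout is an assumption based on the text, and the helper name and toy series are hypothetical:

```python
def sliding_window(q_vals, p_vals, t, h):
    """Build the 2h-element input W[t,h] for the couple (q_k, p_j) (Eq. 3).

    q_vals, p_vals: full time series of the dependent and tested variables.
    The window concatenates the h most recent values of q_k with the h most
    recent differential values dx = p[i] - p[i-1] of p_j (assumed layout).
    """
    assert t >= h, "need at least h previous instances"
    q_part = [q_vals[i] for i in range(t - h + 1, t + 1)]
    dp_part = [p_vals[i] - p_vals[i - 1] for i in range(t - h + 1, t + 1)]
    return q_part + dp_part  # 2h elements in total

# Hypothetical toy series; note the jump in p at t = 4
q = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
p = [0.0, 0.1, 0.2, 0.3, 0.9, 1.0]
w = sliding_window(q, p, t=5, h=2)
# w -> approximately [10.4, 10.5, 0.6, 0.1]
```

The large differential value 0.6 exposes the jump in the tested variable, which is exactly why the differentials increase sensitivity to faults.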
The training output of the neural network NN (qk,pj) is the class of the point (x t qk , Δx t pj ), where class(x t qk , Δx t pj ) ∈ {Zero, One}; Zero is the abnormal class, and One is the normal class. By combining the input and the output, a row of D qk,pj is constructed. Iteratively, in the next time steps, the algorithm constructs the next rows of D qk,pj . Equation 4 shows the shape of D qk,pj .
where N is the size of the new subdataset. After creating D qk,pj , the algorithm trains NN (qk,pj) and assigns it to the element NN[k,j] to be used in the testing phase. The mathematical model of a neural network NN[k,j] is realized as a nonlinear functional mapping from the values of the input W t,h qk,pj to the output y t k,j (Eq. 5).
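Assembling D qk,pj (Eq. 4) can be sketched as follows: each row pairs a 2h-element window with the class of the current point. This is a minimal sketch assuming labels are available for the training data; the helper name and toy data are hypothetical:

```python
def build_subdataset(q_vals, p_vals, labels, h):
    """Assemble D qk,pj (Eq. 4): one row per time step t >= h, pairing the
    2h-element window (h values of q_k, then h differentials of p_j) with
    the class label of the point (1 = normal, 0 = abnormal)."""
    rows = []
    for t in range(h, len(q_vals)):
        window = ([q_vals[i] for i in range(t - h + 1, t + 1)] +
                  [p_vals[i] - p_vals[i - 1] for i in range(t - h + 1, t + 1)])
        rows.append((window, labels[t]))
    return rows

# Hypothetical toy series with one abnormal point at t = 4
q = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
p = [0.0, 0.1, 0.2, 0.3, 0.9, 1.0]
labels = [1, 1, 1, 1, 0, 1]
D = build_subdataset(q, p, labels, h=2)   # N = len(q) - h = 4 rows
```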
Training the neural network with the method of error back-propagation (Wang et al. 2013) allows the system to learn any given mapping of input to output. The back-propagation method is used by the gradient descent algorithm, an iterative optimization algorithm that tries to find the local minimum of the loss function. The loss function E is calculated using Eq. 6. The gradient descent algorithm calculates the gradient ∇E as the first-order derivative of the total error function (Eq. 7), which is adapted from Yu and Wilamowski (2016). At each iteration of the gradient descent method, the values of θ i and b i are adjusted using Eqs. 8 and 9 (the reader is referred to Yu and Wilamowski [2016] for the mathematical extraction steps of Eqs. 8 and 9).
where η > 0 is the learning rate, which can be adjusted during the training process. Increasing the learning rate value makes the learning process faster, but it might affect the sensitivity of the neural network (Yu and Wilamowski 2016). In the testing phase, if the input vector W t,h qk,pj outputs an Anomaly, then the algorithm considers the instance (x t qk , Δx t pj ) as a potential fault (see Fig. 3).
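The parameter update of Eqs. 8 and 9 is the standard gradient descent step; a minimal sketch on a stand-in quadratic loss (the loss function below is illustrative, not the paper's Eq. 6):

```python
def gradient_descent_step(theta, b, grad_theta, grad_b, eta):
    """One gradient descent update (assumed form of Eqs. 8 and 9):
    theta <- theta - eta * dE/dtheta,  b <- b - eta * dE/db."""
    return theta - eta * grad_theta, b - eta * grad_b

# Stand-in loss E(theta, b) = (theta - 3)^2 + (b + 1)^2, minimized at (3, -1)
theta, b, eta = 0.0, 0.0, 0.1    # eta is the learning rate
for _ in range(200):
    grad_theta = 2.0 * (theta - 3.0)   # dE/dtheta
    grad_b = 2.0 * (b + 1.0)           # dE/db
    theta, b = gradient_descent_step(theta, b, grad_theta, grad_b, eta)
# theta converges to ~3.0 and b to ~-1.0
```

A larger eta converges in fewer iterations on this loss, but, as the text notes, too large a learning rate can harm the sensitivity of the trained network.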

RESULTS AND DISCUSSION
The well-known "FLTz" synthetic dataset was used. This dataset was shared by Oza (2011) for public research purposes, and it contains 20 flights of a fixed-wing aircraft with periods of up to 40 min. "FLTz" is a flight simulator used to develop flight control, planning, and in-flight fault detection (Chu et al. 2010). Each flight includes all stages, such as takeoff, climb, cruise, and descent. Each flight consists of 36 variables and is recorded at a rate of 1 Hz. The experiments employed 14 sensor readings and four commands: the 14 sensor-reading variables were considered the tested variables, and all 18 variables were considered the dependent ones. Table 1 shows the variables of the FLTz dataset. Using the "FLTz" dataset, a new dataset with 40,000 rows was generated for the training phase by randomly concatenating multiple flights. Four different flights were used for testing. Each flight was injected with one fault type (impulse, stuck, cut, or drift) in various variables such as pitch, pitch rate, and airspeed. Table 2 shows the injected faults in the selected four flights. The impulse fault means that the sensor shows an offset value added to its actual value in one instance (see Fig. 4 and note the small range of the impulse, which is about [0, 0.07]). The stuck fault occurs when the value of the sensor is stuck at a specific reading (Fig. 5). The drift fault means that the readings of the sensor increase through time when they should not (Fig. 6), and the cut fault occurs when the sensor shows unexpected continuous zero readings for a limited period (Fig. 7). Multiple experiments were conducted to evaluate the results of the presented approach. The results were compared with those of other well-known algorithms such as KNN, One-Class SVM, and Kernel SVM. All algorithms were tested using the same flights.
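The four fault types can be reproduced on a synthetic signal; a minimal sketch with illustrative parameter values (the offsets, slope, and function name are assumptions, not taken from the paper):

```python
import numpy as np

def inject_fault(signal, kind, start, length=1, offset=0.05, slope=0.01):
    """Inject one of the four fault types described above into a copy of a
    sensor signal (illustrative parameter values)."""
    s = np.asarray(signal, dtype=float).copy()
    if kind == "impulse":                 # one-instance offset on the reading
        s[start] += offset
    elif kind == "stuck":                 # frozen at one specific reading
        s[start:start + length] = s[start]
    elif kind == "drift":                 # readings grow through time
        s[start:start + length] += slope * np.arange(length)
    elif kind == "cut":                   # continuous zero readings
        s[start:start + length] = 0.0
    return s

clean = [1.0] * 10
cut = inject_fault(clean, "cut", start=3, length=4)
imp = inject_fault(clean, "impulse", start=2)
drift = inject_fault(clean, "drift", start=0, length=5)
```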

The MDNN platform (Algorithm 1) consisted of 238 DNNs. Each neural network consisted of one input layer, one output layer, and three stacked hidden layers. In the experiments, the input layer contained 2h = 8 neurons (2h is the length of the input W t,h qk,pj ), and the three hidden layers had [8, 16, 8] neurons, respectively. Using three hidden layers helped to obtain acceptable results. The output layer had one neuron for detecting anomalies.
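The per-couple DNN described above (input 2h = 8, hidden layers [8, 16, 8], one output neuron) can be sketched as an untrained forward pass in NumPy; the initialization and the use of sigmoid activations throughout are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

# Architecture from the text: input 2h = 8, hidden [8, 16, 8], output 1
sizes = [8, 8, 16, 8, 1]
params = [layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w_input):
    """Forward pass of one DNN of the platform; the single output lies in
    (0, 1) and is interpreted as normal (near 1) vs. abnormal (near 0)."""
    a = np.asarray(w_input, dtype=float)
    for W, b in params:
        a = sigmoid(a @ W + b)
    return float(a[0])

score = forward(np.zeros(8))   # untrained network, output strictly in (0, 1)
```

The full platform would hold one such parameter set per (dependent, tested) couple, i.e., 238 of these networks in the reported experiments.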

EVALUATION INDICATORS
To evaluate the anomaly detection algorithms, the following indicators were used: recall or detection rate (Eq. 10), false alarm rate (FAR) (Eq. 11), and precision (Eq. 12). Precision is an indicator of whether the detected anomalies are trustworthy. Also, the F-score (Eq. 13) was used to evaluate both the precision and the recall (Lee and Kim 2019). These indicators are widely used in the area of anomaly detection.
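Equations 10-13 correspond to the standard confusion-matrix indicators; a sketch under that assumption (the counts below are hypothetical):

```python
def evaluation_indicators(tp, fp, tn, fn):
    """Recall (Eq. 10), false alarm rate (Eq. 11), precision (Eq. 12), and
    F-score (Eq. 13) from confusion-matrix counts, assuming the paper uses
    the standard definitions."""
    recall = tp / (tp + fn)        # detection rate
    far = fp / (fp + tn)           # false alarm rate
    precision = tp / (tp + fp)     # are detections trustworthy?
    f_score = 2 * precision * recall / (precision + recall)
    return recall, far, precision, f_score

# Hypothetical counts: 100 true anomalies, 900 normal instances
recall, far, precision, f = evaluation_indicators(tp=90, fp=5, tn=895, fn=10)
# recall = 0.9, far ~ 0.0056, precision ~ 0.947, f ~ 0.923
```

Note that a detector can score high precision while missing most faults; the F-score penalizes exactly that case, which is why it is reported alongside recall.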

FAULT DETECTION EXPERIMENTS
The proposed approach was compared with other well-known algorithms: KNN, One-Class SVM (support vector machine), and Kernel SVM. The K-nearest neighbor (KNN) algorithm is used for classification problems by estimating the local density of the data points (Ullah et al. 2019). It depends on the distance between the tested value and its nearest neighbors; usually, the Euclidean distance is used. The One-Class SVM finds a boundary that surrounds the normal instances of the training dataset and then decides, through a linear decision function, whether a test instance falls within the region of the learned boundary (for more details on linear decision functions, the reader is referred to Chandola et al. [2009] and Bounsiar and Madden [2014]). The SVM algorithm increases its ability to perform nonlinear classification by applying kernel functions (Kernel SVM) (Guo et al. 2015). A kernel function maps pairs of objects to their similarity; the similarity values range between zero and one, where one denotes maximum similarity and zero denotes no similarity (Das et al. 2010). Commonly, the Gaussian kernel with the Euclidean distance measure is used (Juvonen et al. 2015). Table 3 shows that the required training time for the Kernel SVM algorithm was the longest; this was because of the computational complexity of the kernel function (Janakiraman and Nielsen 2016). The training time for the MDNN algorithm was also long because of the large number of neurons that needed to be trained. The MDNN algorithm scored good results on all the evaluation indicators (Table 4): its recall and precision had the highest values, and its false alarm rate approached zero. The MDNN scores were better than those of the Kernel SVM and KNN algorithms; for example, the precision of the MDNN algorithm was generally better.
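The baseline detectors can be sketched with scikit-learn on toy data; the dataset, parameters, and the KNN scoring rule below are hypothetical, not the FLTz setup:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(200, 4))      # training: normal instances
test = np.vstack([rng.normal(0.0, 1.0, size=(5, 4)),
                  np.full((5, 4), 8.0)])          # 5 normal + 5 far outliers

# One-Class SVM with a Gaussian (RBF) kernel learns a boundary around the
# normal training instances; predict() returns +1 (normal) or -1 (anomaly).
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(normal)
svm_pred = ocsvm.predict(test)

# KNN-style density score: mean Euclidean distance to the k nearest
# training points; a large score marks a likely anomaly.
nn = NearestNeighbors(n_neighbors=5).fit(normal)
dist, _ = nn.kneighbors(test)
knn_score = dist.mean(axis=1)
```

The far-away points fall outside the learned RBF boundary and get large KNN distances, matching the text's point that both baselines rely on distance or boundary geometry rather than temporal context.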
Additionally, in the testing phase, the KNN algorithm was the slowest, while the One-Class SVM was the fastest, as Fig. 8 shows. The One-Class SVM was better at detecting the impulse faults in Flight1; however, its false alarm rate was high for the stuck faults in Flight2, and its recall approached zero for the drift faults in Flight3 and the cut faults in Flight4, which means that the One-Class SVM failed to detect faults of a continuous nature.
In Flight1 (impulse faults, Fig. 4), the precision of the MDNN algorithm was acceptable (0.96), but it was a bit lower than that of KNN, SVM, and Kernel SVM; the reason is the small range of the impulses. In Flight2 (stuck faults), Flight3 (cut faults), and Flight4 (drift faults), the precision of the KNN, SVM, and Kernel SVM algorithms was high. However, their false alarm rates were also high, because it was difficult for these algorithms to separate the significant number of outliers from the normal instances due to the continuous nature of the faults. For the same reasons, the One-Class SVM showed a low detection rate when processing Flight3 and Flight4; therefore, its F-score was low too, even though its precision was high (refer to Eq. 12).
The KNN efficiency was similar to that of the MDNN algorithm in processing Flight3 and Flight4. The reason is the increased distance of these outliers from the normal instances, which increased the KNN reliability. Note that the MDNN algorithm was not affected by the continuous nature of the outliers, which made the MDNN algorithm (a supervised algorithm) better than the other algorithms.

CONCLUSION
In this paper, a novel method was proposed for predicting several types of potential faults by analyzing the relationships between the variables of the UAV flights and considering the temporal patterns of the previous values. The new method depends on a platform of MDNN, where each DNN is responsible for detecting anomalies in the instances of one (dependent, tested) couple of variables. The MDNN algorithm performed better than the other algorithms when processing the stuck, drift, and cut faults, which have a continuous nature. On the other hand, it was sensitive to the small values of the abnormal impulses, but its precision was a bit lower than that of the other algorithms.
Future work includes making the algorithm faster in the training phase by optimizing the structure of the neural networks and increasing their precision, or enhancing the performance by testing different platforms, such as using one tested variable versus many dependent variables. Also, new methods could be explored for choosing the appropriate couples of variables in order to minimize the size of the processed values. Moreover, the new platform can be used to test and compare other machine learning classifiers, such as long short-term memory networks and logistic regression.

DATA AVAILABILITY STATEMENT
The data will be available upon request.

FUNDING
Not applicable.