Cloud computing based unsupervised fault diagnosis system in the context of Industry 4.0

Abstract: New online fault monitoring and alarm systems, aided by Cyber-Physical Systems (CPS) and Cloud Technology (CT), are examined in this article within the context of Industry 4.0. The data collected from machines is used to implement maintenance strategies based on the diagnosis and prognosis of the machines' performance. The purpose of this paper is to propose a Cloud Computing Platform containing three layers of technologies that form a Cyber-Physical System. The platform receives unlabelled data, generates an interpreted online decision for the local team, and collects historical data to improve the analyzer. The proposed troubleshooter is tested using unlabelled experimental data sets of rolling element bearings. Finally, current and future applications of Fault Diagnosis Systems and Cloud Technologies in the maintenance field are discussed.


Introduction
Maintenance is an important link in the product lifecycle, since it accounts for as much as 60% to 70% of the total lifecycle cost (Dhillon, 2006; Venkataraman, 2010). Although maintenance costs are considerably high, existing maintenance solutions are used in isolation, without taking into consideration the real status of the machine components (Efthymiou et al., 2012). Because production systems are interlinked, failures of machine components lead to bottlenecks in the subsequent value-added processes of manufacturers and their customers (Denkena et al., 2012; Wang, 2013). Advanced manufacturers have introduced remote diagnosis techniques to gain awareness of equipment status and identify failures beforehand (Mori et al., 2008). A Remote Fault Diagnosis System (RFDS) is defined as a troubleshooter used to diagnose a subject (system, subsystem or component) at a distance (Bidgoli, 2010). An RFDS has three major layers: I) sensors installed on the monitored subject, which collect data from the diagnosed system; II) a communication network, which conveys the collected data from the diagnosed system to the remote diagnosis center; and III) the remote diagnosis center, where the data is processed and a decision is made based on the generated information about the subject's status. The diagnosis center includes decision-making management software that uses the historical data of the system's performance to diagnose and isolate faults, identify their root cause, and monitor the performance of the system. RFDS is commonly used in medical equipment, aircraft, and military equipment. Figure 1 illustrates the three layers of RFDS and the communication between them (Sanchez et al., 2011). In its default setup, the remote diagnosis system exchanges data with a service hotline, where human experts give technical advice via communication methods such as emails and video calls (Wang et al., 2004; Sekar et al., 2011).
The advanced setup of the remote diagnosis system, on the other hand, is the automated failure diagnosis system. It uses an offline analysis technique to classify behavioural changes based on the historical data of the subject, and then sends the necessary action to the local maintenance team and the spare parts support (Mourtzis et al., 2017; Zhao et al., 2010).

Literature review
Many of the most popular diagnosis and decision-making support techniques have already been applied to automated remote diagnosis systems for rotating machinery. For example, Mourtzis et al. (2017) created an advanced monitoring system that uses a cloud-based platform supported by augmented reality to monitor machine tools, calculate the remaining useful time and perform accurate and quick maintenance actions. Zhao et al. (2010) described an expert enterprise monitoring and remote fault diagnosis system based on web services, Visual Studio and Extensible Markup Language (XML) technologies. The proposed architecture enhances elasticity and scalability while decreasing the time needed to develop the maintenance system. Jantunen et al. (2018) described the concept of advanced condition-based maintenance with the aid of cloud computing. Their study reinforces the idea that a maintenance system supported by CT and CPS can increase the overall equipment efficiency and reduce maintenance costs. Campos et al. (2016) highlighted data management in advanced maintenance systems, especially when emergent technologies such as CT and big data are used. They discussed the use of big data and CT and highlighted the features of a cloud-based big data system. Monostori et al. (2016) studied the developments in computer science and information technology used to build a cyber-physical system and their implementation in the manufacturing process. They highlighted the roots of the cyber-physical production system in manufacturing technology and the main research challenges.
Additionally, Kirubashankar & Krishnamurthy (2013) implemented a novel web-based expert maintenance support system for the Distributed Control System (DCS) in chemical plants, which is one of the vital components of any manufacturing facility. The collected information and critical alarms are stored and compared instantly with the reference functional database; then they are sent to the maintenance engineer for online diagnosis. Sanz-Bobi et al. (2012) developed a remote diagnosis system capable of continuously and automatically monitoring the basic parameters of photovoltaic solar power plants to detect anomalies, and introduced a prototype of the proposed system and architecture for such a plant. The concept of cyber-physical systems, on the other hand, was proposed in Lee et al. (2015). This concept is based on five levels: I) connection level; II) data-to-information conversion level; III) cyber level; IV) cognition level; and V) configuration level. This new trend in cyber-physical systems is transforming the manufacturing industry into the next generation, Industry 4.0. The main advantage of CPS is the interface between machines, which is similar to a social network. Holgado et al. (2016) studied developing trends in remote maintenance, such as diagnosis and prognosis tools, cloud-based tools, simulation tools, and location and tracking tools, aiming to explain the value of using these new trends in service provision. Wang et al. (2017) presented a new paradigm of cloud predictive maintenance based on mobile agent technology. The mobile-agent-based paradigm reduces data transmission and enhances system flexibility. Also, a proactive collaborative maintenance CPS project was proposed to build a platform for the proactive maintenance of industrial machines (Jantunen et al., 2016). Mourtzis et al. (2016) presented a preventive maintenance approach integrated into a monitoring framework. The developed approach was built by using CT.
The main advantage of the proposed system is the distribution of real-time information related to the machine tools to the operators.
Based on the preceding literature survey, the originality of this paper arises from its aim to build a cloud computing platform for the diagnosis of machine performance, and from its exploration of unlabelled data to generate interpretable patterns for use in maintenance decision making. The combination of machine learning and communication technology brings us to a new age of industrial engineering.
Section 3 introduces the proposed methodology, Section 4 proposes the architecture of the unlabelled remote diagnosis system, and Section 5 presents a case study. Finally, Section 6 presents the discussion and conclusion of this paper.

Methodology
Several types of data collection methods can be used to detect the status of the monitored subject. The signals can be vibration signals, thermal images, oil particle signals, and so on. The most common technique for fault detection in rotating machines is vibration signal analysis, which is illustrated in the following section. This section presents an unsupervised machine learning technique composed of three layers to discover and understand the hidden correlations in the vibration signals. The first layer is vibration signal feature analysis, the second layer is clustering, and the third layer is a pattern-based machine learning technique called Logical Analysis of Data (LAD).

Vibration signal features analysis
The first analytical step for the vibration signal is feature extraction. The vibration signal can be represented in three forms, each of which can be used to diagnose the failure status of the rotating system. However, these three forms contain a lot of hidden data that can be extracted to improve the accuracy of the diagnosis technique. The advantages and disadvantages of the hidden data extraction techniques are studied in Lakis (2007), whereas the statistical features commonly used in research are stated in Mortada et al. (2011). The time-domain features calculated from the vibration signal are descriptive statistics, shown in Table 1.
Table 1. Time-domain statistical features calculated from the vibration signal.
The other challenge is that there is no scientifically established boundary (node) between the healthy and unhealthy classes. The new improvement to LAD is the use of a clustering technique to convert the unlabelled data into labelled data, which helps LAD discover the hidden patterns inside the data. In the following section, we introduce the clustering algorithm.
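The feature extraction step can be illustrated with a minimal Python sketch that computes the features named in this paper (STDE, SK, KU, RMS, CF and Range) for one vibration snapshot. The function name `time_domain_features` is hypothetical, and the skewness and kurtosis use plain moment ratios, which is an assumption: the expressions in Table 1 may apply sample-size corrections and therefore give slightly different values.

```python
import math
import statistics


def time_domain_features(signal):
    """Descriptive statistics of one vibration snapshot (illustrative only)."""
    y = [float(v) for v in signal]
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    # Central moments of order 2, 3 and 4 (no sample-size correction: assumption).
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    rms = math.sqrt(sum(v ** 2 for v in y) / n)
    return {
        "STDE": statistics.stdev(y),           # sample standard deviation
        "SK": m3 / m2 ** 1.5,                  # skewness
        "KU": m4 / m2 ** 2,                    # kurtosis (about 3 for Gaussian data)
        "RMS": rms,                            # root mean square
        "CF": max(abs(v) for v in y) / rms,    # crest factor = peak / RMS
        "Range": max(y) - min(y),
    }
```

A constant signal has zero variance, so the skewness and kurtosis ratios are undefined for it; a real pipeline would need to guard against that case.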

The clustering analysis
After analyzing the vibration signals, we use the extracted features of the signal to find the boundary (node) between the classes and so reveal the information inside the vibration signal. Clustering analysis aims to group homogeneous objects into a set of classes, and it can deal with very large-scale and complex data sets. We use the K-means clustering algorithm to find the best clusters. K-means is one of the most commonly used clustering algorithms in signal processing. The clustering is done by minimizing the sum of squared distances between the data points and their corresponding cluster centroids. The technique works in four steps. The first step randomly creates K centroids, where K is defined in advance. The second step allocates each point of the data set to the nearest centroid. The third step recomputes each centroid as the mean of the data points assigned to its cluster, which reduces the intra-cluster distance.
The final step is to repeat steps 2 and 3 until the clustering criterion no longer improves. Equation 1 gives the objective function of the K-means algorithm (Zhang, 2008; Kanungo et al., 2002):
$$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{c_i} \lVert x_j - v_i \rVert^2 \quad (1)$$
where $\lVert x_j - v_i \rVert$ is the Euclidean distance between data point $x_j$ and cluster center $v_i$, $c_i$ is the number of data points in the i-th cluster, and $c$ is the number of cluster centers.
The main advantages of K-means are that it is fast, robust and easy to understand, and that it performs best when the clusters are distinct or well separated from each other. The labels produced by the clustering step are then passed to LAD, which is introduced in the following section.
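The four steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the Weka implementation used in the case study; the function names, the `seed` parameter and the stopping rule (stop when the assignments no longer change) are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    # Step 1: choose K initial centroids at random from the data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: allocate each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 4: repeat steps 2 and 3 until the assignments stop changing.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, calling `k_means([(3.0,), (3.1,), (2.9,), (8.0,), (8.4,), (7.9,)], k=2)` on six hypothetical kurtosis values separates the three low values from the three high ones. Which cluster receives index 0 depends on the random initialization, so the cluster indices themselves carry no meaning.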

Logical Analysis of Data (LAD)
LAD is a data-driven technique that discovers the hidden patterns of a phenomenon based on pattern recognition. It is applied in two stages: training and testing. In the first stage, part of the data is used as training data to extract the strongest hidden patterns explaining the phenomenon. To build the most accurate diagnosis model, the training data should be well balanced so that it covers both the healthy and the failure classes, while the rest of the data is used to test the accuracy of the model generated in the first stage. LAD was introduced at the Rutgers University Center for Operations Research (RUTCOR) (Hammer, 1986). A comparison between LAD and other machine learning techniques is presented in Yacout (2010). The main steps of the technique are data binarization, pattern generation, and theory formation. The first step is the process of transforming the data into a Boolean database. In our work, we use the binarization technique proposed in Boros et al. (2000), which is briefly explained in Shaban et al. (2017). Pattern generation is the second step; in this step, the patterns, which are the key building blocks of LAD, are generated. In our work, mixed-integer linear programming (MILP) is used to generate the hidden patterns of the failure state of the rolling element bearing. Every generated pattern covers at least one observation, and a strong pattern covers most observations. The last step is theory formation, in which the LAD decision model is generated. This model can be used in real-time remote maintenance systems and decision-making support. The equation explained in Mortada et al. (2011) generates a score ranging between -1 and 1. When the output of this function is negative, the tested observation belongs to the negative class; when it is positive, the observation belongs to the positive class.
If the result equals zero, on the other hand, LAD cannot decide the class of the tested observation. In the following sections, we discuss each of these steps in detail.

Data binarization
In this step, the raw data is transformed into binary data using a binarization technique that translates each numerical feature into a set of binary attributes. The technique first ranks, in ascending order, the distinct values that a numerical feature (u) takes in the data set. It then inserts a cut-point between every two consecutive values that belong to different classes; the value of the cut-point is the average of the two values. For each cut-point, a binary attribute b is formed as given in Equation 2.
The number of cut-points generated for each numerical feature equals the number of binary attributes that this feature contributes to the binarized training set at the end of the binarization process. In the following section, the pattern generation step is explained.
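The cut-point procedure above can be sketched as follows for a single numerical feature. The helper names are hypothetical, and the rule that a binary attribute equals 1 when the feature value is at or above the cut-point is an assumption based on the usual LAD convention; it stands in for Equation 2 and may differ from it in detail.

```python
def cut_points(values, labels):
    """Midpoints between consecutive distinct values of different classes."""
    classes_at = {}
    for value, label in zip(values, labels):
        classes_at.setdefault(value, set()).add(label)
    distinct = sorted(classes_at)  # distinct feature values, ascending
    cuts = []
    for low, high in zip(distinct, distinct[1:]):
        # No cut is needed only when both values belong to the same single class.
        if len(classes_at[low] | classes_at[high]) > 1:
            cuts.append((low + high) / 2)
    return cuts


def binarize(value, cuts):
    """One binary attribute per cut-point (assumed: 1 if value >= cut-point)."""
    return [1 if value >= t else 0 for t in cuts]
```

With values `[1.0, 2.0, 3.0, 4.0]` and class labels `[0, 0, 1, 1]`, a single cut-point is placed at 2.5, so the feature contributes one binary attribute.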

Pattern generation
This step aims to discover the hidden patterns that discriminate between the classes. In our study, the classes are the positive class (defect) and the negative class (healthy). In this paper, pattern generation is based on the formulation and solution of a mixed-integer linear programming (MILP) problem (Linderoth & Lodi, 2010). The first advantage of this approach over the pattern generation methodologies proposed in (Ryoo & Jang, 2009; Moreira, 2000) is the ability to control the discriminating power between the different classes through a user-defined discriminating parameter, which specifies the minimum number of patterns that must separate each observation in class $c_i$ from those belonging to class $c_j$ (Mortada et al., 2014). The second advantage is that there is no limit on the number of patterns generated by the pattern generation algorithm, which creates more possibilities for finding patterns.
This step generates the class patterns that make up the decision model in the last step of the proposed approach. It starts by creating an empty set of patterns. Each generated pattern P is tested for all class pairs (a, b) to decide whether P can be placed in the corresponding pattern set $P_{ab}$. Admission of a pattern to a set $P_{ab}$ is controlled by two user-defined values: the minimum positive coverage rate $d^{+}_{min}$ and the maximum negative coverage rate $d^{-}_{max}$. The generated pattern is added to each set of patterns for which it satisfies the coverage rate thresholds given in Equations 4 and 5.
The MILP problem that generates the optimized pattern has three major sets of constraints. The first constraint is that the resulting pattern must cover a positive observation $i \in S^{+}$, although it is not required to cover all the observations in the class (Equation 8). The second constraint is that a positive pattern must not cover any negative observation (Equation 9). The last constraint is that the MILP problem must not generate a pattern that has already been generated in previous runs (Equation 10).
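The admission test based on the two coverage rates can be sketched as follows. This is not the MILP formulation itself; it only checks whether an already generated pattern meets the thresholds. The representation of a pattern as a dictionary mapping a binary attribute index to its required value, the function names, and the reading of Equations 4 and 5 as "positive coverage rate at least $d^{+}_{min}$" and "negative coverage rate at most $d^{-}_{max}$" are assumptions made for the example.

```python
def covers(pattern, observation):
    """True if the observation satisfies every literal of the pattern."""
    return all(observation[i] == v for i, v in pattern.items())


def coverage_rate(pattern, observations):
    """Fraction of the given observations covered by the pattern."""
    if not observations:
        return 0.0
    return sum(covers(pattern, o) for o in observations) / len(observations)


def is_admissible(pattern, positives, negatives, d_min_pos, d_max_neg):
    """Assumed thresholds: enough positive coverage, limited negative coverage."""
    return (coverage_rate(pattern, positives) >= d_min_pos
            and coverage_rate(pattern, negatives) <= d_max_neg)
```

Setting `d_max_neg` to 0 reproduces the stricter requirement that a positive pattern must not cover any negative observation.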

Theory formation
Theory formation is the final step of the LAD decision-making model, in which the patterns generated in the previous step are used to form a decision function called the discriminant, such as the one given in Equation 13. This equation is used to compute a score ranging between -1 and 1. A positive output indicates that the signal comes from a defective element. A negative output means that the element is in a healthy state. A zero output means that the signal does not carry enough information to decide which class it belongs to (Mortada et al., 2014). In Equation 13, N is the number of generated patterns, $P_i(O)$ is equal to 1 if pattern i covers observation O and to zero otherwise, and $\alpha_i$ is the weight of pattern i.
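A discriminant of this kind can be sketched as follows. The sketch assumes the form commonly used in the LAD literature, in which the weighted sum of the covering negative patterns is subtracted from the weighted sum of the covering positive patterns, with the weights of each group summing to 1 so that the score stays between -1 and 1. Whether this matches Equation 13 term by term is an assumption, and all names are hypothetical.

```python
def covers(pattern, observation):
    """True if the binarized observation satisfies every literal of the pattern."""
    return all(observation[i] == v for i, v in pattern.items())


def discriminant(observation, positive_patterns, negative_patterns):
    """Score in [-1, 1]; each argument is a list of (pattern, weight) pairs.

    Assumption: the weights within each list are non-negative and sum to 1.
    """
    pos = sum(w for p, w in positive_patterns if covers(p, observation))
    neg = sum(w for p, w in negative_patterns if covers(p, observation))
    return pos - neg


def classify(score):
    """Map the score to the three outcomes described in the text."""
    if score > 0:
        return "defective"
    if score < 0:
        return "healthy"
    return "undecided"
```

An observation covered by no pattern at all, or covered equally by both groups, receives a score of zero and is reported as undecided.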

Architecture of unlabelled remote diagnosis system
For industrial processes and machine maintenance, CT and data analysis are important technologies and are gradually becoming the trend in maintenance innovation. The major advantages of CT are cost-effectiveness, scalable storage space, data recovery, accessibility, scalability, and security. To achieve near-zero breakdown, machines need to be continuously monitored during their life cycle, which requires scalable storage and analysis. A detailed maintenance architecture for the cyber-physical system is introduced here. It consists of three layers, namely the sensor layer, the connection layer, and the analysis and decision-making layer. 1) Sensor Layer: this layer consists of sensors providing multisource unlabelled data from the machines. The sensors have capabilities, such as wireless communication, that enable data transfer through the internet to the analysis and decision-making layer. Sensors already support dynamic, real-time, optimized and self-organized industrial processes. Intelligent sensors report the real status of the machine and the machining process, turn it into digital data, and share it automatically with the process controller; 2) Connection Layer: this layer consists of distributed file storage and parallel computing. By using CT, the connection between the physical side and the cyber side is established; 3) Analysis and Decision-Making Layer: this layer includes the analyzer software that transforms the unlabelled data into useful information and decisions, which are sent back to the machine and the local maintenance team. This software is built by using historical data analysis, which generates the governing patterns.
The objective of this paper is to build Unlabelled Cyber-Physical Maintenance System Software (Maintenance-CPS) by using an interpretable methodology that, automatically and without any human intervention, determines the status of the machine components (healthy or unhealthy), identifies the corrective action needed, and alerts the local maintenance team to intervene. Moreover, the proposed platform is built to be the connection point between all maintenance management elements, namely the monitored machines, the local maintenance teams, and the spare parts supplier, in order to overcome the problem of working at a distance; CT is used to connect its layers. To this end, it is very important to identify Cloud services. Cloud services are internet-based services which can offer everything that a computer system provides. Due to the continuous increase in data collection, there is an overwhelming flow of data (Villars et al., 2011). Cloud computing has many advantages, including virtualized resources, parallel processing, security, and data service integration with scalable data storage. It not only minimizes cost and restrictions but also provides reduced infrastructure maintenance costs, efficient management, and user access control (Hashem et al., 2015). Additionally, Cloud services provided by vendors such as Dropbox, IBM, Microsoft Azure, Google Cloud and Amazon AWS support 24/7 operations (Avram, 2014). CT solves problems faced when using traditional computing. The required computing capacity can simply be rented from one of the service providers mentioned above. A virtualized machine can provide good performance at low cost, without the need for high capital expenditure on servers, power, cooling, and software and hardware upgrades. In the present study, the researchers use the Dropbox Cloud storage service as distributed file storage. The storage capacity of the prototype used for this research is 2 gigabytes.
The benefits of the proposed platform are adjustable storage capacity, easy and secure access, and fast and efficient synchronization technology.
The proposed system starts with an offline analysis of unlabelled data to generate the governing patterns. The generated patterns are then transmitted to the online decision-making stage. The architecture of the unlabelled cyber-physical maintenance system is divided into three layers. The first layer is the sensor layer, which uses intelligent sensors connected to the internet to send the collected data to the analysis and decision-making layer through the connection layer. The second layer is the connection layer, which uses CT to save the data and transfer it between the different system layers via the internet. The third layer is the analysis and decision-making layer, which is divided into two steps. The first step transforms the unlabelled data into labelled data by using the K-means algorithm, which enables LAD to discover the hidden patterns in the vibration signal. The historical data helps the offline LAD generate the constraints that are used to build the cyber-physical system for continuous monitoring, diagnosis, and control. The second step is building the cyber-physical center, which generates the decision. The decision is sent back through the Cloud to the machine control unit and is also transmitted both to the maintenance team, which takes the necessary action, and to the spare parts supplier, which prepares the spare parts needed. This helps the maintenance team finish the maintenance task in the shortest time possible, reducing the maintenance cost and increasing the production rate. Furthermore, the cyber-physical system sends a corrective action back to the machine to overcome the fault, or shuts down the machine to prevent catastrophic failure. Figure 2 illustrates the proposed architecture of the unlabelled cyber-physical system.
The starting point of the proposed system is the sensors, which take a signal from the monitored machine at regular intervals and send it to the Cloud. The sensors' detection or measurement parameters can be adjusted according to the application, which enables them to be dynamically adapted to tasks as necessary. The collected unlabelled data goes to the online remote diagnosis center, which converts it into useful information and determines the status of the machine. A decision is then taken: if the machine is in a healthy state, the system records its status in a report; if the machine is in an unhealthy state, the system sends an alarm and a corrective action to the local team and the spare parts supplier. The corrective action to replace the defective part is sent to the local team, and the specification of the needed part is sent to the spare parts supplier so that it can prepare the required element. The historical data stored in the Cloud is used to improve the accuracy of the system. Figure 3 illustrates the decision-making procedure of the unlabelled cyber-physical maintenance system (Maintenance-CPS).
In Figure 3, a solid line indicates a wired connection, while a dashed line indicates a connection via the internet. In addition, corrective actions are sent to the machine actuator to control the operating factors.
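The decision procedure described above can be sketched as a small routing function. This is a simplified illustration under stated assumptions: the `DecisionLog` container, the function name `handle_snapshot`, the message texts, and the treatment of a zero score as an "undecided" status that is only recorded are all hypothetical and are not taken from the MATLAB implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionLog:
    """In-memory stand-in for the report, alarm and spare-part channels."""
    reports: list = field(default_factory=list)
    alarms: list = field(default_factory=list)
    part_orders: list = field(default_factory=list)


def handle_snapshot(machine_id, score, log, part_spec="rolling element bearing"):
    """Route one diagnosis score (positive means a defect was detected)."""
    if score > 0:
        # Unhealthy: alert the local team and notify the spare parts supplier.
        log.alarms.append((machine_id, "replace defective part"))
        log.part_orders.append((machine_id, part_spec))
        return "unhealthy"
    # Healthy or undecided: only record the status in the report.
    log.reports.append((machine_id, score))
    return "healthy" if score < 0 else "undecided"
```

In a deployed system, the three lists would be replaced by the e-mail alarm, the supplier notification, and the Cloud-stored report.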

Case study
To test the proposed system, a simulation model of the unlabelled cyber-physical maintenance system for rolling element bearing failure is developed. The test data, shown in Figure 4, is extracted from run-to-failure data sets of signals obtained from the National Science Foundation's Industry/University Cooperative Research Centre for Intelligent Maintenance Systems (IMS) through the NASA prognostic data repository. The test rig consists of a constant-speed motor (2000 RPM) coupled to a shaft supported by four rolling element bearings. All four bearings are Rexnord ZA-2115 double-row bearings with 16 rollers in each row, a pitch diameter of 71.5 mm, a roller diameter of 8.407 mm and a tapered contact angle of 15.17° (Lee et al., 2015). High-sensitivity quartz ICP accelerometers (PCB 353B33) were installed on the bearing housing. A data acquisition card was used to collect a snapshot of the vibration signal, sampled at 20 kHz, every ten minutes until a failure occurred. The test was carried out for seven days. Table 2 shows a sample of the features extracted using MATLAB (Qiu et al., 2006), where STDE is the standard deviation; SK is the skewness; KU is the kurtosis; RMS is the root mean square; and CF is the crest factor. The proposed clustering technique was used to find the nodal point that identifies the status of the bearing. The K-means clustering step was implemented by using the Weka data mining software (Hall et al., 2009). Figure 5 and Table 3 illustrate the clustering results obtained with the K-means algorithm.
Figure 5. K-means node presentation using signal statistical analysis.
As exhibited in Table 3, the new data obtained after clustering was used to detect the characteristic patterns of the machine operation status that lead to a healthy or unhealthy condition. LAD generates positive patterns, which identify the unhealthy condition, and negative patterns, which identify the healthy condition.
The cbmLAD software (Yacout et al., 2012) is trained by using the data obtained from the previous step. Table 4 shows a sample of the positive and negative characterization patterns found by the cbmLAD software. The generated patterns illustrate the importance of the proposed methodology, because they are clear patterns that can be used to build the cyber-physical maintenance system code. The interpretable methodology reveals the importance of the Kurtosis and Range features in the diagnosis of the rolling element bearing. At this point, the governing parameters needed to build the online remote diagnosis center are available. CT is an appropriate means of communication due to its virtualized resources, parallel processing, security, and data service integration with scalable data storage, which together achieve communication between the cyber world and the physical world. The overall architecture of the proposed system is illustrated in Figure 6. MATLAB was used to build the cyber-physical system code.
The vibration signal is sent from the data set file to the Cloud file via the internet, using the Dropbox application. The system consists of two centers. The first is the physical center, which includes the monitored machines with their sensors and actuators, the communication gateway, and the local maintenance team.
The second is the cyber center, which receives the data coming from the physical system and converts it into useful information to monitor and control the physical world without any human intervention. The system should also allow the local maintenance team to see a live view of the running machine and to intervene if required. In this case study, the run-to-failure data set illustrated in Figure 4 is used.
In our study, part (1) is the aerospace bearing vibration signal data set, which is sent to the Cloud file in the Dropbox application, used as infrastructure as a service (IaaS), via part (2). Part (2) is the gateway used to send and receive the signals and the corrective actions generated by the cyber-physical center via the internet. The function of the gateway device is to bridge the communication gap between the layers of the cyber-physical maintenance system, the sensors, and the Cloud. It supports communication protocols such as Wi-Fi or Bluetooth. For this case study, we used a Wi-Fi gateway. The benefits of using this gateway are high scalability, lower costs, reduced telecommunication costs and mitigated risks. The governing code runs at regular intervals to analyze the received data and generate a suitable decision. For this case study, the software runs every 10 minutes, and the period can be adjusted according to the application. When a positive pattern is identified, the system automatically sends an alarm report by secured e-mail to the local maintenance team, including the decision and the corrective action. The generated decision is sent back to the machine controller via the gateway. The historical data is then stored in the Dropbox file to improve the accuracy of the offline generation of the governing patterns. Also, the specification of the spare parts needed for the maintenance process is sent to the spare parts supplier, who procures them for the maintenance team.
The generated report, which is sent to the support team, contains easily understandable patterns as a result of the interpretable methodology. The limitations of the proposed system, on the other hand, are its dependence on the internet service and the fact that real-time online monitoring involves the transfer and exchange of large amounts of data, such as graphs and database information, which can cause time delays and data loss. Furthermore, data security and anti-hacking technologies are important in the communication field to avoid any unwanted interference. Another important limitation is that the software cannot yet be updated automatically from the stored historical data; it still requires an offline update.
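The periodic analysis run described above can be sketched as a function that processes every snapshot file not yet analyzed in a locally synchronized Cloud folder. The folder layout, the whitespace-separated text format of the snapshot files, and the names `pending_snapshots`, `run_once` and `analyse` are hypothetical; the sketch does not call any Dropbox API and only assumes that the storage service keeps a local folder in sync.

```python
from pathlib import Path


def pending_snapshots(folder, processed):
    """Snapshot files in the synced folder that were not analyzed yet, by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.name not in processed
    )


def run_once(folder, processed, analyse):
    """Analyze every new snapshot once and remember which files were handled.

    `analyse` is any callable that maps a list of samples to a decision or score.
    """
    results = {}
    for path in pending_snapshots(folder, processed):
        samples = [float(x) for x in path.read_text().split()]
        results[path.name] = analyse(samples)
        processed.add(path.name)
    return results
```

A scheduler would call `run_once` at the chosen period (every 10 minutes in this case study) and pass each result on to the alarm and reporting steps.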

Discussion and conclusion
In this paper, a Maintenance-CPS platform built for the diagnosis of machine performance and the exploration of unlabelled data is presented. The advantages of the proposed system are that it avoids expensive onsite visits by expert engineers, reduces training costs, and provides continuous monitoring to achieve near-zero breakdown. The system is based on Cloud technologies, which connect the layers of the proposed cyber-physical system. These layers are the sensor layer, the connection layer, and the analysis and decision-making layer. The analysis and decision-making layer is built on an interpretable methodology that deals with unlabelled data. The proposed methodology consists of three stages: signal feature extraction, clustering using the K-means algorithm, and a pattern recognition machine learning technique (LAD). The methodology enables the system to deal with unlabelled data, which includes considerable information about the monitored machines. The system is tested by using a run-to-failure data set of an aerospace bearing from the NASA prognostic data repository. Samples of the patterns generated by the cbmLAD software are shown in Table 4. In summary, the implemented system helps to carry out maintenance at a distance, connects the maintenance chain to achieve near-zero breakdowns, and enhances the design of machine elements through the use of Cloud technologies.
In future work, the performance of the cyber-physical maintenance system will be enhanced by improving the proposed technique. A multi-component cyber-physical maintenance system using the same diagnosis center will be the next step. Furthermore, enhancing the communication between the system components to increase the reliability of the maintenance task will be considered. This step will be applied in a real-time monitoring and diagnosis cyber-physical system.