A PRISMA-driven systematic review of data mining methods used for defects detection and classification in the manufacturing industry

Paper aims: This research aims to analyze the primary studies published in recent years that focus on defect detection or classification in manufacturing and to extract information about frequently used data mining (DM) methods, their accuracy, strengths, and limitations. Originality: Industrial production is undergoing a dynamic transformation in the context of Industry 4.0, where the implementation of data mining is a frequently discussed topic, yet such an overall summary is missing. Research method: In this study, the PRISMA-driven systematic literature review is combined with the approach defined by Kitchenham (2004). Main findings: The most frequently used data mining methods for defect detection are the Bayesian network (BN) and the Support vector machine (SVM). Besides these methods, Decision trees (DT) and Clustering are often used for defect classification. Neural networks (NN) are commonly used for both defect detection and classification. DT, together with the Genetic algorithm (GA) and SVM, achieved the highest average accuracy. Recently, authors have often combined different DM methods, and methods for data dimensionality reduction are also often used. Implications for theory and practice: This study contributes to the quality management literature by providing an extended summary of recently used DM methods for defect detection and classification. This summary can help researchers choose a suitable method and build models for achieving their research purpose.


Introduction
Due to rapid technological advancement and globalization, the market and the industrial environment are becoming more complex, challenging, and turbulent. To stay competitive, firms need to compete for customers through research and invention, mainly in the technological field. This has given rise to Industry 4.0, a phenomenon frequently discussed in recent years, which integrates the cyber-world with physical systems by using semantic machine-to-machine communication, the Internet of Things (IoT), cyber-physical systems (CPS), and embedded systems. Some authors, on the other hand, argue that much of the shift towards the Industry 4.0 manufacturing paradigm has been driven by the emergence of "Big Data" and by the issues associated with its collection, management, and interpretation, which still partially persist (Chen et al., 2015). To deal with the complexities of the modern production system within a cyber-physical environment, traditional manufacturing companies are becoming so-called "smart factories".

Background
"Data Mining," often also referred to as "Knowledge Discovery in Databases" (KDD), is a relatively young sub-discipline of computer science aiming at the automatic interpretation of large datasets. The classic definition of knowledge discovery by Fayyad et al. from 1996 describes KDD as "[...] the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad et al. 1996, p. 41). Machine learning algorithms construct possible associations among various parameters (called features in the data mining literature and attributes in the computer database literature) that are important in modeling processes (Kusiak, 2001). The field of DM makes use of automated tools and techniques, employing sophisticated algorithms to discover, in an effective and valuable way, hidden patterns, associations, anomalies, or structures in large amounts of data stored in a data warehouse or other information repositories. In the context of manufacturing, two high-level primary goals of data mining can be identified: prediction and description. Descriptive data mining focuses on discovering interesting patterns that characterize the current data. Predictive data mining is meant to assess the behavior of a model and determine future values of key variables based on existing information provided by available databases (Choudhary et al., 2009). One of the most challenging tasks in the whole KDD process is choosing the right data mining technique.
Several approaches to the categorization of data mining algorithms are available. The approach used in this study closely follows Patel & Patel (2014), although the methodology was slightly modified for our purpose. The scheme in Figure 1 describes our approach. Data mining techniques can be roughly categorized into four major groups, namely Classification, Clustering, Association, and Regression. Classification is the process of assigning given objects to groups ("classes") based on their characteristics, where the semantics of these classes is known beforehand. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes. Classification methods predict the target class by analyzing the training dataset. The testing phase is then performed to assess and maximize predictive accuracy. In our work, this set of methods contains Neural networks (NN), Support vector machines (SVM), Decision trees (DT), Bayes networks (BN), Genetic algorithms (GA), and Rough set theory (RST). Clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. A brief characterization of the most popular DM tools, which are also suitable for manufacturing process analyses, can be found in Perzyk et al. (2007) or Perzyk (2006).
With respect to RQ1, we will search the relevant quality management literature employing data mining methods.
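The training and testing phases of classification described above can be illustrated with a minimal Python sketch. It uses a nearest-centroid classifier, which is not one of the methods covered by this review; it was chosen only because it fits in a few lines. The function names, feature values, and labels are hypothetical and serve purely as an illustration.

```python
from math import dist


def train(samples):
    """Training phase: compute one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, samples):
    """Testing phase: share of held-out samples that are classified correctly."""
    hits = sum(predict(centroids, f) == y for f, y in samples)
    return hits / len(samples)


# Hypothetical two-feature measurements labelled "ok" or "defect".
training_set = [
    ((1.0, 1.2), "ok"),
    ((0.8, 1.0), "ok"),
    ((3.0, 3.2), "defect"),
    ((3.4, 2.8), "defect"),
]
test_set = [((1.1, 0.9), "ok"), ((2.9, 3.1), "defect")]

model = train(training_set)
print(accuracy(model, test_set))
```

The class labels are known before training, which is what distinguishes this setting from clustering, where the groups have to be discovered from the data.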

Research method
We have undertaken this survey in a systematic manner guided by the work of Kitchenham (2004), Kitchenham et al. (2010), and the PRISMA standards (Moher et al., 2009). A systematic literature review (SLR) was used to extract DM methods for production defect detection and classification and to address the related questions. Based on the SLR and our research problem, we defined the research workflow shown in Figure 2. Following these research steps allows us to obtain the desired results.

The first step is to define search keywords and validate the search string. We combined different keywords that are related to the research questions. The list of keywords used is presented in Table 2. The search string is defined as follows:
ALL (a AND b AND (c OR d) AND e) in (Title or Abstract or Keyword)
We used the ACM Digital Library and ProQuest Central as representative search engines to perform automatic searches and refine the search string until all the search items met the requirements and the number of remaining papers was minimal. Then, we used the defined search string to carry out the automatic searches. We chose Scopus, ISI Web of Science, and ProQuest Central (Table 3) because these three databases are the largest and most complete scientific databases and include high-quality papers on data mining. After the automatic search, a total of 227 067 papers were collected (Table 4).
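The Boolean structure of the search string, ALL (a AND b AND (c OR d) AND e), can be sketched in Python as follows. The sketch assumes that a to e each stand for a group of alternative keywords; the keywords below are placeholders rather than the entries of Table 2, and each database has its own query syntax, so the output is only a generic illustration.

```python
def any_of(terms):
    """Join alternative keywords with OR inside one parenthesised group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(a, b, c, d, e):
    """Compose ALL(a AND b AND (c OR d) AND e) from five keyword groups."""
    body = " AND ".join([
        any_of(a),
        any_of(b),
        "(" + any_of(c) + " OR " + any_of(d) + ")",
        any_of(e),
    ])
    return f"ALL({body})"


# Placeholder keyword groups (hypothetical; not the keywords of Table 2).
groups = {
    "a": ["data mining"],
    "b": ["defect"],
    "c": ["detection"],
    "d": ["classification"],
    "e": ["manufacturing"],
}

query = build_query(**groups)
print(query)
```

Restricting the match to title, abstract, or keywords is done with database-specific field options, which this sketch does not model.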

Stage 2: grey literature
To cover grey literature, additional records were identified through other sources, e.g., Google Scholar, JSTOR, and ACM. Articles were searched using the keywords mentioned in Table 3. Some papers had been found during previous research (Bartova & Bina, 2020). The identified articles were added to the pool of papers from the database search.
The gathered articles are then filtered by using the selection strategy described in the next section.

Selection strategy
This methodological review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement for searching, screening, appraising quality criteria, and data extraction and handling. This approach is used to describe the research designs, methods, and procedures employed in the defect detection and classification process in manufacturing. It is also used to highlight the strengths and weaknesses of methodological tools and to explore the evaluation criteria for the performance and efficiency of the approaches used. The review focuses on new methodological approaches, modifications of existing methods, and discussions of quantitative and data analytic approaches.
The original sample for this study included all journal articles published since 2010 that were found in the defined databases by searching with the defined keywords and key phrases. Using this search strategy, 227 067 articles were selected, since they represent most of the researched knowledge. The set was then adjusted by applying the inclusion criteria. Because of the selected search strategy, the set of articles tends to contain a large number of duplicates, which are removed in the next step.

Inclusion criteria
• Journal articles and conference papers only
• Papers from January 2010 to March 2021
• Papers investigating defect detection or classification in the manufacturing process using some of the data mining methods
• Papers reported in the English language from any country
A total of 28 255 articles met the inclusion criteria. After the removal of duplicates, 10 877 papers remained and became the subject of the screening process itself. In the next part of the screening process, a 4-round exclusion process is used to reduce the source set of papers. The reduction of the original set of papers is shown in the PRISMA flow diagram (see Figure 3). After the screening phase, the 172 remaining articles are subjected to further analysis within the Eligibility phase. The abstracts of all articles are analyzed, and 17 articles are found to be irrelevant.

Exclusion
In addition to the general inclusion/exclusion criteria, we consider it critical to assess the "quality" of the primary studies. Because the full text of every article is needed, a total of 79 articles whose full text could not be obtained are removed from the research. Based on the evaluated quality, we decided to remove from further research the articles achieving less than 60% quality. This additional criterion excluded a total of 15 studies. These papers mainly had insufficiently described methodological approaches or results, which is unacceptable for fulfilling our research objectives.
A total of 61 articles were left after the quality assessment step. In general, only good-quality articles have been chosen. Some of them, however, do not describe the dataset used or the accuracy measurements, and they lack a discussion of the limitations of the proposed method. Nevertheless, the total average score of 8.47 out of 12 indicates that the quality of the research is promising, supporting the validity of the extracted data and the conclusions drawn therefrom.
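The counts reported for the selection process can be checked for consistency with a short Python sketch. All numbers are taken from the text above; the variable names and the grouping of the steps are illustrative assumptions, not part of the original study.

```python
# Counts reported in the text for each step of the selection process.
identified = 227_067       # records returned by the automatic search
after_inclusion = 28_255   # records meeting the inclusion criteria
after_dedup = 10_877       # records left after removing duplicates
after_screening = 172      # records left after the 4-round exclusion

duplicates_removed = after_inclusion - after_dedup

# Removals reported for the eligibility and quality assessment steps.
removals = [
    ("irrelevant abstract", 17),
    ("full text unavailable", 79),
    ("quality score below 60%", 15),
]

remaining = after_screening
for reason, count in removals:
    remaining -= count
    print(f"{reason:<26} -{count:>3} -> {remaining}")

# The text reports 61 articles after quality assessment.
assert remaining == 61
```

The three removals after screening (17, 79, and 15 articles) account exactly for the reduction from 172 to the final 61 articles.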

Data extraction
The goal of this step is to design forms to identify and collect useful and relevant information from the selected primary studies so that they can answer our research questions proposed in Section 3.1. Table 6 shows the data extraction form, which will be applied to all selected primary studies for the purpose of detailed analysis.

Data synthesis
The search for our systematic review was performed in March 2021. Data synthesis is used to collect and summarize the data extracted from the primary studies. The main goal is to understand, analyze, and extract frequently used DM methods for defect detection and classification applications in manufacturing processes. We analyzed the extracted data to determine the trends and to collect information relevant to our research questions. In addition, we classified and analyzed the articles according to the research questions proposed in Section 3.1. The most important task was to classify the articles according to the DM methods through the relevant analysis.

Quality assessment
In the quality assessment phase, the articles are evaluated from several points of view. The crucial aspects are the clarity of the research aims, the methodology, and the description and discussion of results. The criteria for the quality of a study were determined according to the guidelines proposed by Kitchenham & Charters (2007). The corresponding quality checklist is shown in Table 5. The table includes 12 questions that consider four research quality dimensions: research design, conduct, analysis, and conclusions. For each quality item, we assigned a value of 1 if the authors gave an explicit description, 0.5 if the description was vague, and 0 if there was no description at all. The answers to all questions were scored for each article, and the score was converted into a percentage.
The number of papers from our set of articles distributed over the years is shown in Figure 5. We can see that from 2017 the number of published articles focused on the application of DM methods in defect detection and classification has increased significantly. So far, the highest number of studies was published in 2020, but the number for 2021 covers only the first quarter. From this, we can assume that our research topic is currently often discussed among researchers from around the world. Figure 6 then shows the usability of the particular DM methods distributed over the years.
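The checklist scoring used in the quality assessment (1 for an explicit description, 0.5 for a vague one, 0 for none, over 12 questions, converted to a percentage and compared with the 60% threshold) can be sketched as follows. The answer labels, function names, and example answer sets are hypothetical; only the scoring values, the number of questions, and the threshold come from the text.

```python
SCORES = {"explicit": 1.0, "vague": 0.5, "none": 0.0}
N_QUESTIONS = 12
THRESHOLD = 60.0  # percent; articles scoring below this are excluded


def quality_percentage(answers):
    """Convert the 12 checklist answers of one article into a percentage."""
    if len(answers) != N_QUESTIONS:
        raise ValueError(f"expected {N_QUESTIONS} answers, got {len(answers)}")
    total = sum(SCORES[a] for a in answers)
    return 100.0 * total / N_QUESTIONS


def passes(answers):
    """An article is kept if it reaches at least the 60% threshold."""
    return quality_percentage(answers) >= THRESHOLD


# Hypothetical answer sets for two articles (not data from the review).
kept = ["explicit"] * 6 + ["vague"] * 3 + ["none"] * 3      # 7.5 points
dropped = ["explicit"] * 5 + ["vague"] * 4 + ["none"] * 3   # 7.0 points

print(quality_percentage(kept), passes(kept))
print(round(quality_percentage(dropped), 2), passes(dropped))
```

On this scale, the reported average score of 8.47 out of 12 corresponds to roughly 70.6%.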

Results
In this section, we analyze the primary studies in depth to answer the five research questions presented in Section 3. While Section 4.1 provides an overview of the main concepts discussed in this section, Sections 4.2-4.6 report the answers to the research questions.

Overview of the DM methods
First, let us consider how frequently the individual data mining methods are used for defect detection and classification in the manufacturing industry. To this end, the methods were extracted from the final base of primary studies. As the pie chart in Figure 4 shows, classification is by far the most frequently used technique (88.5% of all included articles). That is why we compared the classification algorithms separately. The Neural network (70.37%) and the Support vector machine (16.67%) were identified as the most frequently used. Decision trees rank among the frequently used methods as well, with a share of more than 5.56%. The particular DM method is often chosen with respect to the nature of the data and the type of information intended to be mined. As Han & Kamber (2001) mentioned, the kind of knowledge to be mined determines the data mining functions to be performed. This statement is also supported by Gibert et al. (2010), who believe that the choice of the DM method depends on both the goals of the problem to be solved and the structure of the available data set.
Looking at the usability of the DM methods distributed over the years (Figure 6), we can see that the Neural network method is the most frequently used, especially in recent years (a total of 17 studies in 2020). The use of the SVM method has also grown since 2017 (with 2 applications in 2017 and 3 in 2019). From the chart, we can also assume that the usability of the BN method will increase in the near future. On the other hand, Regression and Clustering methods are probably past their peak and are no longer promising for our purposes. Now, let us look more closely at the considered DM methods in connection with the analyzed papers.

• Association rules
Generally, association rules are not frequently used for defect detection and classification in the manufacturing industry. They are usually employed to search for relations among individual items (for example, in a shopping basket). Despite that, our article base contains one paper on a hybrid OLAP-association rule mining based quality management system for extracting defect patterns in the garment industry (Lee et al., 2013).

• Regression
Similarly, the use of regression analysis for defect detection and classification is quite rare. The main application area for regression analysis is the field of defect prediction. Our research detected only one study, on defect detection in gearboxes, where Spectral regression (SR) is combined with the SVM method (Jiang et al., 2014).

• Genetic algorithm
Genetic algorithms were also not often used; our final set of articles contains only 2 cases. Apostolos Chondronasios et al. (2016) used this method for feature selection within the surface defect classification task. Later, Ji-Deok et al. (2018) proposed a combination of genetic algorithms and support vector machines for defect classification in the PCB manufacturing process, which achieved 96% accuracy.

• Neural Networks
In our selected set of papers, the most frequently used and probably the most successful data mining method for defective product detection or classification is the Neural network. A recent example is the article "Unsupervised Pre-Training of Imbalanced Data for Identification of Wafer Map Defect Patterns" by Shon et al. (2021), in which the proposed model, based on a convolutional variational autoencoder (CVAE), achieved an F-measure of 95.1%. In our analysis, we found that more than 42% of the primary studies implementing NN focus on CNN, for example, Lin et al. (2019) (2021) or Zhao et al. (2021). Moreover, there are articles focusing on the implementation of the Contextual Hopfield Neural Network (Chang et al., 2011), Sparse Convolutional Neural Networks (Bella et al., 2019), FCN, ResNet50 (Konovalenko et al., 2020), VGG16 (Ihar et al., 2019), and YOLOv3 (Yu et al., 2019). The rest of the studies focused on NN are listed in Table 7.

• Decision trees
In our set of papers, decision trees are successfully used mainly for defect classification in manufacturing processes. For example, Radhika et al. (2013) proposed a model for defect pattern recognition based on random forest. We can also mention the study by Mao et al. (2018), "An Improved Skewness Decision Tree SVM Algorithm for the Classification of Steel Cord Conveyor Belt Defects," in which the authors combine a decision tree with SVM and reach a total accuracy of 97%.

• Clustering
One of the most often used data mining methods for defect detection and classification in the manufacturing process is clustering. For example, Chia-Yu Hsu (2015) proposed a clustering algorithm for defect classification in semiconductor manufacturing. Taha et al. (2018) also focused on the semiconductor industry, where they successfully applied a clustering algorithm for defect pattern recognition. It is also worth mentioning the article "Bayesian spatial defect pattern recognition in semiconductor fabrication using support vector clustering," although Yuan et al. published it in 2010. This study combines different algorithms with a total accuracy of up to 96.55%, and for this reason, it is interesting and beneficial for our research.

• Support vector machine
After NN, the Support vector machine algorithm is the second most frequently used method in our primary set of articles. Prachya Bumrungkun (2019) used the SVM method for defect detection in the textile manufacturing industry. In the same year, SVM was also used for surface defect detection in Polycrystalline Diamond Compact manufacturing. Takada et al. (2017) successfully combined SVM with CNN for defect detection and classification on electronic circuit boards with an accuracy of 82.64%. Liu et al. (2019) achieved an accuracy of 99.95% with their SVM model for a real-time defect detection task. Baly & Hajj (2012) and Jeong (2017) also achieved good results with SVM in semiconductor manufacturing. Lastly, we would like to mention the study by Hoang & Nguyen (2020), in which the authors proposed a sequence of SVM and state-of-the-art history-based adaptive differential evolution with linear population size reduction for concrete quality management, which achieved an accuracy of 93%.

In Table 7, we summarize the individual DM methods that our analysis found to be the most frequently used for defect detection and classification in the manufacturing industry, together with their average accuracy. The table mainly conveys the strengths and limitations of these methods, their frequent fields of application, and whether a particular method is used in combination with other methods. This table will help us in choosing methods for our future research.

Discussion of research questions
• RQ1: Which are the most frequently used DM methods for defect detection and classification in the manufacturing process?
Overall, the most commonly used method, and also the most popular one in recent years, is the neural network. Several different architectures are used, such as CNN, R-CNN, FCN, and ResNet50. NN and SVM are mainly used for defect detection. For defect classification problems, Clustering and Decision trees are the most frequently used. It should be noted that NN, SVM, and Clustering are often used for either task, defect detection or classification.
• RQ2: Which DM methods are the most efficient for defect classification/detection according to the chosen criteria?
The efficiency of a DM method can be measured, for example, by the F-measure, AUC, or simple accuracy. We compared the particular DM methods by the average accuracy achieved by the proposed methods. As Figure 7 shows, the highest average accuracy was achieved by Decision trees, Genetic algorithms, and SVM. During the analysis of the individual primary studies, we observed that NN as well as SVM achieved even 100% accuracy in some cases, but because of the high number of studies implementing NN, the averaged result can be distorted. For this reason, we decided to add the median, which is less sensitive to extreme values. From the median point of view, the accuracy of the NN, SVM, and Clustering methods is significantly higher.
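The difference between the average and the median mentioned above, as well as the F-measure, can be illustrated with a small Python sketch. The accuracy values and confusion counts are hypothetical and are not data from the reviewed studies; the F-measure is computed here in its common F1 form, which is an assumption about the variant the primary studies report.

```python
from statistics import mean, median


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-study accuracies (%) for one method, with one low outlier.
accuracies = [99.0, 98.0, 97.0, 96.0, 60.0]

avg = mean(accuracies)    # pulled down by the single outlier
mid = median(accuracies)  # unaffected by how extreme the outlier is

print(f"mean = {avg:.1f}, median = {mid:.1f}")
print(f"F1 = {f_measure(tp=90, fp=10, fn=30):.3f}")
```

A single weak result lowers the mean by several points while leaving the median unchanged, which is why the two statistics can rank methods differently.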
• RQ3: Do authors combine different DM methods for quality management tasks in manufacturing?
Recently, it has become more common for authors to combine different data mining methods and use them in parallel or sequentially. In our primary set of articles, we found that approximately 30% of the gathered articles use more than one DM method. We noticed that SVM is often combined with other methods, such as Genetic algorithms, Decision trees, or Neural networks. On the other hand, the NN method is generally not used in combination with other methods, although various NN types are sometimes combined.
• RQ4: Do authors use some dimensionality reduction methods for efficient work with high volume data?
Generally yes. Several studies in our research use dimensionality reduction methods, such as Principal Component Analysis, Random forest, Factor analysis, or the ReliefF algorithm. A dimensionality reduction method was detected in more than 15% of the studies, but the actual share may be higher, because its use is not always mentioned or clearly described in the study methodology.
• RQ5: What are the main strengths and limitations of individual DM methods?
The complete list of strengths and limitations of the particular methods is contained in Table 7, but we discuss a few of them here. Surprisingly, NN, which is the most frequently used method, has quite a long list of weaknesses, such as low interpretability, time inefficiency, and over-fitting. In contrast, SVM appears to be more effective. Its strengths include high robustness, generalization ability, and high accuracy.

Conclusion
This paper introduced a PRISMA-based systematic review focused on defect detection and classification in the manufacturing process using data mining methods. A total of 28 255 high-quality articles were retrieved from the SCOPUS, WoS, and ProQuest Central databases, from which a final set of only 61 empirical studies was extracted by sequential selection. We found that the most frequently used DM methods for the topic under consideration are NN, SVM, Clustering, and DT. Most importantly, we found that DT, GA, and SVM achieve the highest average accuracy. The advantages and disadvantages of particular data mining methods were identified as well, based on the analyzed articles. Additionally, our research results suggest that researchers use different DM methods in combination and apply data dimensionality reduction methods such as PCA or Random forest. We also found that the SVM and BN approaches have significant advantages and seem suitable for defect detection tasks. On the other hand, the application of Neural networks brings a number of disadvantages that make them more complicated to use; despite this, they are often used for both defect detection and classification. Clustering is also used quite often, mainly for defect classification, and achieves good results. The results presented in this paper come with certain limitations, especially with respect to the use of three selected source databases to obtain the original set of papers for analysis. We are also aware of the limits of assessing accuracy by average and median values.
This research helps us to design our own model for defect detection and classification, which will be our main interest in future work.