HYBRID FEATURE SELECTION AND SUPPORT VECTOR MACHINE FRAMEWORK FOR PREDICTING MAINTENANCE FAILURES

The main aim of predictive maintenance is to minimize downtime, failure risks and maintenance costs in manufacturing systems. Over the past few years, machine learning methods have gained ground with diverse and successful applications in predictive maintenance. This study shows that applying preprocessing techniques such as over-sampling and feature selection before failure prediction is promising. To handle imbalanced data, the SMOTE-Tomek method is used. For feature selection, three different methods are applied: Recursive Feature Elimination, Random Forest and Variance Threshold. The data used for the experiments are taken from the literature and consist of aircraft engine sensor measurements used to predict engine failures; the prediction algorithm is a Support Vector Machine. The results show that classification accuracy can be significantly improved by using these preprocessing techniques.


INTRODUCTION
Maintenance costs are a major part of the total operating costs of all manufacturing or production plants (Mobley, 2002). U.S. industries spend more than $200 billion each year on the maintenance of plant equipment, which impacts their productivity and profit (Mobley, 2002). Predictive maintenance (PdM) is a leading strategy that aims to improve productivity, quality and overall equipment performance. It allows maintenance to be scheduled at the most convenient and cost-efficient moment before a failure occurs.
Predictive maintenance technologies measure and gather real-time operations and equipment data via sensor networks. They include non-destructive testing methods such as acoustic, infrared, oil analysis, sound level measurements, vibration analysis, and thermal imaging.
Predictive maintenance uses data science and predictive analytics to estimate when a piece of equipment might fail. Many machine learning (ML) techniques are designed to analyze a large amount of data and can achieve outstanding performance (Wuest, 2016). Machine learning is a powerful tool for predictive analyses in different applications whose performance depends on the appropriate choice of ML techniques (Carvalho et al., 2019).
In recent years, many machine learning techniques have been introduced to deal with failure prediction. Nacchia et al. (Nacchia et al., 2021) presented analyses of the maturity level and the contribution of ML methods for predictive maintenance in smart manufacturing. Yeh et al. (Yeh et al., 2019) proposed a method based on machine learning to predict the long cycle maintenance time of wind turbines in a power company. A hybrid network was used and reached good prediction results.
Furthermore, the main aim of the work of Traini et al. (Traini et al., 2019) is to give a general framework applicable to predictive maintenance of generic manufacturing tools in order to improve man-machine collaboration in production. The framework is validated on a real milling data set. The aim of this paper is to introduce a framework that combines several preprocessing techniques with the Support Vector Machine (SVM) algorithm.
Data preprocessing is a crucial step in machine learning that enhances the quality of the data and promotes the extraction of meaningful feature subsets. It refers to preparing (cleaning and organizing) the raw data to make it suitable for building and training machine learning models. Eliminating redundant or irrelevant features can significantly affect the performance of the ML methods used. Preprocessing also involves handling the class imbalance problem by oversampling or undersampling the data; real-world applications such as failure prediction frequently encounter this problem.
In their work, Bekar et al. (Bekar et al., 2020) present an intelligent approach using unsupervised machine learning techniques for data preprocessing and analysis in the predictive maintenance area. They demonstrate that, through this approach, it is possible to obtain useful information about component/machine behavior, which serves as a foundation for decision support and the development of prognostic models. Fernandes et al. (Fernandes et al., 2019) performed data analysis and feature selection to build predictive maintenance models within a metallurgical company. The results demonstrated that insights derived from the data aid in developing adaptive learning models capable of handling complex information, which can be effectively deployed across an entire product line of industrial equipment. Lai & Leu (Lai & Leu, 2017) explained that ensuring the quality of data preprocessing has become a significant concern in big data applications. They also proposed the Preprocessing Tasks Quality Measurement (PTQM) model to identify quality defects in data preprocessing tasks in order to increase the efficiency and practicality of big data applications. Abidi & Alkhalefah (Abidi & Alkhalefah, 2022) proposed a PdM planning model using five main phases: (a) data cleaning, (b) data normalization, (c) optimal feature selection, (d) prediction network decision-making, and (e) prediction. They demonstrated that the proposed model can efficiently predict the future condition of components for maintenance.
The main contributions of this paper are listed below:
-Introduction of a predictive maintenance framework to prevent unexpected failures.
-Application of the SMOTE-Tomek method to overcome the problem of unbalanced data.
-Application of feature selection methods to select the most relevant features and optimize the performance of the model.
-Evaluation of the model using the SVM algorithm, demonstrating that SVM combined with the above preprocessing techniques leads to better results.
The rest of this article is organized as follows: Section 2 introduces the proposed method and related works. In section 3 the experimental results are presented. Finally, our paper ends with a conclusion and future work in section 4.

THE PROPOSED METHOD AND RELATED WORKS
Preprocessing techniques are a very important part of a data science project: they help to reduce the dimensions of a dataset and remove useless variables. In this work, the methodology is divided into three phases: (i) the SMOTE-Tomek resampling method is applied to the unbalanced aircraft engine dataset; (ii) feature selection techniques are used to select the most important features and drop the rest; (iii) the SVM is applied to the balanced dataset and the classification is evaluated using accuracy (Fig. 1).

Imbalanced data and re-sampling
A dataset is imbalanced if the classification categories are not approximately equally represented (Nacchia et al., 2021). Resampling methods are commonly used as a preprocessing step to deal with the class-imbalance problem (Estabrooks & Japkowicz, 2004); they upsample the minority class or downsample the majority class. For an imbalanced dataset, we can replicate rows of the minority class by sampling with replacement. This technique is called oversampling. Similarly, we can randomly delete rows from the majority class to match the size of the minority class; this technique is called undersampling.
One such approach is SMOTE-Tomek, which combines SMOTE (Synthetic Minority Oversampling Technique), a well-known oversampling method, with Tomek Links for undersampling.
SMOTE-Tomek was first introduced by Batista et al. (Batista, Bazzan & Monard, 2003). The standard algorithm flow is as follows:
Step 1: For a dataset D with an unbalanced data distribution, the SMOTE method is used to obtain an extended dataset D' by generating many new minority samples (Wang et al., 2019).
Step 2: Tomek Link pairs in dataset D' are removed using the Tomek Link method (Wang et al., 2019).
The pseudocode of SMOTE-Tomek Links is as follows:
1. (Start of SMOTE) Choose random data from the minority class.
2. Calculate the difference between the random data and one of its k nearest neighbors.
3. Multiply the difference by a random number between 0 and 1, then add the result to the random data to obtain a synthetic sample of the minority class.
4. Repeat steps 2-3 until the desired proportion of the minority class is met (end of SMOTE).
5. (Start of Tomek Links) Choose random data from the majority class.
6. If the nearest neighbor of the random data belongs to the minority class (i.e., the pair forms a Tomek Link), then remove the Tomek Link.
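The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names (`smote_sample`, `tomek_links`, `smote_tomek`), the Euclidean distance, the default value of k and the choice to drop both members of each Tomek Link are all assumptions made for the sketch. A Tomek Link is taken here to be a pair of opposite-class points that are each other's nearest neighbor.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def smote_sample(minority, k, rng):
    """Create one synthetic minority point by interpolation (SMOTE steps 1-3)."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # k nearest minority neighbours of the chosen point
    neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
    target = rng.choice(neighbours)
    gap = rng.random()  # random number in [0, 1)
    # base + gap * (target - base) lies on the segment between the two points
    return [b + gap * (t - b) for b, t in zip(base, target)]


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, of opposite-class mutual nearest neighbours."""
    def nearest(i):
        candidates = (j for j in range(len(points)) if j != i)
        return min(candidates, key=lambda j: euclidean(points[i], points[j]))

    links = []
    for i in range(len(points)):
        j = nearest(i)
        if i < j and nearest(j) == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


def smote_tomek(points, labels, minority_label=1, k=5, seed=0):
    """Oversample the minority class to parity, then remove Tomek Links."""
    rng = random.Random(seed)
    points = [list(p) for p in points]
    labels = list(labels)
    original_minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_minority = len(original_minority)
    n_majority = len(labels) - n_minority
    # SMOTE: interpolate between original minority points only
    for _ in range(n_majority - n_minority):
        points.append(smote_sample(original_minority, k, rng))
        labels.append(minority_label)
    # Tomek: drop both members of every link (an assumption of this sketch)
    drop = {i for pair in tomek_links(points, labels) for i in pair}
    kept = [(p, y) for i, (p, y) in enumerate(zip(points, labels)) if i not in drop]
    return [p for p, _ in kept], [y for _, y in kept]


# Toy data: four majority points (label 0) and two minority points (label 1)
X = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [0.0, 0.0], [1.0, 1.0]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = smote_tomek(X, y)
print(y_res.count(0), y_res.count(1))
```

In this toy example the two classes are far apart, so no Tomek Link is found and the resampled set simply contains as many minority points as majority points. On overlapping classes the cleaning step would additionally remove borderline pairs.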

Feature selection methods
Feature selection (FS) consists of eliminating redundant or irrelevant features that might decrease the model performance (Huang, Li & Xie, 2015).
It is important in fault diagnosis in industrial applications, where numerous redundant sensors monitor the performance of a machine (Jović, Brkić & Bogunović, 2015). Feature selection methods can be classified based on two criteria: the search strategy and the evaluation criterion (Liu & Motoda, 1998). Depending on the evaluation metric of choice, a method typically belongs to one of four classes: filter, wrapper, embedded and hybrid methods (Chandrashekar & Sahin, 2014).
Most filter methods calculate a score for all features and then select the features with the highest scores (Bommert et al., 2020). They are independent of any learning algorithm. Wrapper methods look for features that are suitable for the machine learning algorithm used; feature subsets are evaluated based on the performance of the model (Huljanah et al., 2019). Embedded methods perform the selection within the model training process, extracting at each iteration the features that contribute most to training (Huljanah et al., 2019). Hybrid methods are presented as a combination of the above methods.
In the literature, many studies use feature selection to improve the model's performance in the predictive maintenance field. Aremu et al. (Aremu et al., 2020) presented a feature selection framework that is beneficial for predictive maintenance analytics. They proposed a correlation and relative entropy feature engineering framework specific to asset data. A novel and flexible parameterized PdM solution for event/log based equipment was proposed by Wang et al. (Wang et al., 2017). They explained how to optimize the model parameters by selecting the most effective features and tuning classifiers to build a high-performance prediction model.
Various FS methods are used in the literature, and some of these methods are applied to our case study in section 3:
-Random Forest (RF): an embedded method. It is presented as an ensemble of unpruned classification or regression trees (Breiman, 2001). Each individual tree in the random forest outputs a class prediction, and the forest chooses the class with the most votes. RF performs feature selection while a classification rule is built. The two commonly used variable importance measures in RF are the Gini importance index and the permutation importance index (Hasan, 2016).
-Recursive Feature Elimination (RFE): a wrapper method. It selects features by iteratively training a model on the current set of features and eliminating the least significant feature (Themistocleous, Papadaki & Kamal, 2020). Features are repeatedly eliminated until a certain threshold is met. RFE ranks features according to some measure of their importance (Granitto, 2006). At each iteration, feature importances are measured and the least relevant one is removed (Granitto, 2006).
-Variance Threshold: a filter method. It removes all features whose variance doesn't meet a specific threshold (Themistocleous, Papadaki & Kamal, 2020). For a Boolean feature, the variance value can be calculated using equation (1):
$$\mathrm{Var}[X] = p\,(1 - p) \tag{1}$$
in which p represents the percentage of instances taking the feature value 1. Features with a variance score below the threshold can be deleted immediately. The purpose of this filter is to remove features that have very little variation or that consist only of noise (Ambarwati & Uyun, 2020).
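As an illustration of the Variance Threshold filter, the following plain-Python sketch computes the population variance of each column and keeps only the columns whose variance exceeds the threshold. The helper names (`variance`, `variance_threshold`), the toy data and the threshold value 0.8 × (1 − 0.8) = 0.16 are assumptions chosen for the example, not values taken from the experiments.

```python
def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_threshold(rows, threshold):
    """Return the indices of the columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if variance(col) > threshold]


# Ten records with three features:
#   column 0: Boolean, equal to 1 in 90% of the records -> p(1 - p) = 0.09
#   column 1: Boolean, equal to 1 in 50% of the records -> p(1 - p) = 0.25
#   column 2: constant sensor reading                   -> variance 0
rows = [[1, i % 2, 3.0] for i in range(9)] + [[0, 1, 3.0]]

threshold = 0.8 * (1 - 0.8)  # hypothetical cut-off for this example
selected = variance_threshold(rows, threshold)
print(selected)
```

For a Boolean column the computed variance coincides with p(1 − p), so the nearly constant first column and the constant third column fall below the cut-off and only the second column is retained.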

Support Vector Machine
A Support Vector Machine (SVM) is a supervised machine learning algorithm introduced by Vapnik (Vapnik, 1999) that can be used for both classification and regression tasks. The objective of SVM is to find an optimal separating hyperplane in an N-dimensional space that correctly classifies the data points (Fig. 3). SVM has been used in a wide variety of applications such as financial fraud detection (Ravisankar et al., 2011), credit rating analysis (Huang et al., 2004) and predictive maintenance (Gohel et al., 2020), among others.
In an SVM, the aim is to maximize the margin between the data points and the hyperplane. The data points that lie closest to the separating hyperplane between the two classes are called 'Support Vectors'. An SVM can be linear or nonlinear; the linear formulation is the simplest one.

Fig. 3. Definition of a separating hyperplane illustrated in a two-dimensional input space: Linear classification
The SVM problem for the training data set of M points {(x_i, y_i)}, i = 1, ..., M is given by:
$$\text{Minimize: } \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{M} \xi_i \tag{2}$$
$$\text{Subject to: } y_i\,(w^{T} x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, M$$
where w is the weight vector, C is the regularization parameter, b is the bias term corresponding to the hyperplane and ξ_i are the slack variables (Singla & Shukla, 2020). The different kernels are listed in Table 1, where d is the degree of the kernel function and θ, σ and η are kernel parameters.
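To make problem (2) concrete, the sketch below evaluates the objective for a given hyperplane, assuming the usual linear soft-margin form in which the slack of point i is ξ_i = max(0, 1 − y_i (w·x_i + b)) and the objective is ½‖w‖² + C Σ ξ_i. The function name `soft_margin_objective` and the toy data are hypothetical; the sketch only evaluates the objective and does not solve the optimization problem.

```python
def soft_margin_objective(w, b, C, X, y):
    """Return (objective value, slack variables) for a linear soft-margin SVM."""
    slacks = []
    for xi, yi in zip(X, y):
        # functional margin y_i * (w . x_i + b)
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        # slack is zero when the margin constraint y_i (w . x_i + b) >= 1 holds
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)
    return objective, slacks


# Three points in two dimensions with labels in {+1, -1}
X = [[2.0, 0.0], [0.0, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]

value, slacks = soft_margin_objective(w=[1.0, 0.0], b=0.0, C=1.0, X=X, y=y)
print(value, slacks)
```

Here the first and third points satisfy the margin constraint and receive zero slack, while the second point lies on the hyperplane and needs a slack of 1. A larger C would penalize that violation more heavily relative to the margin term.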

Data
It is crucial that aircraft engines undergo proper maintenance, but routine maintenance is very expensive. Hence, airlines are interested in predicting engine failures of in-service equipment in order to reduce flight delays and ensure cost savings. The data set (D) is provided by NASA. The train set consists of run-to-failure data from 100 aircraft engines. A brief description of the data set is presented in the accompanying table. The 'Cycle' column in the train set has increasing values for every Id. The last value of 'Cycle' for a particular engine Id represents the failure of that engine.
In the training set, the engine with id=69 took the maximum number of cycles to fail (i.e., 362 cycles) and the engine with id=39 took the minimum number of cycles to fail (i.e., 128 cycles).
The engine is assumed to operate normally at the start of each time series and it starts to degrade at some point during the series of the operating cycles. When a predefined threshold is reached, the engine is considered unsafe for further operation (Tarik & Jebari, 2020).
If the number of remaining cycles is less than a specified number of cycles (e.g., period=30), the record is labeled as a failure within this period; otherwise, the engine is considered healthy.
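The labeling rule can be sketched as follows. The function name `label_cycles` is hypothetical, cycles are assumed to be numbered from 1, and the comparison is written as "remaining cycles ≤ period" so that the failure cycle itself (0 remaining cycles) is included. This inclusive reading is an assumption: it gives 31 positive records per engine, which matches the 3100 positive targets reported for the 100 training engines, whereas a strict "<" would give 30 per engine.

```python
def label_cycles(max_cycle, period=30):
    """Return (cycle, remaining cycles, target) for one run-to-failure series."""
    rows = []
    for cycle in range(1, max_cycle + 1):
        remaining = max_cycle - cycle  # cycles left before failure
        target = 1 if remaining <= period else 0
        rows.append((cycle, remaining, target))
    return rows


# Shortest and longest training series mentioned in the text
shortest = label_cycles(128)
longest = label_cycles(362)

positives = sum(target for _, _, target in shortest)
print(positives, 100 * positives)
```

Because every training series is longer than the period, each engine contributes the same number of positive records regardless of its total life, which is why the positive class is a small, fixed share of each series.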
The training data file contains 20630 cycle records with 3100 positive targets and 17530 negative targets as presented in Fig. 4.

Fig. 4. Target Vs Records
The dataset is imbalanced, meaning that the percentage of normal working records of the engine (healthy) is higher than that of the faulty ones. Thus, oversampling was applied using the SMOTE-Tomek technique. Fig. 5 presents the new values of the two classes.

Fig. 5. The records after oversampling technique
All the values in the dataset are numeric and there are no missing values; after resampling, the data is balanced and not noisy. The dataset is divided into training and testing sets (80% and 20%, respectively).
The code was run in a Jupyter notebook in a Python 3.8 environment and executed on a Core I5-73000 CPU.

Experimental results
For feature selection, three algorithms are used: Random Forest, RFE and Variance Threshold. Table 3 shows the relevant features selected after applying the three methods.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives. In addition, the authors also use the SVM under default parameters (C and sigma of the Radial Basis Function) to classify and predict the data. The SVM applied to the oversampled data (D') gave better accuracy results compared with the initial data set (D). For the next experiment, FS methods were applied to the balanced data set according to the flowchart in Figure 6.
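The accuracy measure, taken here as the ratio of correct predictions (true positives plus true negatives) to all predictions, can be computed as in the sketch below. The function name `accuracy` is hypothetical, and the example is an illustration rather than a result of the study: it uses the class counts reported for the training file (3100 positive and 17530 negative records) to show what a trivial classifier that always predicts the negative class would score on the imbalanced data.

```python
def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)


# Class counts reported for the imbalanced training file
y_true = [1] * 3100 + [0] * 17530

# Hypothetical baseline that never predicts a failure
y_pred = [0] * len(y_true)

baseline = accuracy(y_true, y_pred)
print(round(baseline, 4))
```

Such a baseline reaches roughly 85% accuracy without detecting a single failure, which illustrates why accuracy figures are easier to interpret on the balanced data set than on the original one.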

Fig. 6. Research workflow
The accuracy was calculated for SVM with three kernel functions: the Radial Basis Function (RBF), the Sigmoid and the Polynomial function. The SVM classification achieves its best accuracy when the FS methods are used. The RF-SVM model reaches the highest accuracy, 97.77%, using the polynomial function. The results are shown in Table 5.
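For reference, the three kernel functions compared here are commonly written as in the sketch below. The exact forms and the roles given to the parameters are assumptions based on common textbook definitions, since the kernel table is not reproduced in this text: d is taken as the polynomial degree, σ as the RBF width, and θ and η as the offset and scale terms. The function names are hypothetical.

```python
import math


def dot(x, z):
    """Inner product of two vectors."""
    return sum(xi * zi for xi, zi in zip(x, z))


def polynomial_kernel(x, z, d=3, theta=1.0):
    """K(x, z) = (x . z + theta)^d  (assumed form)."""
    return (dot(x, z) + theta) ** d


def rbf_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))  (assumed form)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, z, eta=1.0, theta=0.0):
    """K(x, z) = tanh(eta * x . z + theta)  (assumed form)."""
    return math.tanh(eta * dot(x, z) + theta)


a = [1.0, 2.0]
b = [3.0, 4.0]
print(polynomial_kernel(a, b, d=2, theta=1.0))
print(rbf_kernel(a, a))
print(sigmoid_kernel([0.0, 0.0], b))
```

Each kernel replaces the inner product in the linear formulation, so changing the kernel changes the shape of the decision boundary without changing the optimization problem itself.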

Fig. 7. Accuracy plots for the three models using different kernel functions
The SVM performance on the balanced dataset with feature selection is better than that of SVM applied to the initial data set. Also, the results of SVM with the polynomial kernel are better than those with the other kernel functions. Fig. 7 shows that Random Forest combined with SVM using the polynomial function reaches the highest accuracy, while the lowest accuracy value is 79% for SVM with the Sigmoid kernel.

CONCLUSION
Anticipating equipment failure or maintenance needs presents several challenges. Some of these challenges are the following: the selection of the most relevant features from the available data and identification of which sensor data, variables, or features are most indicative of impending failures or maintenance needs.
Moreover, the issue of imbalanced classes can result in biased models, leading to a poor performance in identifying failures.
Overcoming these challenges can lead to more efficient and cost-effective maintenance practices, reduced downtime, and improved equipment reliability. In this context, this study is relevant to the development of solutions for real-world industrial problems, and consequently promotes the adoption of machine learning in various manufacturing industries.
The main purpose of this article is to perform a systematic assessment of the effectiveness of data preprocessing techniques on ML methods in the area of predictive maintenance. It demonstrates that data preprocessing techniques may significantly influence the final prediction results.
The paper presents a study of various feature selection algorithms combined with SVM and analyzes their performance on the aircraft engine dataset. Experiments demonstrate that there is a significant difference in accuracy between the original aircraft engine data and its balanced version obtained with the SMOTE-Tomek method. Indeed, the SVM classification achieved better accuracy on the oversampled data combined with feature selection methods.
In the future, the study can be enhanced by applying hybrid feature selection algorithms combined with metaheuristics to improve the performance of ML algorithms.

Conflicts of Interest
The authors have no conflicts of interest to declare.