Local-feature and global-dependency based tool wear prediction using deep learning | Scientific Reports – Nature.com

In this section, an experiment is designed to test the performance of the proposed LFGD-TWP method.

The machining experiment was carried out as a milling operation; the experimental equipment and materials are listed in Table 1. The cutting force acquisition system consists mainly of a sensor, a transmitter, a receiver and a PC. The sensor and signal transmitter are integrated into a toolholder, which collects the force data directly during machining and sends it out wirelessly. The signals are sampled at 2500 Hz. The data collected by the sensor is transmitted wirelessly to the receiver, which in turn passes it to the PC via a USB cable. The signal collection process is shown in Fig. 6.

The Anyty microscope was fixed inside the machine tool as shown in Fig. 7. The coordinate at which a clear image of the tool wear can be taken is recorded in the CNC, so that the spindle can move to this fixed position for wear measurement after each milling pass. This measurement method avoids the errors caused by repeatedly removing and reinstalling cutters, which improves the efficiency and accuracy of tool wear measurement. A sample photo taken with the microscope is shown in Fig. 8.

Fig. 8. A sample photo of tool wear.

An orthogonal experimental design was adopted in order to test the performance of our method under multiple working conditions. Tool wear experiments were conducted using nine cutters under nine different sets of cutting parameters. The nine cutters are marked as C1, C2, ..., C9. The milling parameters were set as shown in Table 2. The cutting width was fixed at 7 mm. Each row in the table corresponds to a new cutter. Every 1000 mm of cutting counted as one cut, and the tool wear was measured after every cut. The cutter and cutting parameters were replaced when the tool wear exceeded the threshold or the cutter broke.

The data acquisition files have three columns, corresponding to the bending moment in two directions (x, y) and the torsion. Each cutter has a corresponding wear file, which records the wear values of the four flutes for each cut. The cutting quality becomes poor if the wear value of any edge exceeds a certain value. Therefore, this paper takes the maximal flank wear over all flutes as the prediction target.
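The target definition can be illustrated with a short sketch. The wear values below are made-up placeholders rather than measurements from the experiment, and the function name `max_flank_wear` is hypothetical.

```python
# Hypothetical wear records: one row per cut, one value per flute.
# The numbers are illustrative placeholders, not measured data.
wear_per_cut = [
    [30.0, 28.5, 31.2, 29.0],
    [45.1, 47.3, 44.0, 46.2],
    [80.4, 78.9, 85.6, 82.0],
]


def max_flank_wear(flute_wear):
    """Return the largest flank wear among the flutes of one cut."""
    return max(flute_wear)


# One regression target per cut: the worst-worn flute decides the label.
targets = [max_flank_wear(row) for row in wear_per_cut]
print(targets)
```

Taking the maximum rather than the mean reflects the statement that a single badly worn edge is enough to degrade the cutting quality.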

Since the multisensory input contains three channels, the bending moment in the X direction is used as an example to illustrate the data preparation process. First, the original signal of each cut is truncated to obtain a valid data segment of 10,240 recorded values from the middle part of the signal. Then, the data is divided equally into 10 segments of 1024 values each, denoted as \(X_{fx} = \left[ X_{1}, X_{2}, \ldots, X_{10} \right]\).
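The truncation and segmentation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "the middle part" means a window centred on the signal, and the function names and the synthetic input are hypothetical.

```python
def middle_segment(signal, valid_len=10240):
    """Keep a centred window of valid_len samples (assumed meaning of 'middle part')."""
    if len(signal) < valid_len:
        raise ValueError("signal is shorter than the requested valid length")
    start = (len(signal) - valid_len) // 2
    return signal[start:start + valid_len]


def split_equal(signal, n_segments=10):
    """Split a signal into n_segments contiguous pieces of equal length."""
    seg_len, remainder = divmod(len(signal), n_segments)
    if remainder:
        raise ValueError("signal length is not divisible by n_segments")
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


# Synthetic stand-in for one channel of one cut (e.g. bending moment in x).
raw = list(range(25000))
valid = middle_segment(raw)
segments = split_equal(valid)
print(len(valid), len(segments), len(segments[0]))
```

With 10,240 valid samples and 10 segments, each segment holds 1024 samples, which is the input length used for the wavelet decomposition.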

The maximum level of decomposition in the DWT depends on the length of the signal and the chosen wavelet. In this paper, db5 is used for decomposition, and the optimal level is selected by comparing the performance under different levels. Decomposition levels 3, 4, 5 and 6 were compared, and level 5 gave the best performance. Therefore, \(X_{1}, X_{2}, \ldots, X_{10}\) are each converted to multi-scale spectrogram images by a 5-level wavelet decomposition using db5, denoted as \(WS = [ws_{1}, ws_{2}, \ldots, ws_{10}]\), where \(ws = [c_{1}, c_{2}, \ldots, c_{6}]\), with lengths [512, 256, 128, 64, 32, 32], is the set of multi-scale vectors corresponding to each segment.
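The level range and the coefficient lengths quoted above can be checked with simple arithmetic. The sketch below is not the authors' implementation. It assumes a db5 filter length of 10, a widely used rule for the maximum useful level, a periodic signal extension so that each level halves the length exactly, and that \(c_{1}\) to \(c_{5}\) are detail coefficients ordered from fine to coarse with \(c_{6}\) the final approximation.

```python
import math

DB5_FILTER_LEN = 10  # a Daubechies-N wavelet has 2N filter taps (assumption stated above)


def max_dwt_level(signal_len, filter_len):
    """Common rule for the deepest useful level: floor(log2(n / (filter_len - 1)))."""
    return int(math.floor(math.log2(signal_len / (filter_len - 1))))


def coeff_lengths(signal_len, level):
    """Lengths of [detail_1, ..., detail_level, approx_level] under periodic extension."""
    lengths = []
    n = signal_len
    for _ in range(level):
        n = (n + 1) // 2      # each level halves the length (rounded up)
        lengths.append(n)     # detail coefficients at this level
    lengths.append(n)         # approximation at the deepest level
    return lengths


print(max_dwt_level(1024, DB5_FILTER_LEN))  # 6
print(coeff_lengths(1024, 5))               # [512, 256, 128, 64, 32, 32]
```

Under these assumptions the maximum level for a 1024-sample segment is 6, which is consistent with levels 3 to 6 being the candidates compared. Other boundary-extension modes would produce slightly longer coefficient vectors.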

For each segment, 1D-CNNs are used to extract single-scale features from \(c_{1}, c_{2}, \ldots, c_{6}\) respectively. The structure and parameters of the model are shown in Table 3.

The activation function of the convolution layers is ReLU. Every convolution layer applied to \(c_{1}, c_{2}, c_{3}, c_{4}\) is followed by a max-pooling layer with a \(1 \times 2\) region to compress the generated feature maps. The number of input channels is set to 3 because the sensory data has three channels.

After single-scale feature extraction by the 1D-CNNs and concatenation of the single-scale features, a feature image of size \(32 \times 6 \times 32\) is obtained, which is used as the input of our multi-scale correlation feature extraction model. Finally, the local feature size of each segment after automatic extraction is \(1 \times 50\).
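A quick consistency check shows why only the four longest scales need pooling before concatenation. The sketch below assumes that the pooling region is \(1 \times 2\) with stride 2, that the convolutions preserve the sequence length, and that all six scales must reach a common length of 32; these are assumptions for illustration, since the exact layer configuration is given in Table 3.

```python
def halvings_to_target(length, target=32):
    """Count how many stride-2 poolings reduce `length` to exactly `target`."""
    count = 0
    while length > target:
        if length % 2:
            raise ValueError("length cannot be halved evenly")
        length //= 2
        count += 1
    if length != target:
        raise ValueError("target length is not reachable by halving")
    return count


# Lengths of c1..c6 after the 5-level decomposition of a 1024-sample segment.
scale_lengths = [512, 256, 128, 64, 32, 32]
pool_layers = [halvings_to_target(n) for n in scale_lengths]
print(pool_layers)  # [4, 3, 2, 1, 0, 0]
```

Under these assumptions, \(c_{5}\) and \(c_{6}\) already have length 32 and need no pooling, which matches pooling being applied only to \(c_{1}\) through \(c_{4}\).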

In this case, the dimension of the automatic feature vector is 50, and the dimension of the manual feature vector is 30. The adopted manual features are shown in Table 4. Therefore, the dimension of the hybrid feature vector of each segment is 80.

The number of segments is T = 10, so the shape of the input sequence of the Global Time Series Dependency Mining Model is \(80 \times 10\). The mean squared error (MSE) was selected as the loss during model training. An Adam optimizer (Ref. 32) is used for optimization, with the learning rate set to 0.001. MSE was calculated on the test data set for models with one, two, and three layers and 100, 200, 300, 400, and 500 hidden units. The results show that the most accurate model contained 2 layers and 300 hidden units in the LSTM models and 400 hidden units in the FC layer. To improve training speed and alleviate overfitting, we apply batch normalization (BN) (Ref. 33) to all convolution layers of the Single-Scale Feature Extraction Model and apply dropout (Ref. 34) to the fully connected layer. To find a relatively optimal dropout value, we trained the model with p = 0, 0.25, 0.5, and 0.75, where p is the probability of an element being zeroed. The results show that a dropout setting of 0.5 gives a relatively optimal result. After the model parameters are updated with the training data, the trained model is applied to the testing data to predict tool wear.
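The way the hybrid features form the input sequence can be sketched as below. This is an illustration only: the feature values are zero placeholders, the function name is hypothetical, and the sequence is stored time-major (10 steps of 80 values), which is one possible layout of the same data.

```python
AUTO_DIM = 50     # automatic (learned) features per segment
MANUAL_DIM = 30   # manual features per segment
T = 10            # number of segments per cut


def hybrid_sequence(auto_feats, manual_feats):
    """Concatenate automatic and manual features segment by segment."""
    if len(auto_feats) != len(manual_feats):
        raise ValueError("both feature lists must cover the same segments")
    return [list(a) + list(m) for a, m in zip(auto_feats, manual_feats)]


# Placeholder features for one cut; real values would come from the CNN
# and from the manual feature extraction.
auto = [[0.0] * AUTO_DIM for _ in range(T)]
manual = [[0.0] * MANUAL_DIM for _ in range(T)]

sequence = hybrid_sequence(auto, manual)
print(len(sequence), len(sequence[0]))  # 10 80
```

Each of the 10 time steps fed to the recurrent model therefore carries an 80-dimensional vector.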

In order to quantify the performance of our method, mean absolute error (MAE) and root mean squared error (RMSE) are adopted as measurement indicators to evaluate regression loss. The equations of MAE and RMSE over n testing records are given as follows:

$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|, \tag{5} $$

$$ RMSE = \sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}, \tag{6} $$

where \(y_{i}\) is the predicted value and \(\hat{y}_{i}\) is the true value.
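Both indicators are straightforward to compute. The following sketch implements them directly from the definitions above; the sample values are invented for illustration and are not experimental results.

```python
import math


def mae(y, y_hat):
    """Mean absolute error between two equal-length sequences."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error between two equal-length sequences."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


# Illustrative values only.
predicted = [100.0, 110.0, 130.0]
measured = [102.0, 108.0, 136.0]
print(mae(predicted, measured), rmse(predicted, measured))
```

Both measures are symmetric in their two arguments, so swapping the predicted and true values does not change the result. RMSE penalizes large individual errors more heavily than MAE, so it is never smaller than MAE.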

To analyze the performance of our method, cross validation is used to test the accuracy of the model. The records of eight cutters are used as the training set and the records of the remaining cutter are used as the testing set, and this is repeated until every cutter has served as the testing set. For example, when the records of cutters C2, C3, ..., C9 are used as the training set and the records of cutter C1 are used as the testing set, the testing case is denoted as T1. When the records of cutter C2 are used as the testing set and the records of the remaining cutters are used as the training set, the testing case is denoted as T2. The remaining cases are constructed in the same manner. The nine testing cases are shown in Table 5.
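The leave-one-cutter-out scheme described above can be enumerated in a few lines. This is a minimal sketch; the function name is hypothetical.

```python
def leave_one_out(cutters):
    """Yield (case name, training cutters, testing cutters) for each held-out cutter."""
    for index, held_out in enumerate(cutters, start=1):
        training = [c for c in cutters if c != held_out]
        yield f"T{index}", training, [held_out]


cutters = [f"C{i}" for i in range(1, 10)]  # C1 .. C9
cases = list(leave_one_out(cutters))

for name, training, testing in cases:
    print(name, "train:", training, "test:", testing)
```

Splitting by cutter rather than by individual cut keeps all records of the held-out cutter, and therefore its cutting parameters, out of the training data.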

To mitigate the effects of random factors, each testing case is repeated 10 times and the average value is used as the result of the model. Moreover, to demonstrate the effectiveness of the hybrid features, two models are trained: the network with hybrid features and the network with automatic features only. The results for each testing case are shown in Table 6.

It can be seen from Table 6 that the proposed LFGD-TWP achieves a low regression error. In most cases, the model with hybrid features performs better than the model with automatic features only. Averaged over the testing cases, the hybrid features yield a 3.69% improvement in MAE and a 2.37% improvement in RMSE. To qualitatively demonstrate the effectiveness of our model, the predicted tool wear for testing cases T2 and T7 is illustrated in Fig. 9. It can be seen from Fig. 9 that the error grows as the tool approaches the failure zone. One possible reason is that the tool wears more quickly at this stage, resulting in a relatively small number of samples. Another is that the signal changes more drastically and the noise becomes more severe as tool wear increases, leading to greater error.

Fig. 9. Tool wear predicted by LFGD-TWP.

Two statistics are adopted to illustrate the overall prediction performance and generalization ability of the model under different testing cases: mean and variance. The mean is the average value of the results over the testing cases and indicates the prediction accuracy of the method. The variance measures how far each result lies from the mean and indicates the stability of generalization across testing cases. The equations for the mean and variance of the two measurement indicators over n testing cases are given as follows:

$$ Mean = \overline{r} = \frac{1}{n}\sum\limits_{i = 1}^{n} r_{i}, \tag{7} $$

$$ Variance = \frac{1}{n}\sum\limits_{i = 1}^{n} \left( r_{i} - \overline{r} \right)^{2}, \tag{8} $$

where \(r_{i}\) is the mean value of the results for the i-th testing case.
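Note that the variance defined above divides by n, not by n - 1, so it is the population variance. The sketch below implements both statistics and cross-checks them against Python's standard library; the per-case results are invented placeholders.

```python
import statistics


def mean_of(results):
    """Arithmetic mean of the per-case results."""
    return sum(results) / len(results)


def variance_of(results):
    """Population variance (divides by n), matching the definition above."""
    r_bar = mean_of(results)
    return sum((r - r_bar) ** 2 for r in results) / len(results)


# Placeholder per-case errors, not values from Table 6.
per_case = [6.0, 7.0, 8.0, 9.0]
print(mean_of(per_case), variance_of(per_case))  # 7.5 1.25

# statistics.pvariance is the population variance; statistics.variance
# would divide by n - 1 and give a larger number.
assert abs(variance_of(per_case) - statistics.pvariance(per_case)) < 1e-12
```

Using the sample variance instead would rescale every reported variance by n / (n - 1), which leaves the ranking of models over the same nine cases unchanged.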

By these definitions, the smaller the mean and variance are, the better the performance of the model. For our proposed method, the means of the MAEs and RMSEs are 7.36 and 9.65, and the variances of the MAEs and RMSEs are 0.95 and 1.65.

Other deep learning models are compared with the proposed LFGD-TWP: CNN (Ref. 24), LSTM (Ref. 30) and CNN-BiLSTM (Ref. 19). The structures of these models are described as follows.

Structure of the CNN model in brief: The input of the CNN model is the original signal after normalization, and the signal length is 1024. The number of input channels is set to 3 because the sensory data has three channels. The CNN model has 5 convolution layers. Each convolution layer has 32 feature maps and \(1 \times 4\) filters and is followed by a max-pooling layer with a \(1 \times 2\) region. The feature maps are then flattened and passed to a fully connected layer with 250 hidden units. Dropout with probability 0.5 is applied to the fully connected layer. The loss function is MSE, the optimizer is Adam, and the learning rate is set to 0.001, all kept the same as in the proposed model. The means of the MAEs and RMSEs are 12.64 and 16.74, and the variances of the MAEs and RMSEs are 10.74 and 18.90.

Structure of the LSTM model in brief: The model is of the many-to-one type. The input of the LSTM is the manual features in Table 4, so an LSTM cell has an input dimension of 30. The MAE and RMSE values were calculated for models with one, two, and three layers and 100, 200, 300, and 400 hidden units, so 12 LSTM structures were constructed to find the most accurate one. The number of timesteps is 10, the loss function is MSE, the optimizer is Adam, and the learning rate is set to 0.001, all kept the same as in the proposed model. The results show that the most accurate model contained 2 layers and 200 hidden units. The means of the MAEs and RMSEs are 10.48 and 13.76, and the variances of the MAEs and RMSEs are 5.12 and 9.28.
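The count of 12 candidate structures follows from the Cartesian product of the layer and hidden-unit options. The sketch below only enumerates the search space; training and evaluating each configuration is outside its scope, and the variable names are hypothetical.

```python
from itertools import product

layer_options = [1, 2, 3]
hidden_unit_options = [100, 200, 300, 400]

# Every (layers, hidden units) pair is one candidate LSTM structure.
lstm_grid = list(product(layer_options, hidden_unit_options))

print(len(lstm_grid))  # 12
for n_layers, n_hidden in lstm_grid:
    print(f"layers={n_layers}, hidden_units={n_hidden}")
```

The configuration reported as most accurate for this baseline, 2 layers with 200 hidden units, is one of the 12 points in this grid.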

The structure of the CNN-BiLSTM model is given in Ref. 19, and the input of this model is the original signal after normalization. The means of the MAEs and RMSEs of this model are 7.85 and 10.24, and the variances of the MAEs and RMSEs are 2.71 and 5.06. The comparison between our method (LFGD-TWP) and these popular models is shown in Table 7. Compared to the most competitive result, achieved by CNN-BiLSTM, the proposed model achieves better accuracy owing to its multi-frequency-band analysis structure. Further, the proposed model achieves lower variances in MAE and RMSE. The lower means and variances together indicate that the proposed model has better overall prediction performance and more stable generalization across testing cases.
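The size of the gap to each baseline can be expressed as a relative error reduction computed from the reported means. The sketch below uses only the mean MAE and RMSE values quoted in the text; the relative-reduction formula is a common convention chosen here for illustration and is not a metric stated by the authors.

```python
# Mean MAE and RMSE over the nine testing cases, as quoted in the text.
proposed = {"MAE": 7.36, "RMSE": 9.65}
baselines = {
    "CNN": {"MAE": 12.64, "RMSE": 16.74},
    "LSTM": {"MAE": 10.48, "RMSE": 13.76},
    "CNN-BiLSTM": {"MAE": 7.85, "RMSE": 10.24},
}


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100.0


reductions = {
    name: {metric: relative_reduction(scores[metric], proposed[metric])
           for metric in proposed}
    for name, scores in baselines.items()
}

for name, values in reductions.items():
    print(f"{name}: MAE -{values['MAE']:.2f}%, RMSE -{values['RMSE']:.2f}%")
```

By this measure the margin over CNN-BiLSTM is a few percent in mean error, while the margins over the plain CNN and LSTM baselines are much larger.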

To further test the performance of our proposed method, we additionally use the PHM2010 data set (Ref. 35), which is a widely used benchmark. The machining experiment was carried out as a milling operation, and the experimental equipment and materials are described in Ref. 19. The spindle speed is 10,400 r/min; the feed rate in the x-direction is 1555 mm/min; the radial depth of cut (y-direction) is 0.125 mm; the axial depth of cut (z-direction) is 0.2 mm. There are 6 individual cutter records named C1, C2, ..., C6. Each record contains 315 samples (corresponding to 315 cuts), and the working conditions remain unchanged. Only C1, C4 and C6 have corresponding wear files, so these three are selected as our training/testing data set. Cross validation is again used to test the accuracy of the model, and the results are shown in Fig. 10.

Fig. 10. Tool wear (PHM2010) predicted by LFGD-TWP.

For our proposed method, the mean of the MAEs is 6.65 and the mean of the RMSEs is 8.42, slightly worse than the means of the MAEs (6.57) and RMSEs (8.1) reported in Ref. 19. The slightly poorer performance may arise because the architecture was made more complex to improve adaptability to multiple working conditions, which can lead to overfitting on a single-condition data set. Although the proposed architecture might overfit the PHM2010 case, its complexity allows it to handle more complex scenarios such as the multi-condition test cases in this paper.
