Deep learning algorithm-enabled sediment characterization techniques to determination of water saturation for tight … – Nature.com

The aim of this research is to develop precise and dependable machine learning models for the prediction of SW (Water Saturation) using three DL and three SL techniques: LSTM, GRU, RNN, SVM, KNN and DT. These models were trained on an extensive dataset comprising various types of log data. The findings of our investigation illustrate the efficacy of data-driven machine learning models in SW prediction, underscoring their potential for a wide range of practical applications.

When evaluating and comparing algorithms, researchers must take several factors into account, chief among them accuracy and the spread of the prediction errors. These can be quantified with the criteria in Eqs. 1 to 6. The Mean Percentage Error (MPE) is the average signed difference between measured and predicted values, expressed as a percentage of the measured value, while the Absolute Mean Percentage Error (AMPE) averages the absolute value of that percentage difference. The Standard Deviation (SD) measures the variability of the prediction errors around their mean. The Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are the mean of the squared differences between measured and predicted values and its square root, respectively. Lastly, the R2 metric (the coefficient of determination) assesses the fraction of the variance in the dependent variable that can be accounted for by the independent variables.

$$\text{MPE}=\frac{\sum_{\text{i}=1}^{\text{n}}\left(\frac{SWE_{(\text{Meas.})}-SWE_{(\text{Pred.})}}{SWE_{(\text{Meas.})}}\times 100\right)_{\text{i}}}{\text{n}}$$

(1)

$$\text{AMPE}=\frac{\sum_{\text{i}=1}^{\text{n}}\left|\left(\frac{SWE_{(\text{Meas.})}-SWE_{(\text{Pred.})}}{SWE_{(\text{Meas.})}}\times 100\right)_{\text{i}}\right|}{\text{n}}$$

(2)

$$\text{SD}=\sqrt{\frac{\sum_{\text{i}=1}^{\text{n}}\left(\left({SWE_{\text{Meas.}}}_{\text{i}}-{SWE_{\text{Pred.}}}_{\text{i}}\right)-\frac{1}{\text{n}}\sum_{\text{j}=1}^{\text{n}}\left({SWE_{\text{Meas.}}}_{\text{j}}-{SWE_{\text{Pred.}}}_{\text{j}}\right)\right)^{2}}{\text{n}-1}}$$

(3)

$$\text{MSE}=\frac{1}{\text{n}}\sum_{\text{i}=1}^{\text{n}}\left({SWE_{\text{Meas.}}}_{\text{i}}-{SWE_{\text{Pred.}}}_{\text{i}}\right)^{2}$$

(4)

$$\text{RMSE}=\sqrt{\frac{1}{\text{n}}\sum_{\text{i}=1}^{\text{n}}\left({SWE_{\text{Meas.}}}_{\text{i}}-{SWE_{\text{Pred.}}}_{\text{i}}\right)^{2}}$$

(5)

$${\text{R}}^{2}=1-\frac{\sum_{\text{i}=1}^{\text{n}}\left({SWE_{\text{Pred.}}}_{\text{i}}-{SWE_{\text{Meas.}}}_{\text{i}}\right)^{2}}{\sum_{\text{i}=1}^{\text{n}}\left({SWE_{\text{Pred.}}}_{\text{i}}-\frac{\sum_{\text{j}=1}^{\text{n}}{SWE_{\text{Meas.}}}_{\text{j}}}{\text{n}}\right)^{2}}$$

(6)
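As a rough illustration, the six criteria can be computed from paired measured and predicted values with a few lines of standard-library Python. This is a minimal sketch, not the authors' code: the function name `regression_metrics` and the dictionary keys are hypothetical, SD is taken as the sample standard deviation of the errors (measured minus predicted), and R2 uses the conventional total sum of squares about the measured mean, which is an assumption where it differs from Eq. 6 as printed.

```python
import math


def regression_metrics(measured, predicted):
    """Return MPE, AMPE, SD, MSE, RMSE and R2 for paired sequences."""
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equally long sequences with at least 2 points")

    # Signed errors, measured minus predicted, as in Eqs. 1 to 5.
    errors = [m - p for m, p in zip(measured, predicted)]

    # Percentage errors are relative to the measured value,
    # so every measured value must be non-zero.
    pct = [100.0 * e / m for e, m in zip(errors, measured)]
    mpe = sum(pct) / n
    ampe = sum(abs(x) for x in pct) / n

    # Assumption: SD is the sample standard deviation of the errors.
    mean_err = sum(errors) / n
    sd = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / (n - 1))

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Assumption: conventional R2, with the total sum of squares
    # taken about the mean of the measured values.
    mean_meas = sum(measured) / n
    ss_tot = sum((m - mean_meas) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot

    return {"MPE": mpe, "AMPE": ampe, "SD": sd,
            "MSE": mse, "RMSE": rmse, "R2": r2}
```

Calling `regression_metrics(measured, predicted)` on the test subset of each model would give one row of a comparison table of the kind described in the text.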

To forecast SW, three DL and three SL techniques (LSTM, GRU, RNN, SVM, KNN and DT) were used in this study. Each algorithm was trained and tested individually, in independent experiments. To allow a fair assessment of the predictions, the dataset was divided into separate subsets: the training subset accounted for 70% of the data records, while the remaining 30% was allocated for independent testing.
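A 70/30 division of this kind can be sketched as follows. The function name `split_records`, the fixed seed, and the random shuffle are assumptions for illustration; the text does not say whether the records were shuffled or split in depth order, and for sequence models a depth-ordered split may be preferable.

```python
import random


def split_records(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(records)               # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

With 100 records and the default fraction, this returns 70 training records and 30 testing records, with no record appearing in both.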

Choosing the most suitable algorithm for a specific task is a crucial step in data analysis and machine learning. This research therefore assessed and compared the performance of the LSTM, GRU, and RNN algorithms in predicting SW. The outcomes of these algorithms on both the training and the test data are documented in Table 2. By analyzing the results, researchers can gain insight into the effectiveness of each algorithm and make informed decisions about their use in practical applications.

The results from the test data are presented in Table 2. The GRU algorithm achieves the lowest RMSE and AMPE, with RMSE, MPE and AMPE values of 0.0198, 0.1492 and 2.0320, respectively. For the LSTM algorithm, the corresponding values are 0.0284, 0.1388 and 3.1136, while for the RNN algorithm they are 0.0399, 0.0201 and 4.0613. For SVM, KNN and DT, the values are 0.0599, 0.1664 and 6.1642; 0.7873, 0.0997 and 7.4575; and 0.7289, 0.1758 and 8.1936, respectively. The results show that the GRU model is more accurate than the other algorithms.

The R2 parameter is a crucial statistical measure for evaluating and comparing different models. It assesses the adequacy of a model by quantifying the amount of variation in the outcome variable that can be explained by the explanatory variables. In this study, Fig. 5 shows cross plots of the predicted SW values for the train and test data. The cross plots confirm that the GRU model exhibits higher prediction accuracy than the LSTM and RNN models.

Cross plot for predicting SW using three DL algorithms (GRU, LSTM and RNN) for test data.

To assess the precision of the GRU model, the results presented in Table 2 and Fig. 5 were analyzed for the train and test data. The analysis revealed that the GRU algorithm achieved low errors for SW, with an RMSE of 0.0198 and an R2 of 0.9973. R2, the coefficient of determination, gauges the extent to which the variance in the dependent variable can be predicted from the independent variable(s), and so shows how well the model aligns with the observed data points. The R2 values for the GRU, LSTM, RNN, SVM, KNN and DT models are 0.9973, 0.9725, 0.9701, 0.8050, 0.7873 and 0.7289, respectively. Figure 5 shows the cross plot for predicting SW using the three DL algorithms (GRU, LSTM, and RNN) for test data. The GRU model's R2 of 0.9973 indicates a very close match between predicted and observed SW values: about 99.73% of the variance in the SW data is explained by its predictions. The LSTM and RNN models, with R2 values of 0.9725 and 0.9701 respectively, also exhibit strong predictive capabilities, although slightly lower than the GRU model. These findings underscore the GRU model's superiority in SW prediction, which is attributed to its ability to capture intricate temporal dependencies within the SW data.

Figure 6 provides a visual representation of the calculation error for the test data, illustrating the error distribution for predicting SW using three DL algorithms (GRU, LSTM, and RNN). The plotted points in the figure show the error range for each algorithm. For the GRU algorithm, the error lies between -0.0103 and 0.0727, so its predictions for the test data deviate only slightly from the actual SW values. The LSTM algorithm has a wider error range, from -0.146 to 0.215, which means its predictions for the test data are more variable and can deviate further from the actual SW values.

Error points for predicting SW using three DL and three SL algorithms such as GRU, LSTM, RNN, SVM, KNN and DT for test data.

Similarly, the RNN algorithm exhibits an error range between -0.222 and 0.283. Its predictions for the test data therefore show the largest spread of the three and can deviate most from the actual SW values. Comparing the error ranges of the three DL algorithms, the GRU algorithm achieves the narrowest range and thus the best precision and accuracy in predicting SW for the test data, while the LSTM and RNN algorithms exhibit broader error ranges, indicating more variable predictions for the same dataset. These findings further support the conclusion that the GRU algorithm outperforms the LSTM and RNN algorithms in SW prediction accuracy, as it produces predictions with smaller errors and tighter error bounds.
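An error range of this kind is simply the smallest and largest signed error over the test records. The sketch below assumes the error is defined as measured minus predicted, following Eq. 1; the function names `error_range` and `range_width` are hypothetical.

```python
def error_range(measured, predicted):
    """Return (lowest, highest) signed error, taken as measured - predicted."""
    errors = [m - p for m, p in zip(measured, predicted)]
    if not errors:
        raise ValueError("at least one measured/predicted pair is required")
    return min(errors), max(errors)


def range_width(bounds):
    """Width of an error range given as a (lowest, highest) pair."""
    low, high = bounds
    return high - low
```

Comparing `range_width(error_range(measured, predicted))` across models gives a single number per model; a narrower width corresponds to tighter error bounds.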

Figure 7 presents an error histogram plot of the prediction errors for SW using the three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT). Each histogram shows the distribution of prediction errors for one algorithm; the distributions are approximately normal, centered around zero, with a relatively narrow spread and no noticeable positive or negative bias. The plot allows the algorithms' performance to be compared and helps identify the algorithm whose errors are most tightly distributed. The GRU algorithm's error distribution is closer to normal than those of the other algorithms, with a smaller standard deviation and a narrower spread of prediction errors, indicating that it produces the most precise and reliable predictions for SW. Comparing the results in Table 2 with the error histograms in Fig. 7, the performance accuracy of the algorithms can be ranked as follows: GRU > LSTM > RNN > SVM > KNN > DT.
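One way to check this ranking against the reported numbers is to sort the models by the test-data R2 and AMPE values quoted in the text. The snippet below is only an illustrative cross-check, not the procedure the authors describe; the variable names are hypothetical.

```python
# Test-data values as quoted in the text.
r2 = {"GRU": 0.9973, "LSTM": 0.9725, "RNN": 0.9701,
      "SVM": 0.8050, "KNN": 0.7873, "DT": 0.7289}
ampe = {"GRU": 2.0320, "LSTM": 3.1136, "RNN": 4.0613,
        "SVM": 6.1642, "KNN": 7.4575, "DT": 8.1936}

# Higher R2 is better; lower AMPE is better.
by_r2 = sorted(r2, key=r2.get, reverse=True)
by_ampe = sorted(ampe, key=ampe.get)

ranking = " > ".join(by_r2)
```

Both orderings agree with the ranking stated above.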

Histogram plot for SW prediction using three DL and three SL algorithms such as GRU, LSTM, RNN, SVM, KNN and DT.

Figure 8 illustrates the error rate of the three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT) as a function of iteration for SW prediction. The GRU and LSTM algorithms initially exhibit higher error values that progressively decrease over the iterations; this pattern is not observed for the RNN algorithm. The LSTM algorithm achieves higher accuracy than the other algorithms at the beginning of the iterations, and at the threshold of 10 iterations it surpasses the GRU algorithm with a lower error value. At iteration 31, however, the GRU algorithm overtakes the LSTM algorithm. The RNN algorithm, in contrast, shows a consistent decrease in performance accuracy from the start to the end of the iterations, without significant fluctuations. In the zoomed-in portion of the figure, covering iterations 85 to 100, these trends become more apparent: the GRU algorithm consistently outperforms the other algorithms, the LSTM algorithm follows with a decrease in accuracy over the iterations, and the RNN algorithm shows declining accuracy without notable fluctuations. These findings emphasize the superiority of the GRU algorithm, which maintains the highest accuracy in the later iterations, while the LSTM and RNN algorithms lose accuracy over time.

Iteration plot for SW prediction using three DL and three SL algorithms such as GRU, LSTM, RNN, SVM, KNN and DT.

Pearson's coefficient (R) is a widely used method for assessing the relative importance of the independent input variables with respect to the dependent output variable, here SW. The coefficient ranges between -1 and +1 and represents the strength and direction of the correlation. A value of +1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and a value close to 0 indicates no linear correlation. Equation 7 gives Pearson's correlation coefficient, a statistical measure of the linear relationship between two variables, Z and Q. It quantifies the extent to which changes in one variable are associated with changes in the other, and so indicates the level of influence that each input variable has on the output variable, SW.

$$R=\frac{\sum_{i=1}^{n}\left({Z}_{i}-\overline{Z}\right)\left({Q}_{i}-\overline{Q}\right)}{\sqrt{\sum_{i=1}^{n}{\left({Z}_{i}-\overline{Z}\right)}^{2}}\sqrt{\sum_{i=1}^{n}{\left({Q}_{i}-\overline{Q}\right)}^{2}}}$$

(7)

A coefficient of +1 indicates a perfect positive correlation, suggesting that the input variable has the greatest positive impact on the output variable. Conversely, a coefficient of -1 represents a perfect negative correlation, indicating that the input variable has the greatest negative impact on the output variable. When the coefficient is close to 0, there is no significant linear correlation, and changes in the input variable do not have a substantial effect on the output variable. Pearson's correlation coefficient thus gives researchers a quantitative measure of the relative importance of each input variable with respect to the output variable, SW.
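Equation 7 translates directly into code. The following is a minimal standard-library sketch; the function name `pearson_r` is hypothetical, and `z` and `q` stand for one input log and the SW values, respectively.

```python
import math


def pearson_r(z, q):
    """Pearson correlation coefficient of two equally long sequences (Eq. 7)."""
    n = len(z)
    if n < 2 or n != len(q):
        raise ValueError("need two equally long sequences with at least 2 points")
    z_mean = sum(z) / n
    q_mean = sum(q) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - z_mean) * (b - q_mean) for a, b in zip(z, q))
    # Denominator: product of the root sums of squared deviations.
    z_ss = sum((a - z_mean) ** 2 for a in z)
    q_ss = sum((b - q_mean) ** 2 for b in q)
    if z_ss == 0 or q_ss == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (math.sqrt(z_ss) * math.sqrt(q_ss))
```

Applying `pearson_r` to each input log paired with SW would produce the column of coefficients that a correlation heat map displays.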

The heat map in Fig. 9 allows the Pearson correlation coefficients to be compared, giving insight into the relationship between the input variables and SW. The results reveal several significant correlations. Negative correlations are observed with URAN and DEPTH, indicating an inverse relationship with SW: higher values of URAN and DEPTH are associated with lower SW values. Positive correlations are observed with CGR, DT, NPHI, POTA, THOR, and PEF. These variables show a direct relationship with SW, meaning that higher values of CGR, DT, NPHI, POTA, THOR, and PEF are associated with higher SW values.

Heat map plot for SW prediction using three DL and three SL algorithms such as GRU, LSTM, RNN, SVM, KNN and DT.

These findings can be used to develop predictive models of SW based on the input variables. By incorporating the correlations into the models, researchers can enhance their accuracy and reliability in predicting SW values. Equation 8 summarizes the direction of the relationships between the input variables and SW: SW increases with the positively correlated variables and decreases with the negatively correlated ones.

$$SWE\propto\left(\text{CGR},\ \text{DT},\ \text{NPHI},\ \text{POTA},\ \text{THOR},\ \text{PEF}\right)\quad\text{and}\quad SWE\propto\frac{1}{\left(\text{URAN},\ \text{DEPTH}\right)}$$

(8)
