Machine learning-based survival prediction nomogram for postoperative parotid mucoepidermoid carcinoma | Scientific … – Nature.com

Screening and characteristics of the patients

This study examined 882 patients with stage I–IVA P-MEC who met the inclusion and exclusion criteria, drawn from the SEER database between 2004 and 2015. Figure 1 illustrates the patient selection process, while Table 1 summarizes the patients' demographic and clinicopathological characteristics. The lymph node ratio (LNR) cut-off was determined using X-tile analysis, with a resultant cut-off of 1.15%. The median (95% CI) follow-up time was 99 (92–105) months, and the median (IQR) age at diagnosis was 52 (37–66) years. A majority of the patients were white (661, 74.9%), with most tumors being grade II (396, 44.9%), stage I (353, 40%), T1-stage (381, 43.2%), N0-stage (685, 77.7%), and LNR0 (686, 77.8%) according to AJCC 6th edition staging. All variables, except for chemotherapy (94.2% vs 5.8%), had proportions exceeding 10%.

The study encompassed 12 variables, including age, gender, grade, stage, tumor (T) stage, node (N) stage, radiation, chemotherapy, laterality, marriage, and LNR. Nine factors (age, gender, grade, stage, T stage, N stage, radiation, chemotherapy, and LNR) were selected based on univariate Cox regression. Multivariate Cox regression revealed that four factors (age, grade, T stage, and chemotherapy) were independent risk factors, each with a P-value below 0.05. In the multivariate analysis, age 60–70 years (HR = 5.936, 95% CI = 3.016–11.681, P < 0.001), age over 70 years (HR = 11.962, 95% CI = 6.303–22.703, P < 0.001), Grade III (HR = 2.324, 95% CI = 1.235–4.375, P = 0.009), Grade IV (HR = 3.148, 95% CI = 1.710–5.795, P < 0.001), T2 (HR = 3.162, 95% CI = 1.059–9.440, P = 0.039), T3 (HR = 4.300, 95% CI = 1.501–12.316, P = 0.007), T4 (HR = 4.414, 95% CI = 1.439–13.535, P = 0.009), and chemotherapy (HR = 1.721, 95% CI = 1.096–2.703, P = 0.018) emerged as independent risk factors for overall survival (OS). In contrast, radiation (HR = 0.750, 95% CI = 0.525–1.072, P = 0.114), LNR (HR = 0.868, 95% CI = 0.114–6.602, P = 0.891), and the other variables demonstrated no prognostic value (Table 2).
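As a reading aid for the hazard ratios above, the following minimal sketch shows how a reported HR and its 95% CI relate to the underlying Cox coefficient, assuming the intervals are standard Wald intervals on the log-hazard scale (an assumption; the text does not state how they were computed). The function name `wald_summary` is hypothetical.

```python
import math

def wald_summary(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-hazard coefficient, its standard error, the Wald z
    statistic, and a two-sided p-value from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(beta +/- z_crit * se)."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Chemotherapy row as reported in the text: HR = 1.721, 95% CI = 1.096 to 2.703.
beta, se, z, p = wald_summary(1.721, 1.096, 2.703)
print(f"beta={beta:.3f}, se={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under that assumption, the back-calculated p-value for the chemotherapy row comes out at roughly 0.018, in line with the reported value. On the log scale the HR should also sit midway between the CI bounds, which gives a quick consistency check on any row.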

Figure 2A displays the relationship between the LASSO coefficients and the regularization parameter, lambda (λ), and demonstrates the variable selection process and the effect of λ on the coefficients. The lambda.min value, which is the λ corresponding to the minimum likelihood deviance or the highest C-index, was used to select the tuning parameter in LASSO regression. The other vertical line marks lambda.1se, which corresponds to the most regularized model within one standard error of the minimum (Fig. 2B). The λ.min (λ = 0.0050724) was chosen for the best predictive performance. A ten-fold cross-validation was employed. Ten variables were chosen through the LASSO regression algorithm: age, gender, grade, T stage, N stage, radiation, chemotherapy, laterality, marriage, and LNR.

Using the maximum adjusted R-squared of the BSR, we selected eight variables: age, grade, stage, T stage, N stage, radiation, chemotherapy, and marriage (Fig. 3). In the RF model and XGBoost, we independently extracted the top 10 variables, excluding laterality, radiation (RF), and LNR (XGBoost) (Fig. 4). We compared the performance of the machine learning and traditional statistical screening methods using AUC and AIC. Reconfirmation with multivariate Cox stepwise backward regression identified LASSO, BSR, and XGBoost as the best of the five screening methods on both the AUC (AUC = 88.4) and AIC (AIC = 2118.9) criteria (Table 3).
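The lambda.min and lambda.1se rules described above can be sketched as follows. This is an illustration of the selection rule only, not the study's pipeline: the candidate penalties and per-fold errors below are made-up numbers, and `select_lambdas` is a hypothetical helper name.

```python
import math
import statistics

def select_lambdas(lambdas, fold_errors):
    """Pick lambda.min and lambda.1se from cross-validation results.

    lambdas[i] is a candidate penalty; fold_errors[i] is the list of
    per-fold CV errors (e.g. deviance) obtained with that penalty."""
    means = [statistics.mean(errs) for errs in fold_errors]
    ses = [statistics.stdev(errs) / math.sqrt(len(errs)) for errs in fold_errors]
    best = min(range(len(lambdas)), key=lambda i: means[i])
    lam_min = lambdas[best]
    # lambda.1se: the largest penalty whose mean error stays within one
    # standard error of the minimum mean error.
    threshold = means[best] + ses[best]
    lam_1se = max(lam for lam, m in zip(lambdas, means) if m <= threshold)
    return lam_min, lam_1se

# Made-up CV errors for four candidate penalties, three folds each.
lambdas = [0.001, 0.005, 0.02, 0.08]
fold_errors = [
    [10.2, 10.6, 10.4],
    [10.0, 10.4, 10.2],
    [10.1, 10.5, 10.3],
    [11.0, 11.4, 11.2],
]
lam_min, lam_1se = select_lambdas(lambdas, fold_errors)
print(lam_min, lam_1se)
```

Choosing lambda.min, as the study did, favors predictive performance and keeps more variables; lambda.1se trades a little performance for a sparser model.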

Predictor Screening: the least absolute shrinkage and selection operator (LASSO) regression and fivefold cross-validation.

Predictor Screening: A SHAP plot and a feature importance plot are visualizations used to interpret XGBoost model results.

Predictor Screening: (A) Random Forest importance plot; (B) Best Subset Regression (BSR), which selects the subset of predictor variables that best models the response variable.

Consequently, we constructed a nomogram with seven variables from the three algorithms (LASSO, BSR, and XGBoost): age, grade, tumor stage, node stage, chemotherapy, radiation, and marriage. Using these variables, we developed an OS nomogram capable of predicting a patient's 3-, 5-, and 10-year OS rates (Fig. 5). By converting clinical, pathological, and therapeutic factors into points, the nomogram accurately predicted OS. The total risk score, calculated by summing all points, correlated significantly with 3-, 5-, and 10-year OS. We used the 5-year ROC curve to determine the optimum risk score cut-off point. Kaplan–Meier curves revealed that patients in the low-risk group (risk score < 80.29) had a better survival prognosis than patients in the high-risk group (risk score ≥ 80.29; log-rank test, P < 0.001) (Fig. S1).

A survival nomogram for predicting overall survival (OS) for patients with P-MEC. (1) When using the nomogram, each of the seven predictors is converted to points based on patient-specific factors, and the sum of these points is located on the total points axis below, which corresponds to the 3-, 5-, and 10-year OS; (2) The optimal cut-off for total points was 80.29 (the median of the patients' points), which divided the patients into a high-risk group and a low-risk group.
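The scoring procedure in the caption can be sketched in a few lines. Only the seven variable names and the 80.29 cut-off come from the text; every point value and category label below is a hypothetical placeholder, since the real assignments have to be read off the published nomogram.

```python
# HYPOTHETICAL point assignments and category labels, for illustration only.
# None of these values come from the paper; read the real ones off Fig. 5.
POINTS = {
    "age": {"<60": 0, "60-70": 60, ">70": 100},
    "grade": {"I": 0, "II": 10, "III": 35, "IV": 45},
    "t_stage": {"T1": 0, "T2": 30, "T3": 40, "T4": 45},
    "n_stage": {"N0": 0, "N+": 20},
    "chemotherapy": {"no": 0, "yes": 20},
    "radiation": {"yes": 0, "no": 10},
    "marriage": {"married": 0, "unmarried": 10},
}
CUTOFF = 80.29  # risk-score cut-off reported in the text

def total_points(patient):
    """Sum the points for all seven predictors of one patient."""
    return sum(POINTS[var][patient[var]] for var in POINTS)

def risk_group(total, cutoff=CUTOFF):
    """Scores at or above the cut-off fall in the high-risk group."""
    return "high" if total >= cutoff else "low"

patient_a = {"age": "<60", "grade": "II", "t_stage": "T1", "n_stage": "N0",
             "chemotherapy": "no", "radiation": "yes", "marriage": "married"}
patient_b = {"age": ">70", "grade": "III", "t_stage": "T3", "n_stage": "N+",
             "chemotherapy": "yes", "radiation": "no", "marriage": "unmarried"}

for name, patient in (("A", patient_a), ("B", patient_b)):
    score = total_points(patient)
    print(name, score, risk_group(score))
```

Mapping the total score to a 3-, 5-, or 10-year OS probability additionally requires the baseline survival from the fitted Cox model, which is not reproduced here.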

We evaluated the predictive ability of our nomogram by constructing time-dependent receiver operating characteristic (ROC) curves at 3, 5, and 10 years. The ROC curves demonstrated excellent discriminative capacity of our model, with areas under the curves (AUCs) of 86.9 (95% CI = 83.3–90.6), 88.4 (95% CI = 83.5–91.4), and 87.7 (95% CI = 84.1–91.3), respectively (Fig. 6). This indicates that our model has high accuracy in predicting overall survival in parotid MEC patients.

(A–C) The calibration curves. The calibration curves of the nomogram predicting (A) 3-year, (B) 5-year, and (C) 10-year OS. (D–F) Time-dependent ROC curves. ROC curves for (D) 3-year, (E) 5-year, and (F) 10-year overall survival rates. (G–I) Decision curve analysis (DCA) plots. DCA plots for (G) 3-year, (H) 5-year, and (I) 10-year overall survival rates.

We also performed 1000 bootstrap resamplings of the dataset and generated calibration plots for the prediction model. The calibration plots showed that the curves closely aligned with the 45-degree line, indicating a well-calibrated model in practical use (Fig. 6). Furthermore, the 1000 bootstrap resamplings indicated good concordance between actual and predicted values in both the training and validation datasets, as evidenced by the C-index (3-year, 0.8499, 0.775–0.914; 5-year, 0.8557, 0.793–0.911; 10-year, 0.8375, 0.772–0.897) and AUC (3-year, 0.8670, 95% CI = 0.787–0.935; 5-year, 0.8879, 95% CI = 0.82–0.945; 10-year, 0.8767, 95% CI = 0.792–0.947) (Fig. 7). These results further support the reliability and accuracy of our prediction model.

This figure presents a bootstrap analysis of a dataset, displaying the 3-year and 5-year AUC and C-index values. The analysis was performed using 1000 bootstrap replicates. The figure demonstrates the accuracy and predictive power of the model for the specified time intervals.
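To make the bootstrap evaluation concrete, here is a minimal sketch of Harrell's C-index with a percentile bootstrap interval over 1000 resamples. It assumes a percentile interval and the simplest pairwise definition of concordance (the text does not specify either), and the toy data are invented for illustration.

```python
import random

def c_index(times, events, risks):
    """Harrell's C-index: among comparable pairs, the share in which the
    subject with the shorter observed time has the higher predicted risk.
    A pair is comparable when the shorter time ended in an event."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def bootstrap_ci(times, events, risks, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-index."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = c_index([times[k] for k in idx],
                    [events[k] for k in idx],
                    [risks[k] for k in idx])
        if c == c:  # drop resamples with no comparable pairs (NaN)
            stats.append(c)
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

# Invented toy data: follow-up in months, event flag (1 = death), risk score.
times = [5, 8, 12, 20, 25, 30, 40, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
risks = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

print(round(c_index(times, events, risks), 3))
print(bootstrap_ci(times, events, risks))
```

With only eight subjects the interval is very wide; the study's intervals rest on 882 patients.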

To determine the clinical utility of our prediction model, we used decision curve analysis (DCA). The DCA plot illustrates the net benefit of the prediction model across a spectrum of threshold probabilities. Our model demonstrates clinical utility, as its net benefit curve lies above both of the other lines across the range of threshold probabilities (Fig. 6). This suggests that our prediction model is more effective than TNM stage or grade and can aid in making clinical decisions for P-MEC patients.
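The net benefit plotted in a decision curve can be sketched as below. This uses the standard definition for a binary outcome, net benefit = TP/n − (FP/n) × pt/(1 − pt); it is a simplification, because survival outcomes at a fixed horizon also need an adjustment for censoring that is not shown. The "treat all" reference strategy and the toy data are assumptions for illustration, not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted probability
    is at or above the threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on everyone regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Invented toy data: observed outcome (1 = event) and predicted probability.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

A model is considered useful at a given threshold when its net benefit exceeds that of the reference strategies; "treat none" always has a net benefit of zero.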

In summary, our nomogram exhibited excellent predictive ability and calibration, as well as clinical utility, indicating its potential usefulness in clinical practice.
