Clustering of trauma patients based on longitudinal data and the application of machine learning to predict recovery | Scientific Reports – Nature.com

Principal component analysis

The development of a supervised machine learning model that can predict the recovery profile of trauma patients requires labelled data. Since the BIOS data set does not contain patient classifications based on recovery, the first step is to cluster patients by their similarity across the outcome variables that represent health condition (unsupervised learning). Similarity-based clustering has been investigated intensively, both from a statistical modeling point of view and with Machine/Deep Learning approaches [48]. Preliminary analyses showed that, although the variables measuring recovery are somewhat correlated, clinical expertise suggests separating the recovery variables dealing with physical status from those dealing with psychological function. In this study, we therefore focus on four cases for the extraction of the clusters: Physical Health (with and without pre-injury scores), Psychological Health, and General Health.

For Physical Health, we implement two cases: (i) longitudinal post-injury profiles with four variables, namely EQ-5D, EQ-VAS, HUI2 and HUI3, and (ii) longitudinal profiles including pre-injury data with two variables, EQ-5D and EQ-VAS (for which the pre-injury values Pre-injury EQ-5D and Pre-injury EQ-VAS were available). For Psychological Health, the variables HDSA, HDSD and IES are used. Finally, General Health combines the Physical (without pre-injury values) and Psychological variables (EQ-5D, EQ-VAS, HUI2, HUI3, HDSA, HDSD and IES). Because these seven variables are each measured at six time frames, the General Health set contains forty-two variables, and PCA on it yields forty-two components. To investigate how the Physical Health variables correlate with the Psychological variables, PCA was carried out to visualize the correlations between variables and inspect their loadings on the principal components.
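The forty-two General Health inputs come from placing seven variables at six time frames side by side, with one row per patient. Below is a minimal Python sketch of that reshaping. It assumes the scores are held in a NumPy array of shape (patients, variables, time frames). The array layout, the T1 to T6 labels and the synthetic data are illustrative assumptions, not the study's data pipeline.

```python
import numpy as np

# Variable names as listed in the text; the T1..T6 time-frame labels are
# hypothetical placeholders for the six follow-up moments.
VARIABLES = ["EQ-5D", "EQ-VAS", "HUI2", "HUI3", "HDSA", "HDSD", "IES"]
TIME_FRAMES = ["T1", "T2", "T3", "T4", "T5", "T6"]


def to_wide(scores: np.ndarray) -> tuple[np.ndarray, list[str]]:
    """Flatten (patients, variables, time frames) into (patients, variables * time frames)."""
    n_patients, n_vars, n_times = scores.shape
    # C-order reshape: all time frames of the first variable come first,
    # then all time frames of the second variable, and so on.
    wide = scores.reshape(n_patients, n_vars * n_times)
    columns = [f"{v}_{t}" for v in VARIABLES[:n_vars] for t in TIME_FRAMES[:n_times]]
    return wide, columns


rng = np.random.default_rng(0)
scores = rng.normal(size=(100, len(VARIABLES), len(TIME_FRAMES)))  # synthetic stand-in
X, columns = to_wide(scores)
print(X.shape)      # (100, 42)
print(columns[:2])  # ['EQ-5D_T1', 'EQ-5D_T2']
```

A real data set would first need its long-format records pivoted into this three-dimensional layout. Missing follow-up values would also need handling before PCA.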

The first three components explain more than 60% of the total variance of the forty-two components (see Fig. S4 in the Supplementary information), and the largest increases in cumulative explained variance occur over these first three components. We therefore retained the first three components for further analysis. Figure 1 displays the PCA biplot for the first and third components. Colors represent the ten patient clusters derived by the kml3d method from the set of General Health variables. The first dimension represents the general health condition of the patients: positive values indicate good health two years after trauma, while negative values indicate poor health. The cluster centers move consistently from positive to negative values as we move from the first clusters A, B and C (high initial health and high recovery) to the last clusters J, I and H (low initial health and low recovery) (Fig. 1).
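Choosing how many components to keep comes down to the cumulative explained-variance ratio. Below is a minimal NumPy sketch computed via the singular value decomposition. It assumes standardised columns, which the text does not state, and uses synthetic data with three latent factors in place of the BIOS variables. The function names are hypothetical.

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Share of total variance captured by each principal component of X (rows = patients)."""
    # Standardise columns (an assumption; the text does not say whether scaling was used).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the centred matrix are proportional to component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    return s**2 / np.sum(s**2)


def components_needed(ratio: np.ndarray, threshold: float) -> int:
    """Smallest number of leading components whose cumulative share exceeds threshold."""
    cumulative = np.cumsum(ratio)
    return int(np.searchsorted(cumulative, threshold, side="right")) + 1


# Synthetic data: 42 columns driven by 3 latent factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 42)) + 0.3 * rng.normal(size=(500, 42))

ratio = explained_variance_ratio(X)
print(np.round(np.cumsum(ratio)[:5], 3))  # look for where the increments flatten
print(components_needed(ratio, 0.60))
```

The printed cumulative shares show where the increments flatten. This "biggest increase" reading, together with a variance threshold, is what the text uses to retain three components.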

PCA biplot for the set of General Health variables with indication of the ten clusters obtained with kml3d.

Moreover, the Psychological and Physical Health vectors are correlated with each other, since they generally point in the same direction along the first dimension. The third dimension represents time: variables from the first and second time frames lie at positive values of component three, whereas the last three time frames lie at negative values. The correlation among the General Health variables becomes stronger (the vectors increasingly overlap) from the first to the last time frames. (The second dimension separates the physical from the psychological variables; see Fig. S4 in the Supplementary information.)

For the clusters obtained with kml3d, the optimal number of clusters is determined with the gap statistic using the "NbClust" package in R [45]. The gap statistic gives the optimal number of clusters per set of variables (Fig. 2): eight for Physical Health, nine for Psychological Health, ten for General Health and eight for Physical Health with pre-injury values. For the clusters obtained with HDclassif and Deepgmm, a grid search combined with the BIC was run per set of variables to determine the optimal number of clusters and the parameter settings; the results are presented in Table 1.
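The gap statistic compares the log within-cluster dispersion of the data with its expected value under a structure-free uniform reference. It then picks the smallest k with Gap(k) ≥ Gap(k+1) − s(k+1), following the criterion of Tibshirani, Walther and Hastie. The study applied NbClust in R to kml3d clusterings. The Python sketch below instead uses plain k-means on toy two-dimensional blobs, so it illustrates the criterion only, not the authors' pipeline. All function names are hypothetical.

```python
import numpy as np


def _sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def within_ss(X, k, rng, n_init=10, n_iter=100):
    """Smallest within-cluster sum of squares W_k found by k-means (k-means++ seeding)."""
    best = np.inf
    for _ in range(n_init):
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:  # k-means++: sample new centers proportional to squared distance
            d2 = _sq_dists(X, centers).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(n_iter):  # Lloyd iterations
            labels = _sq_dists(X, centers).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        best = min(best, _sq_dists(X, centers).min(axis=1).sum())
    return best


def gap_statistic(X, k_max=6, n_refs=10, seed=0):
    """Return the k chosen by the gap criterion and Gap(k) for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_ss(X, k, rng))
        # Reference data: uniform over the bounding box of X (no cluster structure).
        ref = np.array([np.log(within_ss(rng.uniform(lo, hi, size=X.shape), k, rng))
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps


rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centres])  # three toy blobs
k, gaps = gap_statistic(X)
print(k)
```

The uniform bounding-box reference is the simplest variant. The original proposal also describes a reference aligned with the principal components of the data.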

Optimum number of clusters with kml3d for the four different cases of variables and k-means.
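For reference, the BIC used in such a grid search is shown below in its usual form, where lower is better. Here $\hat{L}_K$ is the maximised likelihood of the model with $K$ clusters and a given parameter setting, $p_K$ is its number of free parameters and $n$ is the number of patients. Some packages report the sign-flipped quantity $2\ln\hat{L}_K - p_K\ln n$ and select its maximum instead, so it is worth checking which convention a package uses before comparing values.

```latex
\mathrm{BIC}(K) = -2\ln\hat{L}_K + p_K \ln n,
\qquad
\hat{K} = \arg\min_{K} \mathrm{BIC}(K)
```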

In general, the optimal number of clusters decreases when HDclassif and Deepgmm are applied instead of kml3d. In addition, the clusters obtained with kml3d are generally more balanced (the majority baseline is at most 26.15%). In contrast, the Deepgmm clustering method produces imbalanced clusters with high majority baselines, except in the case of Physical Health.
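The majority baseline is the accuracy a classifier would reach by always predicting the largest cluster. This equals that cluster's share of patients, so a value near 1 signals strongly imbalanced clusters. A minimal Python sketch with hypothetical cluster labels:

```python
from collections import Counter


def majority_baseline(labels) -> float:
    """Share of the largest cluster: the accuracy of always predicting that cluster."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


labels = ["A"] * 26 + ["B"] * 24 + ["C"] * 20 + ["D"] * 30  # hypothetical cluster labels
print(f"{majority_baseline(labels):.2%}")  # 30.00%
```

A classifier only adds value over this trivial rule if its accuracy clearly exceeds the baseline. This is why high-accuracy predictions on imbalanced clusters need a second look.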

To predict the outcome class of the patients, we use the labels generated in the clustering step as the prediction target for several supervised machine learning models. In the following example, the model comparison focuses on predicting the six class labels derived from clustering the Physical Health variables including pre-injury values with the HDclassif method. We used Logistic Regression, Random Forest and XGBoost with different settings for under- or over-sampling and hyperparameters. All models were compared under 5-fold cross-validation, and Table 2 reports the mean macro F1 score and the 95% CI of the accuracy for this example comparison. We report the macro F1 score alongside accuracy because the data sets are imbalanced and all classes are equally important. Over-sampling improves classification and results in higher accuracy, and Random Forest and XGBoost outperform Logistic Regression in this case.
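Macro F1 averages the per-class F1 scores without weighting by class size, so a model that ignores small clusters is penalised even when its accuracy looks good. Below is a minimal NumPy sketch of both metrics. It assumes the 95% CI is a Student-t interval over the five fold scores; the text does not say how its interval was computed. The function names and toy labels are hypothetical.

```python
import numpy as np


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 scores, so small clusters weigh as much as large ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        f1s.append(2 * tp / (2 * tp + fp + fn))  # denominator > 0 because c occurs somewhere
    return float(np.mean(f1s))


def mean_with_ci95(fold_scores):
    """Mean of five cross-validation fold scores with a t-based 95% confidence interval."""
    scores = np.asarray(fold_scores, dtype=float)
    assert len(scores) == 5, "t critical value below is for 5 folds (4 degrees of freedom)"
    t_crit = 2.776  # two-sided 95% quantile of Student's t with 4 degrees of freedom
    half_width = t_crit * scores.std(ddof=1) / np.sqrt(len(scores))
    mean = float(scores.mean())
    return mean, (mean - float(half_width), mean + float(half_width))


# A majority-class predictor on imbalanced labels: accuracy looks fine, macro F1 does not.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(np.mean(np.array(y_true) == np.array(y_pred)))  # 0.8
print(round(macro_f1(y_true, y_pred), 3))              # 0.444
print(mean_with_ci95([0.70, 0.72, 0.74, 0.71, 0.73]))
```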

For this reason, we used Random Forest with over-sampling to predict the classes derived from all clustering attempts with the three clustering methods (Table 3). The best classification results are observed for the clusters obtained with the Deepgmm method. However, the majority baselines for these cluster solutions are high (from 61.02% to 84.70%), meaning that the clusters are highly imbalanced. A more detailed methodology for evaluating the clusters on clinical sensibleness, based on medical expertise, is described in the next section.
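The text does not specify the over-sampling scheme, so this Python sketch assumes plain random over-sampling. Rows of smaller classes are duplicated at random until every class matches the largest one. SMOTE-style synthetic sampling would be an alternative. The function name and toy data are hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly drawn rows of smaller classes until all classes match the largest."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    keep = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Draw with replacement; size is 0 for the largest class.
        extra = rng.choice(members, size=counts.max() - n, replace=True)
        keep.append(np.concatenate([members, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]


X = np.arange(20).reshape(10, 2)
y = np.array([0] * 7 + [1] * 3)
X_res, y_res = random_oversample(X, y)
print(np.unique(y_res, return_counts=True))
```

Within cross-validation, resampling of this kind should be applied only to the training part of each fold. Otherwise duplicated patients leak into the evaluation fold and inflate the scores.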

To better understand the predictions, we applied a feature selection technique called Boruta to the prediction models [47]. Boruta is a feature selection algorithm implemented as a wrapper around Random Forest. Table 4 presents the prediction accuracy for General Health both with all 26 predictors and with only the important predictors selected by Boruta. For kml3d and HDclassif, the same seven predictors are identified as important. For Deepgmm, the important predictors are the same except for BMI, and they include additional predictors such as Category accident, Education level, Traumatic brain injury, Gender and Pre-injury cognition. Boruta feature selection did not reduce accuracy, yielding simpler models without compromising classification accuracy.
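At its core, Boruta compares each real feature with "shadow" copies whose values have been shuffled, so any importance the shadows receive is due to chance. Features that beat the best shadow significantly often across iterations are confirmed. The Python sketch below shows only this shadow-feature logic, with a one-sided binomial test on the hit counts. To stay self-contained, it swaps in a stand-in importance (absolute Pearson correlation with a numeric target) where Boruta proper uses Random Forest importances. It also omits Boruta's multiple-testing correction and its iterative removal of rejected features. All function names are hypothetical.

```python
import math

import numpy as np


def abs_corr_importance(X, y):
    """Stand-in importance (NOT Random Forest): |Pearson correlation| of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def binomial_tail(k, n, upper):
    """P(H >= k) if upper else P(H <= k), for H ~ Binomial(n, 0.5)."""
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(math.comb(n, i) for i in ks) / 2**n


def shadow_feature_selection(X, y, importance=abs_corr_importance, n_iter=50, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for _ in range(n_iter):
        # Shadows: every column shuffled independently, breaking any link with y.
        shadows = rng.permuted(X, axis=0)
        imp = importance(np.hstack([X, shadows]), y)
        # A "hit" means the real feature beats the best shadow feature in this round.
        hits += imp[:n_features] > imp[n_features:].max()
    decisions = []
    for h in hits.tolist():
        if binomial_tail(h, n_iter, upper=True) < alpha:
            decisions.append("confirmed")
        elif binomial_tail(h, n_iter, upper=False) < alpha:
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return decisions, hits


rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] + 0.5 * rng.normal(size=300)  # only column 0 is informative
decisions, hits = shadow_feature_selection(X, y)
print(decisions, hits)
```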

In the previous section, models with high accuracy were developed for the classification of patients. In particular, clusters derived from Deepgmm are predicted with high accuracy using Random Forest with over-sampling. Since the obtained clusters cannot be evaluated directly against observable ground-truth classes, our strategy to arrive at sensible and functional models is to combine several quality indicators: statistical criteria, machine learning metrics, and a cluster quality assessment based on medical expertise (clinical sensibleness) in relation to known risk factors for recovery. This section presents an example of the cluster quality assessment for clusters obtained with three different methods. For illustration, we selected three cases representing highly, medium and poorly sensible clustering (Table 5).

For General Health with the HDclassif method, the optimal number of clusters is six. Table 5 presents the descriptive statistics per cluster, with the clusters ordered from the youngest to the oldest patients. In this highly sensible model (+++), age tends to increase across clusters: the mean age is around fifty-eight for patients in the first cluster (cluster 1) and around seventy-five for patients in the last cluster (cluster 6). Older patients also have more comorbidities and higher frailty. Moreover, younger, less frail patients stay in the hospital for fewer days and have lower severity scores than older, frailer patients. The percentage of females increases from the first to the last clusters. For hip fracture injuries, the cluster quality assessment shows that the last clusters contain a higher percentage of patients with this known risk factor for poor recovery. The medium and poorly sensible models do not reproduce these demographic risk-factor differences across clusters as clearly.
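Ordering clusters by mean age and inspecting per-cluster summaries, as done for Table 5, takes only a few lines of Python. The records and field names below are hypothetical. The study's table also covers comorbidities, days admitted, injury severity and gender.

```python
# Hypothetical records: (cluster, age, frailty score, hip fracture).
patients = [
    ("c2", 70, 3, False), ("c1", 55, 1, False), ("c3", 80, 5, True),
    ("c1", 61, 2, False), ("c3", 74, 6, True), ("c2", 68, 4, True),
]


def describe_by_cluster(records):
    """Per-cluster summaries, ordered from youngest to oldest mean age."""
    grouped = {}
    for cluster, age, frailty, hip_fracture in records:
        grouped.setdefault(cluster, []).append((age, frailty, hip_fracture))
    summary = {}
    for cluster, rows in grouped.items():
        n = len(rows)
        summary[cluster] = {
            "n": n,
            "mean_age": sum(r[0] for r in rows) / n,
            "mean_frailty": sum(r[1] for r in rows) / n,
            "pct_hip_fracture": 100 * sum(r[2] for r in rows) / n,  # True counts as 1
        }
    return sorted(summary.items(), key=lambda item: item[1]["mean_age"])


for cluster, stats in describe_by_cluster(patients):
    print(cluster, stats)
```

Once the clusters are ordered this way, a clinician can check whether frailty, hip fracture share and the other risk factors rise along with age, as they do in the reference case.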

Recovery of the patients is measured with various parameters. For two of them, EQ-5D and EQ-VAS, pre-injury estimated baseline values are also available. These values describe the self-reported physical condition of the patients before their injury and can thus serve as a baseline for analyzing patient recovery (top two graphs in Fig. 3). As the two graphs show, EQ-5D and EQ-VAS dip below baseline (set at 100%) and then recover over time. Patients in the first clusters (1, 2) recover almost completely, while for patients in the last clusters (5, 6) recovery reaches about 60 to 84% of baseline, depending on the variable.
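Expressing follow-up scores relative to the pre-injury baseline (set at 100%) is simple arithmetic. The text does not say whether per-patient ratios or ratios of cluster means were plotted. This Python sketch assumes the latter, which avoids unstable ratios for patients whose baseline is near zero (EQ-5D index values can even be negative). The data and function name are hypothetical.

```python
import numpy as np


def recovery_vs_baseline(pre_injury, follow_up, clusters):
    """Per-cluster mean follow-up score as a percentage of that cluster's mean pre-injury score."""
    pre = np.asarray(pre_injury, dtype=float)   # shape (patients,)
    post = np.asarray(follow_up, dtype=float)   # shape (patients, time frames)
    clusters = np.asarray(clusters)
    result = {}
    for c in np.unique(clusters):
        m = clusters == c
        # Baseline = 100%; assumes the cluster's mean pre-injury score is positive.
        result[str(c)] = 100.0 * post[m].mean(axis=0) / pre[m].mean()
    return result


pre = [90, 80, 85, 75]                      # hypothetical pre-injury EQ-VAS
post = [[60, 80, 88], [50, 70, 82],         # cluster "1"
        [40, 50, 60], [40, 50, 52]]         # cluster "2"
print(recovery_vs_baseline(pre, post, ["1", "1", "2", "2"]))
```

In this toy example, cluster "2" ends at 70% of its baseline while cluster "1" returns to 100%. This is the kind of contrast the top graphs of Fig. 3 show between early and late clusters.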

The two graphs at the top present recovery based on EQ-VAS and EQ-5D for the case of General Health with HDclassif. The two graphs at the bottom depict psychological condition (high values indicate high stress and anxiety) of various clusters after the injury.

Psychological condition is also relevant to patient recovery and is plotted in the bottom graphs of Fig. 3. It is measured with three parameters, namely HDSA (Anxiety), HDSD (Depression) and IES. The same trend over time after the accident is observed for these parameters. More specifically, patients in the first clusters (high recovery) have low levels of depression and anxiety. For the first three clusters (1, 2 and 3), the level of stress decreases over time, whereas for clusters 4, 5 and 6 the levels of stress and anxiety remain high for a month and then start to decrease.

According to medical experience, the clusters obtained for the General Health case with the HDclassif method meet expectations and agree with the prototypical cases observed at the hospital. In particular, old females with high frailty and a hip fracture form a characteristic group at the hospital that typically shows low recovery. In contrast, younger male patients with fewer comorbidities, a low severity score and fewer days of hospital admission recover completely and have low levels of stress and anxiety.

To select a rational and functional model that makes clinical sense, we therefore performed the cluster quality assessment described in the previous paragraphs for each cluster model case. General Health with the HDclassif method (highly sensible) serves as the reference. The results are presented in Table 3. Based on this assessment, each case is rated for clinical sensibleness as Poorly sensible (+), Medium sensible (++) or Highly sensible (+++). Highly sensible clusters are those for which the cluster quality assessment reveals discrete clusters with the same trends and characteristics as the reference case (General Health with the HDclassif method), matching clinical experience. Conversely, when the clusters are not discrete or lack the characteristics of the reference group, the model is rated as less adequate. This is the case, for example, for the clusters obtained for Psychological Health with Deepgmm (Table 5). The descriptive statistics for this case reveal that the clusters are not discrete and do not follow the characteristic trends for frailty, comorbidities, severity score or days admitted to the hospital. Gender and hip fracture do not follow the trend of the reference case either.

Performing the cluster quality assessment together with medical experts, we found cases where the clusters partly match those of the reference case. In such cases not all clusters are discrete, and some clusters have similar properties, although some of the cluster trends match those of the reference group. Table 5 presents an example of such a medium sensible case: Physical Health (pre-injury) clustered with kml3d. Although trends between the clusters are observed for the different variables, some clusters, such as (B, C) and (A, E), are not discrete and do not follow the general trend of the reference clusters. More specifically, even though cluster C contains patients with slightly lower Age than cluster B, its mean Frailty and Comorbidities values are higher.

A supplementary method to quantify the separability of the obtained clusters is MANOVA. Specifically, a non-parametric MANOVA (using the function adonis from the R package "vegan" [49]) is run for the clusters of all cases on the variables Age, Frailty, Comorbidities, Injury severity score, Pre-injury EQ-5D and T6-EQ-5D. We chose a non-parametric MANOVA because the assumptions of classical MANOVA (homogeneity of the variances and normality within the groups) were not met for our data. The assumptions were examined using the function assumptions manova. The non-parametric MANOVA revealed a strong relation between the value of the F statistic and the clinical sensibleness of the clusters. More precisely, F values between 79.71 and 101.45 (separable clusters, very low p-values) are obtained for the highly sensible models. For medium sensible clusters F is between 5.99 and 8.32, while for inadequate clusters F is between 1.21 and 2.98. For the non-sensible clusters, the difference between clusters is not significant, with p-values above the chosen threshold of 5%. Notably, for Physical Health with the Deepgmm method, the non-parametric MANOVA reveals a statistically significant difference between the obtained clusters, F(35, 3880) = 101.45, p < 0.001. In contrast, for Psychological Health with Deepgmm, the non-parametric MANOVA indicates that the separability of the clusters is not statistically significant, F(35, 3880) = 1.21, p = 0.30.
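vegan's adonis performs a permutational MANOVA (PERMANOVA). It computes a pseudo-F statistic from pairwise distances and obtains the p-value by permuting the cluster labels, so no normality assumption is needed. Below is a minimal NumPy sketch of that idea. It assumes Euclidean distances on already standardised variables and uses synthetic data. adonis defaults to Bray-Curtis dissimilarity, and the text does not state which distance was used. This is not the vegan implementation, and the function name is hypothetical.

```python
import numpy as np


def permanova(X, labels, n_perm=999, seed=0):
    """Pseudo-F for separation of the labelled groups and its permutation p-value."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    groups = np.unique(labels)
    g = len(groups)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean distances
    ss_total = d2[np.triu_indices(n, 1)].sum() / n

    def pseudo_f(lab):
        ss_within = 0.0
        for grp in groups:
            idx = np.flatnonzero(lab == grp)
            block = d2[np.ix_(idx, idx)]
            ss_within += block[np.triu_indices(len(idx), 1)].sum() / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (g - 1)) / (ss_within / (n - g))

    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(labels)
    f_perm = np.array([pseudo_f(rng.permutation(labels)) for _ in range(n_perm)])
    # Add-one correction so the observed labelling counts as one permutation.
    p_value = (np.sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    return float(f_obs), float(p_value)


rng = np.random.default_rng(7)
X = np.vstack([rng.normal(size=(30, 3)), rng.normal(loc=3.0, size=(30, 3))])
labels = ["a"] * 30 + ["b"] * 30
print(permanova(X, labels))
```

In this sketch the pseudo-F has (g − 1, N − g) degrees of freedom, where g is the number of clusters and N the number of patients. With 999 permutations, the smallest attainable p-value is 0.001.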

The models are further evaluated graphically with t-distributed stochastic neighbor embedding (t-SNE) plots. The Supplementary information presents the t-SNE plots of two extreme cases: General Health with kml3d (10 clusters, high clinical sensibleness) and Psychological Health with Deepgmm (6 clusters, low clinical sensibleness) (see Fig. S5 and Fig. S6 in the Supplementary information). For General Health with kml3d, the t-SNE visualisation shows discrete clusters in the two-dimensional space, whereas for Psychological Health with Deepgmm the groups overlap substantially.

Table 3 shows that the best General Health model is achieved with the HDclassif method: its accuracy is almost 74%, and the cluster quality assessment indicates that the obtained clusters are sensible. For Physical Health, the best model with both high accuracy (91.30%) and sensible clusters is derived with the Deepgmm method. The cluster quality assessment for HDclassif on Physical Health with pre-injury measurements shows highly sensible clusters, but accuracy is much lower (69.12%) than with Deepgmm. Finally, for Psychological Health, using only variables related to the psychological condition of the patients does not lead to highly sensible (+++) clusters for any method. This suggests that these outcome measures are not related to traditional risk factors for physical recovery but capture a different dimension.
