A total of 13,170 participants were recruited (n=5780 people infected to SARS-COV-2 (case) and n=7390 individuals without SARS-COV-2 (control)). Based on Table 1, participants with SARS-COV-2 were significantly older than the control group (59.298.54 versus 56.979.03 years, respectively). In addition, BMI, diastolic blood pressure (DBP), systolic blood pressure (SBP), blood urea nitrogen (BUN), sex, smoking status, serum zinc, copper, creatinine (Cr), cholesterol, triglyceride, high sensitivity C-Reactive Protein (hs-CRP), fasting blood glucose (FBG), serum phosphorus, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), serum gamma glutamyl transferase (Gamma-GT), creatine phosphokinase (CPK), serum calcium, serum total bilirubin, serum direct bilirubin, aspartate aminotransferase (AST), alanine transaminase (ALT), alkaline phosphatase (ALP), serum uric acid and magnesium showed significant differences between groups. Several hematological factors, white blood cells (WBC), red blood cells (RBC), hemoglobin, hematocrit, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), red cell distribution width (RDW), platelet distribution width (PDW), and mean platelet volume (MPV) were higher compared to the control group (P-value<0.05).
We have attempted to use the LR, DT, and BF models to diagnostic COVID-19 tested participants and their biochemical and hematologic features. In this regard, the data were divided into two parts as training and test data (80%-20%), randomly. The models are validated using test data (20%) and built on the training dataset. Results of the LR algorithm illustrated that biochemical factors (Model I), such as age, smoking status, sex, DBP, SBP, BUN, BMI, hs-CRP, FBG, HDL-C, AST, ALT, CPK, total bilirubin, iron, magnesium, and Gamma-GT were correlated with COVID-19 status (P-value<0.05). In Model I, the BMI, BUN, age variables have been defined as the most crucial variable with high OR by the LR algorithm. With a unit increase in BMI, the chance of being Cov+was 1.092 times. With a year increase in age, the chance of being Cov+was 1.048 times, and with a unit increase in BUN, the chance of being Cov+was 1.041 (see Table 2). In Model II, BMI, age, hemoglobin, hematocrit, sex, MPV, smoking status, and MCHC were significant (P-value<0.05). The hemoglobin had an OR equal to 4.292, so, the chance of being Cov+was 4.292 times. The MPV had an OR equal to 1.550, so, the chance of being Cov+was 1.550 times. Table 3 showed the other variables and values of effect. In Model III, CPK, BMI, MPV, FBG, sex, BUN, Cr, iron, magnesium, total bilirubin, hemoglobin, hematocrit, MCHC, smoking status, age, WBC, HDL-C, and ALT were correlated with COVID-19 status (P-value<0.05). The total bilirubin and MPV had an OR 1.647 and 1.447, so, the chance of being Cov+was 1.647 and 1.447 times, respectively (see Table 4). Based on Table 5, for LR algorithm the accuracy of three models (Model I, II, and III) were 75.13%, 68.28%, and 69.63%, respectively. The other performance indices were given in Table 5 (a), (d), and (g).
In the training phase of DT, the important variables were selected and the final tree is given after pruning. Models I, II, and III runs with 17, 8, and 18 variables as input, respectively. In Model I, CPK, age, BUN, BMI, ALP, sex, total bilirubin, hs-CRP, FBG, and Gamma-GT, in Model II, age, MPV, sex, BMI, hemoglobin, and MCHC, and in Model III, CPK, Cr, BUN, BMI, FBG, age, MPV, MCHC, sex, and total bilirubin variables remained in models. Based on Table 5, the tree is made based on biochemical, hematologic, and both of the variables (Model I, Model II, and Model III, respectively) that had 73.24%, 70.53%, and 68.80% accuracy on the training data, respectively. The other performance indices were given in Table 5 (b), (e), and (h).
The rules from DTs for Model I, II, and III is shown in Table 6. Rule 1 in Model I was illustrated that in a subgroup with CPK>=114.09 & BUN>=30.00 & BMI>=26.77 & Age>=54.00 & Gamma-GT>=16.91, the chance or probability of having Cov+was 84.69%. In another subgroup, CPK<114.09 & CPK<88.06 & Sex(female) & ALT<9.00 led to a 6.57% chance of having Cov+. The rules from Model II, were illustrated that there was an 86.46% chance that participants with features such as Age>=54.00 & BMI>=26.77 & MPV>=9.60 & Sex(male) & Hemoglobin<15.8 be infected with COVID-19. Another rule was suggested that the probability of Cov+in individuals with Age<54.00 & MPV<9.10 was 12.26%. The rules from Model III, were illustrated that there was an 88.15% chance that participants with features such as CPK>=114.09 & BUN>=30.00 & BMI>=26.77 & Age>=54.00 & MPV>=9.60 & MCHC<35.6 be infected with COVID-19. Another rule was suggested that the probability of Cov+in individuals with CPK<114.09 & Cr<1.40 & Cr<1.00 & FBG<118.34 & Sex(female) was 9.90%. Other rules were stated in Table 6.
Hence, the CPK and BUN for Model I, age, BMI, and MPV for Model II, and CPK and BUN for Model III were defined as most crucial variables. The final DT is shown in Figs.2, 3, and 4.
Graphical representation of the classification tree introduced for SARS-COV-2 diagnosis for Model I
Graphical representation of the classification tree introduced for SARS-COV-2 diagnosis for Model II
Graphical representation of the classification tree introduced for SARS-COV-2 diagnosis for Model III
In the final step, for another analysis we applied BF for analyzing the data based on COVID-19. The factors included in the BF algorithm were 17, 8, and 18 variables for Model I, II, and III, respectively. Moreover, we set the following specifications for Model I: Number of Trees in the Forest: 29 for Model I, 13 for Model II, and 53 for Model III, Number of Terms Sampled per Split: 4 for Model I, 2 for Model II, and 4 for Model III, Training Rows: 10,536, Test Rows: 2634, Minimum Splits per Tree: 10, Minimum Size Split: 13 for all three models. Confusion matrix and evaluation indices for comparison of the models I, II, III were stated in Table 5 (c), (f), and (i). Additionally, the crucial variables related to COVID-19 based on BF algorithm were: CPK, BUN, FBG, BMI, total bilirubin, and age in Model I, BMI, sex, MPV, and age in Model II, and CPK, Cr, FBG, BMI, BUN, total bilirubin, sex, MPV, and age for Model III. As one can check the obtained features from BF algorithm were equal to the obtained factors from LR and DT algorithms.
Read more here:
Association between biochemical and hematologic factors with COVID-19 using data mining methods - BMC Infectious ... - BMC Infectious Diseases
Read More..