Uncovering hidden and complex relations of pandemic dynamics using an AI driven system | Scientific Reports – Nature.com

This section presents the experimental results and comprehensive evaluations of BayesCovid. We will explicitly discuss the results of the algorithms applied to the clinical datasets to uncover the hidden patterns of COVID-19 symptoms.

We set up a Spark on Hadoop Yarn cluster consisting of 4 EC2 machines, 1 master and 3 workers in AWS to deploy BayesCovid. We chose Ubuntu Server 20.04 LTS as the operating system for all the machines and installed Hadoop version 3.3.2 and Spark 3.3.1. All the nodes have 4 cores and 16 GB of memory.

The dataset, prepared by Carbon Health and Braid Health [14], was obtained through RT-PCR tests of 11,169 individuals living in the United States, of whom approximately 3% tested COVID-positive and 97% tested COVID-negative. Carbon Health began collecting the data in early April 2020 under the anonymity standard of the Health Insurance Portability and Accountability Act (HIPAA) privacy rule. The dataset covers multiple feature categories, including epidemiological (Epi) factors, comorbidity, vital signs, healthcare worker-identified, patient-reported, and COVID-19 symptoms. In addition, patient information such as heart rate, temperature, diabetes, cancer, asthma, smoking and age is also available. The Carbon Health team also gathered the datasets of the Braid Health team, which collected radiological information, including CXR information. The dataset includes patients with one or more symptoms as well as patients with no symptoms, and we used only the COVID-19 symptom information indicated in Fig. 1. Radiological information was not included in the analysis. Table 2 shows the statistical information of the COVID-19 dataset. We have 18,538 test results for 11 different COVID-19 symptoms and COVID severity values, belonging to 11,169 individuals. Moreover, Table 3 shows the number of false (negative) and true (positive) values for each symptom.

Cross-validation is an important step in assessing the predictive power of models while mitigating the risk of overfitting [15]. To rigorously evaluate our models, we implemented ten-fold cross-validation by dividing the dataset into ten equal parts. During each iteration, one part served as the validation/test set, while the remaining nine were used for model training. This process was repeated ten times, and the resulting accuracies were averaged across all folds to assess each model's performance comprehensively. Importantly, ten-fold cross-validation ensures that every instance in the dataset is used exactly once for testing and nine times for training, which minimises the risk of overfitting [16].
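The ten-fold procedure described above can be sketched in plain Python as follows. This is a minimal illustration, not the BayesCovid implementation: the function names (`k_fold_indices`, `cross_validate`), the majority-class toy model, and the toy data are all assumptions introduced here.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split indices 0..n_samples-1 into k disjoint folds of nearly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def cross_validate(records, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the mean accuracy over k folds.

    train_fn(train_records, train_labels) -> model
    predict_fn(model, record) -> predicted label
    """
    folds = k_fold_indices(len(records), k, seed)
    accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(records)) if i not in held_out_set]
        model = train_fn([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, records[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)

# Toy usage with a majority-class "model" (hypothetical, for illustration only).
def train_majority(train_records, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, record):
    return model

toy_records = [[i % 2] for i in range(100)]
toy_labels = [1 if i < 70 else 0 for i in range(100)]
mean_accuracy = cross_validate(toy_records, toy_labels,
                               train_majority, predict_majority)
print(f"mean accuracy over 10 folds: {mean_accuracy:.3f}")
```

Each index appears in exactly one held-out fold and in the training set of the other nine iterations, which is the property the paragraph above relies on.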

This subsection explains three distinct Bayesian networks: Naïve Bayesian, Tree-Augmented Naïve Bayesian, and Complex Bayesian models. These models have unveiled intricate and concealed patterns within COVID-19, offering valuable insights into the complex dynamics and relationships underlying the disease.

Figure 3a depicts the dependencies for the Naïve Bayesian algorithm, where the class variable, COVID severity, is the only parent of each symptom and there is no link between symptoms. Figures 4 and 5 show the probability percentages of the symptoms for their positive and negative values. For example, in Fig. 4, while the probability of diarrhea is around 3% for COVID severity level 1, the probability of this symptom for level 3 is about 95%. Moreover, the probabilities of shortness of breath for levels 1, 2, 3, and 4 are very low, about 5%, whereas the likelihood of having this symptom is very high for levels 5 and 6. In short, the distribution of symptoms differs according to the severity level of COVID-19, and the probability of some symptoms increases as the severity level rises. Comparing Figs. 4 and 5 shows an inverse relationship between the presence and absence of symptoms.

Conditional probability of symptoms with COVID-19 severity if symptoms are positive for Naïve Bayes.

Conditional probability of symptoms with COVID-19 severity if symptoms are negative for Naïve Bayes.
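To make the quantities plotted in these two figures concrete, the sketch below estimates P(symptom positive | severity level) by relative frequency and derives the complementary negative probabilities. It is an illustration under assumptions: the function name, the record layout, and the toy records are hypothetical and are not taken from the study's dataset or code.

```python
from collections import Counter

def conditional_probabilities(records, symptom, class_key="severity"):
    """Estimate P(symptom = True | severity level) by relative frequency."""
    totals = Counter()      # number of records per severity level
    positives = Counter()   # number of records per level with the symptom present
    for record in records:
        level = record[class_key]
        totals[level] += 1
        if record[symptom]:
            positives[level] += 1
    return {level: positives[level] / totals[level] for level in sorted(totals)}

# Hypothetical toy records (not the study's data).
toy_records = [
    {"severity": 1, "diarrhea": False},
    {"severity": 1, "diarrhea": False},
    {"severity": 1, "diarrhea": True},
    {"severity": 1, "diarrhea": False},
    {"severity": 3, "diarrhea": True},
    {"severity": 3, "diarrhea": True},
]

p_positive = conditional_probabilities(toy_records, "diarrhea")
# For a binary symptom the negative case is the complement, which is why
# the positive and negative plots mirror each other.
p_negative = {level: 1.0 - p for level, p in p_positive.items()}
print(p_positive, p_negative)
```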

The dependency network built using the Tree-Augmented Naïve Bayesian network is depicted in Fig. 3b. COVID severity is the class variable, as in the Naïve Bayesian network, but connections between the symptoms (features) are also present. As seen from the figure, for example, cough has an effect on both headache and fever, while muscle sore is affected by headache and affects fatigue. For the probabilities, Tables 4 and 5 show some results of the Conditional Probability Table (CPT). In Table 4, when shortness of breath and fever are both negative (F), the probability of COVID severity level 1 is 92.95%. In contrast, when shortness of breath is positive (T) and fever is negative, the probability of COVID severity level 1 is 2.04%. When headache is positive but cough is negative, the probability of COVID severity level 4 is 65.92% (see Table 5).
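A CPT of the kind summarised in Tables 4 and 5 can be estimated from counts, as in the following sketch. It assumes simple relative-frequency estimation without smoothing; the function name `build_cpt`, the field names, and the toy records are hypothetical, and the paper does not state that its tables were computed this way.

```python
from collections import Counter, defaultdict

def build_cpt(records, target, parents):
    """Estimate P(target | parents) by relative frequency.

    Returns {parent_values: {target_value: probability}}.
    """
    counts = defaultdict(Counter)
    for record in records:
        parent_values = tuple(record[p] for p in parents)
        counts[parent_values][record[target]] += 1
    cpt = {}
    for parent_values, target_counts in counts.items():
        total = sum(target_counts.values())
        cpt[parent_values] = {value: n / total
                              for value, n in target_counts.items()}
    return cpt

# Hypothetical toy records (not the study's data).
toy_records = (
    [{"shortness_of_breath": False, "fever": False, "severity": 1}] * 3
    + [{"shortness_of_breath": False, "fever": False, "severity": 2}]
    + [{"shortness_of_breath": True, "fever": False, "severity": 5}] * 2
)

cpt = build_cpt(toy_records, target="severity",
                parents=("shortness_of_breath", "fever"))
# Row for "shortness of breath negative, fever negative".
print(cpt[(False, False)])
```

Each row of the resulting table is keyed by one combination of parent values and sums to 1 over the target values.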

Figure 3c shows the dependencies between all the symptoms (features) and COVID severity (class variable) in the Complex Bayesian network. Cough is the symptom affected by the largest number of other symptoms, and it does not affect any other feature. While the class variable, COVID severity, has an impact on shortness of breath, fever, fatigue, and sore throat, it is, interestingly, itself affected by diarrhea. Another interesting pattern that differs from the Tree-Augmented Naïve Bayesian network is that fever affects muscle sore. While Table 6 shows the CPT for three variables, namely COVID severity, shortness of breath, and fever, Table 7 displays the probabilities for four variables, namely diarrhea, fatigue, muscle sore, and headache. When shortness of breath is negative and fever is positive, the probability of COVID severity level 4 is 98.92%. In contrast, when shortness of breath is positive but fever is negative, the probability of COVID severity level 6 is 48.18% (see Table 6). For the probabilities involving four symptoms (see Table 7), for instance, when the three symptoms diarrhea, fatigue, and muscle sore are all positive, the probability of having the headache symptom is 73.18%. Another remarkable finding in Table 7 is that if an individual has fatigue, muscle sore, and headache, the probability of not having diarrhea is 58.43%.
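In any Bayesian network, the joint probability of a full assignment factorises into one CPT entry per node, each conditioned on that node's parents. The sketch below shows this for a three-node fragment loosely modelled on the edges named above (diarrhea affects severity, severity affects fever). The probabilities are invented for illustration, severity is reduced to two levels, and none of the names or numbers come from the paper.

```python
from itertools import product

def joint_probability(assignment, parents, cpts):
    """P(assignment) = product over nodes of P(node | its parents)."""
    probability = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        probability *= cpts[node][parent_values][assignment[node]]
    return probability

# Hypothetical structure: diarrhea -> severity -> fever.
parents = {"diarrhea": (), "severity": ("diarrhea",), "fever": ("severity",)}

# Hypothetical CPTs; every row sums to 1.
cpts = {
    "diarrhea": {(): {True: 0.1, False: 0.9}},
    "severity": {(True,): {1: 0.2, 4: 0.8}, (False,): {1: 0.9, 4: 0.1}},
    "fever": {(1,): {True: 0.05, False: 0.95}, (4,): {True: 0.7, False: 0.3}},
}

# P(diarrhea=T, severity=4, fever=T) = 0.1 * 0.8 * 0.7
p_example = joint_probability(
    {"diarrhea": True, "severity": 4, "fever": True}, parents, cpts)

# Sanity check: the joint distribution sums to 1 over all assignments.
total = sum(
    joint_probability({"diarrhea": d, "severity": s, "fever": f}, parents, cpts)
    for d, s, f in product([True, False], [1, 4], [True, False]))
print(f"{p_example:.3f}, total = {total:.3f}")
```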

In this study, we have also investigated and implemented three distinct Bayesian models, each representing a unique intersection of deep learning and Bayesian inference. The first model, Deep Learning-based Naïve Bayes (DL-NB), is a deep learning-based Naïve Bayes structure that capitalises on the capacity of deep neural networks to refine the traditional Naïve Bayes model for enhanced feature learning and dependency representation. Additionally, we extended our exploration to traditional Bayesian network structures by implementing Deep Learning-based Tree-Augmented Naïve Bayes (DL-TAN), where deep learning principles are integrated to augment the classic Tree-Augmented Naïve Bayes algorithm, providing richer feature representations. Furthermore, our investigation includes Deep Learning-based Complex Bayesian (DL-CB), a model designed to overcome the limitations of traditional Complex Bayesian structures in modelling intricate relationships within high-dimensional data. This comprehensive analysis and implementation of DL-NB, DL-TAN, and DL-CB contributes to the broader understanding of the synergies between deep learning and Bayesian techniques in various Bayesian network architectures. Figure 6 shows the network dependencies of the deep learning-based Bayesian network algorithms, which uncover the complex and hidden relationships between COVID symptoms. As illustrated in Fig. 6a–c, our Bayesian deep learning models, namely DL-NB, DL-TAN, and DL-CB, reveal a richer web of relationships among features than their traditional counterparts. The Bayesian deep learning models exhibit a higher density of connections, which indicates a more nuanced understanding of inter-feature dependencies. This heightened connectivity reflects the enhanced capacity of Bayesian deep learning to capture complex relationships within the data and thereby provide a comprehensive and informative model of the underlying dynamics.

Bayesian deep learning dependency networks.

Figure 7 shows the accuracies of the three algorithms proposed in our system, namely the Naïve Bayesian Network, Tree-Augmented Naïve Bayesian Network, and Complex Bayesian Network. Although the overall accuracies of the algorithms are close to each other, there are apparent differences in the accuracy for individual symptoms. The algorithms perform poorly for the cough symptom, with accuracies between 60% and 68%, while they achieve high accuracies for COVID severity, ranging from 94% to 97%. The overall accuracies of these three algorithms are 83.52%, 87.35%, and 85.15%, respectively.

Total accuracies of the algorithms.
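The distinction between per-variable accuracy and overall accuracy can be illustrated as below. The sketch assumes that the overall figure is the unweighted mean of the per-variable accuracies; the paper does not state how its overall accuracy is aggregated, so this is only one plausible reading. All names and toy values are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the true value."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def per_variable_accuracy(predictions, truths):
    """Accuracy for each variable (symptom or severity) separately."""
    return {name: accuracy(predictions[name], truths[name]) for name in truths}

# Hypothetical toy predictions (not the study's results).
truths = {
    "cough": [True, True, False, False, True],
    "severity": [1, 1, 4, 6, 2],
}
predictions = {
    "cough": [True, False, True, False, True],
    "severity": [1, 1, 4, 6, 1],
}

by_variable = per_variable_accuracy(predictions, truths)
# Assumption: overall accuracy is the unweighted mean across variables.
overall = sum(by_variable.values()) / len(by_variable)
print(by_variable, round(overall, 4))
```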

In the evaluation of the accuracy of deep learning-based Bayesian network algorithms, the results, as depicted in Fig. 8, showcase the performance of three distinct models: DL-NB, DL-TAN, and DL-CB. The overall accuracies reveal nuanced differences among the algorithms. DL-TAN emerges with the highest cumulative accuracy of 95.21%, which indicates its superior predictive capabilities across a spectrum of symptoms. DL-NB and DL-CB follow closely, exhibiting overall accuracies of 91.04% and 92.81%, respectively. These results underscore the efficacy of deep learning-based Bayesian approaches in capturing complex relationships within the dataset.

The comparative analysis of Bayesian deep learning algorithms against traditional Bayesian network algorithms elucidates a discernible advantage favouring the former. Notably, the Bayesian deep learning models, such as DL-NB, DL-TAN, and DL-CB, exhibit superior predictive performance across various symptoms.

Total accuracies of the deep learning-based Bayesian algorithms.

We have developed a web interface for the BayesCovid decision support system that can be used by any clinical practitioner or other user. It utilises Python libraries for probabilistic graphical models, data manipulation, network analysis, and data visualization. Additionally, tkinter is adopted for the graphical user interface, and PyMuPDF (fitz) is used for PDF file handling. All the source code and accompanying documentation for the BayesCovid decision support system are available as open source on GitHub (https://github.com/umitdemirbaga/BayesCovid). A demonstration is also available online on YouTube (https://youtu.be/7j36HuC9Zto). The user interface provides the two functions described below.

Dependency analysis: This component of the application supports efficient and accurate analysis of the relationships between the symptoms and severity assessment, enhancing the decision-making process in clinical settings. Figure 9a depicts the user-friendly interface, where a data file can be uploaded using the Select CSV button. After the data file is uploaded, six radio buttons let users select one of the following Bayesian models: (a) Naïve Bayesian Network, (b) Tree-Augmented Naïve Bayesian Network, (c) Complex Bayesian Network, (d) Naïve Bayes Deep Learning, (e) Tree-Augmented Bayes Deep Learning, and (f) Complex Bayes Deep Learning. An Analyse button starts the processing of the selected model with the selected CSV file, and a progress bar shows the processing status. After the model is processed, the dependency network plot is generated (see Fig. 9b) and the CPT output is saved as a file.

Severity analysis: This component of the application assists clinical staff in calculating the severity of COVID-19. A clinician or other user selects the symptoms that the patient exhibits, and the application then outputs the COVID-19 severity level based on the selected symptoms, as shown in Fig. 9c.
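One way such a severity calculation could work is Bayes' rule under the Naïve Bayes structure, where the posterior over severity levels is proportional to the prior times the product of per-symptom likelihoods. The sketch below is an assumption-laden illustration: the paper does not specify which model the severity screen uses, and the function name, priors, and likelihoods are invented here.

```python
def severity_posterior(observed, priors, likelihoods):
    """Posterior over severity levels under a Naive Bayes assumption.

    observed:    {symptom: bool} as selected by the user
    priors:      {level: P(level)}
    likelihoods: {symptom: {level: P(symptom = True | level)}}
    """
    scores = {}
    for level, prior in priors.items():
        score = prior
        for symptom, present in observed.items():
            p_true = likelihoods[symptom][level]
            score *= p_true if present else (1.0 - p_true)
        scores[level] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("observed symptoms have zero probability under every level")
    # Normalise so that the posterior sums to 1.
    return {level: score / total for level, score in scores.items()}

# Hypothetical parameters (two severity levels only, values invented).
priors = {1: 0.8, 5: 0.2}
likelihoods = {
    "shortness_of_breath": {1: 0.05, 5: 0.9},
    "fever": {1: 0.1, 5: 0.6},
}

posterior = severity_posterior(
    {"shortness_of_breath": True, "fever": False}, priors, likelihoods)
predicted_level = max(posterior, key=posterior.get)
print(predicted_level, posterior)
```

With these invented numbers, shortness of breath outweighs the low prior of level 5, so the sketch selects level 5 as the most probable severity.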
