This section surveys the various ML techniques that have been employed by researchers for effective heart disease diagnosis. The major reason for utilizing ML algorithms is that they are capable of detecting hidden patterns and can operate on large datasets to make predictions.
In13, Syed et al. developed an SVM-based heart disease diagnosis system using the Cleveland, Hungarian, and Switzerland datasets as well as a combination of all of them (709 instances). They exploited Mean Fisher based Feature Selection and Accuracy-based Feature Selection algorithms for optimal feature selection, and the selected feature subset was further refined through Principal Component Analysis. Finally, a Radial Basis Function kernel based Support Vector Machine was applied over the reduced feature subset to distinguish heart disease patients from normal people. Their experimental results demonstrated that the proposed framework outperforms its counterparts with an average accuracy rate of 85.3%. Youn-Jung et al.14 chose the SVM algorithm for detecting heart patients because it can handle high-dimensionality problems; patient data were collected from a university hospital through a self-reported questionnaire, and the experiment, carried out with leave-one-out cross-validation (LOOCV), showed that SVM-based classification is a promising approach with a highest detection accuracy of 77.63%. Ebenezer et al.15 developed a mechanism based on Boosting SVM to enhance prediction accuracy by combining the results of all weak learners. To reduce misclassification, normalization, redundancy removal, and a heat map were applied over the given datasets. The heat map identified important factors in predicting heart disease, such as age and maximum heart rate, which further facilitates prediction. In this study, the Cleveland dataset was used, which contains 303 instances with 13 attributes. Through the experimental results, the performance of the Boosting SVM was compared with Logistic Regression, Naïve Bayes, Decision Trees, Multilayer Perceptron, and Random Forest, out of which the proposed Boosting SVM achieved the greatest accuracy of 99.75%.
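As a concrete illustration, the following minimal scikit-learn sketch mirrors the feature-selection, PCA, and RBF-kernel SVM pipeline of13. The synthetic data, the ANOVA F-score used as a stand-in for the Fisher criterion, the number of retained features/components, and all hyper-parameters are assumptions for illustration, not values from the original study.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic stand-in for the combined dataset: 709 instances, 13 attributes
X, y = make_classification(n_samples=709, n_features=13, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                       # normalize features
    ("select", SelectKBest(f_classif, k=10)),          # Fisher-style scoring stand-in
    ("pca", PCA(n_components=6)),                      # refine the selected subset
    ("svm", SVC(kernel="rbf", C=1.0, gamma="scale")),  # RBF-kernel SVM
])

scores = cross_val_score(pipeline, X, y, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```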
Medhekar et al.16 developed a Naïve Bayes Classifier (NBC) based heart disease prediction system using the Cleveland dataset downloaded from the UCI repository to classify patients into five categories, viz. no, low, average, high, and very high, to identify the severity level of the disease. System accuracy was then calculated and the results tabulated to evaluate the system performance, from which it can be observed that the proposed NBC-based system attains 88.96% accuracy. Vembandasamy et al.18 proposed a framework to detect heart disease using NBC. The experiment was carried out with the WEKA tool over a dataset collected from a diabetic research institute in Chennai, and the accuracy rate yielded by the system is 86.4198%. The authors of17 presented an article on heart disease detection using NBC and attained an accuracy rate of 80%, which is comparatively poor, by performing prediction over a dataset collected from Mayapada Hospital containing 60,589 records. Heart disease prediction using NBC is quite challenging since NBC requires all attributes to be mutually independent15.
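A minimal sketch of an NBC classifier in the spirit of16 follows; the Gaussian variant, the synthetic stand-in for the Cleveland data, and the five-class severity encoding are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# synthetic stand-in for Cleveland: 303 records, 13 attributes,
# five severity classes (no / low / average / high / very high)
X, y = make_classification(n_samples=303, n_features=13, n_informative=5,
                           n_classes=5, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
nbc = GaussianNB().fit(X_train, y_train)  # assumes conditionally independent features
print(f"accuracy: {accuracy_score(y_test, nbc.predict(X_test)):.3f}")
```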
The authors of19,20,21 employed the concept of neural networks in heart disease diagnosis to improve accuracy further. In20, the Cleveland dataset was first subjected to a feature selection algorithm that uses information gain to remove features that do not contribute to disease prediction. The ANN algorithm was then applied over the reduced feature set for classification. This study demonstrated that the accuracy (89.56%) of the system with the reduced feature set (8 features) is slightly higher than the accuracy (88.46%) of the system with the full feature set (13 features). Miray et al.19 presented an intelligent heart disease diagnosis method using a hybrid Artificial Neural Network (ANN) and Genetic Algorithm (GA), where the GA is used to optimize the parameters of the ANN. Experimental results obtained on the Cleveland data show that the hybrid approach outperforms the Naïve Bayes, K-Nearest Neighbor, and C4.5 algorithms in terms of accuracy rate (95.82%), precision (98.11%), recall (94.55%), and F-measure (96.30%). Even though NN models are good at generalizing data and capable of analyzing complex data to discover hidden patterns, many medical experts are dissatisfied with NNs because of their black-box characteristics: NN models get trained without knowing the relationship between input features and outputs. So, if many irrelevant features are used to train the NN model, it results in inaccurate predictions during testing. To address this challenge, Kim and Kang21 employed two preprocessing steps before applying the ANN. The first step is a feature selection step that selects features based on ranking; then, feature correlation analysis is performed to make the system learn the correlation between feature relations and the NN output, thereby mitigating the black-box nature. The overall experiment was performed on a Korean dataset containing 4146 records and resulted in a larger ROC curve with more accurate predictions. However, ANNs can suffer from data overfitting and temporal complexity, and may fail to converge when dimensionality is low.
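The feature-selection-then-ANN flow of20 can be sketched as below. Mutual information is used here as a proxy for information gain, and the network size, iteration budget, and synthetic data are assumptions rather than the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=303, n_features=13, random_state=0)

ann = Pipeline([
    ("scale", StandardScaler()),
    ("ig", SelectKBest(mutual_info_classif, k=8)),  # keep 8 of 13 features
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=0)),
])
print(f"mean accuracy: {cross_val_score(ann, X, y, cv=5).mean():.3f}")
```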
As K-Nearest Neighbor (KNN) is a simple and straightforward approach in which samples are classified based on the class of their nearest neighbors, the authors of22 employed the KNN algorithm for classifying heart disease. Since medical datasets are large, a Genetic Algorithm was utilized to prune redundant and irrelevant features from 6 different medical datasets taken from the UCI repository, improving the prediction accuracy by 6% over that achieved by the KNN algorithm without the GA. Ketut et al.23 proved that simple and fewer features are good enough to reduce misclassification, especially in heart disease prediction. In their experimental study, a chi-square evaluation was performed over the Hungarian dataset, which contains 293 records with 76 parameters; only 13 of these parameters were taken into consideration, and the chi-square evaluation identified 8 of them as the most important. Subsequently, KNN executed with the reduced feature set yields 81.85% accuracy, which is considerably greater than NBC, CART, and KNN with the full feature set.
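A minimal sketch of the chi-square ranking followed by KNN, in the spirit of23; the synthetic data, scaling step, and neighbor count are illustrative assumptions (chi-square scoring in scikit-learn requires non-negative features, hence the min-max scaling).

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# synthetic stand-in for the Hungarian data: 293 records, 13 parameters
X, y = make_classification(n_samples=293, n_features=13, random_state=0)

knn = Pipeline([
    ("scale", MinMaxScaler()),              # chi2 needs values >= 0
    ("chi2", SelectKBest(chi2, k=8)),       # 8 most important parameters
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
print(f"mean accuracy: {cross_val_score(knn, X, y, cv=5).mean():.3f}")
```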
A heart disease prediction model using the Decision Tree (DT) algorithm has been implemented on UCI datasets24. The main aim of this paper is to reveal the importance of the pruning approach in DT, which provides compact decision rules and accurate classification. The J48 DT algorithm was implemented in three variants: DT with pruning, without pruning, and pruning with reduced error. The experiment shows that fasting blood sugar is the most important attribute, yielding greater accuracy (75.73%) than the other attributes, though this accuracy is still comparatively poor. The DT algorithm is simple, but it can handle only categorical data and is inappropriate for smaller datasets and datasets with missing values25.
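J48 (Weka's C4.5 implementation) offers reduced-error pruning; scikit-learn instead provides cost-complexity pruning, used below as an analogous sketch of the pruned-versus-unpruned comparison. The data and the pruning strength `ccp_alpha` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=303, n_features=13, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)  # cost-complexity pruning

for name, tree in [("un-pruned", unpruned), ("pruned", pruned)]:
    acc = cross_val_score(tree, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```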
In26, Logistic Regression (LR) is applied to UCI datasets to classify cardiac disease. Initially, data preprocessing is done to filter out missing values, and a correlation-based feature selection process is carried out to select the most highly correlated features. The given data is then split into training and testing sets to perform classification by LR. The tabulated results show that LR accuracy increases to 87.10% as the training size is increased from 50 to 90%. Paria and Arezoo27 developed a regression-based heart attack prediction system. For this purpose, three regression models were built based on a variable selection algorithm and applied to a dataset with 28 features collected from Iranian hospitals. The model using features such as severe chest pain, back pain, cold sweats, shortness of breath, nausea, and vomiting yielded a greater accuracy of 94.9% than the models using physical examination data and ECG data.
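A sketch of the experiment in26: correlation-based feature selection followed by LR accuracy measured as the training split grows from 50% to 90%. The synthetic data, the top-8 correlation cut-off, and the solver settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# rank features by absolute correlation with the label, keep the top 8
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
X_sel = X[:, np.argsort(corr)[-8:]]

for train_size in (0.5, 0.6, 0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, train_size=train_size, random_state=0)
    lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"train={train_size:.0%}  accuracy={lr.score(X_te, y_te):.3f}")
```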
Yeshvendra et al.28 employed the Random Forest (RF) algorithm for heart disease prediction. In this paper, the Cleveland heart disease dataset, whose attributes exhibit non-linear dependencies, was exploited. RF is considered an optimal solution for non-linearly dependent datasets, and with slight adjustments for the non-linear dataset it produced a good accuracy of 85.81%. To reduce the overfitting problem, Javeed et al.29 developed an intelligent heart disease diagnostic system that uses a Random Search Algorithm (RSA) for feature selection from the Cleveland dataset and a Random Forest (RF) model for heart disease prediction. Based on the experimental results, the RSA-based RF produced 93.33% accuracy using only 7 features, which is 3.3% higher than the conventional RF. As the ensemble nature of RF is capable of producing high accuracy, handling missing and huge data, eliminating the need for tree pruning, and solving the problem of overfitting, the authors of30 employed RF to predict heart disease. In addition, chi-square and genetic algorithms were applied to select the prominent features from a heart disease dataset collected from various corporate hospitals in Hyderabad. The proposed system achieves an accuracy of about 83.70%, which is considerably greater than NBC, DT, and Neural Nets.
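A simplified take on random-search feature selection followed by RF, in the spirit of29: random 7-feature subsets are sampled and the best cross-validated subset is kept. The candidate count, forest size, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=303, n_features=13, random_state=0)
rng = np.random.default_rng(0)

best_subset, best_score = None, -1.0
for _ in range(20):                                 # random candidate subsets
    subset = rng.choice(13, size=7, replace=False)  # 7 features, as in29
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    score = cross_val_score(rf, X[:, subset], y, cv=5).mean()
    if score > best_score:
        best_subset, best_score = subset, score

print(f"best features: {sorted(best_subset)}  cv accuracy: {best_score:.3f}")
```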
Jafar and Babak31 proposed an efficient and accurate system to diagnose heart disease. This research developed an ensemble classification model based on a feature selection approach. The heart disease dataset used in this research was downloaded from the UCI repository and contains 270 records with 13 useful variables. After selecting the prominent features, seven classifiers, namely SVM, NBC, DT, MLP, KNN, RF, and LR, were used in ensemble learning to predict the disease. The final prediction over a given sample is made by combining the prediction results of all seven classifiers using the Stacked Ensemble method. Ensemble learning based on a genetic algorithm showed the best performance, reaching 97.57% accuracy, 96% sensitivity, and 97% specificity. Ensemble learning combines multiple classifiers and improves predictive performance by combining the outputs of the individual classifiers. To identify the best ensemble method for heart disease detection on the Statlog heart dataset, Indu et al.32 developed an automatic disease diagnosis system based on three ensemble learners, namely Random Forest, Boosting, and Bagging, along with a PSO-based feature subset selection method. The overall experiment was carried out using RStudio, and the proposed system with the bagging approach yielded greater accuracy than the other approaches. Table 1 summarizes the major findings.
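A minimal sketch of a stacked ensemble over the seven base classifiers named in31; the meta-learner choice, default hyper-parameters, and synthetic data are assumptions, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in: 270 records, 13 variables
X, y = make_classification(n_samples=270, n_features=13, random_state=0)

base_learners = [
    ("svm", SVC(probability=True)), ("nbc", GaussianNB()),
    ("dt", DecisionTreeClassifier()), ("mlp", MLPClassifier(max_iter=1000)),
    ("knn", KNeighborsClassifier()), ("rf", RandomForestClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
]
# meta-learner combines the base predictions (stacked ensemble)
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
print(f"mean accuracy: {cross_val_score(stack, X, y, cv=5).mean():.3f}")
```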
Automatic heart diagnostic systems developed using ML techniques were surveyed because heart disease is a major cause of human deaths in today's world, so an effective and accurate diagnosis system must be developed to save human lives.
From the above study, it is observed that many researchers have turned to machine learning for heart disease diagnosis since it helps to reduce diagnosis time and increase accuracy.
From the study, it is evident that each new approach competes with the others to achieve a higher accuracy rate.
Boosting-based SVM and ensembles of classifiers are seen as the most promising methods, having yielded the highest accuracies reported.
An algorithm that works well on one dataset may not work well on another.
The accuracy of the system may rely on the quality of the datasets used.
Some datasets contain missing values, redundancies, and noise, which make the data unsuitable. Such uncertainty can be resolved by applying data preprocessing techniques such as normalization, missing value imputation, etc.
Some datasets may have too many attributes, which can degrade the performance of ML in terms of accuracy and computational time. This can be improved by applying suitable feature selection strategies so that prediction is performed with only the most informative features, as illustrated in the sketch below.
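A generic preprocessing sketch for the issues noted above: missing-value imputation and normalization, followed by feature selection before the classifier. All choices here (imputation strategy, scaler, selector, classifier, data) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=13, random_state=0)
X[::10, 0] = np.nan                                # simulate missing values

prep = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),    # missing value imputation
    ("scale", StandardScaler()),                   # normalization
    ("select", SelectKBest(f_classif, k=8)),       # keep informative features
    ("clf", LogisticRegression(max_iter=1000)),
])
prep.fit(X, y)
print(f"training accuracy: {prep.score(X, y):.3f}")
```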
As machine learning algorithms predict the output by learning the relationship between input features and class labels based on classical theories of probability and logic, the accuracy rate is still on the lower side33,34, so considerable improvement is required before they gain general acceptability for disease prediction. Another major issue with traditional machine learning algorithms is computation time, since the computation time grows with the size of the feature set. Therefore, the main aim of this paper is to enrich the performance of classical ML algorithms and make them outperform all the baselines in terms of Precision, Recall, F-Measure, and Computation Time35. This redirects the research toward quantum computing and paves the way to integrate quantum computing with ML approaches.
From a detailed study of recent articles36,37,38,39,40, quantum mechanics-based models have shown excellent performance in various fields such as classification, disease prediction, and object detection and tracking, and have achieved remarkable performance over classical probability theory-based models. The basics of quantum computing, its essential features, and its working are available in the public domain and are therefore not explored further here.
Compared with traditional machine learning algorithms, quantum-enhanced machine learning algorithms are capable of reducing training time, automatically adjusting network hyper-parameters, performing complex matrix and tensor manipulations at high speed, and using quantum tunneling to achieve objective function goals. Integrating quantum computing and machine learning enables healthcare sectors to evaluate and treat complicated health disorders. Quantum computing uses the principles of quantum physics, in which a single bit, known as a qubit (quantum bit), can represent both 0 and 1. Other salient features of quantum computing are: superposition, which allows a particle to exist in multiple states at a time and provides tremendous power to handle massive amounts of data; entanglement, which occurs when pairs of particles are generated in such a way that they share spatial proximity or interact; quantum tunneling, which enables the computer to complete tasks faster; and quantum gates, which act on collections of quantum states to produce the desired output. The first quantum computing device came into use around the year 2000, and many researchers have recently utilized quantum computing principles to analyze billions of diagnostic records with the help of artificial intelligence techniques. Quantum-enhanced machine learning assists physicians with earlier and more accurate disease predictions. According to the report of41, the time spent on research and analyzing diagnostic data will decrease when quantum computing is integrated with healthcare systems.
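The superposition and entanglement concepts mentioned above can be illustrated numerically with a short sketch in plain NumPy (no quantum hardware or quantum SDK assumed): a Hadamard gate places a qubit in an equal superposition, and a CNOT gate then entangles it with a second qubit to form a Bell state.

```python
import numpy as np

# single-qubit basis state |0> and standard gate matrices
ket0 = np.array([1, 0], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# superposition: H|0> = (|0> + |1>) / sqrt(2)
plus = H @ ket0

# entanglement: CNOT on (H|0>) (x) |0> yields the Bell state
bell = CNOT @ np.kron(plus, ket0)
print(np.round(bell, 3))  # amplitude 0.707 on |00> and |11>, 0 elsewhere
```

Measuring either qubit of the Bell state instantly fixes the outcome of the other, which is the correlation that quantum-enhanced ML models exploit.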
With this motivation, this paper implements Quantum-Enhanced Machine Learning approaches for diagnosing heart disease. By simply replacing classical probability theory with quantum probability theory, and by exploiting the superposition state, which provides a higher degree of freedom in decision making, the approach yields a remarkable accuracy rate and computation time, as shown in "Results and discussions".