Health Care

The Potential Of Machine Learning In Advancing The Prediction Of Coronary Cardiovascular Disease

Abstract

Coronary cardiovascular disease (CVD) remains a leading cause of death worldwide, necessitating accurate and early prediction methods to mitigate risks and improve patient outcomes. This study utilizes the Rates_and_Trends_in_Coronary_Heart_Disease dataset, which consists of two classes: Normal and Predicting CVD, to develop predictive models for coronary heart disease. Several machine learning models were employed, including the Random Forest Classifier, Decision Tree, XGBoost Classifier, and K-Nearest Neighbors (KNN), along with a proposed Artificial Neural Network (ANN) model. The models were evaluated using performance metrics such as precision, recall, F1 score, and accuracy to ensure a thorough assessment of their effectiveness in predicting CVD. The Random Forest and XGBoost classifiers both achieved an accuracy of 91%, while the Decision Tree and KNN models each achieved 90% accuracy. Notably, the proposed ANN model significantly outperformed the others, achieving an impressive 99% accuracy. These findings underscore the potential of machine learning, particularly deep learning, in advancing the prediction of coronary cardiovascular disease, paving the way for improved diagnostic and preventive strategies.

Introduction


The heart is essential to life because it efficiently pumps oxygen-rich blood and helps regulate key hormones to keep blood pressure at ideal levels. Any disruption in its operation can result in the development of heart diseases, which are collectively referred to as Cardiovascular Diseases (CVD) (Robinson, 2021). CVD encompasses a variety of conditions affecting the heart and blood vessels, including cerebrovascular accidents, congenital defects, pulmonary blood clots, cardiac arrhythmias, peripheral arterial disease, Coronary Artery Disease (CAD), rheumatic heart conditions, Coronary Heart Disease (CHD), and heart-muscle-affecting cardiomyopathies (Saheera & Krishnamurthy, 2020). Coronary heart disease (CHD) is the subtype that comprises a substantial 64% of all cases. Although it mostly affects men, women are not immune to its effects. Among CVDs, CAD is especially worrisome because of its correlation with worldwide death rates (Al-Khlaiwi et al., 2023). The World Health Organization (WHO) states that CVDs have severe repercussions, with startling data showing that these illnesses are thought to cause 17.9 million deaths globally each year (Prabhakaran et al., 2022). These figures demonstrate the importance of scientific investigations and medical breakthroughs aimed at preventing and decreasing the effects of cardiovascular illnesses globally (Vaduganathan et al., 2022).

Millions of lives are lost to CVDs every year, which is a major cause for concern in the global healthcare community (Flores-Alonso et al., 2022). It is critical to give the early identification and management of CVDs top priority in order to lower mortality rates (Sarrafzadegan & Mohammmadifard, 2019). Although auscultation is a straightforward and accurate technique for identifying CVDs, even highly skilled doctors may find rapid identification difficult (Yan et al., 2019). Physicians can make better decisions with the aid of artificial intelligence-driven automated cardiac screening systems based on phonocardiography (PCG) classification (Sethi et al., 2022).

The World Health Organization (WHO) (Organization, 2020) states that heart diseases are the primary cause of death worldwide. CVDs are an important concern for the global healthcare community, claiming millions of lives each year (Shokouhmand et al., 2021). The Internet of Medical Things (IoMT) is a technology that links medical devices and collects and processes data in real time, enhancing healthcare workflows. It combines the power of IoT with patient data while ensuring data security within the IoMT-based framework (Alshehri & Muhammad, 2020). In today’s healthcare sector, the IoMT is a rapidly developing field in which a variety of contemporary medical equipment, software programs, and healthcare professionals come together on a single platform to provide high-quality services (Jadhav, 2018). Globally, cardiovascular disease is the primary reason for rising death rates (Shaffer & Ginsberg, 2017).

The heart’s arteries are affected by the common and potentially dangerous condition known as coronary artery disease (CAD) (Shao et al., 2020). In this condition, the coronary arteries, which supply the heart muscle with oxygen-rich blood, become narrowed or obstructed. The main cause of CAD is atherosclerosis, a disorder in which fatty deposits, cholesterol, calcium, and other materials build up inside the artery walls and form plaques (Shao et al., 2020). Common risk factors include smoking, high blood pressure, high cholesterol, diabetes, obesity, a sedentary lifestyle, a family history of heart disease, and ageing (Ciumărnean et al., 2021). CAD can cause angina (chest pain) and, if a plaque ruptures or a blood clot blocks an artery, it can lead to a heart attack, as shown in Figure 1.1. Diagnosis involves medical history, physical examination, ECG, stress tests, imaging, and blood tests (Habuza et al., 2021). Reducing symptoms, averting complications, and enhancing general cardiovascular health are the goals of CAD management (Cacciatore et al., 2023). Lifestyle changes are imperative, including regular exercise, a diet low in cholesterol and fatty foods, weight management, quitting smoking, and reducing stress. Prescription drugs are frequently used to treat symptoms such as angina, inhibit blood clots, reduce cholesterol levels, and control blood pressure (Flora & Nayak, 2019). To restore blood flow to the heart, invasive treatments such as coronary artery bypass grafting (CABG) or angioplasty with stent placement may be required in specific circumstances.

Figure 1.1: Coronary Artery Disease

Machine learning has the potential to transform the healthcare sector. Its remarkable progress can be attributed to data-processing capabilities that far exceed human capabilities (Quazi, 2022). As a result, the healthcare industry has seen the creation of a number of AI applications that take advantage of the speed and accuracy of machine learning, opening the door to ground-breaking answers to a variety of healthcare problems (Holmes et al., 2004). Numerous machine-learning techniques have been used to identify cardiovascular illnesses, yet predictive models still need to be improved, and research gaps in the current detection methods need to be filled (Quazi, 2022). One such area is the problem of imbalanced datasets, which can result in biased predictions. Researchers have explored a variety of approaches, such as neural networks and other machine learning methods, to improve prediction accuracy, including examining the efficacy of hybrid models that combine different techniques. The variations in datasets, models, and results highlight the intricacy of the predictive task, even though these studies offer insightful information. Despite these developments, further research is still needed to improve the accuracy of cardiovascular disease prediction using current models. The wide range of machine learning applications in this field highlights how crucial it is to carry out more research to improve the predictive models’ generalizability, accuracy, and dependability in order to improve patient care and medical treatments.

Research Motivation

Machine learning techniques are being used to predict CVD in response to the pressing need to enhance early diagnosis and intervention strategies to lower the high rates of morbidity and death that are associated with this condition. Conventional diagnostic techniques can be expensive, time-consuming, and occasionally unreliable. They also frequently require invasive procedures. Through the analysis of massive volumes of medical data, machine learning provides a non-invasive, effective, and possibly more accurate alternative by identifying patterns and risk factors related to CVD. Healthcare professionals can use machine learning algorithms to forecast a patient’s risk of CVD based on various clinical parameters, including demographic data, medical history, and results of diagnostic tests. Ultimately, this method improves patient outcomes and maximizes healthcare resources by improving diagnosis precision and facilitating individualized treatment plans and preventive measures.

Research problem

The research problem in machine learning-based CVD prediction centres on the difficulty of creating precise, dependable, and broadly applicable models that can successfully identify individuals at risk. The complexity of feature selection and engineering to ensure relevant variables are used without introducing bias, the need for high-quality, comprehensive datasets that include diverse demographic and clinical features, and the requirement for sophisticated algorithms capable of handling the complex relationships between risk factors are all involved in this. Developing clinical trust and ensuring that healthcare providers can comprehend and act upon the predictions depends heavily on the interpretability and transparency of the model. Another difficulty is incorporating these machine learning models into the current healthcare systems in a way that makes sense for users and facilitates clinical workflows. Developing strong, useful tools that improve early diagnosis, direct preventive measures, and eventually lower the burden of CVD requires addressing these issues.

Research scope

The field of study on the application of machine learning techniques to the prediction of CVD includes an array of important domains to create a comprehensive and efficient predictive framework. To make sure the models are inclusive and widely applicable, this involves gathering and integrating diverse datasets with clinical, demographic, lifestyle, and genetic data from different populations. To determine the most precise and effective methods, the scope also entails investigating and contrasting various machine learning algorithms, including logistic regression, decision trees, and support vector machines. Crucial elements include feature engineering and selection, which concentrate on locating the most significant predictors and reducing noise. To guarantee robustness and reliability, the scope also includes model validation and evaluation using metrics like accuracy, precision, recall, and AUC-ROC. The model’s interpretability, which guarantees that healthcare practitioners can readily comprehend and apply the predictions, is another essential component.
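
To make the evaluation step concrete, the following is a minimal, illustrative sketch (not the thesis pipeline): it trains two of the algorithms named above on synthetic tabular data and reports accuracy, precision, recall, and AUC-ROC. The variable names and the synthetic data are assumptions made purely for illustration.

```python
# Illustrative sketch: training two baseline models on a synthetic tabular CVD-style
# dataset and reporting the evaluation metrics named above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                      # placeholder clinical/demographic features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=4)):
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    print(type(model).__name__,
          "acc=%.3f prec=%.3f rec=%.3f auc=%.3f" % (
              accuracy_score(y_te, pred), precision_score(y_te, pred),
              recall_score(y_te, pred), roc_auc_score(y_te, proba)))
```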

Research objectives

The primary objectives of this thesis are as follows:

To propose a comprehensive framework that integrates diverse data sources, including clinical records, and demographic information to create a robust dataset for model training and validation.

To implement various machine learning algorithms in order to identify and compare the most effective techniques for accurately predicting CVD risk.

To develop advanced feature selection and engineering methods that pinpoint the most relevant predictors, reduce dimensionality, and minimize data noise, thereby enhancing the accuracy and efficiency of the predictive models.

Research questions

This research seeks to answer the following questions:

How can diverse data sources, including clinical records and demographic information, be effectively integrated to create a robust and comprehensive dataset for model training and validation in predicting CVD risk?

What are the computational and practical considerations when implementing various machine learning models for real-world clinical applications?

What strategies can be employed to balance the complexity and interpretability of predictive models while maintaining high accuracy in CVD risk prediction?

The proposed contribution of the dissertation

This work aims to build a robust and comprehensive dataset necessary for precise model training and validation by creating a novel framework that integrates various data sources, such as demographic data and clinical records. The goal of the dissertation is to determine the best methods for predicting CVD risk by applying and thoroughly analyzing a variety of machine learning algorithms, including logistic regression, decision trees, and support vector machines. It will also include advanced feature engineering and selection techniques to reduce data dimensionality, minimize noise, and highlight the most important predictors, all of which will improve model efficiency and accuracy. To facilitate practical adoption in clinical settings, the research will focus on developing interpretable models that offer precise, practical insights into CVD risk factors. The dissertation will evaluate the influence of these predictive models on patient outcomes, preventive measures, and early diagnosis by incorporating them into clinical workflows. This will show how machine learning can revolutionize CVD risk prediction and management.

Dissertation organization

Chapter 1 introduces the research problem, objectives, and significance of predicting cardiovascular disease using machine learning. Chapter 2 reviews the existing research on CVD prediction and the application of machine learning in healthcare; it identifies gaps in the current literature, justifying the need for this study and highlighting opportunities for innovation. Chapter 3 describes in detail the comprehensive framework proposed for integrating diverse data sources, including clinical records and demographic information, along with the selection of machine learning algorithms, the feature selection and engineering methods developed, and the experimental design and evaluation metrics used to assess model performance. Chapter 4 presents the performance evaluation of the machine learning models, reporting the experiments and results together with a discussion of limitations. Chapter 5 includes the conclusion and future work.

Chapter Summary

This chapter summarizes the impact of coronary and cardiovascular disease (CVD) on the world’s health and emphasizes the urgent need for precise and early prediction techniques. It shows the possibility that machine learning methods could transform CVD risk prediction by providing accurate, quick, and non-invasive diagnostic instruments. The chapter describes the goals of the research, which include improving feature selection techniques, implementing different machine learning algorithms, and creating a comprehensive framework that integrates a variety of data sources.

Background And Literature Reviewed

Background

Cardiovascular disease (CVD) is a major global health concern that is associated with multiple risk factors, such as obesity, smoking, high cholesterol, a lack of exercise, and hypertension (Flora & Nayak, 2019). Heart arrhythmias, congestive heart failure, and congenital heart disease are just a few of the conditions that fall under the general heading of CVD (Lockhart & Sun, 2021). The complicated and frequently problematic nature of traditional methods for predicting and diagnosing CVD has had an adverse effect on people’s general well-being (Levine et al., 2021). Since this illness continues to be the primary cause of death in both developed and developing nations, appropriate preventive and diagnostic measures are required. Due to a lack of resources, physicians in developing nations have difficulty correctly diagnosing and treating CVD. Early detection and risk assessment for CVD have been made possible by the introduction of computer technology and machine learning as clinical decision-making aids. Because medical data are so complex, medical data mining technologies must be able to extract meaningful information from the vast amounts of data in the healthcare industry. Our CVD prediction technology has the potential to save millions of lives by facilitating faster treatment for more people.

A significant change occurred with the introduction of electronic health records (EHRs) in the late 20th and early 21st centuries (Arvisais-Anhalt et al., 2022). These records offered a plethora of patient data that could be used for increasingly complex analysis. The handling of these massive datasets was made easier by concurrent improvements in processing power and data storage, which opened the door for the use of machine learning in healthcare (Awotunde et al., 2021). Simple algorithms like logistic regression and decision trees were the focus of early machine-learning applications in CVD prediction. These methods were more accurate than traditional statistical approaches, but they were still constrained by the complexity of the disease.

The development of machine learning in recent years has given the medical industry a revolutionary opportunity (Ahmed et al., 2020). Large, complex datasets can be analyzed using machine learning techniques to find patterns and insights that may be missed by traditional statistical methods (Meshref, 2019). Researchers can create predictive models that evaluate the risk of CVD based on a variety of variables, such as genetic information, lifestyle choices, clinical records, and demographic data, by utilizing machine learning (Allan et al., 2022). These models can provide more accurate and customized risk assessments, which can help with early diagnosis and focused preventive actions.

The creation of models that can accurately predict the risk of CVD from medical imaging data, such as echocardiograms and coronary angiography, represents significant advancements in this evolution (Gahungu et al., 2020). Furthermore, research has shown the potential of incorporating genomic data into prediction models to provide insights into the genetic predispositions that influence the risk of CVD. The availability of large-scale public health datasets and the development of sophisticated algorithms that can extract meaningful patterns from noisy, high-dimensional data have been key drivers of these advancements.

Traditional diagnostic methods for heart disease, such as ECGs and echocardiograms, while effective, often require specialized equipment and clinical settings, posing challenges for continuous monitoring and early detection (Ulloa-Cerna et al., 2022). Deep learning, a branch of machine learning, has shown substantial potential in medical diagnostics due to its ability to automatically extract features and patterns from raw data. CNNs excel at identifying spatial features in data, making them particularly suitable for analyzing complex patterns in heart sound spectrograms (Shuvo et al., 2021). LSTM networks, a type of recurrent neural network (RNN), are proficient at learning temporal dependencies, making them ideal for time-series data such as heart sounds. By leveraging these deep learning models, an IoMT-based approach can enable continuous, real-time analysis of heart sounds, providing timely and accurate diagnoses. Early detection of heart diseases becomes more feasible, leading to better treatment outcomes. Additionally, this approach extends diagnostic capabilities to remote and underserved areas, improving healthcare accessibility. By minimizing reliance on subjective human interpretation, the method standardizes diagnostic procedures and improves accuracy. The combination of IoMT and deep learning thus holds significant potential in transforming heart disease diagnosis and management, ultimately contributing to better healthcare outcomes and patient quality of life (Adewole et al., 2021).
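
As a rough illustration of the CNN-plus-LSTM idea described above (and not the architecture of any system cited here), the sketch below combines a small convolutional front end over a heart-sound spectrogram with an LSTM over the resulting time steps. The input shape and layer sizes are assumptions chosen only for the example.

```python
# Hypothetical CRNN sketch: a CNN extracts spatial features from a spectrogram,
# then an LSTM models the temporal sequence of those features.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_crnn(time_steps=128, mel_bins=64, n_classes=2):
    inp = layers.Input(shape=(time_steps, mel_bins, 1))     # spectrogram treated as an image
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)            # pool only along the frequency axis
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    x = layers.Reshape((time_steps, -1))(x)                 # one feature vector per time step
    x = layers.LSTM(64)(x)                                  # temporal dependencies
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = build_crnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```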

Literature review

This study (Pachiyannan et al., 2024) presents a healthcare technique, the Machine Learning-based Congenital Heart Disease Prediction Method (ML-CHDPM), designed to recognize and categorize CHD in pregnant women. The algorithm, trained on a large dataset, recognizes intricate patterns and correlations, leading to accurate forecasts and classifications. ML-CHDPM’s evaluation encompasses Receiver Operating Characteristic Curve (ROC) area, sensitivity, specificity, and accuracy, showcasing its superior performance across critical metrics: recall 96.25%, accuracy 94.28%, specificity 91.74%, with low False Positive Rate (FPR) 8.26% and False Negative Rate (FNR) 3.75%.

This article (Mohanty et al., 2024) focuses on the design, construction, and structural analysis of a passive optical Fiber Bragg Grating (FBG) sensor for obtaining real-time Heart Rate Variability (HRV) parameters, such as heart rate, the median variation of normal-to-normal (NN) intervals, the root mean square of successive differences, and the Percentage of Successive Normal-to-Normal (PSNN) intervals differing by more than 50 ms. Furthermore, an Internet of Things (IoT)-based architectural design and sophisticated signal processing methods are described. Five people, three male and two female, participated in an experimental investigation carried out in a laboratory. The study showed good performance, with an error rate of less than 10% when compared to a standard Heart Rate (HR) monitor. By detecting arrhythmia, coronary heart disease, aortic illnesses, and strokes, this intelligent system can significantly improve healthcare. Together, advanced technology, the IoT architecture, and FBG sensors have enormous potential to improve cardiac surveillance and patient outcomes.
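
For clarity, the HRV quantities named above can be computed from a series of normal-to-normal (NN) intervals as in the short sketch below; the interval values are made up for illustration, and the FBG acquisition itself is not shown.

```python
# Minimal sketch of common HRV quantities computed from NN intervals in milliseconds.
import numpy as np

nn_ms = np.array([812, 798, 805, 790, 820, 815, 801, 795, 830, 810], dtype=float)

heart_rate_bpm = 60000.0 / nn_ms.mean()        # mean heart rate from the mean NN interval
median_nn = np.median(nn_ms)                   # median of NN intervals
diffs = np.diff(nn_ms)                         # successive NN differences
rmssd = np.sqrt(np.mean(diffs ** 2))           # root mean square of successive differences
pnn50 = 100.0 * np.mean(np.abs(diffs) > 50)    # % of successive differences exceeding 50 ms

print(f"HR={heart_rate_bpm:.1f} bpm, median NN={median_nn:.0f} ms, "
      f"RMSSD={rmssd:.1f} ms, pNN50={pnn50:.1f}%")
```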

This study (Aljohani et al., 2023) introduces deep convolutional neural networks for categorizing common valve diseases and typical valve sounds into binary and multiclass categories. Three feature extraction methods, including Mel-Frequency Cepstral Coefficients (MFCC) and the Discrete Wavelet Transform (DWT), were explored. Both models achieved high precision, with F1 scores exceeding 98.2% and specificities surpassing 98.5%, indicating minimal misclassification of normal instances. These findings affirm the proposed model as a highly accurate tool for assisted diagnosis.

This research (Khan Mamun & Elfouly, 2023) presents a hybrid 1D-CNN that uses feature selection techniques and a sizable dataset amassed from online survey data. When contrasted with modern machine learning methods and Artificial Neural Networks (ANN), the 1D-CNN demonstrated superior accuracy. The accuracy for the CHD and non-coronary heart disease (no-CHD) validation data was 76.9% and 80.1%, respectively. The model was compared with Support Vector Machines (SVM), Random Forests (RF), AdaBoost, and ANN. In terms of accuracy, FNR, and FPR, the 1D-CNN performed better overall. Analyses of four other heart diseases using similar methodologies demonstrated that the hybrid 1D-CNN achieved higher accuracy.
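
The following is a hypothetical, simplified sketch of a 1D-CNN over selected tabular features, in the spirit of the hybrid model summarized above but not the authors’ architecture; the number of input features is an assumption.

```python
# Illustrative 1D-CNN: selected tabular features are treated as a 1-D sequence
# and passed through convolutional layers before a dense classifier head.
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 20                                   # assumed number of selected features
model = models.Sequential([
    layers.Input(shape=(n_features, 1)),          # each feature as one "time step"
    layers.Conv1D(32, kernel_size=3, activation="relu"),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),        # CHD vs. no-CHD
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```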

CardioXNet is a portable end-to-end Convolutional Recurrent Neural Network (CRNN) architecture that uses raw PCG signals to automatically detect five classes of cardiac auscultation (Chen et al., 2022). Results show that the proposed architecture outperforms previous state-of-the-art methods, achieving up to 99.60% accuracy, 99.56% precision, 99.52% recall, and 99.68% F1 score. It works particularly well for point-of-care CVD screening using memory-constrained mobile devices in low-resource settings.

The strategy put forward in this research (Liu et al., 2021) uses entropy and cross-entropy features together with a fusion of multi-channel heart sound recordings and multi-domain features. The data collection involved 36 participants, comprising 21 individuals with CAD and 15 without CAD. Each participant underwent simultaneous recording of five-channel heart sound signals for 5 minutes. Following segmentation and quality assessment, 553 samples remained in the CAD group, while 438 samples were retained in the non-CAD group. The optimal feature set was fed to an SVM for classification. According to the findings, multi-domain fusion improved classification accuracy from 78.75% to 86.70%, and after adding entropy and cross-entropy features it improved further to 90.92%. Entropy and cross-entropy features are therefore essential for multi-domain fusion of heart sound recordings and play a vital role in the identification of CAD.

The goal of this research (O’driscoll et al., 2022) was to determine how a machine learning platform could be created to support physician assessment and simplify stress echocardiogram analysis. A computerized image-analysis workflow was created to extract novel geometric and kinematic features from stress echocardiograms acquired during a large prospective, multicenter, multivendor study carried out in the United Kingdom. The extracted features were used to build an ensemble neural network classifier to recognize patients with significant coronary artery disease on noninvasive cardiac imaging. The model was then examined in a separate American study. A controlled split-read study examined how the availability of the AI classification would affect the clinical assessment of stress echocardiograms. Cross-fold validation using 31 distinct geometric and kinematic features produced a satisfactory classification rate for identifying individuals with significant disease in the initial data collection, with a specificity of 92.7% and a sensitivity of 84.4%. This accuracy was preserved in the separate validation dataset. Using the AI classification tool, physicians achieved an area under the receiver-operating characteristic curve of 0.93 while also improving inter-reader confidence, agreement, and specificity for recognizing disease by 10%.

The objective of this study (Schuuring et al., 2021), produced by the joint writing group established by the American Society of Echocardiography and the European Association of Cardiovascular Imaging, was to provide updated recommendations to the previously published standards for cardiac chamber quantification, in light of the rapid technological advances of the recent decade and the changes these advances have brought to echocardiographic practice. Drawing on considerably larger numbers of normal subjects gathered from various databases, the paper gives revised normal values for all four cardiac chambers, incorporating three-dimensional echocardiography and myocardial strain where applicable. Additionally, the paper aims to fix a few small inconsistencies in earlier recommendations. Information on arterial blood pressure, heart rate, hypertension assessment, cardiovascular disease medication, diabetes diagnosis, fasting glucose, creatinine concentrations, total cholesterol, low-density lipoprotein cholesterol, and triglycerides was gathered whenever practical. The Mosteller method was used to determine body surface area (BSA), and body mass index was computed by dividing the weight in kilograms by the square of the height in metres.

This paper (Xiao et al., 2020) introduces an innovative heart sound classification technique leveraging deep learning technologies for predicting cardiovascular diseases. The method consists of three main components: pre-processing, classification of 1-D waveform heart sound segments using a deep convolutional neural network (CNN) with an attention mechanism, and majority voting for the final prediction of heart sound recordings. To enhance information flow within the CNN, a block-stacked architecture with clique blocks is employed, featuring a bidirectional connection structure within each clique block. By integrating stacked cliques and transition blocks, the proposed CNN achieves both spatial and channel attention, resulting in notable classification performance. A novel separable convolution with an inverted bottleneck is utilized to efficiently decouple spatial and channel-wise feature relevancy. Experiments conducted on the PhysioNet/CinC 2016 dataset demonstrate that the proposed method achieves superior classification results and excels in parameter efficiency compared to state-of-the-art methods.
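
The recording-level majority vote described above can be sketched as follows; the per-segment predictions are dummy values used only to illustrate the voting step, not outputs of the cited model.

```python
# Sketch of a segment-level majority vote: each segment of a recording receives a
# class prediction, and the recording label is the most frequent segment label.
from collections import Counter

def recording_label(segment_predictions):
    """Return the majority class among per-segment predictions."""
    counts = Counter(segment_predictions)
    return counts.most_common(1)[0][0]

segment_preds = ["abnormal", "normal", "abnormal", "abnormal", "normal"]  # dummy CNN outputs
print(recording_label(segment_preds))  # -> "abnormal"
```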

This research (Pellikka, 2022) addressed the substantial challenge of stress echocardiography interpretation. It featured dobutamine and exercise studies, carried out with or without ultrasound image-enhancing agents and using a variety of ultrasound systems. For evaluation, endocardial visibility of at least 14 of 16 segments and an average of 4 images spanning end-diastole and end-systole were required in basal 4-chamber, 2-chamber, and parasternal short-axis midventricular views at rest and at stress. None of the individuals had undergone previous cardiac procedures, and all reached a target heart rate, rate-pressure product, or other endpoint. The model was then independently assessed on 154 stress echocardiograms from an earlier investigation. The AUROC was 0.927 using the same classification threshold, with a sensitivity of 84% and a specificity of 92.7%. When 38 individuals with established coronary artery disease (CAD) or abnormal resting wall motion were excluded in a subgroup analysis, sensitivity and specificity remained at 90.5% and 88.4%, respectively.

In this study (Yang et al., 2022), the authors developed a deep learning (DL) system that recognizes valvular heart diseases (VHDs) in echocardiographic videos. While advances in DL have been applied to interpreting echocardiograms, the use of these techniques to analyze colour Doppler recordings for diagnosing VHDs had not previously been documented. The researchers created a three-stage DL pipeline that categorizes echocardiographic views, recognizes the presence of VHDs, and measures key metrics associated with VHD severity in order to automatically screen echocardiographic videos for mitral stenosis (MS), mitral regurgitation (MR), aortic stenosis (AS), and aortic regurgitation (AR). Retrospective data from five medical centres were used to train (n = 1,335), validate (n = 311), and test (n = 434) the method. The real-world test set consisted of 1,374 consecutive echocardiograms that were retrospectively acquired. With areas under the curve for MS, MR, AS, and AR in the prospective test set of 0.99 (95% CI: 0.97-0.99), 0.88 (95% CI: 0.86-0.90), 0.97 (95% CI: 0.95-0.99), and 0.90 (95% CI: 0.88-0.92), respectively, disease diagnosis accuracy was good. The limits of agreement (LOA) between the DL method and physician estimates of valve lesion severity ranged from 0.60 to 0.77 cm2 vs. 0.44 to 0.44 cm2 for mitral valve area; from 0.27 to 0.25 vs. 0.23 to 0.08 for the MR jet area/left atrial area ratio; and from 0.86 to 0.52 m/s vs. 0.48 to 0.54 m/s.

The present paper explores the application of advanced machine learning approaches to echocardiography (echo), an exciting and actively investigated diagnostic modality. In this study (Wahlang et al., 2021), echo images are classified according to two distinct tasks. First, using 2D echo images, 3D Doppler images, and videographic images, classification into normal (absence of anomalies) or abnormal (presence of anomalies) is performed. Second, using videographic echo images, the distinct types of valvular regurgitation—namely, mitral regurgitation (MR), aortic regurgitation (AR), tricuspid regurgitation (TR), and a mixture of the three—are classified. Long Short-Term Memory (LSTM), which relies on recurrent neural networks (RNN), and the Variational AutoEncoder (VAE), which is based on the AutoEncoder, are the two deep-learning approaches used for these tasks. The use of videographic images sets this study apart from earlier SVM (Support Vector Machine)-based research, and it is among the first deep-learning applications in this field. In the classification of normal versus abnormal cases, deep-learning methods were found to outperform SVM approaches. Overall, VAE outperforms LSTM for static 2D and 3D Doppler images, whereas LSTM outperforms VAE for videographic data.

Aortic stenosis (AS) is frequently caused by congenital valve abnormalities, degenerative calcified valves, and rheumatic disease (Wahlang et al., 2020). For severe AS, aortic valve replacement is critical. Valve dimensions and areas in 2D ultrasound images have traditionally been identified manually before valve replacement surgery to assess the degree of stenosis and provide sufficient data for sizing prosthetic valves. However, because the position of the aortic valve varies dynamically in vivo, a static 2D visual evaluation is not only subjective but also provides measurements from only one or two frames of the entire cardiac cycle. For rapid and automatic tracking of the aortic valve and the adjoining outflow tract using online 3D-TEE, a few investigations have developed computerized tracking techniques based on structural and optical-flow information. This offers up-to-date, precise support for pre-procedural aortic valve replacement planning, assisting in improving the precision of valve assessment and boosting practitioner confidence.

This article (Fatima et al., 2020) describes the authors’ initial experience using Auto Valve Analysis, a novel artificial intelligence (AI)-based semi-automated tricuspid valve analysis program from Siemens Healthcare in Mountain View, California. By reducing the effort needed to examine valve structures, customized AI-based programs with live visualization and automated verification speed up medical decision-making and ensure strong consistency with little user involvement. With the implementation of tricuspid valve (TV) research, this approach can help close gaps in the variables used to predict TV function in clinical and scientific settings. Additionally, these capabilities can enhance diagnostic and predictive classification for surgical and medical interventions when combined with interventional planning.

A growing number of clinicians in a range of medical settings employ point-of-care echocardiography to quickly diagnose significant cardiac disease at the bedside (Kirkpatrick et al., 2020). Echocardiographers and ultrasound technicians may need to assist in training professionals in cardiovascular ultrasonography whose backgrounds lie in fields unrelated to heart disease. Such training may face difficulties or opportunities depending on the learners’ backgrounds, requirements, goals, and available time. Materials are needed, in addition to properly directed and organized use of resources, in order to participate in cardiac echocardiography training. In particular, educational initiatives benefit most from unrestricted institutional/departmental support, extensive academic expertise, committed academic time and money, computer technology assistance, and clinic- or hospital-wide collaboration.

The present research tested the hypothesis that, compared with cardiologists, sonographers, and resident readers, a deep convolutional neural network (DCNN) could better detect regional wall motion abnormalities (RWMAs) and distinguish between groups of coronary injury territories from conventional 2-dimensional echocardiographic images (Kusunose et al., 2020). A total of 300 patients with a diagnosis of myocardial infarction were included. Three separate sets of 100 individuals each from this cohort had infarctions in the right coronary artery (RCA), left circumflex (LCX) branch, and left anterior descending (LAD) artery territories. From a records database, 100 age-matched control individuals with normal wall motion were chosen. Echocardiographic images from short-axis views at the end-diastolic, mid-systolic, and end-systolic phases were included in each case. Diagnostic accuracies were calculated from the test set after the DCNN underwent 100 steps of retraining. The same model was trained independently in ten different iterations, and ensemble estimates were generated from those iterations. The area under the receiver-operating characteristic curve (AUC) achieved by the deep learning algorithm for detecting the presence of WMAs was comparable to that of the cardiologist and sonographer readers (0.99 vs. 0.98, respectively; p = 0.15) and significantly higher than the AUC of the resident readers (0.99 vs. 0.90, respectively; p = 0.002). The deep learning algorithm’s AUC for detecting WMA territories was greater than that of the resident readers (0.97 vs. 0.83, respectively; p = 0.003) but equivalent to that of the cardiologist and sonographer readers (0.97 vs. 0.95, respectively; p = 0.61). The deep learning algorithm’s AUC in the validation group at a separate site (n = 40) was 0.90.

This study (Davis et al., 2020) notes that echocardiography is only one of the areas of medical care where AI has found a home. Various fields of cardiac ultrasound, imaging, testing, and diagnostics are currently affected by AI. The prospect that AI will enhance sonographers’ work and lessen the variation that exists in echocardiographic interpretation persists despite reservations among ultrasound technicians and echocardiographers. It is crucial to continue using analytical techniques and to recognize that the proposed union between computers and humans will succeed only when properly built AI and knowledgeable individuals are combined. As multidimensional echocardiography becomes more common, specially designed devices using supervised algorithms may be able to discern when structures are visible before capturing them directly, yielding a more informative sample.

The purpose of this research (Genovese et al., 2019) was to evaluate the precision and repeatability of a novel, completely automated, machine learning (ML)-based technology for three-dimensional assessment of RV size and function. On the same day, a transthoracic 3DE examination was performed on 56 unselected individuals who had been referred for clinically indicated cardiac magnetic resonance (CMR) imaging and who had a wide range of RV dimensions, function, and image quality. The ML-based method was used to assess the end-systolic and end-diastolic RV volumes (ESV, EDV) and the ejection fraction (EF), which were then compared to CMR reference values using Bland-Altman and linear regression analyses. RV quantification by echocardiography was feasible in all cases. The automated method had an analysis time of 15 ± 1 seconds, was 100% reproducible, and required no correction in 32% of cases. After automatic post-processing, endocardial contour editing was required in the remaining 68% of patients, increasing analysis time to 114 ± 71 seconds. With these small corrections, the measures of RV volumes and EF were accurate when compared with the CMR reference (biases: EDV, −25.6 ± 21.1 mL; ESV, −7.4 ± 16 mL; EF, −3.3% ± 5.2%) and demonstrated outstanding consistency, as evidenced by coefficients of variation of 7% and intraclass correlations of 0.95 for all measurements.

As this study (Kusunose et al., 2019) notes, echocardiography is crucial to the identification and treatment of heart disease, and a precise and trustworthy echocardiographic examination is necessary for medical judgment. Even as novel methods (3-dimensional echocardiography, speckle-tracking, semi-automated analysis, etc.) are being developed, operators’ expertise still plays a significant role in the final interpretation. Unresolved diagnostic errors are a significant issue. Furthermore, when readings are repeated, the same observer may reach a different conclusion, because cardiac specialists can disagree with one another on how to interpret images. The high daily workload in clinical practice can contribute to this inaccuracy, so all cardiac specialists need an accurate perception in this area. Though the necessary enormous databases and “black box” approach raise a number of questions, AI can deliver acceptable outcomes in this area. Cardiologists will eventually need to modify their standard operating procedures to incorporate AI in the current phase of cardiology.

A deep learning neural network is a subset of ANNs, which are themselves a subset of artificial intelligence (Madani et al., 2018). The field of artificial intelligence has applications across many fields of research, technology, and even everyday life. This article reviews the function and present-day uses of neural-network-based studies for cardiology assessment as well as their drawbacks and difficulties. The numerous cardiac illnesses, their functional consequences, and even their appearance are all assessed via echo. Decision-making is time-consuming, costly, and requires specialist expertise because the imaging must be analyzed and interpreted. The use of computerized systems for cardiovascular imaging has significantly changed medical practice by finding anomalies in cardiac muscle motion and function that aid in determining heart disease. Deep learning is employed to analyze images and is currently being used to solve diagnostic problems. It is also highly helpful for doctors seeking to improve their care of patients. In contrast to statistics-based methods, an extensive set of images is needed to develop an algorithm for a particular problem. Machine learning is used to find and establish complex patterns and their relationships in images when applied to large databases. Using the extensive dataset, the machine learns and detects the required structures in the image. Despite the lack of widespread acceptance of computerized devices in medical research, this approach benefits academics as well as physicians.

The purpose of this work (Nath et al., 2016) was to investigate the viability and dependability of large-scale, focused NLP retrieval of numerous data items from cardiac reports. The machine-learning extraction of information about cardiovascular anatomy and function from differently structured echocardiographic records was made possible by the development of the NLP tool EchoInfer. Accessible echocardiogram reports from 2004 to 2013 from three independent ongoing medical research projects were subjected to EchoInfer analysis. 15,116 echocardiogram records from 1,684 individuals were evaluated by EchoInfer, and 59 quantitative and 21 qualitative data items were collected from each report. With regard to all 80 data items in 50 reports, EchoInfer attained a precision of 94.06%, a recall of 92.21%, and an F1-score of 93.12%. The 15,116 reports for this investigation included 10,590 dot echocardiographic reports, 861 stress echocardiographic reports, 3,456 transesophageal echocardiographic reports, and 1,050 transthoracic echocardiographic reports. EchoInfer assessed 9,444 reports from patients with various indications and no history of valvular surgery, 3,725 reports from individuals with a history of aortic valve replacement (AVR), 828 reports from individuals with a history of mitral valve (MV) replacement, 441 reports from individuals with a history of mitral valve repair, and 677 reports from individuals with a history of combined AVR and MV replacement or repair. Overall, EchoInfer attained an F1-score of 93.12%, a precision (positive predictive value) of 94.06%, and a recall (sensitivity) of 92.21%.

The development of three-dimensional (3-D) real-time echocardiography over the past few decades has made it both possible and important for clinicians to automatically create patient-specific geometric models (Bersvendsen et al., 2017). Although the right ventricle (RV) is increasingly recognized as having a role in heart failure, a large number of the echocardiographic segmentation methods described in the scientific literature concentrate on the left ventricle’s (LV) endocardial border. The authors outline a technique for coupled segmentation of the LV and RV endocardial and epicardial boundaries in 3-D ultrasound images. They propose extending an efficient state-estimation segmentation framework with a representation of coupled surfaces in order to address the segmentation problem, and they also propose adding a cardiac incompressibility constraint to the system in order to regularize the segmentation. In images from 16 patients, the technique was evaluated against manual measurements and segmentations. For the LV endocardial, RV endocardial, and LV epicardial surfaces, mean absolute distances between the proposed and reference segmentations were 2.8 ± 0.4 mm, 3.2 ± 0.7 mm, and 3.1 ± 0.5 mm, respectively. The approach was computationally efficient, taking only 2.1 ± 0.4 s.

In this work (Balaji et al., 2015), a fully automated classification of echocardiographic heart views is proposed. The methodology relies on machine learning to distinguish between the characteristic features of different views. The parasternal short-axis (PSAX), parasternal long-axis (PLAX), apical two-chamber (A2C), and apical four-chamber (A4C) views are the four traditional perspectives addressed in this framework. Because of noise, analyzing echocardiographic images is challenging: the images contain salt-and-pepper noise, which complicates the classification procedure. Initially, median filtering is used to remove this distortion from the source echocardiographic image. The labels and triangular markers visible at the borders of the echocardiographic images are further artefacts. Experiments involving two hundred echocardiographic images demonstrate that the proposed approach, with a precision of 87.5%, can be used to classify heart views efficiently.

This work (Balaji et al., 2014) proposed an effective classification of ventricular echocardiographic images. A cardiac cycle is made up of systolic and diastolic phases: diastole is the phase of relaxation and filling, whereas systole is the phase of contraction. Only the relevant frames from the supplied video sequence were extracted and used to establish the echocardiographic images. First, distortion was removed from the echocardiographic image while brightness was enhanced. Mathematical morphology was then applied to highlight the cardiac cavity prior to segmentation, and the data were decomposed using connected-component labelling (CCL). Three common heart views were categorized: the parasternal short-axis (PSAX), apical two-chamber (A2C), and apical four-chamber (A4C) views. Tests on over 200 echo images achieved an accuracy rate of 94.56%.

This study (Volpato et al., 2019) shows that, compared with well-established standard methods, automated 3DE analysis of left ventricular (LV) mass using a novel ML algorithm yields repeatable and accurate measurements. Twenty-three individuals who underwent 3DE (Philips EPIQ) and CMR scanning on the same day were prospectively evaluated. Wide-angle 3D single-beat datasets of the left ventricle were collected. The recently released automated program (Philips HeartModel), along with traditional volumetric measurements (TomTec), was used to measure LV mass. CMR analysis was carried out by manually identifying the LV endocardial and epicardial borders. Repeated measurements were used to determine the repeatability of the ML technique and to quantify it using intra-class correlation (ICC) and coefficients of variation (CoV). In 20 patients (87%), automated LV mass evaluation proved feasible. The findings were comparable to those obtained from CMR (Bland-Altman bias 5 g, limits of agreement 37 g), as well as to findings obtained from a traditional 3DE study (bias 7 g, limits of agreement 27 g). While manual modifications were made in the majority of patients, analysis time was significantly shorter (1.02 ± 0.24 minutes vs. 2.20 ± 0.13 minutes for CMR and 2.36 ± 0.09 minutes for TomTec). Repeated measurements revealed excellent reproducibility: an ICC of 0.99 and a CoV of 4 ± 5%.

The purpose of this work was to more accurately forecast survival following echocardiography using machine learning (Samad et al., 2019). A total of 171,510 randomly chosen individuals from an extensive regional medical system, who underwent 331,317 echocardiograms, were evaluated for mortality. Using three distinct feature sets, the researchers compared the predictive ability of nonlinear machine learning models with that of conventional logistic regression models. Sex, age, height, weight, heart rate, arterial pressure, low-density lipoprotein, high-density lipoprotein, cigarette smoking, and 90 cardiovascular-relevant International Classification of Diseases, Tenth Revision (ICD-10) codes are among the clinical factors; other factors include physician-reported EF and 57 additional echocardiographic measurements. Missing data were imputed using multiple imputation by chained equations (MICE). The researchers used the mean area under the curve (AUC) over 10 cross-validation folds to compare the predictions with one another and with basic clinical scoring systems. Just ten variables, six of which were derived from echocardiography, were required to reach 96% of the maximum prediction accuracy. Compared with LVEF, tricuspid regurgitation velocity was a better predictor of mortality. In a set of trials that included complete data for the top ten variables, imputation with chained equations produced only slightly lower prediction accuracy (an AUC difference of 0.003) than the original data.
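
As an illustration of the chained-equations imputation and cross-validated AUC evaluation mentioned above (not the cited authors’ code), the sketch below uses scikit-learn’s IterativeImputer, a MICE-style imputer, inside a pipeline on synthetic data.

```python
# Hedged sketch: MICE-style imputation followed by logistic regression,
# scored by mean AUC over 10 cross-validation folds on synthetic data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))                              # 10 clinical/echo-style features
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan                       # knock out ~10% of values

pipe = make_pipeline(IterativeImputer(max_iter=10, random_state=0),
                     LogisticRegression(max_iter=1000))
auc = cross_val_score(pipe, X, y, cv=10, scoring="roc_auc")
print(f"mean AUC over 10 folds: {auc.mean():.3f}")
```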

The paper (Baumgartner et al., 2017) concentrates especially on improving the assessment of aortic valve stenosis, which includes low-gradient aortic stenosis with preserved ejection fraction, a new categorization of aortic stenosis by gradient, flow, and ejection fraction, and a combined, stepwise approach to assessing the severity of the valve narrowing in clinical settings. It is crucial to employ identical techniques for measuring both the aortic valve area (AVA) and changes in velocity/gradient in order to prevent apparent shifts that merely reflect methodology. As an illustration, comparing velocities acquired from the right parasternal window with earlier measurements taken from another acoustic window can result in an apparent rise in peak velocity of 0.3 m/s that may prompt surgery to be performed. When flow has dropped concurrently, velocity and gradient may stay stable or even decline as aortic stenosis progresses.

Table 2.1: State-of-the-art table for predicting coronary cardiovascular disease

References Methodology Dataset Evaluation Measures Limitations
(Pachiyannan et al., 2024) Machine Learning-based Congenital Heart Disease Prediction Method (ML-CHDPM) A large dataset of pregnant women ROC curve area, sensitivity, specificity, accuracy; recall: 96.25%, accuracy: 94.28% Potential bias in the dataset, limited to pregnant women, high computation power required
(Mohanty et al., 2024) Design and analysis of passive optical FBG sensor for HRV parameters; IoT-based architectural design Experimental investigation with 5 people Error rate < 10% compared to standard HR monitor Small sample sizes and experimental settings might not reflect real-world variability.
(Aljohani et al., 2023) Deep convolutional neural networks for valve diseases classification; MFCC, DWT feature extraction Dataset for valve sounds (not specified) Precision with F1 scores > 98.2%, specificities > 98.5% Dataset details not provided, performance might vary with different datasets
(Khan Mamun & Elfouly, 2023) Hybrid 1D-CNN for CHD detection using feature selection techniques Large dataset from online surveys Accuracy: 76.9% for CHD, 80.1% for no-CHD; compared with SVM, RF, AdaBoost, ANN Limited to survey data, performance might vary with clinical data, with relatively moderate accuracy.
(O’driscoll et al., 2022) Neural network decoder for strain echocardiograms; cross-fold validation with geometrical and kinematic factors Multicenter, multivendor strain echocardiograms Sensitivity (84.4%), Specificity (92.7%), AUROC (0.93) Limited to specific strain echocardiogram data; applicability to other datasets not explored
(Pellikka, 2022) Dobutamine stress echocardiograms with ultrasound enhancements; AUROC, sensitivity, and specificity calculations Stress echocardiograms AUROC (0.927), Sensitivity (90.5%), Specificity (88.4%) Limited to stress echocardiogram context; impact on routine clinical settings not assessed
(Yang et al., 2022) DL system for VHD recognition in echocardiograms; three-stage DL structure for disease detection and metric measurement Retrospective data from five medical centers Disease diagnosis precision for MS, MR, AS, AR Performance metrics specific to the DL approach; dataset bias and variability among centers could affect results
(Abbas et al., 2022) Attention-based Convolutional Vision Transformer (CVT-Trans) using CWTS Dataset for PCG signals (not specified) Accuracy: 100%, sensitivity: 99.00%, specificity: 99.5%, F1-score: 98% Dataset details not provided, high computational requirements
(Li et al., 2021) Lightweight neural network for heart sound categorization using time-frequency properties Heart sound data (not specified) Accuracy: 95.00%, memory size: 1.36 MB Dataset details not provided, limited to time-frequency properties, need for optimization for different devices.
(Schuuring et al., 2021) Revised guidelines for cardiac chamber measurement; inclusion of diverse normal populations; data on cardiovascular parameters Various databases Updated normal values, Methodological consistency Small inconsistencies not fully resolved; variability in data sources might affect generalizability
(Kusunose et al., 2020) DCNN for detecting regional wall motion abnormalities in cardiac ultrasound; comparison with expert readers Various coronary injury groups AUC comparison with cardiologists and sonographers Generalizability to other imaging modalities and clinical contexts not discussed
(Davis et al., 2020) AI impact on various cardiac ultrasound fields; potential for reducing variability in echocardiograms Multidimensional echocardiogram datasets Future AI applications in ultrasound, Reduction of variation Speculative impact; actual integration and acceptance in clinical practice not evaluated

Research gap

Although there are still certain research gaps, machine learning has made great strides in the prediction of coronary cardiovascular disease. The interpretability of sophisticated models, the integration and standardization of various data sources, and the models’ applicability to various clinical contexts or demographic groups are a few of these. Studies currently in existence frequently concentrate on particular populations, which raises questions about bias and fairness. Another difficulty is integrating these models into standard clinical workflows. Additional research is required to address the ethical issues and data privacy concerns related to using large amounts of patient data for machine learning.

Methodology

Introduction

This chapter uses machine learning models to analyze the task of predicting cardiovascular disease. It describes a dataset that contains different classes. The chapter covers both traditional and advanced machine learning models, including the Random Forest Classifier, Decision Tree, XGBoost Classifier, and K-Nearest Neighbors (KNN). It also proposes an Artificial Neural Network (ANN) model to capture the nonlinear relationships among the features used to detect cardiovascular disease.

Dataset

A comprehensive collection of data essential for using machine learning techniques to predict coronary cardiovascular disease (CVD) is the “Rates_and_Trends_in_Coronary_Heart_Disease” dataset. This dataset includes several dimensions that are necessary for a thorough analysis, including location, year, geography, and classes. It includes several classes, such as cardiovascular diseases, stroke, and coronary heart disease that offer a comprehensive understanding of cardiovascular health. Researchers can examine patterns and trends across a range of diseases thanks to the insights provided by each class, which covers various facets and kinds of cardiovascular conditions. The geographic and location characteristics in the dataset aid in capturing the regional variations in CVD prevalence and trends, offering important context regarding the ways in which local and environmental factors influence the risk of heart disease. Through temporal analysis made possible by the year attribute, trends and shifts in CVD rates over time can be identified and linked to modifications in population behaviours, healthcare, and policy, as shown in Figure 3.1. A comprehensive approach to cardiovascular health is made possible by this dataset’s inclusion of a variety of topics, including stroke and coronary disease. This makes it easier to develop machine learning models that can distinguish between related conditions and specifically predict coronary heart disease. The dataset facilitates the application of diverse machine learning techniques, including supervised and unsupervised learning models, by integrating a broad range of features and labels. This allows for the comprehensive assessment and prediction of risks.

Figure 3.1: Dataset sample

Preprocessing

When utilizing machine learning techniques to predict CVD, preprocessing is an essential step because it directly affects the predictive models’ performance and quality. Comprehensive preprocessing is necessary to guarantee that the data is clear, consistent, and prepared for analysis given the complexity and heterogeneity of the data involved, which range from clinical records and demographic data to imaging data and lifestyle factors. Preprocessing usually entails several important procedures, such as feature selection, dimensionality reduction, data transformation, data cleaning, data normalization, and handling of missing values.

Data cleansing

Eliminating duplicate entries, fixing inconsistencies, and handling outliers that might interfere with the model’s learning process are all part of the data-cleaning process. In medical datasets, where gaps in data may arise from incomplete patient records or data entry errors, handling missing values is especially crucial. To address missing values without significantly increasing bias, strategies like mean/mode imputation, forward filling, or sophisticated techniques like KNN imputation are employed.

Handling missing values: Identify and manage missing data. Common techniques include imputation (e.g., filling missing values with mean, median, or mode) or removing rows/columns with excessive missing values.

Removing Duplicates: Ensure there are no duplicate entries in the dataset to maintain data integrity.
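As a concrete illustration (not the exact pipeline used in this study), the short Python sketch below shows how duplicates and missing values might be handled with pandas and scikit-learn; the CSV file name and the 40% missingness threshold are assumptions made purely for the example.

import pandas as pd

df = pd.read_csv("Rates_and_Trends_in_Coronary_Heart_Disease.csv")  # assumed file name
df = df.drop_duplicates()                              # remove duplicate records
df = df.dropna(axis=1, thresh=int(0.6 * len(df)))      # drop columns that are mostly missing (illustrative threshold)
numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())   # median imputation for numeric features
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])                    # mode imputation for categorical features
# Alternatively, KNN imputation can be applied to the numeric features:
# from sklearn.impute import KNNImputer
# df[numeric_cols] = KNNImputer(n_neighbors=5).fit_transform(df[numeric_cols])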

Encoding categorical variables

Convert categorical variables into a numerical format using techniques such as one-hot encoding or label encoding. Depending on the models being used and the characteristics of the categorical variables, different encoding techniques are applied, such as Label Encoding, One-Hot Encoding, and Target Encoding. This stage makes sure that all the data is in a numerical format that can be used by machine learning algorithms.
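The fragment below sketches both encodings on a toy frame; the column names are hypothetical stand-ins for the dataset's categorical fields and target class.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "location": ["North", "South", "North", "East"],             # hypothetical nominal feature
    "smoking_status": ["never", "current", "former", "never"],
    "cvd_class": ["Normal", "Predicting CVD", "Normal", "Predicting CVD"],
})
df = pd.get_dummies(df, columns=["location", "smoking_status"], drop_first=True)  # one-hot encoding
df["cvd_class"] = LabelEncoder().fit_transform(df["cvd_class"])                   # label-encode the binary target
print(df.head())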

Normalization/Standardization

This method rescales the numerical features to fall within a fixed range such as [0, 1] or [-1, 1]. Normalizing variables such as age, blood pressure, and cholesterol, for instance, guarantees that they are on a comparable scale and keeps any one feature from unduly influencing the model. Normalization is especially helpful when the data do not follow a Gaussian distribution.
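Both rescaling options can be expressed in a few lines with scikit-learn; the feature values below are made up purely for illustration.

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[63.0, 145.0, 233.0],     # hypothetical rows: age, systolic BP, cholesterol
              [41.0, 120.0, 180.0],
              [58.0, 160.0, 286.0]])
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)   # normalization to [0, 1]
X_standard = StandardScaler().fit_transform(X)                   # standardization to zero mean, unit variance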

Feature engineering

When utilizing machine learning techniques to predict coronary CVD, feature engineering plays a crucial role in the preprocessing stage. To increase the predictive capacity of machine learning models, new features are added or current ones are changed. Because CVD is a complex disease with many risk factors, including lifestyle choices, genetic markers, clinical measurements, demographic information, and imaging data, effective feature engineering is essential for identifying the underlying patterns and relationships that influence disease risk. Finding important predictors that have a strong correlation with CVD outcomes is the first step in the feature engineering process when it comes to CVD prediction. This could involve simple transformations like combining characteristics like blood pressure, cholesterol, and glucose levels to create composite risk scores, or it could involve calculating the body mass index (BMI) from weight and height. To capture more complex risk profiles, more sophisticated approaches might include combining genetic markers with family history or developing interaction terms between characteristics, like age and smoking status.
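A minimal sketch of such derived features is shown below; the column names and the weights in the composite score are illustrative assumptions, not the features engineered in this study.

import pandas as pd

df = pd.DataFrame({
    "weight_kg": [85.0, 62.0, 95.0], "height_m": [1.75, 1.62, 1.80],
    "systolic_bp": [150, 118, 165], "total_cholesterol": [240, 185, 260],
    "fasting_glucose": [110, 92, 130], "age": [61, 45, 70], "smoker": [1, 0, 1],
})
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2                 # BMI from weight and height
df["risk_score"] = (df["systolic_bp"] / 120                       # simple composite risk score (illustrative weights)
                    + df["total_cholesterol"] / 200
                    + df["fasting_glucose"] / 100)
df["age_x_smoker"] = df["age"] * df["smoker"]                     # interaction term: age x smoking status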

Data splitting

Using an 80-20 data splitting strategy, machine learning techniques are applied to the prediction of CVD. Eighty per cent of the data forms the training set, which the models use to learn patterns and correlations among the different risk factors, while the remaining 20% is held out for testing. The large variety of data points in the 80% training split improves predictive accuracy, as shown in Figure 3.2. After the model has been fully trained and optimized, its performance on unseen data is evaluated on the 20% testing set. Evaluation is based on key performance metrics such as accuracy, precision, recall, F1-score, and the ROC curve.


Figure 3.2: Data splitting
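A typical way to realise the split described above, assuming a preprocessed feature matrix X and label vector y, is the stratified hold-out sketched below (synthetic data are used so the snippet runs on its own).

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))          # synthetic stand-in for the preprocessed features
y = rng.integers(0, 2, size=1000)       # synthetic binary CVD labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)   # 80% training, 20% testing
print(X_train.shape, X_test.shape)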

Machine learning models

This section explains the process for predicting coronary CVD using a variety of machine-learning models. The machine learning models include the Random Forest Classifier, Decision Tree, XGBoost Classifier, and KNN, along with the proposed ANN model.

Random forest classifier

Since the Random Forest classifier can handle complex, non-linear relationships and high-dimensional datasets well, it is a powerful ensemble learning technique that is frequently used in the prediction of coronary CVD. In order to arrive at a final, more reliable result, it first builds a “forest” of several decision trees during the training phase. A random subset of features and a random subset of data points are used to build each decision tree in the random forest (with replacement, known as bootstrapping). This randomness lowers the likelihood of overfitting, a common issue in machine learning, by ensuring that the model is not unduly sensitive to particular features or data points. The Random Forest classifier can handle a wide range of predictor variables, including age, gender, blood pressure, cholesterol, smoking status, diabetes, family history, and other clinical measurements, in the context of CVD prediction. Different patterns and relationships within the data will be discovered by each decision tree in the forest. One tree may concentrate on the effects of age and smoking status, while another may discover how certain combinations of high blood pressure and cholesterol raise the risk of coronary heart disease. Random Forest accurately predicts CVD risk by capturing a wide range of patterns and interactions among the features, which are often critical for multi-tree training.

From the original training dataset D with n instances, generate B bootstrap samples D_b (b = 1, 2, ..., B) by sampling n instances with replacement. Each tree is grown on one bootstrap sample, and at every node only a random subset of m features is considered; the best split among these m features is chosen based on a splitting criterion, such as Gini impurity or entropy.

Feature importance scores, which measure each feature’s contribution to the model’s predictions, are provided by Random Forest.
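A hedged scikit-learn sketch of such a forest, trained on synthetic data rather than the study's dataset and with illustrative hyperparameters, is given below; the feature-importance vector mentioned above is available after fitting.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(
    n_estimators=200,      # number of bootstrap-sampled trees B
    max_features="sqrt",   # size m of the random feature subset tried at each split
    criterion="gini",      # splitting criterion (Gini impurity)
    random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
print("Feature importances:", rf.feature_importances_)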

Decision tree

Using subsets of a dataset based on input features, the Decision Tree machine learning algorithm predicts coronary CVD. This forms a structure resembling a tree, where each node denotes a choice made in response to a particular feature, and each branch denotes a result. The final classification or prediction result is represented by the end nodes, also referred to as leaves.

The decision tree determines which critical characteristics such as age, blood pressure, cholesterol, or smoking status best differentiate the target classes in CVD prediction. The metrics used to determine this choice are information gain and Gini impurity, which gauge the split’s purity. For instance, the tree may divide the dataset according to other characteristics like blood pressure, diabetes status, BMI, and family history of CVD if a person’s cholesterol level exceeds a predetermined threshold.

To choose the best feature for data splitting, the decision tree applies a splitting criterion. Information Gain (IG), which is predicated on the information theory concept of entropy, is one popular criterion. The degree of uncertainty or impurity in a dataset is measured by entropy.
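For reference, the usual definitions of entropy and information gain (a sketch in standard notation, since the numbered equations are not reproduced here) are:

$H(D) = -\sum_{c=1}^{C} p_c \log_2 p_c$

$IG(D, A) = H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|} H(D_v)$

where $p_c$ is the proportion of instances in D belonging to class c, and $D_v$ is the subset of D for which attribute A takes the value v; the tree greedily chooses the split with the largest information gain (or, equivalently, the largest reduction in Gini impurity).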

XG Boost classifier

A scalable and highly effective machine-learning method for predicting coronary CVD is the XGBoost classifier. It is a more complex variation of gradient boosting that gradually assembles a collection of weak learners, usually decision trees. The main idea of XGBoost is to minimize errors through an iterative process of prediction optimization, where each new tree corrects the mistakes made by the previous trees. Because of this, XGBoost is especially good at identifying minute patterns and interactions between different CVD risk factors. When predicting CVD, XGBoost first initializes a basic model that forecasts the mean result for the training set. After that, it continuously adds new decision trees to the ensemble with the goal of estimating residual errors of the total predictions made by all of the earlier trees. This aids in lowering errors and improving the model’s forecasts. The L1 (Lasso) and L2 (Ridge) regularization methods in XGBoost assist in preventing overfitting in intricate medical datasets with high-dimensional features. This guarantees the model’s strong generalization to new data and its high predictive accuracy in practical scenarios. To effectively compute optimal splits and handle missing values, XGBoost also employs a weighted quantile sketch algorithm. This feature makes XGBoost a useful tool for early CVD risk assessment in clinical settings.

Determine the loss function's first and second derivatives (gradients and Hessians) with respect to the predictions. For instance, as in equation (3.4), the gradient and Hessian for logistic regression in binary classification are calculated as follows:

Gradient ($g_i$): for the binary cross-entropy (logistic) loss, $g_i = p_i - y_i$, where $p_i$ is the predicted probability for instance $i$ and $y_i$ is its true label.

The loss function’s first derivative with respect to the predicted value is represented by the gradient. It indicates the direction in which the model’s prediction needs to be adjusted in order to minimize the loss and measures the rate at which the loss function changes in relation to the prediction.

Hessian ($h_i$): for the same loss, $h_i = p_i (1 - p_i)$.

The loss function's second derivative with respect to the predicted value is represented by the Hessian. It measures the curvature of the loss with respect to the prediction, indicating how rapidly the gradient itself changes as the prediction is adjusted; XGBoost uses both quantities to compute optimal splits and leaf weights.
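A minimal XGBoost sketch with the binary logistic objective and L1/L2 regularization is shown below; the hyperparameter values are illustrative assumptions rather than the settings used in this study, and synthetic data stand in for the CVD features.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(
    objective="binary:logistic",   # uses the gradients and Hessians defined above
    n_estimators=300, learning_rate=0.1, max_depth=4,
    reg_alpha=0.1,                 # L1 (Lasso) regularization
    reg_lambda=1.0,                # L2 (Ridge) regularization
    eval_metric="logloss")
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))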

K-Nearest Neighbor classifier

To predict coronary CVD, machine learning techniques such as the K-Nearest Neighbors (KNN) algorithm are straightforward and efficient. It is a lazy learning algorithm that uses the training dataset directly to inform predictions rather than going through a separate training phase. KNN uses a user-defined parameter k to determine how similar a new data point is to the k closest data points in the training dataset. Using distance metrics like Euclidean, Manhattan, or Minkowski distances, KNN determines the distance between each new patient’s data point and every other data point in the training set for CVD prediction. Age, cholesterol, blood pressure, heart rate, smoking status, diabetes, family history, and other clinical measurements are among the characteristics that go into determining this distance. After determining the k neighbours that are closest to the new data point, KNN uses a majority voting system to predict the likelihood that the new patient will have CVD. Because it is based on the results of comparable cases, KNN’s majority voting system lends itself to ease of interpretation.

The process involves identifying the k data points (neighbours) that are closest to a given data point, usually by using distance metrics such as the Euclidean distance. In n-dimensional space, the Euclidean distance between two points $x$ and $y$ is defined as follows in equation (3.5): $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.

The k-NN algorithm is then fed with the extracted feature vector $f$. Based on the distances between the feature vectors, the k-NN classifier finds the k nearest neighbours, as shown in equation (3.6). The target data point $f_t$ is classified according to the majority vote of the risk labels of its k nearest neighbours.
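The corresponding scikit-learn sketch, again on synthetic stand-in data and with an assumed k of 5, is:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)                 # distance metrics are scale-sensitive
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")   # majority vote over the 5 nearest neighbours
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))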

Proposed Artificial Neural Network (ANN)

An innovative technique for forecasting coronary CVD is the Artificial Neural Network (ANN). ANNs, which are modelled after the human brain, are made up of linked layers of neurons that process input data and produce predictions. An input layer, several hidden layers, and an output layer make up an ANN. A collection of cardiovascular risk-related features, including age, blood pressure, heart rate, cholesterol, BMI, diabetes, smoking status, family history of heart disease, and other clinical or demographic factors, are sent to the input layer. The hidden layers are made up of several neurons that apply an activation function to introduce non-linearity and perform weighted computations on the inputs. During training, an optimization algorithm is used to iteratively adjust the weights of these connections, which are initially set randomly. Backpropagation is a technique used by the ANN during training to adjust the parameters and update the weights across several epochs, as shown in Figure 3.3. For binary classification tasks, the output layer typically employs a sigmoid activation function to generate the final prediction. A probability score indicating the patient’s chance of developing CVD is the output.

The model uses Keras to predict coronary CVD and is a Sequential ANN. Based on the input features, it is intended for binary classification, where the output is either the presence or absence of CVD. The model is composed of several layers: activation functions, dropout layers for regularization, dense layers, and an output layer set up for binary classification. The binary cross-entropy loss function, accuracy as the performance metric, and a learning rate of 0.001 are used to compile the model using the RMSprop optimizer. The number of features in the training data is matched by the input layer, which is followed by a dense layer with 1,056 neurons and ReLU activation, as described in Table 3.1. While the third layer polishes the learned features, the second layer analyzes the transformed features to discover intricate patterns. One neuron with a sigmoid activation function makes up the output layer. This neuron is best suited for binary classification tasks because it produces a probability value between 0 and 1.

Table 3.1: Parameters detail of proposed model ANN

Layer Type Output Shape Number of Parameters Activation Function
Dense (None, 1056) 48,608 ReLU
Dense (None, 512) 540,224 ReLU
Dense (None, 256) 131,328 ReLU
Dropout (None, 256) 0
Dense (Output) (None, 1) 257 Sigmoid


Figure 3.3: Proposed model
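A Keras sketch consistent with the architecture in Table 3.1 and the compilation settings described above is given below; the dropout rate, number of epochs, and batch size are not stated in the text and are therefore assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def build_ann(n_features):
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(1056, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),                     # dropout rate assumed
        layers.Dense(1, activation="sigmoid"),   # output: probability of CVD
    ])
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Example usage (assumed training settings):
# model = build_ann(X_train.shape[1])
# model.fit(X_train, y_train, epochs=50, batch_size=256, validation_split=0.1)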

Evaluation measures

Evaluation measures in the prediction of coronary CVD are essential for determining how accurate, dependable, and effective the predictive models are. Accuracy is one of the key metrics; it gives an overall idea of how frequently the model predicts outcomes correctly, but it may not be sufficient when there is a class imbalance, as shown in equation (3.7). Precision and recall provide more specific insights: precision is the percentage of true positive predictions out of all positive predictions, reflecting how trustworthy the model's CVD predictions are, while recall quantifies the model's capacity to capture all real positive instances and shows how well it detects actual CVD cases.

True positives (TP) are instances that are positive in the test set and are correctly labelled as positive by the classifier. True negatives (TN) are instances that are negative in the test set and are correctly labelled as negative by the classifier. False positives (FP) are instances that are negative in the test set but are incorrectly labelled as positive by the classifier. False negatives (FN) are instances that are positive in the test set but are incorrectly labelled as negative by the classifier. As shown in equation (3.10), the F1 score is the harmonic mean of precision and recall, combining both into a single measure.
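In standard notation, the corresponding formulas (a sketch of the usual definitions, since the numbered equations are not reproduced here) are:

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

$\mathrm{Precision} = \frac{TP}{TP + FP}$

$\mathrm{Recall} = \frac{TP}{TP + FN}$

$F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$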

Chapter Summary

The methodology chapter offers a thorough examination of the machine learning models that are employed to investigate coronary CVD. It describes the preprocessing steps and the dataset, which contains different columns. The chapter then covers the models that were employed, including the proposed ANN as well as the Random Forest Classifier, Decision Tree, XGBoost Classifier, and KNN. The strengths and suitability of each model for predicting cardiovascular disease are discussed. The evaluation metrics such as accuracy, precision, recall, F1-score, and AUC that are used to gauge how well these models perform are also covered in this chapter.

Results And Discussion

Introduction

In the results chapter, the effectiveness of machine learning models in predicting cardiovascular disease is examined. These models include the Random Forest Classifier, Decision Tree, XGBoost Classifier, and KNN, together with a proposed ANN model. Critical evaluation metrics such as AUC, F1-score, accuracy, precision, and recall are used to evaluate how well each model predicts coronary CVD from clinical data. Initial insights into data separability are provided by the Decision Tree models, while complex feature interactions are handled by the ensemble-based Random Forest and XGBoost classifiers. The proposed ANN model may outperform conventional models in capturing complex patterns and non-linear relationships.

Experimental setup

TensorFlow and Python are used to implement the model described in the present research. The complete experimental setup was built in Python using Anaconda 2.6.0, and Keras 2.6.0 libraries are used to build, compile, and test the model, with TensorFlow 2.6.0 as the backend. The experiments were run with Python 3.9.18 on a machine with a 2.20 GHz Intel(R) Core(TM) i9-13950HX CPU, 64 GB of RAM, and an 8 GB dedicated NVIDIA GeForce RTX 4060 GPU.

Machine learning models

The machine learning models, which showed respectable accuracy and interpretability, were the Random Forest Classifier, Decision Tree, XGBoost Classifier, KNN, and the proposed ANN. These models offered a strong basis for coronary CVD prediction and were successful in detecting coronary CVD patterns, although the traditional models sometimes had trouble processing complex, high-dimensional data.

Random forest classifier

The Random Forest classifier for predicting coronary CVD using machine learning techniques performed well overall, with an accuracy of 91%. For class ‘0’ (no CVD), the model achieves a precision of 0.92, indicating that 92% of the instances predicted as having no CVD are correct. The recall for class ‘0’ is 0.91, which means that the model correctly identified 91% of the actual no CVD cases. The f1-score for this class is 0.91, indicating a balanced trade-off between precision and recall. Similarly, for class 1 (CVD), the model has a precision of 0.91, indicating that 91% of the predicted CVD cases are true positive. The recall for class ‘1’ is 0.92, indicating that the model correctly detects 92% of actual CVD cases. The f1-score for class 1 is 0.91, indicating that the model is highly reliable and balanced when predicting both positive and negative classes. With support values of 43,528 for class 0 and 43,026 for class 1, the results show that the Random Forest model was trained and tested on a large amount of data, ensuring the model’s robustness and generalizability. Overall, these metrics indicate that the Random Forest classifier is extremely effective at predicting CVD, giving clinicians a reliable tool for assessing patient risk and making informed decisions. Table 4.1 describes the evaluation measure of the random forest classifier.

Table 4.1: Performance evaluation of random forest classifier

Class Precision Recall F1 score Support
0 0.92 0.91 0.91 43528
1 0.91 0.92 0.91 43026
Accuracy 0.91 86554

The confusion matrix of the Random Forest classifier illustrates how well it predicts coronary CVD. With minimal false positives (FP) and false negatives (FN), it predicts true positives (TP) and true negatives (TN) with accuracy. The model’s low values of FP and FN minimize misclassifications, making it a useful tool for early intervention and preventive healthcare strategies, while its high values of TP and TN demonstrate its ability to distinguish between patients with and without CVD.

Figure 4.1: Confusion matrix of random forest classifier

Decision tree

With an accuracy of 90%, the Decision Tree classifier’s overall performance in predicting coronary CVD through machine learning techniques is good. The precision for class ‘0’ (no CVD) is 0.90, meaning that 90% of the cases that were predicted to have no CVD are correct. With a recall for class ‘0’ of 0.91, the model correctly identified 91% of real cases of no CVD. The class’s f1-score, which is 0.90, indicates a balanced trade-off between recall and precision and demonstrates the model’s ability to accurately predict both true positives and true negatives. Comparably, the model achieves a precision of 0.90 for class 1 (CVD), indicating that 90% of predicted CVD cases are accurate. Ninety per cent of the real CVD cases were correctly identified, as indicated by the recall for class 1, which is also 0.90. Class 1 has an f1-score of 0.90, indicating consistent performance in predicting positive cases of CVD. The reliability of the results is increased by the support values, which show that the model’s predictions were based on a sizable number of samples (43,528 for class ‘0’ and 43,026 for class ‘1’), as described in Table 4.2. The Decision Tree classifier is a useful tool for predicting CVD risk and supporting clinicians in their decision-making processes because it performs robustly and in balance across both classes overall.

Table 4.2: Performance evaluation of decision tree

Class Precision Recall F1 score Support
0 0.90 0.91 0.90 43528
1 0.90 0.90 0.90 43026
Accuracy 0.90 86554

The confusion matrix shows the performance of the Decision Tree classifier in predicting coronary CVD. True positives indicate correctly predicted CVD cases, and true negatives indicate correctly identified absence of CVD. FP occur when the model predicts CVD incorrectly, and FN occur when it fails to detect it. The model's high TP and TN values show how well it works for early diagnosis and treatment, as shown in Figure 4.2.

Figure 4.2: Confusion matrix of decision tree

XG boost classifier

With an accuracy of 91%, the XGBoost classifier performs well overall in predicting coronary CVD. The classifier achieves a precision of 0.92 for class 0 (no CVD), meaning that 92% of the instances predicted as having no CVD are accurate. Class ‘0’ has a recall of 0.90, which indicates that 90% of real cases of no CVD are correctly identified. Class ‘0’ has a f1-score of 0.91, which indicates a performance that strikes a balance between recall and precision. The model’s precision for class 1 (CVD) is 0.90, meaning that 90% of the predicted cases of CVD are correct. Class 1 has a recall of 0.93, meaning that 93% of real CVD cases are identified correctly. Class 1 has an f1-score of 0.91, which indicates that the model consistently performs well in identifying CVD cases. Support values for classes ‘0’ and ‘1’ are 43,528 and 43,026 respectively, indicating that the classifier is more reliable because it is based on a large number of samples. In general, XGBoost shows a high degree of efficacy in predicting the existence and absence of CVD, which makes it a useful instrument for precise risk assessment in clinical settings. Table 4.3 describes the evaluation measure of the XG boost classifier.

Table 4.3: Performance evaluation of XG boost classifier

Class Precision Recall F1-Score Support
0 0.92 0.90 0.91 43,528
1 0.90 0.93 0.91 43,026
Accuracy 0.91 86,554

The confusion matrix of the XGBoost classifier shows that it can discriminate between patients who have and do not have coronary CVD. It displays TP when the model predicts CVD correctly, TN when it detects the absence, and FP when the model predicts CVD incorrectly for cases that are not CVD, as shown in Figure 4.3. The model’s low FP and FN values show that it can minimize errors, while its high TP and TN values show that it is good at identifying both positive and negative instances.

Figure 4.3: Confusion matrix of XG boost classifier

K-Nearest Neighbor

With an overall accuracy of 90%, the KNN classifier performs well in predicting coronary CVD. The classifier achieves a precision of 0.90 for class 0 (no CVD), meaning that 90% of the instances predicted as having no CVD are accurate. The model correctly detects 90% of real cases of no CVD, according to the recall for class '0', which is 0.90. Class 0 has an f1-score of 0.90, which indicates a performance that strikes a balance between recall and precision. The KNN model's precision for class 1 (CVD) is 0.89, meaning that 89% of the predicted cases of CVD are correct. Class 1 has a recall of 0.90, meaning that 90% of real CVD cases are correctly identified by the model. Class 1 has an f1-score of 0.90 as well, demonstrating the model's consistency in identifying CVD cases. Class 0 and class 1 have support values of 43,528 and 43,026 respectively, indicating that the dataset is fairly balanced for both classes, as described in Table 4.4. The KNN classifier is a good choice for clinical risk assessment applications because it performs consistently and fairly in predicting the risk of CVD, with good precision and recall for both CVD and non-CVD predictions.

Table 4.4: Performance evaluation of KNN

Class Precision Recall F1-Score Support
0 0.90 0.90 0.90 43,528
1 0.89 0.90 0.90 43,026
Accuracy 0.90 86,554

The confusion matrix of the K-Nearest Neighbors (KNN) classifier illustrates how well it predicts coronary CVD. Along with FP and FN, it displays TP and TN for patients with and without CVD. For both the 0 and 1 classes, the model has a high percentage of accurate predictions (TP and TN) and a low percentage of incorrect predictions (FP and FN), as shown in Figure 4.4. Even so, there are still some misclassifications, indicating that the model still needs to be improved.

Figure 4.4: Confusion matrix of KNN

Proposed Artificial Neural Network (ANN)

With an overall accuracy of 99%, the suggested Artificial Neural Network (ANN) model for coronary CVD prediction shows exceptional performance. The ANN model achieves a precision of 0.99 for class '0', which denotes no CVD, meaning that 99% of the predictions for no CVD are accurate. With a recall for class 0 of 0.98, the model correctly detects 98% of real cases of no CVD. Class '0' has an f1-score of 0.99, indicating a strong trade-off between recall and precision that leads to high reliability for negative predictions. Additionally, the model achieves a precision of 0.99 for class 1 (representing CVD), meaning that 99% of the predicted CVD cases are accurate. Recall for class '1' is 0.99, meaning that the model correctly detects 99% of the real CVD cases. Class '1' has an f1-score of 0.99, which indicates a very high model effectiveness in minimizing false positives and identifying true positives. The well-balanced nature of the dataset is indicated by the support values for classes 0 and 1, which are 43,528 and 43,026, respectively. Overall, the suggested ANN model's results validate its remarkable efficacy and dependability in precisely predicting CVD, making it a useful instrument for early intervention and clinical decision-making. Table 4.5 describes the evaluation measure of the proposed model.

Table 4.5: Performance evaluation of the proposed model

Class Precision Recall F1-Score Support
0 0.99 0.98 0.99 43,528
1 0.99 0.99 0.99 43,026
Accuracy 0.99 86,554

With 98% of cases correctly identified as true negatives and 99% correctly identified as true positives in class 0 (no CVD), the ANN model accurately predicts coronary CVD. The model’s ability to reliably distinguish between patients with and without CVD is demonstrated by the balance between false positives and false negatives, as shown in Figure 4.5. The model is appropriate for clinical decision-making and early CVD risk prediction due to its precision, recall, and overall accuracy.

Figure 4.8: Confusion matrix of the proposed model

Discussion

The discussion of machine learning techniques for coronary CVD prediction highlights the relative performance of several models: Random Forest, Decision Tree, XGBoost, KNN, and the proposed ANN. With 91% accuracy each, the Random Forest and XGBoost classifiers demonstrated their robustness in managing intricate data structures and producing dependable forecasts. The Decision Tree classifier also performed well at 90% accuracy, albeit slightly lower, partly because of its propensity for overfitting. The KNN model likewise demonstrated its simplicity and efficacy in predicting CVD with an accuracy of 90%. However, the proposed ANN model, which attained 99% accuracy, was the best performer. This high accuracy shows how well the proposed model can identify intricate patterns and relationships in the dataset, which makes it a very useful tool for estimating the risk of CVD. The strong performance of the proposed ANN model indicates that it may considerably enhance coronary CVD early detection and intervention strategies, providing a more dependable and effective method than conventional techniques.

Model Accuracy
Random Forest 91%
Decision tree 90%
XG Boost 91%
KNN 90%
Proposed model 99%

The proposed ANN model and alternative methods have a significant difference in accuracy, which can be visually highlighted by using graphic representations like bar charts or line graphs. This facilitates comprehension of the relative benefits and emphasizes the model’s potential for practical applications in predicting coronary CVD and enhancing clinical decision-making, as shown in Figure 4.9.

Figure 4.9: Comparison models accuracy

Limitations

Machine learning techniques have the potential to predict coronary cardiovascular disease, but they face several limitations. Model performance can be impacted by the diversity and quality of the datasets, as many do not contain complete clinical, lifestyle, and demographic data. Unbalanced data can produce skewed outcomes. Feature engineering can take a lot of time and requires domain knowledge. Clinical adoption of deep learning models is challenging due to their black-box nature, which makes them difficult to interpret. Model robustness can be impacted by noise and variable data sources, which can reduce the models’ dependability in practical applications. Challenges also arise from data privacy and ethical concerns.

Chapter Summary

The results chapter examines several machine learning models to predict coronary CVD. With a 99% accuracy rate, the ANN model that was suggested was the most successful. The artificial neural network model outperformed other models in identifying cases of normal and predicting coronary CVD. The chapter highlights the significance of each model in comprehending the onset of coronary CVD and makes recommendations for further study to increase prediction accuracy.

Conclusion And Future Work

Conclusion

CVDs are a significant global health concern, causing millions of deaths annually. CVDs include various conditions that affect the heart and blood vessels, including cerebrovascular accidents, congenital defects, pulmonary blood clots, cardiac arrhythmia, peripheral arterial problems, coronary artery disease (CAD), rheumatic heart conditions, and heart muscle-affecting cardiomyopathies. CAD is particularly concerning due to its correlation with worldwide death rates. The IoMT is a technology that links medical devices and collects and processes data in real time, enhancing healthcare workflow. It combines IoT power with patient details, ensuring data security in the IoMT-based framework.

The IoMT is a rapidly developing field where various medical equipment, software programmers, and healthcare professionals come together on a single platform to provide high-quality services. CAD is a common and potentially dangerous condition affecting the heart’s arteries.

Common risk factors include smoking, high blood pressure, cholesterol, diabetes, obesity, a sedentary lifestyle, a family history of heart disease, and ageing. Diagnosis involves medical history, physical examination, ECG, stress tests, imaging, and blood tests. The “Rates_and_Trends_in_Coronary_Heart_Disease” dataset is essential for using machine learning techniques to predict CVD. This dataset includes several dimensions, such as location, year, geography, and classes, providing a comprehensive understanding of cardiovascular health. Preprocessing is crucial for ensuring the accuracy and consistency of predictive models. Data cleansing involves eliminating duplicate entries, fixing inconsistencies, and handling outliers. Common techniques include imputation or removing rows/columns with excessive missing values. Categorical variables can be converted into a numerical format using techniques like one-hot encoding or label encoding. Normalization/Standardization ensures that the numerical characteristics are on a comparable scale, especially when there is a non-Gaussian distribution of the data. The Random Forest classifier is a powerful ensemble learning technique used in predicting coronary heart disease (CVD) risk. It builds a “forest” of several decision trees during the training phase, using a random subset of features and data points to build each decision tree.

The Decision Tree machine learning algorithm predicts CVD by forming a structure resembling a tree, with each node denoting a choice made in response to a particular feature and each branch denoting a result. The decision tree determines which critical characteristic best differentiates the target classes in CVD prediction using metrics such as information gain and Gini impurity. XGBoost classifier is a scalable and highly effective machine-learning method for predicting CVD. It minimizes errors through an iterative process of prediction optimization, identifying minute patterns and interactions between different CVD risk factors. XGBoost first initializes a basic model that forecasts the mean result for the training set and continuously adds new decision trees to the ensemble to estimate residual errors.

KNN is a lazy learning algorithm that uses the training dataset directly to inform predictions. It uses distance metrics like Euclidean, Manhattan, or Minkowski distances to determine the distance between each new patient’s data point and every other data point in the training set for CVD prediction. KNN’s majority voting system allows for ease of interpretation. An innovative technique for forecasting coronary CVD is the Artificial Neural Network (ANN), which is modelled after the human brain and consists of linked layers of neurons that process input data and produce predictions. Evaluation measures are crucial in assessing the accuracy and dependability of predictive models for coronary cardiovascular disease (CVD). Accuracy is a key metric, providing insight into how frequently the model predicts outcomes correctly. Precision and recall are more specific metrics, indicating the model’s capacity to capture all real positive instances and its detection ability. The Random Forest classifier for predicting CVD using machine learning techniques performed well overall, with an accuracy of 91%. The model achieved a precision of 0.92 for class ‘0’ (no CVD) and 0.91 for class 1 (CVD), indicating a balanced trade-off between precision and recall. The Decision Tree classifier also performed well, with an accuracy of 90% for class ‘0’ and 90% for class 1 (CVD). The XGBoost classifier also performed well, with an accuracy of 91% for class ‘0’ and a precision of 0.92 for class ‘0’. The KNN classifier also performed well, with an overall accuracy of 90% for both classes. The suggested Artificial Neural Network (ANN) model demonstrated exceptional performance, with an overall accuracy of 99% for class ‘0’, 99% of predictions for no CVD, and a recall of 0.98 for class ‘0’. The model’s effectiveness in minimizing false positives and identifying true positives was high, with support values of 43,528 and 43,026 respectively. In conclusion, the Random Forest, Decision Tree, XGBoost, KNN, and ANN classifiers have shown promising results in predicting CVD risk and providing reliable tools for clinicians. The ANN model’s results validate its remarkable efficacy and dependability in accurately predicting CVD, making it a useful instrument for early intervention and clinical decision-making.

Future work

Future work on utilizing transfer learning and domain adaptation strategies, creating explainable AI models, and integrating multi-modal data sources should be the main goals of machine learning research on cardiovascular disease prediction. Predictive performance can be improved by using sophisticated feature engineering and selection techniques like natural language processing. IoMT devices and real-time monitoring systems can help with early diagnosis and individualized treatment plans. Strong frameworks are required for model training and data sharing while protecting privacy, and ethical considerations are vital. Transitioning these tools from research to routine clinical practice can be facilitated by conducting prospective clinical trials to validate predictive models in various patient populations and healthcare settings.

References

Abbas, Q., Hussain, A., & Baig, A. R. (2022). Automatic detection and classification of cardiovascular disorders using phonocardiogram and convolutional vision transformers. Diagnostics, 12(12), 3109.

Adewole, K. S., Akintola, A. G., Jimoh, R. G., Mabayoje, M. A., Jimoh, M. K., Usman-Hamza, F. E., . . . Ameen, A. O. (2021). Cloud-based IoMT framework for cardiovascular disease prediction and diagnosis in personalized E-health care. In Intelligent IoT systems in personalized health care (pp. 105-145). Elsevier.

Ahmed, Z., Mohamed, K., Zeeshan, S., & Dong, X. (2020). Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database, 2020, baaa010.

Al-Khlaiwi, T., Alshammari, H., Habib, S. S., Alobaid, R., Alrumaih, L., Almojel, A., . . . Alkhodair, M. (2023). High prevalence of lack of knowledge and unhealthy lifestyle practices regarding premature coronary artery disease and its risk factors among the Saudi population. BMC Public Health, 23(1), 908.

Aljohani, R. I., Hosni Mahmoud, H. A., Hafez, A., & Bayoumi, M. (2023). A Novel Deep Learning CNN for Heart Valve Disease Classification Using Valve Sound Detection. Electronics, 12(4), 846.

Allan, S., Olaiya, R., & Burhan, R. (2022). Reviewing the use and quality of machine learning in developing clinical prediction models for cardiovascular disease. Postgraduate Medical Journal, 98(1161), 551-558.

Alshehri, F., & Muhammad, G. (2020). A comprehensive survey of the Internet of Things (IoT) and AI-based smart healthcare. IEEE Access, 9, 3660-3678.

Arvisais-Anhalt, S., Lau, M., Lehmann, C. U., Holmgren, A. J., Medford, R. J., Ramirez, C. M., & Chen, C. N. (2022). The 21st Century Cures Act and multiuser electronic health record access: potential pitfalls of information release. Journal of medical Internet research, 24(2), e34085.

Awotunde, J. B., Folorunso, S. O., Bhoi, A. K., Adebayo, P. O., & Ijaz, M. F. (2021). Disease diagnosis system for IoT-based wearable body sensors with machine learning algorithm. Hybrid artificial intelligence and IoT in healthcare, 201-222.

Balaji, G., Subashini, T., & Chidambaram, N. (2015). Automatic classification of cardiac views in echocardiogram using histogram and statistical features. Procedia Computer Science, 46, 1569-1576.

Balaji, G., Subashini, T., & Suresh, A. (2014). An efficient view classification of echocardiogram using morphological operations. Journal of Theoretical and Applied Information Technology, 67(3), 732-735.

Baumgartner, H., Hung, J., Bermejo, J., Chambers, J. B., Edvardsen, T., Goldstein, S., . . . Otto, C. M. (2017). Recommendations on the echocardiographic assessment of aortic valve stenosis: a focused update from the European Association of Cardiovascular Imaging and the American Society of Echocardiography. European Heart Journal-Cardiovascular Imaging, 18(3), 254-275.

Bersvendsen, J., Orderud, F., Lie, Ø., Massey, R. J., Fosså, K., Estépar, R. S. J., . . . Samset, E. (2017). Semiautomated biventricular segmentation in three-dimensional echocardiography by coupled deformable surfaces. Journal of Medical Imaging, 4(2), 024005-024005.

Cacciatore, S., Spadafora, L., Bernardi, M., Galli, M., Betti, M., Perone, F., . . . Landi, F. (2023). Management of coronary artery disease in older adults: recent advances and gaps in evidence. Journal of Clinical Medicine, 12(16), 5233.

Chen, D., Xuan, W., Gu, Y., Liu, F., Chen, J., Xia, S., . . . Luo, J. (2022). Automatic classification of normal–abnormal heart sounds using convolution neural network and long-short term memory. Electronics, 11(8), 1246.

Ciumărnean, L., Milaciu, M. V., Negrean, V., Orășan, O. H., Vesa, S. C., Sălăgean, O., . . . Vlaicu, S. I. (2021). Cardiovascular risk factors and physical activity for the prevention of cardiovascular diseases in the elderly. International journal of environmental research and public health, 19(1), 207.

Davis, A., Billick, K., Horton, K., Jankowski, M., Knoll, P., Marshall, J. E., . . . Adams, D. B. (2020). Artificial intelligence and echocardiography: a primer for cardiac sonographers. Journal of the American Society of Echocardiography, 33(9), 1061-1066.

Fatima, H., Mahmood, F., Sehgal, S., Belani, K., Sharkey, A., Chaudhary, O., . . . Khabbaz, K. R. (2020). Artificial intelligence for dynamic echocardiographic tricuspid valve analysis: a new tool in echocardiography. Journal of Cardiothoracic and Vascular Anesthesia, 34(10), 2703-2706.

Flora, G. D., & Nayak, M. K. (2019). A brief review of cardiovascular diseases, associated risk factors and current treatment regimes. Current pharmaceutical design, 25(38), 4063-4084.

Flores-Alonso, S. I., Tovar-Corona, B., & Luna-García, R. (2022). Deep learning algorithm for heart valve diseases assisted diagnosis. Applied Sciences, 12(8), 3780.

Gahungu, N., Trueick, R., Bhat, S., Sengupta, P. P., & Dwivedi, G. (2020). Current challenges and recent updates in artificial intelligence and echocardiography. Current Cardiovascular Imaging Reports, 13(2), 5.

Genovese, D., Rashedi, N., Weinert, L., Narang, A., Addetia, K., Patel, A. R., . . . Lang, R. M. (2019). Machine learning–based three-dimensional echocardiographic quantification of right ventricular size and function: validation against cardiac magnetic resonance. Journal of the American Society of Echocardiography, 32(8), 969-977.

Habuza, T., Navaz, A. N., Hashim, F., Alnajjar, F., Zaki, N., Serhani, M. A., & Statsenko, Y. (2021). AI applications in robotics, diagnostic image analysis and precision medicine: Current limitations, future trends, guidelines on CAD systems for medicine. Informatics in Medicine Unlocked, 24, 100596.

Holmes, J., Sacchi, L., & Bellazzi, R. (2004). Artificial intelligence in medicine. Ann R Coll Surg Engl, 86, 334-338.

Jadhav, U. M. (2018). Cardio-metabolic disease in India—the upcoming tsunami. Annals of translational medicine, 6(15).

Johnson, K. W., Torres Soto, J., Glicksberg, B. S., Shameer, K., Miotto, R., Ali, M., . . . Dudley, J. T. (2018). Artificial intelligence in cardiology. Journal of the American College of Cardiology, 71(23), 2668-2679.

Khan Mamun, M. M. R., & Elfouly, T. (2023). Detection of Cardiovascular Disease from Clinical Parameters Using a One-Dimensional Convolutional Neural Network. Bioengineering, 10(7), 796.

Kirkpatrick, J. N., Grimm, R., Johri, A. M., Kimura, B. J., Kort, S., Labovitz, A. J., . . . Thorson, K. (2020). Recommendations for echocardiography laboratories participating in cardiac point of care cardiac ultrasound (POCUS) and critical care echocardiography training: report from the American Society of Echocardiography. Journal of the American Society of Echocardiography, 33(4), 409-422. e404.

Krittanawong, C., Zhang, H., Wang, Z., Aydar, M., & Kitai, T. (2017). Artificial intelligence in precision cardiovascular medicine. Journal of the American College of Cardiology, 69(21), 2657-2664.

Kusunose, K., Abe, T., Haga, A., Fukuda, D., Yamada, H., Harada, M., & Sata, M. (2020). A deep learning approach for assessment of regional wall motion abnormality from echocardiographic images. Cardiovascular Imaging, 13(2_Part_1), 374-381.

Kusunose, K., Haga, A., Abe, T., & Sata, M. (2019). Utilization of artificial intelligence in echocardiography. Circulation Journal, 83(8), 1623-1629.

Levine, G. N., Cohen, B. E., Commodore-Mensah, Y., Fleury, J., Huffman, J. C., Khalid, U., . . . Spatz, E. S. (2021). Psychological health, well-being, and the mind-heart-body connection: a scientific statement from the American Heart Association. Circulation, 143(10), e763-e783.

Li, S., Li, F., Tang, S., & Luo, F. (2021). Heart sounds classification based on feature fusion using lightweight neural networks. IEEE Transactions on instrumentation and measurement, 70, 1-9.

Liu, T., Li, P., Liu, Y., Zhang, H., Li, Y., Jiao, Y., . . . Ren, M. (2021). Detection of coronary artery disease using multi-domain feature fusion of multi-channel heart sound signals. Entropy, 23(6), 642.

Lockhart, P. B., & Sun, Y. P. (2021). Diseases of the cardiovascular system. Burket’s Oral Medicine, 505-552.

Madani, A., Arnaout, R., Mofrad, M., & Arnaout, R. (2018). Fast and accurate view classification of echocardiograms using deep learning. NPJ digital medicine, 1(1), 6.

Meshref, H. (2019). Cardiovascular disease diagnosis: A machine learning interpretation approach. International Journal of Advanced Computer Science and Applications, 10(12).

Mohanty, M., Rath, P. S., & Mohapatra, A. G. (2024). IoMT-based Heart Rate Variability Analysis with Passive FBG Sensors for Improved Health Monitoring. International Journal of Computing and Digital Systems, 15(1), 1135-1147.

Nath, C., Albaghdadi, M. S., & Jonnalagadda, S. R. (2016). A natural language processing tool for large-scale data extraction from echocardiography reports. PLoS One, 11(4), e0153749.

O’driscoll, J. M., Hawkes, W., Beqiri, A., Mumith, A., Parker, A., Upton, R., . . . Sabharwal, N. (2022). Left ventricular assessment with artificial intelligence increases the diagnostic accuracy of stress echocardiography. European Heart Journal Open, 2(5), oeac059.

Organization, W. H. (2020). WHO reveals leading causes of death and disability worldwide: 2000-2019. World Health Organization (WHO), 1.

Pachiyannan, P., Alsulami, M., Alsadie, D., Saudagar, A. K. J., AlKhathami, M., & Poonia, R. C. (2024). A Novel Machine Learning-Based Prediction Method for Early Detection and Diagnosis of Congenital Heart Disease Using ECG Signal Processing. Technologies, 12(1), 4.

Pellikka, P. A. (2022). Artificially intelligent interpretation of stress echocardiography: the future is now. In (Vol. 15, pp. 728-730): American College of Cardiology Foundation Washington DC.

Prabhakaran, D., Anand, S., & Reddy, K. S. (2022). Public Health Approach to Cardiovascular Disease Prevention & Management. CRC Press.

Quazi, S. (2022). Artificial intelligence and machine learning in precision and genomic medicine. Medical Oncology, 39(8), 120.

Robinson, S. (2021). Cardiovascular disease. In Priorities for Health Promotion and Public Health (pp. 355-393). Routledge.

Saheera, S., & Krishnamurthy, P. (2020). Cardiovascular changes associated with hypertensive heart disease and aging. Cell transplantation, 29, 0963689720920830.

Samad, M. D., Ulloa, A., Wehner, G. J., Jing, L., Hartzel, D., Good, C. W., . . . Fornwalt, B. K. (2019). Predicting survival from large echocardiography and electronic health record datasets: optimization with machine learning. JACC: Cardiovascular Imaging, 12(4), 681-689.

Sarrafzadegan, N., & Mohammmadifard, N. (2019). Cardiovascular disease in Iran in the last 40 years: prevalence, mortality, morbidity, challenges and strategies for cardiovascular prevention. Archives of Iranian medicine, 22(4), 204-210.

Schuuring, M. J., Išgum, I., Cosyns, B., Chamuleau, S. A., & Bouma, B. J. (2021). Routine echocardiography and artificial intelligence solutions. Frontiers in Cardiovascular Medicine, 8, 648877.

Sethi, Y., Patel, N., Kaka, N., Desai, A., Kaiwan, O., Sheth, M., . . . Khandaker, M. U. (2022). Artificial intelligence in pediatric cardiology: a scoping review. Journal of Clinical Medicine, 11(23), 7072.

Shaffer, F., & Ginsberg, J. P. (2017). An overview of heart rate variability metrics and norms. Frontiers in public health, 5, 290215.

Shao, C., Wang, J., Tian, J., & Tang, Y.-d. (2020). Coronary artery disease: from mechanism to clinical practice. Coronary Artery Disease: Therapeutics and Drug Discovery, 1-36.

Shokouhmand, A., Yang, C., Aranoff, N. D., Driggin, E., Green, P., & Tavassolian, N. (2021). Mean pressure gradient prediction based on chest angular movements and heart rate variability parameters. 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC),

Shuvo, S. B., Ali, S. N., Swapnil, S. I., Al-Rakhami, M. S., & Gumaei, A. (2021). CardioXNet: A novel lightweight deep learning framework for cardiovascular disease classification using heart sound recordings. IEEE Access, 9, 36955-36967.

Ulloa-Cerna, A. E., Jing, L., Pfeifer, J. M., Raghunath, S., Ruhl, J. A., Rocha, D. B., . . . Steinhubl, S. R. (2022). rECHOmmend: an ECG-based machine learning approach for identifying patients at increased risk of undiagnosed structural heart disease detectable by echocardiography. Circulation, 146(1), 36-47.

Vaduganathan, M., Mensah, G. A., Turco, J. V., Fuster, V., & Roth, G. A. (2022). The global burden of cardiovascular diseases and risk: a compass for future health. In (Vol. 80, pp. 2361-2371): American College of Cardiology Foundation Washington DC.

Volpato, V., Mor‐Avi, V., Narang, A., Prater, D., Goncalves, A., Tamborini, G., . . . Lang, R. M. (2019). Automated, machine learning‐based, 3D echocardiographic quantification of left ventricular mass. Echocardiography, 36(2), 312-319.

Wahlang, I., Maji, A. K., Saha, G., Chakrabarti, P., Jasinski, M., Leonowicz, Z., & Jasinska, E. (2021). Deep Learning methods for classification of certain abnormalities in Echocardiography. Electronics, 10(4), 495.

Wahlang, I., Saha, G., & Maji, A. K. (2020). A study on abnormalities detection techniques from echocardiogram. Advances in Electrical and Computer Technologies: Select Proceedings of ICAECT 2019,

Xiao, B., Xu, Y., Bi, X., Zhang, J., & Ma, X. (2020). Heart sounds classification using a novel 1-D convolutional neural network with extremely low parameter consumption. Neurocomputing, 392, 153-159.

Yan, Y., Zhang, J.-W., Zang, G.-Y., & Pu, J. (2019). The primary use of artificial intelligence in cardiovascular diseases: what kind of potential role does artificial intelligence play in future medicine? Journal of geriatric cardiology: JGC, 16(8), 585.

Yang, F., Chen, X., Lin, X., Chen, X., Wang, W., Liu, B., . . . Huang, D. (2022). Automated analysis of Doppler echocardiographic videos as a screening tool for valvular heart diseases. Cardiovascular Imaging, 15(4), 551-563.
