
MachineX: Heart Disease Detection using Machine Learning

source link: https://blog.knoldus.com/machinex-heart-diseases-detection-using-machine-learning/
Reading Time: 4 minutes

In this blog, we will see how we can use machine learning and data science to detect or predict potential heart disease.

Introduction

Heart disease describes a range of conditions that affect your heart. Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease, heart rhythm problems (arrhythmias) and heart defects you’re born with (congenital heart defects), among others.

The term “heart disease” is often used interchangeably with the term “cardiovascular disease”. This refers to conditions that involve narrowed or blocked blood vessels that can lead to a heart attack, chest pain (angina) or stroke.

Other heart conditions, such as those that affect your heart’s muscle, valves or rhythm, also are considered forms of heart disease.

Machine Learning is used across many spheres around the world, and the healthcare industry is no exception. Machine Learning can play an essential role in predicting the presence or absence of locomotor disorders, heart disease, and more. Such information, if predicted well in advance, can provide important insights to doctors, who can then adapt their diagnosis and treatment on a per-patient basis.

Problem

Detecting heart disease is one of the major challenges facing doctors and healthcare specialists today.

Preventing heart disease is important. Good data-driven systems for predicting heart disease can improve the entire research and prevention process, making sure that more people can live healthy lives.

In the United States, the Centers for Disease Control and Prevention is a good resource for information about heart disease. According to their website:

  • About 610,000 people die of heart disease in the United States every year; that's 1 in every 4 deaths.
  • Heart disease is the leading cause of death for both men and women. More than half of the deaths due to heart disease in 2009 were in men.
  • Coronary heart disease (CHD) is the most common type of heart disease, killing over 370,000 people annually.
  • Every year about 735,000 Americans have a heart attack. Of these, 525,000 are a first heart attack and 210,000 happen in people who have already had a heart attack.
  • Heart disease is the leading cause of death for people of most ethnicities in the United States. For American Indians or Alaska Natives and Asians or Pacific Islanders, heart disease is second only to cancer.

I am neither a doctor nor a healthcare expert, but this type of data analysis and machine learning solution can be beneficial as a second opinion for doctors.

Data Description

The dataset has been taken from Kaggle.

There are a total of 13 features and 1 target variable. Also, there are no missing values, so we don't need to handle any null values.

This is what the dataset looks like:
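Since the dataset preview isn't reproduced here, the sketch below shows how the data could be loaded and inspected with pandas. The file name heart.csv and the column layout follow the commonly used Kaggle heart disease dataset and are assumptions, not the exact code from the original notebook.

```python
# Minimal sketch of loading and inspecting the data.
# "heart.csv" is an assumed local copy of the Kaggle heart disease dataset.
import pandas as pd

df = pd.read_csv("heart.csv")

print(df.shape)           # expect 14 columns: 13 features + 1 target
print(df.isnull().sum())  # confirm there are no missing values
print(df.head())          # peek at the first few rows
```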

Solution

I took four algorithms, varied their parameters, and compared the final models. I split the dataset into 67% training data and 33% testing data, as sketched below.
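A minimal sketch of the 67/33 split using scikit-learn; the target column name ("target") and the random_state value are illustrative assumptions.

```python
# Sketch of the 67% / 33% train-test split described above.
from sklearn.model_selection import train_test_split

X = df.drop("target", axis=1)   # 13 feature columns ("target" column name assumed)
y = df["target"]                # binary target: presence/absence of heart disease

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
```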

The project involved an analysis of the heart disease patient dataset with proper data processing. Then, I trained four models and tested them; their best scores were as follows:

K Neighbors Classifier: 87%

The classification score varies based on the number of neighbors that we choose. Thus, I'll plot a score graph for different values of K (neighbors) and check where I achieve the best score.
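The sweep could look like the sketch below; the range of K values and the plotting details are illustrative, and any feature scaling (often used with KNN) is omitted for brevity.

```python
# Sketch: score the K Neighbors Classifier for different K and plot the results.
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

k_values = range(1, 21)
knn_scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    knn_scores.append(knn.score(X_test, y_test))

plt.plot(list(k_values), knn_scores, marker="o")
plt.xlabel("Number of neighbors (K)")
plt.ylabel("Test accuracy")
plt.title("K Neighbors Classifier scores for different K values")
plt.show()
```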

From the plot above, it is clear that the maximum score achieved was 0.87 for 8 neighbors.

Support Vector Classifier: 83%

There are several kernels for the Support Vector Classifier. I'll test some of them and check which one gives the best score.
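A hedged sketch of that comparison is shown below; the particular set of kernels tried is an assumption.

```python
# Sketch: compare Support Vector Classifier kernels on the test set.
from sklearn.svm import SVC

svc_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    svc = SVC(kernel=kernel)
    svc.fit(X_train, y_train)
    svc_scores[kernel] = svc.score(X_test, y_test)

print(svc_scores)
```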

The linear kernel performed the best, slightly better than the RBF kernel.

Decision Tree Classifier: 79%

Here, I'll use the Decision Tree Classifier to model the problem at hand. I'll vary the max_features parameter and see which value returns the best accuracy.
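A sketch of that sweep is below; the range of max_features values depends on how many columns the processed dataset has and is assumed here.

```python
# Sketch: vary max_features for the Decision Tree Classifier and track test accuracy.
from sklearn.tree import DecisionTreeClassifier

n_features = X_train.shape[1]
dt_scores = []
for max_features in range(1, n_features + 1):
    dt = DecisionTreeClassifier(max_features=max_features, random_state=0)
    dt.fit(X_train, y_train)
    dt_scores.append(dt.score(X_test, y_test))

best = max(range(len(dt_scores)), key=lambda i: dt_scores[i]) + 1
print(f"Best accuracy {max(dt_scores):.2f} at max_features={best}")
```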

The model achieved the best accuracy at three values of max_features: 2, 4, and 18.

Random Forest Classifier: 84%

Now, I'll use an ensemble method, the Random Forest Classifier, to create the model and vary the number of estimators to see its effect.
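The sketch below varies the number of trees; the specific estimator counts tried are assumptions.

```python
# Sketch: vary the number of trees in the Random Forest and track test accuracy.
from sklearn.ensemble import RandomForestClassifier

rf_scores = {}
for n in (10, 100, 200, 500, 1000):
    rf = RandomForestClassifier(n_estimators=n, random_state=0)
    rf.fit(X_train, y_train)
    rf_scores[n] = rf.score(X_test, y_test)

print(rf_scores)
```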

The maximum score is achieved when the number of estimators is 100 or 500.

The K Neighbors Classifier achieved the best score of 87%, with 8 neighbors.


Happy learning 🙂

Follow MachineX for more.

