Machine learning for death prediction in heart failure cases

This example evaluates the survival outcome of patients who experienced heart failure. We used clinical data from 299 patients, including variables extracted from blood tests.

The data were obtained by the Institutional Review Board of Government College University, Faisalabad, Pakistan, and are available at the PLOS ONE repository.

Cardiovascular diseases (CVDs) are the number one cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs, and this dataset contains 12 variables that can be used to model mortality from heart failure.

Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies.

People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management, where a machine learning model can be of great help.

1. Application type

The predicted variable can have two values, "1" if the event to be measured has occurred and the patient is dead, or "0" otherwise. Therefore, this is a binary classification project.

The goal is to model the status of the patient (dead or alive) by applying machine learning to clinical data, including variables extracted from blood tests.

2. Data set

The heart_failure.csv file contains the data for this example. In a binary classification model, the target variable can only have two values: 0 (false, alive) or 1 (true, deceased), depending on the occurrence of the event. The number of patients (rows) in the data set is 299, and the number of variables (columns) is 12.

The number of input variables, or attributes for each sample, is 11. There is one target variable, death_event (1 or 0), which indicates whether the patient died or survived. The following list summarizes the variables:

age: Age of the patient
anaemia: Low count of red blood cells or hemoglobin.
creatinine_phosphokinase: Level of the CPK enzyme in the blood.
diabetes: If the patient has diabetes.
ejection_fraction: Percentage of blood leaving the heart at each contraction.
high_blood_pressure: If a patient has hypertension.
platelets: Platelets in the blood.
serum_creatinine: Level of creatinine in the blood.
serum_sodium: Level of sodium in the blood.
sex: Woman or man.
smoking: If the patient smokes.
death_event: If the patient died during the follow-up period.

To start, we use all instances. Each instance contains the input and target variables of a different patient. The data set is divided into training, selection, and testing subsets. Neural Designer automatically assigns 60% of the instances for training, 20% for selection, and 20% for testing. The user can change these values as desired.
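The 60/20/20 split described above can be sketched in Python as follows. This is a minimal illustration, not Neural Designer's actual procedure: the function name `split_indices`, the fixed seed, and the rounding rule (selection and testing sizes rounded down, the remainder assigned to training) are all assumptions.

```python
import random

def split_indices(n_instances, selection_ratio=0.2, testing_ratio=0.2, seed=0):
    """Shuffle instance indices and split them into training/selection/testing.

    Assumption: selection and testing sizes are rounded down and the
    remainder goes to training; Neural Designer may round differently.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)

    n_selection = int(n_instances * selection_ratio)
    n_testing = int(n_instances * testing_ratio)
    n_training = n_instances - n_selection - n_testing

    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]
    return training, selection, testing

training, selection, testing = split_indices(299)
print(len(training), len(selection), len(testing))
```

With 299 instances and this rounding rule, the subsets contain 181, 59, and 59 instances.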

Also, we can calculate the distributions for all variables. The following figure is a pie chart showing which patients are dead (1) or alive (0) in the data set.

The image shows that dead patients represent 32.107% of the samples, while live patients represent 67.893%.
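As a quick check, the reported percentages are consistent with 96 deceased and 203 surviving patients out of 299. These counts are inferred from the percentages and are not stated explicitly in the text.

```python
# Counts inferred from the reported percentages (assumption: not stated in the text).
n_total = 299
n_dead = 96
n_alive = n_total - n_dead  # 203

pct_dead = 100 * n_dead / n_total
pct_alive = 100 * n_alive / n_total
print(f"dead: {pct_dead:.3f}%  alive: {pct_alive:.3f}%")
```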

The inputs-targets correlations might indicate to us which factors have the most univariate influence on whether or not a patient will live.

Here, the most correlated variables with survival status are serum_creatinine, age, serum_sodium and ejection_fraction.
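One simple way to compute a univariate correlation of this kind is the Pearson coefficient between an input and the binary target. The sketch below is only illustrative: Neural Designer may use a different correlation measure for binary targets, and the toy values are made up rather than taken from heart_failure.csv.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values for illustration only (not taken from heart_failure.csv).
toy_serum_creatinine = [0.9, 1.1, 1.0, 2.7, 1.9, 3.5]
toy_death_event = [0, 0, 0, 1, 1, 1]
r = pearson_correlation(toy_serum_creatinine, toy_death_event)
print(f"correlation with death_event: {r:.3f}")
```

A value close to 1 or -1 indicates a strong linear association with the target, while a value near 0 indicates little univariate influence.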

3. Neural network

The next step is to set a neural network to represent the classification function. For this class of applications, the neural network is composed of:

The scaling layer contains the statistics on the inputs calculated from the data file and the method for scaling the input variables. Here the minimum-maximum method has been set. As we use 11 input variables, the scaling layer has 11 inputs.
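A minimal sketch of minimum-maximum scaling is shown below. The target range of [-1, 1] is an assumption inferred from the deployment expression in section 7, which maps the binary input high_blood_pressure from [0, 1] to [-1, 1]; the function name is hypothetical.

```python
def scale_minimum_maximum(value, minimum, maximum):
    """Map value linearly from [minimum, maximum] to [-1, 1]."""
    return 2.0 * (value - minimum) / (maximum - minimum) - 1.0

# A binary input such as high_blood_pressure has minimum 0 and maximum 1.
print(scale_minimum_maximum(0, 0, 1))  # -1.0
print(scale_minimum_maximum(1, 0, 1))  # 1.0
```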

To stabilize and simplify the model, we do not use a perceptron layer.

The probabilistic layer only contains the method for interpreting the outputs as probabilities. As the output layer's activation function is the logistic function, the output can already be interpreted as a probability of class membership. The probabilistic layer has 11 inputs and one output, which represents the probability that the patient has died.
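Because there is no perceptron layer, the network amounts to a logistic regression on the scaled inputs. In the notation below, which is introduced here for illustration, b is the bias, w_i are the weights of the probabilistic layer, and \tilde{x}_i are the scaled inputs:

```latex
p(\text{death\_event} = 1 \mid x)
  = \frac{1}{1 + \exp\!\left(-\left(b + \sum_{i=1}^{11} w_i \, \tilde{x}_i\right)\right)}
```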

The following figure is a graphical representation of this neural network.

4. Training strategy

The fourth step is to set the training strategy, which is composed of two components:

A loss index.
An optimization algorithm.

The loss index is the weighted squared error with L2 regularization, which is the default loss index for binary classification applications.

We can state the learning problem as finding a neural network that minimizes the loss index. That is, a neural network that fits the data set (error term) and does not oscillate (regularization term).
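The structure of such a loss index (an error term plus a regularization term) can be sketched as follows. This is a hedged illustration only: the exact class weights, normalization, and regularization weight used by Neural Designer are not given in the text, so the defaults below and all function names are assumptions.

```python
def weighted_squared_error(outputs, targets, positives_weight, negatives_weight):
    """Mean squared error in which each class carries its own weight."""
    total = 0.0
    for output, target in zip(outputs, targets):
        weight = positives_weight if target == 1 else negatives_weight
        total += weight * (output - target) ** 2
    return total / len(targets)

def l2_regularization(parameters, regularization_weight):
    """Penalty on large parameters: weight times the sum of squares."""
    return regularization_weight * sum(p * p for p in parameters)

def loss_index(outputs, targets, parameters,
               positives_weight=2.0, negatives_weight=1.0,
               regularization_weight=0.01):
    """Error term plus regularization term (all default values are illustrative)."""
    error = weighted_squared_error(outputs, targets,
                                   positives_weight, negatives_weight)
    return error + l2_regularization(parameters, regularization_weight)

# Made-up outputs, targets and parameters, for illustration only.
loss = loss_index(outputs=[0.5, 0.5], targets=[1, 0], parameters=[1.0, -1.0])
print(loss)
```

Weighting the positive class more heavily is one common way to compensate for the imbalance between deceased and surviving patients.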

The optimization algorithm that we use is the quasi-Newton method, which is also the standard optimization algorithm for this type of problem.

The following chart shows how the error decreases with the iterations during the training process. The final training and selection errors are 0.632681 WSE and 0.899055 WSE, respectively.

As we can see in the previous image, the curves have converged, although the selection error is greater than the training error, so we could try to continue improving the model to further reduce the errors.

5. Model selection

The objective of model selection is to find the network architecture that minimizes the error on the selection instances of the data set, that is, the architecture with the best generalization properties.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error. We have removed the perceptron layer to stabilize our model, so we cannot use this feature.

However, we are going to use input selection to select features in the data set that provide the best generalization capabilities.
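The text does not say which input selection algorithm is used. As one possible illustration, a greedy forward selection can be sketched as below: it adds one input at a time and stops when no remaining input lowers the selection error. The function names and the toy error function are hypothetical; in practice the error callback would train a model on the chosen inputs and evaluate it on the selection subset.

```python
def forward_input_selection(candidates, selection_error):
    """Greedy forward selection.

    candidates: list of input names.
    selection_error: callable mapping a list of input names to a
        selection error (hypothetical stand-in for train-and-evaluate).
    """
    selected = []
    best_error = float("inf")
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining input and keep the best candidate.
        trial_errors = {name: selection_error(selected + [name])
                        for name in remaining}
        best_name = min(trial_errors, key=trial_errors.get)
        if trial_errors[best_name] >= best_error:
            break  # no remaining input improves the selection error
        selected.append(best_name)
        remaining.remove(best_name)
        best_error = trial_errors[best_name]
    return selected, best_error

# Toy error function for illustration only: two inputs help,
# every other input adds a small penalty.
def toy_selection_error(inputs):
    useful = {"serum_creatinine": 0.3, "ejection_fraction": 0.2}
    error = 1.0 - sum(useful.get(name, 0.0) for name in inputs)
    return error + 0.05 * sum(1 for name in inputs if name not in useful)

selected, error = forward_input_selection(
    ["age", "serum_creatinine", "platelets", "ejection_fraction"],
    toy_selection_error)
print(selected, error)
```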

In the following image, we see that we can reduce the training/selection error using this method.

At the end, we obtain a training error of 0.6934 WSE and a selection error of 0.8234 WSE. Also, we have reduced the number of inputs to only 5 features. Our network is now like this:

6. Testing analysis

The objective of the testing analysis is to validate the performance of the generalization properties of the trained neural network. To validate a classification technique, we need to compare the values provided by this technique to the observed values. We can use the ROC curve as it is the standard testing method for binary classification projects.

A random classifier has an area under the curve (AUC) of 0.5, while a perfect classifier has a value of 1. The closer this value is to 1, the better the classifier. In this example, AUC = 0.80317, which indicates good performance.
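The AUC can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name and the toy scores are made up for illustration and are unrelated to the reported value.

```python
def area_under_roc_curve(scores, targets):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    positives = [s for s, t in zip(scores, targets) if t == 1]
    negatives = [s for s, t in zip(scores, targets) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up scores for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_targets = [1, 1, 1, 0, 0, 0]
print(area_under_roc_curve(toy_scores, toy_targets))
```

In this toy case, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9.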

The following table contains the elements of the confusion matrix. This matrix contains the true positives, false positives, false negatives, and true negatives for the variable death_event.

The binary classification tests are parameters for measuring the performance of a classification problem with two classes:

Classification accuracy (ratio of instances correctly classified): 0.762712
Error rate (ratio of instances misclassified): 0.237288
Specificity (ratio of real negatives which are predicted negative): 0.805556
Sensitivity (ratio of real positives which are predicted positive): 0.695652
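These four ratios follow directly from the confusion matrix. The sketch below uses one set of counts that reproduces the reported values (59 testing instances, with "positive" meaning deceased); the counts are inferred from the ratios and are an assumption, since the original table is not reproduced here.

```python
# Counts inferred from the reported ratios (assumption: the original
# confusion matrix table is not reproduced in the text).
true_positives = 16
false_negatives = 7
true_negatives = 29
false_positives = 7

total = true_positives + false_negatives + true_negatives + false_positives

accuracy = (true_positives + true_negatives) / total
error_rate = (false_positives + false_negatives) / total
sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"accuracy={accuracy:.6f} error_rate={error_rate:.6f}")
print(f"sensitivity={sensitivity:.6f} specificity={specificity:.6f}")
```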

7. Model deployment

Once we have tested the neural network's generalization performance, we can save it for future use in the so-called model deployment mode.

The mathematical expression represented by the neural network is written below.

				scaled_age = (age-60.83390045)/11.89480019;
				scaled_ejection_fraction = (ejection_fraction-38.08359909)/11.83479977;
				scaled_high_blood_pressure = high_blood_pressure*(1+1)/(1-(0))-0*(1+1)/(1-0)-1;
				scaled_serum_creatinine = (serum_creatinine-1.39388001)/1.034510016;
				scaled_serum_sodium = (serum_sodium-136.625)/4.412479877;

				probabilistic_layer_combinations_0 = -0.0438098 +0.506346*scaled_age -0.862699*scaled_ejection_fraction +0.321015*scaled_high_blood_pressure +1.73723*scaled_serum_creatinine -0.35006*scaled_serum_sodium;

				DEATH_EVENT = 1.0/(1.0 + exp(-probabilistic_layer_combinations_0));
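The expression above can be transcribed into a small Python function, sketched below. The constants are copied from the expression; the function name and the example patient values are hypothetical, and the high_blood_pressure line is written in its algebraically simplified form.

```python
import math

def predict_death_probability(age, ejection_fraction, high_blood_pressure,
                              serum_creatinine, serum_sodium):
    """Python transcription of the expression above; returns a value in (0, 1)."""
    # Scaling layer (constants copied from the expression above).
    scaled_age = (age - 60.83390045) / 11.89480019
    scaled_ejection_fraction = (ejection_fraction - 38.08359909) / 11.83479977
    # Simplified form of the min-max line: maps 0 to -1 and 1 to 1.
    scaled_high_blood_pressure = 2.0 * high_blood_pressure - 1.0
    scaled_serum_creatinine = (serum_creatinine - 1.39388001) / 1.034510016
    scaled_serum_sodium = (serum_sodium - 136.625) / 4.412479877

    # Probabilistic layer: linear combination followed by the logistic function.
    combination = (-0.0438098
                   + 0.506346 * scaled_age
                   - 0.862699 * scaled_ejection_fraction
                   + 0.321015 * scaled_high_blood_pressure
                   + 1.73723 * scaled_serum_creatinine
                   - 0.35006 * scaled_serum_sodium)
    return 1.0 / (1.0 + math.exp(-combination))

# Hypothetical patient values, for illustration only.
p = predict_death_probability(age=65, ejection_fraction=30,
                              high_blood_pressure=1,
                              serum_creatinine=1.8, serum_sodium=134)
print(f"predicted probability of death: {p:.3f}")
```

The signs of the coefficients indicate the direction of each effect: higher age and serum creatinine raise the predicted probability, while higher ejection fraction and serum sodium lower it.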

References:

The data for this problem were collected by the Institutional Review Board of Government College University, Faisalabad, Pakistan, and are available at the PLOS ONE repository.
