Remedial Action Recommendation with Machine Learning and Genetic Algorithm

Prescriptive analytics sits at the top of a three-tier analytics pyramid. The bottom layers are descriptive and predictive analytics. Prescriptive analytics entails recommending actions, based on the results of descriptive and predictive analytics, which if executed will have a positive business impact. As an illustrative example, after a machine learning model has predicted that a customer is very likely to churn in the near future, the business might be interested in remedial action recommendations which, if implemented, will prevent the churn.

In this post we will go through a solution for remedial action recommendation based on predictive Machine Learning (ML) and Genetic Algorithm (GA), using loan approval as an example. Following the rejection of a loan application by the ML model, the bank may be interested in a set of remedial action recommendations for the applicant, so that the negative outcome can be turned around to a positive one. The implementation is available in my OSS GitHub repo avenir.

Remedial Action with Counterfactual Analysis

The ML model makes predictions based on the values of a set of features. Counterfactual analysis involves using alternative values of the features and evaluating the outcome, i.e., how the ML prediction changes, in our use case. Essentially it’s a “what if” kind of analysis.
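To make the “what if” idea concrete, here is a minimal sketch. It assumes a toy logistic model, `predict_proba`, with made-up weights standing in for the trained loan model. `what_if` is a hypothetical helper that copies the feature record, overrides some values, and compares the two predictions.

```python
import math


def predict_proba(x):
    """Toy logistic model (made-up weights) standing in for the trained loan model."""
    z = 0.05 * x["income"] - 0.06 * x["debt"] + 0.02 * x["credit_score"] - 13.0
    return 1.0 / (1.0 + math.exp(-z))


def what_if(model, features, **changes):
    """Return (original, counterfactual) predictions for altered feature values."""
    counterfactual = {**features, **changes}  # copy the record, then override the changed features
    return model(features), model(counterfactual)


applicant = {"income": 43, "debt": 50, "credit_score": 540}
before, after = what_if(predict_proba, applicant, debt=20, credit_score=650)
print(f"approval probability: {before:.3f} -> {after:.3f}")
```

The original record is left untouched, so many alternative scenarios can be evaluated against the same baseline.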

There are many possible candidate solutions that will result in a positive ML prediction. However, we are interested in the optimum set of new feature values: the one with the lowest cost of change, under some definition of that cost, that still results in a positive ML model prediction. Genetic Algorithm helps us find the optimum feature values among many candidate sets of values.

Pulling all these ideas together, here are the steps for remedial action recommendation for a case with a negative ML model prediction.

Choose the free features that will be changed (e.g., existing debt for a loan application)
For each free feature variable, define in the configuration the cost per unit change of the variable value
There is also a cost associated with the ML predicted probability. The cost is highest at a predicted probability of 0.5 and decreases as the predicted probability goes up. This is also defined in the configuration
Use GA to generate candidate solutions
For each candidate solution, calculate the cost of the changes made in the free feature variables
Use the ML model to make a prediction. If the prediction is negative, reject the candidate solution; otherwise calculate the cost based on the ML predicted probability
Repeat the last 3 steps while keeping track of the best solution found so far
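The steps above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions: a toy logistic `predict_proba` replaces the trained network; the per-unit costs, search ranges and probability-cost weight are made up; and plain random sampling stands in for GA candidate generation to keep it short.

```python
import math
import random

# Hypothetical per-unit cost of changing each free variable
UNIT_COST = {"income": 50.0, "debt": 30.0, "credit_score": 5.0}
# Hypothetical search range for each free variable
RANGES = {"income": (20.0, 120.0), "debt": (0.0, 80.0), "credit_score": (450.0, 800.0)}
# Hypothetical weight of the probability cost
PROB_COST_WEIGHT = 1000.0


def predict_proba(x):
    """Toy logistic model standing in for the trained neural network."""
    z = 0.05 * x["income"] - 0.06 * x["debt"] + 0.02 * x["credit_score"] - 13.0
    return 1.0 / (1.0 + math.exp(-z))


def change_cost(baseline, candidate):
    """Cost of moving each free variable away from its baseline value."""
    return sum(UNIT_COST[k] * abs(candidate[k] - baseline[k]) for k in UNIT_COST)


def probability_cost(p):
    """Highest at p = 0.5 (PROB_COST_WEIGHT), falling linearly to 0 at p = 1."""
    return PROB_COST_WEIGHT * (1.0 - p) / 0.5


def total_cost(baseline, candidate):
    """Total cost of a candidate, or None if the model still predicts rejection."""
    p = predict_proba(candidate)
    if p <= 0.5:
        return None
    return change_cost(baseline, candidate) + probability_cost(p)


def search(baseline, n_iter=2000, seed=0):
    """Generate candidates, score them, and keep the best one found so far."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_iter):
        # Random sampling stands in for GA candidate generation in this sketch
        candidate = {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
        cost = total_cost(baseline, candidate)
        if cost is not None and cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


applicant = {"income": 43.0, "debt": 50.0, "credit_score": 540.0}
best, best_cost = search(applicant)
```

Because the probability cost falls as the predicted probability rises, the search trades off a cheaper change against a more confident approval.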

Essentially it’s an optimization problem, as follows. We are generating candidate solutions such that the cost of making changes in the free variables, with respect to the baseline feature values, is minimum, subject to the condition that the model predicted probability of a positive outcome for the candidate solution is greater than 0.5 and as high as possible.
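Written out, one plausible formulation is the following. It assumes a cost of change that is linear in the absolute difference for each free variable, since the exact form is set by the configuration. Here $x^0$ is the baseline feature vector, $F$ the set of free variables, $c_i$ the configured cost per unit change, $f(x)$ the model’s predicted probability of approval, and $g$ a probability cost that is largest at 0.5 and decreasing above it.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i \in F} c_i \,\lvert x_i - x^0_i \rvert \;+\; g\big(f(x)\big) \\
\text{s.t.}\quad & f(x) > 0.5, \\
& x_j = x^0_j \quad \forall j \notin F
\end{aligned}
```

The term $g(f(x))$ is what expresses “as high as possible”: among candidates with similar change cost, the one with the higher predicted probability wins.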

Stated another way, we are trying to navigate from the current location in the feature space to a point on the other side of the class boundary, where the outcome is positive. There are many such paths; however, we are interested in the path with the least cost.

Loan Application

The data set, which is synthetically created, has 14 feature variables. The feature variables used as free variables are marked with *. I have used 9 of them. Free variable selection can be made in the optimizer configuration file.

Marital status
No of children
Education level
Whether self employed *
Income *
Years of experience *
No of years in current job *
Debt amount *
Loan amount *
Loan term
Credit score *
Bank account balance *
Retirement account balance *
No of prior mortgage loans
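One way to represent the free-variable selection in code is a list of (feature, is_free) pairs plus a helper that overlays a vector of free values onto the applicant’s baseline record. The identifiers and `apply_solution` are hypothetical. The order follows the list above, which is also the order in which the 9-element solution vectors in the sample output appear to list the free variables.

```python
# Feature names (illustrative identifiers) and whether each one is a free variable
FEATURES = [
    ("marital_status", False),
    ("num_children", False),
    ("education", False),
    ("self_employed", True),
    ("income", True),
    ("years_experience", True),
    ("years_current_job", True),
    ("debt", True),
    ("loan_amount", True),
    ("loan_term", False),
    ("credit_score", True),
    ("bank_balance", True),
    ("retirement_balance", True),
    ("num_prior_mortgages", False),
]

# Names of the free variables, in list order
FREE = [name for name, is_free in FEATURES if is_free]


def apply_solution(baseline, solution):
    """Overlay a vector of free-variable values onto a copy of the baseline record."""
    if len(solution) != len(FREE):
        raise ValueError(f"expected {len(FREE)} free values, got {len(solution)}")
    record = dict(baseline)
    record.update(zip(FREE, solution))
    return record
```

Fixed features are never touched, so making a variable fixed or free is a one-flag change.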

Solution

The solution has 2 main components. The machine learning model for predicting loan approval is a neural network with one hidden layer. The model is trained using a no-code framework built on top of PyTorch. The training and validation process is driven by a configuration file.
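The shape of such a model can be illustrated with a plain-Python forward pass. This is only a sketch: the ReLU hidden activation, the sigmoid output and the weights used in the test are assumptions. The actual model is built and trained with the PyTorch-based framework, not with this code.

```python
import math


def forward(x, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer network for binary classification.

    x: input features; w1: hidden-layer weights (one row per hidden unit);
    b1: hidden biases; w2: output weights; b2: output bias.
    Returns the predicted probability of the positive class (loan approved).
    """
    # Hidden layer: affine transform followed by ReLU
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    # Output layer: affine transform followed by sigmoid
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))
```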

The other component is a heuristic optimizer based on Genetic Algorithm (GA). GA is a nature-inspired optimization algorithm that uses crossover and mutation, as in natural evolution, to generate candidate solutions. The implementation is heavily configuration driven. The configuration contains various parameters, including the statistical distributions of the free variables. These distributions are sampled to create new candidate solutions.
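Uniform crossover, and a mutation that resamples a gene from its configured distribution, are two common GA operators. The sketch below assumes both, along with a hypothetical `DISTRIBUTIONS` table keyed by free variable name. The operators used in avenir may differ.

```python
import random

# Hypothetical distributions for a few free variables:
# ("choice", values), ("normal", mean, std) or ("uniform", low, high)
DISTRIBUTIONS = {
    "self_employed": ("choice", [0, 1]),
    "income": ("normal", 60.0, 15.0),
    "credit_score": ("uniform", 450.0, 800.0),
}


def sample(dist, rng):
    """Draw one value from a configured distribution."""
    kind = dist[0]
    if kind == "choice":
        return rng.choice(dist[1])
    if kind == "normal":
        return rng.gauss(dist[1], dist[2])
    if kind == "uniform":
        return rng.uniform(dist[1], dist[2])
    raise ValueError(f"unknown distribution {kind!r}")


def random_solution(rng):
    """Create a new candidate by sampling every free variable."""
    return {name: sample(d, rng) for name, d in DISTRIBUTIONS.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return {k: (parent_a[k] if rng.random() < 0.5 else parent_b[k]) for k in parent_a}


def mutate(solution, rate, rng):
    """Resample each gene from its distribution with probability `rate`."""
    return {
        k: (sample(DISTRIBUTIONS[k], rng) if rng.random() < rate else v)
        for k, v in solution.items()
    }
```

Sampling mutations from the configured distributions keeps new genes in a plausible range for each variable, instead of drifting arbitrarily far from realistic values.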

To use GA, the user has to implement some callback Python code that returns the cost of a given candidate solution. For our use case, all the cost-related parameters (e.g., cost per unit change of some variable) are defined in a JSON configuration file. The free feature variables are marked in this configuration.
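A sketch of such a callback follows. It assumes a hypothetical JSON layout (the real avenir configuration keys are not reproduced here) and an injected `predict_proba` function. It returns an infinite cost for candidates the model still rejects, so the optimizer always receives a number to minimize.

```python
import json

# Hypothetical cost configuration; the keys are illustrative, not avenir's schema
CONFIG_JSON = """
{
  "free_variables": {
    "income": {"unit_cost": 50.0},
    "debt": {"unit_cost": 30.0},
    "credit_score": {"unit_cost": 5.0}
  },
  "prob_cost_weight": 1000.0,
  "prob_threshold": 0.5
}
"""


class CostCallback:
    """Called by the optimizer with a candidate; returns its cost (lower is better)."""

    def __init__(self, config_text, baseline, predict_proba):
        config = json.loads(config_text)
        self.unit_cost = {n: s["unit_cost"] for n, s in config["free_variables"].items()}
        self.weight = config["prob_cost_weight"]
        self.threshold = config["prob_threshold"]
        self.baseline = baseline
        self.predict_proba = predict_proba

    def __call__(self, candidate):
        prob = self.predict_proba(candidate)
        if prob <= self.threshold:
            # Negative prediction: an infinite cost rejects the candidate
            return float("inf")
        change = sum(c * abs(candidate[n] - self.baseline[n]) for n, c in self.unit_cost.items())
        # Probability cost is highest at the threshold and falls to 0 at p = 1
        return change + self.weight * (1.0 - prob) / (1.0 - self.threshold)


baseline = {"income": 43.0, "debt": 50.0, "credit_score": 540.0}
callback = CostCallback(CONFIG_JSON, baseline, lambda x: 0.9 if x["income"] > 50 else 0.3)
```

Keeping the numbers in JSON means the cost model can be tuned, or personalized, without touching the callback code.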

The tutorial document can be used to run this use case. Here is some sample output along with an explanation.

field values for original and reccommended  with variable fields marked with *
field                         original            new
loan ID                       3K5FG92033          3K5FG92033
marital status                single              single
num of children               1                   1
education                     1                   1
self employed *               1                   0
income *                      43                  76
years of experience *         6.630               10.985
years in current job *        1.200               2.368
outstanding debt *            50                  33
loan amount *                 571                 418
loan term                     7                   7
credit score *                540                 623
saving *                      60                  63
retirement *                  41                  64
num of prior mortgae loans    0                   0

initial  model prediction 0.276

counterfactual model prediction 0.682

It shows the original values and the recommended changes side by side for the free variables. If the recommended changes are implemented and the loan application is made again, the approval probability goes up from 0.276 to 0.682. All the recommended changes make intuitive sense.

Some of the free variable selections are not realistic, e.g., years of experience and number of years in the current job. It’s not realistic to ask someone to get 4 more years of work experience and then reapply. I should have made those 2 feature variables fixed; they turned out to be bad choices for free variables.

The optimizer keeps track of the best solution found so far. However, with a configuration option set, it can keep track of all the best solutions as the algorithm iterates through various solutions. Here are some other good solutions in decreasing order of solution cost. These solutions can be used as alternative recommendations for remedial action to get the loan approved.

iter: 0039	cost: 4985.864 	soln: [0, 76, 11.67080531613541, 2.56877650019252, 18, 418, 623, 79, 87]
iter: 0040	cost: 4932.883 	soln: [0, 76, 11.67080531613541, 2.56877650019252, 20, 418, 623, 79, 81]
iter: 0043	cost: 4875.853 	soln: [0, 76, 11.67080531613541, 2.56877650019252, 30, 418, 623, 79, 81]
iter: 0055	cost: 4811.710 	soln: [1, 76, 11.67080531613541, 2.56877650019252, 22, 418, 623, 79, 63]
iter: 0057	cost: 4758.158 	soln: [0, 76, 11.67080531613541, 2.56877650019252, 30, 418, 623, 79, 64]
iter: 0063	cost: 4700.163 	soln: [0, 76, 11.67080531613541, 2.56877650019252, 30, 418, 623, 69, 64]
iter: 0076	cost: 4689.877 	soln: [1, 76, 11.67080531613541, 2.56877650019252, 29, 418, 623, 63, 64]
iter: 0079	cost: 4689.435 	soln: [1, 76, 11.67080531613541, 2.3683443853354804, 29, 418, 623, 73, 64]
iter: 0080	cost: 4671.069 	soln: [0, 76, 11.67080531613541, 2.56877650019252, 29, 418, 623, 63, 64]
iter: 0101	cost: 4670.474 	soln: [0, 76, 11.67080531613541, 2.3683443853354804, 29, 418, 623, 73, 64]
iter: 0107	cost: 4664.272 	soln: [1, 76, 11.67080531613541, 2.3683443853354804, 23, 418, 623, 63, 64]
iter: 0116	cost: 4657.994 	soln: [0, 76, 11.67080531613541, 2.3683443853354804, 21, 418, 623, 63, 64]
iter: 0119	cost: 4465.973 	soln: [0, 76, 10.984514065233952, 2.56877650019252, 29, 418, 623, 63, 64]
iter: 0124	cost: 4441.523 	soln: [0, 76, 10.984514065233952, 2.3683443853354804, 23, 418, 623, 63, 64]
iter: 0135	cost: 4424.480 	soln: [0, 76, 10.984514065233952, 2.3683443853354804, 26, 418, 623, 63, 64]
iter: 0149	cost: 4401.856 	soln: [0, 76, 10.984514065233952, 2.3683443853354804, 30, 418, 623, 63, 64]
iter: 0152	cost: 4385.542 	soln: [0, 76, 10.984514065233952, 2.3683443853354804, 33, 418, 623, 63, 64]
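Keeping every improvement, rather than only the final best, is what produces a log like the one above. Here is a minimal sketch. `BestTracker` and its `keep_history` flag are hypothetical stand-ins for the optimizer’s configuration option, and the report format simply mimics the sample output.

```python
class BestTracker:
    """Keeps the best solution so far and, optionally, every improvement on it."""

    def __init__(self, keep_history=False):
        self.best = None
        self.best_cost = float("inf")
        self.keep_history = keep_history
        self.history = []  # (iteration, cost, solution) for each improvement

    def update(self, iteration, cost, solution):
        """Record the candidate if it beats the current best; return True if it did."""
        if cost < self.best_cost:
            self.best, self.best_cost = list(solution), cost
            if self.keep_history:
                self.history.append((iteration, cost, list(solution)))
            return True
        return False

    def report(self):
        """Format the recorded improvements in the style of the sample log."""
        return [f"iter: {i:04d}\tcost: {c:.3f} \tsoln: {s}" for i, c, s in self.history]
```

Since only improvements are recorded, the history is ordered by iteration and, equivalently, by decreasing cost.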

The loan applicant has to implement the recommended changes to increase the likelihood of loan approval. However, there is no guarantee that the applicant would accept the top recommendation. That’s why the alternative recommendations are useful: they allow the applicant to choose among them.

The cost configuration should be customized. Changing any variable, e.g., increasing income by 10K, takes time and effort. The amount of effort is likely to vary from person to person. For example, someone may find it easier to save more money than to increase income.
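Personalization can be as simple as overriding the default per-unit costs for an individual applicant. In this sketch the unit costs, variable names, and the `personalized_costs` and `cheapest` helpers are hypothetical. It shows the same two remedies being ranked differently for a default applicant and for one who finds saving easier.

```python
# Hypothetical default cost per unit change of each free variable
DEFAULT_UNIT_COST = {"income": 30.0, "saving": 30.0}


def personalized_costs(defaults, overrides):
    """Override default unit costs with an applicant's own, rejecting unknown variables."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown free variables: {sorted(unknown)}")
    return {**defaults, **overrides}


def change_cost(unit_cost, baseline, candidate):
    """Cost of moving from the baseline to the candidate under the given unit costs."""
    return sum(c * abs(candidate[k] - baseline[k]) for k, c in unit_cost.items())


def cheapest(unit_cost, baseline, options):
    """Name of the option with the lowest change cost."""
    return min(options, key=lambda name: change_cost(unit_cost, baseline, options[name]))


baseline = {"income": 43.0, "saving": 60.0}
options = {
    "raise_income": {"income": 53.0, "saving": 60.0},
    "save_more": {"income": 43.0, "saving": 75.0},
}
saver = personalized_costs(DEFAULT_UNIT_COST, {"saving": 10.0})
```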

Lastly, the GA algorithm is probabilistic, so every run will produce slightly different results. One option is to make multiple runs and use the results from all of them.
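Here is a sketch of the multiple-run idea: run a stochastic optimizer once per random seed and pool the results, ranked by cost. To keep it short, `run_once` uses plain random search on a toy one-dimensional cost with two equally good minima, as a stand-in for a GA run. It is not the avenir optimizer.

```python
import random


def toy_cost(x):
    """Hypothetical cost with two equally good minima, at x = -2 and x = 2."""
    return (x * x - 4.0) ** 2


def run_once(seed, n_iter=500):
    """One stochastic optimization run; returns (best cost, best x)."""
    rng = random.Random(seed)
    best_x, best_c = None, float("inf")
    for _ in range(n_iter):
        x = rng.uniform(-5.0, 5.0)
        c = toy_cost(x)
        if c < best_c:
            best_x, best_c = x, c
    return best_c, best_x


def pooled_runs(seeds):
    """Run the optimizer once per seed and rank all results by cost."""
    return sorted(run_once(s) for s in seeds)


results = pooled_runs(range(5))
```

Fixing the seed makes each individual run reproducible, while varying it across runs exposes the spread of solutions the optimizer can return; with two minima, different seeds may land near different ones.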

Deep Reinforcement Learning Solution

Constrained optimization is not the only solution for remediation problems. Deep Reinforcement Learning (DRL) is another potential approach. Action and reward data could be generated to train a DRL model. The action would be a set of changes to the free variables, and the reward would be the inverse of the cost. Once a model is trained, the optimal action can be found for any test case.
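Here is a sketch of how such training data might be generated. It assumes a toy model, hypothetical unit costs, and 1 / (1 + cost) as one way to turn cost into a reward that grows as cost falls. Candidates that are still rejected get zero reward. Each tuple is a single-step (contextual bandit style) experience, which is a simplification of a full DRL setup.

```python
import math
import random

# Hypothetical per-unit costs, and the current free-variable values of a rejected applicant
UNIT_COST = {"income": 50.0, "debt": 30.0}
STATE = {"income": 43.0, "debt": 50.0}


def predict_proba(x):
    """Toy logistic model standing in for the trained classifier."""
    z = 0.08 * x["income"] - 0.05 * x["debt"] - 2.0
    return 1.0 / (1.0 + math.exp(-z))


def reward(state, action):
    """Reward for applying `action` (changes to free variables) in `state`."""
    new_state = {k: state[k] + action[k] for k in state}
    if predict_proba(new_state) <= 0.5:
        return 0.0  # still rejected: no reward
    cost = sum(UNIT_COST[k] * abs(action[k]) for k in action)
    return 1.0 / (1.0 + cost)  # cheaper remediation earns a higher reward


def generate_experience(state, n, rng):
    """Sample random actions and record (state, action, reward) training tuples."""
    data = []
    for _ in range(n):
        action = {"income": rng.uniform(0.0, 40.0), "debt": rng.uniform(-40.0, 0.0)}
        data.append((dict(state), action, reward(state, action)))
    return data
```

The reward function bakes in the unit costs, which is exactly why personalizing costs amounts to changing the reward.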

One disadvantage of the optimization algorithm based approach is that it needs to be run for every case, whereas a trained DRL model is reusable for many cases. Here is an example of DRL based remedial action for online training. However, this conspicuous advantage of DRL goes away if the cost configuration is personalized. When the cost configuration is personalized, we are essentially modifying the reward function of the DRL model under the hood, so a model trained with one reward function no longer directly applies.

Wrapping Up

The problem addressed in this post is a post-prediction problem of recommending remediation, falling within the realm of prescriptive analytics. Prescriptive analytics is a nebulous term, subject to various interpretations, but it generally implies some kind of recommendation and decision making process based on the predictions of a predictive ML model.

For many problems, prescriptive analytics is the natural next step after predictive analytics. We have used loan rejection as the use case. Some other example use cases for remedial action are preventing customer churn in retail and turning around a predicted disease outcome for patients in healthcare.
