Remedial Action Recommendation with Machine Learning and Genetic Algorithm
source link: https://pkghosh.wordpress.com/2022/01/26/remedial-action-recommendation-with-machine-learning-and-genetic-algorithm/
Prescriptive analytics sits at the top of a three tier analytics pyramid. The bottom layers are descriptive and predictive analytics. Prescriptive analytics entails action recommendations, based on the results of descriptive and predictive analytics, which if executed will have a positive business impact. As an illustrative example, after a machine learning model has predicted that a customer is very likely to churn in the near future, the business might be interested in getting some remedial action recommendations which, if implemented, will prevent the churn.
In this post we will go through a solution for remedial action based on predictive Machine Learning (ML) and Genetic Algorithm (GA), using loan approval as an example. Following the rejection of a loan application by the ML model, the bank may be interested in a set of remedial action recommendations for the applicant, so that the negative outcome can be turned around to a positive one. The implementation is available in my OSS GitHub repo avenir.
Remedial Action with Counterfactual Analysis
The ML model makes predictions based on the values of a set of features. Counterfactual analysis involves using alternative values of the features and evaluating the outcome, i.e. how the ML prediction changes for our use case. Essentially it’s a “what if” kind of analysis.
There are many possible candidate solutions that will result in a positive ML prediction. However, we are interested in an optimum set of new feature values that results in a positive ML model prediction, based on some definition of the cost of change. A Genetic Algorithm helps us find the optimum feature values among the many candidate sets of values.
Pulling all these ideas together, here are the steps for remedial action recommendation using a case with negative ML model prediction.
- Choose the free features that will be changed (e.g. existing debt for a loan application)
- For each feature variable that can change, define cost per unit change of the variable value in configuration
- There is a cost associated with the ML predicted probability also. The cost is highest at a predicted probability of 0.5 and decreases as the predicted probability goes up. This is also defined in the configuration
- Use GA to generate candidate solutions
- For each candidate solution, calculate cost for changes made in free feature variables
- Use ML model to make prediction. If the prediction is negative, reject the candidate solution, otherwise calculate cost based on the ML predicted probability
- Repeat the last 3 steps while keeping track of the best solution found so far
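The cost evaluation in the steps above can be sketched as a single function. This is a minimal illustration, not the avenir implementation: the function name, its parameters and the linear shape of the probability cost are assumptions. The post only states that the probability cost is highest at 0.5 and decreases as the probability goes up.

```python
from typing import Callable, Optional, Sequence


def candidate_cost(
    baseline: Sequence[float],
    candidate: Sequence[float],
    unit_costs: Sequence[float],
    predict_proba: Callable[[Sequence[float]], float],
    prob_cost_scale: float = 100.0,
) -> Optional[float]:
    """Return the total cost of a candidate, or None if it is rejected.

    All arguments are hypothetical stand-ins: `predict_proba` plays the role
    of the trained ML model and returns the approval probability.
    """
    prob = predict_proba(candidate)
    if prob <= 0.5:
        # Negative prediction: the candidate solution is rejected.
        return None

    # Cost of moving each free variable away from its baseline value.
    change_cost = sum(
        c * abs(new - old)
        for c, new, old in zip(unit_costs, candidate, baseline)
    )

    # Assumed linear shape: maximal at 0.5, zero at 1.0.
    prob_cost = prob_cost_scale * (1.0 - prob) / 0.5
    return change_cost + prob_cost
```

A GA would call such a function for every candidate it generates and keep the lowest non-rejected cost seen so far.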
Essentially it’s an optimization problem as follows. We are generating candidate solutions such that the cost of making changes in the free variables with respect to the baseline feature values is minimum, subject to the condition that the model predicted outcome probability for the candidate solution is greater than 0.5 and as high as possible.
Stated another way, we are trying to navigate from the current location in the feature space to a point on the other side of the class boundary, where the outcome is positive. There are many such paths, however we are interested in the path that has the least cost.
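One way to write this down, as a sketch rather than the exact objective used in the implementation: let $F$ be the set of free features, $x^{(0)}$ the baseline feature vector, $c_i$ the configured cost per unit change of feature $i$, $p(x)$ the model predicted probability and $g$ a function that is largest at $0.5$ and decreases as its argument grows. The absolute value as the measure of change and the additive form are assumptions.

```latex
\min_{x} \; \sum_{i \in F} c_i \,\bigl| x_i - x_i^{(0)} \bigr| \;+\; g\bigl(p(x)\bigr)
\quad \text{subject to} \quad
p(x) > 0.5, \qquad x_j = x_j^{(0)} \ \text{for all } j \notin F
```

The first term penalizes the effort of changing the free variables, and the second term rewards candidates that land further from the class boundary.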
Loan Application
The data set, which is synthetically created, has 14 feature variables. The feature variables used as free variables are marked with *. I have used 9 of them. Free variable selection can be made in the optimizer configuration file.
- Marital status
- No of children
- Education level
- Whether self employed *
- Income *
- Years of experience *
- No of years in current job *
- Debt amount *
- Loan amount *
- Loan term
- Credit score *
- Bank account balance *
- Retirement account balance *
- No of prior mortgage loans
Solution
The solution has 2 main components. The machine learning model for predicting loan approval is a neural network with one hidden layer. The model is trained using a no code framework built on top of PyTorch. The training and validation process is driven by a configuration file.
The other component is a heuristic optimizer based on Genetic Algorithm (GA). GA is a nature inspired optimization algorithm that uses crossover and mutation, as in natural evolution, to generate candidate solutions. The implementation is heavily configuration driven. The configuration contains various parameters, including the statistical distributions of the free variables. These distributions are sampled to create new candidate solutions.
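To make crossover, mutation and distribution sampling concrete, here is a minimal GA sketch. It is not the avenir optimizer: `run_ga`, its parameters, the uniform crossover and the truncation selection are all assumptions chosen for brevity. Each free variable is represented by a hypothetical sampler that draws from that variable's distribution.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A sampler draws one value for a free variable from its configured distribution.
Sampler = Callable[[random.Random], float]


def run_ga(
    samplers: Sequence[Sampler],
    cost_fn: Callable[[List[float]], Optional[float]],
    pop_size: int = 30,
    generations: int = 50,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Tuple[Optional[List[float]], float]:
    """Minimize cost_fn; a cost of None marks a rejected candidate."""
    rng = random.Random(seed)
    # Initial population: sample every free variable from its distribution.
    population = [[draw(rng) for draw in samplers] for _ in range(pop_size)]
    best: Optional[List[float]] = None
    best_cost = float("inf")

    for gen in range(generations):
        scored = []
        for ind in population:
            cost = cost_fn(ind)
            scored.append((float("inf") if cost is None else cost, ind))
        scored.sort(key=lambda pair: pair[0])

        # Keep track of the best solution found so far.
        if scored[0][0] < best_cost:
            best_cost, best = scored[0][0], list(scored[0][1])
        if gen == generations - 1:
            break

        # Truncation selection: the better half become parents.
        parents = [ind for _, ind in scored[: max(2, pop_size // 2)]]
        population = []
        while len(population) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Mutation: occasionally resample a gene from its distribution.
            child = [
                draw(rng) if rng.random() < mutation_rate else value
                for value, draw in zip(child, samplers)
            ]
            population.append(child)

    return best, best_cost
```

A sampler can be as simple as `lambda r: r.uniform(300, 850)` for a credit score. Passing a fixed `seed` makes a run repeatable, which helps when comparing configurations.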
To use GA, the user has to implement some callback Python code that will return the cost given a candidate solution. For our use case, all the cost related parameters (e.g. cost per unit change of some variable) are defined in a JSON configuration file. The free feature variables are marked in this configuration.
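The following sketch shows how such a callback could be built from a JSON configuration. The configuration layout, the key names (`free`, `unit_cost`, `prob_cost_scale`) and the function `make_cost_callback` are hypothetical and are not taken from avenir; the tutorial document in the repo describes the actual format.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical layout; the actual avenir configuration format may differ.
CONFIG_JSON = """
{
  "prob_cost_scale": 100.0,
  "features": [
    {"name": "income", "free": true, "unit_cost": 20.0},
    {"name": "debt", "free": true, "unit_cost": 5.0},
    {"name": "loan_term", "free": false}
  ]
}
"""


def make_cost_callback(
    config_text: str,
    baseline: Dict[str, float],
    predict_proba: Callable[[Dict[str, float]], float],
) -> Callable[[List[float]], Optional[float]]:
    """Build a callback that maps free variable values to a cost."""
    config = json.loads(config_text)
    free = [f for f in config["features"] if f.get("free")]
    scale = config["prob_cost_scale"]

    def cost(free_values: List[float]) -> Optional[float]:
        # Start from the original application and overwrite only free fields.
        record = dict(baseline)
        change_cost = 0.0
        for feature, value in zip(free, free_values):
            name = feature["name"]
            record[name] = value
            change_cost += feature["unit_cost"] * abs(value - baseline[name])

        prob = predict_proba(record)
        if prob <= 0.5:
            return None  # still rejected by the model
        # Assumed linear probability cost: maximal at 0.5, zero at 1.0.
        return change_cost + scale * (1.0 - prob) / 0.5

    return cost
```

Keeping the unit costs in configuration rather than in code means the same callback can serve different applicants by swapping the JSON.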
The tutorial document can be used to run this use case. Here is some sample output along with explanation.
field values for original and reccommended with variable fields marked with *
field                         original      new
loan ID                       3K5FG92033    3K5FG92033
marital status                single        single
num of children               1             1
education                     1             1
self employed *               1             0
income *                      43            76
years of experience *         6.630         10.985
years in current job *        1.200         2.368
outstanding debt *            50            33
loan amount *                 571           418
loan term                     7             7
credit score *                540           623
saving *                      60            63
retirement *                  41            64
num of prior mortgae loans    0             0
initial model prediction 0.276
counterfactual model prediction 0.682
It shows the original and recommended values side by side, with the free variables marked. If the recommended changes are implemented and the loan application is made again, the approval probability goes up from 0.276 to 0.682. All the recommended changes make sense intuitively.
Some of the free variable selections are not realistic, e.g. years of experience and number of years in the current job. It’s not realistic to ask someone to get 4 more years of work experience and then reapply. I should have made those 2 feature variables fixed. They turn out to be bad choices for free variables.
The optimizer keeps track of the best solution found so far. However, with a configuration option set, it can keep track of all the best solutions, as the algorithm iterates through various solutions. Here are some other good solutions in decreasing order of solution cost. These solutions can be used as alternative recommendations for remedial action to get loan approved.
iter: 0039 cost: 4985.864 soln: [0, 76, 11.67080531613541, 2.56877650019252, 18, 418, 623, 79, 87]
iter: 0040 cost: 4932.883 soln: [0, 76, 11.67080531613541, 2.56877650019252, 20, 418, 623, 79, 81]
iter: 0043 cost: 4875.853 soln: [0, 76, 11.67080531613541, 2.56877650019252, 30, 418, 623, 79, 81]
iter: 0055 cost: 4811.710 soln: [1, 76, 11.67080531613541, 2.56877650019252, 22, 418, 623, 79, 63]
iter: 0057 cost: 4758.158 soln: [0, 76, 11.67080531613541, 2.56877650019252, 30, 418, 623, 79, 64]
iter: 0063 cost: 4700.163 soln: [0, 76, 11.67080531613541, 2.56877650019252, 30, 418, 623, 69, 64]
iter: 0076 cost: 4689.877 soln: [1, 76, 11.67080531613541, 2.56877650019252, 29, 418, 623, 63, 64]
iter: 0079 cost: 4689.435 soln: [1, 76, 11.67080531613541, 2.3683443853354804, 29, 418, 623, 73, 64]
iter: 0080 cost: 4671.069 soln: [0, 76, 11.67080531613541, 2.56877650019252, 29, 418, 623, 63, 64]
iter: 0101 cost: 4670.474 soln: [0, 76, 11.67080531613541, 2.3683443853354804, 29, 418, 623, 73, 64]
iter: 0107 cost: 4664.272 soln: [1, 76, 11.67080531613541, 2.3683443853354804, 23, 418, 623, 63, 64]
iter: 0116 cost: 4657.994 soln: [0, 76, 11.67080531613541, 2.3683443853354804, 21, 418, 623, 63, 64]
iter: 0119 cost: 4465.973 soln: [0, 76, 10.984514065233952, 2.56877650019252, 29, 418, 623, 63, 64]
iter: 0124 cost: 4441.523 soln: [0, 76, 10.984514065233952, 2.3683443853354804, 23, 418, 623, 63, 64]
iter: 0135 cost: 4424.480 soln: [0, 76, 10.984514065233952, 2.3683443853354804, 26, 418, 623, 63, 64]
iter: 0149 cost: 4401.856 soln: [0, 76, 10.984514065233952, 2.3683443853354804, 30, 418, 623, 63, 64]
iter: 0152 cost: 4385.542 soln: [0, 76, 10.984514065233952, 2.3683443853354804, 33, 418, 623, 63, 64]
The loan applicant has to implement the recommended changes to increase the likelihood of loan approval. However, there is no guarantee that the loan applicant would accept the top recommendation. That’s why the alternative recommendations are useful. They allow the applicant to choose among several options.
The cost configuration should be customized. Changing any variable, e.g. increasing income by 10K, takes time and effort. The amount of effort is likely to vary from person to person. For example, someone may find it easier to save more money rather than increase income.
Lastly, the GA algorithm is probabilistic. Every run will produce slightly different results. One option is to make multiple runs and use the results from all the runs.
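Pooling several runs could look like the sketch below. The optimizer here is a toy random search standing in for one GA run, and both function names are hypothetical. Only the pooling logic is the point: run with different seeds, drop duplicate solutions and rank what remains by cost.

```python
import random
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Solution = Tuple[int, ...]
Result = Optional[Tuple[float, Solution]]


def toy_run(seed: int) -> Result:
    """Stand-in for one stochastic GA run (random search on two variables)."""
    rng = random.Random(seed)
    best: Result = None
    for _ in range(200):
        x = (rng.randint(0, 50), rng.randint(0, 50))
        if x[0] + x[1] < 40:
            continue  # infeasible, plays the role of a negative prediction
        cost = 3.0 * x[0] + 2.0 * x[1]
        if best is None or cost < best[0]:
            best = (cost, x)
    return best


def pool_runs(
    optimizer: Callable[[int], Result],
    seeds: Iterable[int],
    top_k: int = 3,
) -> List[Tuple[float, Solution]]:
    """Run the optimizer once per seed and return the top_k distinct solutions."""
    found: Dict[Solution, float] = {}
    for seed in seeds:
        result = optimizer(seed)
        if result is None:
            continue  # this run found no feasible solution
        cost, solution = result
        found[solution] = cost
    # Lowest cost first.
    return sorted((cost, soln) for soln, cost in found.items())[:top_k]
```

Calling `pool_runs(toy_run, range(5))` returns up to three distinct solutions, which maps naturally onto offering the applicant a short list of alternatives.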
Deep Reinforcement Learning Solution
Constrained optimization is not the only solution for remediation problems. Deep Reinforcement Learning (DRL) is another potential approach. Action and reward data could be generated to train a DRL model. An action will be a set of changes to the free variables, and the reward will be the inverse of the cost. Once a model is trained, the optimal action can be found for any test case.
One disadvantage of the optimization algorithm based approach is that it needs to be run for every case, whereas a trained DRL model is reusable for many cases. Here is an example for DRL based remedial action for online training. However, this conspicuous advantage of DRL goes away if the cost configuration is personalized. When the cost configuration is personalized, we are essentially modifying the reward function in DRL under the hood, so a model trained on one cost configuration no longer applies as-is.
Wrapping Up
The problem addressed in this post is a post prediction problem of recommending remediation, falling within the realm of prescriptive analytics. Although prescriptive analytics is a nebulous term, subject to various interpretations, it generally implies some kind of recommendation and decision making process based on the prediction from the predictive ML model.
For many problems, prescriptive analytics is the natural next step after predictive analytics. We have used loan rejection as the use case. Some other example use cases for remedial action are preventing customer churn in retail and turning around a patient's disease prediction in healthcare.