Simulating A/B Test with Counterfactual and Machine Learning Regression Model

source link: https://pkghosh.wordpress.com/2023/02/28/simulating-a-b-test-with-counterfactual-and-machine-learning-regression-model/

A/B testing is a way of trying multiple versions of something to find out which works best based on some metric. It is also called a Randomized Controlled Trial (RCT). There are many applications where A/B testing is prevalent, e.g. web site design, marketing campaigns, and drug trials. However, performing A/B tests is costly; it takes time and resources.

In this post, we will go through a simulation based alternative to A/B testing using counterfactuals and a machine learning regression model. The use case is a targeted marketing campaign. We will simulate different marketing campaign targets to discover which one generates maximum revenue. The ML regression model is neural, implemented with a no code PyTorch framework, which is available as a Python package.

Causal Inference with Counterfactuals and A/B tests

An A/B test involves deciding what to test and the metric to measure performance. There could be a proliferation of tests, depending upon the complexity of the problem. Consider a web site design: there are numerous attributes of a web site, and testing each one could be time consuming and not feasible. This is where simulation helps.

Counterfactual analysis is a causal inference technique. It enables evaluation of the effect of intervention on some variables on outcomes. Generating samples for a counterfactual requires an A/B test; that is the connection between counterfactuals and A/B tests. The counterfactual measures what would have happened to beneficiaries in the presence of the intervention, applied to the so called treatment group. The impact is estimated by comparing those outcomes to the outcomes observed under non intervention, in the so called control group.

Counterfactuals are expressed with do calculus as P(X | do(y)). It expresses the distribution of X when Y is constrained to have the value y through intervention. This is the distribution that would be observed if there were intervention in the data generating process, artificially forcing the variable Y to take the value y.

This is different from the normal conditional distribution P(X | Y), which is based on observational data. Normally we are interested in the average effect E(X | do(y)).

One example for our targeted marketing campaign use case would be E(TA | do(IG=3)), where TA is the transaction amount spent following the marketing campaign and IG is the income group. We read this as the average transaction amount following a campaign targeted at customers in income group 3.
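
To make the distinction concrete, here is a minimal sketch contrasting the two quantities, using made-up data and a plain linear model in place of the neural model described later. All column names are illustrative.

import pandas as pd
from sklearn.linear_model import LinearRegression

# toy observational data (illustrative): income group (IG), number of
# past transactions (NPT) and post campaign transaction amount (TA)
df = pd.DataFrame({
    "IG":  [1, 2, 3, 3, 2, 1, 3, 2],
    "NPT": [2, 4, 3, 5, 1, 2, 4, 3],
    "TA":  [90.0, 110.0, 140.0, 135.0, 105.0, 95.0, 150.0, 115.0],
})

# conditional mean E(TA | IG=3): simply filter the observed rows
cond_mean = df.loc[df["IG"] == 3, "TA"].mean()

# interventional mean E(TA | do(IG=3)): force IG=3 for every customer,
# keep the other features as observed, and predict with a model fitted
# on the observational data
model = LinearRegression().fit(df[["IG", "NPT"]], df["TA"])
df_do = df[["IG", "NPT"]].copy()
df_do["IG"] = 3
interv_mean = model.predict(df_do).mean()

print("E(TA | IG = 3)    ", round(cond_mean, 2))
print("E(TA | do(IG = 3))", round(interv_mean, 2))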

Generally, observational samples are not available under this kind of interventional constraint, unless you perform an A/B test for each such case to generate samples. Sometimes it's not even feasible to perform an A/B test. Suppose you want to find the effect of smoking on some segment of patients. You cannot force those people to smoke to perform your A/B test.

Counterfactual with Machine Learning

There is a solution with machine learning regression models for causal inference with counterfactuals. You can use such regression models to simulate any A/B test you want. The solution is as follows. First you have to generate samples; in our case this is done with a marketing campaign where the campaign targets are randomly selected among the customers. It's ideal to have a 50-50 split between targets and non targets. The remaining steps are listed below, followed by a short sketch of the procedure.

  • Build an ML regression model. For our use case we used a NN model with one hidden layer
  • For any counterfactual, constrain the corresponding column to the intervened value, e.g. income group = 3
  • Predict the target variable using the regression model, e.g. transaction amount in our case
  • Take the average of the predictions, which is essentially E(X | do(y))
  • Repeat for any other counterfactual. Each counterfactual will correspond to some targeted marketing campaign for our use case
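
Here is a sketch of these steps with a generic PyTorch regression model standing in for the no code framework. The model is assumed to be already trained on the randomized campaign data, and the feature layout is illustrative.

import torch
import torch.nn as nn

# stand-in regression model with one hidden layer, as in our use case;
# assume it has already been trained on the randomized campaign samples
model = nn.Sequential(nn.Linear(5, 3), nn.ReLU(), nn.Linear(3, 1))
model.eval()

# X: scaled observational feature matrix (one row per customer);
# column 0 is assumed to hold the income group
X = torch.rand(1000, 5)
IG_COL = 0

def interv_mean(model, X, col, value):
    # E(target | do(feature = value)): clamp one feature column for
    # every sample, predict and take the average of the predictions
    Xc = X.clone()
    Xc[:, col] = value
    with torch.no_grad():
        return model(Xc).mean().item()

# repeat for each counterfactual, i.e. each candidate campaign target
for ig in (1.0, 2.0, 3.0):
    print("av amount for do(IG = %.0f): %.2f" % (ig, interv_mean(model, X, IG_COL, ig)))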

There is one potential issue here. Since we are artificially constraining some feature fields, the resulting data on which predictions will be made may be out of distribution with respect to the training data. However, since we are taking the average of the predictions, it will have a mitigating effect on any prediction error.

With ML regression model based simulation, you can avoid costly A/B tests. Simulation is the only option for problems where an A/B test is not feasible.

Marketing Campaign

The marketing campaign planner may have ideas about various targeted campaigns. But without knowledge of the returns from the campaigns, it's difficult to choose the right one. As alluded to earlier, running an A/B test for each such campaign is time consuming and cost prohibitive. Simulation with an ML model circumvents these problems. The data set we have used is synthetically generated and consists of the following fields.

  • Income group
  • Average transaction amount in last one year
  • No of transactions in last one year
  • Day of the week for a transaction
  • Whether marketing campaign email was sent prior to the transaction
  • Transaction amount in the first transaction following campaign (target variable)

The data is synthetically generated with a technique described in my earlier post. You might wonder about income group; generally such data is not directly available. However, some inference can be made from the zip code, which is available from the customer data. Some of the features are based on the past engagement behavior of customers.
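
For orientation, a single record in that layout might look as follows. The column names are hypothetical, since the actual field names come from the data generation script in the earlier post.

import pandas as pd

# hypothetical column names; the last column is the regression target
cols = ["income_group", "avg_xaction_amount", "num_xactions",
        "day_of_week", "campaign_email_sent", "xaction_amount"]
row = [3, 87.50, 14, 2, 1, 112.30]   # one made-up customer record
print(pd.DataFrame([row], columns=cols))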

Machine Learning Model

A neural network with one hidden layer is used for the regression model. The model is trained with a no code framework based on PyTorch. All the no code DL framework requires is a configuration file. Here is the configuration file for this use case, with an annotated explanation for each parameter. The framework is available as a Python package named torvik.

Note:
When any property value is _, it implies the default value. To find the default value, please look up the constructor code in tnn.py

common.mode=training
This is not used. Please ignore

common.model.directory=./model/tnn
Model save directory path

common.model.file=countfl.mod
Saved model file name

common.preprocessing=scale
Any pre processing to be performed e.g scale

common.scaling.method=_
Scaling method e.g minmax, zscale

common.scaling.minrows=_
Minimum number of rows for scaling

common.verbose=True
Verbosity level

common.device=_
Device  e.g cpu, gpu

train.data.file=countfl_tr.txt
Training data file path

train.data.fields=1:6
Index of fields to be used from the training data file

train.data.feature.fields=0:4
Index of columns for features

train.data.out.fields=5
Index of output column

train.layer.data=3:relu:false:false:0.5,1:none:false:false:-1.0
Neural architecture description. The description of each layer is separated by a comma, and the attributes of a given layer are separated by a colon. The layer attributes are 1) number of units 2) activation function 3) batch normalization flag 4) whether batch normalization should be done after activation 5) dropout probability. A plain PyTorch sketch of this architecture appears after the configuration listing.

train.input.size=_
Input size

train.output.size=1
Output size

train.output.clabels=_
Output class labels for classification

train.batch.size=32
Batch size

train.loss.reduction=_
Loss reduction. Please look up PyTorch documentation for the options

train.opt.learning.rate=.005
Learning rate

train.opt.weight.decay=_
Optimizer weight decay. Please look up PyTorch documentation for details

train.opt.momentum=_
Optimizer momentum. Please look up PyTorch documentation for details

train.opt.eps=_
For adam optimizer, a term added to the denominator to improve numerical stability. Please look up PyTorch documentation for details

train.opt.dampening=_
Dampening for momentum. Please look up PyTorch documentation for details


train.opt.momentum.nesterov=_
Nesterov momentum. Please look up PyTorch documentation for details

train.opt.betas=_
For adam optimizer, coefficients used for computing running averages of gradient and its square. Please look up PyTorch documentation for details

train.opt.alpha=_
For RMSprop optimizer, smoothing constant. Please look up PyTorch documentation for details

train.num.iterations=100
Number of epochs for training

train.optimizer=_
Optimizer type. Options are sgd, adam and rmsprop

train.lossFn=mse
Loss function. Options are ltwo, mse, ce, lone, mae, bce, bcel, sm, mlsm and triplet

train.model.save=True
If True, trained model is saved

train.track.error=True
Tracks error if True

train.epoch.intv=10
Epoch interval for tracking error

train.batch.intv=5
Batch interval for tracking error

train.print.weights=_
Prints weights if True

valid.data.file=countfl_va.txt
Validation data file path

valid.accuracy.metric=mse
Accuracy metric for validation

predict.data.file=countfl_va.txt
Prediction data file path

predict.use.saved.model=True
If True, the saved model is used for prediction

predict.output=_
Prediction output type for classification. Options are prob, class

predict.feat.pad.size=50
Prediction output formatting related
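
For orientation, here is what the train.layer.data specification above corresponds to in plain PyTorch. This is only a sketch; the framework's actual model construction lives in tnn.py and may differ in details such as where dropout is applied.

import torch.nn as nn

# plain PyTorch equivalent of train.layer.data, assuming the default
# input size resolves to the 5 feature columns:
#   layer 1: 3 units, relu, no batch norm, dropout probability 0.5
#   layer 2: 1 unit, no activation, dropout disabled (-1.0)
model = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(3, 1),
)
print(model)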

Results

I ran various counterfactual cases to get the average transaction amount in the first transaction following the campaign. There is also a baseline case, where the campaign is targeted randomly at some customers, aka the control group. This will help us estimate the relative lift from the targeted campaigns. The results are as follows.

control group
non intervened av xaction amount 119.71

income group
intervened value 1.0	av xaction amount 117.10
intervened value 2.0	av xaction amount 123.71
intervened value 3.0	av xaction amount 126.07

num of past transactions
intervened value 1.0	av xaction amount 122.47
intervened value 2.0	av xaction amount 126.12
intervened value 3.0	av xaction amount 130.24
intervened value 4.0	av xaction amount 134.83

As you can see, there are several targeted campaigns where the return is higher than the baseline (control group) case. The constraints indicating the customer segments are applied to the variables income group and number of past transactions in some time window.

Strictly speaking, the average transaction amount should be multiplied by the number of customers in the particular segment to get the total transaction amount as the true value of the return.
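
As a quick illustration of that adjustment, with made-up segment sizes:

# averages are from the results above; the segment sizes are made up
segments = {
    "income group 3":      (126.07, 1200),
    "4 past transactions": (134.83,  800),
}
for name, (av_amount, num_customers) in segments.items():
    print("%s: total return %.2f" % (name, av_amount * num_customers))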

Currently, the driver code supports only one constraint or intervened variable. I will modify the driver code to support multiple constraints, for example income group = 2 and number of past transactions = 3. Please refer to the tutorial for more details on the execution of this use case.

Wrapping Up

We went through a technique for simulating A/B tests with counterfactuals and an ML regression model. The simulation can replace actual A/B tests, which are time consuming and costly to run. The technique could be used for any kind of A/B test based optimization.

