Data Driven Causal Relationship Discovery with Python Example Code

You may find two variables A and B strongly correlated, but how do you know whether A causes B or B causes A? Irrespective of the causal direction, causality will be manifested as correlation. Discovering causal relationships is important for many problems. However, unlike correlation, causality is not so easy to discover. In this post we will go through a technique called the Additive Noise Model. We will use product sale cannibalization as an example, i.e. whether the introduction of a new product is causing the plummeting sale of an existing competing product. The example Python code can be found in my open source project avenir in GitHub.

Causality

You must have heard the adage "correlation is not causation". Correlation is a manifestation of causation and not causation itself. Knowledge of correlation alone does not help in discovering a possible causal relationship. The correlation between two variables X and Y could be present for the following reasons

  • X causes Y
  • Y causes X
  • No direct causal relationship between X and Y but a known or unknown confounding variable is causing X and Y, resulting in X and Y being correlated.
  • A combination of any of the above

When X causes Y, the following axioms hold, where P(Y | do(x)) denotes the interventional distribution obtained by setting X to the value x (see the simulation sketch after this list).

  • P(Y) != P(Y | x) for some x
  • P(Y) != P(Y | do(x)) for some x
  • P(Y | do(x)) = P(Y | x) for some x
  • P(Y | do(x1)) != P(Y | do(x2)) for some x1, x2
  • P(X) is independent of P(Y|X)
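
To make the do() notation concrete, below is a minimal simulation sketch; the structural model (X standard normal, Y = 2X + noise) is hypothetical and only serves to show that intervening on the cause shifts the effect's distribution, while intervening on the effect leaves the cause's distribution unchanged.

import numpy as np

rng = np.random.default_rng(0)
n = 100000

# hypothetical structural model: X causes Y
x = rng.normal(0.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 1.0, n)

# intervention do(X = 1): clamp X to 1 and regenerate Y from the structural equation
y_do_x = 2.0 * np.ones(n) + rng.normal(0.0, 1.0, n)
print("E[Y] =", y.mean(), " E[Y | do(X=1)] =", y_do_x.mean())   # clearly different

# intervention do(Y = 1): setting Y does not feed back into X
x_do_y = rng.normal(0.0, 1.0, n)
print("E[X] =", x.mean(), " E[X | do(Y=1)] =", x_do_y.mean())   # essentially unchanged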

Typically causal relationships are discovered through controlled experiments. However, it's not always feasible to conduct such experiments; they could be expensive or unethical. It is always desirable, albeit challenging, to extract causal relationships from observational data.

There are many techniques for finding causal relationships from observational data. They require data for all other variables involved in causal relationships with the two variables of interest. The algorithms are based on the premise that if X causes Y then the total complexity of P(X) and P(Y|X) is smaller than the total complexity of P(Y) and P(X|Y).

Additive Noise Model

The Additive Noise Model (ANM) is a bivariate model, i.e. the model is based only on the two variables of interest. It discovers only a unidirectional causal relationship, assuming there is no bidirectional causality and no causality driven by a confounding variable.

The ANM is based on the assumption that the effect is a linear function of the cause plus Gaussian noise, i.e. Y = aX + b + N, where the noise N is independent of X. It turns out these assumptions are not limiting, because our goal is only to decide whether X causes Y or Y causes X, and the decision is based on the relative values of a statistic. The algorithm steps are as follows. For details please refer to the original paper.

  • Regress Y on X
  • Find the residue R1, i.e. the difference between actual Y and predicted Y for all X values
  • Calculate the dependency measure C(X,R1)
  • Reverse the roles of X and Y and calculate C(Y,R2), where R2 is the difference between actual X and predicted X for all Y values
  • If C(X,R1) is less than C(Y,R2), then X causes Y; otherwise Y causes X

The core argument of ANM is that if Y can be expressed as a linear function of X and the residue of the regression is independent of X, then X causes Y. The dilemma is deciding on an appropriate threshold for the measure of independence between X and the residue. The algorithm circumvents this issue by reversing the roles of X and Y and then comparing the two independence measures, as in the sketch below.
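
The steps above translate into only a few lines of code. Here is a minimal sketch (not the author's causal.py script) that regresses in both directions with ordinary least squares and compares a dependence score between the regressor and the residuals; dependence_score is a placeholder for any of the metrics listed in the next section.

import numpy as np

def fit_residuals(x, y):
    # regress y on x with ordinary least squares and return the residuals
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

def anm_direction(x, y, dependence_score):
    # dependence_score(a, b) should return a value that is smaller when
    # a and b are closer to being independent (entropy, HSIC, etc.)
    r_forward = fit_residuals(x, y)    # residuals of Y regressed on X
    r_reverse = fit_residuals(y, x)    # residuals of X regressed on Y
    c_forward = dependence_score(x, r_forward)
    c_reverse = dependence_score(y, r_reverse)
    return "X->Y" if c_forward < c_reverse else "Y->X"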

There are different metrics for the dependency measure, listed below; please refer to the paper for details. In the example in the next section I have used the entropy based metric.

  • Hilbert-Schmidt Independence Criterion (HSIC)
  • Entropy based score
  • Gaussian score
  • Empirical Bayes score

There are various algorithms for estimating entropy directly from observational data. I have used the one-spacing technique in the example below.
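
As an illustration, here is a minimal sketch of a one-spacing differential entropy estimator and an entropy based dependence score built on it; the exact estimator and bias correction used in the repository code may differ.

import numpy as np
from scipy.special import psi   # digamma function

def spacing_entropy(x):
    # one-spacing entropy estimate: sort the sample and average the log of
    # consecutive spacings, with a digamma based bias correction
    x = np.sort(np.asarray(x, dtype=float))
    gaps = np.diff(x)
    gaps[gaps == 0] = 1e-12                  # guard against ties
    return psi(len(x)) - psi(1) + np.mean(np.log(gaps))

def entropy_score(a, b):
    # entropy based score: total entropy of the (regressor, residual) pair;
    # the causal direction with the smaller total is preferred
    return spacing_entropy(a) + spacing_entropy(b)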

Competitive Product Sale

In the example we have 2 products. With increasing sale of the first product there is decreasing sale of the second product. The hypothesis is that the increasing sale of the first product is causing the decreasing sale of the second product. Applying the ANM, we will find out whether the hypothesis is correct. Here is the result

./causal.py disc
** forward case
total entropy 2.560416
** reverse case
total entropy 2.767325

In the forward case the second product's sale is regressed on the first product's sale, and vice versa for the reverse case. According to ANM, the first product's sale is causing the decreasing sale of the second product, since the dependency metric value is lower in the forward case.

The Python code is in my GitHub repo. The script generates sales data for the 2 products and then executes the ANM logic. For regression and other tasks I have used a Python wrapper class that wraps various Python libraries and provides more than 70 data analysis and exploration functions, all in one place.
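
If you want to try the idea without the full repository, the following sketch generates hypothetical cannibalization data (not the generator used in the repo) and applies the two helpers defined above.

import numpy as np

rng = np.random.default_rng(42)
sale1 = rng.uniform(100, 500, 1000)                    # first product sale
sale2 = 600 - 0.8 * sale1 + rng.normal(0, 20, 1000)    # second product sale, cannibalized

direction = anm_direction(sale1, sale2, entropy_score)
print("inferred causal direction:", direction)         # expected X->Y under the ANM heuristic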

Causality and Machine Learning Models

There is an interesting link between causality and machine learning models. If all the features of a machine learning model are causes and the outcome is the effect, then the machine learning model is causal. A causal model is always desirable, because it rests on a better foundation, being closer to the underlying physical process.

One way to test whether a machine learning model is causal is to check whether the prediction error is independent of the feature values.
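
A crude way to run that check is to measure the dependence between each feature and the prediction error, for instance with the entropy based score above or, as in this hypothetical sketch, with plain correlation (which only captures linear dependence).

import numpy as np

def residual_independence_report(features, y_true, y_pred):
    # report the correlation between each feature column and the prediction
    # error; values far from zero suggest the model is not causal
    errors = np.asarray(y_true) - np.asarray(y_pred)
    for j in range(features.shape[1]):
        corr = np.corrcoef(features[:, j], errors)[0, 1]
        print("feature %d: corr(feature, error) = %.3f" % (j, corr))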

Wrapping Up

Although there are various causal discovery techniques, they generally require a complete causal graph and data for all the variables involved. The Additive Noise Model is a simple technique for bivariate causal analysis which requires only observational data for the 2 variables of interest.

