Introduction
Evaluating causal inference methods in a scientifically rigorous way is a cumbersome and error-prone task. To foster good scientific practice, JustCause provides a framework that makes it easy to:
- evaluate your method using common data sets like IHDP, IBM ACIC, and others;
- create synthetic data sets with a generic but standardized approach;
- benchmark your method against several baseline and state-of-the-art methods.
Our cause is to develop a framework that allows you to compare methods for causal inference in a fair and just way. JustCause is a work in progress and new contributors are always welcome.
The reasons for creating a library like JustCause are laid out in the thesis
A Systematic Review of Machine Learning Estimators for Causal Effects
by Maximilian Franz. Therein, it is shown that many publications on causality:
- lack reproducibility,
- use different versions of the seemingly same data set,
- fail to state that some theoretical assumptions are not met by the data set,
- omit several state-of-the-art methods from their comparison.
A more standardized approach, as offered by JustCause, can address these shortcomings.
Installation
Install JustCause with:
pip install justcause
but consider using conda to set up an isolated environment beforehand. This can be done with:
conda env create -f environment.yaml
conda activate justcause
where environment.yaml defines the environment's dependencies.
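The contents of environment.yaml are not reproduced here; a minimal sketch of what such a file could look like follows. The channel and version choices below are assumptions for illustration, not the file actually shipped with JustCause:

```yaml
# Hypothetical environment.yaml sketch; the file in the JustCause
# repository may pin different channels and versions.
name: justcause
channels:
  - defaults
  - conda-forge
dependencies:
  - python>=3.6
  - pip
  - pip:
      - justcause
```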
Quickstart
For a minimal example, we load the IHDP (Infant Health and Development Program) data set, perform a train/test split, fit a basic learner on each replication, and display some metrics:
>>> from justcause.data.sets import load_ihdp
>>> from justcause.learners import SLearner
>>> from justcause.learners.propensity import estimate_propensities
>>> from justcause.metrics import pehe_score, mean_absolute
>>> from justcause.evaluation import calc_scores
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd
>>> replications = load_ihdp(select_rep=[0, 1, 2])
>>> slearner = SLearner(LinearRegression())
>>> metrics = [pehe_score, mean_absolute]
>>> scores = []
>>> for rep in replications:
...     train, test = train_test_split(rep, train_size=0.8)
...     p = estimate_propensities(train.np.X, train.np.t)
...     slearner.fit(train.np.X, train.np.t, train.np.y, weights=1/p)
...     pred_ite = slearner.predict_ite(test.np.X, test.np.t, test.np.y)
...     scores.append(calc_scores(test.np.ite, pred_ite, metrics))
>>> pd.DataFrame(scores)
pehe_score mean_absolute
0 0.998388 0.149710
1 0.790441 0.119423
2 0.894113 0.151275
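Since each element of scores holds one row of metric values per replication, it is often useful to aggregate them across replications. A minimal sketch using plain pandas, with the score values copied from the output above as stand-ins for the loop's results:

```python
import pandas as pd

# One dict of metric values per replication, as produced by the
# evaluation loop above (values copied from the example output).
scores = [
    {"pehe_score": 0.998388, "mean_absolute": 0.149710},
    {"pehe_score": 0.790441, "mean_absolute": 0.119423},
    {"pehe_score": 0.894113, "mean_absolute": 0.151275},
]

df = pd.DataFrame(scores)
# Mean and standard deviation of each metric across replications.
summary = df.agg(["mean", "std"])
print(summary)
```

Reporting both the mean and the spread across replications gives a fairer picture of a method's performance than any single replication.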