Evaluating Causal Inference Models: How to Check for Common Support and Covariate Balance Using Box Plots and Love Plots

A propensity score is the estimated probability that a unit receives the treatment, given its observed covariates. In observational studies where randomization is not possible, propensity scores let us approximate a randomized experiment by balancing the distributions of observed confounders between the treated and control groups. Unlike randomization, they cannot balance confounders that were never measured.

Once the scores are estimated (commonly with logistic regression or gradient-boosted trees), they support several adjustment strategies: matching pairs treated and control units with similar scores, stratification groups units into bins of similar scores (often quintiles), weighting reweights each unit by the inverse of its probability of receiving the treatment it actually received, and regression adjustment includes the score as a covariate in the outcome model.
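As a rough sketch of two of these strategies, the code below implements 1:1 nearest-neighbor matching on the score (with replacement and an optional caliper) and inverse-probability weights for the average treatment effect. It assumes NumPy and scores that have already been estimated; the function names `nearest_neighbor_match` and `ipw_weights` are hypothetical, not taken from any particular library.

```python
import numpy as np

def nearest_neighbor_match(ps, treated, caliper=None):
    """1:1 nearest-neighbor matching on the propensity score, with replacement.

    Returns a dict mapping each treated index to its matched control index.
    Treated units with no control within `caliper` are left unmatched.
    """
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    t_idx = np.flatnonzero(treated)
    c_idx = np.flatnonzero(~treated)
    matches = {}
    for i in t_idx:
        dist = np.abs(ps[c_idx] - ps[i])  # score distance to every control
        j = int(np.argmin(dist))          # closest control (may be reused)
        if caliper is None or dist[j] <= caliper:
            matches[int(i)] = int(c_idx[j])
    return matches

def ipw_weights(ps, treated):
    """ATE weights: 1/e(X) for treated units, 1/(1 - e(X)) for controls."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))

ps = [0.8, 0.6, 0.3, 0.55, 0.75, 0.2]   # toy scores
treated = [1, 1, 1, 0, 0, 0]
print(nearest_neighbor_match(ps, treated))
print(nearest_neighbor_match(ps, treated, caliper=0.07))
print(ipw_weights(ps, treated))
```

Matching with replacement lets one control serve several treated units; the caliper drops treated units whose nearest control is too far away (here the unit with score 0.3), trading sample size for better comparability.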

Propensity Score
e(X) = P(T = 1 | X)
Interpretation: A propensity score of 0.7 means the model estimates a 70% chance that this unit would receive treatment, based on its covariates. Units with similar scores have similar distributions of observed covariates regardless of their actual treatment assignment, so they can be treated as comparable. Provided there are no unmeasured confounders, this is the foundation for valid causal comparisons.
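A minimal sketch of estimating e(X) with logistic regression, assuming NumPy only: the model is fit by Newton-Raphson rather than through a library such as scikit-learn, and the simulated data and the names `fit_propensity_logit` and `sigmoid` are hypothetical. It also assumes the two groups are not perfectly separable by the covariates; otherwise the coefficients diverge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_propensity_logit(X, t, n_iter=25):
    """Estimate e(X) = P(T = 1 | X) by logistic regression fit with Newton-Raphson.

    Returns (coefficients including intercept, fitted propensity scores).
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    Xd = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xd @ beta)                        # current fitted probabilities
        grad = Xd.T @ (t - p)                         # gradient of the log-likelihood
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)     # Newton step
    return beta, sigmoid(Xd @ beta)

# Simulated data: treatment is more likely when x1 is high and x2 is low.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
true_beta = np.array([0.5, 1.0, -0.5])
t = rng.binomial(1, sigmoid(true_beta[0] + X @ true_beta[1:]))

beta_hat, e_hat = fit_propensity_logit(X, t)
print("coefficients:", beta_hat.round(2))
print("mean score vs. treated share:", e_hat.mean().round(3), t.mean().round(3))
```

Because the model includes an intercept, the fitted scores average to the observed treated fraction at convergence, which makes a quick sanity check on any logistic propensity model.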

Common support (overlap) means every unit has a non-trivial probability of being in either the treated or the control group, i.e. 0 < e(X) < 1 for all observed covariate values. When the propensity-score distributions of the two groups barely overlap, matching is unreliable because many treated units have no comparable controls.

[Figure: box plots of propensity scores for the Treated and Control groups; x-axis: Propensity Score, 0.0 to 1.0]
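The box plots can be backed by numbers. This sketch, assuming NumPy and hypothetical helper names (`box_stats`, `common_support_mask`), prints the quartiles plus min and max of each group's scores and applies one common trimming heuristic, the min-max rule: keep only units whose score lies inside the range observed in both groups. Trimming to a fixed interval such as [0.1, 0.9] is another option; neither rule is prescribed by the text above.

```python
import numpy as np

def box_stats(ps):
    """Min, Q1, median, Q3, max of the scores (plotting libraries may draw whiskers differently)."""
    return np.percentile(np.asarray(ps, dtype=float), [0, 25, 50, 75, 100])

def common_support_mask(ps, treated):
    """True for units whose score lies inside the range shared by both groups (min-max rule)."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    lo = max(ps[treated].min(), ps[~treated].min())  # larger of the two minima
    hi = min(ps[treated].max(), ps[~treated].max())  # smaller of the two maxima
    return (ps >= lo) & (ps <= hi)

ps = np.array([0.9, 0.7, 0.5, 0.3, 0.1, 0.4, 0.6])   # toy scores
treated = np.array([1, 1, 1, 0, 0, 0, 0], dtype=bool)
for label, group in (("Treated", treated), ("Control", ~treated)):
    print(label, box_stats(ps[group]))
keep = common_support_mask(ps, treated)
print(f"{(~keep).sum()} of {len(ps)} units fall outside common support")
```

In this toy data the treated scores span 0.5 to 0.9 and the controls 0.1 to 0.6, so only 2 of the 7 units survive trimming, the kind of poor overlap the box plots make visible. If the two ranges do not intersect at all, the mask is entirely False.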

A Love Plot shows the standardized mean difference (SMD) of each covariate before and after matching. The goal is to bring every dot within the ±0.1 threshold, a widely used rule of thumb for adequate balance.
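For a continuous covariate, the SMD is usually computed as the difference in group means divided by a pooled standard deviation, (mean_T - mean_C) / sqrt((s_T² + s_C²) / 2). A minimal NumPy sketch of that convention follows; the function name `smd` and the toy values are illustrative.

```python
import numpy as np

def smd(x, treated):
    """Standardized mean difference of one covariate: difference in means over the pooled SD."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    xt, xc = x[treated], x[~treated]
    pooled_sd = np.sqrt((xt.var(ddof=1) + xc.var(ddof=1)) / 2.0)  # sample variances
    return (xt.mean() - xc.mean()) / pooled_sd

x = [3, 5, 7, 1, 3, 5]          # toy covariate values
treated = [1, 1, 1, 0, 0, 0]
print(smd(x, treated))          # 1.0, far outside the ±0.1 band
```

Dividing by a standard deviation makes covariates on different scales (age in years, income in dollars) comparable on one axis, which is what lets a Love plot show them side by side.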

[Figure: Love plot of covariate SMDs; legend: Unadjusted, Adjusted]
Goal: After matching, every covariate's SMD should fall within the ±0.1 threshold (the dashed vertical lines). Covariates outside that band indicate residual imbalance that could bias causal estimates. The adjusted (teal) dots should cluster tightly around zero.
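To reproduce the numbers behind a Love plot, compute each covariate's SMD before and after adjustment and flag those outside ±0.1. The sketch below assumes the adjustment is expressed as per-unit weights (inverse-probability weights, or match counts after matching with replacement) and keeps the unadjusted pooled SD as the denominator, one common convention so that the two SMDs differ only through the means. `weighted_smd`, `balance_table`, and the weights are hypothetical.

```python
import numpy as np

def weighted_smd(x, treated, w):
    """SMD with weighted group means; denominator is the unweighted pooled SD."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.asarray(w, dtype=float)
    mt = np.average(x[treated], weights=w[treated])
    mc = np.average(x[~treated], weights=w[~treated])
    pooled_sd = np.sqrt((x[treated].var(ddof=1) + x[~treated].var(ddof=1)) / 2.0)
    return (mt - mc) / pooled_sd

def balance_table(covariates, treated, w, threshold=0.1):
    """Rows of (name, unadjusted SMD, adjusted SMD, within threshold?), one per Love-plot row."""
    ones = np.ones(len(treated))  # unit weights reproduce the unadjusted SMD
    rows = []
    for name, x in covariates.items():
        before = weighted_smd(x, treated, ones)
        after = weighted_smd(x, treated, w)
        rows.append((name, before, after, abs(after) <= threshold))
    return rows

treated = [1, 1, 1, 0, 0, 0]
covariates = {"x1": [3, 5, 7, 2, 4, 6], "x2": [1, 2, 3, 1, 2, 3]}  # toy data
w = [1, 1, 1, 1, 1, 4]  # hypothetical adjustment weights
for name, before, after, ok in balance_table(covariates, treated, w):
    print(f"{name}: unadjusted {before:+.2f}, adjusted {after:+.2f}, within ±0.1: {ok}")
```

Here the weights pull x1 from 0.50 to 0.00 but push x2 from 0.00 to -0.50, exactly the kind of residual imbalance a Love plot is meant to expose.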