A standard econometric approach to non-random selection bias: explicitly model the decision to join as a separate process from spending behavior.
Heckman Two-Step Correction
The Heckman correction (or “heckit”) models why customers choose to join, then uses that model to correct the spending analysis. Under the model's assumptions, this separates the effect of program membership from the effect of self-selection.
1 Selection Equation
A probit model that includes the entire population (both members and non-members) to determine who is likely to choose to join the rewards program.
- Predicts probability of joining based on observable characteristics
- Variables include demographics, initial order type, location
- Captures the “selection mechanism” that creates bias
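- Should include at least one variable that affects joining but not spending directly (an exclusion restriction); without it, the correction is identified only by the probit's functional form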
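The selection equation described above can be sketched in code. This is a minimal illustration on synthetic data, not the analysis behind this document: the variable names (`basket`, `promo`), the coefficient values in `TRUE_GAMMA`, and the sample size are all assumptions made up for the example. The probit is fitted with a small hand-written Fisher-scoring loop so the sketch needs only the Python standard library; in practice a statistics package would normally do this step. Here `promo` plays the role of the variable that affects joining but not spending directly.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_probit(rows, joined, max_iter=50, tol=1e-8):
    """Maximum-likelihood probit via Fisher scoring; returns gamma."""
    k = len(rows[0])
    gamma = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for z, y in zip(rows, joined):
            index = sum(g * v for g, v in zip(gamma, z))
            pdf = STD_NORMAL.pdf(index)
            # Clip the CDF away from 0 and 1 to avoid division by zero.
            cdf = min(max(STD_NORMAL.cdf(index), 1e-12), 1.0 - 1e-12)
            resid = pdf / cdf if y else -pdf / (1.0 - cdf)
            weight = pdf * pdf / (cdf * (1.0 - cdf))
            for i in range(k):
                score[i] += resid * z[i]
                for j in range(k):
                    info[i][j] += weight * z[i] * z[j]
        step = solve(info, score)
        gamma = [g + s for g, s in zip(gamma, step)]
        if max(abs(s) for s in step) < tol:
            break
    return gamma


# Synthetic population of members and non-members (all values assumed).
rng = random.Random(0)
TRUE_GAMMA = [-0.3, 0.8, 0.6]  # intercept, basket, promo
rows, joined = [], []
for _ in range(4000):
    basket = rng.gauss(0.0, 1.0)                 # standardized first-order size
    promo = 1.0 if rng.random() < 0.5 else 0.0   # hypothetical sign-up prompt
    z = [1.0, basket, promo]
    latent = sum(g * v for g, v in zip(TRUE_GAMMA, z)) + rng.gauss(0.0, 1.0)
    rows.append(z)
    joined.append(1 if latent > 0 else 0)

gamma_hat = fit_probit(rows, joined)
print("estimated gamma:", [round(g, 3) for g in gamma_hat])
```

The fitted index for each customer is the dot product of `gamma_hat` with that customer's row; that index is what the correction term is later computed from.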
2 Outcome Equation
An Ordinary Least Squares (OLS) model that measures actual spending outcomes, corrected for selection bias.
- Models spending as a function of membership and other factors
- Includes the Inverse Mills Ratio from Step 1
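- Separates the effect of program membership from the effect of self-selection
- Provides consistent estimates of program lift when the model's assumptions hold (jointly normal errors and a valid exclusion variable)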
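The outcome step can be sketched as follows, again on synthetic data with made-up names and numbers (`TRUE_LIFT`, `basket`, `promo`, the error scales). Two assumptions deserve to be stated loudly. First, to keep the example short, the true selection index stands in for the fitted Step 1 index. Second, because the regression includes both members and non-members, the sketch uses φ/Φ as the correction term for members and the mirror-image term -φ/(1-Φ) for non-members; that second term is an assumption of this sketch (the usual control-function form when membership itself is a regressor), not something this document states.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, yi in zip(rows, y):
        for i in range(k):
            xty[i] += x[i] * yi
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)


def correction_term(index, member):
    """phi/Phi for members, -phi/(1-Phi) for non-members (sketch assumption)."""
    pdf = STD_NORMAL.pdf(index)
    cdf = STD_NORMAL.cdf(index)
    return pdf / cdf if member else -pdf / (1.0 - cdf)


rng = random.Random(42)
TRUE_LIFT = 5.0  # assumed true effect of membership on spending
naive_rows, corrected_rows, spend = [], [], []
for _ in range(20000):
    basket = rng.gauss(0.0, 1.0)
    promo = 1.0 if rng.random() < 0.5 else 0.0   # affects joining only
    index = -0.3 + 0.8 * basket + 1.0 * promo    # stand-in for the Step 1 fit
    u = rng.gauss(0.0, 1.0)                      # unobserved taste for joining
    member = 1.0 if index + u > 0 else 0.0
    noise = 4.0 * u + 3.0 * rng.gauss(0.0, 1.0)  # spending error tied to u
    spend.append(20.0 + TRUE_LIFT * member + 2.0 * basket + noise)
    naive_rows.append([1.0, member, basket])
    corrected_rows.append(
        [1.0, member, basket, correction_term(index, member)]
    )

naive_lift = ols(naive_rows, spend)[1]          # coefficient on member
corrected_lift = ols(corrected_rows, spend)[1]  # same, with the correction
print(f"naive lift:     {naive_lift:.2f}")
print(f"corrected lift: {corrected_lift:.2f}  (true value {TRUE_LIFT})")
```

In this setup the naive regression overstates the lift because customers who like to spend are also more likely to join, while the corrected regression should land near `TRUE_LIFT`. Plain OLS standard errors from the second step ignore that the correction term is itself estimated, so they would need adjusting (for example by bootstrapping both steps) before testing significance.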
3 Inverse Mills Ratio
The Inverse Mills Ratio (λ) is the key innovation — a correction factor that captures the bias from non-random selection.
- Calculated from the selection equation's fitted index Zγ (the probit linear predictor)
- Added as a regressor in the outcome equation
- Absorbs the correlation between joining tendency and spending
- If the coefficient on λ in the outcome equation is statistically significant, that is evidence of selection bias
λ = φ(Zγ) / Φ(Zγ)
Where φ = standard normal PDF, Φ = standard normal CDF, Z = selection variables, γ = probit coefficients from Step 1
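As a small worked example of the formula, the ratio can be computed directly with Python's standard library. The function name `inverse_mills_ratio` and the sample index values are illustrative choices, not part of the original analysis.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def inverse_mills_ratio(index: float) -> float:
    """Return lambda = pdf(index) / cdf(index) for a probit index Z*gamma."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Evaluate lambda at a few illustrative values of the fitted index.
for index in (-1.0, 0.0, 1.0):
    print(f"index={index:+.1f}  lambda={inverse_mills_ratio(index):.4f}")
```

λ is largest for customers the probit considers unlikely to join and shrinks toward zero as the index rises. For very negative indices the CDF underflows to zero, so a production version would need a guarded tail approximation.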
[Figure: Heckman Two-Step Process. Step 1: Probit Model; Extract: Mills Ratio (λ); Step 2: OLS + λ; Result: True Lift]