Share of Wallet — Huff Model

From algebraic theory to BigQuery execution — estimate the probability a customer travels to each store, and compute share of wallet by geography.

At its Core: Just Algebra

The Huff Model is not a “black box.” It calculates the probability that a customer at an origin (i) (like a home or zip code) will travel to a specific destination (j) (a store).

P_ij = A_j / D_ij^β ÷ Σ (A_k / D_ik^β)

The Variables

A_j (Attractiveness) Store size (sq ft) or sales volume. Bigger is better.

D_ij (Distance) The friction. Further is worse.

β (Beta / Decay) How lazy are the customers?

β = 1.5 They are willing to drive.

β = 2.5 They only go to the closest store.

Why SQL?
If you have the data (origins and destinations), you don’t need expensive proprietary tools. You can run the math directly in BigQuery, Snowflake, or Redshift.

The Workflow

Calibrate (BQML)

Don’t guess the Beta. Use logistic regression to find it. BQML effectively “self-corrects” for the difference between straight-line distance and drive time.

The Matrix (Cross Join)

Connect every zip code to every store using ST_DISTANCE. Optimize using ST_DWITHIN (e.g., limit to 50km).

Calculate Probability

Use window functions to sum the total utility per zip code.

Note on Distance: Native SQL uses straight-line distance. Because we use BQML to calibrate, the model learns the friction of straight lines automatically. No OSRM required.

Step 1: Auto-Calibration with BQML

Find the exact Beta (β) and Alpha (α) from your sales history.

CREATE OR REPLACE MODEL `project.dataset.huff_model`
OPTIONS(model_type='LOGISTIC_REG', input_label_cols=['label']) AS
SELECT
    IF(visited_store_id = store_id, 1, 0) as label,
    LN(distance_km) as log_distance, -- Input 1
    LN(square_footage) as log_size   -- Input 2
FROM `project.dataset.customer_store_matrix`
WHERE distance_km > 0;

Step 2: The Prediction Query

Apply the weights found above to predict future scenarios.

WITH od_matrix AS (
    SELECT
        o.zip_code, s.store_id, o.spend_potential,
        s.square_footage AS size,
        ST_DISTANCE(o.geo, s.geo)/1000 AS dist_km
    FROM zip_codes o
    CROSS JOIN stores s
    WHERE ST_DWITHIN(o.geo, s.geo, 50000) -- 50km Radius Optimization
),
utility_calc AS (
    SELECT *,
        -- Formula: Size^Alpha / Distance^Beta
        (POW(size, 1.15) / POW(GREATEST(dist_km, 0.1), 2.1)) AS utility
    FROM od_matrix
)
SELECT
    zip_code, store_id,
    utility / SUM(utility) OVER(PARTITION BY zip_code) AS market_share_pct
FROM utility_calc;

Interactive Huff Simulator

All values below are computed live using the Huff formula from Tab 1. Adjust stores, distances, and parameters to see market shares and revenue predictions update in real time.

Distance Note: This demo uses straight-line distances entered manually. In production with BigQuery, use ST_DISTANCE() for geodesic straight-line distance, or integrate with the Google Maps Distance Matrix API / OSRM for drive-time distances. When you calibrate β using BQML (see Tab 2), the model automatically compensates for the difference between straight-line and actual driving distance.

Model Parameters

β (Distance Decay) 2.0

α (Size Exponent) 1.0

Stores

Store	Sq Ft
Our Store (104)
Competitor A
Competitor B

Distance Matrix (km) & Spend Potential

Enter straight-line distance from each zip code to each store. All fields are editable.

Zip	Spend ($k)	→ Store 104	→ Comp A	→ Comp B
90210
90211
90212
90213
90214

Computed Results

Store 104 Revenue—

Comp A Revenue—

Comp B Revenue—

Store 104 Market Share

Zip Code Probability Breakdown

Zip	Spend ($k)	P(Store 104)	P(Comp A)	P(Comp B)	Rev → 104

Formula applied:
P_ij = (Size_j^α / Dist_ij^β) ÷ Σ_k(Size_k^α / Dist_ik^β) | Revenue_ij = P_ij × Spend_i