Market Basket Analysis

When customers buy product X, what else do they buy? Market basket analysis turns transaction data into actionable cross-sell and assortment insights using simple ratios.

Four Simple Ratios

Market basket analysis answers one question: given that a customer bought X, what else did they buy? The entire method rests on four ratios computed from transaction-level data. No machine learning, no black boxes—just counting and dividing.

Support

Support = (Baskets containing X & Y) ÷ (Total baskets)

Support tells you how common the pair is. A high-support pair shows up in many transactions. But support alone can be misleading—two popular items will naturally co-occur often, even with no real affinity.

Support(X → Y) = P(X ∩ Y) = count(X and Y) / count(all baskets)
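As a minimal sketch of the counting behind this formula, assume each basket is held as a Python set of product names. The function name `support` and the toy data are illustrative and not part of the tool.

```python
def support(baskets, x, y):
    """Share of all baskets that contain both x and y."""
    if not baskets:
        return 0.0
    both = sum(1 for basket in baskets if x in basket and y in basket)
    return both / len(baskets)


# Hypothetical toy data: one set of products per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]

# Bread and butter appear together in 2 of the 4 baskets.
pair_support = support(baskets, "bread", "butter")
```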

Attached Rate (Confidence)

Attached Rate = (Baskets containing X & Y) ÷ (Baskets containing X)

Given someone bought X, what fraction also bought Y? This is the conditional probability P(Y|X). A 40% attached rate means 4 out of every 10 buyers of X also pick up Y. In retail, this is the metric that drives cross-sell placement decisions.

Confidence(X → Y) = P(Y | X) = count(X and Y) / count(X)
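The sketch below computes the attached rate under the same assumption that a basket is a set of product names. The function name and data are hypothetical. It also shows that the ratio is directional: swapping X and Y changes the denominator, so the result generally changes too.

```python
def attached_rate(baskets, x, y):
    """Of the baskets containing x, the share that also contain y."""
    with_x = [basket for basket in baskets if x in basket]
    if not with_x:
        return 0.0
    return sum(1 for basket in with_x if y in basket) / len(with_x)


# Hypothetical toy data: one set of products per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]

# 3 baskets contain bread, 2 of them also contain butter.
bread_to_butter = attached_rate(baskets, "bread", "butter")
# 2 baskets contain butter, both also contain bread.
butter_to_bread = attached_rate(baskets, "butter", "bread")
```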

Affinity (Lift)

Affinity = Attached Rate ÷ P(Y)

Affinity (lift) controls for Y’s overall popularity. If Y appears in 50% of all baskets, a 50% attached rate is meaningless—you’d expect it by chance. Affinity > 1 means the pair co-occurs more than chance. Affinity < 1 means they co-occur less than chance (substitutes or repellers).

Lift(X → Y) = P(Y | X) / P(Y) = P(X ∩ Y) / [P(X) × P(Y)]
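The two forms of the lift formula are equivalent, which follows from expanding the conditional probability. The short derivation below also shows why a lift of exactly 1 corresponds to statistical independence, and why lift, unlike the attached rate, is the same in both directions.

```latex
\begin{aligned}
\mathrm{Lift}(X \to Y)
  &= \frac{P(Y \mid X)}{P(Y)}
   = \frac{P(X \cap Y) / P(X)}{P(Y)}
   = \frac{P(X \cap Y)}{P(X)\,P(Y)}
   = \mathrm{Lift}(Y \to X) \\[4pt]
\mathrm{Lift}(X \to Y) = 1
  &\iff P(X \cap Y) = P(X)\,P(Y)
\end{aligned}
```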

Cherry Pick Rate

Cherry Pick Rate = (Baskets containing Y but not X) ÷ (Baskets containing Y)

Of all buyers of Y, how many bought Y without X? A high cherry pick rate means Y has strong standalone demand and does not depend on X for traffic. A low cherry pick rate suggests Y is primarily an add-on to X.

Cherry Pick(Y given X) = count(Y without X) / count(Y) = 1 − P(X | Y)
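A minimal sketch of the cherry pick rate, again assuming baskets are sets of product names. The function name and the camera/lens data are made up for illustration. The last lines check the identity with 1 − P(X | Y) on this toy data.

```python
def cherry_pick_rate(baskets, x, y):
    """Of the baskets containing y, the share that do NOT contain x."""
    with_y = [basket for basket in baskets if y in basket]
    if not with_y:
        return 0.0
    return sum(1 for basket in with_y if x not in basket) / len(with_y)


# Hypothetical toy data: one set of products per transaction.
baskets = [
    {"camera", "lens"},
    {"camera", "lens"},
    {"lens"},
    {"camera"},
    {"lens", "bag"},
]

# 4 baskets contain a lens, 2 of them have no camera.
rate = cherry_pick_rate(baskets, "camera", "lens")

# Same value via 1 - P(X | Y): 2 of the 4 lens baskets contain a camera.
p_x_given_y = sum(1 for b in baskets if "lens" in b and "camera" in b) / 4
identity_holds = abs(rate - (1 - p_x_given_y)) < 1e-9
```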

Worked Example

Imagine 100 baskets. Product X appears in 40 baskets. Product Y appears in 30 baskets. Both X and Y appear together in 20 baskets.

Metric	Calculation	Result
Support	20 / 100	0.20 (20%)
Attached Rate	20 / 40	0.50 (50%)
P(Y)	30 / 100	0.30 (30%)
Affinity (Lift)	0.50 / 0.30	1.67×
Cherry Pick	(30 − 20) / 30	0.33 (33%)

Interpretation

An affinity of 1.67 means a basket containing X is 67% more likely to also contain Y than a basket picked at random. The attached rate of 50% means half of X-buyers also grab Y. The cherry pick rate of 33% means a third of Y-buyers get Y without X, so Y has moderate standalone demand.
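The worked example can be reproduced from the four counts alone. The helper below is an illustrative sketch (the name `basket_metrics` and its dictionary keys are not from the tool); it takes the counts rather than raw baskets.

```python
def basket_metrics(n_total, n_x, n_y, n_both):
    """Compute the four ratios from basket counts."""
    attached = n_both / n_x
    return {
        "support": n_both / n_total,
        "attached_rate": attached,
        "lift": attached / (n_y / n_total),
        "cherry_pick": (n_y - n_both) / n_y,
    }


# Counts from the worked example: 100 baskets, X in 40, Y in 30, both in 20.
m = basket_metrics(n_total=100, n_x=40, n_y=30, n_both=20)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```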

1. Upload Transaction Data

Upload a CSV in which each row is one product in one transaction. Two columns are required: Transaction ID and Product. Date and Customer ID are optional. The tool auto-detects column names.

Expected Format

Minimum columns: one for the transaction/basket ID and one for the product name. Column names are flexible—the tool looks for keywords like “transaction”, “order”, “basket”, “product”, “item”, “sku”. If columns can’t be detected, you’ll see a warning.
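The tool's actual detection logic is not shown here, so the following is only a sketch of how keyword-based column detection could work. The keyword lists come from the text above; the substring matching rule, the function names, and the sample CSV are assumptions.

```python
import csv
import io

# Keywords from the description above; the matching rule is an assumption.
BASKET_KEYWORDS = ("transaction", "order", "basket")
PRODUCT_KEYWORDS = ("product", "item", "sku")


def detect_column(headers, keywords):
    """Return the first header containing any keyword (case-insensitive)."""
    for header in headers:
        lowered = header.lower()
        if any(keyword in lowered for keyword in keywords):
            return header
    return None


def load_baskets(csv_text):
    """Group product rows into one set of products per basket ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    basket_col = detect_column(headers, BASKET_KEYWORDS)
    product_col = detect_column(headers, PRODUCT_KEYWORDS)
    if basket_col is None or product_col is None:
        raise ValueError("Could not detect basket ID and product columns")
    baskets = {}
    for row in reader:
        baskets.setdefault(row[basket_col], set()).add(row[product_col])
    return baskets


# Hypothetical input: one row per product per transaction.
sample = "Order ID,Item Name\n1,bread\n1,butter\n2,bread\n"
baskets = load_baskets(sample)
```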

2. Configure Analysis

Start Date

End Date

Antecedent (X)

Min Support

3. Results

Top Pairs by Affinity

Same-Customer Second Analysis

The main tool looks at co-occurrence within the same basket. Sometimes you want to ask a different question: among customers who bought X in one time period, what did they buy in a different time period?

This is useful for sequential cross-sell: if someone buys a camera in January, do they come back for a lens in February? For this analysis your CSV needs a Customer ID column and a Date column.

1. Define the Anchor Period

Pick the time window where customers bought the anchor product (X). Only customers with X in this window are included.

Anchor Start

Anchor End

Anchor Product

2. Define the Follow-Up Period

What did those same customers buy during this second window?

Follow-Up Start

Follow-Up End
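The two-window logic can be sketched as follows. This is an illustration, not the tool's implementation: the function name, the row layout of (customer ID, date, product), and the choice to treat both window ends as inclusive are all assumptions.

```python
from datetime import date


def follow_up_rates(rows, anchor_product, anchor_window, follow_window):
    """Share of anchor customers who bought each product in the follow-up window.

    rows: list of (customer_id, purchase_date, product) tuples.
    Windows are (start, end) date pairs, inclusive on both ends (assumed).
    """
    a_start, a_end = anchor_window
    f_start, f_end = follow_window

    # Customers who bought the anchor product inside the anchor window.
    anchor_customers = {
        customer
        for customer, day, product in rows
        if product == anchor_product and a_start <= day <= a_end
    }
    if not anchor_customers:
        return {}

    # For each product, the anchor customers who bought it in the follow-up window.
    buyers = {}
    for customer, day, product in rows:
        if customer in anchor_customers and f_start <= day <= f_end:
            buyers.setdefault(product, set()).add(customer)

    return {p: len(c) / len(anchor_customers) for p, c in buyers.items()}


# Hypothetical purchase history.
rows = [
    ("c1", date(2024, 1, 5), "camera"),
    ("c1", date(2024, 2, 10), "lens"),
    ("c2", date(2024, 1, 20), "camera"),
    ("c2", date(2024, 2, 3), "bag"),
    ("c3", date(2024, 2, 8), "lens"),  # no camera in January, so excluded
]
rates = follow_up_rates(
    rows,
    anchor_product="camera",
    anchor_window=(date(2024, 1, 1), date(2024, 1, 31)),
    follow_window=(date(2024, 2, 1), date(2024, 2, 29)),
)
```

Counting distinct customers rather than rows keeps a customer who buys the same follow-up product twice from being counted twice.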

3. Follow-Up Results

How to Read the Results

The results table gives you four numbers for every product Y paired with your anchor X. Here is what each one means and how to act on it.

Affinity (Lift)

> 1.0

Positive association. Buyers of X are more likely to buy Y than average. Good cross-sell candidate.

= 1.0

No association. X and Y co-occur at chance level. No actionable relationship.

< 1.0

Negative association. X and Y appear together less often than chance would predict, which often indicates substitutes. Bundling them would likely fail.

Attached Rate (Confidence)

The raw probability that a buyer of X also buys Y. Use this to size the opportunity. An affinity of 3.0 sounds exciting, but if the attached rate is only 2%, the absolute volume of cross-sell is tiny. Look at both together:

The Sweet Spot

High affinity + high attached rate = strong, high-volume cross-sell pair. These are your top priorities for placement, bundling, and recommendations.

High affinity + low attached rate = strong association but niche. May be a hidden gem for targeted segments but not a mass play.

Low affinity + high attached rate = both products are popular independently. Co-occurrence is expected, not driven by genuine affinity.
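The three combinations above can be turned into a simple triage rule. The cut-offs below (`min_lift=1.2`, `min_attached=0.20`) are hypothetical and should be tuned to your data; the text does not define them. The fourth label, for pairs that are low on both measures, is added only to make the function complete.

```python
def classify_pair(lift, attached_rate, min_lift=1.2, min_attached=0.20):
    """Label a pair by affinity and attached rate.

    The thresholds are illustrative assumptions, not values from the tool.
    """
    high_lift = lift >= min_lift
    high_attached = attached_rate >= min_attached
    if high_lift and high_attached:
        return "priority cross-sell"
    if high_lift:
        return "niche association"
    if high_attached:
        return "popular independently"
    return "no clear opportunity"


# The worked example pair (lift 1.67, attached rate 50%).
worked_example = classify_pair(1.67, 0.50)
# An exciting lift with a tiny attached rate.
hidden_gem = classify_pair(3.0, 0.02)
# Chance-level lift with a high attached rate.
both_popular = classify_pair(1.0, 0.50)
```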

Support

The share of all baskets containing both X and Y. Support acts as a frequency filter. Very low-support pairs (e.g. 0.1%) may have extreme affinity scores simply because they appeared together by accident in a tiny number of baskets. Set a minimum support threshold (we default to 1%) to filter out noise.
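To see why the threshold matters, the sketch below compares a pair that co-occurs once in 1,000 baskets with a pair that co-occurs 60 times. The counts and names are made up for illustration; only the 1% default comes from the text.

```python
def lift(n_total, n_x, n_y, n_both):
    """Lift from counts: P(X and Y) / (P(X) * P(Y))."""
    return (n_both * n_total) / (n_x * n_y)


def filter_pairs(pairs, n_total, min_support=0.01):
    """Keep pairs whose support meets the threshold; return (name, support, lift)."""
    kept = []
    for name, n_x, n_y, n_both in pairs:
        support = n_both / n_total
        if support >= min_support:
            kept.append((name, support, lift(n_total, n_x, n_y, n_both)))
    return kept


# Hypothetical counts out of 1,000 baskets: (name, n_x, n_y, n_both).
pairs = [
    ("rare pair", 2, 2, 1),         # support 0.1%
    ("common pair", 200, 150, 60),  # support 6%
]

# One chance co-occurrence is enough to produce an extreme lift.
rare_lift = lift(1000, 2, 2, 1)
kept = filter_pairs(pairs, n_total=1000)
```

Here the rare pair has a far higher lift than the common one, yet it rests on a single basket, so the filter drops it.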

Cherry Pick Rate

Of all buyers of Y, what fraction bought Y without X? This tells you about Y’s standalone strength:

High

Y has strong independent demand. Customers seek it out regardless of X. Safe standalone item.

Low

Y is primarily bought alongside X. It depends on X for traffic. Risky to promote alone.

The Bubble Chart

When available, the bubble chart plots each product Y on two axes, with bubble size as a third dimension:

X-axis: Attached Rate (how often Y appears alongside X).
Y-axis: Affinity (Lift) — how much more than chance.
Bubble size: Support (how frequent the pair is overall).

Top-right, large bubbles are your strongest cross-sell opportunities. Bottom-left, small bubbles are noise.

Caveats

Correlation ≠ Causation. Market basket analysis finds co-occurrence patterns, not causal relationships. Two products may co-occur because of a third factor (seasonal demand, promotions, store layout). Always validate with domain knowledge before making merchandising changes.

Simpson’s Paradox. Aggregate patterns can reverse when you segment by store, region, or customer type. If results seem surprising, drill down.
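A small numeric illustration of the reversal, using invented counts for two stores: within each store the pair co-occurs less often than chance, but pooling the stores makes the lift look strongly positive, because one store sells far more of both products.

```python
def lift(n_total, n_x, n_y, n_both):
    """Lift from counts: P(X and Y) / (P(X) * P(Y))."""
    return (n_both * n_total) / (n_x * n_y)


# Hypothetical counts. Store A sells both products heavily, store B rarely.
store_a = {"n_total": 100, "n_x": 80, "n_y": 80, "n_both": 62}
store_b = {"n_total": 200, "n_x": 20, "n_y": 20, "n_both": 1}

# Pool the two stores by adding their counts.
combined = {key: store_a[key] + store_b[key] for key in store_a}

lift_a = lift(**store_a)     # below 1 within store A
lift_b = lift(**store_b)     # below 1 within store B
lift_all = lift(**combined)  # above 1 when the stores are pooled
```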