What is the compounding test for AI marketing use cases?

The compounding test asks three questions: Does the AI touch the real system (not just a chat window a human pastes into)? Does it remember context across time? Does it run repeatedly the same way and improve? Three yeses means the use case compounds into durable ROI; any no means it's theatre — impressive once, flat forever.

Where Gen-AI actually moves marketing ROI — and where it's theatre

The short version: Most AI marketing spend produces a great demo and no measurable return. The compounding test — does it touch the real system, remember across time, and run repeatedly while improving — separates AI leverage from AI theatre before you spend a dollar.

Most of what marketing teams buy under the banner of "AI" produces a great demo and no measurable return. That isn't cynicism — it's the base rate. MIT's Project NANDA studied more than 300 enterprise AI initiatives in 2025 and found 95% of generative-AI pilots delivered no measurable return. McKinsey's 2025 State of AI puts a finer point on it: only about 6% of organisations capture significant value from AI, and the factor that most separates them isn't the model they chose — it's whether they redesigned the workflow around it.

So the useful question isn't "is AI worth it for marketing." It plainly is — McKinsey finds marketing and sales are where AI revenue uplift shows up most often. The useful question is which AI work compounds into a number and which just looks busy. Having run an AI-native marketing operation for a while now, I've found the line between the two is sharp — and you can test for it before you spend a dollar.

The compounding test

Here's the test I run on any proposed AI use case. Three questions:

Does it touch the real system? Does the AI read and write in the actual stack — the analytics, the ad platform, the CRM — or does a human copy data out, paste it into a chat window, and copy the answer back? If a person is the integration layer, you're buying advice, not work.
Does it remember? Does the use case carry context across time — last month's decision, the brand rules, what already failed — or does it cold-start every session? If you re-brief it each time, the human is the memory, and the human is the bottleneck.
Does it run repeatedly, the same way, and improve? Is the workflow written down and versioned so it runs identically and gets better as lessons fold in — or is it a one-off prompt, re-invented slightly worse by whoever's on shift?

Three yeses and the use case compounds: it runs at near-zero marginal cost, improves with use, and the return accrues month over month. Any no and it's theatre: impressive once, flat forever, with a human quietly absorbing the work the demo implied was automated.

This isn't a maturity score for your whole operation — I wrote about that ladder separately. It's a cheaper, faster tool: a filter you run on each line item before it gets budget.
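
As an illustration only, the filter can be sketched in a few lines of Python. The `UseCase` fields, the function names, and the sample line items are hypothetical, chosen for the example rather than taken from any real budget or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """One proposed AI line item, scored against the three questions."""
    name: str
    touches_real_system: bool       # reads and writes the actual stack, no copy-paste
    remembers: bool                 # carries context across sessions
    repeatable_and_improving: bool  # written down, versioned, folds in lessons


def failed_questions(use_case: UseCase) -> list[str]:
    """Return the questions this use case answers 'no' to."""
    checks = {
        "touches the real system": use_case.touches_real_system,
        "remembers across time": use_case.remembers,
        "runs repeatedly and improves": use_case.repeatable_and_improving,
    }
    return [question for question, passed in checks.items() if not passed]


def verdict(use_case: UseCase) -> str:
    """Three yeses is leverage; any no is theatre."""
    return "leverage" if not failed_questions(use_case) else "theatre"


# Illustrative line items (assumed answers, not real budget data).
line_items = [
    UseCase("Monthly reporting wired to analytics", True, True, True),
    UseCase("One-shot copy in a chat window", False, False, False),
    UseCase("FAQ chatbot with no system access", False, True, True),
]

for item in line_items:
    print(f"{item.name}: {verdict(item)} {failed_questions(item)}")
```

Returning the failed questions alongside the verdict is a deliberate choice: it tells you what would have to change for a theatre item to start compounding.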

Where AI is leverage

Run the test across a marketing stack and the same zones pass every time. This is where the money is:

1. Reporting and insight. Wire a model to GA4, Google Search Console, Google Ads and Meta and it pulls the live numbers, finds the regression, and drafts the read — the same way each month, sharpening its eye as it goes. The highest-ROI, least-glamorous AI in marketing.

2. Always-on research and monitoring. Competitor moves, SERP shifts, brand mentions, category trends — even which AI engines actually cite you, and which sources they trust — watched on a schedule, written to a staging surface for review. It compounds because it never sleeps and never forgets what it saw last week.

3. Codified operating workflows. The audit, the campaign brief, the QA pass, the monthly report — each written once as a procedure the AI runs identically and improves over time. This is the lever McKinsey's data points straight at: workflow redesign, not model choice, is what most separates the value-capturers.

4. Data-grounded personalisation and segmentation. AI acting on real CRM and CDP data — HubSpot, Salesforce — to segment and tailor. Grounded in your data it moves conversion; ungrounded it's just more copy.

5. Institutional memory. The team's decisions, assets, and rationale made retrievable, so people and agents start warm instead of re-litigating what was settled in March.
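
To make "finds the regression" from the reporting item concrete, here is a minimal Python sketch of a month-over-month check. It assumes the monthly totals have already been pulled from the live platforms; the metric names, the figures, and the 10% threshold are all hypothetical.

```python
def find_regressions(previous, current, threshold=0.10):
    """Return metrics that fell by more than `threshold` (a fraction) month over month."""
    regressions = {}
    for metric, before in previous.items():
        after = current.get(metric)
        if after is None or before == 0:
            continue  # nothing to compare against
        change = (after - before) / before
        if change < -threshold:
            regressions[metric] = round(change, 3)
    return regressions


# Hypothetical monthly totals; in practice these would come from the connected platforms.
last_month = {"sessions": 42000, "conversions": 900, "ad_clicks": 15000}
this_month = {"sessions": 41000, "conversions": 720, "ad_clicks": 15600}

print(find_regressions(last_month, this_month))
```

With these sample numbers only conversions is flagged: it fell by 20%, while the small dip in sessions stays under the threshold. The point is that the same check runs the same way every month, which a one-off chat prompt does not.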

Notice what these share: none of them is the thing AI demos well. They're integration, memory, and process: the boring infrastructure. That's exactly why they compound, and exactly why they're under-bought.
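
One way to picture "process" as infrastructure is a workflow stored as versioned data rather than as a prompt someone retypes. The sketch below is a hedged illustration in Python; the `Workflow` class, its fields, and the example lesson are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workflow:
    """A written-down procedure: same steps every run, new version when a lesson lands."""
    name: str
    version: int
    steps: tuple[str, ...]
    lessons: tuple[str, ...] = ()

    def with_lesson(self, lesson: str, new_step: str) -> "Workflow":
        """Fold a lesson in as an extra step and bump the version."""
        return replace(
            self,
            version=self.version + 1,
            steps=self.steps + (new_step,),
            lessons=self.lessons + (lesson,),
        )


report_v1 = Workflow(
    name="monthly-report",
    version=1,
    steps=("pull metrics", "compare to last month", "draft the read"),
)

# A hypothetical lesson learned after one run, folded back into the procedure.
report_v2 = report_v1.with_lesson(
    lesson="A tracking change once looked like a traffic drop",
    new_step="check for tracking changes before flagging a drop",
)

print(report_v2.version, len(report_v2.steps))
```

Because the dataclass is frozen, version 1 is never edited in place. Each lesson produces a new version, so the procedure keeps both its history and its memory of why each step exists.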

Where it's theatre

The zones that fail the test are, not coincidentally, the ones that demo beautifully:

1. One-shot copy generation in a chat window. Ungrounded in your brand, your data, or last week's decisions; re-briefed every single time. It feels like leverage. It's a faster typist with amnesia — and the human stays the bottleneck.

2. Bolt-on chatbots with no system access. A widget that answers FAQs and touches nothing real. Fine as a feature; not a transformation, and not a number.

3. "AI strategy" with no implementation layer. The deck, the pilot, the proof-of-concept that never wires into anything. This is most of the 95% — it demos, it impresses a steering committee, and it dies without moving EBIT.

The tell is always the same: theatre keeps a human in the loop doing the real work while the AI takes credit for the demo.

Why smart teams keep buying theatre

If the line is this clear, why is most spend on the wrong side of it? Because theatre demos well and leverage doesn't. A copy generator produces a wow in the first ninety seconds of a sales call. A codified reporting workflow wired to your GA4 produces a wow in month three, on a spreadsheet, to nobody's applause. Procurement optimises for the demo; the compounding work loses every bake-off it was never invited to.

The vendors know this. So the market over-produces the theatre — easier to sell — and under-produces the infrastructure, which is harder to sell and harder to build. The gap between what demos well and what compounds is, right now, one of the cleanest arbitrage opportunities in marketing operations.

What this means if you lead a marketing team

Three moves.

Run the test before the budget. For every AI line item, ask the three questions. If it doesn't touch the real system, remember, and run repeatedly, you're buying a demo — redirect the money to something that compounds.

Fund the boring middle. The durable ROI is in reporting, monitoring, codified workflows, and grounded data — the parts that don't photograph well. McKinsey's high performers aren't the ones with the best model; they're the ones who rebuilt the workflow. That's a budgeting decision, not a technology one.

Make vendors prove compounding, not output. Ask anyone selling you AI to show where it touches your stack, what it remembers, and how it runs next month without a human re-driving it. If they can only show you a great one-shot output, you've found the theatre. I publish how I run mine, from the integrations and workflows to the numbers once they're real, because in this category, receipts are the only spec that matters. (Real ROI figures from my own operation will follow as the case studies land: measured, not modelled.)

Questions marketing leaders ask

Is generative AI actually worth it for marketing, or is it hype? It's worth it where the use case compounds and hype where it doesn't. McKinsey finds marketing and sales are where AI revenue uplift shows up most — but only about 6% of organisations capture significant value, because most buy ungrounded, un-integrated, one-shot tools. The technology is real; most of the spend is aimed at the theatre half of it.

How do I tell a real AI use case from a demo before I commit budget? Run the compounding test: does it touch the real system, does it remember context across time, and does it run repeatedly the same way and improve? Three yeses is leverage; any no is theatre. You can answer all three in a vendor call, before a dollar moves.

What's the single highest-ROI place to start with AI in marketing? Reporting and analysis wired to your live data — GA4, Search Console, the ad platforms. It's unglamorous, it runs every month, it compounds, and it teaches the organisation what "AI with hands" actually feels like, which de-risks everything after it.

Why do our AI pilots stall after the demo? Because the demo was theatre — a one-shot output with a human doing the real work underneath. Pilots that compound are built on integration, memory, and a codified workflow; pilots that stall skipped those and bought the impressive output instead. The fix is architectural, not a better model.

Does the choice of AI model matter for ROI? Less than you'd think. Both MIT's and McKinsey's 2025 data land on the same point: the failures are almost never the model — they're the absence of workflow redesign, integration, and data readiness around it. Pick a capable model, then spend your real effort on the architecture.

I run an AI-native marketing operation and write about what actually moves the number — in the open, with receipts. If you're sorting leverage from theatre in your own stack, that's the conversation I'm here for.

Sources

MIT Project NANDA, 2025 study of enterprise generative-AI initiatives (reported by Fortune, 18 August 2025): 95% of generative-AI pilots delivered no measurable return.
McKinsey, "The State of AI in 2025" (QuantumBlack, 2025): marketing/sales revenue uplift, ~6% high performers, and workflow redesign as the primary EBIT lever.