MUD\WTR · Internal
Creative Analytics · Northbeam

If an ad has great engagement,
does it earn a better CAC?

The question every creative review should answer: when an ad scores well on a cheap upper-funnel signal — thumbstop, hold rate, CTR, cost-per-engagement — does that same ad go on to acquire customers cheaply, earn more cash, and get scaled? We grouped every ad with $1k+ spend over 14 days and measured the answer per metric.

Window: Jun 13 – Jun 26, 2026 (14d)
Comparable ads: Meta video, ≥$1k
Attribution: Northbeam · Clicks-only · per-ad CAC
Spend
The Verdict

Yes — better engagement travels with lower CAC. CTR is the strongest signal; cost-per-engagement is the weakest.

Grouping ads into best / middle / worst thirds by each metric (spend-weighted), the best-engagement third consistently posts a 24–43% lower CAC than the worst third — and earns far more cash. CTR is the cleanest predictor (best third $195 CAC vs worst $341, a statistically significant relationship). Thumbstop and hold rate point the right way but are noisier per-ad. Cost-per-engagement is the least reliable — use it last.

Why grouping, not dots: a single ad's 14-day CAC is noisy (few conversions, outliers), so an ad-by-ad scatter shows almost nothing. Bucketing into thirds and weighting by spend cancels that noise and reveals the real signal — which is why the headline lives in the bars, not the dots.

At a Glance
Predictor scorecard

Ranked by how reliably the metric separates a good CAC from a bad one. "CAC gap" = how far the best-third CAC sits below the worst-third CAC, as a percentage (more negative = stronger predictor).

Engagement metric | Better when | Best⅓ CAC | Worst⅓ CAC | CAC gap | Rank corr (ρ) | p-value | Verdict
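
The report does not spell out the CAC gap formula, so the sketch below assumes it is the relative difference between the best-third and worst-third CAC. The function name `cac_gap` is hypothetical. It is applied to the CTR figures quoted in the verdict ($195 vs $341).

```python
def cac_gap(best_third_cac: float, worst_third_cac: float) -> float:
    """Relative CAC gap (assumed definition): negative means the best third is cheaper."""
    if worst_third_cac <= 0:
        raise ValueError("worst-third CAC must be positive")
    return (best_third_cac - worst_third_cac) / worst_third_cac


# CTR figures quoted in the verdict: best third $195, worst third $341.
ctr_gap = cac_gap(195.0, 341.0)
print(f"{ctr_gap:.1%}")  # -42.8%
```

Under this assumed definition the CTR gap rounds to the 43% upper end of the 24–43% range quoted in the verdict.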
Metric by Metric
Does each one actually predict CAC?

For each metric: ads split into best / middle / worst thirds, then the spend-weighted CAC, cash, and ROAS of each group. If the metric predicts performance, the green (best) bar should sit well below the red (worst) bar.
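
The grouping step described above can be sketched as follows. This is a minimal illustration, not the pipeline behind the report. It assumes thirds are formed by ad count after ranking on the metric, and that "spend-weighted CAC" means the spend-weighted mean of each ad's own CAC. The `Ad` class, the function names, and every number in `sample_ads` are hypothetical.

```python
from dataclasses import dataclass

MIN_SPEND = 1_000  # the report's comparable-ad floor ($1k+ spend)


@dataclass
class Ad:
    name: str
    spend: float   # dollars spent in the window
    cac: float     # per-ad CAC as reported by the attribution tool
    metric: float  # engagement metric being tested, e.g. CTR in percent


def split_into_thirds(ads, higher_is_better=True):
    """Rank ads by the metric and cut the ranking into best / middle / worst."""
    ranked = sorted(ads, key=lambda a: a.metric, reverse=higher_is_better)
    n = len(ranked)
    cut1, cut2 = n // 3, (2 * n) // 3
    return {
        "best": ranked[:cut1],
        "middle": ranked[cut1:cut2],
        "worst": ranked[cut2:],
    }


def spend_weighted_cac(group):
    """Spend-weighted mean of per-ad CAC, so large ads dominate small noisy ones."""
    total_spend = sum(a.spend for a in group)
    return sum(a.spend * a.cac for a in group) / total_spend


# Hypothetical ads, for illustration only (not data from the report).
sample_ads = [
    Ad("A", 4000, 180, 2.4), Ad("B", 2000, 220, 2.1),
    Ad("C", 3000, 250, 1.6), Ad("D", 1000, 290, 1.5),
    Ad("E", 1500, 330, 0.9), Ad("F", 1500, 350, 0.8),
    Ad("G", 400, 90, 3.0),   # below the spend floor, so it is excluded
]

comparable = [a for a in sample_ads if a.spend >= MIN_SPEND]
thirds = split_into_thirds(comparable, higher_is_better=True)
bucket_cac = {label: spend_weighted_cac(group) for label, group in thirds.items()}

for label, value in bucket_cac.items():
    print(f"{label:>6}: ${value:,.0f}")
```

For a metric where lower is better, such as cost-per-engagement, pass `higher_is_better=False` so the cheapest ads land in the "best" third.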

The Evidence
Top 20 ads by spend

Every engagement metric and outcome for the 20 highest-spend ads. Note how the high-CTR, high-thumbstop video ads cluster at the lower CACs — and how Google Search ads (no video metrics) are a separate species entirely.

Ad | Type | Spend | Thumb | Hold | CTR | Cost/Eng | NC | Cash | CAC
So What
How to use this in creative reviews

01 Treat CTR as your #1 leading indicator.

It separates good CAC from bad CAC more cleanly than any other engagement metric here — and you can read it days before CAC stabilizes. A high-CTR new ad is your best early bet to scale.

02 Use thumbstop & hold rate as supporting signals.

They point the right way (better hook/hold → lower CAC) but are noisy on any single ad. Good for diagnosing why a creative underperforms, not as a solo scale/kill trigger.

03 Don't lean on cost-per-engagement.

It showed the weakest and least consistent link to CAC. Cheap likes ≠ cheap customers. Deprioritize it in scorecards.

04 The algo already concentrates spend on high-engagement ads.

The best-engagement third also absorbed the most budget — Meta is voting the same way. That's confirmation, but also a reason to read these as associations, not proof of pure cause.

Read This Before Quoting It
Caveats & method
  • Per-ad CAC, not blended. Each ad's CAC is Northbeam's clicks-only first-time CAC for that specific ad. Blended CAC doesn't decompose to the ad level, so treat the absolute per-ad values as directional rather than exact.
  • Grouped beats individual here. 14-day single-ad CAC is noisy, so we report spend-weighted CAC per third. The ad-level rank correlation (Spearman ρ) backs the bar story; the linear (Pearson) fit is weaker because of outliers.
  • Reverse causation with spend. High-engagement ads get scaled, and scaled ads have more stable (often better) CAC. Engagement, spend, and CAC all partly reflect "this is just a good ad."
  • ROAS is short-window & understated. These are first-order attributed ROAS figures on a subscription model; LTV isn't captured, so all values read <1. Compare buckets relative to each other, not against a 1.0 bar. CAC is the truer KPI.
  • Video-only comparable set. Thumbstop/hold require video, so search & Amazon ads are excluded from the predictor tests (they're in the top-20 table for context).
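
To illustrate the Spearman vs Pearson caveat above, here is a minimal pure-Python sketch showing how one outlier can weaken a linear fit while leaving the rank correlation intact. The `ctr` and `cac` values are hypothetical, chosen only to show the effect; they are not data from this report.

```python
from math import sqrt


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical ads: CAC falls as CTR rises, but one low-CTR ad has an extreme CAC.
ctr = [0.8, 1.0, 1.3, 1.7, 2.0, 2.4]
cac = [900, 340, 310, 280, 240, 200]

rho = spearman(ctr, cac)
r = pearson(ctr, cac)
print(f"Spearman rho = {rho:.2f}, Pearson r = {r:.2f}")
```

Because ranks ignore how extreme the outlier is, rho stays at a perfect negative relationship here, while the Pearson coefficient is pulled toward zero.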