Understanding the Heatseeker Metrics

Spot Winning Variants and Segments with Confidence

Written by Razii Abraham
Updated over 2 months ago

When you run experiments, clicks and impressions only tell part of the story. They don’t show true buyer intent or how reliable your results are.

Heatseeker goes deeper. We measure real buyer behavior—and tell you what’s working, how strongly it’s working, and how much you can trust it.

Our four core metrics are:

  • Score — how much buyers are engaging

  • Precision — how consistent the data is

  • Uplift — how much better or worse something performs

  • Confidence — how sure you can be that the difference is real

Let's walk through each one—with examples.

1. Buyer Engagement Score (“Score”)

Score captures how much meaningful engagement your test variants are driving.
It rolls up actions like leads, form opens, clicks, and social interactions into a single number, normalized per 1,000 impressions.

Not all actions are equal—actions that show stronger buying intent are weighted more heavily.

Example:
A Score of 21 would mean 21 clicks per 1,000 impressions if clicks were the only action. But usually it’s a mix: form opens, leads, clicks—all weighted by intent.

💡 A higher Score means stronger buyer interest.
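As a mental model, the rollup above can be sketched as a weighted sum of actions per 1,000 impressions. The action names and weights below are illustrative assumptions; the article doesn't publish Heatseeker's actual weighting.

```python
# Illustrative sketch of an intent-weighted Score, normalized per 1,000
# impressions. These weights are assumptions, not Heatseeker's actual values.
INTENT_WEIGHTS = {"lead": 5.0, "form_open": 3.0, "click": 1.0, "social": 0.5}

def engagement_score(actions: dict, impressions: int) -> float:
    """Return intent-weighted actions per 1,000 impressions."""
    weighted = sum(INTENT_WEIGHTS[name] * count for name, count in actions.items())
    return round(weighted / impressions * 1000, 1)

# Clicks-only case from the article: 21 clicks per 1,000 impressions -> Score 21.
print(engagement_score({"click": 21}, impressions=1000))   # 21.0

# More typical mix: leads and form opens count for more than raw clicks.
print(engagement_score({"lead": 2, "form_open": 3, "click": 15, "social": 8},
                       impressions=2000))                  # 19.0
```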

2. Uplift

Uplift shows how much better or worse a variant is compared to the baseline.

  • Positive Uplift = outperforming

  • Negative Uplift = underperforming

By default, the baseline is the average Score across all variants. You can set a different baseline for your experiment using the dropdown at the top right of your results screen.

Example:
A +63% Uplift means the variant is performing 63% better than the baseline.

💡 Uplift tells you the size of the difference between a variant's Score and the baseline.
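The arithmetic is straightforward. A minimal sketch using the default baseline (the average Score across variants, per the article); the scores themselves are made up:

```python
def uplift(score: float, baseline: float) -> float:
    """Percent difference between a variant's Score and the baseline."""
    return (score - baseline) / baseline * 100

# Default baseline: the average Score across all variants.
scores = [21.0, 9.0, 12.0]          # hypothetical variant Scores
baseline = sum(scores) / len(scores)  # 14.0
print(f"{uplift(scores[0], baseline):+.0f}%")   # +50%
```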

3. Precision

Precision tells you how consistent your results are.

  • High Precision = The data is stable across impressions.

  • Low Precision = The data is noisy and could change with more volume.

| Precision Level | What it means |
| --- | --- |
| Very High | Use this result confidently for decision-making. |
| High | Use this result confidently for decision-making. |
| Moderate | Usable but could be improved. Consider additional validation for higher certainty. |
| Low | Too much variability. Results are not actionable; collect more data or refine the test. |

Example:
A Score of 21 with Moderate Precision means engagement looks good, but more impressions would make the result more reliable.

💡 Precision tells you how stable your Score and Uplift numbers are.
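The article doesn't specify how Precision is computed. One plausible proxy is the standard error of the engagement rate relative to the rate itself, bucketed into levels; the thresholds below are illustrative assumptions, not Heatseeker's.

```python
import math

def precision_level(weighted_actions: float, impressions: int) -> str:
    """Bucket the stability of a per-impression rate by its relative
    standard error. A stand-in for Heatseeker's (unpublished) Precision
    calculation; thresholds are illustrative."""
    p = weighted_actions / impressions
    if p <= 0:
        return "Low"
    rel_se = math.sqrt(p * (1 - p) / impressions) / p
    if rel_se < 0.05:
        return "Very High"
    if rel_se < 0.10:
        return "High"
    if rel_se < 0.20:
        return "Moderate"
    return "Low"

# Same engagement rate, ten times the impressions: the level improves.
print(precision_level(42, 2000))     # Moderate
print(precision_level(420, 20000))   # Very High
```

This matches the example above: a good-looking Score can still carry Moderate Precision simply because the impression count is modest.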

4. Confidence

Confidence tells you whether the difference between a variant and the baseline is real—or just random chance.

  • High Confidence = Very likely real

  • Low Confidence = Could just be noise

| Confidence Level | Meaning |
| --- | --- |
| Very High (95%+) | Almost certain the uplift is real. |
| High (85–95%) | Likely real. |
| Medium (75–85%) | Directional. Use with caution. |
| Low (50–75%) | Could easily be random. |

Example:
A Confidence of 94% means there’s a very strong chance the uplift you’re seeing is real.

💡 Confidence tells you whether the variant really beats the baseline. It doesn't say by how much; that's what Uplift (combined with Precision) is for.
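The article doesn't name the statistical test behind Confidence. A pooled two-proportion z-test is a common stand-in and gives the right intuition, so here is a sketch under that assumption (all counts hypothetical):

```python
import math

def confidence_pct(var_hits: int, var_n: int, base_hits: int, base_n: int) -> float:
    """One-sided probability (%) that the variant's engagement rate truly
    exceeds the baseline's, via a pooled two-proportion z-test. An
    illustrative stand-in, not Heatseeker's published method."""
    p1, p2 = var_hits / var_n, base_hits / base_n
    pooled = (var_hits + base_hits) / (var_n + base_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / var_n + 1 / base_n))
    z = (p1 - p2) / se
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))   # normal CDF

# Variant at 25 engagements per 1,000 vs baseline at 15 per 1,000,
# with 2,000 impressions each.
print(round(confidence_pct(50, 2000, 30, 2000), 1))
```

The same rate gap on a smaller sample would yield a lower percentage, which is why a large Uplift alone is not proof of a winner.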

Precision vs Confidence

| | Precision | Confidence |
| --- | --- | --- |
| What it tells you | How consistent the results are across impressions. | How likely it is that one variant truly outperforms another. |
| If low... | Results might swing if you collect more data. | The difference might not be real. |
| In plain English | "Is my data stable?" | "Is my winner really a winner?" |

Simple rule of thumb:

  • Precision = Trust the quality of the data.

  • Confidence = Trust the reality of the winner.

Real Example: Reading a Result

Let’s put it all together with this real result:

Message Variant: "Same-day or next-day delivery guaranteed."

| Metric | Result | Interpretation |
| --- | --- | --- |
| Score | 21 | Solid engagement. |
| Precision | Moderate | Some noise; extend test if critical. |
| Uplift | +63% | Variant strongly outperforms baseline. |
| Confidence | High (94%) | Very likely a real winner. |

Bottom Line:
This message is outperforming and resonating with buyers. It’s a strong result—but because Precision is only Moderate, you might choose to gather a bit more data before scaling fully.

Metrics Applied to Segments, Too

Heatseeker applies Score, Precision, Uplift, and Confidence not just to variants—but to audience segments too:

  • By region

  • By age group

  • By job title

  • By custom combinations

No matter where you drill down, you get the same consistent, comparable insights.
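Because the metrics are rate-based, the same Score formula applies to any slice of impressions. A stdlib-only sketch of a per-segment rollup (the rows, segment names, and counts are hypothetical):

```python
from collections import defaultdict

# Hypothetical event rows: (segment, weighted_actions, impressions)
rows = [
    ("EMEA", 30, 1500), ("EMEA", 12, 500),
    ("APAC", 18, 2000), ("APAC", 10, 1000),
]

totals = defaultdict(lambda: [0.0, 0])
for segment, actions, impressions in rows:
    totals[segment][0] += actions
    totals[segment][1] += impressions

# Same Score definition, applied per segment:
# weighted actions per 1,000 impressions.
segment_scores = {seg: round(a / n * 1000, 1) for seg, (a, n) in totals.items()}
print(segment_scores)   # {'EMEA': 21.0, 'APAC': 9.3}
```

Because every segment is scored on the same per-1,000 scale, segments of very different sizes stay directly comparable.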

Questions?

Send us a note in Intercom—we’re happy to help interpret your results or plan your next experiment.
