When you run experiments, clicks and impressions only tell part of the story. They don’t show true buyer intent or how reliable your results are.
Heatseeker goes deeper. We measure real buyer behavior—and tell you what’s working, how strongly it’s working, and how much you can trust it.
Our four core metrics are:
Score — how much buyers are engaging
Precision — how consistent the data is
Uplift — how much better or worse something performs
Confidence — how sure you can be that the difference is real
Let's walk through each one—with examples.
1. Buyer Engagement Score (“Score”)
Score captures how much meaningful engagement your test variants are driving.
It rolls up actions like leads, form opens, clicks, and social interactions into a single number, normalized per 1,000 impressions.
Not all actions are equal—actions that show stronger buying intent are weighted more heavily.
Example:
A Score of 21 would mean 21 clicks per 1,000 impressions if clicks were the only action. But usually it’s a mix: form opens, leads, clicks—all weighted by intent.
💡 A higher Score means stronger buyer interest.
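If you like to see the mechanics, here is a rough sketch of how an intent-weighted score per 1,000 impressions can be computed. The action types, weights, and counts below are made up for illustration; Heatseeker's actual weighting is its own.

```python
# Illustration only: not Heatseeker's actual formula or weights.

# Assumed intent weights (hypothetical): stronger buying intent gets a bigger weight.
WEIGHTS = {
    "lead": 5.0,
    "form_open": 3.0,
    "click": 1.0,
    "social": 0.5,
}

def engagement_score(actions: dict, impressions: int) -> float:
    """Weighted actions per 1,000 impressions."""
    weighted_total = sum(WEIGHTS.get(action, 0.0) * count for action, count in actions.items())
    return weighted_total / impressions * 1000

# Example: a mix of actions collected over 4,000 impressions
print(engagement_score({"lead": 4, "form_open": 10, "click": 30, "social": 8}, 4000))
# -> 21.0 (a Score of 21)
```

The point of the weighting: a handful of high-intent actions (like leads) can move the Score as much as a much larger pile of low-intent clicks.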
2. Uplift
Uplift shows how much better or worse a variant is compared to the baseline.
Positive Uplift = outperforming
Negative Uplift = underperforming
By default, the baseline is the average Score across all variants in the experiment. You can set a different baseline from the dropdown at the top right of your results screen.
Example:
A +63% Uplift means the variant is performing 63% better than the baseline.
💡 Uplift tells you the size of the difference between a variant's Score and the baseline.
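If you want to see the arithmetic behind a number like +63%, here is a minimal sketch. The baseline value of 12.9 is invented for the example:

```python
def uplift(variant_score: float, baseline_score: float) -> float:
    """Percent difference between a variant's Score and the baseline Score."""
    return (variant_score - baseline_score) / baseline_score * 100

# Example: a variant scoring 21 against a baseline (average of all variants) of 12.9
print(round(uplift(21, 12.9)))  # -> 63, i.e. +63% Uplift
```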
3. Precision
Precision tells you how consistent your results are.
High Precision = The data is stable across impressions.
Low Precision = The data is noisy and could change with more volume.
| Precision Level | What it means |
| --- | --- |
| Very High | Use this result confidently for decision-making. |
| High | Use this result confidently for decision-making. |
| Moderate | Usable but could be improved. Consider additional validation for higher certainty. |
| Low | Too much variability. Results are not actionable; collect more data or refine the test. |
Example:
A Score of 21 with Moderate Precision means engagement looks good, but more impressions would make the result more reliable.
💡 Precision tells you how stable your Score and Uplift numbers are.
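Heatseeker's Precision calculation is internal, but the intuition (less data means noisier numbers) is easy to see in a quick simulation. Everything below, including the 2.1% engagement rate, is made up for illustration:

```python
# Illustration only: not Heatseeker's actual Precision calculation.
import random

random.seed(7)

def simulated_score(impressions: int, true_rate: float = 0.021) -> float:
    """Simulate a per-1,000 engagement rate from one fixed underlying rate."""
    engaged = sum(random.random() < true_rate for _ in range(impressions))
    return engaged / impressions * 1000

# Same underlying behaviour, different amounts of data:
print([round(simulated_score(500), 1) for _ in range(3)])     # few impressions: Scores swing widely
print([round(simulated_score(20_000), 1) for _ in range(3)])  # many impressions: Scores cluster near 21
```

The underlying behaviour never changes; only the sample size does. That gap between the two rows is what low versus high Precision is flagging.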
4. Confidence
Confidence tells you whether the difference between a variant and the baseline is real—or just random chance.
High Confidence = Very likely real
Low Confidence = Could just be noise
| Confidence Level | Meaning |
| --- | --- |
| Very High (95%+) | Almost certain the uplift is real. |
| High (85–95%) | Likely real. |
| Medium (75–85%) | Directional. Use with caution. |
| Low (50–75%) | Could easily be random. |
Example:
A Confidence of 94% means there’s a very strong chance the uplift you’re seeing is real.
💡 Confidence tells you if the variant really beats the baseline. It doesn't say by how much; that's what Uplift (combined with Precision) is for.
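Heatseeker reports Confidence for you, but if you're curious how a figure like 94% can arise, here is a generic sketch using a normal approximation for the gap between two engagement rates. The counts are invented, and this is not Heatseeker's actual statistical model:

```python
# Illustration only: a generic two-rate comparison, not Heatseeker's statistical model.
from math import erf, sqrt

def confidence_variant_beats_baseline(eng_a: int, n_a: int, eng_b: int, n_b: int) -> float:
    """Rough probability that A's underlying engagement rate exceeds B's,
    using a normal approximation to the difference of two proportions."""
    p_a, p_b = eng_a / n_a, eng_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (p_a - p_b) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

# Example: variant with 26 engagements / 1,000 impressions vs baseline with 16 / 1,000
print(round(confidence_variant_beats_baseline(26, 1000, 16, 1000) * 100))  # -> 94
```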
Precision vs Confidence
| | Precision | Confidence |
| --- | --- | --- |
| What it tells you | How consistent the results are across impressions. | How likely it is that one variant truly outperforms another. |
| If low... | Results might swing if you collect more data. | The difference might not be real. |
| In plain English | "Is my data stable?" | "Is my winner really a winner?" |
Simple rule of thumb:
Precision = Trust the quality of the data.
Confidence = Trust the reality of the winner.
Real Example: Reading a Result
Let’s put it all together with this real result:
Message Variant: "Same-day or next-day delivery guaranteed."
| Metric | Result | Interpretation |
| --- | --- | --- |
| Score | 21 | Solid engagement. |
| Precision | Moderate | Some noise; extend test if critical. |
| Uplift | +63% | Variant strongly outperforms baseline. |
| Confidence | High (94%) | Very likely a real winner. |
Bottom Line:
This message is outperforming and resonating with buyers. It’s a strong result—but because Precision is only Moderate, you might choose to gather a bit more data before scaling fully.
Metrics Applied to Segments, Too
Heatseeker applies Score, Precision, Uplift, and Confidence not just to variants, but to audience segments too:
By region
By age group
By job title
By custom combinations
No matter where you drill down, you get the same consistent, comparable insights.
Questions?
Send us a note in Intercom—we’re happy to help interpret your results or plan your next experiment.