Vague hypotheses produce vague learnings. "If we improve the landing page, conversions will increase" tells you nothing. Improve how? Increase by how much? Why would that change behaviour?
The dog training analogy is useful here. When training dogs to detect drugs at airports, handlers once made the mistake of using rubber gloves to handle the training materials. The dogs learned to detect the smell of rubber gloves, not drugs. A small, unconscious detail threw off the entire training because the handlers weren't specific enough about what they were actually training for.
The same thing happens in A/B testing. You test a new headline and a new button colour and a new image all at once. The test wins. What did you learn? You have no idea which change mattered, or whether they all mattered, or whether they cancelled each other out and some fourth factor drove the result.
Be specific about what you're changing and why. One change per test where possible. If you must test multiple changes together, at least document what you're bundling and acknowledge that you won't know which element drove the result.
Every hypothesis should connect to a metric you actually care about. This sounds obvious, but it's easy to optimise for intermediate metrics that don't translate to revenue.
You might hypothesise that a new email subject line will increase open rates. That's fine as far as it goes. But if open rates go up and click rates stay flat, did you actually improve anything? The metric that matters is further down the funnel.
When writing hypotheses, think through the chain. If this test wins, what happens next? Does that lead to revenue? If the connection is indirect or uncertain, you might be optimising for the wrong thing.
This doesn't mean you can only test bottom-of-funnel metrics. But it means you should be explicit about the assumptions linking your test metric to revenue. "If we increase email open rates, more people will see our offer, which should increase demo requests" makes the chain visible. You can then check whether the chain actually holds.
Don't stop tests too early or run them too long. Calculate required duration and sample size before starting.
Sample size calculation: Use a sample size calculator (many are free online). Inputs: baseline conversion rate (current performance), minimum detectable effect (the smallest improvement you care about), statistical power (typically 80%), and significance level (typically 5%, i.e. 95% confidence). Output: required sample size per variant. Example: with a 4% baseline and a 10% relative lift to detect (4% → 4.4%), the standard two-proportion calculation gives roughly 39,000 visitors per variant (78,000 total). If your page gets 2,000 visitors/month, the test will take over three years. Not feasible. Either test a higher-traffic page or test a larger effect size.
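You can check this arithmetic yourself. Below is a minimal sketch of the standard two-proportion sample-size approximation using only the Python standard library; the function name and defaults are illustrative, not taken from any particular calculator.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_lift, power=0.80, alpha=0.05):
    """Approximate n per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Worked example: 4% baseline, detect a 10% relative lift (4% -> 4.4%)
n = sample_size_per_variant(0.04, 0.10)
```

With these defaults the answer lands near 39,000 per variant, which is why low-traffic pages can only realistically test large effects.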
Test duration calculation: Run for a minimum of 2 weeks to account for day-of-week variation (B2B traffic follows weekly patterns), and through at least one full business cycle (if you're B2B with monthly sales cycles, run the test through a full month). Cap tests at 8 weeks; beyond that, external factors change too much to attribute results cleanly. If you can't reach the required sample size within 8 weeks, either accept a lower confidence level or don't run the test.
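These duration rules reduce to simple arithmetic. A hypothetical helper, assuming the 2-week floor and 8-week cap described above:

```python
import math

def duration_weeks(n_per_variant, variants, weekly_visitors):
    """Weeks to reach the total sample; floor of 2 weeks, None if over the 8-week cap."""
    weeks = math.ceil(n_per_variant * variants / weekly_visitors)
    if weeks > 8:
        return None           # can't reach significance inside the 8-week cap
    return max(weeks, 2)      # always run at least two full weeks

weeks = duration_weeks(5000, 2, 2500)  # 10,000 total visitors at 2,500/week
```

A `None` result is the signal to rethink the test rather than to quietly lower the bar mid-flight.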
Early stopping rules: Generally, don't stop tests early. "We're up 15% after 3 days!" is often regression to the mean. But you can set pre-defined stopping rules: if variant is worse by >20% after reaching 50% of required sample size, stop for safety (you're harming conversion). If variant is better by >30% after reaching 75% of sample, you can stop early (result is clear). These rules must be set before starting, not decided during the test.
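Pre-registering the rules means they can be written down as plain code before the test starts. A sketch using the thresholds above; the function name and return strings are hypothetical:

```python
def stopping_decision(n_so_far, n_required, control_rate, variant_rate):
    """Apply pre-registered stopping rules; thresholds are fixed before the test starts."""
    progress = n_so_far / n_required
    relative_change = (variant_rate - control_rate) / control_rate
    if progress >= 0.50 and relative_change <= -0.20:
        return "stop for safety"      # variant is harming conversion
    if progress >= 0.75 and relative_change >= 0.30:
        return "stop early: winner"   # result is clear
    return "continue"
```

Because the thresholds live in code written before launch, "we're up 15% after 3 days" can't tempt anyone into an ad hoc early call.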
Simultaneous tests: Can you run multiple tests at once? Yes, but watch for interactions. Testing a homepage headline and a pricing page CTA simultaneously is fine (different pages, different visitors). Testing a headline and a CTA on the same page simultaneously requires a multivariate approach (test all combinations). Testing two headline variants against control on the same page (an A/B/C test) splits traffic three ways and requires 50% more total traffic to reach significance.
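The traffic cost of each design is easy to quantify. A sketch with a hypothetical per-variant requirement of 10,000 visitors:

```python
from itertools import product

per_variant = 10_000  # hypothetical required sample per variant

ab_total = per_variant * 2    # simple A/B: two variants
abc_total = per_variant * 3   # A/B/C: three-way split, 50% more total traffic

# Multivariate headline x CTA: every combination is its own variant
cells = list(product(["headline A", "headline B"], ["CTA A", "CTA B"]))
mvt_total = per_variant * len(cells)  # 4 cells
```

Each extra variant or combination adds a full per-variant sample to the bill, which is why multivariate tests are usually reserved for high-traffic pages.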