Write hypotheses and design proper experiments

Don't just "try something". Write a hypothesis predicting what will happen and why. Design the experiment with proper controls so you actually learn whether your hypothesis was right.

Introduction

Most experiments fail not because the idea was wrong, but because the experiment was poorly designed. Companies change something, conversion improves, and they assume the change worked. But maybe it was seasonality, or a successful PR campaign, or a competitor raising prices. Without proper controls, you don't know what caused the change.

This chapter shows you how to write testable hypotheses (predicting outcome and mechanism), design experiments with proper controls (isolate variables), set success criteria before running tests (define "winning"), and choose appropriate test structures (A/B, multivariate, holdout groups).

Write hypotheses with prediction and mechanism

A proper hypothesis has three parts: current belief (what we think is true now), predicted outcome (what we expect to happen), and mechanism (why we think it'll happen).

Bad hypothesis: "Let's test adding testimonials to the landing page." No prediction, no mechanism. This isn't a hypothesis, it's just an action.

Good hypothesis: "Compliance-driven segment doubts that engaging training satisfies auditors. Adding testimonials from compliance officers at similar companies will reduce this doubt and improve lead conversion from 4% to 5%. The mechanism is social proof reducing risk perception."

Now you've stated: what you believe (compliance segment doubts auditor acceptance), what you expect (5% improvement in lead conversion), and why (social proof reduces risk perception). When you run the test, you can evaluate not just whether conversion improved, but whether the mechanism was correct.

If conversion improves but exit surveys show people still doubt auditor acceptance (mechanism was wrong), you've learned something different than if conversion improves and exit surveys show increased confidence in auditor acceptance (mechanism was right). Both results inform future experiments.

Example hypotheses for cybersecurity training:

1. "Proactive segment needs ROI proof to get budget approval. Adding an ROI calculator to the demo page will improve demo booking rate from 8% to 10% by giving them a tool to build the business case internally."

2. "Breach-reactive segment is in crisis mode and needs immediate deployment. Emphasising '30-minute setup' in ad headlines will improve CTR from 1.5% to 2% by addressing their urgency concern."

3. "SQLs aren't becoming opportunities because implementation seems complex. Offering a free pilot (3 users, 30 days) will improve SQL → opportunity conversion from 33% to 40% by reducing perceived risk."

Each hypothesis predicts an outcome, specifies the mechanism, and sets a measurable target. This structure forces clarity about what you're testing and why.
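This three-part structure can even be enforced mechanically. As a sketch (the `Hypothesis` class and its field names are illustrative, not from any particular tool), a template that rejects entries missing a prediction or mechanism:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """Three-part hypothesis: current belief, predicted outcome, mechanism."""
    current_belief: str      # what we think is true now
    predicted_outcome: str   # what we expect to happen, with a number
    mechanism: str           # why we think it'll happen

    def __post_init__(self):
        # Reject "let's test X" entries that skip the prediction or mechanism.
        for field_name in ("current_belief", "predicted_outcome", "mechanism"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")

h = Hypothesis(
    current_belief="Compliance-driven segment doubts auditors accept engaging training",
    predicted_outcome="Compliance-officer testimonials lift lead conversion 4% -> 5%",
    mechanism="Social proof reduces risk perception",
)
```

Forcing every field to be filled in is the point: an empty mechanism means you haven't written a hypothesis yet, just an action.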

Design experiments with proper controls

Controls isolate variables so you know what caused the change. Without controls, you're just guessing.

A/B test structure: Split traffic 50/50 between control (current version) and variant (your change). Random assignment ensures no bias. Run simultaneously so time-based factors (seasonality, external events) affect both groups equally. Example: 50% of visitors see current landing page headline (control), 50% see new headline (variant). Measure conversion rate for each group.
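Random, stable assignment is easy to sketch: hash a persistent visitor ID so the same visitor always lands in the same group on every visit. The bucketing scheme below is illustrative, not a prescribed implementation:

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str) -> str:
    """Deterministically bucket a visitor into control or variant (50/50).

    Hashing visitor_id together with the experiment name keeps assignment
    stable across visits and independent across different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return "variant" if int(digest, 16) % 2 else "control"

# The same visitor keeps the same bucket on every visit:
assert assign_variant("user-123", "headline-test") == assign_variant("user-123", "headline-test")
```

Because assignment depends only on the ID and experiment name, no per-visitor state needs to be stored, and both groups run simultaneously by construction.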

Common mistake: testing the control first (week 1) and the variant after (week 2). Sequential testing means you can't tell whether the results come from your change or from the weeks being different. Always run control and variant simultaneously.

Multivariate test structure: Test multiple elements simultaneously but track every combination. If testing headline (A or B) and CTA (A or B), you need four variants: headline A + CTA A, headline A + CTA B, headline B + CTA A, headline B + CTA B. This reveals interactions (maybe headline B only works with CTA B). But requires 4× the traffic and complexity. Only use for high-traffic pages.
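Enumerating every combination is exactly a Cartesian product, which is a one-liner to sketch:

```python
from itertools import product

headlines = ["headline A", "headline B"]
ctas = ["CTA A", "CTA B"]

# Every combination must be a tracked variant: 2 x 2 = 4 cells.
variants = list(product(headlines, ctas))
for headline, cta in variants:
    print(f"{headline} + {cta}")
```

Adding a third element with two options doubles the cell count again (2 x 2 x 2 = 8), which is why multivariate tests are reserved for high-traffic pages.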

Holdout group structure: For experiments that affect everyone (like changing your pricing model or launching automation), you can't A/B test. Use holdout groups: 10% of customers don't get the change (holdout), 90% do (test). Compare outcomes. Example: you implement automated email nurture for 90% of leads. Hold back 10% as control (manual nurture only). Measure conversion rates. If automated group converts better, the automation worked.

Holdout groups have ethical considerations. Don't withhold valuable improvements from customers just to maintain a control group. But for uncertain experiments, holdouts are acceptable temporarily.
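A minimal holdout comparison reduces to two conversion rates and their relative lift. The numbers below are made up for illustration:

```python
def lift_vs_holdout(test_conversions: int, test_total: int,
                    holdout_conversions: int, holdout_total: int) -> float:
    """Relative lift of the test group (e.g. automated nurture)
    over the holdout group (e.g. manual nurture)."""
    test_rate = test_conversions / test_total
    holdout_rate = holdout_conversions / holdout_total
    return (test_rate - holdout_rate) / holdout_rate

# 90% of leads on automated nurture, 10% held back on manual nurture:
print(lift_vs_holdout(540, 9000, 50, 1000))  # ~0.2, i.e. a 20% relative lift
```

The holdout sample is small by design (here 1,000 leads), so its conversion rate is noisy; check that the observed lift clears statistical significance before crediting the automation.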

Set success criteria before running experiments

Decide what "winning" means before you see results. This prevents bias ("well, conversion didn't improve but engagement did, so it's a win"). Pre-commit to success criteria.

Primary metric: The one metric that determines success. For landing page test, it's lead conversion rate. For ad creative test, it's cost per lead. For sales process test, it's opportunity conversion rate. Choose one primary metric, not five. Otherwise you're cherry-picking whichever metric looks good.

Secondary metrics: Metrics you'll monitor but that don't determine success. For landing page test, primary metric is lead conversion but secondary metrics are bounce rate, time on page, demo booking rate. These provide context. If lead conversion improves but bounce rate increases, you've attracted wrong-fit leads. If lead conversion improves and bounce rate stays flat, you've genuinely improved the page.

Guardrail metrics: Metrics that must not get worse. For pricing test, primary metric is revenue but guardrail is customer satisfaction. If revenue increases but satisfaction drops below threshold, the test "fails" even though primary metric improved. Guardrails prevent short-term wins that cause long-term damage.

Minimum detectable effect: The smallest improvement worth implementing. If current lead conversion is 4%, is 4.1% worth the effort of implementing the change (2.5% lift)? Probably not. Is 4.4% worth it (10% lift)? Probably yes. Set your threshold (typically 5-10% minimum) before testing. If results fall below the threshold, the test is "neutral" not a win, and you don't bother implementing.

Document success criteria in advance. Write it down: "Primary metric: lead conversion rate. Success threshold: 5% improvement (4% → 4.2%). Secondary metrics: bounce rate, time on page (monitoring only). Guardrail: demo show rate must stay above 70%. Minimum detectable effect: 5% (0.2 percentage points)."
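Written-down criteria can double as a checklist your analysis script applies mechanically. A sketch using the chapter's example numbers (the dictionary keys are my own naming):

```python
# Pre-committed success criteria, written down before the test starts.
criteria = {
    "primary_metric": "lead_conversion_rate",
    "baseline": 0.04,
    "success_threshold": 0.042,   # 5% relative improvement over baseline
    "guardrail_metric": "demo_show_rate",
    "guardrail_minimum": 0.70,
}

def evaluate(results: dict) -> str:
    """Apply the pre-committed criteria; a breached guardrail overrides a win."""
    if results[criteria["guardrail_metric"]] < criteria["guardrail_minimum"]:
        return "fail (guardrail breached)"
    if results[criteria["primary_metric"]] >= criteria["success_threshold"]:
        return "win"
    return "neutral"

print(evaluate({"lead_conversion_rate": 0.044, "demo_show_rate": 0.72}))  # win
```

Because the thresholds are fixed in the config before results exist, there is no room to reinterpret "winning" after the fact.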

Choose appropriate test duration and sample size

Don't stop tests too early or run them too long. Calculate required duration and sample size before starting.

Sample size calculation: Use a sample size calculator (many are free online). Inputs: baseline conversion rate (current performance), minimum detectable effect (smallest improvement you care about), statistical power (typically 80%), significance level (typically 95%). Output: required sample size per variant. Example: baseline 4% conversion, want to detect a 10% lift (4% → 4.4%): you need roughly 39,000 visitors per variant (about 78,000 total). If your page gets 2,000 visitors/month, the test will take over three years. Not feasible. Either test a higher-traffic page or test a larger effect size.
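Those calculators implement the standard two-proportion sample-size formula, which can be sketched directly (1.96 and 0.84 are the usual z-values for 95% two-sided significance and 80% power):

```python
from math import ceil, sqrt

def sample_size_per_variant(p1: float, p2: float,
                            z_alpha: float = 1.96,  # 95% significance, two-sided
                            z_beta: float = 0.84) -> int:  # 80% power
    """Visitors needed per variant to detect a shift from rate p1 to rate p2."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# 4% baseline, 10% relative lift (4% -> 4.4%):
print(sample_size_per_variant(0.04, 0.044))
```

Note how the required sample scales with the inverse square of the effect size: halving the detectable lift roughly quadruples the traffic you need, which is why small effects are so expensive to test.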

Test duration calculation: Minimum 2 weeks, to account for day-of-week variation (B2B traffic follows weekly patterns). Run through at least one full business cycle (if you're B2B with monthly sales cycles, run the test through a full month). Maximum 8 weeks: beyond that, external factors change too much to attribute results cleanly. If you can't reach the required sample size within 8 weeks, either accept a lower confidence level or don't run the test.

Early stopping rules: Generally, don't stop tests early. "We're up 15% after 3 days!" is often regression to the mean. But you can set pre-defined stopping rules: if variant is worse by >20% after reaching 50% of required sample size, stop for safety (you're harming conversion). If variant is better by >30% after reaching 75% of sample, you can stop early (result is clear). These rules must be set before starting, not decided during the test.
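Pre-defined rules are easiest to honour when they're written as code before the test starts. A sketch of the two rules above (thresholds taken from the text, function shape my own):

```python
def stopping_decision(relative_lift: float, sample_fraction: float) -> str:
    """Apply pre-defined early stopping rules, set before the test starts.

    relative_lift: variant vs control, e.g. -0.25 means variant is 25% worse.
    sample_fraction: share of the required sample size collected so far.
    """
    if sample_fraction >= 0.5 and relative_lift <= -0.20:
        return "stop for safety"
    if sample_fraction >= 0.75 and relative_lift >= 0.30:
        return "stop early, clear result"
    return "keep running"

print(stopping_decision(-0.25, 0.6))  # stop for safety
print(stopping_decision(0.15, 0.1))   # keep running: too early, too small
```

Notice that a 15% lift at 10% of the sample returns "keep running": the rules deliberately ignore early spikes, which is exactly the discipline the text asks for.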

Simultaneous tests: Can you run multiple tests at once? Yes, but be careful of interactions. Testing homepage headline and pricing page CTA simultaneously is fine (different pages, different visitors). Testing headline and CTA on the same page simultaneously requires multivariate approach (test all combinations). Testing two headline variants on the same page (A/B/C test) splits traffic three ways, requires 50% more traffic to reach significance.

Conclusion

Write hypotheses with three parts: current belief (what's true now), predicted outcome (what you expect), mechanism (why it'll happen). This structure forces clarity and lets you learn even when tests fail.

Design experiments with proper controls. A/B tests: 50/50 split, simultaneous run, random assignment. Multivariate tests: track all combinations, requires more traffic. Holdout groups: 10% unchanged, 90% receive change, useful when A/B testing isn't possible.

Set success criteria before testing: primary metric (determines win/loss), secondary metrics (provide context), guardrail metrics (must not worsen), minimum detectable effect (smallest improvement worth implementing). Document criteria in advance to prevent bias.

Calculate required test duration and sample size before starting. Minimum 2 weeks, minimum 1 business cycle, maximum 8 weeks. Use sample size calculator to determine if test is feasible given your traffic. Set early stopping rules in advance, not during test.

Next chapter: run experiments with discipline and analyse results properly.

Related tools

VWO (from 393 per month)

VWO provides A/B testing, personalisation, and behaviour analytics to optimise website conversion rates through data-driven experimentation.

Hotjar (from 39 per month)

Hotjar captures user behaviour through heatmaps, session recordings, and feedback polls to understand how visitors use your website.

Microsoft Clarity (from 0 per month)

Microsoft Clarity provides free session recordings, heatmaps, and user behaviour analytics without traffic limits or time restrictions.

Notion (from 12 per month)

Flexible workspace for docs, wikis, and lightweight databases ideal when you need custom systems without heavy project management overhead.

Related wiki articles

A/B testing

Compare two versions of a page, email, or feature to determine which performs better using statistical methods that isolate the impact of specific changes.

Hypothesis testing

Structure experiments around clear predictions to focus efforts on learning rather than random changes and make results easier to interpret afterward.

Control group

Maintain an unchanged version in experiments to isolate the impact of your changes and prove causation rather than correlation with external factors.

Statistical significance

Determine whether experiment results reflect real differences or random chance to avoid making expensive decisions based on noise instead of signal.

Lead capture rate

The percentage of engaged website visitors who submit their contact information and become leads.

Further reading

Experimentation

Don't just "try something". Write a hypothesis predicting what will happen and why. Design the experiment with proper controls so you actually learn whether your hypothesis was right.