How to run experiments properly

Execute tests with proper controls. Avoid peeking early. Monitor external factors. Maintain experiment integrity start to finish.

Introduction

Designing a good experiment is one thing; running it properly is another. Common mistakes, such as stopping tests early, changing variables mid-test and ignoring external factors, invalidate results and waste effort. This chapter covers execution best practices: when to start and stop, how to maintain control groups, which warning signs to watch for and how to avoid the traps that ruin experiments.

Set up proper control and variant groups

Begin with a coding brief that leaves no room for guesswork. Share a link to the hypothesis card, the control page URL and an annotated mock-up that highlights every element to change. Include mobile and desktop views, colour values and font sizes.

Add functional notes: which button fires the conversion event, which field must remain untouched for tracking and which scripts load after interaction. Reference the dataLayer push or GA4 event names so developers can test locally.
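
For example, here is a minimal sketch of such a conversion push in TypeScript. The book_demo event name, the experiment ID and the parameter names are illustrative assumptions; use whatever the hypothesis card and your naming convention specify.

```typescript
// Minimal sketch of the conversion push referenced in the brief.
// The "book_demo" event name, experiment ID and parameter names are
// illustrative assumptions; substitute the names agreed in the brief.
type DataLayerEvent = Record<string, unknown>;

const w = window as unknown as { dataLayer?: DataLayerEvent[] };
w.dataLayer = w.dataLayer ?? [];
const dataLayer = w.dataLayer;

function trackDemoBooking(variantId: "control" | "variant"): void {
  dataLayer.push({
    event: "book_demo",                    // GA4 event developers can verify locally
    experiment_id: "exp_landing_headline", // assumed naming convention
    variant_id: variantId,
  });
}

trackDemoBooking("variant"); // fired by the button that converts
```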

Finally, agree a branch name in version control and a staging link. A clear brief prevents the mid-sprint Slack pings that derail both teams.

With the brief locked, you can move to crafting the actual variant, covered next.

Avoid peeking at results before completion

Build the variation in a staging environment first. Clone the production page, apply copy or design tweaks and keep file paths identical so assets cache correctly. If the test swaps a headline, do not sneak in button colour changes; one variable keeps results clean.

Run basic checks. Load time must stay within ten per cent of control. Alt text and ARIA labels must remain accurate for accessibility. Verify that form validations still fire and that thank-you pages render.
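
To check the load-time budget, you can read the Navigation Timing API directly in the browser. The control baseline below is an assumed figure, so replace it with your own measurement of the control page.

```typescript
// Rough load-time check using the Navigation Timing API. The ten per cent
// budget comes from the text; the control baseline is an assumed figure.
const CONTROL_LOAD_MS = 1800;              // assumed control baseline
const BUDGET_MS = CONTROL_LOAD_MS * 1.1;   // variant must stay within 10%

window.addEventListener("load", () => {
  // loadEventEnd is only populated after the load handlers finish,
  // hence the zero-delay timeout.
  setTimeout(() => {
    const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];
    if (!nav) return;
    const loadMs = nav.loadEventEnd - nav.startTime;
    console.log(
      loadMs <= BUDGET_MS
        ? `OK: variant loaded in ${Math.round(loadMs)} ms (budget ${Math.round(BUDGET_MS)} ms)`
        : `FAIL: variant loaded in ${Math.round(loadMs)} ms, over the ten per cent budget`
    );
  }, 0);
});
```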

Capture before-after screenshots and attach them to the backlog card. These visuals speed approvals and later help the analyst pinpoint unexpected behavioural shifts.

With a stable variant ready, the next task is wiring the split in an A/B testing platform.

Monitor external factors that skew results

Open your testing tool (Optimizely, VWO or Google Optimize 360) and create a new experiment. Paste the control URL, then target a fifty-fifty traffic split for your first run. If traffic is low, consider a higher share for the variant, yet never drop control below thirty per cent.
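
Under the hood, the platform assigns each visitor deterministically so they see the same version on every visit. The sketch below illustrates the principle with a simple hash-based split; it is not how Optimizely or VWO implement allocation, and the cap keeping control at thirty per cent reflects the guidance above.

```typescript
// Illustrative deterministic bucketing: hash the visitor ID into [0, 1) and
// assign the agreed variant share. Your testing tool handles this for you;
// the sketch only shows the principle and the 30% floor for control.
function hashToUnitInterval(visitorId: string): number {
  let hash = 0;
  for (let i = 0; i < visitorId.length; i++) {
    hash = (hash * 31 + visitorId.charCodeAt(i)) >>> 0; // simple 32-bit rolling hash
  }
  return hash / 0xffffffff;
}

function assignVariant(visitorId: string, variantShare = 0.5): "control" | "variant" {
  const share = Math.min(variantShare, 0.7); // never drop control below 30%
  return hashToUnitInterval(visitorId) < share ? "variant" : "control";
}

console.log(assignVariant("visitor-123")); // same visitor, same bucket on every visit
```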

Define the primary metric as the booked-meeting thank-you URL load or the GA4 event book_demo. Add a guardrail metric such as bounce rate to catch catastrophic failures early. Set the experiment to run until it reaches ninety-five per cent confidence or fourteen days, whichever comes later.
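
As a rough sketch of that stopping rule, the check below combines a simple two-proportion z-test with the fourteen-day minimum. Real platforms use their own statistics engines, so treat this as an approximation with example numbers only.

```typescript
// Sketch of the stopping rule described above: the test must reach both
// 95% confidence (two-sided z-test on conversion rates) and a 14-day
// minimum runtime before it can be called. Figures are illustrative.
function zTestTwoProportions(convA: number, nA: number, convB: number, nB: number): number {
  const pA = convA / nA;
  const pB = convB / nB;
  const pPooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / nA + 1 / nB));
  return Math.abs(pA - pB) / se; // z-score; |z| >= 1.96 is roughly 95% confidence
}

function canStop(convA: number, nA: number, convB: number, nB: number, daysRunning: number): boolean {
  const significant = zTestTwoProportions(convA, nA, convB, nB) >= 1.96;
  const minDurationMet = daysRunning >= 14;
  return significant && minDurationMet; // whichever comes later
}

console.log(canStop(120, 4000, 156, 4000, 16)); // example numbers only
```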

Apply exclusion settings: filter internal IP addresses, exclude bots and remove current customers if your audience includes both prospects and users. Tag the experiment with the naming convention from earlier chapters so reports align.
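
As a sketch of what those exclusions amount to, the filter below mirrors them in TypeScript. The IP ranges, bot pattern and customer flag are assumptions; in practice your testing tool exposes these as built-in audience conditions.

```typescript
// Illustrative pre-targeting filter mirroring the exclusion settings above.
// The IP prefixes, bot pattern and customer flag are assumed examples.
interface Visitor {
  ip: string;
  userAgent: string;
  isExistingCustomer: boolean;
}

const INTERNAL_IP_PREFIXES = ["10.", "192.168.", "203.0.113."]; // assumed office ranges
const BOT_PATTERN = /bot|crawler|spider|headless/i;

function isEligibleForExperiment(v: Visitor): boolean {
  const internal = INTERNAL_IP_PREFIXES.some((prefix) => v.ip.startsWith(prefix));
  const bot = BOT_PATTERN.test(v.userAgent);
  return !internal && !bot && !v.isExistingCustomer;
}
```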

Configuration done, you are ready for the final launch checklist.

Know when to stop, iterate, or extend tests

Work through a pre-launch checklist. Clear browser cache and load both variants in incognito. Trigger the primary conversion and check that events fire in real-time analytics. Confirm that heatmap and session-recording tools respect the split by querying variant data.
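
One way to confirm the conversion event fires during the run-through is to log every dataLayer push from the browser console. The helper below is a sketch that assumes the hypothetical book_demo event from the developer brief.

```typescript
// Quick QA helper for the pre-launch run-through: paste it into the browser
// console, trigger the primary conversion and watch the log. The "book_demo"
// event name is the assumed name from the developer brief.
type DataLayerEvent = { event?: string } & Record<string, unknown>;

const dl: DataLayerEvent[] =
  (window as unknown as { dataLayer?: DataLayerEvent[] }).dataLayer ?? [];
const originalPush = dl.push.bind(dl);

dl.push = (...events: DataLayerEvent[]): number => {
  for (const e of events) {
    console.log(e.event === "book_demo" ? "Primary conversion fired:" : "dataLayer push:", e);
  }
  return originalPush(...events);
};
```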

Test across devices: desktop, tablet and a mid-range smartphone on 4G. Validate legal banners like cookie consent and privacy notices. Ensure that customer support chat still loads and that tracking pixels for remarketing fire on both versions.

Notify sales and support teams of potential messaging changes so incoming calls do not catch them off guard. Schedule the launch at a low-traffic hour to minimise disruption if a rollback is required.

Checklist complete, press go and monitor the first one hundred sessions before stepping away. The experiment is now live and learning.

Conclusion

Your first A/B test now runs on solid ground: a precise brief, a single-variable variant, airtight tool setup and a rigorous checklist. This discipline limits risk and maximises learning speed.

Review interim data only after the minimum sample size to avoid premature bias. When the test finishes, record outcomes in the backlog and decide on rollout or further iteration. Each structured experiment compounds gains and confidence for the next sprint.
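
If you want a rough sense of that minimum sample size, the standard two-proportion approximation can be sketched as follows. The 95 per cent confidence, 80 per cent power, baseline rate and detectable lift are illustrative assumptions, and most testing tools calculate this for you.

```typescript
// Rough per-variant sample size for a two-proportion test, using the
// standard approximation with 95% confidence and 80% power. The baseline
// rate and minimum detectable lift below are illustrative assumptions.
function sampleSizePerVariant(baselineRate: number, minDetectableLift: number): number {
  const zAlpha = 1.96; // 95% confidence, two-sided
  const zBeta = 0.84;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + minDetectableLift);
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil(numerator ** 2 / (p2 - p1) ** 2);
}

console.log(sampleSizePerVariant(0.03, 0.2)); // e.g. 3% baseline, 20% relative lift
```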
