There are quite a few misconceptions about determining statistical significance in a simple A/B split test. More often than not, naive online marketers say they’ll use a “rule of thumb” to decide whether to keep or optimize a landing page. That rule of thumb is typically as simple as sending 300 clicks to landing page 1 and another 300 clicks to landing page 2, then tossing whichever converts less. That’s a mistake.
The problem with using a rule-of-thumb is that you’ll either:
- Spend more money than necessary on a certain advertisement, which means testing fewer advertisements and taking much longer to find that optimum ROI.
- Or, on the other hand, pull an ad too early and never learn that the advertisement might have been profitable given enough time.
So, what we want to do is find exactly how long to run advertisements A and B to maximize profits.
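To put a rough number on “enough time,” here’s a minimal Python sketch using the standard two-proportion sample-size formula. The 3% baseline rate, 4% target rate, 5% significance level, and 80% power are all made-up assumptions; plug in your own:

```python
# Per-variant sample size for a two-proportion test.
# Baseline 3% conversion, hoping to detect a lift to 4% -- assumptions.
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Clicks needed per ad to reliably tell p1 and p2 apart."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = z.inv_cdf(power)            # statistical power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

print(sample_size_per_variant(0.03, 0.04))  # ~5,301 clicks per page
```

That’s a long way from 300 clicks per page.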
How you should start a campaign
Obviously, everyone will tell you to start out with as many landing pages and advertisements as possible. “As possible” might mean ten or more, but who really does that? I’ve told myself I would, but it hasn’t happened yet. It’s not that I’m lazy, because I’m not; it’s that I’m impatient. Here’s what I do:
I create two or three of everything, not ten. The first combination (or two) is what I think will convert best. The other is completely different. I’m not talking about telling your designer, “surprise me,” because we all know that won’t work. “Completely different” usually means a spontaneous idea I believe might work but probably won’t, or an offline example of persuasion or marketing that I’ll try to apply online in some way.
After I calculate statistical significance, I’ll split test another set of ads against the winner. Typically, the results of the initial split test give me more ideas to test.
Two ways to calculate statistical significance
Confidence intervals
This is done using confidence intervals from statistics. If you never heard of confidence intervals in college, I’d imagine college isn’t helping your online advertising much anyway. Anyway, back to the subject: several analytics programs pick winning advertisements using this method. With this free tool by Split Test Accelerator (type random data into it to try it out), you can determine which landing page or ad copy would perform best, and how confident you can be in that result.
Essentially, if an advertisement wins with 95% confidence, the observed difference is real rather than random chance 19 times out of 20. If the campaign is important, I aim for at least 99% confidence.
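The tool does this math for you, but for the curious, here’s roughly what a confidence calculation looks like: a minimal Python sketch using a two-proportion z-test (one common way to compute this; I’m not claiming it’s exactly what the tool runs). The click and conversion counts below are made up:

```python
# Minimal confidence check for two ads, using made-up click/conversion
# counts. Reports how confident you can be that the rates really differ.
from math import sqrt
from statistics import NormalDist

def confidence(conv_a, clicks_a, conv_b, clicks_b):
    """Two-proportion z-test; returns confidence level as a percentage."""
    p_a, p_b = conv_a / clicks_a, conv_b / clicks_b
    p_pool = (conv_a + conv_b) / (clicks_a + clicks_b)      # pooled rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / clicks_a + 1 / clicks_b))
    z = abs(p_a - p_b) / se
    return (2 * NormalDist().cdf(z) - 1) * 100              # two-sided

# e.g. page A: 30 conversions / 1,000 clicks; page B: 45 / 1,000
print(f"{confidence(30, 1000, 45, 1000):.1f}% confident")   # ~92.3%
```

Notice that even 1,000 clicks per page and a 50% relative lift only get you to about 93% confidence, which is exactly why the 300-click rule of thumb fails.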
A/A (or null) split-test
This is the second way to determine statistical significance. Personally, I use confidence intervals, but for completeness I’ll briefly explain what’s referred to as an A/A split test.
Typically, in an A/A test, you split test the same advertisement against itself. You make a duplicate of the advertisement and track each copy as if it were a different ad:
> Finally, you need to decide your sample size and set up the criteria for success. To decide your ultimate sample size, run a “null” test with your A/B test. The null test is really just an A/A test, where you are running the control against itself to determine where the convergence of results matches up (typically within 0.05 percent of each other, but that’s up to you)… – Mike Sack, executive VP of Inceptor
If you run an A/A test and then an A/B test at a later date, conditions will differ because the timing does. So the way to do this is to run the A/A test during the A/B test.
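Here’s a minimal sketch of that convergence check from the quote above. The counts and the 0.05% tolerance are illustrative, not prescriptive:

```python
# A/A "null" check: the two copies of the same ad should report
# near-identical conversion rates before you trust the A/B result.
def aa_converged(conv_a1, clicks_a1, conv_a2, clicks_a2, tol=0.0005):
    """True once the two identical ads' rates are within tol of each other."""
    rate_1 = conv_a1 / clicks_a1
    rate_2 = conv_a2 / clicks_a2
    return abs(rate_1 - rate_2) <= tol

# Two copies of the same ad, tracked separately:
print(aa_converged(31, 1000, 36, 1000))      # False -- keep running
print(aa_converged(300, 10000, 297, 10000))  # True -- A/B result usable
```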
In summary, I know most of you aren’t going to use the A/A test, so just use confidence intervals through a tool or analytics program. Remember, you’re wasting valuable time if you’re not using statistical significance in your testing.
PS: If you know who came up with the Hathaway shirt ad, you’re awesome.