Oli Gardner

posted this on October 22, 2010, 1:30 PM

The confidence rating displayed in the A/B Test Center indicates whether your challenger variant has achieved statistically significant results. (For those of you who are interested, this is a chi-squared test for significance.) In most cases, a confidence rating of 95% or better is sufficient to make a decision: promote the variant to champion if it has the highest conversion rate, or discard it if its conversion rate is lower than your current champion's.

Another way to think of the confidence rating is that it indicates how often you could expect to get differing results if you repeated the same experiment. If you achieve a 90% confidence rating, there's a 1 in 10 chance that running the same test again would give you different results. With 95% confidence your chances are 1 in 20, and with 99% confidence they're 1 in 100.
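To make the mechanics concrete, here's a minimal pure-Python sketch of a 2×2 chi-squared test like the one described above. The function name, table layout, and confidence cutoffs are my own illustration, not the A/B Test Center's actual implementation:

```python
# Illustrative sketch (not Unbounce's code) of a 2x2 chi-squared test
# comparing conversion counts for two variants.

def chi_squared_confidence(conv_a, total_a, conv_b, total_b):
    """Return (chi-squared statistic, rough confidence level) for a 2x2 table."""
    observed = [
        [conv_a, total_a - conv_a],   # variant A: converted / not converted
        [conv_b, total_b - conv_b],   # variant B: converted / not converted
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [conv_a + conv_b, n - (conv_a + conv_b)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Critical values of the chi-squared distribution with 1 degree of freedom
    if chi2 >= 6.635:
        confidence = "99%+"
    elif chi2 >= 3.841:
        confidence = "95%+"
    elif chi2 >= 2.706:
        confidence = "90%+"
    else:
        confidence = "below 90%"
    return chi2, confidence
```

For example, 60 conversions out of 1,000 visitors against 20 out of 1,000 gives a statistic of roughly 20.8, comfortably past the 6.635 cutoff for 99% confidence, while an even split produces a statistic of 0 and no significance at all.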

To make sense of all this, think of your testing activities as managing an investment portfolio. Let's say you ran ten tests, and the average conversion rate lift across those ten tests was 50%. That might mean taking a 10% conversion rate to 15%. If all ten of those tests achieved about 90% significance, you could reasonably expect that in 1 of those 10 tests, you didn't actually find the best-performing page. Now, if you're getting 50% returns on your testing investment, you have a hefty margin to absorb the possibility of being wrong that 1 time out of 10. However, let's say you were only getting a 5% average lift. In that case, being wrong 1 out of 10 times would almost wipe out your overall portfolio returns.
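The numbers in that analogy can be sketched in a few lines. This is only my illustration of the arithmetic in the paragraph above (the function and variable names are mine):

```python
# Sketch of the portfolio-analogy arithmetic above (my illustration).

def relative_lift(old_rate, new_rate):
    """Relative conversion-rate lift, e.g. 10% -> 15% is a 50% lift."""
    return (new_rate - old_rate) / old_rate

lift = relative_lift(0.10, 0.15)   # the 50% lift from the example

# At 90% confidence across 10 tests, you'd expect roughly 1 wrong call.
expected_wrong_calls = 10 * (1 - 0.90)
```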

So, be aware of how your overall testing activities are performing, and choose a significance level that's right for you.

## Comments

Maybe it's my English. But I do not get it.

Ron, another way to think about it is that if you're seeing a big difference in performance between 2 pages (e.g. variant B is performing 50% better than variant A) then you can be less worried about getting a 95% confidence level. If the performance difference is smaller (e.g. 5%) then it's better to wait until you get a 95% confidence rating before choosing a winning variant.

I am not able to shift weight between the two variants

I can't figure out how you are using a chi-squared test to calculate this. Can you draw out what your contingency table looks like?

| Conversion % | ?   | ?   |
| ------------ | --- | --- |
| Variant A    |     |     |
| Variant B    |     |     |

Hey Jason!

Assuming you had a test where you split your weight 50/50 (and you lucked out and got a perfectly even split), and that your champion had a 6% conversion rate and your challenger only 2%, our contingency table would look like this if you had 2000 total visitors:

|            | Converted | Did not convert | Total |
| ---------- | --------- | --------------- | ----- |
| Champion   | 60        | 940             | 1,000 |
| Challenger | 20        | 980             | 1,000 |

Does that help?

Carl,

Your example makes sense. I would certainly appreciate an extra 40 prospects especially if my market was only 2,000 visitors. But, are we sure that those 40 converted BECAUSE Variant A is superior? If we run the test again, would Variant A still beat B by 40 clicks?

How big does my market have to be to be sure that those extra 40 conversions aren't random variation?

Thank you,

Tim

Carl,

The confidence interval is a measure of how certain you are that the extra 40 conversions were a result of Variant A being better, as opposed to random variation. If you ran the test again, and the confidence interval was 90%, Variant A would still beat Variant B 9 out of 10 times.

Jason,

So, does Jason's example of 2,000 total visitors have a 90% confidence index? I thought the size of the CI was dependent on the sample size.

Thank you,

Tim

Hey Tim, when you say "size of the CI", I'm not sure I totally follow. We're not producing a confidence interval (as would be done in a Z-test, say), which provides a range for the expected value and an associated probability. The test we use (a chi-squared test) produces a single probability that indicates how likely it is the test results were due to chance. Sample size does influence the chi-squared test, but so does the difference between the outcomes.
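To illustrate that last point, here's a small pure-Python sketch (mine, not the product's code) comparing the same 6%-vs-2% conversion rates at two different sample sizes. The rates are identical, but only the larger sample clears the 3.841 cutoff for 95% confidence:

```python
# Both sample size and effect size drive the chi-squared statistic.
# Illustration only: same 6% vs 2% rates, two different sample sizes.

def chi2_2x2(a_conv, a_total, b_conv, b_total):
    """Chi-squared statistic for a 2x2 conversion table (1 degree of freedom)."""
    observed = [[a_conv, a_total - a_conv], [b_conv, b_total - b_conv]]
    row = [a_total, b_total]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    n = a_total + b_total
    return sum(
        (observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2)
        for j in range(2)
    )

# 1,000 visitors per variant: well past the 3.841 cutoff for 95% confidence.
big = chi2_2x2(60, 1000, 20, 1000)

# Only 100 visitors per variant, same rates: not significant.
small = chi2_2x2(6, 100, 2, 100)
```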
