Hi, this is Jing, a data scientist with a great passion for applying data science and big data technology in industry. Today we are going to walk through the pre-test analysis you should do before you turn on your AB test.
Clarification on AB Tests and Web Experiments
An AB test, also known as a split test, is a method of comparing two versions of a webpage or app against each other to determine which one performs better. It is just one type of web experiment; there are other types, such as split URL tests and multipage tests, depending on how you categorise them. In practice, though, we usually refer to web experiments in general as AB tests.
The main reason for doing a pre-test analysis is to estimate how long the test should run. In general, we don’t want a test to run for too long, for example longer than 90 days. Ideally, we want to run a test quickly, so that we can fail and learn quickly. There are two factors to consider when deciding how long to run the test: the first is the sample size, and the second is whether the sample is representative.
Sample Size for AB Test
In AB testing, pre-test analysis basically means running a power analysis to find out how large a sample we need in order to detect a statistically significant difference. The power of a test increases with the sample size. A power of at least 80% is the usual minimum requirement.
You can use the calculator at https://cxl.com/ab-test-calculator/ to get the sample size you need!
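If you would rather see the calculation than use a website, here is a minimal sketch of a power analysis for comparing two conversion rates. It assumes a two-sided test with the usual normal approximation, 5% significance and 80% power. The function name, the parameter names and the example numbers are my own illustration; this is not necessarily the exact formula the calculator above uses.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_uplift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variant (normal approximation).

    baseline_rate   -- current conversion rate, e.g. 0.05 for 5%
    relative_uplift -- smallest relative uplift worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired power

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical example: 5% baseline conversion rate, looking for a +10% relative uplift.
print(sample_size_per_variant(0.05, 0.10))
```

Notice how quickly the required sample grows when the uplift you want to detect gets smaller, or when you ask for more power.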
Based on your weekly traffic and weekly conversions, check what uplift you would be able to detect and work out how long the test needs to run. For a really big change a 40% uplift might be a reasonable expectation, but for a smaller change it might be only 5%. If your minimum detectable effect (MDE) is too high for the time period you have (2-6 weeks), you might need to reconsider:
- Focus on other supportive metrics.
  - In the CRO area, instead of focusing on conversion rate, we might look at click-through rate, bounce rate, etc.
- Add more markets to get more traffic, and thus more conversions.
- Skip the test and go for a version supported by other data, e.g. user research or best practice. There is always a better alternative.
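To make the MDE check a bit more concrete, here is a rough sketch that estimates the smallest relative uplift you could detect from your weekly traffic and weekly conversions for a given number of weeks. It assumes an even traffic split between variants and a simplified normal approximation that uses the baseline variance for both groups. The function name and the traffic numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_relative_mde(weekly_visitors, weekly_conversions, weeks,
                        n_variants=2, alpha=0.05, power=0.80):
    """Rough relative MDE after `weeks` of testing with an even traffic split."""
    baseline = weekly_conversions / weekly_visitors
    n_per_variant = weekly_visitors * weeks / n_variants
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Simplification: both variants are assumed to have the baseline variance.
    absolute_mde = z * sqrt(2 * baseline * (1 - baseline) / n_per_variant)
    return absolute_mde / baseline


# Hypothetical site: 20,000 visitors and 1,000 conversions per week.
for weeks in range(2, 7):
    print(weeks, "weeks:", f"{approx_relative_mde(20_000, 1_000, weeks):.1%}")
```

If the uplift printed for your longest acceptable duration is still bigger than what the change could realistically deliver, that is the signal to consider one of the options in the list above.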
As a rule of thumb, we need at least 1000 visitors and at least 100 conversions in each variant to have enough data to draw conclusions. This guideline comes from how probabilities are calculated in statistics. I am not a statistician, so please don’t laugh at me for not going into the details here. But trust me, I learned this from a statistics book.
Representativeness of the Sample
Some websites have huge traffic. Going only by the sample size from the power calculation, some tests would need to run for just 5 days. Should we stop after 5 days in that case? I would suggest not. We run AB tests to optimise the online conversion rate or another metric (depending on your business case), so we need to take user behaviour into account. Most online businesses have a clear weekly pattern, so run the test for at least one whole week to make the sample representative.
If you have previous analysis of how long it takes your customers to convert, for example that it takes around 2 weeks, then I would suggest running the AB test for at least 2 full weeks.
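The two rules above (cover full weeks, and cover the time customers need to convert) can be sketched as a small helper. The function and parameter names are hypothetical; the 7-day cycle is taken from the weekly pattern mentioned above.

```python
from math import ceil


def recommended_test_days(days_for_sample_size, conversion_lag_days=0, days_per_cycle=7):
    """Round the test duration up to whole weekly cycles.

    days_for_sample_size -- days needed to reach the sample size from the power analysis
    conversion_lag_days  -- typical time a customer needs to convert (0 if unknown)
    """
    needed = max(days_for_sample_size, conversion_lag_days, days_per_cycle)
    return ceil(needed / days_per_cycle) * days_per_cycle


print(recommended_test_days(5))                          # 7: never shorter than one full week
print(recommended_test_days(5, conversion_lag_days=14))  # 14: cover the two-week conversion lag
print(recommended_test_days(10))                         # 14: round up to whole weeks
```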
Statistical Model for AB Test Analysis
There are different statistical models for analysing whether there is a significant difference between the control and test groups. The most popular one right now is the Bayesian statistical model. If you are using a Bayesian model, you are not required to do a pre-test analysis, since you are going to monitor the test all the time and decide when to stop it. However, analysing an AB test with a Bayesian model requires more statistical knowledge and some coding skills. I used the PyMC3 library to do the analysis in Python. Give it a try if you are interested.
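To give a feel for what a Bayesian analysis looks like, here is a minimal sketch that does not use PyMC3 at all. It swaps in a much simpler Beta-Binomial model with a uniform Beta(1, 1) prior, which is my assumption for illustration, and uses only the Python standard library. It estimates the probability that the variant’s conversion rate is higher than the control’s. The function name and the numbers are hypothetical.

```python
import random


def prob_variant_beats_control(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors.

    conv_a, n_a -- conversions and visitors in the control (A)
    conv_b, n_b -- conversions and visitors in the variant (B)
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior of each rate is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical data: 100/1000 conversions in control, 150/1000 in the variant.
print(prob_variant_beats_control(100, 1000, 150, 1000))
```

A library such as PyMC3 becomes worthwhile when the model is more complicated than two simple conversion rates.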
The easiest way to get started with running and analysing AB tests is to use a frequentist statistical model. This is why the power analysis (pre-test analysis) is needed: with a frequentist model, we cannot keep monitoring the test and decide on the fly when to stop it. We have to decide in advance how long we are going to run the test.
Tools That Might Be Useful for AB Test Pre-Analysis and Post-Analysis
To calculate statistical significance, you can use an online tool to get the result instead of writing your own formulas.
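If you are curious what such a tool might do under the hood, here is a minimal sketch of one common frequentist approach, a two-sided two-proportion z-test with a pooled standard error. I am assuming this test for illustration; a given online tool may use a different method. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, p_value) for a two-sided test of rate_B versus rate_A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 100/1000 conversions in control, 150/1000 in the variant.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A p-value below your chosen significance level (commonly 0.05) is what these tools report as a statistically significant difference.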
That’s all I know at the moment. As I said, I am not a statistician; my major was Information Science and Technology. Statistics was just one subject I studied at university, and unfortunately I didn’t have the patience to dig deeper into this fascinating area. So please don’t judge too harshly if I said something inaccurate. My only intention is to share, and I hope this is helpful to you. Thanks for reading. I am Jing, a data scientist aiming to get better and better.