How to define and measure the success of experiments?
Best practice is to measure the success of an experiment by calculating the statistical significance of the change the treatment produces in the main metrics. In short, the higher the significance level, the more certain we can be that the control and the variant really differ and that the observed change in the metrics is not just random noise. Most product teams use a 95% significance level, meaning they accept no more than a 5% chance of declaring a difference real when it was in fact produced by chance.
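
As a minimal sketch of what this looks like in practice, the snippet below runs a two-proportion z-test on conversion counts for a control and a variant. The counts are made-up illustration values, not real data, and the 95% threshold is the conventional one mentioned above.

```python
# A minimal sketch of a two-proportion z-test for a conversion-rate experiment.
# The counts below are hypothetical illustration values, not real data.
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0: no difference
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))                      # two-sided p-value
    return z, p_value

# Hypothetical numbers: control converts at 4.0%, variant at 4.4%
z, p = two_proportion_z_test(conv_a=400, n_a=10_000, conv_b=440, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}, significant at 95%: {p < 0.05}")
```

With these toy numbers the p-value is around 0.16, so the difference would not yet be called significant at 95%; that is exactly the situation where the sample-size planning discussed next matters.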

One of the main costs of experimentation is the time it takes to get a meaningful result. Depending on the size of the expected effect and how heavily the product is used, the time to reach a conclusion can vary from a few days to several months. This not only slows down the time to market for the change, but also reduces the total number of experiments that can be run, since concurrent tests overlap. It is therefore recommended to estimate in advance the time needed to gather a sufficient sample (that is, to reach statistical significance), using the expected treatment effect (the change in the metric for the variant), the confidence level (typically 95%) and the number of users who can be exposed to the treatment (traffic or usage).
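
A back-of-the-envelope version of that calculation is sketched below for a proportion metric: the standard sample-size formula gives the users needed per variant, and dividing by daily eligible traffic gives a rough duration. The baseline rate, expected lift, power and daily traffic are all hypothetical inputs to be replaced with your own.

```python
# Sketch: required sample size per variant for a proportion metric,
# then the days needed given daily eligible traffic. All inputs are hypothetical.
from math import ceil
from scipy.stats import norm

def sample_size_per_variant(p_base, p_variant, alpha=0.05, power=0.8):
    z_alpha = norm.ppf(1 - alpha / 2)          # ~1.96 for a 95% confidence level
    z_beta = norm.ppf(power)                   # ~0.84 for 80% power
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_variant - p_base) ** 2)

# Hypothetical scenario: 4.0% baseline conversion, hoping to detect a lift to 4.4%,
# with 5,000 eligible users per day split evenly across control and variant.
n = sample_size_per_variant(0.040, 0.044)
days = ceil(2 * n / 5_000)
print(f"~{n:,} users per variant, roughly {days} days at 5,000 users/day")
```

In this made-up scenario the test needs roughly 40,000 users per variant, or about two and a half weeks of traffic, which is exactly the kind of estimate worth knowing before the experiment starts.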

Teams are sometimes tempted to conclude a test prematurely, either to roll out the gains to the whole user base as soon as possible or to avoid keeping a multivariate test (MVT) design live for too long. Best practice, however, is to run a test not only until the minimum sample size has been reached, but also long enough to observe a few full product cycles, as in the sketch below. These cycles depend entirely on the nature of the product: an e-commerce site may see weekly peaks and troughs in customer mix and traffic, while a hospitality product may follow seasonal patterns.
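
One simple way to encode the "full cycle" rule is to round the statistically required duration up to a whole number of cycles and enforce a minimum number of them. The cycle length (weekly here) and the minimum of two cycles are assumptions that depend on the product.

```python
# Sketch of the "full cycle" rule: never stop a test mid-cycle, and always
# observe at least a minimum number of cycles. cycle_days is product-specific.
from math import ceil

def planned_duration(min_days_for_sample: int, cycle_days: int = 7, min_cycles: int = 2) -> int:
    """Round the sample-driven duration up to whole cycles, with a floor of min_cycles."""
    cycles_needed = max(ceil(min_days_for_sample / cycle_days), min_cycles)
    return cycles_needed * cycle_days

print(planned_duration(16))   # 16 days of traffic needed -> run 21 days (3 full weeks)
print(planned_duration(5))    # enough traffic in 5 days  -> still run 14 days (2 weeks)
```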

To better understand the context of the impact and to optimise further, it is recommended to analyse how different customer cohorts and segments respond. This not only allows the team to target a change at the users most likely to benefit from it, but also deepens its understanding of the different segments and the nuances of their behaviour. For example, as acquisition targeting shifts the composition of new customers, a new feature may perform differently for new cohorts, or customers who primarily use a mobile device may prefer a different user interface from those on larger screens.
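
A per-segment readout can be as simple as grouping exposure data by segment and group and comparing conversion rates. The sketch below assumes a pandas DataFrame with hypothetical columns 'segment', 'group' and 'converted'; the toy data is only there to show the shape of the output.

```python
# Sketch of a per-segment lift readout, assuming a DataFrame of experiment exposures
# with hypothetical columns: 'segment' (e.g. device type or cohort),
# 'group' ('control' / 'variant') and 'converted' (0/1).
import pandas as pd

def lift_by_segment(df: pd.DataFrame) -> pd.DataFrame:
    rates = (df.groupby(["segment", "group"])["converted"]
               .mean()
               .unstack("group"))                      # one row per segment
    rates["abs_lift"] = rates["variant"] - rates["control"]
    rates["rel_lift"] = rates["abs_lift"] / rates["control"]
    return rates.sort_values("rel_lift", ascending=False)

# Toy data: mobile users respond to the change, desktop users do not.
df = pd.DataFrame({
    "segment":   ["mobile"] * 4 + ["desktop"] * 4,
    "group":     ["control", "control", "variant", "variant"] * 2,
    "converted": [0, 1, 1, 1, 0, 1, 0, 1],
})
print(lift_by_segment(df))
```

A table like this makes it easy to see which segments drive the overall result, and it is a natural starting point for deciding whether a change should ship to everyone or only to the segments that clearly benefit.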
