Wrong Tests and Ill-Fitting KPIs Should Not Be Held Against A/B Testing

Ill-fitting KPIs are no argument against A/B tests

This time @AndrewChen’s newsletter was somewhat weird. Under the heading “Conservation of Intent: The hidden reason why A/B tests aren’t as effective as they look”, Andrew argues that measures which result from A/B testing and aim at improving conversion rates do not take into account that the normal users among those who churn are statistically dominated by “low intent” minnows, i.e. those who do not pay (a lot): “Doing tactical things like moving buttons above the fold, optimizing headlines, removing form fields – those are great, but the increases won’t directly drop to your bottom line.”

Of course they won’t. But who would claim that? And which A/B test would suggest that moving buttons, changing headlines or removing form fields will have a directly measurable effect on returns? Directly measurable effects would be things like an improved call to action or a shorter path to the shopping cart or till. Quantifiably higher numbers of daily or monthly active users, better retention rates, reduced churn rates, improved conversions, lower CACs or higher CLVs may all be indirect, mediated effects. But these results should not be linked 1:1 to any of the individual measures mentioned above, or to improved returns. And frankly, I do not know who would do that. High-value paying users, so-called whales, should not be subsumed under “normal” users at all. Their way-above-the-norm intent to spend can often already be inferred from the particulars of their visit to the landing page.
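To make the arithmetic behind this concrete, here is a minimal sketch in Python. All of it is my own illustration, not anything from Andrew’s newsletter: the `Segment` class, the `funnel` and `relative_lift` helpers, and every number (visitor counts, signup rates, revenue per signup) are hypothetical. It assumes a variant that only raises the signup rate of low-intent minnows while the whales behave as before.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical user segment; all values are made up for illustration."""
    visitors: int
    signup_rate: float         # leading indicator (top of the funnel)
    revenue_per_signup: float  # drives the lagging indicator (revenue)


def funnel(segments):
    """Return (total signups, total revenue) across all segments."""
    signups = sum(s.visitors * s.signup_rate for s in segments)
    revenue = sum(
        s.visitors * s.signup_rate * s.revenue_per_signup for s in segments
    )
    return signups, revenue


def relative_lift(before, after):
    """Relative change from the control value to the variant value."""
    return (after - before) / before


# Control: many low-paying minnows, few high-paying whales (assumed numbers).
control = [Segment(9_000, 0.10, 1.0), Segment(1_000, 0.20, 50.0)]
# Variant: the tested change only lifts the minnows' signup rate.
variant = [Segment(9_000, 0.12, 1.0), Segment(1_000, 0.20, 50.0)]

signups_a, revenue_a = funnel(control)
signups_b, revenue_b = funnel(variant)

signup_lift = relative_lift(signups_a, signups_b)
revenue_lift = relative_lift(revenue_a, revenue_b)

print(f"signup lift:  {signup_lift:.1%}")
print(f"revenue lift: {revenue_lift:.1%}")
```

With these made-up numbers, signups rise by roughly 16% while revenue rises by less than 2%. The test did what it was meant to do on the metric it measured; the gap only looks like a failure if the signup rate is read as if it were a revenue KPI.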

No serious person would expect that measuring and then changing something “high up in the funnel” would show results in equal measure way down the funnel. Andrew’s argument is misleading because it does not distinguish between leading and lagging indicators, which measure KPIs along a hierarchy of corresponding tactical and strategic measures (where a “strategic measure” may either be the sum of a set of individually measured “tactical” activities or a separate type of measure, e.g. choice of target group or choice of keywords). In short: the choice of the wrong metrics should not be held against the practice of A/B testing.