New outlook for 2022: The (non) sense of benchmarks

Written by Olivier Tilleuil (Founder of EyeSee), Mirna Djurić (Head of Insights, EyeSee) and Jane Nedinkovski (Global Director, New Business Development, EyeSee)

Taxes, death, and (at least in the market research industry) norms are inevitable. But do norms actually make sense, or do they just give us a false sense of security?

Why do we rely on norms at all? As our experts put it, the straightforward answer is that they tell us whether a survey score is good or bad. Picture this: a company is planning to launch an ad and wants to test it with a survey to make sure it will be a success, but only about 40% of respondents recall the ad. Is that good or bad? By comparing the result with previously tested ads, we can (theoretically speaking) tell whether it is a good or a bad score.

So, what’s the issue?

What counts as good or bad depends on many factors, Olivier explained. If the ad above is for a new brand's product, a score similar to the norm, or even slightly below it, would be great. If it were for a well-known brand, scoring at the same level as the average ad would be bad. The same logic applies to many other variables. Take the messaging and the campaign's objective: some campaigns focus on emotions, others on objective information (e.g., "Did you know you can use this product for that occasion?"). Emotions sound great, but rational arguments can unlock a lot of sales as well.

As for the other variables that affect how an ad scores against the benchmark, EyeSee's founder stated that the category (and even the subcategory), the survey method or question, testing in the same medium (e.g., social media or TV), the target audience, the country, and so on are just as vital.

What does this mean?

It means a fair benchmark needs comparable stimuli: a similar category/industry, a similar life-cycle stage, the same target audience, the same types of questions, and so on. The number of combinations grows fast: 20 subcategories × 4 life cycles × 12 target audiences × 3 media × 4 types of messaging makes 11,520 different cuts! With at least 20 data points per cut, you would need over 230,000 stimuli (per stimulus type and per country), tested every 2 years. Even if you focus on 5 key subcategories, you would still need 57,600 stimuli, Mirna explained. Long story short, no company has these resources, she concluded.
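
The arithmetic behind these figures can be reproduced in a few lines of Python. The dimension counts come from the example above; the 20-data-points-per-cut minimum is the rule of thumb quoted in the text, not a statistical law, and the names `DIMENSIONS` and `required_stimuli` are hypothetical. Multiplied out exactly, the example gives 230,400 stimuli, or 57,600 when restricted to five subcategories.

```python
from math import prod

# Levels per dimension, taken from the example in the text.
DIMENSIONS = {
    "subcategory": 20,
    "life_cycle": 4,
    "target_audience": 12,
    "medium": 3,
    "messaging_type": 4,
}

MIN_DATA_POINTS_PER_CUT = 20  # rule of thumb quoted in the article


def required_stimuli(dimensions, min_points=MIN_DATA_POINTS_PER_CUT):
    """Return (number of cuts, stimuli needed) for one stimulus type and country."""
    cuts = prod(dimensions.values())  # every combination of levels is one cut
    return cuts, cuts * min_points


cuts, stimuli = required_stimuli(DIMENSIONS)
print(f"All subcategories: {cuts:,} cuts, {stimuli:,} stimuli")

# Restrict the database to 5 key subcategories, keeping everything else.
focused = {**DIMENSIONS, "subcategory": 5}
cuts5, stimuli5 = required_stimuli(focused)
print(f"5 key subcategories: {cuts5:,} cuts, {stimuli5:,} stimuli")
```

Because the cut count is a product, adding even one more dimension (say, country) multiplies the requirement rather than adding to it.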

Are those cuts really necessary?

Why else would we cut results by target audience? Why would companies negotiate so hard with retailers over their product's location, or pay so much for certain ad placements? Do they really need to be that specific? Drawing on first-hand experience with clients across the globe, Jane pointed out that anything less specific is a black box, and that a couple of key parameters can shift the numbers considerably. An overview of one important question across different stimuli makes the point: the median score for the KPI is 46%, while for some subcategories the median is 37%.

Moreover, as Jane noted, these subcategories all belong to CPG; the comparison does not even take larger differences, such as finance vs. CPG, into account. Even so, the confidence interval for all categories combined (within CPG) does not include the medians of the better- or worse-performing subcategories.
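
To run this kind of check on your own norm database, one option is a distribution-free confidence interval for the median, built from order statistics so that no normality assumption is needed. This is a minimal sketch under our own assumptions, not EyeSee's method; `median_ci` and `outside_ci` are hypothetical helpers, and the scores you pass in would be the per-stimulus KPI values from your own data.

```python
from math import comb


def binom_cdf_half(k, n):
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


def median_ci(scores, alpha=0.05):
    """Distribution-free confidence interval for the median.

    Returns the order statistics x_(j) and x_(n-j+1), where j is the largest
    rank with 2 * P(Binomial(n, 0.5) <= j - 1) <= alpha. Coverage is at
    least (1 - alpha).
    """
    xs = sorted(scores)
    n = len(xs)
    j = 0
    # Grow j while the next rank still keeps the two tails within alpha.
    while j + 1 <= n // 2 and 2 * binom_cdf_half(j, n) <= alpha:
        j += 1
    if j == 0:
        raise ValueError("too few scores for this confidence level")
    return xs[j - 1], xs[n - j]


def outside_ci(value, ci):
    """True if a subcategory median falls outside the all-category interval."""
    low, high = ci
    return value < low or value > high
```

With 20 scores and `alpha=0.05`, the interval runs from the 6th to the 15th smallest value. Calling `outside_ci(37, median_ci(all_cpg_scores))`, where `all_cpg_scores` is a hypothetical list of your CPG scores, tells you whether a 37% subcategory median lies outside the all-category interval.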

What are the alternatives?

1. A/B test

The advantage: A/B tests are run on the same audience, so you can directly measure any uplift over the current design (if there is one).

The disadvantage: There might not be a current design to compare against (e.g., in new product development (NPD) testing). People may also simply be used to the current design, or the current design may have served a different objective. Still, if your goal is to create an uplift, A/B tests are much more precise than relying on the ‘black box’ norm database.
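
When the KPI is a proportion, such as ad recall, one common way to quantify an A/B uplift is a pooled two-proportion z-test. The sketch below assumes two independent, randomly assigned cells of respondents; the function `ab_uplift` and the example counts (40% recall for the current ad, 46% for the new one, 500 respondents per cell) are hypothetical, and other tests, such as chi-square or Bayesian approaches, would be equally reasonable choices.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def ab_uplift(hits_a, n_a, hits_b, n_b):
    """Compare variant B (new design) with variant A (current design).

    Returns (absolute uplift in percentage points, two-sided p-value)
    from a pooled two-proportion z-test.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return 100 * (p_b - p_a), p_value


# Hypothetical cells: 200/500 recall the current ad, 230/500 the new one.
uplift, p = ab_uplift(200, 500, 230, 500)
print(f"uplift: {uplift:.1f} pp, p = {p:.3f}")
```

Here a 6-point uplift with 500 respondents per cell lands just above the conventional 0.05 threshold, which shows why cell sizes should be planned before the test rather than after.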

2. Build a tailor-made norm database (and keep it open)

Instead of having a black box norm database, you can carefully select key competitors and collect data for their stimuli.

The advantage: It can be done on any relevant target audience and on all relevant stimuli.

The disadvantage: The cost. Developing one costs about 20k per category/stimulus, and it needs to be renewed every 2-4 years. Then again, you run multiple campaigns per year, so investing in your own norm database may well make sense.
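
To judge whether the investment pays off, it can help to spread the build cost over the campaigns the database will serve before it needs renewing. This minimal sketch uses the rough 20k figure and 2-4 year renewal cycle from the text; the three campaigns per year is a hypothetical input, the function name `cost_per_campaign` is ours, and the currency is left unspecified as in the original.

```python
def cost_per_campaign(build_cost, renewal_years, campaigns_per_year):
    """Spread the one-off cost of a tailor-made norm database over the
    campaigns tested against it before it must be renewed."""
    campaigns = renewal_years * campaigns_per_year
    return build_cost / campaigns


# ~20k per category, renewed every 2-4 years; assume 3 campaigns a year.
for years in (2, 3, 4):
    print(years, "years:", round(cost_per_campaign(20_000, years, 3)))
```

The per-campaign figure falls quickly as campaign volume grows, which is the intuition behind "you have multiple campaigns per year."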

3. Agree on action standards

You can agree in advance on what a good result looks like, given the marketing campaign's objective. But what if the campaign does not hit the mark? Is it because the action standards were set too high? Creating action standards can be not only very complex but also quite subjective!

Should you throw away the norm database?

Not necessarily, our experts agreed; the point is rather to be careful when using norms, treating them as context and not as the core of your research. In most cases, choosing a supplier for better methodology and visualization, rather than for better norm databases, is the smarter move. Keep in mind that nobody ever got fired for using Nielsen or Millward Brown databases, but if you want to be the best in your area and category, you might want to rely on better agencies!

Doing your best to prepare for the upcoming year? Make sure to also check out the other fresh perspectives from our seasoned industry experts!