The unlikely link between Expected Goals (xG) and research

Andy Wiseman, strategy director, shares his thoughts on the unlikely link between Expected Goals (xG) and the world of market research.

During the break over the holidays, I was lucky enough to receive the excellent book by Rory Smith titled ‘Expected Goals: The Story of How Data Conquered Football and Changed the Game Forever’. For someone with more than a passing interest in football and data, it could be described as the perfect gift!

For those not that interested in football, or in the wealth of data that now underpins it, Expected Goals (xG) measures the quality of a goalscoring opportunity by calculating the likelihood of it being scored from that position on the pitch in that phase of play. The xG measure ranges from 0 to 1, where a score of 0 suggests it should be impossible to score from that position, and 1 suggests a player should always score from it.
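To make the 0-to-1 scale concrete, here is a minimal sketch of how a probability-style metric like xG can be produced: a logistic model over shot features. The coefficients and the two features (distance and angle) are purely illustrative assumptions for this sketch; real xG models are fitted to thousands of tagged shots and use many more inputs.

```python
import math

def toy_xg(distance_m: float, angle_deg: float) -> float:
    """Toy xG estimate: a logistic model over two shot features.

    The coefficients below are illustrative, not fitted to real data.
    Production xG models are trained on large sets of tagged shots and
    include many more features (body part, phase of play, pressure).
    """
    # Linear score: closer shots and wider shooting angles raise the estimate.
    z = 1.2 - 0.12 * distance_m + 0.03 * angle_deg
    # The logistic link maps the score onto the 0-1 probability scale.
    return 1.0 / (1.0 + math.exp(-z))

# A close, central chance rates far higher than a speculative long-range shot.
close_chance = toy_xg(6, 45)
long_shot = toy_xg(30, 10)
```

The point of the sketch is only the shape of the output: whatever the model internals, every shot maps to a value strictly between 0 and 1, which is what lets xG values be summed and compared across matches.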

Not that any of that really matters for the purpose of this blog! The prologue of the book focuses on a Filipino by the name of Ashley Flores. Flores is what is known in football data circles as a ‘tagger’. Now, in this world of AI, machine learning and so on, it might be reasonable to think that much of the data produced around football is generated automatically. Of course, there’s a lot of modelling of this data to produce more useful analysis, but at its heart, the creation of the data remains a very human thing. The prologue tells the story of Flores, sitting in his company’s offices in Manila, watching football matches in his role as ‘data operator’. Essentially, his role involves watching games largely from Europe’s big 5 leagues (English Premier League, La Liga in Spain, Germany’s Bundesliga, Serie A in Italy and France’s Ligue 1) and ‘tagging’ the action. In its simplest form, tagging covers passes, shots on goal, corners and throw-ins, but the actual role is more complex, with taggers making assessments of a multitude of other factors taking place in the game.

The labour-intensiveness of tagging is perhaps hard to comprehend. A new tagger may take days to fully tag a single game, with each effort being scrutinized for accuracy and completeness. Even Flores takes several hours to tag a single game. Multiply this by the c. 50 games across the big 5 leagues, add increased demand for such services from other parts of the globe, and it’s easy to see how this becomes quite a big operation.

But what does any of this mean for the work of market research? When thinking about the role of taggers, I started to think that in some ways they were no different from quantitative research participants (Note: it’s 2023, can we stop calling them respondents!). When completing surveys, we’re essentially asking our participants (whether from panels or from customer lists) to ‘tag’ their preferences and beliefs, in response to the questions we pose to them.

As in the world of football analytics, much has been done to automate quality assurance procedures in research. Panels, in particular, have led the way in this area, starting with simple ‘digital fingerprinting’ using IP addresses, through to giving participants a ‘quality’ score based on their answers to surveys over a period of time. However, as in the football analytics world, there remains an important role for a human quality assessor to avoid the ‘rubbish in, rubbish out’ problem that statisticians in particular warn about.

In football circles, this means that the work of each tagger is checked before the data is released – and this is still true today, where data in live games is provided in near real-time. The same holds in the world of research: flat-liners, speeders, potential bots and those who are simply not engaged in the task of survey completion need to be identified, so that the data we use for our analysis is as complete and accurate as possible.
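The sorts of checks described above can be sketched in a few lines. This is a hedged illustration only: the function name, the data shapes and the thresholds (a minimum completion time, a median-based speed cut-off, a straight-lined rating grid) are my assumptions for the sketch, not a description of any particular panel’s production rules.

```python
from statistics import median

def flag_suspect_responses(completion_seconds, grid_answers, min_seconds=120):
    """Flag survey completes that warrant human review.

    completion_seconds: participant id -> total completion time in seconds.
    grid_answers: participant id -> list of answers to one rating grid.
    Thresholds and field names are illustrative, not a production spec.
    """
    flagged = {}
    # A common rule of thumb: anyone far quicker than the typical complete.
    speed_cutoff = max(min_seconds, median(completion_seconds.values()) / 3)
    for pid, seconds in completion_seconds.items():
        reasons = []
        if seconds < speed_cutoff:
            reasons.append("speeder")      # finished implausibly fast
        answers = grid_answers.get(pid, [])
        if len(answers) >= 5 and len(set(answers)) == 1:
            reasons.append("flat-liner")   # same answer down a whole grid
        if reasons:
            flagged[pid] = reasons
    return flagged

flags = flag_suspect_responses(
    {"p1": 600, "p2": 45, "p3": 540},
    {"p1": [3, 4, 2, 5, 3], "p2": [3, 3, 3, 3, 3], "p3": [4, 4, 3, 5, 2]},
)
```

Note that the output is a list of reasons per participant rather than an automatic rejection – mirroring the point above that these flags are the start of a human review, not the end of it.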

What this isn’t is a rejection of all things AI or tech-based – we know that the work going on behind the scenes with some of our partners is doing a lot of good in reducing the amount of noise in the data we collect. Rather, it’s a rallying call for the human element in data quality, ensuring we can deliver robust insights to our clients and make the difference for their businesses. And with that, I’m going back to read about ‘Positions of Maximum Opportunity’.