Good practice in market research: Coding

In the first of a new series of “how to” blogs, Anthony Shephard-Williams (director) sheds some light on an important, but sometimes disregarded, part of the researcher’s toolbox.

What is coding?

Coding in market research is the process of grouping together open-ended responses based on similar words, sentiments, or ideas. Some may refer to it as “sentiment analysis” – it’s essentially the same thing.

Many telephone surveys and online surveys include at least a few open-ended questions (where respondents can answer openly in any way they wish) and online communities are full of them!

Coding enables researchers to quantify data obtained from open-ended questions and identify the top prevailing themes for analysis purposes, usually via the development of a code-frame.

From an inquisitive researcher’s perspective, it gives a more colourful understanding of the opinions within the data. It also brings a closeness to the research respondents that can be lost when coding is outsourced or delegated as a “task” rather than treated as an important exploratory stage of the analysis process.

What does a code-frame look like?

To give you a feel for what a code frame looks like, I’ve created a really basic example using only n=20 verbatim comments.

The question here was ‘What brand or brands do you think this TV ad was for?’. Coding the verbatim responses into codes (themes) shows that the three brands respondents most often thought the ad was for were The Co-operative (30%), the BBC (20%) and Burtons (15%).
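To make the arithmetic behind a code frame concrete, here is a minimal Python sketch that tallies one code per respondent and reports each code's share of the sample. Only the counts for the top three brands follow from the percentages quoted above (6, 4 and 3 out of n=20). The remaining seven assignments, the labels "Brand D" and "Brand E", and the function name build_code_frame are all made up for illustration.

```python
from collections import Counter

# Hypothetical coded data: one code per respondent, n=20.
# The first three counts match the 30% / 20% / 15% in the example;
# the other seven assignments are invented purely for illustration.
coded_responses = (
    ["The Co-operative"] * 6
    + ["BBC"] * 4
    + ["Burtons"] * 3
    + ["Brand D (hypothetical)"] * 2
    + ["Brand E (hypothetical)"] * 2
    + ["Don't know"] * 2
    + ["Other"] * 1
)


def build_code_frame(codes):
    """Return (code, count, percent of respondents) rows, largest first."""
    total = len(codes)
    counts = Counter(codes)
    return [(code, n, round(100 * n / total, 1)) for code, n in counts.most_common()]


code_frame = build_code_frame(coded_responses)
for code, n, pct in code_frame:
    print(f"{code:<25} {n:>3} {pct:>5.1f}%")
```

If respondents can name more than one brand, the same tally can be run over a list of codes per respondent, with the percentages still based on the number of respondents rather than the number of codes.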

Surely there must be a robot / AI for this?

There are lots of software / tools available for researchers to use to analyse verbatim comments. But are they as good as humans? In a nutshell, NO! Written expression is extremely complex, and as such there will always be a place for humans in the coding process.

  • Robots / AI struggle to detect sentiment – It’s nearly impossible for an automated tool to “hear” sarcasm. So, if a respondent types “I just LOVE Company X” immediately after blasting them for poor customer service, you run the risk of a negative, sarcastic statement being interpreted as positive.
  • People don’t speak in key words – The world is a wide and varied place. Things like educational background, gender, ethnicity, language skills (and more) can mean that two respondents who are saying the same thing can choose very different words to say it. A skilled coder teases out this underlying meaning and makes sure it is captured.
  • A human can quickly raise the “data integrity” alarm when needed – Coders are often the first to spot a systemic problem with a respondent’s data. Timing tools and check boxes can only tell us so much, but a respondent who is typing contradictory nonsense into every ‘other specify’ box within a survey needs to be flagged for investigation. Coders can also easily spot if a respondent is saying something negative about how a question is worded or asked, or if they have completely misunderstood what is being asked of them.
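The first two points above can be illustrated with a deliberately naive keyword scorer. This is only a sketch: the word lists and the function name naive_sentiment are invented here, and real tools are more sophisticated than this, but it shows the kind of mistake a human coder would catch straight away.

```python
import re

# Tiny, made-up word lists (an assumption for this sketch only).
POSITIVE = {"love", "great", "helpful"}
NEGATIVE = {"poor", "rude", "slow"}


def naive_sentiment(text):
    """Score a comment by counting positive and negative key words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Sarcasm is taken at face value because "love" is a positive key word.
sarcastic = naive_sentiment("I just LOVE Company X")

# Means much the same as "slow", but uses none of the listed key words.
different_words = naive_sentiment("Bit sluggish getting back to me")

print(sarcastic, different_words)
```

A coder reading the same two comments in context would mark both as negative.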

Some tips for more efficient coding

As researchers we can sometimes be faced with an overwhelming number of verbatim comments (some of our projects run into thousands of respondents), so we wanted to share some tips to help make the experience of coding more efficient.

  • Sort your verbatims from A to Z before you get started – if you’ve got lots of comments related to a client being helpful, they’ll be more likely to be grouped together from the off. Typos can be fixed whilst you’re at it.
  • When coding, ALWAYS include the “respid” (the unique identifier for each respondent) – just in case you need to refer back to a response, or if you want to add coded data into cross tabulations. Also, pull across any profiling information you may need in columns close to the verbatim (e.g., gender / age) to more easily label verbatim comments you may wish to use in the report or debrief.
  • If it’s a ‘why do you say this’ question following on from a rating question, use the rating question to split into positive / negative / neutral – you don’t always need to code all three sentiments.
  • Always have an ‘other’ code – If you are unsure of how to describe a response or where it fits in your code frame, code it as ‘OTHER’ and then revisit all of your ‘OTHERS’ at the end to see if they fit somewhere, or decide whether there are enough similar responses to create a new code.
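For anyone who prepares their coding sheet with a script rather than a spreadsheet, the tips above can be sketched roughly as follows. Everything here is hypothetical: the sample responses, the field names, the 1 to 5 rating scale and its cut-offs, and the code labels. The codes themselves are still assigned by a person; the script only handles the housekeeping.

```python
from dataclasses import dataclass


@dataclass
class Response:
    respid: int            # unique respondent identifier, always kept
    gender: str            # profiling info kept next to the verbatim
    age: int
    rating: int            # hypothetical 1-5 rating asked before the open end
    verbatim: str
    code: str = "OTHER"    # stays OTHER until a coder assigns something


def sort_a_to_z(responses):
    """Sort verbatims A to Z (ignoring case) so similar comments sit together."""
    return sorted(responses, key=lambda r: r.verbatim.strip().lower())


def sentiment_bucket(rating):
    """Split by the rating question; the cut-offs assume a 1-5 scale."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def others_to_review(responses):
    """List the respids still coded OTHER, to revisit at the end."""
    return [r.respid for r in responses if r.code == "OTHER"]


# Made-up sample data for illustration only.
responses = [
    Response(101, "F", 34, 5, "Very helpful staff"),
    Response(102, "M", 51, 2, "Helpful but slow to reply"),
    Response(103, "F", 27, 4, "helpful and friendly"),
    Response(104, "M", 45, 3, "Not sure"),
]

ordered = sort_a_to_z(responses)

# Codes a human coder might assign; respid 104 is left as OTHER.
manual_codes = {101: "Helpful staff", 103: "Helpful staff", 102: "Slow response"}
for r in ordered:
    r.code = manual_codes.get(r.respid, r.code)

for r in ordered:
    print(r.respid, r.gender, r.age, sentiment_bucket(r.rating), r.code, r.verbatim)
print("Still to review:", others_to_review(ordered))
```

Keeping the respid on every row is what makes it possible to merge the finished codes back into the main data file for cross tabulations.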

The best advice I can give, however, is to enjoy it! By immersing yourself in the data you will also become a much better story-teller.

Lastly, here are a couple of comments from my Mustard colleagues on why they love to roll their sleeves up and code data:

“Coding brings you a lot closer to the data. Sometimes you can feel overwhelmed with numbers, figures and % in quantitative research. Just by coding open-ends you can immediately get a clearer picture of the insights.”

“I love coding! It really helps me to put myself in our respondent’s mind and to really get to grips with what they are saying. Also, it is extremely useful in piecing together the insights and helps enormously with storytelling.”

If you have any questions about coding or any other research requirement, we’d love to hear from you. Feel free to get in touch – anthony-shephard-williams@mustard-research.com.