How to analyse data from open-ended questions

Analysing data from closed questions - where people pick an answer from a menu of options - is easy. The answers are already sorted into categories. At the most basic level all you do is count the number of people who selected each option, and report summary statistics – totals, averages, ranges etc. There are many excellent books and YouTube videos that provide introductions to basic statistical analysis.

Likewise for observational techniques, most of the data analysis involves counting the number of times different behaviours occurred. For more on this, see 'How to analyse data from observations'.

This guide describes how to analyse data collected using open-ended questions - where people answer using their own words. Compared to closed questions this data is a little trickier and more time-consuming to analyse.

A five-step guide to analysing data from open-ended questions

You cannot just dump a transcript of everyone’s answers into a Word document and call it a report. It would be impossible to read, and even if someone tried it would literally take them days to draw any meaningful conclusions from the raw data.

If you are short of time and cannot analyse every answer you can take a random sample of responses and just analyse those. For example, if you collected 700 answers to an open-ended question, then a random selection of 100 should give you an accurate enough picture of what people are saying.
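
If your answers are already typed up, the random selection can be done in a few lines of Python. This is only a minimal sketch: the function name, the fixed seed and the made-up list of 700 answers are assumptions for illustration, not part of any particular survey tool.

```python
import random


def sample_responses(responses, sample_size=100, seed=42):
    """Return a reproducible random sample of responses, without repeats."""
    responses = list(responses)
    if len(responses) <= sample_size:
        # Fewer answers than the sample size: analyse them all.
        return responses
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(responses, sample_size)


# Hypothetical data: 700 answers, each paired with a response number.
all_answers = [(n, f"answer text {n}") for n in range(1, 701)]
subset = sample_responses(all_answers)
print(len(subset))  # 100
```

Fixing the seed means you can recreate exactly the same sample later if someone queries your findings.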

Step 1 - numbering

To start, assign a number to each questionnaire or interview. That way if there are any queries about a particular answer you can go back and check the original data. This also allows you to compare responses to different open-ended questions e.g. what people liked, compared to what they feel they gained from the experience or to their expectations.

Step 2 - data cleaning

Before you do any analysis, you need to do a couple of bits of housework. Quickly go through the data and:

Move any answers that do not match the question to another, more appropriate, question. This sounds like an odd thing to do but it is surprisingly common for people to give an answer to a question you have not yet asked, or one you asked previously. If you asked what they like about the workshop, it makes absolutely no sense to try and categorise an answer they gave about why they decided to visit the science festival.
Remove any questionnaires or interviews where the person has clearly decided not to take part – for example they have only given silly answers, or have left most of the questions blank. It is fine to bin these.

Sometimes a person gives up part way through an interview or questionnaire. It is up to you to decide whether they have answered enough questions to still include them in the sample.
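
If the data is in electronic form, a short script can flag the questionnaires that are mostly blank so you can decide what to do with them. The sketch below assumes each questionnaire is stored under its number (from Step 1) as a dictionary of question names and answers. The 50% cut-off and the question names are illustrative assumptions; the right threshold is your call.

```python
def answered_fraction(answers):
    """Share of questions that have a non-blank answer."""
    if not answers:
        return 0.0
    filled = sum(1 for text in answers.values() if text and text.strip())
    return filled / len(answers)


def split_by_completeness(questionnaires, min_fraction=0.5):
    """Separate questionnaires to keep from those to review by hand."""
    keep, review = {}, {}
    for number, answers in questionnaires.items():
        target = keep if answered_fraction(answers) >= min_fraction else review
        target[number] = answers
    return keep, review


# Hypothetical questionnaires, keyed by the number assigned in Step 1.
questionnaires = {
    1: {"liked": "The experiments", "gained": "Learned about magnets", "expected": "More hands-on"},
    2: {"liked": "", "gained": "  ", "expected": "fun"},
    3: {"liked": "Friendly staff", "gained": "", "expected": "Did not know what to expect"},
}
keep, review = split_by_completeness(questionnaires)
print(sorted(keep), sorted(review))  # [1, 3] [2]
```

The script only flags candidates. Spotting silly answers, and deciding whether a part-finished questionnaire stays in the sample, still needs a person to read them.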

Step 3 - define categories of answer

Once you have cleaned up the data, you can start defining the categories of answers for the open-ended questions.

For each open-ended question, take a random sample of 10 to 20 answers and sort them into categories. You can start with as many categories as you like but the aim is to gradually merge them together until you have no more than 5 or 6. This should include a ‘miscellaneous’ category for answers that just do not fit anywhere.

NB the miscellaneous category should contain no more than 10% of the responses. If it contains more, either something has gone wrong with the question or there is an answer category in there that you need to fish out.
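
The 10% check is simple arithmetic: divide the number of miscellaneous answers by the total number of answers. As a rough sketch, assuming you have a list with one category label per answer (the labels here are invented for illustration):

```python
from collections import Counter


def category_shares(coded_answers):
    """Return each category's share of all coded answers (0.0 to 1.0)."""
    counts = Counter(coded_answers)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def miscellaneous_too_large(coded_answers, label="miscellaneous", limit=0.10):
    """True if the miscellaneous category holds more than `limit` of answers."""
    return category_shares(coded_answers).get(label, 0.0) > limit


# Hypothetical coding of 20 answers: 4 of 20 (20%) are miscellaneous.
coded = ["hands-on"] * 9 + ["staff"] * 5 + ["learning"] * 2 + ["miscellaneous"] * 4
print(miscellaneous_too_large(coded))  # True
```

A result of True is a prompt to reread the miscellaneous answers and look for a category hiding among them.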

Once you have identified your answer categories, you need to write a short description of each of them, with a couple of example answers. These descriptions will help you sort the rest of the data, ensuring you are consistent in your choice of category. It will also help when it comes to writing the report.

You will need to repeat this process for each of the open-ended questions. Some will be easier to categorise than others, depending on the range of answers people gave.

Step 4 - sorting people's answers into categories

Now that you have your list of answer categories for each of the open-ended questions, you can sort the entire data set into them. A few things to note:

People’s answers very often fall into two or more categories. That is fine: you can assign the relevant parts of the answer to different categories.
As you work your way through the data you may find some categories contain very few responses. Consider merging these almost empty categories into other closely related ones, or put these answers into the miscellaneous category.
If you merge some categories, you will need to adjust the definition of the new, larger category.
Sometimes a category turns out to be too broad, and ends up containing too wide a range of answers. In this case, you need to split it into two or more separate categories. You will need to write definitions for each of the new categories.
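
If you are recording the categories electronically, merging is just a matter of relabelling. The sketch below assumes each person's answer is stored as a set of category labels, which also copes with answers that fall into more than one category. The function and the labels are hypothetical.

```python
def merge_categories(coded_responses, old_labels, new_label):
    """Replace every label in `old_labels` with `new_label`."""
    old_labels = set(old_labels)
    return [
        {new_label if label in old_labels else label for label in labels}
        for labels in coded_responses
    ]


# Hypothetical coding: one set of labels per respondent.
coded = [{"staff", "venue"}, {"queues"}, {"hands-on"}]
merged = merge_categories(coded, {"venue", "queues"}, "logistics")
print(merged)
```

Splitting a category that is too broad cannot be automated in the same way: you need to reread the answers in it and assign each one to the new, narrower categories.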

There is no need to be super-precise about assigning answers to categories. There will always be some that could fall into two different categories. Do not spend too long agonising over these; just put them into the category that feels like the right one.

As you are sorting the data into categories it is worth taking a note of any responses that would make good illustrative quotes. These will add some personality and authenticity to your report. Avoid choosing very bland comments e.g. ‘It was good’, overly lengthy responses, or ones that aren’t representative of the rest of the answers.

Step 5 - looking for patterns

Once you have sorted the answers into categories, you then need to step back and look at the patterns that are emerging.

It might help to consider the following:

What range of opinions did you get for each open-ended question?
How frequently did the different categories of response occur? For qualitative data: was it nearly everyone, about a third of the people, or only a couple of individuals? If it is quantitative data (i.e. a reasonably large sample of 60 or more), you can quote percentage figures for each category of answer.
What responses were you expecting to get but which were absent or rare?
Were there different patterns of response from different types of visitors e.g. adults vs children, first time visitors vs. repeat visitors?
What seem to be the underlying reasons for people’s thoughts and feelings? These can be deduced from the words used in response to the question, or from how they answered other questions
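
The frequency and visitor-type questions above can be answered with a simple tally. The sketch below is one possible approach, not a prescribed method: it assumes each person's answer is stored as a set of category labels, quotes percentages only when the sample has 60 or more people, and otherwise reports plain counts. The function names, labels and visitor groups are all invented for illustration.

```python
from collections import Counter, defaultdict


def summarise(coded_responses, min_for_percentages=60):
    """Count how many people gave an answer in each category."""
    n = len(coded_responses)
    counts = Counter(label for labels in coded_responses for label in labels)
    summary = {}
    for category, count in counts.most_common():
        if n >= min_for_percentages:
            summary[category] = f"{round(100 * count / n)}%"
        else:
            summary[category] = f"{count} of {n}"  # small sample: plain counts
    return summary


def summarise_by_group(records, min_for_percentages=60):
    """Summarise separately for each visitor type, e.g. adults vs children."""
    groups = defaultdict(list)
    for group, labels in records:
        groups[group].append(labels)
    return {g: summarise(r, min_for_percentages) for g, r in groups.items()}


small_sample = [{"hands-on", "staff"}, {"hands-on"}, {"learning"}]
print(summarise(small_sample))

large_sample = [{"hands-on"}] * 45 + [{"staff"}] * 15
print(summarise(large_sample))

records = [("adult", {"staff"}), ("child", {"hands-on"}), ("child", {"hands-on", "staff"})]
print(summarise_by_group(records))
```

Because one answer can fall into several categories, the percentages may add up to more than 100%.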

Using quotes

When you come to writing the report, you can use the example quotes you have identified to illustrate your findings. One or two quotes for each point is usually enough. More than that and the report becomes difficult to read. It is fine to correct spelling mistakes so that quotes are clear and easy to read.

You should also delete “umm”s and “err”s. If there is a lot of irrelevant detail in the quote (e.g. they deviated into a different topic or repeated what they previously said), it is fine to delete that text and replace it with ‘...’. The most important thing is to ensure you have preserved the original meaning of the quote.

Analysing focus group data

Analysing data from focus groups involves much the same process. You start by identifying 5 or 6 answer categories for each of the main questions in your topic guide. You then go through the transcript or the digital recording, allocating participants’ answers to the appropriate category.

Focus group discussions are much more free flowing than surveys and interviews. The discussion often returns to questions posed earlier, and covers questions the moderator has yet to ask. So it is especially important to allocate people’s answers to the most appropriate question in the topic guide.

You may also have asked people to fill in bubble cartoons or mind maps during the focus group. If so, you can analyse this data using the same five-step process outlined above.