For a researcher, extracting insights from qualitative, guided interactions is an intensive and time-consuming process. Even when working from notes instead of full transcripts, the sheer volume of text to go through is enormous. You need to extract not only the obvious themes but also the underlying subtext, subtleties and nuances of the data. And there is no real alternative to reading, and reading again, is there? This is how most of us started out as qualitative researchers.
Manual Coding
Let’s discuss how we all probably used to code earlier in our careers.
- We engaged with the data: Listened to the initial interactions, made our notes
- We created a structure of analysis: Created a code-list from the discussion guide (DG), notes and transcripts
- We identified themes: Grouped codes into themes, to reveal underlying patterns
- We iterated: Marked transcripts and refined codes/themes, dug deeper into the data
- We sorted the data: Copied analysis into Excel, created tables and maybe did some math
- We summarized our findings: Various cuts by cohort, some numbers and illustrative verbatims to support findings
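The tail end of that workflow — grouping codes into themes and tabulating by cohort — can be sketched in a few lines of Python. This is a toy illustration only: the cohorts, codes and codeframe below are invented, and real analysis obviously involves far messier data.

```python
from collections import Counter, defaultdict

# Hypothetical coded excerpts: (respondent cohort, code applied to a passage).
coded_excerpts = [
    ("urban", "price_sensitivity"),
    ("urban", "brand_trust"),
    ("rural", "price_sensitivity"),
    ("rural", "availability"),
    ("urban", "price_sensitivity"),
]

# A simple codeframe: each code rolls up into a broader theme.
codeframe = {
    "price_sensitivity": "Value for money",
    "availability": "Access",
    "brand_trust": "Brand perception",
}

# Group codes into themes and cross-tabulate counts by cohort,
# mimicking the "copy into Excel and sort" step of the manual workflow.
table = defaultdict(Counter)
for cohort, code in coded_excerpts:
    table[codeframe[code]][cohort] += 1

for theme, counts in sorted(table.items()):
    print(theme, dict(counts))
```

The point of the sketch is the shape of the task, not the tooling: a codeframe is just a mapping from granular codes to themes, and the "various cuts by cohort" are cross-tabulations over it.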
There are definite benefits of this manual effort:
- Hands-on and in-depth: You develop a connection with the data, and are able to explore the depth of the conversations, rather than just the volume.
- Contextual Nuances: By directly engaging with the data, you're better equipped to capture context, emotions and subtle nuances.
- Adaptability: You can adapt your analysis to unexpected discoveries and findings quickly.
But there are also challenges.
- Time-Consuming: It demands significant time and effort.
- Subjective: Researcher bias can inadvertently influence the coding process.
- Scalability: For large datasets, manual coding becomes laborious and less efficient.
Over time, all of us have developed our own methods and hacks to tackle these challenges. However, I'm usually anxious about text overload or missing some key information, which leads me to resort to still more hacks: asking an intern to copy data to Excel and sort it, marking transcripts for completeness, visualizing cohorts in tabular form. Eventually, third-party firms cropped up to help me do just that, at scale.
As technology became more and more democratized, I began to closely witness (and explore) the development of automated tools, turning into an early adopter. Let's take a look at how this has changed my world.
Automated coding
It should be obvious that technology can help minimize some, if not all, of the challenges that manual effort brings to the researcher's job. My view of how coding evolved thanks to technology is an attempt to summarize what my colleagues and I witnessed over the past two decades:
- MS Office: Word and Excel were our go-to’s. Transcripts were digitized, and tools within these applications allowed us to annotate, tabulate, comment and track changes. Even now, they remain common for exchanging information and analysis.
- Open-ended coding tools like Ascribe and Clarabridge: Though most industry-standard tools have since been acquired by large enterprises, these went a step beyond MS Office. You could build and iterate codeframes more easily, and many individuals could collaborate on the same dataset. Over time, these tools also gained the ability to automatically tag codes and text.
- CAQDAS like NVivo, Delve and Atlas.ti: Computer-Assisted Qualitative Data Analysis Software could perform more complex tasks and reduce manual effort significantly, though as these tools evolved they also grew more complex themselves.
- Machine Learning and AI: This is where NLP, complex algorithms (LLMs, for one), self-learning systems and hardware came together at scale to take on more complex analysis and reporting tasks. Today, ChatGPT, DALL·E and Midjourney have been adopted quickly, at unbelievable scale. Many CAQDAS tools now leverage these in various ways, and the approach promises to revolutionize the way qualitative research is analyzed.
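At its simplest, the automatic tagging these tools perform amounts to mapping passages of text onto a codeframe. The sketch below uses naive keyword matching purely to make the idea concrete; the codes and keywords are invented, and real products use NLP models or LLMs rather than keyword lists.

```python
import re

# Toy codeframe mapping codes to trigger keywords (all invented for
# illustration). Keywords match as prefixes, so "afford" also catches
# "affordable".
codeframe = {
    "price_sensitivity": ["expensive", "cheap", "afford", "cost"],
    "convenience": ["easy", "quick", "nearby"],
}

def auto_code(passage: str) -> list[str]:
    """Return the codes whose keywords appear in the passage."""
    text = passage.lower()
    hits = []
    for code, keywords in codeframe.items():
        if any(re.search(r"\b" + re.escape(kw), text) for kw in keywords):
            hits.append(code)
    return hits

print(auto_code("It was quick to buy, but honestly too expensive for me."))
# → ['price_sensitivity', 'convenience']
```

A keyword matcher like this makes the "context-starved" complaint below easy to see: it tags words, not meaning, which is exactly the gap that NLP- and LLM-based tagging tries to close.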
What have we learnt so far?
Curious qualitative researchers have experimented with all of the above in some shape or form, sometimes successfully, sometimes with frustration. Here's a distilled summary of the community's experiences that helps show where Automated Coding stands today:
What works...
- Speed and Efficiency: Automated coding significantly accelerates the process, making it ideal for time-sensitive research studies.
- Consistency: Automated methods reduce the potential for human bias, ensuring consistent results even when studies change hands often.
- Scalability: Handling large datasets has indeed become more manageable with automation.
And what doesn't (yet)...
- Feature overload: Some of these tools offer a truckload of features, many of which force you to change your established processes. This surfeit also means you often end up paying for capabilities you do not really need.
- Tough to adapt to: There is almost always a steep learning curve involved.
- The tech trap: Once you start using a tool, you are stuck with its algorithms and shortcomings too.
- Context-starved: While tools are getting better at this, capturing nuances and subtleties remains a significant challenge.
- Way too tech: Machines cannot think like a researcher; at least, not yet.
Also, a few ethical conundrums to bear in mind...
- Data privacy
- Developer bias
- Consent and transparency
- Environmental impact
In conclusion
Our fundamental desire to generate better, faster and more actionable insights will require us to stay aware of new technologies, and to know where we must draw the boundaries between human and artificial intelligence. Books and movies have excited us about the potential of AI, as well as its downsides. The debates will rage on for at least some years to come.
As for me, I am excited about the evolving role of the qualitative researcher in today’s world. Watch us shape how Generative AI helps qualitative researchers stay relevant and on top of their game.