Automating Thematic Analysis: How LLMs Analyse Controversial Topics

arXiv:2405.06919

Published 5/14/2024 by Awais Hameed Khan, Hiruni Kegalle, Rhea D'Silva, Ned Watt, Daniel Whelan-Shamy, Lida Ghahremanlou, Liam Magee

cs.CY cs.CL

Abstract

Large Language Models (LLMs) are promising analytical tools. They can augment human epistemic, cognitive and reasoning abilities, and support 'sensemaking', making sense of a complex environment or subject by analysing large volumes of data with a sensitivity to context and nuance absent in earlier text processing systems. This paper presents a pilot experiment that explores how LLMs can support thematic analysis of controversial topics. We compare how human researchers and two LLMs, GPT-4 and Llama 2, categorise excerpts from media coverage of the controversial Australian Robodebt scandal. Our findings highlight intriguing overlaps and variances in thematic categorisation between human and machine agents, and suggest where LLMs can be effective in supporting forms of discourse and thematic analysis. We argue LLMs should be used to augment, and not replace, human interpretation, and we add further methodological insights and reflections to existing research on the application of automation to qualitative research methods. We also introduce a novel card-based design toolkit, for both researchers and practitioners to further interrogate LLMs as analytical tools.

Overview

This paper explores how large language models (LLMs) can support thematic analysis of controversial topics.
The researchers compare how human researchers and two LLMs (GPT-4 and Llama 2) categorize excerpts from media coverage of the controversial Australian Robodebt scandal.
The findings highlight similarities and differences in thematic categorization between human and machine agents, and suggest where LLMs can be effective in supporting forms of discourse and thematic analysis.

Plain English Explanation

Large language models (LLMs) are AI systems that can process and analyze large volumes of text. The paper examines how LLMs can act as "research assistants" that help humans make sense of complex topics and environments.

In this study, the researchers wanted to see how well LLMs could categorize and analyze media coverage of a controversial political issue: the Australian Robodebt scandal. They compared how human researchers and two LLMs (GPT-4 and Llama 2) categorized excerpts from news articles about this topic.

The results showed some interesting overlaps and differences between how humans and machines analyzed the content. This suggests that LLMs can be a useful tool to support thematic analysis of complex topics, but they should be used to complement and enhance human interpretation, not completely replace it.

The researchers also introduced a new "card-based design toolkit" that could help researchers and practitioners further explore how to effectively use LLMs as analytical tools.

Technical Explanation

The researchers conducted a pilot experiment to explore how LLMs can support thematic analysis of controversial topics. They compared how human researchers and two LLMs, GPT-4 and Llama 2, thematically categorized excerpts from media coverage of the Australian Robodebt scandal.

The researchers first identified key themes and categories related to the Robodebt scandal through a review of the literature. They then selected 30 excerpts from news articles and asked the human researchers and LLMs to categorize each excerpt according to the identified themes.
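The categorization step described above can be illustrated with a minimal sketch. It assumes a closed set of themes and a single-label answer per excerpt; the theme names, the prompt wording, and the helper names `build_prompt` and `parse_label` are all hypothetical and are not taken from the paper. The call to an actual model is deliberately left out.

```python
from typing import List, Optional

# Hypothetical theme labels, for illustration only. The themes used in the
# study are not listed in this summary.
THEMES: List[str] = ["Accountability", "Automation", "Human impact"]


def build_prompt(excerpt: str, themes: List[str]) -> str:
    """Build a closed-set classification prompt for one excerpt."""
    options = "\n".join(f"{i}. {t}" for i, t in enumerate(themes, start=1))
    return (
        "Assign the excerpt below to exactly one of these themes.\n"
        f"{options}\n\n"
        f"Excerpt: {excerpt}\n"
        "Answer with the theme name only."
    )


def parse_label(response: str, themes: List[str]) -> Optional[str]:
    """Map a free-text model reply to a known theme.

    Returns None when the reply names no theme or more than one, so that
    ambiguous answers can be passed to a human reviewer instead of guessed.
    """
    text = response.strip().lower()
    matches = [t for t in themes if t.lower() in text]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for unclear replies keeps a human in the loop for ambiguous cases, which fits the paper's position that LLMs should augment rather than replace human interpretation.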

The results showed both similarities and differences in how the human and machine agents categorized the excerpts. In some cases, the LLMs were able to accurately identify the key themes, while in others, there were discrepancies between human and machine interpretations.
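One common way to quantify this kind of overlap between two coders is raw percent agreement together with Cohen's kappa, which corrects for agreement expected by chance. The sketch below is an illustration under that assumption; this summary does not state which agreement measure, if any, the paper used, and the function names and labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of excerpts that both coders assigned to the same theme."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two coders."""
    n = len(a)
    p_o = percent_agreement(a, b)          # observed agreement
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: sum over labels of the product of marginal rates.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:                         # both coders used one identical label
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels, not data from the study.
human = ["A", "A", "B", "B"]
llm = ["A", "B", "B", "B"]
print(percent_agreement(human, llm), cohens_kappa(human, llm))
```

With only a few dozen excerpts, as in a pilot, such scores are best read alongside a qualitative look at where the coders disagree.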

The researchers argue that LLMs can be effective in supporting forms of discourse and thematic analysis, but they should be used to augment and not replace human interpretation. They also provide methodological insights and reflections on the application of automation to qualitative research methods.

Finally, the researchers introduce a novel "card-based design toolkit" that they believe can help both researchers and practitioners further interrogate LLMs as analytical tools.

Critical Analysis

The researchers acknowledge several caveats and limitations in their study. They note that the sample size was relatively small and that the study was a pilot experiment. The analysis also used only two LLMs (GPT-4 and Llama 2), so it would be valuable to expand the study to include other models.

The researchers also highlight the need for further research to better understand the strengths and weaknesses of LLMs in supporting thematic analysis and qualitative research methods. While the study suggests that LLMs can be a useful tool, there are still many open questions about how to effectively integrate them into research workflows and ensure that they complement and enhance, rather than replace, human interpretation and decision-making.

It would also be valuable to explore how the use of LLMs in thematic analysis might be influenced by factors such as the specific domain or topic being analyzed, the quality and diversity of the training data, and the transparency and interpretability of the LLM's decision-making processes.

Conclusion

This pilot study suggests that large language models (LLMs) can be a promising tool to support thematic analysis of complex and controversial topics. The researchers found both similarities and differences between how human researchers and LLMs categorized excerpts from media coverage of the Australian Robodebt scandal.

The findings indicate that LLMs can be effective in supporting forms of discourse and thematic analysis, but they should be used to complement and enhance human interpretation, rather than replace it entirely. The researchers also introduce a novel "card-based design toolkit" that could help researchers and practitioners further explore the use of LLMs as analytical tools.

Overall, this study adds to the growing body of research on the use of LLMs as "research assistants" and on their effectiveness as annotators, offering insights into how these systems can support and enhance human research and analysis.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Apprentices to Research Assistants: Advancing Research with Large Language Models

M. Namvarpour, A. Razi

Large Language Models (LLMs) have emerged as powerful tools in various research domains. This article examines their potential through a literature review and firsthand experimentation. While LLMs offer benefits like cost-effectiveness and efficiency, challenges such as prompt tuning, biases, and subjectivity must be addressed. The study presents insights from experiments utilizing LLMs for qualitative analysis, highlighting successes and limitations. Additionally, it discusses strategies for mitigating challenges, such as prompt optimization techniques and leveraging human expertise. This study aligns with the 'LLMs as Research Tools' workshop's focus on integrating LLMs into HCI data work critically and ethically. By addressing both opportunities and challenges, our work contributes to the ongoing dialogue on their responsible application in research.

4/10/2024

cs.HC cs.AI cs.LG

Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian

Stefano De Paoli

This paper proposes a test to perform Thematic Analysis (TA) with a Large Language Model (LLM) on data which is in a different language than English. While there has been initial promising work on using pre-trained LLMs for TA on data in English, we lack any tests on whether these models can reasonably perform the same analysis with good quality in other languages. In this paper a test will be proposed using an open access dataset of semi-structured interviews in Italian. The test shows that a pre-trained model can perform such a TA on the data, also using prompts in Italian. A comparative test shows the model's capacity to produce themes which have a good resemblance with those produced independently by human researchers. The main implication of this study is that pre-trained LLMs may thus be suitable to support analysis in multilingual situations, so long as the language is supported by the model used.

4/15/2024

cs.CL

Analyzing LLM Usage in an Advanced Computing Class in India

Chaitanya Arora, Utkarsh Venaik, Pavit Singh, Sahil Goyal, Jatin Tyagi, Shyama Goel, Ujjwal Singhal, Dhruv Kumar

This paper investigates the usage patterns of undergraduate and graduate students when engaging with large language models (LLMs) to tackle programming assignments in the context of advanced computing courses. Existing work predominantly focuses on the influence of LLMs in introductory programming contexts. Additionally, there is a scarcity of studies analyzing actual conversations between students and LLMs. Our study provides a comprehensive quantitative and qualitative analysis of raw interactions between students and LLMs within an advanced computing course (Distributed Systems) at an Indian University. We further complement this by conducting student interviews to gain deeper insights into their usage patterns. Our study shows that students make use of large language models (LLMs) in various ways: generating code or debugging code by identifying and fixing errors. They also copy and paste assignment descriptions into LLM interfaces for specific solutions, ask conceptual questions about complex programming ideas or theoretical concepts, and generate test cases to check code functionality and robustness. Our analysis includes over 4,000 prompts from 411 students and interviews with 10 students. Our analysis shows that LLMs excel at generating boilerplate code and assisting in debugging, while students handle the integration of components and system troubleshooting. This aligns with the learning objectives of advanced computing courses, which are oriented towards teaching students how to build systems and troubleshoot, with less emphasis on generating code from scratch. Therefore, LLM tools can be leveraged to increase student productivity, as shown by the data we collected. This study contributes to the ongoing discussion on LLM use in education, advocating for their usefulness in advanced computing courses to complement higher-level learning and productivity.

4/9/2024

cs.HC cs.CY

Exploring the Latest LLMs for Leaderboard Extraction

Salomon Kabongo, Jennifer D'Souza, Soren Auer

The rapid advancements in Large Language Models (LLMs) have opened new avenues for automating complex tasks in AI research. This paper investigates the efficacy of different LLMs (Mistral 7B, Llama-2, GPT-4-Turbo and GPT-4.o) in extracting leaderboard information from empirical AI research articles. We explore three types of contextual inputs to the models: DocTAET (Document Title, Abstract, Experimental Setup, and Tabular Information), DocREC (Results, Experiments, and Conclusions), and DocFULL (entire document). Our comprehensive study evaluates the performance of these models in generating (Task, Dataset, Metric, Score) quadruples from research papers. The findings reveal significant insights into the strengths and limitations of each model and context type, providing valuable guidance for future AI research automation efforts.

6/10/2024

cs.CL cs.AI