Experiences from Integrating Large Language Model Chatbots into the Classroom

2406.04817

Published 6/10/2024 by Arto Hellas, Juho Leinonen, Leo Leppanen

Abstract

In the present study, we provided students an unfiltered access to a state-of-the-art large language model (LLM) chatbot. The chatbot was intentionally designed to mimic proprietary commercial chatbots such as ChatGPT where the chatbot has not been tailored for the educational context; the underlying engine was OpenAI GPT-4. The chatbot was integrated into online learning materials of three courses. One of the courses focused on software engineering with LLMs, while the two other courses were not directly related to LLMs. Our results suggest that only a minority of students engage with the chatbot in the courses that do not relate to LLMs. At the same time, unsurprisingly, nearly all students in the LLM-focused course leveraged the chatbot. In all courses, the majority of the LLM usage came from a few superusers, whereas the majority of the students did not heavily use the chatbot even though it was readily available and effectively provided a free access to the OpenAI GPT-4 model. We also observe that in addition to students using the chatbot for course-specific purposes, many use the chatbot for their own purposes. These results suggest that the worst fears of educators -- all students overrelying on LLMs -- did not materialize even when the chatbot access was unfiltered. We finally discuss potential reasons for the low usage, suggesting the need for more tailored and scaffolded LLM experiences targeted for specific types of student use cases.

Overview

Researchers provided students with unrestricted access to a state-of-the-art large language model (LLM) chatbot
The chatbot was designed to mimic commercial chatbots like ChatGPT
The chatbot was integrated into online learning materials across three courses
One course focused on software engineering with LLMs, while the other two were not directly related to LLMs

Plain English Explanation

The researchers in this study gave students free and open access to a powerful AI chatbot, similar to ChatGPT. They integrated this chatbot into the online course materials for three different classes. One of the classes specifically covered the topic of using large language models (LLMs) like this chatbot for software engineering tasks. The other two classes were not directly related to LLMs.

The researchers found that in the classes not focused on LLMs, only a small number of students actively used the chatbot. However, in the class about using LLMs for software engineering, almost all the students took advantage of the chatbot. Interestingly, the usage in all the classes was dominated by a few "superusers", while the majority of students did not heavily rely on the chatbot, even though it was freely available.

The researchers also observed that students used the chatbot not just for course-related purposes, but also for their own personal tasks and interests. Taken together, the results suggest that the worst fear of educators, that all students would over-rely on the chatbot and stop learning on their own, did not materialize in this study, even though access was unrestricted.

The researchers discuss possible reasons for the low usage and suggest that students may need more tailored and scaffolded experiences with these powerful AI chatbots, targeted at specific student use cases.

Technical Explanation

The researchers set up an experiment where they provided students in three different courses with unrestricted access to a state-of-the-art large language model (LLM) chatbot. The chatbot was designed to mimic the functionality of commercial chatbots like ChatGPT, with the underlying engine being OpenAI's GPT-4.

This chatbot was integrated into the online learning materials for the three courses. One of the courses focused on the use of LLMs in software engineering, while the other two courses were not directly related to LLMs.

The researchers observed that in the courses not focused on LLMs, only a minority of students actively engaged with the chatbot. In contrast, nearly all students in the LLM-focused course leveraged the chatbot. Across all courses, the majority of the chatbot usage came from a small number of "superusers", while most students did not heavily rely on the chatbot, despite its availability.
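One way to make the "superuser" observation concrete is to measure how much of the total chatbot traffic comes from the most active users. The following is a minimal sketch under stated assumptions, not the researchers' analysis: the function name `usage_concentration`, the `top_fraction` parameter, and the message log are all hypothetical and the data is invented purely for illustration.

```python
from collections import Counter


def usage_concentration(message_log, top_fraction=0.1):
    """Return the share of all messages sent by the most active users.

    message_log: iterable with one user ID per chatbot message (hypothetical format).
    top_fraction: fraction of users to treat as the most active group.
    """
    counts = Counter(message_log)
    if not counts:
        return 0.0
    # Rank users by message count, most active first.
    ranked = sorted(counts.values(), reverse=True)
    # Always include at least one user in the top group.
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Invented example log: 8 students, 100 messages in total.
log = (["s1"] * 60 + ["s2"] * 25 + ["s3"] * 5 + ["s4"] * 4
       + ["s5"] * 3 + ["s6", "s7", "s8"])

share = usage_concentration(log, top_fraction=0.25)
print(f"Top 25% of users sent {share:.0%} of messages")
# prints "Top 25% of users sent 85% of messages"
```

A share close to 1.0 for a small `top_fraction` would correspond to the pattern described above, where a few users account for most of the usage while most students use the chatbot lightly.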

The researchers also found that students used the chatbot not only for course-specific purposes, but also for their own personal tasks and interests.

Critical Analysis

The researchers acknowledge that the low usage of the chatbot in the non-LLM-focused courses may indicate a need for more tailored and scaffolded experiences with these powerful AI systems. Related work, such as "Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study" and "Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness", has similarly highlighted the importance of carefully designing and evaluating LLM-based tools so that they are effective for their intended users.

Additionally, the researchers note that the study was conducted in a specific context, and the results may not generalize to all educational settings. "A Perspective Study on Chinese Social Media regarding LLM for Education and Beyond" has shown that cultural and social factors can influence how LLMs are perceived and adopted in different regions.

Further research may be needed to explore the long-term implications of providing unrestricted access to powerful LLM chatbots in educational settings, and to investigate more effective ways of integrating these technologies to support student learning and growth.

Conclusion

This study provides valuable insights into how students engage with a state-of-the-art LLM chatbot when it is made freely available in their courses. The findings suggest that while a small number of students enthusiastically utilize the chatbot, the majority do not heavily rely on it, even in courses where it is directly relevant. This challenges the concern that students will completely depend on LLMs and stop learning on their own.

The researchers highlight the need for more tailored and scaffolded experiences with LLM chatbots in educational contexts, to ensure they are used effectively and in alignment with learning objectives. As "ChatGPT is Here to Help, Not to" and "Beyond Code Generation: An Observational Study of ChatGPT Usage" have shown, the integration of these powerful AI systems into education requires careful consideration and design.

Related Papers

CS1-LLM: Integrating LLMs into CS1 Instruction

Annapurna Vadaparty, Daniel Zingaro, David H. Smith IV, Mounika Padala, Christine Alvarado, Jamie Gorson Benario, Leo Porter

The recent, widespread availability of Large Language Models (LLMs) like ChatGPT and GitHub Copilot may impact introductory programming courses (CS1) both in terms of what should be taught and how to teach it. Indeed, recent research has shown that LLMs are capable of solving the majority of the assignments and exams we previously used in CS1. In addition, professional software engineers are often using these tools, raising the question of whether we should be training our students in their use as well. This experience report describes a CS1 course at a large research-intensive university that fully embraces the use of LLMs from the beginning of the course. To incorporate the LLMs, the course was intentionally altered to reduce emphasis on syntax and writing code from scratch. Instead, the course now emphasizes skills needed to successfully produce software with an LLM. This includes explaining code, testing code, and decomposing large problems into small functions that are solvable by an LLM. In addition to frequent, formative assessments of these skills, students were given three large, open-ended projects in three separate domains (data science, image processing, and game design) that allowed them to showcase their creativity in topics of their choosing. In an end-of-term survey, students reported that they appreciated learning with the assistance of the LLM and that they interacted with the LLM in a variety of ways when writing code. We provide lessons learned for instructors who may wish to incorporate LLMs into their course.

6/26/2024

cs.CY cs.SE

Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

Wenhan Lyu, Yimeng Wang, Tingting (Rachel) Chung, Yifan Sun, Yixuan Zhang

The integration of AI assistants, especially through the development of Large Language Models (LLMs), into computer science education has sparked significant debate. An emerging body of work has looked into using LLMs in education, but few have examined the impacts of LLMs on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this research gap, we conducted a semester-long, between-subjects study with 50 students using CodeTutor, an LLM-powered assistant developed by our research team. Our study results show that students who used CodeTutor (the experimental group) achieved statistically significant improvements in their final scores compared to peers who did not use the tool (the control group). Within the experimental group, those without prior experience with LLM-powered tools demonstrated significantly greater performance gain than their counterparts. We also found that students expressed positive feedback regarding CodeTutor's capability, though they also had concerns about CodeTutor's limited role in developing critical thinking skills. Over the semester, students' agreement with CodeTutor's suggestions decreased, with a growing preference for support from traditional human teaching assistants. Our analysis further reveals that the quality of user prompts was significantly correlated with CodeTutor's response effectiveness. Building upon our results, we discuss the implications of our findings for integrating Generative AI literacy into curricula to foster critical thinking skills and turn to examining the temporal dynamics of user engagement with LLM-powered tools. We further discuss the discrepancy between the anticipated functions of tools and students' actual capabilities, which sheds light on the need for tailored strategies to improve educational outcomes.

5/6/2024

cs.HC

A Perspective Study on Chinese Social Media regarding LLM for Education and Beyond

Yao Tian, Chengwei Tong, Lik-Hang Lee, Reza Hadi Mogavi, Yong Liao, Pengyuan Zhou

The application of AI-powered tools has piqued the interest of many fields, particularly in the academic community. This study uses ChatGPT, currently the most powerful and popular AI tool, as a representative example to analyze how the Chinese public perceives the potential of large language models (LLMs) for educational and general purposes. Although facing accessibility challenges, we found that the number of discussions on ChatGPT per month is 16 times that of Ernie Bot developed by Baidu, the most popular alternative product to ChatGPT in the mainland, making ChatGPT a more suitable subject for our analysis. The study also serves as the first effort to investigate the changes in public opinion as AI technologies become more advanced and intelligent. The analysis reveals that, upon first encounters with advanced AI that was not yet highly capable, some social media users believed that AI advancements would benefit education and society, while others feared that advanced AI, like ChatGPT, would make humans feel inferior and lead to problems such as cheating and a decline in moral principles. The majority of users remained neutral. Interestingly, with the rapid development and improvement of AI capabilities, public attitudes have tended to shift in a positive direction. We present a thorough analysis of the trending shift and a roadmap to ensure the ethical application of ChatGPT-like models in education and beyond.

6/3/2024

cs.CY cs.HC

Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness

Samaneh Shafee, Alysson Bessani, Pedro M. Ferreira

Knowledge sharing about emerging threats is crucial in the rapidly advancing field of cybersecurity and forms the foundation of Cyber Threat Intelligence (CTI). In this context, Large Language Models are becoming increasingly significant in the field of cybersecurity, presenting a wide range of opportunities. This study surveys the performance of ChatGPT, GPT4all, Dolly, Stanford Alpaca, Alpaca-LoRA, Falcon, and Vicuna chatbots in binary classification and Named Entity Recognition (NER) tasks performed using Open Source INTelligence (OSINT). We utilize well-established data collected in previous research from Twitter to assess the competitiveness of these chatbots when compared to specialized models trained for those tasks. In binary classification experiments, Chatbot GPT-4 as a commercial model achieved an acceptable F1 score of 0.94, and the open-source GPT4all model achieved an F1 score of 0.90. However, concerning cybersecurity entity recognition, all evaluated chatbots have limitations and are less effective. This study demonstrates the capability of chatbots for OSINT binary classification and shows that they require further improvement in NER to effectively replace specially trained models. Our results shed light on the limitations of the LLM chatbots when compared to specialized models, and can help researchers improve chatbots technology with the objective to reduce the required effort to integrate machine learning in OSINT-based CTI tools.

4/22/2024

cs.CR cs.CL cs.LG