Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

2404.13414

Published 5/6/2024 by Wenhan Lyu, Yimeng Wang, Tingting (Rachel) Chung, Yifan Sun, Yixuan Zhang

Abstract

The integration of AI assistants, especially through the development of Large Language Models (LLMs), into computer science education has sparked significant debate. An emerging body of work has looked into using LLMs in education, but few have examined the impacts of LLMs on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this research gap, we conducted a semester-long, between-subjects study with 50 students using CodeTutor, an LLM-powered assistant developed by our research team. Our study results show that students who used CodeTutor (the experimental group) achieved statistically significant improvements in their final scores compared to peers who did not use the tool (the control group). Within the experimental group, those without prior experience with LLM-powered tools demonstrated significantly greater performance gain than their counterparts. We also found that students expressed positive feedback regarding CodeTutor's capability, though they also had concerns about CodeTutor's limited role in developing critical thinking skills. Over the semester, students' agreement with CodeTutor's suggestions decreased, with a growing preference for support from traditional human teaching assistants. Our analysis further reveals that the quality of user prompts was significantly correlated with CodeTutor's response effectiveness. Building upon our results, we discuss the implications of our findings for integrating Generative AI literacy into curricula to foster critical thinking skills and turn to examining the temporal dynamics of user engagement with LLM-powered tools. We further discuss the discrepancy between the anticipated functions of tools and students' actual capabilities, which sheds light on the need for tailored strategies to improve educational outcomes.

Overview

A semester-long, between-subjects field study with 50 students, evaluating CodeTutor, an LLM-powered assistant, in introductory computer science education
Examines how students and instructors interacted with and perceived the use of LLMs for learning and teaching
Provides insights into the opportunities and challenges of integrating LLMs into introductory CS courses

Plain English Explanation

This study looked at how well large language models (LLMs) can support teaching in introductory computer science (CS) courses. The researchers ran a semester-long field study in which students and instructors used LLMs for tasks such as answering questions, generating code, and providing feedback. The goal was to understand how effective LLMs are for learning and teaching CS, and what challenges come up.

The key findings were that LLMs could be helpful tools for certain tasks, like providing explanations and generating code. However, there were also limitations, like LLMs sometimes producing incorrect or biased information. The study also highlighted the importance of instructors guiding students on how to properly use and interpret LLM outputs. Overall, the research provides valuable insights into the potential benefits and drawbacks of integrating LLMs into introductory CS education.

Technical Explanation

The researchers conducted a semester-long field study to evaluate the effectiveness of LLMs in introductory computer science education. They had students and instructors in an introductory CS course use LLM-powered tools for various learning and teaching tasks, such as asking questions, generating code, and providing feedback.

The study design involved collecting both quantitative and qualitative data to assess the impacts of LLM usage. Quantitative metrics included student learning outcomes, while qualitative data came from interviews and surveys to understand perceptions and experiences. The researchers also analyzed the quality and accuracy of the LLM outputs produced during the study.
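The quantitative side of such a design often reduces to comparing outcomes between two independent groups. The summary does not say which statistical test the authors used, so the following is only a minimal sketch, assuming Welch's t-test on score gains. The function name `welch_t` and all numbers are hypothetical, invented for illustration, and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical score gains (final score minus pre-test score); NOT study data.
experimental_gain = [12.0, 15.0, 9.0, 14.0, 11.0, 13.0]
control_gain = [8.0, 10.0, 7.0, 9.0, 11.0, 6.0]

t_stat, dof = welch_t(experimental_gain, control_gain)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

To turn the statistic into a p-value, it would be checked against a t distribution with the computed degrees of freedom; in practice a library routine such as `scipy.stats.ttest_ind` with `equal_var=False` handles both steps.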

Key findings included that LLMs could be useful for tasks like providing explanations and generating code snippets. However, the LLM outputs also sometimes contained errors or biases that needed to be addressed. The study also highlighted the important role of instructors in guiding students on how to appropriately use and interpret LLM-generated content.
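The abstract also reports that the quality of user prompts was significantly correlated with CodeTutor's response effectiveness. The source does not state which correlation measure or rating scale was used, so this is a minimal sketch, assuming Spearman's rank correlation over hypothetical 1-to-5 ratings. The helper names and the ratings are illustrative assumptions, not the authors' method or data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical ratings for seven prompt/response pairs; NOT study data.
prompt_quality = [1, 2, 2, 3, 4, 5, 5]
response_effectiveness = [2, 1, 3, 3, 4, 4, 5]

rho = spearman_rho(prompt_quality, response_effectiveness)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based measure is a common choice for ordinal ratings because it only assumes a monotonic relationship; whether the study used it is not confirmed by the source.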

Critical Analysis

The researchers acknowledge several limitations and areas for further research. For example, the field study was conducted in a single introductory CS course, so the findings may not generalize to other contexts or more advanced CS curricula. Additionally, the study did not explore the long-term impacts of LLM usage on student learning and engagement over multiple semesters.

The paper also leaves some concerns underexplored. For instance, the study does not delve deeply into fairness and bias in the LLM outputs, which could have important implications for equitable access to educational resources.

Additionally, the researchers could have examined the comparative effectiveness of different LLMs for the specific tasks and contexts of introductory CS education. This could help instructors make more informed decisions about which LLM-powered tools to integrate into their courses.

Overall, this study provides a valuable starting point for understanding the role of LLMs in CS education, but there is still much more to explore in terms of the opportunities, challenges, and best practices for effective integration.

Conclusion

This semester-long field study offers important insights into the use of large language models in introductory computer science education. While LLMs can be helpful for certain learning and teaching tasks, the research also highlights the need for careful implementation and instructor guidance to address limitations like inaccuracies and biases in the LLM outputs.

These findings can inform efforts to integrate LLMs into CS curricula and support students' learning as the use of LLMs in education continues to expand.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

CS1-LLM: Integrating LLMs into CS1 Instruction

Annapurna Vadaparty, Daniel Zingaro, David H. Smith IV, Mounika Padala, Christine Alvarado, Jamie Gorson Benario, Leo Porter

The recent, widespread availability of Large Language Models (LLMs) like ChatGPT and GitHub Copilot may impact introductory programming courses (CS1) both in terms of what should be taught and how to teach it. Indeed, recent research has shown that LLMs are capable of solving the majority of the assignments and exams we previously used in CS1. In addition, professional software engineers are often using these tools, raising the question of whether we should be training our students in their use as well. This experience report describes a CS1 course at a large research-intensive university that fully embraces the use of LLMs from the beginning of the course. To incorporate the LLMs, the course was intentionally altered to reduce emphasis on syntax and writing code from scratch. Instead, the course now emphasizes skills needed to successfully produce software with an LLM. This includes explaining code, testing code, and decomposing large problems into small functions that are solvable by an LLM. In addition to frequent, formative assessments of these skills, students were given three large, open-ended projects in three separate domains (data science, image processing, and game design) that allowed them to showcase their creativity in topics of their choosing. In an end-of-term survey, students reported that they appreciated learning with the assistance of the LLM and that they interacted with the LLM in a variety of ways when writing code. We provide lessons learned for instructors who may wish to incorporate LLMs into their course.

6/26/2024

cs.CY cs.SE

Analyzing LLM Usage in an Advanced Computing Class in India

Chaitanya Arora, Utkarsh Venaik, Pavit Singh, Sahil Goyal, Jatin Tyagi, Shyama Goel, Ujjwal Singhal, Dhruv Kumar

This paper investigates the usage patterns of undergraduate and graduate students when engaging with large language models (LLMs) to tackle programming assignments in the context of advanced computing courses. Existing work predominantly focuses on the influence of LLMs in introductory programming contexts. Additionally, there is a scarcity of studies analyzing actual conversations between students and LLMs. Our study provides a comprehensive quantitative and qualitative analysis of raw interactions between students and LLMs within an advanced computing course (Distributed Systems) at an Indian University. We further complement this by conducting student interviews to gain deeper insights into their usage patterns. Our study shows that students make use of large language models (LLMs) in various ways: generating code or debugging code by identifying and fixing errors. They also copy and paste assignment descriptions into LLM interfaces for specific solutions, ask conceptual questions about complex programming ideas or theoretical concepts, and generate test cases to check code functionality and robustness. Our analysis covers over 4,000 prompts from 411 students, along with interviews with 10 students. Our analysis shows that LLMs excel at generating boilerplate code and assisting in debugging, while students handle the integration of components and system troubleshooting. This aligns with the learning objectives of advanced computing courses, which are oriented towards teaching students how to build systems and troubleshoot, with less emphasis on generating code from scratch. Therefore, LLM tools can be leveraged to increase student productivity, as shown by the data we collected. This study contributes to the ongoing discussion on LLM use in education, advocating for their usefulness in advanced computing courses to complement higher-level learning and productivity.

4/9/2024

cs.HC cs.CY

Which LLM should I use?: Evaluating LLMs for tasks performed by Undergraduate Computer Science Students

Vibhor Agarwal, Madhav Krishan Garg, Sahiti Dharmavaram, Dhruv Kumar

This study evaluates the effectiveness of various large language models (LLMs) in performing tasks common among undergraduate computer science students. Although a number of research studies in the computing education community have explored the possibility of using LLMs for a variety of tasks, there is a lack of comprehensive research comparing different LLMs and evaluating which LLMs are most effective for different tasks. Our research systematically assesses some of the publicly available LLMs such as Google Bard, ChatGPT(3.5), GitHub Copilot Chat, and Microsoft Copilot across diverse tasks commonly encountered by undergraduate computer science students in India. These tasks include code explanation and documentation, solving class assignments, technical interview preparation, learning new concepts and frameworks, and email writing. Evaluation for these tasks was carried out by pre-final year and final year undergraduate computer science students and provides insights into the models' strengths and limitations. This study aims to guide students as well as instructors in selecting suitable LLMs for any specific task and offers valuable insights on how LLMs can be used constructively by students and instructors.

4/4/2024

cs.CY cs.HC cs.LG

Student Perspectives on Using a Large Language Model (LLM) for an Assignment on Professional Ethics

Virginia Grande, Natalie Kiesler, Maria Andreina Francisco R

The advent of Large Language Models (LLMs) started a serious discussion among educators on how LLMs would affect, e.g., curricula, assessments, and students' competencies. Generative AI and LLMs also raised ethical questions and concerns for computing educators and professionals. This experience report presents an assignment within a course on professional competencies, including some related to ethics, that computing master's students need in their careers. For the assignment, student groups discussed the ethical process by Lennerfors et al. by analyzing a case: a fictional researcher considers whether to attend the real CHI 2024 conference in Hawaii. The tasks were (1) to participate in in-class discussions on the case, (2) to use an LLM of their choice as a discussion partner for said case, and (3) to document both discussions, reflecting on their use of the LLM. Students reported positive experiences with the LLM as a way to increase their knowledge and understanding, although some identified limitations. The LLM provided a wider set of options for action in the studied case, including unfeasible ones. The LLM would not select a course of action, so students had to choose themselves, which they saw as coherent. From the educators' perspective, there is a need for more instruction for students using LLMs: some students did not perceive the tools as such but rather as an authoritative knowledge base. Therefore, this work has implications for educators considering the use of LLMs as discussion partners or tools to practice critical thinking, especially in computing ethics education.

6/19/2024

cs.CY