Automated Assessment of Multimodal Answer Sheets in the STEM domain

Read original: arXiv:2409.15749 - Published 9/25/2024 by Rajlaxmi Patil, Aditya Ashutosh Kulkarni, Ruturaj Ghatage, Sharvi Endait, Geetanjali Kale, Raviraj Joshi

Overview

This paper presents an automated system for assessing multimodal answer sheets in STEM (Science, Technology, Engineering, and Mathematics) domains.
The system combines text extraction and computer vision techniques to analyze handwritten and drawn responses on answer sheets.
The goal is to develop a scalable and consistent grading solution that can support next-generation science assessments.

Plain English Explanation

The paper describes a system that can automatically grade student answer sheets in STEM subjects. These answer sheets often contain a mix of written text and hand-drawn diagrams such as flowcharts. The researchers combine text extraction and language-model-based evaluation of the written content with computer vision techniques that analyze the visual elements.

The goal is to create a more scalable and consistent grading process, rather than having human graders assess each answer sheet individually. This could be especially helpful for large-scale science assessments that include a variety of response types. By automating the grading process, the system aims to reduce the potential for human error or bias.

Technical Explanation

The researchers developed a multimodal assessment system that combines several key components:

Text extraction using the CRAFT model to find and extract the handwritten text on the answer sheets.
Computer vision analysis using the YoloV5 object detection model to identify and localize visual elements in diagrams, particularly flowcharts, which are then transformed into textual representations.
A scoring mechanism in which an LLM (Mistral-7B) evaluates the extracted text and the textual form of the diagrams, using sample answers for comparison, to provide an overall assessment of the student's response.
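
As a rough illustration of how the components above could fit together, the sketch below takes hypothetical detector output (labelled shapes and arrows, as an object detector such as YoloV5 might yield after post-processing), serializes it into a textual description of a flowchart, and builds a grading prompt for an LLM. The names `Node`, `Edge`, `flowchart_to_text`, and `build_grading_prompt`, the shape labels, and the prompt wording are all assumptions made for illustration. The paper's actual implementation is not described in this summary, and no real model is called here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A detected flowchart shape plus the text recognized inside it."""
    node_id: int
    shape: str  # hypothetical detector label, e.g. "terminator", "decision"
    text: str


@dataclass(frozen=True)
class Edge:
    """A detected arrow from one node to another, with an optional label."""
    source: int
    target: int
    label: str = ""


def flowchart_to_text(nodes, edges):
    """Serialize detected nodes and arrows into plain-text lines."""
    by_id = {n.node_id: n for n in nodes}
    lines = [f"Node {n.node_id} ({n.shape}): {n.text}" for n in nodes]
    for e in edges:
        suffix = f" [{e.label}]" if e.label else ""
        lines.append(f"{by_id[e.source].text} -> {by_id[e.target].text}{suffix}")
    return "\n".join(lines)


def build_grading_prompt(question, sample_answer, student_answer, max_marks):
    """Assemble an (assumed) prompt asking an LLM to grade against a sample answer."""
    return (
        f"Question: {question}\n"
        f"Sample answer:\n{sample_answer}\n"
        f"Student answer:\n{student_answer}\n"
        f"Award a mark from 0 to {max_marks} and justify it briefly."
    )


# Hypothetical detections for a small student flowchart.
nodes = [
    Node(1, "terminator", "Start"),
    Node(2, "decision", "n > 0?"),
    Node(3, "process", "Print n"),
]
edges = [Edge(1, 2), Edge(2, 3, "yes")]

student_text = flowchart_to_text(nodes, edges)
prompt = build_grading_prompt(
    "Draw a flowchart that prints n if it is positive.",
    "Start -> n > 0?\nn > 0? -> Print n [yes]",
    student_text,
    max_marks=5,
)
```

Once the diagram is text, the same LLM-based comparison used for written answers can in principle be reused for diagrams, which is the bridge between visual representation and semantic meaning that the abstract describes.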

The paper gives a detailed account of the methodology, the challenges encountered, and the results of applying this pipeline to multimodal answer sheets covering STEM topics.

Critical Analysis

The paper presents a promising approach for automating the assessment of multimodal answer sheets in STEM domains. However, the researchers acknowledge several limitations and areas for further research:

The system's performance may be influenced by the quality and diversity of the training data. Expanding the data or incorporating more robust data augmentation techniques could improve generalization.
The current scoring mechanism relies on a simple integration of the text and visual assessments. Exploring more sophisticated fusion methods or neural models that can jointly reason about the multimodal content may lead to more accurate and nuanced evaluations.
The system's ability to handle complex or open-ended questions, where the responses may not follow a predetermined structure, remains an area for further investigation.
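
To make the idea of a "simple integration" concrete, the sketch below combines a text score and a diagram score with a fixed weighted average and converts the result to marks. This is only an assumed baseline for illustration: the function names, the default weight of 0.5, and the half-mark rounding step are hypothetical and are not taken from the paper.

```python
def combine_scores(text_score, diagram_score, text_weight=0.5):
    """Weighted average of two scores, each expected to lie in [0, 1]."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    for score in (text_score, diagram_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0 and 1")
    return text_weight * text_score + (1.0 - text_weight) * diagram_score


def to_marks(fraction, max_marks, step=0.5):
    """Scale a [0, 1] score to marks, rounded to the nearest multiple of `step`."""
    return round(fraction * max_marks / step) * step


# Example: a strong written answer paired with a weaker diagram.
overall = combine_scores(text_score=0.8, diagram_score=0.6)
marks = to_marks(overall, max_marks=10)
```

A fixed weight treats the two modalities as independent, which is the limitation noted above: it cannot capture a diagram that contradicts the written explanation, whereas a model that reasons jointly over both could.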

Overall, the research demonstrates the potential of combining natural language processing and computer vision techniques to automate the assessment of multimodal STEM knowledge. However, continued advancements in both the technical capabilities and the understanding of multimodal learning will be essential for the widespread adoption of such systems in educational settings.

Conclusion

The paper presents an automated system for assessing multimodal answer sheets in STEM domains, leveraging text extraction and computer vision techniques to provide a scalable and consistent grading solution. This could significantly benefit next-generation science assessments that require the evaluation of a diverse range of student responses. While the research shows promising results, ongoing advancements in multimodal learning and understanding will be crucial for the further development and deployment of such automated assessment systems in educational settings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Automated Assessment of Multimodal Answer Sheets in the STEM domain

Rajlaxmi Patil, Aditya Ashutosh Kulkarni, Ruturaj Ghatage, Sharvi Endait, Geetanjali Kale, Raviraj Joshi

In the domain of education, the integration of technology has led to a transformative era, reshaping traditional learning paradigms. Central to this evolution is the automation of grading processes, particularly within the STEM domain encompassing Science, Technology, Engineering, and Mathematics. While efforts to automate grading have been made in subjects like Literature, the multifaceted nature of STEM assessments presents unique challenges, ranging from quantitative analysis to the interpretation of handwritten diagrams. To address these challenges, this research endeavors to develop efficient and reliable grading methods through the implementation of automated assessment techniques using Artificial Intelligence (AI). Our contributions lie in two key areas: firstly, the development of a robust system for evaluating textual answers in STEM, leveraging sample answers for precise comparison and grading, enabled by advanced algorithms and natural language processing techniques. Secondly, a focus on enhancing diagram evaluation, particularly flowcharts, within the STEM context, by transforming diagrams into textual representations for nuanced assessment using a Large Language Model (LLM). By bridging the gap between visual representation and semantic meaning, our approach ensures accurate evaluation while minimizing manual intervention. Through the integration of models such as CRAFT for text extraction and YoloV5 for object detection, coupled with LLMs like Mistral-7B for textual evaluation, our methodology facilitates comprehensive assessment of multimodal answer sheets. This paper provides a detailed account of our methodology, challenges encountered, results, and implications, emphasizing the potential of AI-driven approaches in revolutionizing grading practices in STEM education.

9/25/2024

AI and Machine Learning for Next Generation Science Assessments

Xiaoming Zhai

This chapter focuses on the transformative role of Artificial Intelligence (AI) and Machine Learning (ML) in science assessments. The paper begins with a discussion of the Framework for K-12 Science Education, which calls for a shift from conceptual learning to knowledge-in-use. This shift necessitates the development of new types of assessments that align with the Framework's three dimensions: science and engineering practices, disciplinary core ideas, and crosscutting concepts. The paper further highlights the limitations of traditional assessment methods like multiple-choice questions, which often fail to capture the complexities of scientific thinking and three-dimensional learning in science. It emphasizes the need for performance-based assessments that require students to engage in scientific practices like modeling, explanation, and argumentation. The paper achieves three major goals: reviewing the current state of ML-based assessments in science education, introducing a framework for scoring accuracy in ML-based automatic assessments, and discussing future directions and challenges. It delves into the evolution of ML-based automatic scoring systems, discussing various types of ML, like supervised, unsupervised, and semi-supervised learning. These systems can provide timely and objective feedback, thus alleviating the burden on teachers. The paper concludes by exploring pre-trained models like BERT and finetuned ChatGPT, which have shown promise in assessing students' written responses effectively.

5/14/2024

Can AI Assistance Aid in the Grading of Handwritten Answer Sheets?

Pritam Sil, Parag Chaudhuri, Bhaskaran Raman

With recent advancements in artificial intelligence (AI), there has been growing interest in using state of the art (SOTA) AI solutions to provide assistance in grading handwritten answer sheets. While a few commercial products exist, the question of whether AI-assistance can actually reduce grading effort and time has not yet been carefully considered in published literature. This work introduces an AI-assisted grading pipeline. The pipeline first uses text detection to automatically detect question regions present in a question paper PDF. Next, it uses SOTA text detection methods to highlight important keywords present in the handwritten answer regions of scanned answer sheets to assist in the grading process. We then evaluate a prototype implementation of the AI-assisted grading pipeline deployed on an existing e-learning management platform. The evaluation involves a total of 5 different real-life examinations across 4 different courses at a reputed institute; it consists of a total of 42 questions, 17 graders, and 468 submissions. We log and analyze the grading time for each handwritten answer while using AI assistance and without it. Our evaluations have shown that, on average, the graders take 31% less time while grading a single response and 33% less grading time while grading a single answer sheet using AI assistance.

8/26/2024

Beyond human subjectivity and error: a novel AI grading system

Alexandra Gobrecht, Felix Tuma, Moritz Moller, Thomas Zoller, Mark Zakhvatkin, Alexandra Wuttig, Holger Sommerfeldt, Sven Schutt

The grading of open-ended questions is a high-effort, high-impact task in education. Automating this task promises a significant reduction in workload for education professionals, as well as more consistent grading outcomes for students, by circumventing human subjectivity and error. While recent breakthroughs in AI technology might facilitate such automation, this has not been demonstrated at scale. In this paper, we introduce a novel automatic short answer grading (ASAG) system. The system is based on a fine-tuned open-source transformer model which we trained on a large set of exam data from university courses across a large range of disciplines. We evaluated the trained model's performance against held-out test data in a first experiment and found high accuracy levels across a broad spectrum of unseen questions, even in unseen courses. We further compared the performance of our model with that of certified human domain experts in a second experiment: we first assembled another test dataset from real historical exams - the historic grades contained in that data were awarded to students in a regulated, legally binding examination process; we therefore considered them as ground truth for our experiment. We then asked certified human domain experts and our model to grade the historic student answers again without disclosing the historic grades. Finally, we compared the grades thus obtained with the historic grades (our ground truth). We found that for the courses examined, the model deviated less from the official historic grades than the human re-graders - the model's median absolute error was 44% smaller than the human re-graders', implying that the model is more consistent than humans in grading. These results suggest that leveraging AI-enhanced grading can reduce human subjectivity, improve consistency and thus ultimately increase fairness.

5/8/2024