Exploring the use of Generative AI to Support Automated Just-in-Time Programming for Visual Scene Displays

Read original: arXiv:2408.11137 - Published 8/22/2024 by Cynthia Zastudil, Christine Holyfield, Christine Kapp, Xandria Crosland, Elizabeth Lorah, Tara Zimmerman, Stephen MacNeil

Overview

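  • Explores the use of generative AI to automatically create communication options for visual scene displays (VSDs), which people who rely on augmentative and alternative communication (AAC) devices use to communicate.
  • Proposes a "just-in-time" approach in which a large multimodal model (LMM), such as GPT-4V, generates communication options for an image of the user's current context, instead of relying on default images or manual configuration.
  • Assesses feasibility through an expert assessment by speech-language pathologists (SLPs) and AAC researchers (N=13), supplemented by semi-structured interviews (N=5).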

Plain English Explanation

This research paper looks at how generative AI can be used to automatically set up visual scene displays (VSDs) for people who rely on augmentative and alternative communication (AAC) devices. VSDs embed communication options, such as words or phrases the user can select, directly within a contextualized image of a scene.
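As a rough illustration of that structure, a VSD can be modeled as a scene image plus "hotspots": regions that produce a message when touched. This is a minimal sketch under assumptions: the class names, the rectangular hotspots, and the pixel coordinates are hypothetical and not taken from the paper or from any particular AAC product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hotspot:
    """Hypothetical rectangular region of the scene that speaks a message when touched."""
    x: int
    y: int
    width: int
    height: int
    message: str  # the communication option, e.g. a word or phrase

    def contains(self, px: int, py: int) -> bool:
        # Half-open bounds so adjacent hotspots never both claim the same pixel.
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


@dataclass
class VisualSceneDisplay:
    """Hypothetical VSD: a contextual image with communication options embedded in it."""
    image_path: str
    hotspots: List[Hotspot] = field(default_factory=list)

    def select(self, px: int, py: int) -> Optional[str]:
        """Return the message of the first hotspot under the touch point, if any."""
        for hotspot in self.hotspots:
            if hotspot.contains(px, py):
                return hotspot.message
        return None


scene = VisualSceneDisplay(
    image_path="park.jpg",  # hypothetical photo of the current context
    hotspots=[
        Hotspot(10, 20, 100, 80, "I want to swing"),
        Hotspot(150, 40, 60, 60, "Look at the dog"),
    ],
)
print(scene.select(50, 50))    # → I want to swing
print(scene.select(300, 300))  # → None
```

Because the options are plain data attached to the image, they can in principle be produced by software instead of being typed in by a communication partner.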

The researchers explored a "just-in-time" approach, in which the AI generates communication options for an image of the user's current situation. This could make VSDs more relevant than displays built on default images and reduce the burden of manual configuration, which currently falls on communication partners.

To test feasibility, the team gathered communication options both from the AI and from speech-language pathologists (SLPs) and AAC researchers, then had those experts evaluate them. They also interviewed SLPs and AAC researchers about their views on using generative AI in AAC devices.

Technical Explanation

The researchers proposed a just-in-time programming approach in which a large multimodal model (LMM), such as GPT-4V, is prompted to create communication options for a VSD image, rather than relying on default images or on manual configuration by communication partners.
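The just-in-time step can be sketched as one function that sends a photo of the current context and a prompt to an LMM, then parses the reply into a list of options. Everything here is an assumption for illustration: the prompt wording, the one-option-per-line reply format, and the hypothetical `MultimodalModel` callable, which stands in for a real client for a model such as GPT-4V. The paper's actual prompts and pipeline are not reproduced.

```python
import re
from typing import Callable, List

# Hypothetical interface: takes an image path and a prompt, returns the reply text.
# A real implementation would wrap an LMM API such as GPT-4V.
MultimodalModel = Callable[[str, str], str]

# Hypothetical prompt; not the wording used in the study.
PROMPT = (
    "This photo shows the current situation of a person who uses an AAC device. "
    "List up to {n} short words or phrases they might want to say, one per line."
)

# Matches a leading bullet ("-", "*", "•") or list number ("1." or "2)").
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")


def generate_options(image_path: str, model: MultimodalModel, n: int = 5) -> List[str]:
    """Ask the model for communication options and return up to n unique ones."""
    reply = model(image_path, PROMPT.format(n=n))
    options: List[str] = []
    for line in reply.splitlines():
        text = LIST_MARKER.sub("", line).strip()
        if text and text not in options:  # skip blank lines and duplicates
            options.append(text)
    return options[:n]


def fake_model(image_path: str, prompt: str) -> str:
    # Stand-in for a real LMM call that returns a canned reply.
    return "1. Push me higher\n2. My turn\n- My turn\n\n3. I am hungry"


print(generate_options("playground.jpg", fake_model))
# → ['Push me higher', 'My turn', 'I am hungry']
```

Passing the model in as a function keeps the sketch testable without network access; replacing `fake_model` with a real client call would not change the parsing logic.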

In the expert assessment, communication options were sourced both from the LMM and from SLPs and AAC researchers (N=13), and the same SLPs and AAC researchers then evaluated them. The results indicate that the LMM-generated options were contextually relevant and often resembled those created by humans.
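One way to picture the two findings (relevance, and resemblance to human-written options) is a small summary over expert ratings. This is a hypothetical sketch, not the paper's analysis: it assumes numeric relevance ratings tagged by source, and it measures resemblance as exact matches after normalizing case and punctuation, which is far cruder than expert judgment. The example inputs are made-up toy values, not data from the study.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def normalize(option: str) -> str:
    # Lowercase and drop punctuation so "My turn!" and "my turn" compare equal.
    return "".join(ch for ch in option.lower() if ch.isalnum() or ch.isspace()).strip()


def mean_rating_by_source(ratings: List[Tuple[str, str, int]]) -> Dict[str, float]:
    """ratings holds (source, option, score) triples; returns the mean score per source."""
    by_source: Dict[str, List[int]] = {}
    for source, _option, score in ratings:
        by_source.setdefault(source, []).append(score)
    return {source: fmean(scores) for source, scores in by_source.items()}


def overlap(model_options: List[str], human_options: List[str]) -> float:
    """Fraction of model options that match some human option after normalization."""
    if not model_options:
        return 0.0
    human_keys = {normalize(o) for o in human_options}
    return sum(normalize(o) in human_keys for o in model_options) / len(model_options)


# Toy inputs for illustration only (not from the study).
ratings = [
    ("lmm", "My turn", 4),
    ("lmm", "Push me", 3),
    ("human", "My turn!", 5),
    ("human", "Higher", 4),
]
print(mean_rating_by_source(ratings))  # → {'lmm': 3.5, 'human': 4.5}
print(overlap(["My turn", "Push me"], ["My turn!", "Higher"]))  # → 0.5
```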

The study also included semi-structured interviews (N=5) that explored SLPs' and AAC researchers' opinions on the use of generative AI in AAC devices.

The findings suggest that generative AI is a promising way to automate the creation of communication options for VSDs, although vital questions must be addressed before LMMs can be confidently implemented in AAC devices.

Critical Analysis

The research presented promising results but also left important questions open. While the LMM generated contextually relevant communication options, the authors conclude that vital questions remain before LMMs can be confidently implemented in AAC devices. Additional research is needed to refine the approach and the user experience.

Furthermore, the evaluation relied on expert judgments from SLPs and AAC researchers (N=13) and a small number of interviews (N=5), rather than on use by people who rely on AAC devices. More extensive testing and evaluation are required to assess the feasibility and benefits of this approach in real-world settings.

Broader accessibility considerations also matter, such as ensuring the system is compatible with assistive technologies and meets the diverse needs of the target user group. Ongoing collaboration with end users and accessibility experts will be crucial for further development and deployment.

Conclusion

This research represents an early step toward using generative AI to support automated, personalized, and context-aware VSDs for people who rely on AAC devices. The findings suggest that this approach could make VSDs more relevant and less burdensome to configure, helping users express their thoughts, needs, and experiences.

However, further refinement, integration, and evaluation are necessary to fully realize the benefits of this technology. Continued collaboration with the target user community and accessibility experts will be crucial to ensure the system meets the diverse needs of individuals with communication difficulties. Overall, this research opens up new avenues for exploring the applications of generative AI in assistive technologies and inclusive design.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Exploring the use of Generative AI to Support Automated Just-in-Time Programming for Visual Scene Displays

Cynthia Zastudil, Christine Holyfield, Christine Kapp, Xandria Crosland, Elizabeth Lorah, Tara Zimmerman, Stephen MacNeil

Millions of people worldwide rely on alternative and augmentative communication devices to communicate. Visual scene displays (VSDs) can enhance communication for these individuals by embedding communication options within contextualized images. However, existing VSDs often present default images that may lack relevance or require manual configuration, placing a significant burden on communication partners. In this study, we assess the feasibility of leveraging large multimodal models (LMM), such as GPT-4V, to automatically create communication options for VSDs. Communication options were sourced from a LMM and speech-language pathologists (SLPs) and AAC researchers (N=13) for evaluation through an expert assessment conducted by the SLPs and AAC researchers. We present the study's findings, supplemented by insights from semi-structured interviews (N=5) about SLP's and AAC researchers' opinions on the use of generative AI in augmentative and alternative communication devices. Our results indicate that the communication options generated by the LMM were contextually relevant and often resembled those created by humans. However, vital questions remain that must be addressed before LMMs can be confidently implemented in AAC devices.

8/22/2024

Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design

Jingyi Xie, Rui Yu, He Zhang, Sooyeon Lee, Syed Masum Billah, John M. Carroll

People with visual impairments perceive their environment non-visually and often use AI-powered assistive tools to obtain textual descriptions of visual information. Recent large vision-language model-based AI-powered tools like Be My AI are more capable of understanding users' inquiries in natural language and describing the scene in audible text; however, the extent to which these tools are useful to visually impaired users is currently understudied. This paper aims to fill this gap. Our study with 14 visually impaired users reveals that they are adapting these tools organically -- not only can these tools facilitate complex interactions in household, spatial, and social contexts, but they also act as an extension of users' cognition, as if the cognition were distributed in the visual information. We also found that although the tools are currently not goal-oriented, users accommodate this limitation and embrace the tools' capabilities for broader use. These findings enable us to envision design implications for creating more goal-oriented, real-time processing, and reliable AI-powered assistive technology.

7/15/2024

An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging

Sulaiman Khan, Md. Rafiul Biswas, Alina Murad, Hazrat Ali, Zubair Shah

Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data, while on the other hand, the ability to identify synthetic image data in a pool of data is also significantly important. In this study, we explore the potential of the Gemini (gemini-1.0-pro-vision-latest) and GPT-4V (gpt-4-vision-preview) models for medical image analysis using two modalities of medical image data. Utilizing synthetic and real imaging data, both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images. Experimental results demonstrate that both Gemini and GPT-4 could perform some interpretation of the input images. In this specific experiment, Gemini was able to perform slightly better than the GPT-4V on the classification task. In contrast, responses associated with GPT-4V were mostly generic in nature. Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images. We also identify key limitations associated with the early investigation study on MLLMs for specialized tasks in medical image analysis.

6/4/2024

Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education

Arne Bewersdorff, Christian Hartmann, Marie Hornberger, Kathrin Seßler, Maria Bannert, Enkelejda Kasneci, Gjergji Kasneci, Xiaoming Zhai, Claudia Nerdel

The integration of Artificial Intelligence (AI), particularly Large Language Model (LLM)-based systems, in education has shown promise in enhancing teaching and learning experiences. However, the advent of Multimodal Large Language Models (MLLMs) like GPT-4 with vision (GPT-4V), capable of processing multimodal data including text, sound, and visual inputs, opens a new era of enriched, personalized, and interactive learning landscapes in education. Grounded in theory of multimedia learning, this paper explores the transformative role of MLLMs in central aspects of science education by presenting exemplary innovative learning scenarios. Possible applications for MLLMs could range from content creation to tailored support for learning, fostering competencies in scientific practices, and providing assessment and feedback. These scenarios are not limited to text-based and uni-modal formats but can be multimodal, increasing thus personalization, accessibility, and potential learning effectiveness. Besides many opportunities, challenges such as data protection and ethical considerations become more salient, calling for robust frameworks to ensure responsible integration. This paper underscores the necessity for a balanced approach in implementing MLLMs, where the technology complements rather than supplants the educator's role, ensuring thus an effective and ethical use of AI in science education. It calls for further research to explore the nuanced implications of MLLMs on the evolving role of educators and to extend the discourse beyond science education to other disciplines. Through the exploration of potentials, challenges, and future implications, we aim to contribute to a preliminary understanding of the transformative trajectory of MLLMs in science education and beyond.

9/5/2024