Inclusive Design Insights from a Preliminary Image-Based Conversational Search Systems Evaluation

Read original: arXiv:2403.19899 - Published 4/1/2024 by Yue Zheng, Lei Yu, Junmian Chen, Tianyu Xia, Yuanyuan Yin, Shan Wang, Haiming Liu

Introduction

The paper discusses the development of conversational search engines that cater to users with disabilities or cognitive impairments. These users often face challenges in using standard conversational search engines due to linguistic limitations and lack of awareness about the engine's intricacies.

The authors build text-based, image-based, and mixed search engines and assign various tasks on them to 21 participants. By integrating sensor technologies that capture physiological signals, they aim to determine the optimal search strategy for particular user groups and so foster a more inclusive and accessible search interface.

The key component of this strategy is the conversion of textual results from conversational search engines into visual formats, which facilitates information understanding. Simultaneously, sensors are utilized to naturally record user input, removing the need for human intervention.

The research centers on an image-based conversational search system that leverages sensor data, such as gestures and eye movements, to gauge user satisfaction. This feedback refines subsequent searches, fostering a more adaptive and user-centric experience.

The objective of this paradigm shift is to increase the accessibility of information, particularly for individuals with disabilities, thereby improving their independence and expanding their knowledge.

Related Work

The paper reviews recent research on the evolution and utility of voice assistants and conversational interfaces. It highlights the emergence and significance of popular voice assistants like Alexa, Siri, and Cortana, and their integration across various sectors. It also covers spoken conversational search, which examines user interactions in speech-only search tasks.

Research from 2017 and 2018 provides a comprehensive overview of how people interact with these systems and the challenges involved. A key area of research has been the usability and effectiveness of conversational interfaces. One study assessed the System Usability Scale's applicability for evaluating voice-based user interfaces, emphasizing the nuances of voice interactions.

Another study explored user engagement with chatbots during collaborative searches, presenting findings on how chatbots can enhance the search experience. Conversational search systems have been studied not only for individual use but also for collaborative scenarios. One approach focused on embedding search into conversational platforms to support collaborative search, aiming to improve user interaction and data retrieval.

Additionally, research has delved into voice query clarification, highlighting the need for refining voice-based search queries to improve search outcomes.

Experiment Setup

The paper describes the system design and experiment design for an image-based and mixed conversational search engine. Key points:

System Design:

The system integrates several well-established APIs like AWS Rekognition for image analysis, ChatGPT for natural language processing, and the ARASAAC pictogram dataset.
The frontend has three segments: text-based search, image-based search, and a hybrid system accepting images and responding with images/text.
The backend uses OpenAI services for text conversations, plus a separate component that handles image-text interactions and image search through a Google API.
The user interface was carefully designed based on user feedback, adopting Material Design principles from Google. It has distinct UIs for text, image, and mixed input/output modes.
Image recognition accuracy was improved by matching input images to the ARASAAC symbol library instead of generating sentences from labels.
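
As a rough illustration of the last point, the sketch below matches image-recognition labels against a small symbol library by keyword overlap. Everything in it is an assumption made for illustration: the `SYMBOLS` table, the pictogram ids, the confidence threshold, and the overlap rule are hypothetical, and the summary does not say how the authors actually performed the matching or how the ARASAAC data is structured.

```python
from typing import Optional

# Hypothetical miniature symbol library: pictogram id -> keywords.
# The real ARASAAC dataset is far larger and has its own schema.
SYMBOLS = {
    "pic_cinema": {"cinema", "movie", "film", "theater"},
    "pic_popcorn": {"popcorn", "snack", "food"},
    "pic_dog": {"dog", "pet", "animal"},
}


def match_symbol(labels: dict[str, float], min_confidence: float = 0.8) -> Optional[str]:
    """Return the pictogram whose keywords best overlap the confident labels.

    `labels` maps a label name to a confidence score in [0, 1], the kind of
    output an image-analysis service might produce (assumed shape).
    Returns None when no confident label matches any symbol.
    """
    confident = {name.lower() for name, score in labels.items() if score >= min_confidence}
    best_id, best_overlap = None, 0
    for symbol_id, keywords in SYMBOLS.items():
        overlap = len(confident & keywords)
        if overlap > best_overlap:
            best_id, best_overlap = symbol_id, overlap
    return best_id


# Made-up labels for a cinema photo; "Dog" is dropped as low confidence.
print(match_symbol({"Movie": 0.95, "Film": 0.90, "Dog": 0.50}))
```

Mapping labels directly to a fixed symbol set avoids the extra error introduced by first generating a sentence from the labels, which is the trade-off the design point describes.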

Experiment Design:

Twenty participants, mainly students aged 23-25, took part in the study.
Participants interacted with the search engine in text, image, and mixed modes to complete 3 movie-related tasks in randomized order.
Sensor data (camera, eye-tracker, electrodes) was collected during the experiment.
After the tasks, participants filled out a questionnaire to provide feedback on the different search modes and their suitability for intellectual disabilities.
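
One common way to vary presentation order across participants is to cycle through all possible orderings so that each is used about equally often. The sketch below does this for the three modes. It is an assumption, not the authors' procedure: the summary only says the order was randomized, and the function name and participant ids are hypothetical.

```python
from itertools import permutations

MODES = ("text", "image", "mixed")


def assign_orders(participant_ids: list[str]) -> dict[str, tuple[str, ...]]:
    """Assign each participant one of the 6 possible mode orders, round-robin.

    Cycling through the permutations keeps the number of participants per
    order as equal as possible (counts differ by at most one).
    """
    orders = list(permutations(MODES))
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}


# Hypothetical ids P01..P20 for a 20-person study.
schedule = assign_orders([f"P{n:02d}" for n in range(1, 21)])
print(schedule["P01"])
```

With 20 participants and 6 orders the assignment cannot be perfectly balanced, so two orders are used four times and the other four are used three times.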

The paper details the system architecture, user interface design considerations, and the structured experiment flow to evaluate the conversational search engine.

Findings and Discussion

The paper reports an experiment that yielded 20 valid data sets of skin conductivity, heart rate, eye movement, and facial emotion metrics. The data was captured using the iMotions software with millisecond precision. The objective is to analyze users' physiological data during decision-making in search operations, with the specific goal of helping individuals with physical disabilities carry out searches via sensors.

The researchers focused on analyzing sensor data fluctuations within the 40 milliseconds post-mouse movement, utilizing the average data during this period to reflect users' physiological traits. The challenge lay in determining the commencement point of this interval.
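
The windowed averaging described above can be sketched as follows, assuming the sensor stream is a sorted list of millisecond timestamps with one value per timestamp and that the mouse-movement time is already known. The function name, the data layout, and the half-open window `[event, event + 40 ms)` are assumptions for illustration; the sketch does not address the harder problem the authors mention, namely deciding where the interval should start.

```python
from bisect import bisect_left
from statistics import mean

WINDOW_MS = 40  # window length reported in the summary


def window_mean(timestamps_ms, values, event_ms, window_ms=WINDOW_MS):
    """Mean of samples with event_ms <= t < event_ms + window_ms.

    `timestamps_ms` must be sorted ascending and aligned with `values`.
    Returns None when no sample falls inside the window.
    """
    start = bisect_left(timestamps_ms, event_ms)
    end = bisect_left(timestamps_ms, event_ms + window_ms)
    window = values[start:end]
    return mean(window) if window else None


# Made-up samples every 10 ms; a mouse movement is assumed at t = 10 ms.
ts = [0, 10, 20, 30, 40, 50, 60]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(window_mean(ts, vals, event_ms=10))
```

Binary search keeps each lookup cheap even when a session contains many samples and many mouse events.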

The study examined users' average Galvanic Skin Response (GSR) across the different search engine modes. GSR, an indicator of physiological activation and emotional state, is computed as the mean value over the user's interaction period. It is measured in microsiemens (μS) and reflects variations in skin conductivity, which are primarily attributed to perspiration.
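
A per-mode mean of this kind can be computed as in the sketch below. The input layout, the function name, and all numbers are made up for illustration and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_gsr_by_mode(samples):
    """Average GSR (in microsiemens) per search mode.

    `samples` is an iterable of (mode, gsr_microsiemens) pairs.
    """
    by_mode = defaultdict(list)
    for mode, gsr in samples:
        by_mode[mode].append(gsr)
    return {mode: mean(readings) for mode, readings in by_mode.items()}


# Made-up readings, not data from the paper.
readings = [("text", 2.0), ("text", 4.0), ("image", 1.0), ("mixed", 5.0), ("mixed", 7.0)]
print(mean_gsr_by_mode(readings))
```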

Figure 2: Variation of data in different modes

The paper examines the correlation between physiological parameters of participants and their emotional states when using different search engine display modes (text, image, or mixed). The study found that:

Heart rate was not significantly affected by the display mode.
Galvanic skin response (GSR), eye fixation time, and engagement mood exhibited notable changes across modes.
- GSR decreased in Image mode but increased in Mixed mode, suggesting images alone may reduce stress/cognitive load, while a combined display may increase them.
- Eye fixation time increased in Image mode, indicating comprehension challenges requiring longer focus.
- Mixed mode mitigated this comprehension issue.
- Facial expressions shifted notably in Image and Mixed modes, possibly due to an initial learning curve with image formats.
Physiological signals changed within the first 40 milliseconds after mouse movement, potentially reflecting decision-making during searches.
The authors analyzed the data from this 40 ms window to test whether it can serve as a marker of decision-making.

In summary, the display mode affected users' physiological responses like GSR, eye movements, and facial expressions, with implications for cognitive load, stress levels, and decision-making processes during search tasks.

Figure 3: Rate of change in icon form

The paper presents the results of a study evaluating an image-based conversational search system. The key findings are:

Physiological signals like eye movements and gestures indicated high user engagement with the image-based system. Users frequently displayed positive feedback actions like nodding in agreement.

Many users preferred the text-based and mixed (text + image) systems over the image-only system. However, they recognized the potential of conversational search systems, especially for individuals with intellectual disabilities.

The image-based system provided an intuitive interface, reducing cognitive load. Users comprehended search results faster compared to text-based outputs. The adaptive feedback loop further enhanced user satisfaction.

The system shows promise for revolutionizing search experiences for users with disabilities. The visual approach caters to those with linguistic or cognitive challenges, while sensor integration allows gesture-based feedback, promoting inclusivity for physically disabled users.

Compared to text-based systems, the image-based approach transcends language barriers through universal visual cues. The sensor-based feedback mechanism provides a more intuitive experience than text-based methods.

While the image-based system excelled in inclusivity and engagement, some users familiar with traditional systems initially struggled to adapt to the novel approach.

Overall, the findings highlight the image-based conversational search system's potential as an inclusive and user-centric innovation in information retrieval, driven by its adaptive capabilities and sensor feedback.

Conclusion and Future Recommendations

The project aimed to create an intuitive, user-centric interface that is easy to use. A major achievement was the design and implementation of a conversational search system. The frontend was built with a popular frontend framework in the Material Design style and prioritized a natural, dialogue-centric interaction. The backend was robust, enabling fast data acquisition and processing, and data structure optimization enabled quick searches and rendering for timely user feedback. The interface worked across diverse devices. The iMotions application and sensors were used for data collection and user experience analysis.

System evaluation provided insights into performance and user experience through varied task levels and a counterbalancing strategy. Emotion analysis using iMotions revealed strengths and areas for improvement. User feedback was positive, highlighting the system's potential for individuals with intellectual challenges.

The text-based search system was intuitive, while the image-based system demanded more cognitive effort. The hybrid text and image system received the most engagement, indicating enhanced interactivity. Participants preferred the text-based and hybrid systems and affirmed the systems' value for assisting people with intellectual disabilities.

While each system had strengths, the hybrid conversational search system offered an optimal balance of clarity and engagement, underscoring its potential for improving technological accessibility for individuals with disabilities.

Future work includes fine-tuning algorithms for accurate search results aligned with user preferences, expanding emotion analysis with more sensors and emotions, enhancing accessibility features for cognitive disabilities, and adding a real-time feedback mechanism to adapt to changing user needs.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
