How explainable AI affects human performance: A systematic review of the behavioural consequences of saliency maps

Read original: arXiv:2404.16042 - Published 4/29/2024 by Romy Muller

Overview

This paper systematically reviews 68 user studies to investigate the effectiveness of saliency maps in helping humans understand how deep neural networks classify images.
The review found that while saliency maps can enhance human performance in some cases, null effects or even costs are quite common.
The authors analyzed various factors that may modulate these effects, including the nature of the human tasks, AI performance, explanation methods, images, human participants, and comparison conditions.

Plain English Explanation

Saliency maps are visual tools that highlight the parts of an image that a deep learning model "pays attention to" when making a classification. The idea is that seeing these highlighted regions helps humans understand how the model reaches its decisions.

However, this review suggests that the benefits of saliency maps for humans may not be as straightforward as one might expect. In some cases, saliency maps did improve human performance on tasks related to understanding the model's behavior. But in many other cases, there was no improvement or even a negative impact.

The researchers looked at a number of factors that might influence whether saliency maps are useful. For example, they found that saliency maps tended to be more beneficial when the task was focused on understanding the model, rather than just classifying the images. The effects also depended on whether the model's prediction was correct: in AI-focused tasks, benefits were usually restricted to incorrect predictions, whereas in image-focused tasks they were usually restricted to correct ones.

Interestingly, the specific explanation method used to generate the saliency maps had little impact. The effects did, however, depend heavily on the comparison conditions used in the studies.

Overall, this review suggests that the value of saliency maps for human understanding is not as clear-cut as it might seem. The benefits appear to depend on the details of the task and the model's performance, rather than just the saliency maps themselves.

Technical Explanation

This systematic review examines 68 user studies that investigated the effectiveness of saliency maps in helping humans understand how deep neural networks classify images. Saliency maps are visual representations that highlight the regions of an image that a model pays the most attention to when making a classification.
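To make the idea of a saliency map concrete, here is a minimal sketch of one simple approach, occlusion: mask one region at a time and record how much the model's score drops. This is an illustration only; the review covers many XAI methods and does not prescribe this one. The names `toy_model` and `occlusion_saliency`, the 8x8 image, and the patch size are all hypothetical, and the "model" is a stand-in that only looks at the top-left corner of the image.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def toy_model(image: Image) -> float:
    """Hypothetical stand-in classifier: the score is the mean intensity
    of the top-left 4x4 corner, so only that corner matters to it."""
    corner = [image[r][c] for r in range(4) for c in range(4)]
    return sum(corner) / len(corner)


def occlusion_saliency(
    image: Image,
    model: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> Image:
    """Return a map where each pixel holds the drop in model score
    observed when the patch containing that pixel is masked with `fill`."""
    height, width = len(image), len(image[0])
    base_score = model(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Copy the image and mask a single patch.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            # A large drop means the model relied on this patch.
            drop = base_score - model(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


# Assumed example input: a uniformly bright 8x8 image.
image = [[1.0] * 8 for _ in range(8)]
saliency_map = occlusion_saliency(image, toy_model)
for row in saliency_map:
    print(" ".join(f"{value:.2f}" for value in row))
```

In this toy setup the non-zero values fall only in the top-left corner, which is the region the stand-in model actually uses. Real saliency methods differ in how they compute relevance, but the output has the same form: one relevance value per image region.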

The authors organized the empirical outcomes of these studies along several factors, including:

The nature of the human tasks (e.g. AI-focused vs. image-focused)
The performance of the AI model (correct vs. incorrect predictions)
The specific XAI methods used to generate the saliency maps
The characteristics of the images being classified
The characteristics of the human participants
The comparison conditions used in the studies
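The sketch below shows one way outcomes could be coded and tallied along factors like those listed above. It is an assumption about how such an organisation might look, not the author's actual coding scheme. The `Outcome` fields, the category labels, and every record in `records` are invented placeholders and are not data from the review.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Outcome:
    """One coded result from a user study (hypothetical schema)."""
    task_focus: str     # "AI-focused" or "image-focused"
    ai_prediction: str  # "correct" or "incorrect"
    effect: str         # "benefit", "null", or "cost"


def tally(outcomes: Iterable[Outcome]) -> Counter:
    """Count outcomes per (task focus, AI prediction, effect) cell."""
    return Counter((o.task_focus, o.ai_prediction, o.effect) for o in outcomes)


# Placeholder records for illustration only, NOT results from the 68 studies.
records = [
    Outcome("AI-focused", "incorrect", "benefit"),
    Outcome("AI-focused", "correct", "null"),
    Outcome("image-focused", "correct", "benefit"),
    Outcome("image-focused", "incorrect", "cost"),
    Outcome("image-focused", "incorrect", "null"),
]

counts = tally(records)
for (task, prediction, effect), n in sorted(counts.items()):
    print(f"{task:14s} {prediction:10s} {effect:8s} {n}")
```

Keeping each outcome as a separate record makes it possible to regroup the same data by any other factor, such as XAI method or comparison condition, by adding a field and changing the key used in `tally`.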

They found that while saliency maps can enhance human performance in some cases, null effects or even costs (i.e. worse performance) are quite common.

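In tasks focused on understanding the AI model's behavior, benefits of saliency maps were more common. In image-focused tasks (e.g. classifying the images themselves), the effects were more mixed and often depended on the specific cognitive requirements of the task. The correctness of the AI's prediction mattered as well: benefits were usually restricted to incorrect AI predictions in AI-focused tasks, but to correct ones in image-focused tasks.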

The specific XAI method used to generate the saliency maps had surprisingly little impact on the outcomes. The effects were, however, highly dependent on the comparison conditions used in the studies.

The evidence was more limited for factors related to the images and human participants. But the impacts of these factors tended to be quite variable and context-dependent.

Critical Analysis

This systematic review provides a nuanced and insightful look at the potential benefits and limitations of using saliency maps to help humans understand deep learning models. The authors did a commendable job of carefully analyzing a large body of prior research to identify the key factors that can influence the effectiveness of this explanatory technique.

One strength of the paper is its balanced, objective tone. The authors acknowledge the complexity of the issue and don't overstate the conclusions. They rightly point out that the effects of saliency maps appear to be highly dependent on the specific context and research design.

That said, the review also highlights some gaps in the existing literature that warrant further investigation. For example, the limited evidence on how image and human characteristics impact the usefulness of saliency maps suggests a need for more targeted studies in these areas.

Additionally, while the authors touch on the importance of the comparison conditions used in the studies, they don't delve too deeply into the potential issues with the experimental designs. It would be helpful to see a more critical exploration of methodological factors that could be influencing the inconsistent findings.

Overall, this paper makes a valuable contribution by shedding light on the complex relationship between saliency maps and human understanding of AI systems. The results caution against overly simplistic assumptions about the explanatory power of these visualizations and point to the need for more nuanced, context-dependent approaches to XAI.

Conclusion

This systematic review casts doubt on the assumption that saliency maps are inherently useful for helping humans understand how deep neural networks classify images. While saliency maps can enhance human performance in some cases, the benefits appear to be highly dependent on factors like the specific task, the model's performance, and the comparison conditions used.

The findings suggest that the value of saliency maps may be more limited than previously thought, at least in their current form. Researchers and practitioners should be cautious about relying on saliency maps as a universal solution for explainable AI, and instead focus on developing more contextualized, user-centric approaches to model interpretability.

Overall, this paper highlights the importance of carefully evaluating the real-world effectiveness of XAI techniques through rigorous user studies. It sets the stage for further research to better understand the conditions under which saliency maps and other explanatory tools can meaningfully support human understanding of complex AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

How explainable AI affects human performance: A systematic review of the behavioural consequences of saliency maps

Romy Muller

Saliency maps can explain how deep neural networks classify images. But are they actually useful for humans? The present systematic review of 68 user studies found that while saliency maps can enhance human performance, null effects or even costs are quite common. To investigate what modulates these effects, the empirical outcomes were organised along several factors related to the human tasks, AI performance, XAI methods, images to be classified, human participants and comparison conditions. In image-focused tasks, benefits were less common than in AI-focused tasks, but the effects depended on the specific cognitive requirements. Moreover, benefits were usually restricted to incorrect AI predictions in AI-focused tasks but to correct ones in image-focused tasks. XAI-related factors had surprisingly little impact. The evidence was limited for image- and human-related factors and the effects were highly dependent on the comparison conditions. These findings may support the design of future user studies.

4/29/2024

Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models

Marvin Pafla, Kate Larson, Mark Hancock

The field of eXplainable artificial intelligence (XAI) has produced a plethora of methods (e.g., saliency-maps) to gain insight into artificial intelligence (AI) models, and has exploded with the rise of deep learning (DL). However, human-participant studies question the efficacy of these methods, particularly when the AI output is wrong. In this study, we collected and analyzed 156 human-generated text and saliency-based explanations collected in a question-answering task (N=40) and compared them empirically to state-of-the-art XAI explanations (integrated gradients, conservative LRP, and ChatGPT) in a human-participant study (N=136). Our findings show that participants found human saliency maps to be more helpful in explaining AI answers than machine saliency maps, but performance negatively correlated with trust in the AI model and explanations. This finding hints at the dilemma of AI errors in explanation, where helpful explanations can lead to lower task performance when they support wrong AI predictions.

4/12/2024

Towards A Comprehensive Visual Saliency Explanation Framework for AI-based Face Recognition Systems

Yuhang Lu, Zewei Xu, Touradj Ebrahimi

Over recent years, deep convolutional neural networks have significantly advanced the field of face recognition techniques for both verification and identification purposes. Despite the impressive accuracy, these neural networks are often criticized for lacking explainability. There is a growing demand for understanding the decision-making process of AI-based face recognition systems. Some studies have investigated the use of visual saliency maps as explanations, but they have predominantly focused on the specific face verification case. The discussion on more general face recognition scenarios and the corresponding evaluation methodology for these explanations have long been absent in current research. Therefore, this manuscript conceives a comprehensive explanation framework for face recognition tasks. Firstly, an exhaustive definition of visual saliency map-based explanations for AI-based face recognition systems is provided, taking into account the two most common recognition situations individually, i.e., face verification and identification. Secondly, a new model-agnostic explanation method named CorrRISE is proposed to produce saliency maps, which reveal both the similar and dissimilar regions between any given face images. Subsequently, the explanation framework conceives a new evaluation methodology that offers quantitative measurement and comparison of the performance of general visual saliency explanation methods in face recognition. Consequently, extensive experiments are carried out on multiple verification and identification scenarios. The results showcase that CorrRISE generates insightful saliency maps and demonstrates superior performance, particularly in similarity maps in comparison with the state-of-the-art explanation approaches.

7/9/2024

Bridging the Gap Between Saliency Prediction and Image Quality Assessment

Kirillov Alexey, Andrey Moskalenko, Dmitriy Vatolin

Over the past few years, deep neural models have made considerable advances in image quality assessment (IQA). However, the underlying reasons for their success remain unclear, owing to the complex nature of deep neural networks. IQA aims to describe how the human visual system (HVS) works and to create its efficient approximations. On the other hand, Saliency Prediction task aims to emulate HVS via determining areas of visual interest. Thus, we believe that saliency plays a crucial role in human perception. In this work, we conduct an empirical study that reveals the relation between IQA and Saliency Prediction tasks, demonstrating that the former incorporates knowledge of the latter. Moreover, we introduce a novel SACID dataset of saliency-aware compressed images and conduct a large-scale comparison of classic and neural-based IQA methods. All supplementary code and data will be available at the time of publication.

5/9/2024