Concept-based explainability for an EEG transformer model

Read original: arXiv:2307.12745 - Published 8/26/2024 by Anders Gj{o}lbye, William Lehn-Schi{o}ler, 'Ashildur J'onsd'ottir, Bergd'is Arnard'ottir, Lars Kai Hansen

📈

Overview

Deep learning models are complex due to their size, structure, and randomness in training.
Explaining the inner workings of these models is a challenge.
Concept Activation Vectors (CAVs) aim to understand deep models' internal states in terms of human-aligned concepts.
CAVs were first applied to image classification and later adapted to other domains, including natural language processing.
This work attempts to apply CAVs to electroencephalogram (EEG) data for explainability in the BENDR transformer model.

Plain English Explanation

Deep learning models, which are the foundation of many AI systems, can be quite complex. This complexity arises from the massive size of the models, their intricate structure, and the inherent randomness involved in the training process. Explaining how these models arrive at their outputs is a significant challenge.

To address this, researchers introduced a technique called Concept Activation Vectors (CAVs). The idea behind CAVs is to understand the internal states of deep learning models in terms of human-understandable concepts. These concepts are identified as directions in the model's latent space, using linear discriminants.

This method was originally applied to image classification, but has since been adapted to other domains, including natural language processing. In this work, the researchers attempt to apply CAVs to electroencephalogram (EEG) data to explain the inner workings of the BENDR transformer model.

The key challenge in this endeavor is to define the explanatory concepts and select relevant datasets to ground these concepts in the model's latent space. The researchers explore two approaches: using externally labeled EEG datasets, and applying anatomically defined concepts. Both of these methods aim to provide valuable insights into the representations learned by deep EEG models.

Technical Explanation

The researchers in this work attempt to apply the Concept Activation Vectors (CAVs) method to electroencephalogram (EEG) data for explainability in the BENDR transformer model.

CAVs are a technique designed to understand the internal states of deep learning models in terms of human-aligned concepts. These concepts are identified as directions in the model's latent space, using linear discriminants.

The key challenge in this endeavor is to define the explanatory concepts and select relevant datasets to ground these concepts in the model's latent space. The researchers explore two approaches:

Externally Labeled EEG Datasets: This is a straightforward generalization of the methods used in image classification, where the concepts are derived from external dataset annotations.
Anatomically Defined Concepts: This is a novel approach specific to EEG data, where the concepts are based on anatomical brain regions.

The researchers present evidence that both of these approaches to concept formation yield valuable insights into the representations learned by the deep EEG model, BENDR.

Critical Analysis

The researchers in this work have made an important contribution by attempting to apply the Concept Activation Vectors (CAVs) method to electroencephalogram (EEG) data for explainability in the BENDR transformer model.

One potential limitation of the study is the reliance on external datasets and anatomical concepts, which may not fully capture the nuances of the EEG data and the BENDR model. Additionally, the researchers do not provide a detailed comparison of the two concept formation approaches, which could shed light on their relative strengths and weaknesses.

Further research could explore alternative concept formation methods, such as data-driven approaches that identify salient patterns in the EEG data itself. Additionally, it would be valuable to investigate the generalizability of the CAVs method across different EEG-based tasks and models, to better understand its broader applicability.

Overall, this work represents an important step towards improving the explainability of deep learning models in the EEG domain, and the insights gained could have significant implications for knowledge graph and natural language processing applications as well.

Conclusion

This work explores the application of Concept Activation Vectors (CAVs) to electroencephalogram (EEG) data for explainability in the BENDR transformer model. The researchers present two approaches to concept formation: using externally labeled EEG datasets and applying anatomically defined concepts.

The results demonstrate that both of these methods can provide valuable insights into the representations learned by the deep EEG model. This work represents an important step towards improving the explainability of complex deep learning models, with potential implications for a wide range of applications, including knowledge graphs and natural language processing.

However, the study also highlights the need for further research to explore alternative concept formation methods and investigate the generalizability of the CAVs approach across different EEG-based tasks and models. By continuing to advance the field of model explainability, researchers can help build more transparent and trustworthy AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📈

Concept-based explainability for an EEG transformer model

Anders Gj{o}lbye, William Lehn-Schi{o}ler, 'Ashildur J'onsd'ottir, Bergd'is Arnard'ottir, Lars Kai Hansen

Deep learning models are complex due to their size, structure, and inherent randomness in training procedures. Additional complexity arises from the selection of datasets and inductive biases. Addressing these challenges for explainability, Kim et al. (2018) introduced Concept Activation Vectors (CAVs), which aim to understand deep models' internal states in terms of human-aligned concepts. These concepts correspond to directions in latent space, identified using linear discriminants. Although this method was first applied to image classification, it was later adapted to other domains, including natural language processing. In this work, we attempt to apply the method to electroencephalogram (EEG) data for explainability in Kostas et al.'s BENDR (2021), a large-scale transformer model. A crucial part of this endeavor involves defining the explanatory concepts and selecting relevant datasets to ground concepts in the latent space. Our focus is on two mechanisms for EEG concept formation: the use of externally labeled EEG datasets, and the application of anatomically defined concepts. The former approach is a straightforward generalization of methods used in image classification, while the latter is novel and specific to EEG. We present evidence that both approaches to concept formation yield valuable insights into the representations learned by deep EEG models.

8/26/2024

Explaining Explainability: Understanding Concept Activation Vectors

Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal

Recent interpretability methods propose using concept-based explanations to translate the internal representations of deep learning models into a language that humans are familiar with: concepts. This requires understanding which concepts are present in the representation space of a neural network. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs. CAVs may be: (1) inconsistent between layers, (2) entangled with different concepts, and (3) spatially dependent. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact. Understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on ImageNet and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.

4/8/2024

Knowledge graphs for empirical concept retrieval

Lenka Tv{e}tkov'a, Teresa Karen Scheidt, Maria Mandrup Fogh, Ellen Marie Gaunby J{o}rgensen, Finn {AA}rup Nielsen, Lars Kai Hansen

Concept-based explainable AI is promising as a tool to improve the understanding of complex models at the premises of a given user, viz. as a tool for personalized explainability. An important class of concept-based explainability methods is constructed with empirically defined concepts, indirectly defined through a set of positive and negative examples, as in the TCAV approach (Kim et al., 2018). While it is appealing to the user to avoid formal definitions of concepts and their operationalization, it can be challenging to establish relevant concept datasets. Here, we address this challenge using general knowledge graphs (such as, e.g., Wikidata or WordNet) for comprehensive concept definition and present a workflow for user-driven data collection in both text and image domains. The concepts derived from knowledge graphs are defined interactively, providing an opportunity for personalization and ensuring that the concepts reflect the user's intentions. We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs) (Crabbe and van der Schaar, 2022). We show that CAVs and CARs based on these empirical concept datasets provide robust and accurate explanations. Importantly, we also find good alignment between the models' representations of concepts and the structure of knowledge graphs, i.e., human representations. This supports our conclusion that knowledge graph-based concepts are relevant for XAI.

4/11/2024

TextCAVs: Debugging vision models using text

Angus Nicolson, Yarin Gal, J. Alison Noble

Concept-based interpretability methods are a popular form of explanation for deep learning models which provide explanations in the form of high-level human interpretable concepts. These methods typically find concept activation vectors (CAVs) using a probe dataset of concept examples. This requires labelled data for these concepts -- an expensive task in the medical domain. We introduce TextCAVs: a novel method which creates CAVs using vision-language models such as CLIP, allowing for explanations to be created solely using text descriptions of the concept, as opposed to image exemplars. This reduced cost in testing concepts allows for many concepts to be tested and for users to interact with the model, testing new ideas as they are thought of, rather than a delay caused by image collection and annotation. In early experimental results, we demonstrate that TextCAVs produces reasonable explanations for a chest x-ray dataset (MIMIC-CXR) and natural images (ImageNet), and that these explanations can be used to debug deep learning-based models.

8/19/2024