MindSet: Vision. A toolbox for testing DNNs on key psychological experiments

Read original: arXiv:2404.05290 - Published 4/9/2024 by Valerio Biscione, Dong Yin, Gaurav Malhotra, Marin Dujmovic, Milton L. Montero, Guillermo Puebla, Federico Adolfi, Rachel F. Heaton, John E. Hummel, Benjamin D. Evans and 2 others

MindSet: Vision. A toolbox for testing DNNs on key psychological experiments

Overview

• This paper introduces MindSet: Vision, a toolbox for testing Deep Neural Networks (DNNs) on key psychological experiments. • The toolbox aims to assess the cognitive capabilities of DNNs by evaluating their performance on a diverse set of psychological tasks. • The authors explore the relationship between DNN architectures and their ability to capture human-like cognitive processes.

Plain English Explanation

The researchers have created a new tool called MindSet: Vision that allows them to test how well different AI systems, specifically deep neural networks (DNNs), can perform on a variety of psychological experiments. The goal is to better understand the cognitive capabilities of these AI systems and how they compare to human cognitive abilities.

By evaluating the performance of DNNs on tasks that are designed to measure different aspects of human cognition, the researchers hope to gain insights into the strengths and limitations of these AI systems. This could help us better understand the nature of human intelligence and how it differs from the way AI systems process information.

The Insights from Use of Previously Unseen Neural Architectures and Concept-Based Analysis of Neural Networks via Vision papers provide relevant context on evaluating the cognitive capabilities of AI systems.

Technical Explanation

The paper introduces MindSet: Vision, a toolbox for testing the performance of Deep Neural Networks (DNNs) on a wide range of psychological experiments. The authors aim to use this toolbox to assess the cognitive capabilities of DNNs and explore the relationship between DNN architectures and their ability to capture human-like cognitive processes.

The toolbox includes a diverse set of psychological tasks, such as visual perception, decision-making, and memory. By evaluating the performance of different DNN architectures on these tasks, the researchers hope to gain insights into the strengths and limitations of these AI systems in terms of their cognitive capabilities.

The How Much Data Are Enough? Investigating Dataset Size for Biomedical Image Classification and RADEdit: Stress Testing Biomedical Vision Models via Adversarial Example Generation papers provide relevant background on evaluating the performance of AI systems on various tasks.

Critical Analysis

The authors acknowledge the limitations of their approach, noting that the ability of DNNs to perform well on psychological tasks may not necessarily reflect their true cognitive capabilities. There are concerns that the tasks may not fully capture the complexity of human cognition, and that the performance of DNNs may be influenced by factors such as dataset size and architecture design.

Additionally, the paper does not address the potential ethical implications of using psychological experiments to evaluate the cognitive capabilities of AI systems. There are concerns about the unintended consequences of such research and the potential for misuse or misinterpretation of the findings.

The Mind to Image: Projecting Visual Mental Imagination paper raises important questions about the ethical considerations in this area of research.

Conclusion

Overall, the MindSet: Vision toolbox represents an interesting approach to evaluating the cognitive capabilities of DNNs. By assessing their performance on a diverse set of psychological tasks, the researchers hope to gain a better understanding of the strengths and limitations of these AI systems. However, the paper acknowledges the need for further research to address the limitations of this approach and explore the ethical implications of this line of inquiry.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

MindSet: Vision. A toolbox for testing DNNs on key psychological experiments

Valerio Biscione, Dong Yin, Gaurav Malhotra, Marin Dujmovic, Milton L. Montero, Guillermo Puebla, Federico Adolfi, Rachel F. Heaton, John E. Hummel, Benjamin D. Evans, Karim Habashy, Jeffrey S. Bowers

Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision. In almost all cases these benchmarks are observational in the sense they are composed of behavioural and brain responses to naturalistic images that have not been manipulated to test hypotheses regarding how DNNs or humans perceive and identify objects. Here we introduce the toolbox MindSet: Vision, consisting of a collection of image datasets and related scripts designed to test DNNs on 30 psychological findings. In all experimental conditions, the stimuli are systematically manipulated to test specific hypotheses regarding human visual perception and object recognition. In addition to providing pre-generated datasets of images, we provide code to regenerate these datasets, offering many configurable parameters which greatly extend the dataset versatility for different research contexts, and code to facilitate the testing of DNNs on these image datasets using three different methods (similarity judgments, out-of-distribution classification, and decoder method), accessible at https://github.com/MindSetVision/mindset-vision. We test ResNet-152 on each of these methods as an example of how the toolbox can be used.

4/9/2024

Evaluating Multiview Object Consistency in Humans and Image Models

Tyler Bonnen, Stephanie Fu, Yutong Bai, Thomas O'Connell, Yoni Friedman, Nancy Kanwisher, Joshua B. Tenenbaum, Alexei A. Efros

We introduce a benchmark to directly evaluate the alignment between human observers and vision models on a 3D shape inference task. We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape: given a set of images, participants identify which contain the same/different objects, despite considerable viewpoint variation. We draw from a diverse range of images that include common objects (e.g., chairs) as well as abstract shapes (i.e., procedurally generated `nonsense' objects). After constructing over 2000 unique image sets, we administer these tasks to human participants, collecting 35K trials of behavioral data from over 500 participants. This includes explicit choice behaviors as well as intermediate measures, such as reaction time and gaze data. We then evaluate the performance of common vision models (e.g., DINOv2, MAE, CLIP). We find that humans outperform all models by a wide margin. Using a multi-scale evaluation approach, we identify underlying similarities and differences between models and humans: while human-model performance is correlated, humans allocate more time/processing on challenging trials. All images, data, and code can be accessed via our project page.

9/11/2024

NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models

Pranshu Pandya, Agney S Talwarr, Vatsal Gupta, Tushar Kataria, Vivek Gupta, Dan Roth

Cognitive textual and visual reasoning tasks, such as puzzles, series, and analogies, demand the ability to quickly reason, decipher, and evaluate patterns both textually and spatially. While LLMs and VLMs, through extensive training on large amounts of human-curated data, have attained a high level of pseudo-human intelligence in some common sense reasoning tasks, they still struggle with more complex reasoning tasks that require cognitive understanding. In this work, we introduce a new dataset, NTSEBench, designed to evaluate the cognitive multi-modal reasoning and problem-solving skills of large models. The dataset comprises 2,728 multiple-choice questions comprising of a total of 4,642 images across 26 categories sampled from the NTSE examination conducted nationwide in India, featuring both visual and textual general aptitude questions that do not rely on rote learning. We establish baselines on the dataset using state-of-the-art LLMs and VLMs. To facilitate a comparison between open source and propriety models, we propose four distinct modeling strategies to handle different modalities (text and images) in the dataset instances.

7/16/2024

Dimensions underlying the representational alignment of deep neural networks with humans

Florian P. Mahner, Lukas Muttenthaler, Umut Guc{c}lu, Martin N. Hebart

Determining the similarities and differences between humans and artificial intelligence is an important goal both in machine learning and cognitive neuroscience. However, similarities in representations only inform us about the degree of alignment, not the factors that determine it. Drawing upon recent developments in cognitive science, we propose a generic framework for yielding comparable representations in humans and deep neural networks (DNN). Applying this framework to humans and a DNN model of natural images revealed a low-dimensional DNN embedding of both visual and semantic dimensions. In contrast to humans, DNNs exhibited a clear dominance of visual over semantic features, indicating divergent strategies for representing images. While in-silico experiments showed seemingly-consistent interpretability of DNN dimensions, a direct comparison between human and DNN representations revealed substantial differences in how they process images. By making representations directly comparable, our results reveal important challenges for representational alignment, offering a means for improving their comparability.

6/28/2024