Biologically-Motivated Learning Model for Instructed Visual Processing

2306.02415

Published 6/18/2024 by Roy Abel, Shimon Ullman

Abstract

As part of understanding how the brain learns, ongoing work seeks to combine biological knowledge and current artificial intelligence (AI) modeling in an attempt to find an efficient biologically plausible learning scheme. Current models of biologically plausible learning often use a cortical-like combination of bottom-up (BU) and top-down (TD) processing, where the TD part carries feedback signals used for learning. However, in the visual cortex, the TD pathway plays a second major role of visual attention, by guiding the visual process to locations and tasks of interest. A biological model should therefore combine the two tasks, and learn to guide the visual process. We introduce a model that uses a cortical-like combination of BU and TD processing that naturally integrates the two major functions of the TD stream. The integrated model is obtained by an appropriate connectivity pattern between the BU and TD streams, a novel processing cycle that uses the TD part twice, and the use of 'Counter-Hebb' learning that operates across the streams. We show that the 'Counter-Hebb' mechanism can provide an exact backpropagation synaptic modification. We further demonstrate the model's ability to guide the visual stream to perform a task of interest, achieving competitive performance compared with AI models on standard multi-task learning benchmarks. The successful combination of learning and visual guidance could provide a new view on combining BU and TD processing in human vision, and suggests possible directions for both biologically plausible models and artificial instructed models, such as vision-language models (VLMs).

Overview

This paper explores combining biological knowledge and artificial intelligence (AI) modeling to develop an efficient, biologically plausible learning scheme.
The model uses a cortical-like combination of bottom-up (BU) and top-down (TD) processing, where the TD pathway plays two key roles: feedback for learning and visual attention.
The model integrates these two functions of the TD stream by using a specific connectivity pattern, a novel processing cycle, and 'Counter-Hebb' learning across the BU and TD streams.
The 'Counter-Hebb' mechanism can provide an exact backpropagation synaptic modification.
The model can guide the visual stream to perform tasks of interest, achieving competitive performance on standard multi-task learning benchmarks.

Plain English Explanation

The human brain is an incredible learning machine, and researchers are trying to understand how it works in order to build more efficient and capable artificial intelligence (AI) systems. This paper describes a model that combines what we know about the biology of the brain with current AI techniques.

The key idea is that the brain uses a combination of bottom-up (BU) and top-down (TD) processing. The BU stream takes in information from the senses, while the TD stream provides feedback and guidance. In the visual cortex, the TD pathway plays an important role in visual attention, helping us focus on the things that are most relevant to us.

The model in this paper tries to mimic this BU-TD interplay in a way that integrates both the learning and the attention functions of the TD stream. It does this by using a specific pattern of connections between the BU and TD streams, a novel processing cycle, and a learning rule called 'Counter-Hebb' that operates across the two streams.

Interestingly, the authors show that the 'Counter-Hebb' learning rule can produce exactly the same synaptic weight changes as backpropagation, the standard algorithm for training neural networks. The model therefore offers a candidate mechanism by which cortical-like circuitry could achieve the learning power of backpropagation.

Furthermore, the model is able to guide the visual stream to perform tasks of interest, achieving performance that is competitive with AI models on standard multi-task learning benchmarks. This suggests that the combination of learning and visual attention could be a useful approach for both biologically inspired models and practical AI systems, such as vision-language models.

Technical Explanation

The paper introduces a model that uses a cortical-like combination of bottom-up (BU) and top-down (TD) processing to integrate the two major functions of the TD stream: feedback for learning and visual attention.

The key elements of the model include:

Connectivity Pattern: The model uses a specific connectivity pattern between the BU and TD streams to integrate the two functions.
Processing Cycle: The model employs a novel processing cycle that uses the TD part twice, once for learning and once for attention.
'Counter-Hebb' Learning: The model uses a 'Counter-Hebb' learning rule that operates across the BU and TD streams. The authors show that this rule can provide an exact backpropagation synaptic modification, offering a biologically motivated route to the same weight updates that backpropagation computes.
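The claim that a cross-stream rule can reproduce backpropagation can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes a two-layer ReLU network, TD weights that are exact transposes of the BU weights, TD "counter" neurons that are gated by whether their BU partner was active, and a squared-error loss. All names (`bu_pass`, `td_error_pass`, `counter_hebb`) are hypothetical. Under those assumptions, multiplying pre-synaptic BU activity by the TD activity of the post-synaptic neuron's counterpart gives the loss gradient, which the sketch checks against finite differences.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def bu_pass(x, W1, W2):
    """Bottom-up pass: hidden pre-activation, hidden ReLU activity, output."""
    z1 = matvec(W1, x)
    h1 = [max(0.0, z) for z in z1]
    return z1, h1, matvec(W2, h1)

def td_error_pass(err, z1, W2):
    """Top-down pass carrying the output error through transposed BU weights
    (assumed symmetry). A hidden counter neuron passes its signal only if its
    BU partner was active (assumed gating)."""
    c2 = list(err)
    back = matvec(transpose(W2), c2)
    c1 = [b if z > 0 else 0.0 for b, z in zip(back, z1)]
    return c1, c2

def counter_hebb(x, h1, c1, c2):
    """dW[i][j] = TD counter activity of post-synaptic unit i
                  * BU activity of pre-synaptic unit j."""
    return outer(c1, x), outer(c2, h1)

def loss(x, t, W1, W2):
    y = bu_pass(x, W1, W2)[2]
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))

def numeric_grad(x, t, W1, W2, W, eps=1e-6):
    """Central finite differences of the loss w.r.t. W (which is W1 or W2)."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            old = W[i][j]
            W[i][j] = old + eps
            up = loss(x, t, W1, W2)
            W[i][j] = old - eps
            down = loss(x, t, W1, W2)
            W[i][j] = old
            grad[i][j] = (up - down) / (2 * eps)
    return grad

# Illustrative values only (not taken from the paper).
x = [1.0, -2.0, 0.5]
t = [1.0, 0.0]
W1 = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4], [0.2, 0.7, -0.5], [0.9, 0.4, 0.3]]
W2 = [[0.3, -0.2, 0.5, 0.4], [-0.7, 0.6, 0.1, -0.8]]

z1, h1, y = bu_pass(x, W1, W2)
err = [yi - ti for yi, ti in zip(y, t)]
c1, c2 = td_error_pass(err, z1, W2)
dW1, dW2 = counter_hebb(x, h1, c1, c2)

g1 = numeric_grad(x, t, W1, W2, W1)
g2 = numeric_grad(x, t, W1, W2, W2)
max_diff = max(abs(a - b)
               for D, G in ((dW1, g1), (dW2, g2))
               for rd, rg in zip(D, G)
               for a, b in zip(rd, rg))
print("largest gap between Counter-Hebb update and numeric gradient:", max_diff)
```

In this sketch `dW1` and `dW2` are gradients, so a descent step would subtract a small multiple of them from the weights. The agreement depends on the symmetry and gating assumptions stated above.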

The authors demonstrate that this integrated model can guide the visual stream to perform tasks of interest, achieving competitive performance compared with AI models on standard multi-task learning benchmarks.
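One way to picture what "guiding the visual stream" could mean is a TD pass that turns a task instruction into a gate on the BU units, so the same image is processed by a different sub-network for each task. The sketch below is illustrative only and is not taken from the paper. The gating scheme, the one-hot task encoding, the weight values, and the names `td_guidance` and `guided_bu_pass` are all assumptions.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def td_guidance(task_onehot, T):
    """TD guidance pass (assumed form): map a task instruction to a
    0/1 gate for each hidden BU unit."""
    drive = matvec(T, task_onehot)
    return [1.0 if d > 0 else 0.0 for d in drive]

def guided_bu_pass(x, W1, W2, gate):
    """BU pass in which each hidden unit is multiplied by its TD gate."""
    h = [g * a for g, a in zip(gate, relu(matvec(W1, x)))]
    return matvec(W2, h)

# Hypothetical weights: task 0 opens hidden units 0-1, task 1 opens units 2-3.
T = [[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, 1.0]]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
W2 = [[1.0, 1.0, -1.0, 1.0]]
x = [1.0, 2.0]

gate_a = td_guidance([1.0, 0.0], T)
gate_b = td_guidance([0.0, 1.0], T)
y_a = guided_bu_pass(x, W1, W2, gate_a)
y_b = guided_bu_pass(x, W1, W2, gate_b)
print(gate_a, y_a)
print(gate_b, y_b)
```

The same input `x` yields different outputs under the two instructions because each instruction routes the signal through a different subset of hidden units.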

Critical Analysis

The paper presents a compelling approach to combining biological knowledge and AI modeling, but there are a few potential limitations and areas for further research:

Biological Plausibility: While the model aims to be biologically plausible, the 'Counter-Hebb' rule reproduces backpropagation updates exactly, and it remains open whether the conditions this requires hold in real cortical circuits. Further research is needed to test these conditions and to explore more biologically grounded learning mechanisms.
Task Generalization: The model is demonstrated on standard multi-task learning benchmarks, but it's unclear how well it would generalize to more complex, real-world tasks. Exploring the model's performance on a wider range of tasks could provide valuable insights.
Interpretability: As with many AI models, the internal workings of this integrated BU-TD model may be difficult to interpret and understand. Improving the interpretability of the model could enhance its usefulness for both biological and AI research.

Despite these potential limitations, the paper presents an interesting and potentially impactful approach to combining biological and AI insights, which could lead to new breakthroughs in our understanding of the human brain and the development of more capable and efficient artificial intelligence systems.

Conclusion

This paper introduces a biologically plausible model that integrates the two major functions of the top-down (TD) pathway in the visual cortex: feedback for learning and visual attention. By using a specific connectivity pattern, a novel processing cycle, and a 'Counter-Hebb' learning rule, the model can guide the visual stream to perform tasks of interest while achieving competitive performance on standard benchmarks.

The successful combination of learning and visual guidance in this model could provide valuable insights for both biologically inspired models and practical artificial intelligence applications, such as vision-language models. Further research is needed to address potential limitations and explore the model's broader applicability, but this work represents an important step towards bridging the gap between biological knowledge and advanced AI.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Towards Two-Stream Foveation-based Active Vision Learning

Timur Ibrayev, Amitangshu Mukherjee, Sai Aparna Aketi, Kaushik Roy

Deep neural network (DNN) based machine perception frameworks process the entire input in a one-shot manner to provide answers to both what object is being observed and where it is located. In contrast, the two-stream hypothesis from neuroscience explains the neural processing in the human visual cortex as an active vision system that utilizes two separate regions of the brain to answer the what and the where questions. In this work, we propose a machine learning framework inspired by the two-stream hypothesis and explore the potential benefits that it offers. Specifically, the proposed framework models the following mechanisms: 1) ventral (what) stream focusing on the input regions perceived by the fovea part of an eye (foveation), 2) dorsal (where) stream providing visual guidance, and 3) iterative processing of the two streams to calibrate visual focus and process the sequence of focused image patches. The training of the proposed framework is accomplished by label-based DNN training for the ventral stream model and reinforcement learning for the dorsal stream model. We show that the two-stream foveation-based learning is applicable to the challenging task of weakly-supervised object localization (WSOL), where the training data is limited to the object class or its attributes. The framework is capable of both predicting the properties of an object and successfully localizing it by predicting its bounding box. We also show that, due to the independent nature of the two streams, the dorsal model can be applied on its own to unseen images to localize objects from different datasets.

4/23/2024

cs.CV cs.AI

Having Second Thoughts? Let's hear it

Jung H. Lee, Sujith Vijayan

Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas. After training, DL models can outperform humans on some domain-specific tasks, but their decision-making process has been known to be easily disrupted. Since the human brain consists of multiple functional areas highly connected to one another and relies on intricate interplays between bottom-up and top-down (from high-order to low-order areas) processing, we hypothesize that incorporating top-down signal processing may make DL models more robust. To address this hypothesis, we propose a certification process mimicking selective attention and test if it could make DL models more robust. Our empirical evaluations suggest that this newly proposed certification can improve DL models' accuracy and help us build safety measures to alleviate their vulnerabilities with both artificial and natural adversarial examples.

6/3/2024

cs.CV cs.AI

BIMM: Brain Inspired Masked Modeling for Video Representation Learning

Zhifan Wan, Jie Zhang, Changzhen Li, Shiguang Shan

The visual pathway of the human brain includes two sub-pathways, i.e., the ventral pathway and the dorsal pathway, which focus on object identification and dynamic information modeling, respectively. Both pathways comprise multi-layer structures, with each layer responsible for processing different aspects of visual information. Inspired by the visual information processing mechanism of the human brain, we propose the Brain Inspired Masked Modeling (BIMM) framework, aiming to learn comprehensive representations from videos. Specifically, our approach consists of ventral and dorsal branches, which learn image and video representations, respectively. Both branches employ the Vision Transformer (ViT) as their backbone and are trained using a masked modeling method. To achieve the goals of different visual cortices in the brain, we segment the encoder of each branch into three intermediate blocks and reconstruct progressive prediction targets with lightweight decoders. Furthermore, drawing inspiration from the information-sharing mechanism in the visual pathways, we propose a partial parameter sharing strategy between the branches during training. Extensive experiments demonstrate that BIMM achieves superior performance compared to the state-of-the-art methods.

5/22/2024

cs.CV

Neuro-Inspired Hierarchical Multimodal Learning

Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.

4/24/2024

cs.LG cs.AI