Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models

2404.18508

Published 4/30/2024 by Mark Schöne, Neeraj Mohan Sushma, Jingyue Zhuge, Christian Mayr, Anand Subramoney, David Kappel

Abstract

Event-based sensors are well suited for real-time processing due to their fast response times and encoding of the sensory data as successive temporal differences. These and other valuable properties, such as a high dynamic range, are suppressed when the data is converted to a frame-based format. However, most current methods either collapse events into frames or cannot scale up when processing the event data directly event-by-event. In this work, we address the key challenges of scaling up event-by-event modeling of the long event streams emitted by such sensors, which is a particularly relevant problem for neuromorphic computing. While prior methods can process up to a few thousand time steps, our model, based on modern recurrent deep state-space models, scales to event streams of millions of events for both training and inference. We leverage their stable parameterization for learning long-range dependencies, parallelizability along the sequence dimension, and their ability to integrate asynchronous events effectively to scale them up to long event streams. We further augment these with novel event-centric techniques enabling our model to match or beat the state-of-the-art performance on several event stream benchmarks. In the Spiking Speech Commands task, we improve state-of-the-art by a large margin of 6.6% to 87.1%. On the DVS128-Gestures dataset, we achieve competitive results without using frames or convolutional neural networks. Our work demonstrates, for the first time, that it is possible to use fully event-based processing with purely recurrent networks to achieve state-of-the-art task performance in several event-based benchmarks.

Overview

This paper presents a scalable approach for processing neuromorphic sensory signals, such as those from event-based vision sensors, using deep state-space models.
The proposed method can efficiently process event streams in an event-by-event manner, enabling real-time processing of high-dimensional, asynchronous neuromorphic data.
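The authors evaluate their approach on several event stream benchmarks. On the Spiking Speech Commands task they improve the state of the art by 6.6% to 87.1%, and on the DVS128-Gestures dataset they achieve competitive results without using frames or convolutional neural networks.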

Plain English Explanation

Neuromorphic sensors, such as event-based vision sensors, work differently from traditional cameras. Instead of capturing full images at a fixed rate, they only detect and report changes in the scene, similar to how the human eye and brain process visual information.
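The idea of reporting only changes can be sketched for a single pixel. The example below is a simplified illustration, not the behavior of any specific sensor: the function name `brightness_to_events`, the log-intensity model, and the `threshold` value are all assumptions chosen for the sketch. It emits a positive or negative event each time the brightness has moved by a fixed amount since the last event.

```python
import math


def brightness_to_events(samples, threshold=0.2):
    """Turn (time, intensity) samples of one pixel into (time, polarity) events.

    Assumed idealized model: an event fires each time the log-intensity has
    moved by `threshold` since the last event. Polarity is +1 for brighter
    and -1 for darker.
    """
    events = []
    _, first_intensity = samples[0]
    reference = math.log(first_intensity)
    for t, intensity in samples[1:]:
        delta = math.log(intensity) - reference
        # A large change produces several events at the same timestamp.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((t, polarity))
            reference += polarity * threshold
            delta = math.log(intensity) - reference
    return events


# Hypothetical brightness trace: constant, then brighter, then darker.
samples = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.5), (3.0, 1.5), (4.0, 0.9)]
events = brightness_to_events(samples)
print(events)
```

Samples where nothing changes (times 1.0 and 3.0 here) produce no output at all, which is why such a stream is sparse and irregular in time rather than arriving at a fixed frame rate.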

This paper introduces a new way to process the data from these neuromorphic sensors using a technique called "deep state-space models". The key idea is to build a deep neural network that can efficiently process the incoming stream of events (changes in the scene) one-by-one, without having to wait for a full image.

The authors show that this approach can handle the high-dimensional, asynchronous data from neuromorphic sensors, and they test it on event stream benchmarks such as the Spiking Speech Commands task (recognizing spoken commands) and the DVS128-Gestures dataset (recognizing gestures).

Technical Explanation

The paper proposes a deep state-space model (DSSM) architecture for processing event-based sensory data in a scalable, event-by-event manner. The DSSM maintains a hidden state that is updated with each incoming event, allowing it to efficiently process the high-dimensional, asynchronous data stream.
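A minimal sketch of such an event-by-event state update is shown below. It is a generic diagonal linear state-space recurrence, not the paper's exact parameterization or discretization: the function `run_event_ssm`, the per-dimension `decay_rates`, and the `input_weights` table are hypothetical names introduced for illustration. Each state dimension decays for the time elapsed since the previous event and then receives the new event's contribution, so irregular gaps between events are handled directly.

```python
import math


def run_event_ssm(events, decay_rates, input_weights):
    """Update a diagonal linear state one event at a time.

    events: list of (timestamp, channel) pairs sorted by timestamp.
    decay_rates: positive decay rate per state dimension (illustrative values).
    input_weights: input_weights[channel][i] is added to state i on an event.
    """
    state = [0.0] * len(decay_rates)
    last_t = None
    for t, channel in events:
        dt = 0.0 if last_t is None else t - last_t
        # Decay each dimension over the elapsed time, then add the event input.
        state = [
            math.exp(-rate * dt) * s + w
            for rate, s, w in zip(decay_rates, state, input_weights[channel])
        ]
        last_t = t
    return state


# Hypothetical stream: three events on two channels with uneven spacing.
events = [(0.0, 0), (0.5, 1), (2.0, 0)]
decay_rates = [1.0, 0.1]
input_weights = [[1.0, 1.0], [0.5, -1.0]]
print(run_event_ssm(events, decay_rates, input_weights))
```

The cost per event is independent of how much time has passed, since a long silent period only changes the value of `dt` rather than adding extra update steps.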

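According to the abstract, the model builds on modern recurrent deep state-space models and relies on three of their properties: a stable parameterization for learning long-range dependencies, parallelizability along the sequence dimension, and the ability to integrate asynchronous events. The authors further augment these models with novel event-centric techniques.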
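The abstract credits state-space models with parallelizability along the sequence dimension. One standard way this works for linear recurrences is sketched below, under the assumption of a scalar update x_k = a_k * x_{k-1} + b_k; the function names are hypothetical and this is not taken from the paper's implementation. Because composing two such updates is associative, all states can be computed with a prefix scan whose passes contain only independent operations.

```python
def combine(left, right):
    """Compose two affine updates x -> a*x + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_states(steps):
    """Reference: run the recurrence one step at a time from x = 0."""
    states, x = [], 0.0
    for a, b in steps:
        x = a * x + b
        states.append(x)
    return states


def scan_states(steps):
    """Inclusive scan by recursive doubling.

    Each pass combines element i with element i - offset. All combines within
    one pass are independent, so a parallel device could run them at once.
    This pure-Python version only demonstrates the structure.
    """
    prefix = list(steps)
    offset = 1
    while offset < len(prefix):
        prefix = [
            combine(prefix[i - offset], prefix[i]) if i >= offset else prefix[i]
            for i in range(len(prefix))
        ]
        offset *= 2
    # With initial state 0, the state after step k is the accumulated b term.
    return [b for _, b in prefix]


# Hypothetical (a_k, b_k) pairs for five steps.
steps = [(0.5, 1.0), (0.9, 0.0), (0.5, 2.0), (1.0, -1.0), (0.25, 3.0)]
print(sequential_states(steps))
print(scan_states(steps))
```

The scan needs about log2(n) passes instead of n sequential steps, which is what makes training on very long sequences practical on parallel hardware, while inference can still use the step-by-step form.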

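To evaluate their approach, the researchers apply the model to several event stream benchmarks. On the Spiking Speech Commands task, they improve the state of the art by 6.6% to 87.1%. On the DVS128-Gestures dataset, they achieve competitive results without using frames or convolutional neural networks. The abstract also states that the model scales to event streams of millions of events for both training and inference, whereas prior methods could process up to a few thousand time steps.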

Critical Analysis

The authors make a compelling case for the benefits of their scalable, event-by-event processing approach for neuromorphic sensory signals. By using deep state-space models, they are able to handle the high-dimensional, asynchronous nature of event-based data and reach strong performance on several event stream benchmarks.

One potential limitation of the work is the need for labeled training data to learn the state-space models. In some applications, obtaining large-scale labeled event-based datasets may be challenging. The authors mention the possibility of leveraging unsupervised or self-supervised learning techniques to address this issue, which would be an interesting direction for future research.

Additionally, while the paper demonstrates the effectiveness of the proposed methods on several benchmark tasks, it would be valuable to see how they perform on a wider range of real-world event-based applications, such as those encountered in robotics or autonomous vehicles. Exploring the robustness and generalization of the proposed models in these more diverse and potentially noisier scenarios could provide additional insights.

Conclusion

This paper presents a novel approach for scalable, event-by-event processing of neuromorphic sensory signals using deep state-space models. By efficiently updating a hidden state representation with each incoming event, the proposed methods can handle the high-dimensional, asynchronous data from event-based sensors and achieve strong performance on several event stream benchmarks.

The authors' work demonstrates the potential of deep state-space models to unlock the full capabilities of neuromorphic sensors, paving the way for more efficient and responsive artificial perception systems. As event-based sensing technologies continue to evolve, this research could have significant implications for applications ranging from robotics and autonomous vehicles to assistive technologies and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

State Space Models for Event Cameras

Nikola Zubić, Mathias Gehrig, Davide Scaramuzza

Today, state-of-the-art deep neural networks that process event-camera data first convert a temporal window of events into dense, grid-like input representations. As such, they exhibit poor generalizability when deployed at higher inference frequencies (i.e., smaller temporal windows) than the ones they were trained on. We address this challenge by introducing state-space models (SSMs) with learnable timescale parameters to event-based vision. This design adapts to varying frequencies without the need to retrain the network at different frequencies. Additionally, we investigate two strategies to counteract aliasing effects when deploying the model at higher frequencies. We comprehensively evaluate our approach against existing methods based on RNN and Transformer architectures across various benchmarks, including Gen1 and 1 Mpx event camera datasets. Our results demonstrate that SSM-based models train 33% faster and also exhibit minimal performance degradation when tested at higher frequencies than the training input. Traditional RNN and Transformer models exhibit performance drops of more than 20 mAP, with SSMs having a drop of 3.76 mAP, highlighting the effectiveness of SSMs in event-based vision tasks.

4/19/2024

cs.CV cs.LG

Spiking Structured State Space Model for Monaural Speech Enhancement

Yu Du, Xu Liu, Yansong Chua

Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: efficiently using information in long speech sequences and high computational costs. To address these, we introduce the Spiking Structured State Space Model (Spiking-S4). This approach merges the energy efficiency of Spiking Neural Networks (SNN) with the long-range sequence modeling capabilities of Structured State Space Models (S4), offering a compelling solution. Evaluation on the DNS Challenge and VoiceBank+Demand Datasets confirms that Spiking-S4 rivals existing Artificial Neural Network (ANN) methods but with fewer computational resources, as evidenced by reduced parameters and Floating Point Operations (FLOPs).

4/23/2024

cs.SD cs.CV eess.AS

Covariant spatio-temporal receptive fields for neuromorphic computing

Jens Egholm Pedersen, Jörg Conradt, Tony Lindeberg

Biological nervous systems constitute important sources of inspiration towards computers that are faster, cheaper, and more energy efficient. Neuromorphic disciplines view the brain as a coevolved system, simultaneously optimizing the hardware and the algorithms running on it. There are clear efficiency gains when bringing the computations into a physical substrate, but we presently lack theories to guide efficient implementations. Here, we present a principled computational model for neuromorphic systems in terms of spatio-temporal receptive fields, based on affine Gaussian kernels over space and leaky-integrator and leaky integrate-and-fire models over time. Our theory is provably covariant to spatial affine and temporal scaling transformations, and with close similarities to the visual processing in mammalian brains. We use these spatio-temporal receptive fields as a prior in an event-based vision task, and show that this improves the training of spiking networks, which otherwise is known as problematic for event-based vision. This work combines efforts within scale-space theory and computational neuroscience to identify theoretically well-founded ways to process spatio-temporal signals in neuromorphic systems. Our contributions are immediately relevant for signal processing and event-based vision, and can be extended to other processing tasks over space and time, such as memory and control.

5/9/2024

cs.NE cs.CV cs.LG

V2CE: Video to Continuous Events Simulator

Zhongyang Zhang, Shuyang Cui, Kaidong Chai, Haowen Yu, Subhasis Dasgupta, Upal Mahbub, Tauhidur Rahman

Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in terms of dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of ample labeled datasets. Prior efforts to convert APS data into events often grapple with issues such as a considerable domain shift from real events, the absence of quantified validation, and layering problems within the time axis. In this paper, we present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of DVS. A series of carefully designed losses helps enhance the quality of generated event voxels significantly. We also propose a novel local dynamic-aware timestamp inference strategy to accurately recover event timestamps from event voxels in a continuous fashion and eliminate the temporal layering problem. Results from rigorous validation through quantified metrics at all stages of the pipeline establish our method unquestionably as the current state-of-the-art (SOTA).

4/30/2024

cs.CV cs.AI