Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline

Read original: arXiv:2304.06793 - Published 5/28/2024 by Ole Richter (SynSense AG, Switzerland, Bio-Inspired Circuits and Systems, Groningen Cognitive Systems and Materials Center), Yannan Xing (SynSense, PR China), Michele De Marchi (SynSense AG, Switzerland), Carsten Nielsen (SynSense AG, Switzerland) and 20 others

Overview

  • This paper presents a smart vision sensor System on Chip (SoC) that combines an event-based camera and a low-power asynchronous spiking Convolutional Neural Network (sCNN) computing architecture.
  • The goal is to enable the extraction of high-level information from a variety of sensors, which is in high demand due to the increasing number of smart devices that require sensory processing on the edge.
  • By integrating both the sensor and processing on a single chip, the system can lower production costs and facilitate small stand-alone applications as well as functioning as an edge node in larger systems.

Plain English Explanation

The paper describes a new type of computer chip that combines a special kind of camera with a powerful but efficient neural network processor. The camera is "event-based," which means it only sends information when something changes in the image, rather than continuously sending data. This results in a sparse data stream, which the neural network is designed to process quickly and efficiently.
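The "only sends information when something changes" idea can be illustrated with a minimal sketch. This is a simplified frame-differencing model, not the sensor's actual pixel circuit: real event-based pixels work asynchronously rather than comparing whole frames. The function name `frames_to_events`, the threshold value, and the tiny 4x4 test frames are all assumptions made for illustration.

```python
import numpy as np

def frames_to_events(prev_frame, next_frame, threshold=0.2):
    """Emit (row, col, polarity) for pixels whose log-intensity changed by more than `threshold`.

    Hypothetical helper: approximates an event camera by differencing two frames.
    """
    eps = 1e-6  # avoids log(0) for fully dark pixels
    delta = np.log(next_frame + eps) - np.log(prev_frame + eps)
    rows, cols = np.nonzero(np.abs(delta) > threshold)
    # Polarity +1 means the pixel got brighter, -1 means it got darker.
    return [(int(r), int(c), 1 if delta[r, c] > 0 else -1) for r, c in zip(rows, cols)]

# A static 4x4 scene in which only two pixels change.
prev_frame = np.full((4, 4), 0.5)
next_frame = prev_frame.copy()
next_frame[1, 2] = 1.0  # one pixel gets brighter
next_frame[3, 0] = 0.1  # one pixel gets darker

events = frames_to_events(prev_frame, next_frame)
print(events)  # only the two changed pixels produce events
```

Of the 16 pixels, 14 are unchanged and produce no output at all, which is the sparsity the downstream processor is built to exploit.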

By putting the camera and the neural network processor on the same chip, the system can be made smaller and cheaper to produce. This makes it well-suited for use in small, standalone applications or as part of larger "edge computing" systems, where data processing happens closer to the sensors rather than in a centralized location.

The key idea is to create a fast, low-power visual processing pipeline that can extract high-level information from various sensors, which is becoming increasingly important as the number of smart devices with advanced sensory capabilities continues to grow.

Technical Explanation

The paper presents a smart vision sensor System on Chip (SoC) that combines an event-based camera and a low-power asynchronous spiking Convolutional Neural Network (sCNN) computing architecture. By integrating both the sensor and processing on a single die, the system can lower unit production costs significantly.

The event-driven nature of the vision sensor delivers high-speed signals in a sparse data stream. The processing pipeline reflects this: it is optimized for highly sparse computation and low latency, and an incoming event passes through 9 sCNN layers in 3.36 µs. The asynchronous architecture and sCNN processing principle are key to achieving this extremely low-latency visual processing on a small form factor with a low energy budget.
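To make the sparse, event-driven processing principle concrete, here is a minimal sketch of one spiking convolution layer that does work only when an event arrives. It assumes a plain integrate-and-fire neuron with reset-to-zero and a single channel; the paper's actual neuron model, memory layout, and asynchronous circuits are not reproduced here, and `process_event` is a hypothetical name.

```python
import numpy as np

def process_event(membrane, kernel, row, col, threshold=1.0):
    """Apply one input event at (row, col) to a layer's membrane potentials.

    Only neurons inside the kernel footprint centred on the event are touched.
    Returns the (row, col) positions of neurons that fired.
    """
    kh, kw = kernel.shape
    h, w = membrane.shape
    out_spikes = []
    for dr in range(kh):
        for dc in range(kw):
            r = row + dr - kh // 2
            c = col + dc - kw // 2
            if 0 <= r < h and 0 <= c < w:  # skip positions outside the layer
                membrane[r, c] += kernel[dr, dc]
                if membrane[r, c] >= threshold:
                    out_spikes.append((r, c))
                    membrane[r, c] = 0.0  # assumed reset-to-zero after a spike
    return out_spikes

membrane = np.zeros((5, 5))
kernel = np.full((3, 3), 0.6)  # uniform kernel, so kernel orientation does not matter

first = process_event(membrane, kernel, 2, 2)   # charges 9 neurons, none reach threshold
second = process_event(membrane, kernel, 2, 3)  # overlapping neurons cross threshold and fire
print(first, second)
```

Each event costs at most one update per kernel weight (9 here), regardless of the layer's size. Output spikes would then be forwarded as input events to the next layer, which is how a stack of such layers can stay idle while the scene is static.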

The paper benchmarks the system's performance against other sCNN-capable processors, demonstrating its efficiency and suitability for edge computing applications.

Critical Analysis

The paper presents a compelling solution for enabling efficient edge computing with visual sensors. The integration of the event-based camera and the low-power sCNN processor on a single chip is a clever approach to reducing costs and form factor while maintaining high performance.

One potential limitation is the reliance on a specific type of camera (event-based) and neural network architecture (sCNN). While these choices seem well-justified for the target use cases, they may limit the system's flexibility or applicability to other sensing modalities or processing requirements.

Additionally, the paper does not delve deeply into the potential trade-offs or challenges in deploying such a system in real-world edge computing scenarios, such as dealing with environmental factors, security considerations, or integration with larger systems. Further research and testing in these areas would be valuable to fully assess the system's strengths and weaknesses.

Overall, this research represents an important step forward in the development of efficient and low-latency edge computing solutions that can extract high-level information from sensors. Continued innovation in this space could have significant implications for the growth of smart devices and the broader field of edge computing.

Conclusion

This paper presents an innovative smart vision sensor SoC that combines an event-based camera and a low-power sCNN computing architecture on a single chip. By integrating the sensor and processing, the system can achieve low-cost, low-power, and low-latency visual processing suitable for edge computing applications.

The key innovations include the use of an event-based camera to generate a sparse data stream and the efficient asynchronous sCNN processing pipeline to extract high-level information from this data. This approach represents an important advancement in enabling the extraction of meaningful insights from sensors at the edge, which is crucial for the growing number of smart devices and edge computing systems.

While the paper focuses on a specific sensor and neural network architecture, the general principles and findings could inspire further research and development in the field of efficient and low-latency edge computing solutions. Continued progress in this area has the potential to unlock new applications and capabilities for smart devices and edge computing systems across a wide range of industries.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline

Ole Richter (SynSense AG, Switzerland, Bio-Inspired Circuits and Systems, Groningen Cognitive Systems and Materials Center), Yannan Xing (SynSense, PR China), Michele De Marchi (SynSense AG, Switzerland), Carsten Nielsen (SynSense AG, Switzerland), Merkourios Katsimpris (SynSense AG, Switzerland), Roberto Cattaneo (SynSense AG, Switzerland), Yudi Ren (SynSense, PR China), Yalun Hu (SynSense, PR China), Qian Liu (SynSense AG, Switzerland), Sadique Sheik (SynSense AG, Switzerland), Tugba Demirci (SynSense AG, Switzerland, SynSense, PR China), Ning Qiao (SynSense AG, Switzerland, SynSense, PR China)

Edge computing solutions that enable the extraction of high-level information from a variety of sensors are in increasingly high demand. This is due to the increasing number of smart devices that require sensory processing for their application on the edge. To tackle this problem, we present a smart vision sensor System on Chip (SoC), featuring an event-based camera and a low-power asynchronous spiking Convolutional Neural Network (sCNN) computing architecture embedded on a single chip. By combining both sensor and processing on a single die, we can lower unit production costs significantly. Moreover, the simple end-to-end nature of the SoC facilitates small stand-alone applications as well as functioning as an edge node in larger systems. The event-driven nature of the vision sensor delivers high-speed signals in a sparse data stream. This is reflected in the processing pipeline, which focuses on optimising highly sparse computation and minimising latency for 9 sCNN layers to 3.36 µs for an incoming event. Overall, this results in an extremely low-latency visual processing pipeline deployed on a small form factor with a low energy budget and sensor cost. We present the asynchronous architecture, the individual blocks, and the sCNN processing principle and benchmark against other sCNN capable processors.

5/28/2024

Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN

Baoheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So

Eye-tracking technology is integral to numerous consumer electronics applications, particularly in the realm of virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low-latency, low-power consumption, and precision. Yet, achieving optimal performance across all these fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolution neural networks (SCNN). The SCNN implemented on the accelerator can efficiently extract the embedding feature vector from each representation of event slices by only processing the non-zero activations. Subsequently, these vectors undergo further processing by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance metrics. On the Event-based Eye-Tracking-AIS2024 dataset, our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and 3.71 Mean Euclidean Distance with 0.7 ms latency while only consuming 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at https://github.com/CASR-HKU/ESDA/tree/eye_tracking.

4/23/2024

RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network

Sangmin Yoo, Eric Yeu-Jer Lee, Ziyu Wang, Xinxin Wang, Wei D. Lu

Event-based cameras are inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks that are expensive to train. In this work, we propose a neural network architecture, Reservoir Nodes-enabled neuromorphic vision sensing Network (RN-Net), based on simple convolution layers integrated with dynamic temporal encoding reservoirs for local and global spatiotemporal feature detection with low hardware and training costs. The RN-Net allows efficient processing of asynchronous temporal features, and achieves the highest accuracy reported to date on DVS128 Gesture, 99.2%, and one of the highest accuracies on the DVS Lip dataset, 67.5%, at a much smaller network size. By leveraging the internal device and circuit dynamics, asynchronous temporal feature encoding can be implemented at very low hardware cost without preprocessing and dedicated memory and arithmetic units. The use of simple DNN blocks and standard backpropagation-based training rules further reduces implementation costs.

5/28/2024

EvSegSNN: Neuromorphic Semantic Segmentation for Event Data

Dalia Hareb, Jean Martinet

Semantic segmentation is an important computer vision task, particularly for scene understanding and navigation of autonomous vehicles and UAVs. Several variations of deep neural network architectures have been designed to tackle this task. However, due to their huge computational costs and their high memory consumption, these models are not meant to be deployed on resource-constrained systems. To address this limitation, we introduce an end-to-end biologically inspired semantic segmentation approach by combining Spiking Neural Networks (SNNs, a low-power alternative to classical neural networks) with event cameras whose output data can directly feed these neural network inputs. We have designed EvSegSNN, a biologically plausible encoder-decoder U-shaped architecture relying on Parametric Leaky Integrate and Fire neurons, with the objective of trading off resource usage against performance. The experiments conducted on DDD17 demonstrate that EvSegSNN outperforms the closest state-of-the-art model in terms of MIoU while reducing the number of parameters by a factor of 1.6 and sparing a batch normalization stage.

6/21/2024