EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision

Read original: arXiv:2404.19489 - Published 5/1/2024 by Yufeng Yang, Adrian Kneip, Charlotte Frenkel


Overview

Proposes EvGNN, a novel event-driven graph neural network (GNN) accelerator for low-latency, energy-efficient edge vision using event-based cameras
Leverages three key ideas: directed dynamic graphs, event queues, and a layer-parallel processing scheme
Demonstrates real-time, microsecond-resolution event-based vision on a Xilinx KV260 Ultrascale+ MPSoC platform

Plain English Explanation

Event-based cameras are a promising alternative to traditional frame-based vision sensors, offering microsecond-scale temporal resolution and sparse information encoding. This allows for low-latency, decentralized, and energy-efficient edge vision systems that don't rely on the cloud. However, mainstream computer vision algorithms, such as convolutional neural networks (CNNs), are not well-suited to take advantage of the unique properties of event-based cameras.

EvGNN introduces a novel event-driven graph neural network (GNN) accelerator that addresses this challenge. The key ideas behind EvGNN are:

Directed dynamic graphs: EvGNN uses a graph structure that efficiently stores information about the single-hop neighbors of each event, without the need for explicit edge storage.
Event queues: EvGNN employs event queues to quickly identify the local neighbors of each event within a specific spatiotemporal range, enabling efficient processing.
Layer-parallel processing: EvGNN's novel processing scheme allows for the low-latency execution of multi-layer GNNs, crucial for high-accuracy event-based vision.
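The event-queue idea can be illustrated with a small software sketch. It is not the accelerator's actual design: the class name EventQueues, the queue depth, the search radius and the time window are all hypothetical values chosen for illustration. Each pixel keeps a short queue of recent timestamps, so finding the neighbors of a new event only requires looking at a few nearby queues rather than searching all past events.

```python
from collections import deque


class EventQueues:
    """Per-pixel bounded queues of recent event timestamps (hypothetical layout)."""

    def __init__(self, width, height, depth=4):
        self.width, self.height = width, height
        # One short FIFO per pixel; the oldest timestamp is dropped when full.
        self.queues = [[deque(maxlen=depth) for _ in range(width)]
                       for _ in range(height)]

    def neighbors(self, x, y, t, radius=1, window_us=1000):
        """Return past events (nx, ny, nt) within `radius` pixels and `window_us` before t."""
        found = []
        for ny in range(max(0, y - radius), min(self.height, y + radius + 1)):
            for nx in range(max(0, x - radius), min(self.width, x + radius + 1)):
                for nt in self.queues[ny][nx]:
                    if 0 <= t - nt <= window_us:
                        found.append((nx, ny, nt))
        return found

    def push(self, x, y, t):
        self.queues[y][x].append(t)


def process(events, width, height, **search_args):
    """Pair each (x, y, t) event with its past neighbors, in arrival order."""
    queues = EventQueues(width, height)
    graph = []
    for x, y, t in events:
        # Look up neighbors first, then store the event: edges only point
        # from earlier events to the new one, so no edge list is kept.
        graph.append(((x, y, t), queues.neighbors(x, y, t, **search_args)))
        queues.push(x, y, t)
    return graph
```

In this sketch the spatial radius and the time window are independent parameters, and the cost of a lookup depends on the radius and the queue depth, not on the total number of events seen so far.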

By combining these ideas, EvGNN achieves real-time, microsecond-resolution event-based vision on an edge computing platform, with high accuracy and a small hardware footprint. This is an important step towards practical, decentralized computer vision at the edge.

Technical Explanation

EvGNN is designed to efficiently process event data from event-based cameras, which offer advantages over traditional frame-based vision sensors, such as low latency and energy efficiency. However, existing computer vision algorithms, including convolutional neural networks (CNNs), are not well-suited to leverage the unique properties of event-based data.

To address this, EvGNN introduces three key innovations:

Directed dynamic graphs: EvGNN represents the event data using a graph structure, where each node corresponds to a single event. The edges in the graph are dynamically created based on the spatial and temporal proximity of the events, and the graph is directed to efficiently store information about the single-hop neighbors of each event.
Event queues: EvGNN employs event queues to quickly identify the local neighbors of each event within a specific spatiotemporal range. This allows for efficient processing of the event data, without the need for costly global searches.
Layer-parallel processing: EvGNN's novel processing scheme enables the low-latency execution of multi-layer GNNs, which is crucial for achieving high-accuracy event-based vision. This is achieved by parallelizing the computation across the layers of the GNN.
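To see why directed edges help, consider the following minimal sketch. It is an illustration under stated assumptions, not the paper's hardware scheme: the function update_new_event, the element-wise max aggregation and the ReLU layers are hypothetical choices. Because edges only point from past events to the new one, the stored features of past events never change, so every layer of the new event can be computed from values that are already in memory.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(weights, v):
    # weights is a list of rows; returns the matrix-vector product weights @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def update_new_event(node_feats, new_id, input_feat, neighbor_ids, layer_weights):
    """Compute every layer's feature for one new event (assumed GNN layer form).

    node_feats[node_id][l] is the stored layer-l feature of a past node,
    where l = 0 is the raw input feature.
    """
    feats = [input_feat]
    for l, weights in enumerate(layer_weights):
        # Gather the new node's own feature plus its past neighbors' features
        # at the same depth; past nodes are read, never rewritten.
        candidates = [feats[l]] + [node_feats[n][l] for n in neighbor_ids]
        aggregated = [max(column) for column in zip(*candidates)]
        feats.append(relu(linear(weights, aggregated)))
    node_feats[new_id] = feats
    return feats
```

For example, after calling update_new_event for event 0 and then for event 1 with neighbor_ids=[0], the entry node_feats[0] is identical to what it was before event 1 arrived. Only the new node's features are written.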

The authors deployed EvGNN on a Xilinx KV260 Ultrascale+ MPSoC platform and benchmarked it on the N-CARS dataset for car recognition. EvGNN reached a classification accuracy of 87.8% with an average latency of 16 μs per event, enabling real-time, microsecond-resolution event-based vision at the edge.
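As a back-of-envelope reading of the latency figure, the snippet below converts the reported 16 μs per event into an event rate. It assumes events are handled strictly one after another with no overlap, which is a simplification made here and not a claim from the paper; the function name is hypothetical.

```python
LATENCY_US_PER_EVENT = 16  # average per-event latency reported in the paper


def max_serial_event_rate(latency_us):
    """Events per second if each event must finish before the next one starts."""
    return 1_000_000 / latency_us


print(max_serial_event_rate(LATENCY_US_PER_EVENT))  # 62500.0
```

Under that assumption the system would keep up with about 62,500 events per second. Any overlap between consecutive events would raise the sustainable rate above this figure.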

Critical Analysis

While EvGNN represents an important step towards practical, decentralized computer vision at the edge, several limitations and open questions remain:

Generalization to other tasks: The evaluation of EvGNN was focused on a single task, car recognition. It would be important to assess the system's performance on a wider range of event-based vision tasks to understand its broader applicability.
Hardware resource utilization: The authors provide details on the hardware resource utilization of EvGNN, but it would be valuable to compare this to other event-based vision approaches, such as lightweight spatiotemporal networks for online eye tracking or state-space models for event cameras, to better contextualize the efficiency of the proposed system.
Scalability: The paper does not explicitly address how EvGNN would scale to handle larger or more complex event-based vision problems. Investigating the system's scalability would be an important area for future research.

Overall, EvGNN represents an exciting development in the field of event-based vision, demonstrating the potential of GNN-based approaches for low-latency, energy-efficient edge computing. The novel ideas introduced in this work could inspire further advancements in this rapidly evolving area of computer vision.

Conclusion

EvGNN is a promising event-driven GNN accelerator that addresses the challenges of applying mainstream computer vision algorithms to event-based cameras. By leveraging directed dynamic graphs, event queues, and a layer-parallel processing scheme, EvGNN achieves real-time, microsecond-resolution event-based vision on an edge computing platform, with high accuracy and a small hardware footprint. This work is an important step towards practical, decentralized computer vision at the edge, with potential applications ranging from autonomous vehicles to augmented reality.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision

Yufeng Yang, Adrian Kneip, Charlotte Frenkel

Edge vision systems combining sensing and embedded processing promise low-latency, decentralized, and energy-efficient solutions that forgo reliance on the cloud. As opposed to conventional frame-based vision sensors, event-based cameras deliver a microsecond-scale temporal resolution with sparse information encoding, thereby outlining new opportunities for edge vision systems. However, mainstream algorithms for frame-based vision, which mostly rely on convolutional neural networks (CNNs), can hardly exploit the advantages of event-based vision as they are typically optimized for dense matrix-vector multiplications. While event-driven graph neural networks (GNNs) have recently emerged as a promising solution for sparse event-based vision, their irregular structure is a challenge that currently hinders the design of efficient hardware accelerators. In this paper, we propose EvGNN, the first event-driven GNN accelerator for low-footprint, ultra-low-latency, and high-accuracy edge vision with event-based cameras. It relies on three central ideas: (i) directed dynamic graphs exploiting single-hop nodes with edge-free storage, (ii) event queues for the efficient identification of local neighbors within a spatiotemporally decoupled search range, and (iii) a novel layer-parallel processing scheme enabling the low-latency execution of multi-layer GNNs. We deployed EvGNN on a Xilinx KV260 Ultrascale+ MPSoC platform and benchmarked it on the N-CARS dataset for car recognition, demonstrating a classification accuracy of 87.8% and an average latency per event of 16 μs, thereby enabling real-time, microsecond-resolution event-based vision at the edge.

5/1/2024

Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs

Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Andrea Pinna, Tomasz Kryjak

The utilisation of event cameras represents an important and swiftly evolving trend aimed at addressing the constraints of traditional video systems. Particularly within the automotive domain, these cameras find significant relevance for their integration into embedded real-time systems due to lower latency and energy consumption. One effective approach to ensure the necessary throughput and latency for event processing systems is through the utilisation of graph convolutional networks (GCNs). In this study, we introduce a series of hardware-aware optimisations tailored for PointNet++, a GCN architecture designed for point cloud processing. The proposed techniques result in more than a 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN), one of the most recent works in the field, with a relatively small decrease in accuracy (2.3% for N-Caltech101 classification, 1.7% for N-Cars classification), thus following the TinyML trend. Based on software research, we designed a custom EFGCN (Event-Based FPGA-accelerated Graph Convolutional Network) and we implemented it on ZCU104 SoC FPGA platform, achieving a throughput of 13.3 million events per second (MEPS) and real-time partially asynchronous processing with a latency of 4.47 ms. We also address the scalability of the proposed hardware model to improve the obtained accuracy score. To the best of our knowledge, this study marks the first endeavour in accelerating PointNet++ networks on SoC FPGAs, as well as the first hardware architecture exploration of graph convolutional networks implementation for real-time continuous event data processing. We publish both software and hardware source code in an open repository: https://github.com/vision-agh/*** (will be published upon acceptance).

6/12/2024

Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN

Baoheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So

Eye-tracking technology is integral to numerous consumer electronics applications, particularly in the realm of virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low-latency, low-power consumption, and precision. Yet, achieving optimal performance across all these fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolution neural networks (SCNN). The SCNN implemented on the accelerator can efficiently extract the embedding feature vector from each representation of event slices by only processing the non-zero activations. Subsequently, these vectors undergo further processing by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance metrics. On the Event-based Eye-Tracking-AIS2024 dataset, our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and 3.71 Mean Euclidean Distance with 0.7 ms latency while only consuming 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at https://github.com/CASR-HKU/ESDA/tree/eye_tracking.

4/23/2024

EvSegSNN: Neuromorphic Semantic Segmentation for Event Data

Dalia Hareb, Jean Martinet

Semantic segmentation is an important computer vision task, particularly for scene understanding and navigation of autonomous vehicles and UAVs. Several variations of deep neural network architectures have been designed to tackle this task. However, due to their huge computational costs and their high memory consumption, these models are not meant to be deployed on resource-constrained systems. To address this limitation, we introduce an end-to-end biologically inspired semantic segmentation approach by combining Spiking Neural Networks (SNNs, a low-power alternative to classical neural networks) with event cameras whose output data can directly feed these neural network inputs. We have designed EvSegSNN, a biologically plausible encoder-decoder U-shaped architecture relying on Parametric Leaky Integrate and Fire neurons in an objective to trade-off resource usage against performance. The experiments conducted on DDD17 demonstrate that EvSegSNN outperforms the closest state-of-the-art model in terms of MIoU while reducing the number of parameters by a factor of 1.6 and sparing a batch normalization stage.

6/21/2024