FAPNet: An Effective Frequency Adaptive Point-based Eye Tracker

Read original: arXiv:2406.03177 - Published 6/6/2024 by Xiaopeng Lin, Hongwei Ren, Bojun Cheng

Overview

This paper introduces FAPNet, a frequency adaptive point-based eye tracker that aims to improve the effectiveness and efficiency of eye tracking systems.
The key idea is to adapt the network's frequency based on the speed of eye movements, allowing it to maintain high accuracy while reducing computation and power requirements.
The proposed approach is evaluated on several eye tracking benchmarks and demonstrates improved performance compared to existing methods.

Plain English Explanation

The paper describes a new way to track where a person's eyes are looking, called FAPNet. Current eye tracking systems often struggle to keep up with fast eye movements, leading to inaccurate results. FAPNet addresses this by adjusting the network's "frequency" (how often it processes new information) based on how quickly the eyes are moving.

This allows FAPNet to maintain high accuracy even during rapid eye movements, while also reducing the overall computational power and energy required. The researchers tested FAPNet on several standard eye tracking benchmarks and found it outperformed existing approaches. The key innovation is dynamically adapting the network's processing frequency to match the user's eye movement patterns, rather than using a fixed rate.

Technical Explanation

The paper proposes a frequency adaptive point-based network (FAPNet) for eye tracking that can dynamically adjust its processing rate to match the speed of the user's eye movements. This is in contrast to existing point-based eye tracking models that use a fixed frequency.

FAPNet consists of several key components:

A point-based feature extractor, built on the earlier PEPNet architecture, that encodes event point clouds (rather than image frames) into a compact feature representation.
A frequency adaptive module that dynamically adjusts the network's processing rate based on the speed of the pupil movement.
An Inter Sample LSTM module that exploits the temporal correlation between consecutive samples to track eye movement over time.
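
To make the point-based input more concrete, here is a minimal sketch of turning a window of camera events into a fixed-size point cloud. It is an illustration under stated assumptions, not the authors' preprocessing: the `Event` layout, the function name `events_to_point_cloud`, the random sampling strategy, and the normalisation are all hypothetical choices.

```python
import random
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp_seconds, polarity).
Event = Tuple[int, int, float, int]
Point = Tuple[float, float, float]


def events_to_point_cloud(
    events: List[Event],
    num_points: int,
    width: int,
    height: int,
    seed: int = 0,
) -> List[Point]:
    """Sample a window of events into num_points normalised (x, y, t) points.

    Hypothetical preprocessing: polarity is dropped, coordinates are scaled
    to [0, 1], and time is scaled to [0, 1] within the window.
    """
    if not events:
        return []
    rng = random.Random(seed)
    if len(events) >= num_points:
        # Enough events: sample without replacement.
        chosen = rng.sample(events, num_points)
    else:
        # Too few events: sample with replacement to reach the fixed size.
        chosen = [rng.choice(events) for _ in range(num_points)]
    chosen.sort(key=lambda e: e[2])  # keep temporal order

    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = (t_end - t_start) or 1.0  # avoid division by zero
    return [
        (x / width, y / height, (t - t_start) / span)
        for x, y, t, _polarity in chosen
    ]


window = [(10, 20, 0.000, 1), (30, 40, 0.004, 0), (50, 60, 0.008, 1)]
cloud = events_to_point_cloud(window, num_points=8, width=100, height=100)
print(len(cloud))
```

Because every sample has the same number of points in this sketch, the input size depends on `num_points` and not on the sensor's pixel count.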

The frequency adaptive module is the core innovation, allowing FAPNet to maintain high accuracy even during rapid eye movements by increasing its processing rate, while reducing computational overhead for slower eye movements by decreasing the rate.
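
The idea of matching the processing rate to eye speed can be sketched as follows. This is a hypothetical rule, not the mechanism from the paper: it assumes the tracker estimates speed from its last two pupil positions and picks the next sampling window so that the pupil moves roughly a fixed number of pixels per window. The function name `next_window_ms` and all parameter values are illustrative assumptions.

```python
import math
from typing import Tuple


def next_window_ms(
    prev_xy: Tuple[float, float],
    curr_xy: Tuple[float, float],
    prev_window_ms: float,
    min_window_ms: float = 5.0,
    max_window_ms: float = 40.0,
    target_shift_px: float = 2.0,
) -> float:
    """Choose the next sampling window from the estimated pupil speed.

    Fast movement gives a short window (high update rate); slow movement
    gives a long window (low update rate, less computation).
    """
    distance_px = math.hypot(curr_xy[0] - prev_xy[0], curr_xy[1] - prev_xy[1])
    speed_px_per_ms = distance_px / prev_window_ms
    if speed_px_per_ms == 0.0:
        # Pupil did not move: use the slowest allowed rate.
        return max_window_ms
    window_ms = target_shift_px / speed_px_per_ms
    # Clamp to the allowed range of windows.
    return min(max_window_ms, max(min_window_ms, window_ms))


# Fixation (no movement) versus a fast movement over the same 20 ms window.
print(next_window_ms((100.0, 100.0), (100.0, 100.0), prev_window_ms=20.0))
print(next_window_ms((100.0, 100.0), (120.0, 100.0), prev_window_ms=20.0))
```

The update rate in hertz is `1000 / window_ms`, so clamping the window also bounds the rate from both sides.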

According to the paper's abstract, the authors' earlier model, vanilla PEPNet, reached a p10 accuracy of 97.95% in the Event-based Eye Tracking Challenge. On the SEET synthetic dataset, FAPNet achieves state-of-the-art results while using about 10% of PEPNet's computational resources, and its computational demand is independent of the sensor's spatial resolution.
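
Results on the Event-based Eye Tracking Challenge are typically reported as p10 accuracy (see the abstracts under Related Papers). A minimal sketch of that metric follows, assuming the usual definition: the fraction of predicted pupil centres that fall within a pixel tolerance of the ground truth. The function name `p_accuracy` and the inclusive threshold are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point2D = Tuple[float, float]


def p_accuracy(
    predictions: Sequence[Point2D],
    targets: Sequence[Point2D],
    tolerance_px: float = 10.0,
) -> float:
    """Fraction of predictions within tolerance_px of their target.

    tolerance_px=10 corresponds to p10 and tolerance_px=5 to p5, assuming
    the threshold is inclusive.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(predictions, targets)
        if math.hypot(px - tx, py - ty) <= tolerance_px
    )
    return hits / len(predictions)


preds = [(0.0, 0.0), (3.0, 3.0), (20.0, 0.0), (6.0, 7.0)]
truth = [(0.0, 0.0)] * 4
print(p_accuracy(preds, truth, tolerance_px=10.0))
print(p_accuracy(preds, truth, tolerance_px=5.0))
```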

Critical Analysis

The paper makes a compelling case for the effectiveness of FAPNet's frequency adaptive approach, with clear empirical results demonstrating its advantages over prior work. However, a few potential limitations and areas for further research are worth noting:

The paper does not provide a detailed analysis of how the frequency adaptation mechanism responds to different eye movement patterns and how it impacts overall performance. More insights into the adaptive behavior would strengthen the claims.
While FAPNet outperforms existing methods, the absolute accuracy levels are still not perfect, suggesting room for further improvements in the core eye tracking capability.
The evaluation is limited to standard benchmarks, and it would be valuable to assess FAPNet's real-world performance in diverse usage scenarios, such as under varying lighting conditions or with different user populations.
The computational and energy efficiency gains are promising, but a more comprehensive analysis of the trade-offs between accuracy, latency, and resource requirements would provide a clearer picture of FAPNet's practical advantages.

Overall, the work represents a step forward in developing more effective and efficient eye tracking systems, but continued research and validation will be important to realize the full potential of frequency-adaptive approaches like FAPNet.

Conclusion

FAPNet: An Effective Frequency Adaptive Point-based Eye Tracker introduces an innovative approach to eye tracking that dynamically adjusts the network's processing rate to match the speed of the user's eye movements. By adapting the frequency, FAPNet can maintain high accuracy even during rapid eye movements while reducing the overall computational and energy requirements.

The empirical results demonstrate FAPNet's advantages over existing point-based eye tracking models, suggesting it could lead to more effective and efficient eye tracking systems. Further research to better understand the adaptive behavior, improve the core tracking capabilities, and validate real-world performance will be important to fully realize the potential of this frequency-adaptive approach.

Related Papers

FAPNet: An Effective Frequency Adaptive Point-based Eye Tracker

Xiaopeng Lin, Hongwei Ren, Bojun Cheng

Eye tracking is crucial for human-computer interaction in different domains. Conventional cameras encounter challenges such as power consumption and image quality during different eye movements, prompting the need for advanced solutions with ultra-fast, low-power, and accurate eye trackers. Event cameras, fundamentally designed to capture information about moving objects, exhibit low power consumption and high temporal resolution. This positions them as an alternative to traditional cameras in the realm of eye tracking. Nevertheless, existing event-based eye tracking networks neglect the pivotal sparse and fine-grained temporal information in events, resulting in unsatisfactory performance. Moreover, the energy-efficient features are further compromised by the use of excessively complex models, hindering efficient deployment on edge devices. In this paper, we utilize Point Cloud as the event representation to harness the high temporal resolution and sparse characteristics of events in eye tracking tasks. We rethink the point-based architecture PEPNet with preprocessing the long-term relationships between samples, leading to the innovative design of FAPNet. A frequency adaptive mechanism is designed to realize adaptive tracking according to the speed of the pupil movement and the Inter Sample LSTM module is introduced to utilize the temporal correlation between samples. In the Event-based Eye Tracking Challenge, we utilize vanilla PEPNet, which is the former work to achieve the $p_{10}$ accuracy of 97.95%. On the SEET synthetic dataset, FAPNet can achieve state-of-the-art while consuming merely 10% of the PEPNet's computational resources. Notably, the computational demand of FAPNet is independent of the sensor's spatial resolution, enhancing its applicability on resource-limited edge devices.

6/6/2024

Evaluating Image-Based Face and Eye Tracking with Event Cameras

Khadija Iddrisu, Waseem Shariff, Noel E. O'Connor, Joseph Lemley, Suzanne Little

Event cameras, also known as neuromorphic sensors, capture changes in local light intensity at the pixel level, producing asynchronously generated data termed "events". This distinct data format mitigates common issues observed in conventional cameras, like under-sampling when capturing fast-moving objects, thereby preserving critical information that might otherwise be lost. However, leveraging this data often necessitates the development of specialized, handcrafted event representations that can integrate seamlessly with conventional Convolutional Neural Networks (CNNs), considering the unique attributes of event data. In this study, we evaluate event-based face and eye tracking. The core objective of our study is to showcase the viability of integrating conventional algorithms with event-based data, transformed into a frame format while preserving the unique benefits of event cameras. To validate our approach, we constructed a frame-based event dataset by simulating events between RGB frames derived from the publicly accessible Helen Dataset. We assess its utility for face and eye detection tasks through the application of GR-YOLO, a pioneering technique derived from YOLOv3. This evaluation includes a comparative analysis with results derived from training the dataset with YOLOv8. Subsequently, the trained models were tested on real event streams from various iterations of Prophesee's event cameras and further evaluated on the Faces in Event Stream (FES) benchmark dataset. The models trained on our dataset show good prediction performance across all the datasets obtained for validation, with a best mean Average Precision score of 0.91. Additionally, the trained models demonstrated robust performance on real event camera data under varying light conditions.

8/21/2024

A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera

Yan Ru Pei, Sasskia Bruers, Sébastien Crouzet, Douglas McLelland, Olivier Coenen

Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical. To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network. This solution targets efficient implementation on edge-appropriate hardware with limited resources in three ways: 1) it deliberately targets a simple architecture and set of operations (convolutions, ReLU activations); 2) it can be configured to perform online inference efficiently via buffering of layer outputs; 3) it can achieve more than 90% activation sparsity through regularization during training, enabling very significant efficiency gains on event-based processors. In addition, we propose a general affine augmentation strategy acting directly on the events, which alleviates the problem of dataset scarcity for event-based systems. We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a p10 accuracy of 0.9916 on the Kaggle private test set.

4/16/2024

Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN

Baoheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So

Eye-tracking technology is integral to numerous consumer electronics applications, particularly in the realm of virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low-latency, low-power consumption, and precision. Yet, achieving optimal performance across all these fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolution neural networks (SCNN). The SCNN implemented on the accelerator can efficiently extract the embedding feature vector from each representation of event slices by only processing the non-zero activations. Subsequently, these vectors undergo further processing by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance metrics. On the Event-based Eye-Tracking-AIS2024 dataset, our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and 3.71 Mean Euclidean Distance with 0.7 ms latency while only consuming 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at https://github.com/CASR-HKU/ESDA/tree/eye_tracking.

4/23/2024