A Framework for Pupil Tracking with Event Cameras

Read original: arXiv:2407.16665 - Published 7/24/2024 by Khadija Iddrisu, Waseem Shariff, Suzanne Little


Overview

This paper presents a framework for tracking a person's pupil using an event-based camera.
Event-based cameras are a novel type of sensor that captures changes in the scene rather than full images, which gives them high temporal resolution and low latency.
The proposed framework renders the events as frames and uses YOLOv8, a standard object detection model, to detect and track the pupil from the sparse event-based camera data.

Plain English Explanation

The paper describes a system for monitoring a person's eye movements using a special type of camera called an "event-based" camera. Traditional cameras capture full images at a fixed rate, but event-based cameras only record changes in the scene. This can provide some benefits, like faster response times and lower power consumption.

The researchers developed a neural network-based framework to detect and track the pupil of the eye from the sparse data produced by the event-based camera. This allows them to measure aspects of eye movement, which could be useful for applications like human-computer interaction or gaze-based control systems.

The key idea is to convert the unconventional event-based camera data into frames, so that a standard object detection model (YOLOv8) can extract the eye's position and motion. This addresses the challenge that the camera only records changes, rather than full images.

Technical Explanation

The paper presents a framework for pupil tracking using event-based cameras. Event-based cameras differ from traditional cameras in that they only record changes in the scene, rather than full images at a fixed rate. This sparse data representation can provide advantages like faster response times and lower power consumption for certain applications.
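To make "only record changes" concrete, here is a minimal sketch of the commonly used idealized event model for a single pixel: an event fires whenever the log intensity drifts a fixed threshold away from its level at the previous event. The function name, the threshold value of 0.2, and the sample intensities are illustrative assumptions, not details taken from the paper.

```python
import math


def generate_events(samples, threshold=0.2):
    """Emit (timestamp, polarity) events for one pixel.

    samples: list of (timestamp, intensity) pairs with intensity > 0.
    An event fires each time the log intensity moves `threshold` away
    from the reference level stored at the previous event.
    The threshold value is a hypothetical choice for illustration.
    """
    events = []
    _, first_intensity = samples[0]
    reference = math.log(first_intensity)
    for timestamp, intensity in samples[1:]:
        delta = math.log(intensity) - reference
        # A large change can cross the threshold several times.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((timestamp, polarity))
            reference += polarity * threshold
            delta = math.log(intensity) - reference
    return events


# A pixel looking at a static scene produces no data at all.
static_pixel = [(0, 100.0), (1, 100.0), (2, 100.0)]
# A pixel that brightens and then darkens produces ON and OFF events.
changing_pixel = [(0, 100.0), (1, 100.0), (2, 150.0), (3, 90.0)]

static_events = generate_events(static_pixel)
changing_events = generate_events(changing_pixel)
print(static_events)
print(changing_events)
```

The static pixel yields an empty list, which is where the sparsity of event data comes from: unchanged regions of the scene cost nothing to transmit or process.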

The proposed framework renders the event stream as frames that standard deep learning models can consume. YOLOv8, an object detection model, then identifies the location of the pupil in each event frame, and the sequence of detections gives the pupil's position over time.
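One common way to build an event frame is to accumulate the events that fall inside a short time window into a 2D grid of signed counts. The sketch below illustrates that idea only; the event tuple layout, the 1000-microsecond window, and the function name are assumptions, and the paper may use a different representation.

```python
def events_to_frames(events, width, height, window_us):
    """Group (t_us, x, y, polarity) events into fixed-duration 2D frames.

    Each frame is a height x width grid of signed event counts.
    Events are assumed to be sorted by timestamp.
    """
    frames = []
    if not events:
        return frames
    window_start = events[0][0]
    frame = [[0] * width for _ in range(height)]
    for t_us, x, y, polarity in events:
        # Close every finished window, including windows with no events.
        while t_us >= window_start + window_us:
            frames.append(frame)
            frame = [[0] * width for _ in range(height)]
            window_start += window_us
        frame[y][x] += polarity
    frames.append(frame)
    return frames


# Hypothetical events on a tiny 3x2 sensor: (timestamp_us, x, y, polarity).
events = [
    (0, 1, 1, 1),
    (400, 1, 1, 1),
    (900, 2, 1, -1),
    (1200, 0, 0, 1),
]
frames = events_to_frames(events, width=3, height=2, window_us=1000)
for index, frame in enumerate(frames):
    print(index, frame)
```

Each resulting grid can be scaled to an image and passed to an object detector. The window length is the main design trade-off: a shorter window preserves temporal resolution but leaves fewer events per frame for the detector to work with.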

The authors evaluate their framework on the publicly accessible Ev-Eye dataset of event-based eye recordings and demonstrate that it can accurately track the pupil with low latency. They also compare their approach to prior work on event-based eye tracking and show improved performance.
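This summary does not say which accuracy metric the authors report. As an illustration of how detection accuracy is often scored, the sketch below computes intersection over union (IoU) between a predicted and a ground-truth bounding box. The metric choice and the box coordinates are assumptions made for the example, not results from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    # Degenerate boxes with zero total area are scored as no overlap.
    return intersection / union if union > 0 else 0.0


# Hypothetical pupil boxes in pixel coordinates.
predicted = (10, 10, 30, 30)
ground_truth = (20, 10, 40, 30)
overlap = iou(predicted, ground_truth)
print(round(overlap, 3))
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap; a detection is typically counted as correct when its IoU exceeds a chosen cutoff such as 0.5.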

Critical Analysis

The paper presents a well-designed framework for pupil tracking using event-based cameras and validates its effectiveness through experimental evaluation. However, the authors acknowledge certain limitations of their approach, such as the need for careful camera calibration and the potential for performance degradation in challenging lighting conditions.

Additionally, the dataset used for evaluation is relatively small and may not capture the full diversity of real-world eye movement scenarios. Further research could explore the framework's performance on larger and more diverse datasets.

Overall, the proposed framework represents a promising step forward in the use of event-based cameras for eye tracking applications. However, additional work is needed to address the current limitations and further explore the potential of this technology.

Conclusion

This paper introduces a novel framework for pupil tracking using event-based cameras. By rendering events as frames and applying the YOLOv8 object detector, the system can detect and track the pupil from the sparse event-based camera data. The authors demonstrate the framework's effectiveness through experimental evaluation on the Ev-Eye dataset and comparison to prior work.

The ability to precisely monitor eye movements using event-based cameras could have significant implications for applications such as human-computer interaction, gaze-based control systems, and vision-based assistive technologies. Further research and development in this area may lead to new breakthroughs in these and other domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

A Framework for Pupil Tracking with Event Cameras

Khadija Iddrisu, Waseem Shariff, Suzanne Little

Saccades are extremely rapid movements of both eyes that occur simultaneously, typically observed when an individual shifts their focus from one object to another. These movements are among the swiftest produced by humans and possess the potential to achieve velocities greater than that of blinks. The peak angular speed of the eye during a saccade can reach as high as 700°/s in humans, especially during larger saccades that cover a visual angle of 25°. Previous research has demonstrated encouraging outcomes in comprehending neurological conditions through the study of saccades. A necessary step in saccade detection involves accurately identifying the precise location of the pupil within the eye, from which additional information such as gaze angles can be inferred. Conventional frame-based cameras often struggle with the high temporal precision necessary for tracking very fast movements, resulting in motion blur and latency issues. Event cameras, on the other hand, offer a promising alternative by recording changes in the visual scene asynchronously and providing high temporal resolution and low latency. By bridging the gap between traditional computer vision and event-based vision, we present events as frames that can be readily utilized by standard deep learning algorithms. This approach harnesses YOLOv8, a state-of-the-art object detection technology, to process these frames for pupil tracking using the publicly accessible Ev-Eye dataset. Experimental results demonstrate the framework's effectiveness, highlighting its potential applications in neuroscience, ophthalmology, and human-computer interaction.

7/24/2024

Microsaccade-inspired Event Camera for Robotics

Botao He, Ze Wang, Yuan Zhou, Jingxi Chen, Chahat Deep Singh, Haojia Li, Yuman Gao, Shaojie Shen, Kaiwei Wang, Yanjun Cao, Chao Xu, Yiannis Aloimonos, Fei Gao, Cornelia Fermuller

Neuromorphic vision sensors or event cameras have made the visual perception of extremely low reaction time possible, opening new avenues for high-dynamic robotics applications. These event cameras' output is dependent on both motion and texture. However, the event camera fails to capture object edges that are parallel to the camera motion. This is a problem intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision deals with perceptual fading using the active mechanism of small involuntary eye movements, the most prominent ones called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades can substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture. In this design, a rotating wedge prism was mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allows for algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system, which we call Artificial MIcrosaccade-enhanced EVent camera (AMI-EV). Benchmark comparisons validate the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver. Various real-world experiments demonstrate the potential of the system to facilitate robotics perception both for low-level and high-level vision tasks.

5/29/2024

Evaluating Image-Based Face and Eye Tracking with Event Cameras

Khadija Iddrisu, Waseem Shariff, Noel E. O'Connor, Joseph Lemley, Suzanne Little

Event cameras, also known as neuromorphic sensors, capture changes in local light intensity at the pixel level, producing asynchronously generated data termed "events". This distinct data format mitigates common issues observed in conventional cameras, like under-sampling when capturing fast-moving objects, thereby preserving critical information that might otherwise be lost. However, leveraging this data often necessitates the development of specialized, handcrafted event representations that can integrate seamlessly with conventional Convolutional Neural Networks (CNNs), considering the unique attributes of event data. In this study, we evaluate event-based face and eye tracking. The core objective of our study is to showcase the viability of integrating conventional algorithms with event-based data, transformed into a frame format while preserving the unique benefits of event cameras. To validate our approach, we constructed a frame-based event dataset by simulating events between RGB frames derived from the publicly accessible Helen Dataset. We assess its utility for face and eye detection tasks through the application of GR-YOLO, a pioneering technique derived from YOLOv3. This evaluation includes a comparative analysis with results derived from training the dataset with YOLOv8. Subsequently, the trained models were tested on real event streams from various iterations of Prophesee's event cameras and further evaluated on the Faces in Event Stream (FES) benchmark dataset. The models trained on our dataset show good prediction performance across all the datasets obtained for validation, with a best mean Average Precision score of 0.91. Additionally, the trained models demonstrated robust performance on real event camera data under varying light conditions.

8/21/2024

A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera

Yan Ru Pei, Sasskia Bruers, Sébastien Crouzet, Douglas McLelland, Olivier Coenen

Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical. To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network. This solution targets efficient implementation on edge-appropriate hardware with limited resources in three ways: 1) it deliberately targets a simple architecture and set of operations (convolutions, ReLU activations); 2) it can be configured to perform online inference efficiently via buffering of layer outputs; 3) it can achieve more than 90% activation sparsity through regularization during training, enabling very significant efficiency gains on event-based processors. In addition, we propose a general affine augmentation strategy acting directly on the events, which alleviates the problem of dataset scarcity for event-based systems. We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a score of 0.9916 p10 accuracy on the Kaggle private test set.

4/16/2024