FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones

Read original: arXiv:2407.15806 - Published 7/23/2024 by Manfred Georg, Garrett Tanzer, Saad Hassan, Maximus Shengelia, Esha Uboweja, Sam Sepah, Sean Forbes, Thad Starner

Overview

Over 3 million characters (more than 250 hours) of American Sign Language (ASL) fingerspelling, collected via smartphones from 147 Deaf signers
Largest fingerspelling recognition dataset to date by a factor of more than 10x, intended to support better AI models for ASL fingerspelling recognition
A contribution toward more accessible technology for the Deaf and Hard of Hearing community

Plain English Explanation

The paper describes FSboard, a dataset of over 3 million characters of American Sign Language (ASL) fingerspelling collected using smartphones. Fingerspelling is an important part of ASL in which words are spelled out letter by letter using handshapes. According to the authors, the dataset is the largest of its kind by a factor of more than 10x, and it is intended to support more accurate AI models for recognizing ASL fingerspelling.

Accurate fingerspelling recognition could improve accessibility and communication for the Deaf community, for example by letting signers enter text on a phone by fingerspelling. The authors note that fingerspelling recognition is an incomplete solution and only one small part of sign language translation, but that it could provide some immediate benefit to Deaf and Hard of Hearing signers while more broadly capable technology develops.

Technical Explanation

FSboard is an ASL fingerspelling dataset situated in a mobile text entry use case. It was collected from 147 paid and consenting Deaf signers and contains over 3 million characters and more than 250 hours of video, which the authors report makes it the largest fingerspelling recognition dataset to date by a factor of more than 10x.

The data was recorded with Pixel 4A selfie cameras in a variety of environments, not in a single controlled setting. That variation is presumably closer to what a recognition model would encounter in real mobile use.
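Models trained on this kind of data need not consume raw video. The abstract describes feeding 30 Hz MediaPipe Holistic landmarks to the model, and notes that quality degrades gracefully when the frame rate is decreased and face/body landmarks are excluded. The following is a minimal sketch of those two reductions, not the authors' pipeline. The per-frame dictionary layout, the group names, and the helpers `downsample`, `hands_only`, and `flatten` are all hypothetical; the landmark counts (33 pose, 468 face, 21 per hand) are those of MediaPipe Holistic.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]   # (x, y, z) of one landmark
Frame = Dict[str, List[Point]]       # landmark group name -> its points

# Landmark counts per group in MediaPipe Holistic output.
# The dictionary layout itself is an assumption made for this sketch.
GROUP_SIZES = {"pose": 33, "face": 468, "left_hand": 21, "right_hand": 21}
HAND_GROUPS = ("left_hand", "right_hand")


def downsample(frames: List[Frame], source_hz: float = 30.0,
               target_hz: float = 15.0) -> List[Frame]:
    """Keep a subset of frames so the sequence approximates target_hz."""
    if not 0 < target_hz <= source_hz:
        raise ValueError("target_hz must be in (0, source_hz]")
    step = source_hz / target_hz
    kept, next_index = [], 0.0
    for i, frame in enumerate(frames):
        if i >= next_index:
            kept.append(frame)
            next_index += step
    return kept


def hands_only(frame: Frame) -> Frame:
    """Drop face and body (pose) landmarks, keeping both hands.

    Assumes both hand groups are present in every frame; real data may
    have frames where a hand was not detected.
    """
    return {name: frame[name] for name in HAND_GROUPS}


def flatten(frame: Frame) -> List[float]:
    """Concatenate all coordinates into one feature vector per frame."""
    return [coord
            for name in sorted(frame)
            for point in frame[name]
            for coord in point]


# Six dummy frames (0.2 s at 30 Hz) of zeros, standing in for real landmarks.
dummy = {name: [(0.0, 0.0, 0.0)] * n for name, n in GROUP_SIZES.items()}
clip = [dummy] * 6

reduced = [flatten(hands_only(f)) for f in downsample(clip, 30.0, 15.0)]
print(len(clip), "frames x", len(flatten(clip[0])), "values")
print(len(reduced), "frames x", len(reduced[0]), "values")
```

Under these assumed shapes, each frame shrinks from 1,629 values (543 landmarks times 3 coordinates) to 126, and halving the frame rate halves the sequence length. Both reductions cut the input a model has to process, which is why they are natural candidates for on-device use.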

As a simple baseline, the researchers fine-tune ByT5-Small on 30 Hz MediaPipe Holistic landmark inputs and achieve an 11.1% Character Error Rate (CER) on a test set with unique phrases and signers. They report that quality degrades gracefully when the frame rate is decreased and face/body landmarks are excluded, both of which are plausible optimizations for running models on device in real time.
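The paper's abstract reports its baseline result as a Character Error Rate (CER). CER is commonly computed as the character-level edit distance between predicted and reference text, divided by the number of reference characters. The sketch below follows that common definition; the abstract does not spell out the exact normalization or aggregation the authors use, so treat the corpus-level averaging here as an assumption. The function names and example phrases are hypothetical.

```python
from typing import List


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                            # deletion
                current[j - 1] + 1,                         # insertion
                previous[j - 1] + (ref_char != hyp_char),   # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(ref, hyp)
                      for ref, hyp in zip(references, hypotheses))
    return total_edits / total_chars


# Made-up phrases in the spirit of mobile text entry (names, addresses).
refs = ["jane doe", "12 main street"]
hyps = ["jane do", "12 main stret"]
print(f"CER = {character_error_rate(refs, hyps):.3f}")
```

Because the metric counts character edits and not whole-phrase matches, a prediction that misses one letter in a long name is penalized only slightly. That suits fingerspelling, where outputs are often names and other open-vocabulary strings.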

Critical Analysis

Although large, the dataset draws on 147 signers and may not fully represent the diversity of ASL fingerspelling styles across regions, ages, and backgrounds. Further data collection could broaden that coverage.

Additionally, the dataset covers fingerspelling only. As the authors note, fingerspelling recognition is an incomplete solution and only one small part of sign language translation; handling natural, conversational ASL would require data and models that go beyond fingerspelled phrases.

Despite these limitations, the FSboard dataset is a significant advance for fingerspelling recognition and should support further progress on accessible technologies for the Deaf community.

Conclusion

The FSboard dataset of over 3 million characters of ASL fingerspelling is a substantial contribution to sign language recognition. By providing a large-scale dataset recorded on smartphones in varied environments, together with a simple baseline at 11.1% CER, this research gives others a foundation for building more accurate fingerspelling recognition models, which in turn could improve text entry and communication technologies for Deaf and Hard of Hearing signers. Fingerspelling is only one part of sign language, so this work is a step toward full sign language translation, not a replacement for it.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones

Manfred Georg, Garrett Tanzer, Saad Hassan, Maximus Shengelia, Esha Uboweja, Sam Sepah, Sean Forbes, Thad Starner

Progress in machine understanding of sign languages has been slow and hampered by limited data. In this paper, we present FSboard, an American Sign Language fingerspelling dataset situated in a mobile text entry use case, collected from 147 paid and consenting Deaf signers using Pixel 4A selfie cameras in a variety of environments. Fingerspelling recognition is an incomplete solution that is only one small part of sign language translation, but it could provide some immediate benefit to Deaf/Hard of Hearing signers as more broadly capable technology develops. At >3 million characters in length and >250 hours in duration, FSboard is the largest fingerspelling recognition dataset to date by a factor of >10x. As a simple baseline, we finetune 30 Hz MediaPipe Holistic landmark inputs into ByT5-Small and achieve 11.1% Character Error Rate (CER) on a test set with unique phrases and signers. This quality degrades gracefully when decreasing frame rate and excluding face/body landmarks: plausible optimizations to help models run on device in real time.

7/23/2024

Fingerspelling within Sign Language Translation

Garrett Tanzer

Fingerspelling poses challenges for sign language processing due to its high-frequency motion and use for open-vocabulary terms. While prior work has studied fingerspelling recognition, there has been little attention to evaluating how well sign language translation models understand fingerspelling in the context of entire sentences -- and improving this capability. We manually annotate instances of fingerspelling within FLEURS-ASL and use them to evaluate the effect of two simple measures to improve fingerspelling recognition within American Sign Language to English translation: 1) use a model family (ByT5) with character- rather than subword-level tokenization, and 2) mix fingerspelling recognition data into the translation training mixture. We find that 1) substantially improves understanding of fingerspelling (and therefore translation quality overall), but the effect of 2) is mixed.

8/14/2024

SignSpeak: Open-Source Time Series Classification for ASL Translation

Aditya Makkar, Divya Makkar, Aarav Patel, Liam Hebert

The lack of fluency in sign language remains a barrier to seamless communication for hearing and speech-impaired communities. In this work, we propose a low-cost, real-time ASL-to-speech translation glove and an exhaustive training dataset of sign language patterns. We then benchmarked this dataset with supervised learning models, such as LSTMs, GRUs and Transformers, where our best model achieved 92% accuracy. The SignSpeak dataset has 7200 samples encompassing 36 classes (A-Z, 1-10) and aims to capture realistic signing patterns by using five low-cost flex sensors to measure finger positions at each time step at 36 Hz. Our open-source dataset, models and glove designs, provide an accurate and efficient ASL translator while maintaining cost-effectiveness, establishing a framework for future work to build on.

7/22/2024

An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface

Kevin Jose Thomas

This paper introduces an open-source interface for American Sign Language fingerspell recognition and semantic pose retrieval, aimed to serve as a stepping stone towards more advanced sign language translation systems. Utilizing a combination of convolutional neural networks and pose estimation models, the interface provides two modular components: a recognition module for translating ASL fingerspelling into spoken English and a production module for converting spoken English into ASL pose sequences. The system is designed to be highly accessible, user-friendly, and capable of functioning in real-time under varying environmental conditions like backgrounds, lighting, skin tones, and hand sizes. We discuss the technical details of the model architecture, application in the wild, as well as potential future enhancements for real-world consumer applications.

8/20/2024