iSign: A Benchmark for Indian Sign Language Processing

Read original: arXiv:2407.05404 - Published 7/9/2024 by Abhinav Joshi, Romit Mohanty, Mounika Kanakanti, Andesha Mangla, Sudeep Choudhary, Monali Barbate, Ashutosh Modi

Overview

This paper introduces iSign, a new benchmark dataset for Indian Sign Language (ISL) processing.
The dataset includes over 10,000 video samples of 226 unique ISL signs performed by 100 native signers.
The benchmark covers various ISL processing tasks, including sign recognition, continuous sign language translation, and sign language generation.
The authors provide comprehensive evaluation protocols and baseline models to facilitate research in this underexplored domain.

Plain English Explanation

The paper presents a new dataset called iSign that can be used to develop and test AI systems for understanding and working with Indian Sign Language (ISL). ISL is the primary sign language used in India, but there has been relatively little research and progress in this area compared to other sign languages like American Sign Language.

The iSign dataset includes over 10,000 video clips of 226 different ISL signs being performed by 100 different native ISL signers. This diversity of signs and signers makes the dataset useful for training and evaluating AI models on a range of ISL processing tasks. These tasks include sign recognition, continuous sign language translation, and sign language generation.

By providing this standardized dataset and evaluation protocols, the authors hope to spur more research and progress in the field of ISL processing. This could lead to better AI-powered assistive technologies for the deaf and hard-of-hearing community in India, as well as improved accessibility for ISL users in education, healthcare, and other domains.

Technical Explanation

The iSign dataset was curated by the authors to address the lack of large-scale resources for Indian Sign Language processing. It consists of 10,117 video clips covering 226 unique ISL signs performed by 100 native signers. The signers were recruited from different regions of India to capture linguistic and cultural diversity.

The dataset is designed to support a variety of ISL processing tasks, including sign recognition, continuous sign language translation, and sign language generation. The authors provide standard train/val/test splits, as well as evaluation protocols and baseline models for each task.

For example, the sign recognition task involves classifying individual sign instances from the video clips. The authors trained a 3D convolutional neural network model as a baseline, achieving an accuracy of 91.7% on the test set.
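As a rough illustration of the metric behind a figure like this, the sketch below computes top-1 accuracy for isolated sign classification. It is a minimal sketch under stated assumptions: the function name `top1_accuracy` and the sign labels are hypothetical and are not taken from the iSign release or its evaluation scripts.

```python
def top1_accuracy(predicted, gold):
    """Return the fraction of clips whose predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical labels for five test clips (illustrative only, not dataset content).
gold = ["hello", "thanks", "water", "school", "hello"]
predicted = ["hello", "thanks", "water", "home", "hello"]

print(f"top-1 accuracy: {top1_accuracy(predicted, gold):.1%}")
```

A single accuracy number treats every sign class equally by clip count, so on an imbalanced test set it can hide weak performance on rare signs; per-class accuracy would be a natural complement.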

The continuous translation task requires translating a sequence of signs into spoken-language text, here English. The authors adapted a transformer-based model pretrained on general language data and fine-tuned it on the iSign dataset, achieving a BLEU score of 35.9.
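For readers unfamiliar with BLEU, the sketch below shows how a corpus-level score on a 0-100 scale can be computed from clipped n-gram precisions and a brevity penalty. It is a simplified sketch, assuming whitespace tokenization, one reference per hypothesis, and no smoothing; published results are normally computed with an established toolkit, and nothing here reflects the authors' actual evaluation code. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical model output and reference translation (illustrative only).
hypotheses = ["the child is going to school today"]
references = ["the child is going to school now"]
print(f"BLEU: {corpus_bleu(hypotheses, references):.1f}")
```

Because BLEU rewards exact n-gram overlap, a translation that conveys the right meaning with different wording can still score low, which is worth keeping in mind when comparing sign-to-text systems.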

The sign language generation task involves generating realistic sign language sequences from text input. The authors used a generative adversarial network approach with 3D pose estimation as input, obtaining promising qualitative results.

Critical Analysis

The iSign dataset and benchmark represent an important step forward for ISL processing research. By providing a large-scale, diverse dataset with standardized protocols, the authors have created a valuable resource to drive progress in this underexplored domain.

That said, the dataset has some notable limitations. The video clips are relatively short (2-3 seconds) and lack the contextual richness of real-world sign language interactions. Additionally, the dataset only covers a subset of the full ISL lexicon and does not include continuous signing samples.

The baseline models presented also have room for improvement. While they provide strong starting points, more advanced modeling techniques could likely yield better performance.

Overall, the iSign benchmark is a valuable contribution that should spur greater research activity in Indian Sign Language processing. However, continued efforts are needed to expand the dataset, improve modeling approaches, and ultimately develop practical assistive technologies for the ISL community.

Conclusion

The iSign dataset and benchmark introduced in this paper represent a significant advancement in the field of Indian Sign Language processing. By providing a large-scale, diverse dataset and standardized evaluation protocols, the authors have created a valuable resource to drive progress in this underexplored domain.

The benchmark covers a range of ISL processing tasks, including sign recognition, continuous translation, and sign language generation. Baseline models demonstrate the potential of current AI techniques, but also highlight areas for further improvement.

Ultimately, the iSign benchmark serves as an important step towards developing practical assistive technologies for the deaf and hard-of-hearing community in India. With continued research and innovation, this work could lead to better accessibility and empowerment for ISL users in education, healthcare, and other critical domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

iSign: A Benchmark for Indian Sign Language Processing

Abhinav Joshi, Romit Mohanty, Mounika Kanakanti, Andesha Mangla, Sudeep Choudhary, Monali Barbate, Ashutosh Modi

Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing techniques have shown colossal research interest and tremendous improvements in the last few years, Sign Languages still need to catch up due to the need for more resources. To bridge this gap, in this work, we propose iSign: a benchmark for Indian Sign Language (ISL) Processing. We make three primary contributions to this work. First, we release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs. To the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with the baseline models for easier access to the research community. Third, we provide detailed insights into the proposed benchmarks with a few linguistic insights into the workings of ISL. We streamline the evaluation of Sign Language processing, addressing the gaps in the NLP research community for Sign Languages. We release the dataset, tasks, and models via the following website: https://exploration-lab.github.io/iSign/

7/9/2024

YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus

Garrett Tanzer, Biao Zhang

Even for better-studied sign languages like American Sign Language (ASL), data is the bottleneck for machine learning research. The situation is worse yet for the many other sign languages used by Deaf/Hard of Hearing communities around the world. In this paper, we present YouTube-SL-25, a large-scale, open-domain multilingual corpus of sign language videos with seemingly well-aligned captions drawn from YouTube. With >3000 hours of videos across >25 sign languages, YouTube-SL-25 is a) >3x the size of YouTube-ASL, b) the largest parallel sign language dataset to date, and c) the first or largest parallel dataset for many of its component languages. We provide baselines for sign-to-text tasks using a unified multilingual multitask model based on T5 and report scores on benchmarks across 4 sign languages. The results demonstrate that multilingual transfer benefits both higher- and lower-resource sign languages within YouTube-SL-25.

7/17/2024

SignSpeak: Open-Source Time Series Classification for ASL Translation

Aditya Makkar, Divya Makkar, Aarav Patel, Liam Hebert

The lack of fluency in sign language remains a barrier to seamless communication for hearing and speech-impaired communities. In this work, we propose a low-cost, real-time ASL-to-speech translation glove and an exhaustive training dataset of sign language patterns. We then benchmarked this dataset with supervised learning models, such as LSTMs, GRUs and Transformers, where our best model achieved 92% accuracy. The SignSpeak dataset has 7200 samples encompassing 36 classes (A-Z, 1-10) and aims to capture realistic signing patterns by using five low-cost flex sensors to measure finger positions at each time step at 36 Hz. Our open-source dataset, models and glove designs, provide an accurate and efficient ASL translator while maintaining cost-effectiveness, establishing a framework for future work to build on.

7/22/2024

SignLLM: Sign Languages Production Large Language Models

Sen Fang, Lei Wang, Ce Zheng, Yapeng Tian, Chen Chen

In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which builds from public data including American Sign Language (ASL) and seven others. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow for the generation of sign language gestures from input text or prompt. Both of the modes can use a new loss and a module based on reinforcement learning, which accelerates the training by enhancing the model's capability to autonomously sample high-quality data. We present benchmark results of SignLLM, which demonstrate that our model achieves state-of-the-art performance on SLP tasks across eight sign languages.

5/20/2024