1DCNNTrans: BISINDO Sign Language Interpreters in Improving the Inclusiveness of Public Services

Read original: arXiv:2409.01975 - Published 9/4/2024 by Muchammad Daniyal Kautsar, Ridwan Akmal, Afra Majida Hariono

Overview

This paper presents a deep learning model called 1DCNNTrans for recognizing BISINDO (Indonesian Sign Language) gestures and mapping them to Indonesian text, and compares it against an LSTM baseline.
The goal is to improve the inclusiveness of public services by enabling deaf and hard-of-hearing individuals to communicate more effectively.
The model combines 1D convolutional neural networks and transformer architectures to leverage both spatial and sequential information in sign language.

Plain English Explanation

The researchers developed a new artificial intelligence (AI) system called 1DCNNTrans to help translate BISINDO, the sign language used in Indonesia, into written Indonesian text. The motivation is to make public services more accessible and inclusive for deaf and hard-of-hearing individuals in Indonesia.

Sign language is a visual language that uses hand shapes, movements, and facial expressions to convey meaning. Translating between sign language and spoken or written language is challenging because the two have very different structures and modalities. The 1DCNNTrans model aims to bridge this gap by combining two AI techniques: convolutional neural networks (CNNs) and transformers.

CNNs are good at extracting spatial features from visual inputs like sign language videos, while transformers excel at modeling the sequential patterns in language. By integrating these two approaches, 1DCNNTrans can effectively capture both the spatial and sequential aspects of sign language to produce accurate translations.

Making public services more accessible for the deaf and hard-of-hearing is an important goal for promoting inclusivity and equal opportunity in society. This AI system could help enable better communication and information access for these underserved communities when interacting with government agencies, healthcare providers, and other public-facing organizations.

Technical Explanation

The 1DCNNTrans model architecture combines 1D convolutional neural networks and transformer modules to translate BISINDO sign language videos into written Indonesian text.

The 1D convolution layers slide along the time axis of the input sequence and extract local features that encode the hand shapes, movements, and other visual cues in the sign language. The paper's abstract refers to keypoints, which suggests the input is a sequence of per-frame keypoints rather than raw pixels. These features are then passed to a transformer encoder-decoder network, which models the sequential and contextual relationships in the language translation task.
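To make the convolution step concrete, here is a minimal pure-Python sketch of a 1D convolution sliding along the time axis of a feature sequence. It assumes the input is one flattened keypoint vector per frame, which the abstract's mention of keypoints suggests but this summary does not confirm. The function name `conv1d`, the toy frames, and the hand-set difference filter are hypothetical; a trained model would learn its filters.

```python
from typing import List

Matrix = List[List[float]]  # shape: [rows][columns]


def conv1d(seq: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """Valid (no padding), stride-1 1D convolution along the time axis.

    seq:     [T][C_in] input, e.g. one feature vector per frame.
    kernels: [C_out][K][C_in] filter weights.
    bias:    [C_out] one bias per output channel.
    Returns  [T - K + 1][C_out] after a ReLU activation.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        row = []
        for kernel, b in zip(kernels, bias):
            acc = b
            # Sum over the K frames in the window and over all input channels.
            for dt in range(k):
                for c, w in enumerate(kernel[dt]):
                    acc += w * seq[t + dt][c]
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out


# Toy input: 5 frames, 2 features per frame (made-up coordinates).
frames = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 3.0]]
# One filter of width 2 that measures the frame-to-frame change in feature 0.
diff_kernel = [[[-1.0, 0.0], [1.0, 0.0]]]
features = conv1d(frames, diff_kernel, bias=[0.0])
print(features)
```

Each output row summarizes a short window of frames. In a hybrid model, rows like these would form the sequence that the transformer receives.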

The transformer component uses self-attention mechanisms to dynamically weight the relevance of different parts of the input sequence when generating the output translation. This allows the model to better capture the complex grammatical structure and semantics that differ between sign language and written language.
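The self-attention idea can be sketched in a few lines of pure Python. This is a simplified single-head version with identity projections (queries, keys, and values are all the raw input), chosen only to show how the weights are computed. A real transformer learns separate projection matrices and usually uses several heads, and the paper's actual configuration is not given in this summary. The function names and toy vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(x: Matrix) -> Matrix:
    """Scaled dot-product attention weights with Q = K = x.

    Row i holds how strongly position i attends to every position.
    """
    d = len(x[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x])
        for q in x
    ]


def self_attention(x: Matrix) -> Matrix:
    """Each output vector is the attention-weighted average of all inputs (V = x)."""
    d = len(x[0])
    return [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in attention_weights(x)
    ]


# Toy sequence of three positions; the first two are identical.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = attention_weights(tokens)
mixed = self_attention(tokens)
print(weights[0])
print(mixed[0])
```

In this toy input the first position attends equally to itself and to its identical neighbor, and less to the third position, which is the "dynamic weighting" described above.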

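The researchers trained and evaluated the 1DCNNTrans model on a new BISINDO dataset they collected, which contains over 30,000 annotated sign language video clips. According to the paper's abstract, the comparison was against an LSTM baseline on 50 sign language gestures: the LSTM reached 94.67% accuracy and 1DCNNTrans reached 96.12%. The LSTM had lower inference latency but struggled with classes that have similar keypoints, while 1DCNNTrans was more stable and achieved higher F1 scores on classes of varying complexity.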
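The paper's abstract reports results as accuracy and per-class F1 scores. As a reminder of how those two metrics differ, here is a small stdlib-only sketch. The labels and predictions are invented placeholders, not data from the paper, and the function names are hypothetical.

```python
from typing import Dict, List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Placeholder class names standing in for gesture labels.
y_true = ["sign_a", "sign_a", "sign_b", "sign_b", "sign_c", "sign_c"]
y_pred = ["sign_a", "sign_a", "sign_b", "sign_c", "sign_c", "sign_c"]

acc = accuracy(y_true, y_pred)
f1 = per_class_f1(y_true, y_pred)
print(acc)
print(f1)
```

Overall accuracy can hide a weak class: here one confused pair lowers the F1 of both `sign_b` and `sign_c` while `sign_a` stays perfect, which is why per-class F1 is useful for gestures with similar keypoints.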

Critical Analysis

The researchers acknowledge several limitations and areas for further work in this study:

The current BISINDO dataset, while the largest of its kind, is still relatively small compared to the massive datasets used for training state-of-the-art machine translation models. Expanding the dataset size and diversity could further improve 1DCNNTrans performance.
The model was only evaluated on isolated sign language videos, not continuous signing. Handling the challenges of real-world, multi-signer sign language sequences is an important next step.
While the focus was on BISINDO, the 1DCNNTrans approach could potentially be applied to other sign languages. Validating the model's generalizability is an interesting avenue for future research.

Additionally, one could question whether a fully automated sign language translation system is the optimal solution, or whether a human-in-the-loop approach using skilled sign language interpreters would be more appropriate for sensitive public service contexts. The tradeoffs between automation, accuracy, and the human element warrant further exploration.

Conclusion

This paper presents a novel deep learning model called 1DCNNTrans that aims to bridge the gap between BISINDO sign language and written Indonesian text. By combining convolutional neural networks and transformers, the model can effectively capture both the spatial and sequential aspects of sign language to produce accurate translations.

Improving the inclusiveness of public services for deaf and hard-of-hearing individuals is an important societal goal. The 1DCNNTrans system has the potential to enable better communication and information access, empowering these underserved communities. While the current research has some limitations, the proposed approach represents a promising step forward in sign language translation technology.

Related Papers

1DCNNTrans: BISINDO Sign Language Interpreters in Improving the Inclusiveness of Public Services

Muchammad Daniyal Kautsar, Ridwan Akmal, Afra Majida Hariono

Indonesia ranks fourth globally in the number of deaf cases. Individuals with hearing impairments often find communication challenging, necessitating the use of sign language. However, there are limited public services that offer such inclusivity. On the other hand, advancements in artificial intelligence (AI) present promising solutions to overcome communication barriers faced by the deaf. This study aims to explore the application of AI in developing models for a simplified sign language translation app and dictionary, designed for integration into public service facilities, to facilitate communication for individuals with hearing impairments, thereby enhancing inclusivity in public services. The researchers compared the performance of LSTM and 1D CNN + Transformer (1DCNNTrans) models for sign language recognition. Through rigorous testing and validation, it was found that the LSTM model achieved an accuracy of 94.67%, while the 1DCNNTrans model achieved an accuracy of 96.12%. Model performance evaluation indicated that although the LSTM exhibited lower inference latency, it showed weaknesses in classifying classes with similar keypoints. In contrast, the 1DCNNTrans model demonstrated greater stability and higher F1 scores for classes with varying levels of complexity compared to the LSTM model. Both models showed excellent performance, exceeding 90% validation accuracy and demonstrating rapid classification of 50 sign language gestures.

9/4/2024

American Sign Language to Text Translation using Transformer and Seq2Seq with LSTM

Gregorius Guntur Sunardi Putra, Adifa Widyadhani Chanda D'Layla, Dimas Wahono, Riyanarto Sarno, Agus Tri Haryono

Sign language translation is one of the important issues in communication between deaf and hearing people, as it expresses words through hand, body, and mouth movements. American Sign Language is one of the sign languages used, one of which is the alphabetic sign. The development of neural machine translation technology is moving towards sign language translation. Transformer became the state-of-the-art in natural language processing. This study compares the Transformer with the Sequence-to-Sequence (Seq2Seq) model in translating sign language to text. In addition, an experiment was conducted by adding Residual Long Short-Term Memory (ResidualLSTM) in the Transformer. The addition of ResidualLSTM to the Transformer reduces the performance of the Transformer model by 23.37% based on the BLEU Score value. In comparison, the Transformer itself increases the BLEU Score value by 28.14 compared to the Seq2Seq model.

9/18/2024

From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation

Nada Shahin, Leila Ismail

With the growing Deaf and Hard of Hearing population worldwide and the persistent shortage of certified sign language interpreters, there is a pressing need for an efficient, signs-driven, integrated end-to-end translation system, from sign to gloss to text and vice-versa. There has been a wealth of research on machine translations and related reviews. However, there are few works on sign language machine translation considering the particularity of the language being continuous and dynamic. This paper aims to address this void, providing a retrospective analysis of the temporal evolution of sign language machine translation algorithms and a taxonomy of the Transformers architectures, the most used approach in language translation. We also present the requirements of a real-time Quality-of-Service sign language machine translation system underpinned by accurate deep learning algorithms. We propose future research directions for sign language translation systems.

8/28/2024

BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition

Md Hadiuzzaman, Mohammed Sowket Ali, Tamanna Sultana, Abdur Raj Shafi, Abu Saleh Musa Miah, Jungpil Shin

People commonly communicate in English, Arabic, and Bengali spoken languages through various mediums. However, deaf and hard-of-hearing individuals primarily use body language and sign language to express their needs and achieve independence. Sign language research is burgeoning to enhance communication with the deaf community. While many researchers have made strides in recognizing sign languages such as French, British, Arabic, Turkish, and American, there has been limited research on Bangla sign language (BdSL) with less-than-satisfactory results. One significant barrier has been the lack of a comprehensive Bangla sign language dataset. In our work, we introduced a new BdSL dataset comprising alphabets totaling 18,000 images, with each image being 224x224 pixels in size. Our dataset encompasses 36 Bengali symbols, of which 30 are consonants and the remaining six are vowels. Despite our dataset contribution, many existing systems continue to grapple with achieving high-performance accuracy for BdSL. To address this, we devised a hybrid Convolutional Neural Network (CNN) model, integrating multiple convolutional layers, activation functions, dropout techniques, and LSTM layers. Upon evaluating our hybrid-CNN model with the newly created BdSL dataset, we achieved an accuracy rate of 97.92%. We are confident that both our BdSL dataset and hybrid CNN model will be recognized as significant milestones in BdSL research.

8/21/2024