American Sign Language to Text Translation using Transformer and Seq2Seq with LSTM

Read original: arXiv:2409.10874 - Published 9/18/2024 by Gregorius Guntur Sunardi Putra, Adifa Widyadhani Chanda D'Layla, Dimas Wahono, Riyanarto Sarno, Agus Tri Haryono

Overview

Sign language translation is an important issue in communication between deaf and hearing people.
American Sign Language (ASL) is one of the sign languages in use, and it includes alphabetic signs (a manual alphabet).
Neural machine translation technology is advancing towards sign language translation.
The study compares the Transformer model with the Sequence-to-Sequence (Seq2Seq) model in translating sign language to text.
An experiment was conducted by adding Residual Long Short-Term Memory (ResidualLSTM) to the Transformer.

Plain English Explanation

Sign language is a way for deaf and hard-of-hearing people to communicate using hand, body, and mouth movements. American Sign Language is one such language, and it includes signs for the letters of the alphabet. As natural language processing technology advances, researchers are developing machine translation systems that can translate sign language into written text.

This study looked at two different machine translation models - the Transformer and the Sequence-to-Sequence (Seq2Seq) model - to see how well they could translate sign language to text. The researchers also tried adding a component called Residual Long Short-Term Memory (ResidualLSTM) to the Transformer model to see if that would improve its performance.

Technical Explanation

The researchers used the Transformer model, which has become the state-of-the-art in natural language processing, and compared it to the Sequence-to-Sequence (Seq2Seq) model for translating sign language to text. The Transformer model is known for its ability to capture long-range dependencies in language, which could be valuable for translating the complex movements and patterns of sign language.
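The long-range behaviour mentioned above comes from self-attention, in which every position in a sequence scores every other position directly. The following is a minimal sketch of single-head scaled dot-product attention in plain Python. It is not the paper's implementation: the function names, the toy frame features, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention without learned projections (illustrative only).

    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key, regardless of distance.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 4-frame sequence with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = scaled_dot_product_attention(frames, frames, frames)
```

In this toy input the first and last frames have identical features, so the first frame attends to the last one as strongly as to itself, even though they sit at opposite ends of the sequence. A recurrent model would have to carry that information through every intermediate step.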

The experiment involved adding a Residual Long Short-Term Memory (ResidualLSTM) component to the Transformer model. ResidualLSTM is a type of recurrent neural network that can help the model better remember and process sequential information, which could be useful for sign language translation.
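This summary does not describe how the ResidualLSTM block is built or where it sits inside the Transformer. As a rough illustration of the general idea only, the sketch below wraps a tiny LSTM layer in a residual (skip) connection, so the layer's output is added back onto its input at every time step. The class `TinyLSTM`, the function `residual_lstm`, the random weights, and the choice of hidden size equal to input size are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Minimal LSTM layer (hypothetical); hidden size equals input size."""

    def __init__(self, size, seed=0):
        rng = random.Random(seed)
        self.size = size
        # One weight matrix and bias per gate: input (i), forget (f),
        # output (o), candidate (g). Each maps [x_t, h_prev] to `size` values.
        self.w = {g: [[rng.uniform(-0.5, 0.5) for _ in range(2 * size)]
                      for _ in range(size)] for g in "ifog"}
        self.b = {g: [0.0] * size for g in "ifog"}

    def _gate(self, name, xh, activation):
        return [activation(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.w[name], self.b[name])]

    def forward(self, sequence):
        """Run the LSTM over a list of vectors; state starts at zero."""
        h = [0.0] * self.size
        c = [0.0] * self.size
        outputs = []
        for x in sequence:
            xh = list(x) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            # Standard LSTM cell and hidden state updates.
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


def residual_lstm(sequence, lstm):
    """Add the LSTM output back onto its input at every time step."""
    lstm_out = lstm.forward(sequence)
    return [[xj + hj for xj, hj in zip(x, h)]
            for x, h in zip(sequence, lstm_out)]


# Hypothetical 3-step sequence of 3-dimensional features.
sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
lstm = TinyLSTM(size=3)
out = residual_lstm(sequence, lstm)
```

The skip connection lets the input pass through unchanged while the LSTM contributes a correction on top. Whether the paper's variant follows this exact form is not stated in the summary.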

The researchers evaluated the Transformer, Seq2Seq, and Transformer with ResidualLSTM models using the BLEU score, which measures the n-gram overlap between a model's translations and human-generated reference translations. The results showed that the Transformer outperformed the Seq2Seq model, raising the BLEU score by 28.14. Adding the ResidualLSTM component, however, reduced the Transformer's BLEU score by 23.37%. Note that the first figure is reported as a plain difference and the second as a percentage, so the two numbers are not directly comparable.
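To make the metric concrete, here is a simplified, single-reference, sentence-level BLEU in plain Python, along with two small helpers that separate an absolute change from a relative (percentage) change. This is a sketch under stated assumptions: the paper's tokenization, smoothing, and BLEU tooling are not given in this summary, and the example sentences and scores below are hypothetical, not results from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Simplified BLEU in [0, 1]: no smoothing, one reference."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_mean)


def absolute_change(new, old):
    """Plain difference, in the same units as the scores."""
    return new - old


def relative_change_percent(new, old):
    """Change expressed as a percentage of the old score."""
    return 100.0 * (new - old) / old


# Hypothetical example sentences (word-level tokens are an assumption).
reference = "i love sign language".split()
perfect = bleu(reference, reference)
partial = bleu("i love language".split(), reference, max_n=2)

# Hypothetical scores, only to show the two ways of reporting a change.
drop_abs = absolute_change(40.0, 50.0)          # -10.0
drop_rel = relative_change_percent(40.0, 50.0)  # -20.0
```

An exact match scores 1.0, while the shorter candidate is penalized both for its missing bigram and by the brevity penalty. The last two lines show why a difference and a percentage should not be read on the same scale.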

Critical Analysis

The study provides a useful comparison of machine translation models for sign language, an important and challenging problem in assistive technology. The researchers evaluated the models with the BLEU score, a standard metric for machine translation quality.

However, the paper does not delve into the potential reasons why adding the ResidualLSTM component reduced the Transformer's performance. It would be helpful to understand the specific limitations or challenges of incorporating this type of recurrent neural network into the Transformer architecture for sign language translation.

Additionally, the study is focused on a single language pair (ASL to English text), and it's unclear how the models would perform on other sign language to text translation tasks. Further research could explore the generalizability of these findings to other sign languages and language pairs.

Conclusion

This study demonstrates the potential of the Transformer model for sign language translation, where it outperformed the Seq2Seq model. However, adding the ResidualLSTM component lowered the Transformer's BLEU score instead of improving it. The results highlight the ongoing challenges in developing effective machine translation systems for sign language and the need for further research in this important area of assistive technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

American Sign Language to Text Translation using Transformer and Seq2Seq with LSTM

Gregorius Guntur Sunardi Putra, Adifa Widyadhani Chanda D'Layla, Dimas Wahono, Riyanarto Sarno, Agus Tri Haryono

Sign language translation is one of the important issues in communication between deaf and hearing people, as it expresses words through hand, body, and mouth movements. American Sign Language is one of the sign languages used, one of which is the alphabetic sign. The development of neural machine translation technology is moving towards sign language translation. Transformer became the state-of-the-art in natural language processing. This study compares the Transformer with the Sequence-to-Sequence (Seq2Seq) model in translating sign language to text. In addition, an experiment was conducted by adding Residual Long Short-Term Memory (ResidualLSTM) in the Transformer. The addition of ResidualLSTM to the Transformer reduces the performance of the Transformer model by 23.37% based on the BLEU Score value. In comparison, the Transformer itself increases the BLEU Score value by 28.14 compared to the Seq2Seq model.

9/18/2024

From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation

Nada Shahin, Leila Ismail

With the growing Deaf and Hard of Hearing population worldwide and the persistent shortage of certified sign language interpreters, there is a pressing need for an efficient, signs-driven, integrated end-to-end translation system, from sign to gloss to text and vice-versa. There has been a wealth of research on machine translations and related reviews. However, there are few works on sign language machine translation considering the particularity of the language being continuous and dynamic. This paper aims to address this void, providing a retrospective analysis of the temporal evolution of sign language machine translation algorithms and a taxonomy of the Transformers architectures, the most used approach in language translation. We also present the requirements of a real-time Quality-of-Service sign language machine translation system underpinned by accurate deep learning algorithms. We propose future research directions for sign language translation systems.

8/28/2024

1DCNNTrans: BISINDO Sign Language Interpreters in Improving the Inclusiveness of Public Services

Muchammad Daniyal Kautsar, Ridwan Akmal, Afra Majida Hariono

Indonesia ranks fourth globally in the number of deaf cases. Individuals with hearing impairments often find communication challenging, necessitating the use of sign language. However, there are limited public services that offer such inclusivity. On the other hand, advancements in artificial intelligence (AI) present promising solutions to overcome communication barriers faced by the deaf. This study aims to explore the application of AI in developing models for a simplified sign language translation app and dictionary, designed for integration into public service facilities, to facilitate communication for individuals with hearing impairments, thereby enhancing inclusivity in public services. The researchers compared the performance of LSTM and 1D CNN + Transformer (1DCNNTrans) models for sign language recognition. Through rigorous testing and validation, it was found that the LSTM model achieved an accuracy of 94.67%, while the 1DCNNTrans model achieved an accuracy of 96.12%. Model performance evaluation indicated that although the LSTM exhibited lower inference latency, it showed weaknesses in classifying classes with similar keypoints. In contrast, the 1DCNNTrans model demonstrated greater stability and higher F1 scores for classes with varying levels of complexity compared to the LSTM model. Both models showed excellent performance, exceeding 90% validation accuracy and demonstrating rapid classification of 50 sign language gestures.

9/4/2024

Reconsidering Sentence-Level Sign Language Translation

Garrett Tanzer, Maximus Shengelia, Ken Harrenstien, David Uthus

Historically, sign language machine translation has been posed as a sentence-level task: datasets consisting of continuous narratives are chopped up and presented to the model as isolated clips. In this work, we explore the limitations of this task framing. First, we survey a number of linguistic phenomena in sign languages that depend on discourse-level context. Then as a case study, we perform the first human baseline for sign language translation that actually substitutes a human into the machine learning task framing, rather than provide the human with the entire document as context. This human baseline -- for ASL to English translation on the How2Sign dataset -- shows that for 33% of sentences in our sample, our fluent Deaf signer annotators were only able to understand key parts of the clip in light of additional discourse-level context. These results underscore the importance of understanding and sanity checking examples when adapting machine learning to new domains.

6/18/2024