BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition

Read original: arXiv:2408.10518 - Published 8/21/2024 by Md Hadiuzzaman, Mohammed Sowket Ali, Tamanna Sultana, Abdur Raj Shafi, Abu Saleh Musa Miah, Jungpil Shin

Overview

This research paper presents BAUST Lipi, a dataset for Bangla Sign Language (BdSL) recognition.
The dataset contains 18,000 images, each 224x224 pixels, covering 36 Bengali alphabet signs (30 consonants and six vowels).
The researchers also developed a hybrid CNN model with LSTM layers to recognize BdSL signs from the dataset, reporting an accuracy of 97.92%.

Plain English Explanation

The paper describes the creation of a Bangla Sign Language (BdSL) dataset called BAUST Lipi and the development of machine learning models to recognize BdSL signs from this dataset. BdSL is the sign language used by the deaf and hard-of-hearing community in Bangladesh.

The researchers collected images of BdSL alphabet signs to build the BAUST Lipi dataset, which can be used to train machine learning models to recognize those signs. They then developed a hybrid model that combines convolutional neural network (CNN) layers with LSTM layers to classify the signs in the dataset.

The creation of the BAUST Lipi dataset and the development of the BdSL recognition models are significant because they can help improve communication and accessibility for the deaf and hard-of-hearing community in Bangladesh. By making it easier to recognize and translate BdSL, these advancements can break down barriers and enable better integration of the deaf community into society.

Technical Explanation

The paper first provides a literature review of previous work on sign language recognition, highlighting the challenges and limitations of existing datasets and models. The researchers then describe the BAUST Lipi dataset. According to the paper's abstract, it comprises 18,000 images of BdSL alphabet signs, each 224x224 pixels, covering 36 Bengali symbols: 30 consonants and six vowels. The image data was preprocessed, and data augmentation techniques were used to expand the dataset.
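
As a rough illustration of what image augmentation can look like, here is a minimal sketch in plain Python. The specific transforms (a small random translation and a brightness jitter), the function names, and the parameter ranges are all assumptions made for illustration; the summary does not say which augmentations the authors actually applied.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Translate a 2D grayscale image (list of rows) by (dx, dy) pixels, padding with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor`, clamping the result to 0..255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]


def augment(image, rng, max_shift=2, brightness_range=(0.8, 1.2)):
    """Return one randomly shifted and brightness-jittered copy of `image`.

    The shift and brightness ranges are hypothetical defaults, not values from the paper.
    """
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    factor = rng.uniform(*brightness_range)
    return adjust_brightness(shift_image(image, dx, dy), factor)


# Tiny 4x4 stand-in for a 224x224 sign image.
rng = random.Random(0)
tiny = [[10 * (r + c) for c in range(4)] for r in range(4)]
augmented = [augment(tiny, rng) for _ in range(3)]
```

Mirroring is left out of this sketch because it swaps the signing hand; whether that is acceptable would depend on how the dataset was recorded.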

The researchers then evaluated various machine learning models for BdSL sign recognition, including CNNs, support vector machines (SVMs), and k-nearest neighbors (KNN) algorithms. They trained and tested these models on the BAUST Lipi dataset and reported their performance metrics, such as accuracy, precision, recall, and F1-score.
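
The metrics named above can be computed directly from true and predicted labels. The following is a minimal sketch using the standard definitions of accuracy, precision, recall, and F1-score; the class names `ka`, `kha`, and `ga` and the toy predictions are placeholders, not data from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return precision, recall, and F1 for each class label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_average(report, key):
    """Unweighted mean of one metric across all classes."""
    return sum(m[key] for m in report.values()) / len(report)


# Toy example with three placeholder classes.
y_true = ["ka", "ka", "kha", "kha", "ga", "ga"]
y_pred = ["ka", "kha", "kha", "kha", "ga", "ka"]
report = per_class_metrics(y_true, y_pred)
overall_accuracy = accuracy(y_true, y_pred)
macro_f1 = macro_average(report, "f1")
```

Macro averaging weights every class equally, which is a reasonable default when each sign has a similar number of samples; a weighted average would be the alternative for an imbalanced dataset.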

The results show that the CNN-based approach performed best. According to the paper's abstract, the hybrid CNN model, which integrates multiple convolutional layers, activation functions, dropout, and LSTM layers, achieved a recognition accuracy of 97.92% on the BAUST Lipi dataset. The researchers also conducted ablation studies to understand the impact of different dataset and model design choices on the recognition performance.
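
The paper's abstract describes a hybrid model that combines convolutional layers with LSTM layers on 224x224 inputs. To make the idea concrete, the sketch below traces tensor shapes through a hypothetical convolution and pooling stack and then reshapes the final feature map into a sequence an LSTM could consume. The layer stack, the three-channel input, and the row-as-time-step reshaping are all assumptions for illustration, not the authors' architecture; only the output-size formula is standard.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Standard output-size formula for convolution and pooling layers."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack: (kind, kernel, stride, padding, output channels).
LAYERS = [
    ("conv", 3, 1, 1, 32),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 64),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 128),
    ("pool", 2, 2, 0, None),
]


def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(height, width, channels)]
    for kind, kernel, stride, padding, out_channels in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
        if kind == "conv":
            channels = out_channels  # pooling leaves the channel count unchanged
        shapes.append((height, width, channels))
    return shapes


def to_sequence_shape(height, width, channels):
    """Treat each feature-map row as one time step: (time steps, features per step)."""
    return (height, width * channels)


# Assumes an RGB input at the 224x224 resolution given in the abstract.
shapes = trace_shapes(224, 224, 3, LAYERS)
sequence_shape = to_sequence_shape(*shapes[-1])
```

Under these assumptions the final feature map is 28x28 with 128 channels, which becomes a sequence of 28 steps with 3,584 features each. Other ways of bridging convolutional and recurrent layers exist, and the summary does not say which one the authors chose.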

Critical Analysis

The paper provides a comprehensive approach to creating a BdSL dataset and developing machine learning models for sign language recognition. The use of deep learning techniques, such as CNNs, is a promising approach and the reported recognition accuracy is impressive.

However, the paper does not discuss the potential limitations or biases in the BAUST Lipi dataset, such as the diversity of signers, the coverage of different sign variations, or the representativeness of the selected signs. Additionally, the paper does not provide details on the computational resources and training time required for the machine learning models, which could be important considerations for real-world deployment.

Future research could explore ways to expand the BAUST Lipi dataset, investigate the generalization of the models to unseen signs or signers, and evaluate the models' performance in real-world scenarios with continuous sign language recognition tasks.

Conclusion

This research paper presents a significant contribution to the field of Bangla Sign Language (BdSL) recognition by introducing the BAUST Lipi dataset and developing effective machine learning models for BdSL sign classification. The high recognition accuracy achieved by the CNN-based models suggests that these techniques can be valuable for improving communication and accessibility for the deaf and hard-of-hearing community in Bangladesh. The work paves the way for further advancements in BdSL recognition and its real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition

Md Hadiuzzaman, Mohammed Sowket Ali, Tamanna Sultana, Abdur Raj Shafi, Abu Saleh Musa Miah, Jungpil Shin

People commonly communicate in English, Arabic, and Bengali spoken languages through various mediums. However, deaf and hard-of-hearing individuals primarily use body language and sign language to express their needs and achieve independence. Sign language research is burgeoning to enhance communication with the deaf community. While many researchers have made strides in recognizing sign languages such as French, British, Arabic, Turkish, and American, there has been limited research on Bangla sign language (BdSL) with less-than-satisfactory results. One significant barrier has been the lack of a comprehensive Bangla sign language dataset. In our work, we introduced a new BdSL dataset comprising alphabets totaling 18,000 images, with each image being 224x224 pixels in size. Our dataset encompasses 36 Bengali symbols, of which 30 are consonants and the remaining six are vowels. Despite our dataset contribution, many existing systems continue to grapple with achieving high-performance accuracy for BdSL. To address this, we devised a hybrid Convolutional Neural Network (CNN) model, integrating multiple convolutional layers, activation functions, dropout techniques, and LSTM layers. Upon evaluating our hybrid-CNN model with the newly created BdSL dataset, we achieved an accuracy rate of 97.92%. We are confident that both our BdSL dataset and hybrid CNN model will be recognized as significant milestones in BdSL research.

8/21/2024

Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability

A. E. M Ridwan, Mushfiqul Islam Chowdhury, Mekhala Mariam Mary, Md Tahmid Chowdhury Abir

To promote inclusion and ensure effective communication for those who rely on sign language as their main form of communication, sign language recognition (SLR) is crucial. SLR integrates seamlessly with diverse technologies, enhancing accessibility for the deaf community by facilitating their use of digital platforms, video calls, and communication devices. To address this problem, we suggest a novel solution that uses a deep neural network to fully automate sign language recognition. This methodology integrates sophisticated preprocessing methodologies to optimise the overall performance. The architectures ResNet, Inception, Xception, and VGG are utilised to selectively categorise images of sign language. We prepared a DNN architecture and merged it with the pre-processing architectures. In the post-processing phase, we utilised the SHAP deep explainer, which is based on cooperative game theory, to quantify the influence of specific features on the output of a machine learning model. The Bhutanese-Sign-Language (BSL) dataset was used for training and testing the suggested technique. When trained on the BSL dataset, ResNet50 with the DNN model achieved the best overall accuracy, 98.90%. Our model's ability to provide informational clarity was assessed using the SHAP (SHapley Additive exPlanations) method. Owing in part to its considerable robustness and reliability, the proposed methodological approach can be used to develop a fully automated system for sign language recognition.

9/12/2024

Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model

Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Md Hadiuzzaman, Muhammad Nazrul Islam, Jungpil Shin

Hand gesture-based sign language recognition (SLR) is one of the most advanced applications of machine learning and computer vision that uses hand gestures. Although, in the past few years, many researchers have widely explored and studied how to address BSL problems, specific unaddressed issues remain, such as skeleton and transformer-based BSL recognition. In addition, because BSL models have not been evaluated under varied environmental conditions, it is difficult to establish how well existing models generalise to signs encountered in daily life. As a consequence, existing BSL recognition systems provide a limited perspective of their generalisation ability as they are tested on datasets containing few BSL alphabets that have a wide disparity in gestures and are easy to differentiate. To overcome these limitations, we propose a spatial-temporal attention-based BSL recognition model considering hand joint skeletons extracted from the sequence of images. The main aim of utilising hand skeleton-based BSL data is to ensure the privacy and low-resolution sequence of images, which need minimum computational cost and low hardware configurations. Our model captures discriminative structural displacements and short-range dependency based on unified joint features projected onto high-dimensional feature space. Specifically, the use of Separable TCN combined with a powerful multi-head spatial-temporal attention architecture generated high-performance accuracy. The extensive experiments with a proposed dataset and two benchmark BSL datasets with a wide range of evaluations, such as intra- and inter-dataset evaluation settings, demonstrated that our proposed models achieve competitive performance with extremely low computational complexity and run faster than existing models.

8/27/2024

1DCNNTrans: BISINDO Sign Language Interpreters in Improving the Inclusiveness of Public Services

Muchammad Daniyal Kautsar, Ridwan Akmal, Afra Majida Hariono

Indonesia ranks fourth globally in the number of deaf cases. Individuals with hearing impairments often find communication challenging, necessitating the use of sign language. However, there are limited public services that offer such inclusivity. On the other hand, advancements in artificial intelligence (AI) present promising solutions to overcome communication barriers faced by the deaf. This study aims to explore the application of AI in developing models for a simplified sign language translation app and dictionary, designed for integration into public service facilities, to facilitate communication for individuals with hearing impairments, thereby enhancing inclusivity in public services. The researchers compared the performance of LSTM and 1D CNN + Transformer (1DCNNTrans) models for sign language recognition. Through rigorous testing and validation, it was found that the LSTM model achieved an accuracy of 94.67%, while the 1DCNNTrans model achieved an accuracy of 96.12%. Model performance evaluation indicated that although the LSTM exhibited lower inference latency, it showed weaknesses in classifying classes with similar keypoints. In contrast, the 1DCNNTrans model demonstrated greater stability and higher F1 scores for classes with varying levels of complexity compared to the LSTM model. Both models showed excellent performance, exceeding 90% validation accuracy and demonstrating rapid classification of 50 sign language gestures.

9/4/2024