Abusive Speech Detection in Indic Languages Using Acoustic Features

Read original: arXiv:2407.20808 - Published 7/31/2024 by Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko, Bjorn W. Schuller

Overview

This paper explores the use of acoustic and prosodic features for detecting abusive speech in Indic languages.
The researchers trained machine learning models to classify speech as abusive or non-abusive based on these characteristics alone.
They tested their approach on the ADIMA data set, which contains recordings from ten Indic languages, including Hindi, Bengali, and Malayalam.

Plain English Explanation

The researchers wanted to build a system that automatically detects abusive or offensive speech. Instead of looking at the actual words being used, they focused on the acoustic features of the speech.

Acoustic features are properties of the voice itself, such as pitch, volume, and tone. The idea is that when someone says something abusive, the way they say it may follow patterns that can be used to identify the speech as abusive, even without knowing the words.

The researchers therefore trained machine learning models that analyze these acoustic features and classify a piece of audio as either abusive or not abusive. They tested these models on several languages commonly spoken in India, including Hindi, Bengali, and Malayalam.

Technical Explanation

The paper describes the researchers' methodology for developing an abusive speech detection system using acoustic features. They used the ADIMA data set, which contains abusive and non-abusive recordings from ten Indic languages. They then extracted a range of acoustic and prosodic features from the audio data, such as pitch, energy, and spectral characteristics.
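To make the idea of acoustic features concrete, the following is a minimal sketch in plain Python. It computes three textbook descriptors (RMS energy, zero-crossing rate, and an autocorrelation-based pitch estimate) on a synthetic tone. The function names, the 16 kHz sample rate, and the 75 to 400 Hz pitch search range are assumptions made for this illustration; this is not the feature set or the tooling used in the paper.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)


def sine_wave(freq_hz, amplitude, n_samples=1600):
    """Synthetic stand-in for a short frame of recorded speech."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def rms_energy(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def estimate_pitch(samples, fmin=75, fmax=400):
    """Pick the lag with the highest autocorrelation inside the search range."""
    best_lag, best_score = None, float("-inf")
    for lag in range(SAMPLE_RATE // fmax, SAMPLE_RATE // fmin + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag


def extract_features(samples):
    """Bundle the three descriptors into one feature dictionary."""
    return {
        "rms_energy": rms_energy(samples),
        "zero_crossing_rate": zero_crossing_rate(samples),
        "pitch_hz": estimate_pitch(samples),
    }


quiet = extract_features(sine_wave(200.0, amplitude=0.1))
loud = extract_features(sine_wave(200.0, amplitude=0.5))
print(quiet)
print(loud)
```

Both tones have the same pitch, but the louder one has a higher RMS energy, which is the kind of difference a classifier can pick up without any knowledge of the words spoken. Real systems usually compute many such descriptors per short frame and summarise them over the whole recording.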

Next, the researchers trained machine learning models, such as a Support Vector Machine (SVM), to classify the speech samples as abusive or non-abusive based on the acoustic features. The models were trained in both multilingual and cross-lingual settings. Performance was evaluated using metrics like accuracy, precision, recall, and F1-score.
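As an illustration of how a classifier can separate two classes from feature vectors, here is a minimal sketch of a linear SVM trained with sub-gradient descent on the hinge loss, written in plain Python. The toy feature values, the hyperparameters, and the function names are all hypothetical; the paper's actual models, features, and training setup are not reproduced here. In practice one would normally use an established library rather than a hand-written trainer.

```python
def decision_value(weights, bias, x):
    """Signed distance-like score; positive means 'abusive' in this sketch."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_svm(features, labels, epochs=50, learning_rate=0.1, reg=0.01):
    """Minimise L2-regularised hinge loss; labels must be +1 or -1."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * decision_value(weights, bias, x)
            # The regulariser shrinks the weights on every step.
            weights = [w - learning_rate * reg * w for w in weights]
            # The hinge loss only contributes when the margin is violated.
            if margin < 1:
                weights = [w + learning_rate * y * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    return 1 if decision_value(weights, bias, x) >= 0 else -1


# Hypothetical standardised features per clip, e.g. (energy, pitch).
# +1 = abusive, -1 = non-abusive. These numbers are made up.
train_x = [(1.0, 0.8), (-0.9, -1.1), (0.9, 1.2), (-1.0, -0.8),
           (1.1, 1.0), (-1.2, -0.9), (0.8, 0.9), (-0.8, -1.0)]
train_y = [1, -1, 1, -1, 1, -1, 1, -1]

weights, bias = train_linear_svm(train_x, train_y)
train_pred = [predict(weights, bias, x) for x in train_x]
print(train_pred)
```

The toy data is deliberately easy to separate, so the sketch only shows the mechanics of training and prediction; it says nothing about how well such a model performs on real speech.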

The results showed that the acoustic-based approach was able to achieve reasonably good performance, with F1-scores ranging from 0.65 to 0.80 across the different Indic languages tested. This suggests that acoustic features can be a useful complement to text-based approaches for detecting abusive speech, especially in scenarios where the actual content of the speech may be difficult to analyze.
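For readers unfamiliar with the F1-score, the sketch below shows how it is derived from precision and recall and how it can be reported per language. The language names are taken from the summary above, but the labels and predictions are invented for illustration and are not results from the paper; the function names are likewise assumptions of this sketch.

```python
from collections import defaultdict


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (abusive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_per_language(records):
    """records: iterable of (language, true_label, predicted_label)."""
    grouped = defaultdict(lambda: ([], []))
    for language, true_label, predicted_label in records:
        grouped[language][0].append(true_label)
        grouped[language][1].append(predicted_label)
    return {language: precision_recall_f1(truths, preds)[2]
            for language, (truths, preds) in grouped.items()}


# Invented example: 1 = abusive, 0 = non-abusive.
records = [
    ("Hindi", 1, 1), ("Hindi", 1, 0), ("Hindi", 0, 0),
    ("Hindi", 1, 1), ("Hindi", 0, 1),
    ("Bengali", 1, 1), ("Bengali", 0, 0),
    ("Bengali", 1, 1), ("Bengali", 0, 0),
]
scores = f1_per_language(records)
print(scores)
```

Because F1 is the harmonic mean of precision and recall, it stays low unless both are reasonably high, which makes it a more informative single number than accuracy when abusive and non-abusive clips are unevenly represented.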

Critical Analysis

The paper provides a promising initial exploration of using acoustic features for abusive speech detection in Indic languages. However, there are a few key limitations and areas for further research:

The datasets used in the study were relatively small, which may limit the generalizability of the findings. Larger and more diverse datasets would be needed to fully evaluate the approach.
The abstract reports that different models were trained in multilingual and cross-lingual settings. Comparing them against further approaches could show whether the performance can be improved.
The paper discusses which acoustic and prosodic features were most important and influential for distinguishing abusive from non-abusive speech. Validating these patterns on other data could lead to more targeted feature engineering and model design.
The study focused on binary classification (abusive vs. non-abusive). Extending the approach to more nuanced multi-class or graded abusive speech detection could be valuable.
The authors did not discuss potential privacy concerns or ethical considerations around using acoustic data for this task, which would be an important area to address.

Overall, the paper represents a solid initial step in exploring the use of acoustic features for abusive speech detection in Indic languages. Further research is needed to fully evaluate the approach and address the identified limitations.

Conclusion

This paper demonstrates the potential of using acoustic and prosodic features, such as pitch, volume, and tone, to automatically detect abusive speech in multiple Indic languages. The researchers trained machine learning models that achieved reasonable performance in classifying speech samples as abusive or non-abusive based on these characteristics alone.

While the study has some limitations, it suggests that acoustic-based approaches could be a valuable complement to text-based methods for addressing the challenge of abusive content moderation, particularly in scenarios where the actual content of the speech may be difficult to analyze. Continued research in this area, with a focus on larger datasets, more advanced models, and deeper understanding of the relevant acoustic patterns, could lead to more robust and effective abusive speech detection systems for Indic languages and beyond.

Related Papers

Abusive Speech Detection in Indic Languages Using Acoustic Features

Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko, Bjorn W. Schuller

Abusive content in online social networks is a well-known problem that can cause serious psychological harm and incite hatred. The ability to upload audio data increases the importance of developing methods to detect abusive content in speech recordings. However, simply transferring the mechanisms from written abuse detection would ignore relevant information such as emotion and tone. In addition, many current algorithms require training in the specific language for which they are being used. This paper proposes to use acoustic and prosodic features to classify abusive content. We used the ADIMA data set, which contains recordings from ten Indic languages, and trained different models in multilingual and cross-lingual settings. Our results show that it is possible to classify abusive and non-abusive content using only acoustic and prosodic features. The most important and influential features are discussed.

7/31/2024

How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have

Viktor Hangya, Alexander Fraser

Due to the broad range of social media platforms, the requirements of abusive language detection systems are varied and ever-changing. A large set of annotated corpora with different properties and label sets has already been created, such as for hate or misogyny detection, but the form and targets of abusive speech are constantly evolving. Since the annotation of new corpora is expensive, in this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection. Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain. We propose a two-step approach: first we train our model in a multitask fashion. We then carry out few-shot adaptation to the target requirements. Our experiments show that, using already existing datasets and only a few shots of the target task, the performance of models improves both monolingually and across languages. Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset and can benefit from knowledge about labels which are not directly used for the target task.

5/7/2024

Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces

Advaitha Vetagiri, Gyandeep Kalita, Eisha Halder, Chetna Taparia, Partha Pakray, Riyanka Manna

Online gender-based harassment is a widespread issue limiting the free expression and participation of women and marginalized genders in digital spaces. Detecting such abusive content can enable platforms to curb this menace. We participated in the Gendered Abuse Detection in Indic Languages shared task at ICON2023 that provided datasets of annotated Twitter posts in English, Hindi and Tamil for building classifiers to identify gendered abuse. Our team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks that can effectively model semantic and sequential patterns in textual data. The CNN captures localized features indicative of abusive language through its convolution filters applied on embedded input text. To determine context-based offensiveness, the BiLSTM analyzes this sequence for dependencies among words and phrases. Multiple variations were trained using FastText and GloVe word embeddings for each language dataset comprising over 7,600 crowdsourced annotations across labels for explicit abuse, targeted minority attacks and general offences. The validation scores showed strong performance across F1-measures, especially for English (0.84). Our experiments reveal how customizing embeddings and model hyperparameters can improve detection capability. The proposed architecture ranked 1st in the competition, proving its ability to handle real-world noisy text with code-switching. This technique has a promising scope as platforms aim to combat cyber harassment facing Indic language internet users. Our code is at https://github.com/advaithavetagiri/CNLP-NITS-PP

4/4/2024

Investigating Causal Cues: Strengthening Spoofed Audio Detection with Human-Discernible Linguistic Features

Zahra Khanjani, Tolulope Ale, Jianwu Wang, Lavon Davis, Christine Mallinson, Vandana P. Janeja

Several types of spoofed audio, such as mimicry, replay attacks, and deepfakes, have created societal challenges to information integrity. Recently, researchers have worked with sociolinguistics experts to label spoofed audio samples with Expert Defined Linguistic Features (EDLFs) that can be discerned by the human ear: pitch, pause, word-initial and word-final release bursts of consonant stops, audible intake or outtake of breath, and overall audio quality. It is established that several deepfake detection algorithms improve when they augment the traditional and common features of audio data with these EDLFs. In this paper, using a hybrid dataset comprised of multiple types of spoofed audio augmented with sociolinguistic annotations, we investigate causal discovery and inferences between the discernible linguistic features and the label in the audio clips, comparing the findings of the causal models with the expert ground truth validation labeling process. Our findings suggest that the causal models indicate the utility of incorporating linguistic features to help discern spoofed audio, as well as the overall need and opportunity to incorporate human knowledge into models and techniques for strengthening AI models. The causal discovery and inference can be used as a foundation for training humans to discern spoofed audio as well as automating EDLF labeling for the purpose of improving the performance of common AI-based spoofed audio detectors.

9/11/2024