I Searched for a Religious Song in Amharic and Got Sexual Content Instead: Investigating Online Harm in Low-Resourced Languages on YouTube

Read original: arXiv:2405.16656 - Published 5/28/2024 by Hellina Hailu Nigatu, Inioluwa Deborah Raji

Overview

  • This paper investigates online harm in low-resourced languages on the YouTube platform.
  • The researchers focus on the Amharic language and explore how users searching for religious content can instead encounter sexually explicit material.
  • The study examines recommendation systems, community guidelines, and user experiences to understand this issue.
  • The findings have implications for improving search quality and content moderation for low-resourced languages online.

Plain English Explanation

The paper looks at a problem that can happen when people search for things online in languages that don't have as much digital content available, like the Amharic language. The researchers found that people searching for religious songs in Amharic on YouTube were sometimes recommended or shown videos with sexual content instead. This can be upsetting and problematic.

The researchers investigated this issue by interviewing Amharic-speaking YouTube users and by examining the search results, recommendations, channels, and comments connected to this content on the platform. They wanted to understand why this was happening and find ways to improve the situation.

The findings from this study are important because they highlight challenges with providing safe, high-quality search results for people using languages that don't have as much digital content available. Related work has explored similar problems for other low-resourced languages online. Addressing problems like this can help make the internet a more inclusive and helpful place for people around the world.

Technical Explanation

The paper investigates how users searching YouTube for religious content in Amharic are instead recommended or shown sexually explicit material. The researchers used a mixed-methods approach, combining qualitative interviews with Amharic-speaking users and an analysis of content collected from YouTube.

For the content analysis, the team examined search results (n=9313), recommendations (n=3336), channels (n=120), and comments (n=406) associated with policy-violating sexual content in Amharic. They found that benign, popular queries could surface this content, and that malicious content creators appear to exploit under-performing language technologies and content moderation to reach vulnerable groups of speakers. This aligns with prior research on the challenges of building effective NLP systems for low-resourced languages.
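
As a rough illustration of the kind of tally this analysis involves, the Python sketch below computes, for each query, the share of returned results that an annotator marked as policy-violating. It is not the authors' pipeline: the function name `violation_rate_by_query`, the record layout, and the sample labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def violation_rate_by_query(records):
    """Return {query: share of results labeled policy-violating}.

    `records` is an iterable of (query, is_violating) pairs, where
    is_violating is a bool assigned by a human annotator.
    This layout is an assumption, not the format used in the paper.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for query, is_violating in records:
        totals[query] += 1
        if is_violating:
            flagged[query] += 1
    return {query: flagged[query] / totals[query] for query in totals}


# Hypothetical, hand-made labels; NOT data from the paper.
sample = [
    ("query_a", False),
    ("query_a", True),
    ("query_a", False),
    ("query_a", False),
    ("query_b", False),
    ("query_b", False),
]

rates = violation_rate_by_query(sample)
for query, rate in sorted(rates.items()):
    print(f"{query}: {rate:.0%} of results flagged")
```

A per-query rate like this only describes the labeled sample; it says nothing about why a given result was surfaced, which is where the interview findings come in.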

Qualitatively, the researchers conducted semi-structured interviews with 15 Amharic-speaking YouTube users. Participants described their frustration and distress at encountering sexually explicit content when searching for religious material. Some noted feeling that their cultural and religious norms were being violated. These findings echo research on the detection and mitigation of gendered abuse in low-resourced languages and the challenges of defining and enforcing boundaries for offensive speech.

Critical Analysis

The paper provides valuable insights into the practical challenges of content moderation and recommendation systems for low-resourced languages like Amharic. However, it is limited in scope to a single language and platform. Further research is needed to understand whether these issues generalize to other low-resourced languages and online environments.

Additionally, the study does not delve deeply into the potential societal impacts of this problem. For example, the researchers do not explore how encountering inappropriate content when searching for religious material could affect individual users' trust in technology or their sense of online safety. Nor do they consider broader questions of digital inequity and the need for more inclusive internet experiences globally.

It would be interesting for future work to examine toxicity and harmful content across a wider range of topics and platforms, as well as to investigate solutions that bridge social and language barriers online. Nonetheless, this paper makes an important contribution by bringing attention to an underexplored area of online harm.

Conclusion

This paper sheds light on the problem of users in low-resourced language communities encountering inappropriate content when searching for benign material online. The researchers found that the combination of sparse data, imperfect recommendation algorithms, and insufficient content moderation can lead to harmful experiences for Amharic-speaking YouTube users.

The findings highlight the need for more inclusive and culturally-aware approaches to developing search and recommendation systems, as well as strengthening content moderation practices, for underserved language communities. Addressing these challenges could help make the internet a safer and more trustworthy resource for people around the world.


Related Papers

I Searched for a Religious Song in Amharic and Got Sexual Content Instead: Investigating Online Harm in Low-Resourced Languages on YouTube

Hellina Hailu Nigatu, Inioluwa Deborah Raji

Online social media platforms such as YouTube have a wide, global reach. However, little is known about the experience of low-resourced language speakers on such platforms; especially in how they experience and navigate harmful content. To better understand this, we (1) conducted semi-structured interviews (n=15) and (2) analyzed search results (n=9313), recommendations (n=3336), channels (n=120) and comments (n=406) of policy-violating sexual content on YouTube focusing on the Amharic language. Our findings reveal that -- although Amharic-speaking YouTube users find the platform crucial for several aspects of their lives -- participants reported unplanned exposure to policy-violating sexual content when searching for benign, popular queries. Furthermore, malicious content creators seem to exploit under-performing language technologies and content moderation to further target vulnerable groups of speakers, including migrant domestic workers, diaspora, and local Ethiopians. Overall, our study sheds light on how failures in low-resourced language technology may lead to exposure to harmful content and suggests implications for stakeholders in minimizing harm. Content Warning: This paper includes discussions of NSFW topics and harmful content (hate, abuse, sexual harassment, self-harm, misinformation). The authors do not support the creation or distribution of harmful content.

5/28/2024

Low-resourced Languages and Online Knowledge Repositories: A Need-Finding Study

Hellina Hailu Nigatu, John Canny, Sarah E. Chasins

Online Knowledge Repositories (OKRs) like Wikipedia offer communities a way to share and preserve information about themselves and their ways of living. However, for communities with low-resourced languages -- including most African communities -- the quality and volume of content available are often inadequate. One reason for this lack of adequate content could be that many OKRs embody Western ways of knowledge preservation and sharing, requiring many low-resourced language communities to adapt to new interactions. To understand the challenges faced by low-resourced language contributors on the popular OKR Wikipedia, we conducted (1) a thematic analysis of Wikipedia forum discussions and (2) a contextual inquiry study with 14 novice contributors. We focused on three Ethiopian languages: Afan Oromo, Amharic, and Tigrinya. Our analysis revealed several recurring themes; for example, contributors struggle to find resources to corroborate their articles in low-resourced languages, and language technology support, like translation systems and spellcheck, result in several errors that waste contributors' time. We hope our study will support designers in making online knowledge repositories accessible to low-resourced language speakers.

5/28/2024

Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces

Advaitha Vetagiri, Gyandeep Kalita, Eisha Halder, Chetna Taparia, Partha Pakray, Riyanka Manna

Online gender-based harassment is a widespread issue limiting the free expression and participation of women and marginalized genders in digital spaces. Detecting such abusive content can enable platforms to curb this menace. We participated in the Gendered Abuse Detection in Indic Languages shared task at ICON2023 that provided datasets of annotated Twitter posts in English, Hindi and Tamil for building classifiers to identify gendered abuse. Our team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks that can effectively model semantic and sequential patterns in textual data. The CNN captures localized features indicative of abusive language through its convolution filters applied on embedded input text. To determine context-based offensiveness, the BiLSTM analyzes this sequence for dependencies among words and phrases. Multiple variations were trained using FastText and GloVe word embeddings for each language dataset comprising over 7,600 crowdsourced annotations across labels for explicit abuse, targeted minority attacks and general offences. The validation scores showed strong performance across f1-measures, especially for English 0.84. Our experiments reveal how customizing embeddings and model hyperparameters can improve detection capability. The proposed architecture ranked 1st in the competition, proving its ability to handle real-world noisy text with code-switching. This technique has a promising scope as platforms aim to combat cyber harassment facing Indic language internet users. Our Code is at https://github.com/advaithavetagiri/CNLP-NITS-PP

4/4/2024

Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation

Mukaffi Bin Moin, Pronay Debnath, Usafa Akther Rifa, Rijeet Bin Anis

Social media platforms have a vital role in the modern world, serving as conduits for communication, the exchange of ideas, and the establishment of networks. However, the misuse of these platforms through toxic comments, which can range from offensive remarks to hate speech, is a concerning issue. This study focuses on identifying toxic comments in the Bengali language targeting three specific groups: transgender people, indigenous people, and migrant people, from multiple social media sources. The study delves into the intricate process of identifying and categorizing toxic language while considering the varying degrees of toxicity: high, medium, and low. The methodology involves creating a dataset, manual annotation, and employing pre-trained transformer models like Bangla-BERT, bangla-bert-base, distil-BERT, and Bert-base-multilingual-cased for classification. Diverse assessment metrics such as accuracy, recall, precision, and F1-score are employed to evaluate the model's effectiveness. The experimental findings reveal that Bangla-BERT surpasses alternative models, achieving an F1-score of 0.8903. This research exposes the complexity of toxicity in Bangla social media dialogues, revealing its differing impacts on diverse demographic groups.

9/26/2024