Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses

arXiv:2406.10316 - Published 6/18/2024 by David Doukhan, Lena Dodson, Manon Conan, Valentin Pelloin, Aurélien Clamouse, Mélina Lepape, Géraldine Van Hille, Cécile Méadel, Marlène Coulomb-Gully

Overview

  • This paper explores the use of automatic information extraction methods versus manual analyses to study gender representation in TV and radio.
  • The researchers compared automatic descriptors (speech time, gender categorization of faces, and speech transcriptions) with manual analyses reported by the TV and radio channels.
  • The study analyzed gender representation in French TV and radio broadcasts from 2023, across program categories, channels, and formats.

Plain English Explanation

The paper looks at ways to measure how much men and women appear in TV and radio shows. The researchers compared two kinds of measurement:

  1. Automatic information extraction - Using machine learning models to measure how long women and men speak, to classify the gender of faces shown on screen, and to transcribe speech so that references to women can be counted.

  2. Manual analysis - Reports produced by the TV and radio channels themselves, describing the presence of women in their programs.

The goal was to see how the two kinds of measurement relate and what each reveals about gender representation in different types of TV and radio programs. This builds on related work in areas like semi-automatic gender annotation, analyzing gender evolution in French media, and spoken language identification strategies.

The researchers wanted to understand whether automated techniques can capture gender data efficiently at scale, or whether manual analysis is still needed to get a more nuanced picture. This ties into broader work on challenging negative gender stereotypes with automated systems and on how gender is encoded in transformer-based speech recognition.

Technical Explanation

The researchers conducted a comparative study between automatic information extraction methods and manual analyses to assess gender representation in TV and radio content.

The automatic approach used machine learning models to extract three kinds of descriptors from a 32,000-hour corpus of French broadcasts from 2023: speech time by speaker gender, estimated from the audio; gender categorization of faces appearing on screen; and speech transcriptions, used to measure how often women are referred to.
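
As an illustration of the speech-time descriptor, the sketch below computes women's share of speech time from gender-labelled audio segments. It is not the authors' pipeline: it assumes that a gender-aware speech segmenter has already produced hypothetical (label, start, end) tuples in seconds, with "female" and "male" marking speech and any other label (music, noise) marking non-speech.

```python
from collections import defaultdict

# Hypothetical segment format: (label, start_seconds, end_seconds).
# Only "female" and "male" labels count as speech; anything else
# (e.g. music or noise) is ignored.
Segment = tuple[str, float, float]
SPEECH_LABELS = ("female", "male")


def speech_time_by_gender(segments: list[Segment]) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speech label."""
    totals: dict[str, float] = defaultdict(float)
    for label, start, end in segments:
        if label in SPEECH_LABELS:
            totals[label] += max(0.0, end - start)
    return dict(totals)


def women_speech_share(segments: list[Segment]) -> float:
    """Fraction of gender-labelled speech time attributed to women (0.0 if none)."""
    totals = speech_time_by_gender(segments)
    speech = sum(totals.values())
    return totals.get("female", 0.0) / speech if speech > 0 else 0.0


segments = [("male", 0.0, 30.0), ("music", 30.0, 40.0),
            ("female", 40.0, 50.0), ("male", 50.0, 60.0)]
print(women_speech_share(segments))  # 10 s of 50 s of speech -> 0.2
```

The share is computed over gender-labelled speech only, so music and noise do not dilute the denominator.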

These automatic estimates were compared with manual analyses, namely reports produced by the channels themselves. The two sources do not agree: the channel reports show a higher presence of women than the automatic estimates.
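
The comparison can be sketched as a per-channel difference between the two sources. This minimal sketch assumes both sources have already been reduced to a women's share between 0 and 1 for each channel; the ChannelFigures class, its fields, and the example numbers are hypothetical. It treats neither source as ground truth and only measures where they diverge.

```python
from dataclasses import dataclass


@dataclass
class ChannelFigures:
    """Women's share (0 to 1) for one channel from two sources; field names are hypothetical."""
    channel: str
    reported_share: float   # figure taken from the channel's own report
    automatic_share: float  # e.g. women's share of speech time from audio analysis


def report_gaps(figures: list[ChannelFigures]) -> dict[str, float]:
    """Reported minus automatic share; a positive value means the report shows more women."""
    return {f.channel: f.reported_share - f.automatic_share for f in figures}


def channels_reporting_higher(figures: list[ChannelFigures]) -> list[str]:
    """Channels whose report shows a higher women's share than the automatic estimate."""
    return [name for name, gap in report_gaps(figures).items() if gap > 0]


# Invented numbers, for illustration only.
figures = [
    ChannelFigures("channel_a", reported_share=0.42, automatic_share=0.36),
    ChannelFigures("channel_b", reported_share=0.30, automatic_share=0.33),
]
print(channels_reporting_higher(figures))  # ['channel_a']
```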

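The study compared the descriptors across program categories and channels, between TV and radio, between private and public channels, and across periods such as high- and low-audience slots and war coverage. Women were underrepresented relative to men on every descriptor, and references to women were lower than their speech time. On French TV, women were more visible than audible, except in news, where unseen journalists mostly describe male protagonists. A statistical test identified three main effects on references to women: program category, channel, and speaker gender.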
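
The "more visible than audible" comparison can be sketched by pooling, per program category, the durations behind two descriptors: on-screen time of faces classified as women and speech time attributed to women. This is a hypothetical illustration, not the paper's method: ProgramStats and its fields are invented, and face and speech durations are assumed to be available per program.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProgramStats:
    """Hypothetical per-program durations, in seconds."""
    category: str
    women_face_time: float    # screen time of faces classified as women
    total_face_time: float    # screen time of all gender-classified faces
    women_speech_time: float  # speech time attributed to women
    total_speech_time: float  # all gender-labelled speech time


def visible_vs_audible(programs: list[ProgramStats]) -> dict[str, tuple[float, float]]:
    """Return {category: (visible_share, audible_share)}, pooling durations per category."""
    sums: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for p in programs:
        s = sums[p.category]
        s[0] += p.women_face_time
        s[1] += p.total_face_time
        s[2] += p.women_speech_time
        s[3] += p.total_speech_time
    shares = {}
    for category, (wf, tf, ws, ts) in sums.items():
        visible = wf / tf if tf > 0 else 0.0
        audible = ws / ts if ts > 0 else 0.0
        shares[category] = (visible, audible)
    return shares


# Invented numbers, for illustration only.
programs = [
    ProgramStats("news", 300.0, 1000.0, 200.0, 1000.0),
    ProgramStats("news", 100.0, 1000.0, 400.0, 1000.0),
    ProgramStats("magazine", 450.0, 1000.0, 350.0, 1000.0),
]
for category, (visible, audible) in visible_vs_audible(programs).items():
    print(category, round(visible, 2), round(audible, 2), visible > audible)
```

Pooling durations gives long programs more weight than averaging per-program shares would; which choice is appropriate depends on the question being asked.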

Critical Analysis

The paper provides a rigorous comparison of automated and manual techniques for analyzing gender in media. The authors acknowledge limitations in both approaches - automatic methods may miss nuances, while manual coding is time-consuming.

A key question is whether automated techniques are accurate enough to complement or replace manual analyses at scale. Because the channel reports show a higher presence of women than the automatic estimates, explaining where and why the two sources diverge is central to answering it.

Further research is needed to improve the robustness and generalization of the automatic gender detection models, especially for more diverse and multilingual media content. Combining the audio, visual, and textual descriptors with additional contextual cues could also improve accuracy.

Additionally, the paper does not delve into potential biases or blind spots in the data and annotation processes, which could affect the reliability of the gender representation insights. Incorporating intersectional perspectives could lead to a richer understanding of gender dynamics in media.

Conclusion

This study offers a valuable comparison of automatic and manual approaches for measuring gender representation in TV and radio programming. The findings suggest that while automated techniques show promise for efficient large-scale analysis, human-driven coding remains important for capturing nuanced, contextual insights.

Advancing the accuracy and scope of these automated gender detection methods could enable more comprehensive, cost-effective monitoring of media content. This could inform efforts to promote greater gender diversity and equality in the entertainment industry and public discourse. However, care must be taken to address potential biases and limitations in both the technical systems and the underlying data.

Overall, this work contributes to the ongoing exploration of how to best leverage a combination of human and machine intelligence to gain a deeper understanding of gender dynamics in the media landscape.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses

David Doukhan, Lena Dodson, Manon Conan, Valentin Pelloin, Aurélien Clamouse, Mélina Lepape, Géraldine Van Hille, Cécile Méadel, Marlène Coulomb-Gully

This study investigates the relationship between automatic information extraction descriptors and manual analyses to describe gender representation disparities in TV and radio. Automatic descriptors, including speech time, facial categorization and speech transcriptions, are compared with channel reports on a vast 32,000-hour corpus of French broadcasts from 2023. Findings reveal systemic gender imbalances, with women underrepresented compared to men across all descriptors. Notably, manual channel reports show higher women's presence than automatic estimates and references to women are lower than their speech time. Descriptors share common dynamics during high and low audiences, war coverage, or private versus public channels. While women are more visible than audible in French TV, this trend is inverted in news with unseen journalists depicting male protagonists. A statistical test shows three main effects influencing references to women: program category, channel and speaker gender.


6/18/2024


Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis

Valentin Pelloin, Lena Dodson, Émile Chapuis, Nicolas Hervé, David Doukhan

This paper introduces a computational framework designed to delineate gender distribution biases in topics covered by French TV and radio news. We transcribe a dataset of 11.7k hours, broadcast in 2023 on 21 French channels. A Large Language Model (LLM) is used in few-shot conversation mode to obtain a topic classification on those transcriptions. Using the generated LLM annotations, we explore the fine-tuning of a specialized smaller classification model, to reduce the computational cost. To evaluate the performance of these models, we construct and annotate a dataset of 804 dialogues. This dataset is made available free of charge for research purposes. We show that women are notably underrepresented in subjects such as sports, politics and conflicts. Conversely, on topics such as weather, commercials and health, women have more speaking time than their overall average across all subjects. We also observe differences in representation between private and public service channels.


7/22/2024


A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

Rémi Uro, David Doukhan, Albert Rilliard, Laetitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent

This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker's age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at the French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have been found so far). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, removal of background music and overlapped speech, and speaker diarization, used to present clean speaker segments to human annotators identifying target speakers. This pipeline proved highly effective, cutting down manual processing by a factor of ten. Evaluation of the quality of the automatic processing and of the final output is provided. It shows that the automatic processing is comparable to up-to-date processes, and that the output provides high-quality speech for most of the selected excerpts. This method shows promise for creating large corpora of known target speakers.


4/29/2024


Articulatory Configurations across Genders and Periods in French Radio and TV archives

Benjamin Elie, David Doukhan, Rémi Uro, Lucas Ondel-Yang, Albert Rilliard, Simon Devauchelle

This paper studies changes in articulatory configurations across genders and periods using an inversion from acoustic to articulatory parameters. From a diachronic corpus based on French media archives spanning 60 years from 1955 to 2015, automatic transcription and forced alignment allowed extracting the central frame of each vowel. More than one million frames were obtained from over a thousand speakers across gender and age categories. The formants of these vocalic frames were used to fit the parameters of Maeda's articulatory model. Evaluations of the quality of these processes are provided. We focus here on two parameters of Maeda's model linked to total vocal tract length: the relative position of the larynx (higher for females) and lip protrusion (more protruded for males). Implications for voice quality across genders are discussed. The effect across periods seems gender independent; thus, the assertion that females lowered their pitch over time is not supported.


8/9/2024