Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective

Read original: arXiv:2404.16104 - Published 4/26/2024 by Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle

Overview

This paper presents a diachronic (over time) acoustic analysis of the voices of 1023 speakers from French media archives.
The speakers were categorized based on time period, age group, and gender.
The fundamental frequency (F0) and first four formants (F1-F4) were measured for each speaker.
Statistical models were used to examine how the base-F0 (an estimate of vocal register) and vocal tract length changed over time and across genders.

Plain English Explanation

The researchers in this study analyzed the voices of over 1000 people who appeared in French media over several decades. They looked at how certain acoustic features of the voice, like pitch and vocal tract size, changed over time and differed between men and women of different ages.

To do this, they divided the speakers into 32 different categories based on four time periods (the 1950s, 1970s, 1990s, and 2010s), four age groups (20-35, 36-50, 51-65, and over 65), and two genders (male and female). They then measured two key aspects of each speaker's voice:

The fundamental frequency (F0), which corresponds to the overall "pitch" of the voice. From the F0 distribution, they calculated the base-F0, which estimates the speaker's vocal register or typical pitch level.
The first four formants (F1-F4), which are related to the size and shape of the vocal tract. They used these formant frequencies to estimate the speaker's average vocal tract length.

The researchers then used statistical models to see how the base-F0 and vocal tract length changed over the different time periods and between genders, while accounting for the effects of age.

The results showed that voices tended to get lower in pitch (i.e., have a lower base-F0) over time, independent of gender. They also found that women's voices got lower in pitch as they got older, but men's voices did not change as much with age.

Technical Explanation

This study conducted a diachronic acoustic analysis of the voices of 1023 speakers from French media archives, spanning four time periods (1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35, 36-50, 51-65, >65), and two genders. The researchers estimated the fundamental frequency (F0) and the first four formants (F1-F4) for each speaker, using procedures designed to ensure high-quality estimations on this heterogeneous data.
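The 4 x 4 x 2 design described above can be enumerated in a few lines. This is only an illustrative sketch: the label strings, the `age_group` helper, and the handling of ages under 20 are assumptions for the example, not the authors' code.

```python
from itertools import product

# Labels taken from the description above; the exact strings are illustrative.
PERIODS = ("1955/56", "1975/76", "1995/96", "2015/16")
AGE_GROUPS = ("20-35", "36-50", "51-65", ">65")
GENDERS = ("female", "male")

# Every (period, age group, gender) combination: 4 * 4 * 2 = 32 categories.
CATEGORIES = list(product(PERIODS, AGE_GROUPS, GENDERS))


def age_group(age):
    """Map an age in years to one of the four groups (hypothetical helper).

    Ages below 20 fall outside the design and return None.
    """
    if age < 20:
        return None
    if age <= 35:
        return "20-35"
    if age <= 50:
        return "36-50"
    if age <= 65:
        return "51-65"
    return ">65"


if __name__ == "__main__":
    print(len(CATEGORIES))
    print(age_group(42))
```

With 1023 speakers spread over 32 cells, each cell holds about 32 speakers on average, although the summary does not say how evenly they are distributed.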

From the F0 distribution of each speaker, the base-F0 value was calculated as an estimate of vocal register. The average vocal tract length was also estimated from the formant frequencies. These measures of base-F0 and vocal tract length were then fitted to linear mixed models to evaluate how they changed across time periods and genders, while correcting for age effects.
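To make the two measures concrete, here is a minimal sketch under stated assumptions. The summary does not give the exact formulas, so both are stand-ins: base-F0 is taken as a low percentile of the voiced F0 values (7.64 by default, the point lying 1.43 standard deviations below the mean of a normal distribution), and vocal tract length comes from the textbook uniform-tube model, where the n-th resonance of a tube closed at one end is F_n = (2n - 1)c / (4L). The function names and the speed-of-sound constant are illustrative choices, and the paper's actual estimators may differ.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air in the vocal tract


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def base_f0(f0_hz, q=7.64):
    """Low percentile of the voiced F0 values (assumed definition of base-F0).

    Frames with F0 <= 0 are treated as unvoiced and ignored.
    """
    voiced = [f for f in f0_hz if f > 0]
    return percentile(voiced, q)


def vocal_tract_length_cm(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Average the uniform-tube length estimates L = (2n - 1) * c / (4 * F_n)."""
    estimates = [
        (2 * n - 1) * c / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Idealized neutral vowel: formants at odd multiples of 500 Hz.
    print(vocal_tract_length_cm([500.0, 1500.0, 2500.0, 3500.0]))
```

A percentile is less sensitive than the mean to occasional octave errors or expressive pitch excursions, which is one reason to prefer a low-percentile register estimate on heterogeneous broadcast audio.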

The results showed a significant effect of time period, with a tendency for voices to become lower in pitch (i.e., have a lower base-F0) over the decades, independent of gender. Additionally, a lowering of pitch with age was observed for female but not male speakers.
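One way to read these results is through the terms of a generic linear mixed model. The form below is a sketch, not the authors' specification: the summary does not state which interactions were included or what the random-effect grouping factor was, so the interaction terms and the grouping index $k(i)$ are assumptions. Here $y_i$ is the base-F0 or vocal tract length of speaker $i$, and $p(i)$, $g(i)$, $a(i)$ are that speaker's period, gender, and age group.

```latex
y_i = \beta_0 + \alpha_{p(i)} + \gamma_{g(i)} + \delta_{a(i)}
      + (\alpha\gamma)_{p(i)\,g(i)} + (\delta\gamma)_{a(i)\,g(i)}
      + u_{k(i)} + \varepsilon_i,
\qquad u_k \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```

In this notation, a period effect that is "independent of gender" corresponds to a non-zero $\alpha$ with a negligible period-by-gender interaction $(\alpha\gamma)$, while an age effect found for female but not male speakers corresponds to a non-negligible age-by-gender interaction $(\delta\gamma)$.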

Critical Analysis

The researchers acknowledge several limitations in their study. First, the use of archival media data means the audio quality and recording conditions were variable and potentially inconsistent across the different time periods. While the researchers described procedures to mitigate these issues, there may still be some unaccounted-for sources of noise or bias in the acoustic measurements.

Additionally, the study only examined two binary gender categories (male and female), which may not capture the full diversity of gender identity and expression among the speakers. Future research could explore voice passing and non-binary voice gender prediction or the role of language proficiency and F0 entrainment in L2 English speakers.

The study also leaves out other factors that could bear on voice acoustics, such as measures of acoustic diversity in speech or the acoustic models developed for automatic speech recognition. Accounting for these could clarify how far signal processing and machine learning choices shape the observed changes in voice acoustics over time.

Conclusion

This study presents a detailed diachronic analysis of voice acoustics in a large dataset of French media speakers. The findings suggest that overall pitch levels have trended lower over the past several decades, with women's voices showing a more pronounced lowering of pitch as they age compared to men. These results contribute to our understanding of how voice characteristics may be evolving, with potential implications for fields like speech technology, linguistics, and social sciences. Future research could build on this work by incorporating additional factors and exploring the voice diversity across a broader spectrum of gender identities.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective

Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle

We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65, >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit by linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of the period with a tendency to lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.

4/26/2024

Articulatory Configurations across Genders and Periods in French Radio and TV archives

Benjamin Elie, David Doukhan, Rémi Uro, Lucas Ondel-Yang, Albert Rilliard, Simon Devauchelle

This paper studies changes in articulatory configurations across genders and periods using an inversion from acoustic to articulatory parameters. From a diachronic corpus based on French media archives spanning 60 years from 1955 to 2015, automatic transcription and forced alignment allowed extracting the central frame of each vowel. More than one million frames were obtained from over a thousand speakers across gender and age categories. The formants of these vocalic frames were used to fit the parameters of Maeda's articulatory model. Evaluations of the quality of these processes are provided. We focus here on two parameters of Maeda's model linked to total vocal tract length: the relative position of the larynx (higher for females) and lip protrusion (more protruded for males). Implications for voice quality across genders are discussed. The effect across periods seems gender independent; thus, the assertion that females lowered their pitch with time is not supported.

8/9/2024

A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

Rémi Uro, David Doukhan, Albert Rilliard, Laetitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent

This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker's age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at the French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have been found so far). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, background music and overlapped speech removal, and speaker diarization, used to present clean speaker segments to human annotators identifying target speakers. This pipeline proved highly effective, cutting down manual processing by a factor of ten. Evaluation of the quality of the automatic processing and of the final output is provided. It shows that the automatic processing compares to up-to-date processes, and that the output provides high-quality speech for most of the selected excerpts. This method shows promise for creating large corpora of known target speakers.

4/29/2024

Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses

David Doukhan, Lena Dodson, Manon Conan, Valentin Pelloin, Aurélien Clamouse, Mélina Lepape, Géraldine Van Hille, Cécile Méadel, Marlène Coulomb-Gully

This study investigates the relationship between automatic information extraction descriptors and manual analyses to describe gender representation disparities in TV and Radio. Automatic descriptors, including speech time, facial categorization and speech transcriptions are compared with channel reports on a vast 32,000-hour corpus of French broadcasts from 2023. Findings reveal systemic gender imbalances, with women underrepresented compared to men across all descriptors. Notably, manual channel reports show higher women's presence than automatic estimates and references to women are lower than their speech time. Descriptors share common dynamics during high and low audiences, war coverage, or private versus public channels. While women are more visible than audible in French TV, this trend is inverted in news with unseen journalists depicting male protagonists. A statistical test shows 3 main effects influencing references to women: program category, channel and speaker gender.

6/18/2024