Are Sounds Sound for Phylogenetic Reconstruction?

Read original: arXiv:2402.02807 - Published 5/15/2024 by Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis

Overview

  • This paper examines the use of sound-based versus cognate-based approaches for phylogenetic reconstruction (i.e., building family trees) of languages.
  • The researchers compared the performance of these two methods across 10 diverse language family datasets, using state-of-the-art techniques for automated cognate and sound correspondence detection.
  • The key finding is that phylogenies reconstructed from lexical cognates (words with a common origin) are topologically closer to the gold standard phylogenies than those reconstructed from sound correspondences.

Plain English Explanation

When studying the evolution of languages, scholars have traditionally emphasized the importance of sound laws and sound correspondences for inferring the relationships between languages and building family trees. However, most computational studies on language evolution have relied primarily on lexical cognates - words that share a common origin - as the main source of data for reconstructing these phylogenies.

In this study, the researchers tested how well the sound-based approach performs compared to the more commonly used cognate-based approach. They used 10 different language family datasets and state-of-the-art methods for automatically detecting cognates and sound correspondences. The results showed that the phylogenies reconstructed from lexical cognates were, on average, about one-third closer to the known or "gold standard" phylogenies than the phylogenies reconstructed from sound correspondences, where closeness is measured by the generalized quartet distance between tree topologies.

Technical Explanation

The researchers employed 10 diverse language family datasets, representing various geographical regions and language types, to evaluate the performance of sound-based versus cognate-based phylogenetic reconstruction. They used state-of-the-art methods for automated cognate and sound correspondence detection, which allowed them to systematically compare the two approaches.

The phylogenies reconstructed from lexical cognates were found to be topologically closer, by approximately one-third with respect to the generalized quartet distance, to the gold standard phylogenies than the phylogenies reconstructed from sound correspondences. This suggests that, contrary to the traditional emphasis on sound laws and sound correspondences, the lexical cognate-based approach may be more effective for phylogenetic inference in computational studies of language evolution.
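To make the distance measure concrete, the following is a minimal Python sketch of a generalized quartet distance, not the authors' implementation. It assumes the commonly used definition: among all four-leaf subsets that the gold standard tree resolves into two pairs ("butterflies"), count the fraction that the inferred tree resolves differently or leaves unresolved; quartets left unresolved by the gold standard are skipped. The nested-tuple tree encoding, the function names, and the toy languages A to E are all assumptions made for illustration.

```python
from itertools import combinations


def clades(tree):
    """Return (all leaves of `tree`, leaf sets of every proper subtree).

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical encoding).
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
        found.append(child_leaves)
    return leaves, found


def quartet_topology(clade_list, quartet):
    """Return the split {pair, other pair} induced on `quartet`, or None for a star."""
    q = frozenset(quartet)
    for clade in clade_list:
        pair = clade & q
        if len(pair) == 2:
            # An edge separates these two leaves from the other two.
            return frozenset([pair, q - pair])
    return None  # no edge splits the quartet 2|2: unresolved (star)


def generalized_quartet_distance(reference, inferred):
    """Fraction of resolved reference quartets that the inferred tree gets wrong."""
    ref_leaves, ref_clades = clades(reference)
    inf_leaves, inf_clades = clades(inferred)
    if ref_leaves != inf_leaves:
        raise ValueError("both trees must contain the same leaves")
    butterflies = 0
    differing = 0
    for quartet in combinations(sorted(ref_leaves), 4):
        ref_split = quartet_topology(ref_clades, quartet)
        if ref_split is None:
            continue  # polytomy in the gold standard: ignore this quartet
        butterflies += 1
        if quartet_topology(inf_clades, quartet) != ref_split:
            differing += 1
    return differing / butterflies if butterflies else 0.0


# Toy example with five hypothetical languages.
gold = ((("A", "B"), "C"), ("D", "E"))
guess = ((("A", "C"), "B"), ("D", "E"))
print(generalized_quartet_distance(gold, guess))  # 0.4
```

A value of 0 means the inferred tree agrees with every resolved quartet of the gold standard, and larger values mean more disagreement. Skipping unresolved quartets is what lets the measure cope with gold standard trees that contain polytomies.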

Critical Analysis

The paper acknowledges that sound-based approaches have the potential to provide valuable insights into language evolution, as they capture information about phonological changes that are not necessarily reflected in lexical cognates. However, the researchers' findings indicate that the cognate-based approach currently outperforms the sound-based approach in terms of reconstructing accurate phylogenies.

One potential limitation of the study is that the sound correspondence detection methods used may not have been optimized for all the language families included in the datasets. Improvements in these methods could potentially lead to better performance of the sound-based approach in the future.

Additionally, the researchers suggest that combining lexical and sound-based information may be a fruitful avenue for further research, as it could leverage the strengths of both approaches to achieve even more accurate phylogenetic reconstructions.

Conclusion

This study provides important insights into the relative performance of sound-based versus cognate-based approaches for phylogenetic inference in computational studies of language evolution. The findings challenge the traditional emphasis on sound laws and sound correspondences, showing that the more commonly used cognate-based approach currently outperforms the sound-based approach in reconstructing accurate language family trees. The researchers encourage further exploration of ways to integrate both lexical and phonological information to advance our understanding of how languages evolve over time.




Related Papers

Are Sounds Sound for Phylogenetic Reconstruction?

Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis

In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.

5/15/2024

Exploring Sound Change Over Time: A Review of Computational and Human Perception

Siqi He, Wei Zhao

Computational and human perception are often considered separate approaches for studying sound changes over time; few works have touched on the intersection of both. To fill this research gap, we provide a pioneering review contrasting computational with human perception from the perspectives of methods and tasks. Overall, computational approaches rely on computer-driven models to perceive historical sound changes on etymological datasets, while human approaches use listener-driven models to perceive ongoing sound changes on recording corpora. Despite their differences, both approaches complement each other on phonetic and acoustic levels, showing the potential to achieve a more comprehensive perception of sound change. Moreover, we call for a comparative study on the datasets used by both approaches to investigate the influence of historical sound changes on ongoing changes. Lastly, we discuss the applications of sound change in computational linguistics, and point out that perceiving sound change alone is insufficient, as many processes of language change are complex, with entangled changes at syntactic, semantic, and phonetic levels.

7/9/2024

Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Atharva Naik, Kexun Zhang, Nathaniel Robinson, Aravind Mysore, Clayton Marr, Hong Sng Rebecca Byrnes, Anna Cai, Kalvin Chang, David Mortensen

Historical linguists have long written a kind of incompletely formalized "program" that converts reconstructed words in an ancestor language into words in one of its attested descendants and that consists of a series of ordered string rewrite functions (called sound laws). They do this by observing pairs of words in the reconstructed language (protoforms) and the descendant language (reflexes) and constructing a program that transforms protoforms into reflexes. However, writing these programs is error-prone and time-consuming. Prior work has successfully scaffolded this process computationally, but fewer researchers have tackled Sound Law Induction (SLI), which we approach in this paper by casting it as Programming by Examples. We propose a language-agnostic solution that utilizes the programming ability of Large Language Models (LLMs) by generating Python sound law programs from sound change examples. We evaluate the effectiveness of our approach for various LLMs, propose effective methods to generate additional language-agnostic synthetic data to fine-tune LLMs for SLI, and compare our method with existing automated SLI methods, showing that while LLMs lag behind them they can complement some of their weaknesses.

6/19/2024

Generating Feature Vectors from Phonetic Transcriptions in Cross-Linguistic Data Formats

Arne Rubehn, Jessica Nieder, Robert Forkel, Johann-Mattis List

When comparing speech sounds across languages, scholars often make use of feature representations of individual sounds in order to determine fine-grained sound similarities. Although binary feature systems for large numbers of speech sounds have been proposed, large-scale computational applications often face the challenge that the proposed feature systems, even if they list features for several thousand sounds, only cover a smaller part of the numerous speech sounds reflected in actual cross-linguistic data. In order to address the problem of missing data for attested speech sounds, we propose a new approach that can create binary feature vectors dynamically for all sounds that can be represented in the standardized version of the International Phonetic Alphabet proposed by the Cross-Linguistic Transcription Systems (CLTS) reference catalog. Since CLTS is actively used in large data collections, covering more than 2,000 distinct language varieties, our procedure for the generation of binary feature vectors provides immediate access to a very large collection of multilingual wordlists. Testing our feature system in different ways on different datasets proves that the system is not only useful to provide a straightforward means to compare the similarity of speech sounds, but also illustrates its potential to be used in future cross-linguistic machine learning applications.

5/8/2024