Historical Ink: Semantic Shift Detection for 19th Century Spanish

Original paper: arXiv:2407.12852. Published 7/22/2024 by Tony Montes, Laura Manrique-Gómez, Rubén Manrique

Overview

  • This paper explores the detection of semantic shift (changes in a word's meaning over time) in 19th-century Spanish, with an emphasis on Latin American Spanish.
  • The researchers build a 19th-century Spanish corpus and a customizable pipeline that finds the senses of target words and measures how those senses change between two corpora, using BERT-like models fine-tuned on old Spanish texts.
  • Applying the pipeline to a set of Spanish target words offers insights into how word meanings evolved during this period.

Plain English Explanation

The research paper investigates how the meanings of words changed during the 19th century in Spanish, particularly Latin American Spanish. As a language evolves, words can take on new meanings or shift in how they are used, and the researchers wanted to track these changes. To do this, they developed a technique that combines collections of historical texts with language models, computer programs trained on large amounts of text to represent how words are used in context.

The researchers tested their approach on a collection of 19th-century Spanish texts that they assembled for the study. By analyzing how the meanings of selected words shifted across this period, they uncovered insights into how the Spanish language was changing. This kind of historical analysis can also shed light on the cultural and societal changes that occurred alongside linguistic evolution.

Technical Explanation

The paper presents an approach to detecting semantic shift in 19th-century Spanish, with an emphasis on Latin American Spanish. The researchers combine a historical text corpus with language models to identify changes in word meanings over time.

Their pipeline fine-tunes BERT-like models on old Spanish texts (for both the Latin American and the general Spanish case), uses them to find the senses of each target word, and measures how those senses change between two corpora. This builds on prior work in characterizing semantic change and evaluating lexical semantic change.
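The abstract describes the pipeline as finding the senses of a target word and measuring their change between two corpora. A minimal sketch of one common way to do this follows, assuming each occurrence of the word has already been turned into a usage embedding by a fine-tuned BERT-like model: pool the usages from both corpora, cluster them into senses with k-means, compare the two corpora's sense distributions, and score the difference with the Jensen-Shannon divergence. The clustering method, the function names (`sense_shift` and its helpers), and the toy 2-D vectors are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

Vector = list[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(p: Vector, centroids: list[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points: list[Vector], k: int, iters: int = 50) -> list[int]:
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iters):
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Assignment step: stop once no usage changes cluster.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def sense_distribution(labels: list[int], k: int) -> list[float]:
    """Share of a corpus's usages that fall into each sense cluster."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(k)]


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = same senses, 1 = disjoint senses)."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a: list[float], b: list[float]) -> float:
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def sense_shift(old: list[Vector], new: list[Vector], k: int) -> float:
    """Cluster pooled usages into k senses, then compare the two corpora."""
    labels = kmeans(old + new, k)
    old_dist = sense_distribution(labels[: len(old)], k)
    new_dist = sense_distribution(labels[len(old):], k)
    return jensen_shannon(old_dist, new_dist)


if __name__ == "__main__":
    # Toy 2-D "embeddings": older and newer usages occupy different regions.
    old_usages = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0]]
    new_usages = [[5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
    print(sense_shift(old_usages, new_usages, k=2))  # → 1.0
```

Farthest-first initialization keeps the toy run deterministic. With real embeddings, the number of senses `k` would itself have to be chosen, for example with a clustering-quality criterion.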

The researchers apply their approach to a 19th-century Spanish corpus they constructed, with an emphasis on Latin American texts, and analyze a set of Spanish target words. The semantic shifts observed in this data offer insights into the linguistic and cultural evolution of Spanish during this historical period.
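Once each target word has usage embeddings from an older and a newer corpus, one simple way to compare many words is to reduce each word to a single score and sort. The sketch below, which is not necessarily the measure the authors use, computes the average pairwise cosine distance (APD) between a word's old and new usages, a score often used with contextualised embeddings. The function names, the word names, and the 2-D vectors are hypothetical placeholders.

```python
import math
from itertools import product

Vector = list[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def average_pairwise_distance(old: list[Vector], new: list[Vector]) -> float:
    """Mean cosine distance over every (old usage, new usage) pair."""
    pairs = list(product(old, new))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def rank_by_shift(
    usages: dict[str, tuple[list[Vector], list[Vector]]]
) -> list[tuple[str, float]]:
    """Score every target word and sort from most to least shifted."""
    scores = {word: average_pairwise_distance(old, new) for word, (old, new) in usages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical target words with toy 2-D usage embeddings (old corpus, new corpus).
    toy = {
        "word_a": ([[1.0, 0.0]], [[1.0, 0.0]]),  # same usage in both periods
        "word_b": ([[1.0, 0.0]], [[0.0, 1.0]]),  # orthogonal usage: large shift
    }
    print(rank_by_shift(toy))  # → [('word_b', 1.0), ('word_a', 0.0)]
```

APD compares every old usage with every new one, so it needs no sense clustering, but it also does not say which senses appeared or disappeared.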

Critical Analysis

The paper presents a compelling approach to detecting semantic shift in historical text corpora. The use of language models is a powerful technique that can uncover nuanced changes in word meanings over time. However, the researchers acknowledge some limitations to their method, such as the potential biases in the underlying text data.

Additionally, while the focus on 19th century Latin American Spanish is valuable, it would be interesting to see how the technique performs on text from other historical periods or regions. Expanding the scope of the analysis could provide a richer understanding of semantic change in the Spanish language.

Overall, this research makes an important contribution to the field of lexical semantic change detection. The insights gained from studying historical language evolution can shed light on broader societal and cultural transformations. Further development and application of these methods could yield valuable discoveries about the dynamic nature of language.

Conclusion

This paper presents an approach to detecting semantic shift in 19th-century Spanish, with an emphasis on Latin America. By combining a historical text corpus with fine-tuned language models, the researchers uncovered insights into how word meanings evolved during this period.

The findings from this study have implications for our understanding of the cultural and linguistic changes that occurred in 19th century Latin America. The techniques developed could also be applied to other historical language datasets, potentially yielding further valuable discoveries about the dynamic nature of language over time.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Historical Ink: Semantic Shift Detection for 19th Century Spanish

Tony Montes, Laura Manrique-Gómez, Rubén Manrique

This paper explores the evolution of word meanings in 19th-century Spanish texts, with an emphasis on Latin American Spanish, using computational linguistics techniques. It addresses the Semantic Shift Detection (SSD) task, which is crucial for understanding linguistic evolution, particularly in historical contexts. The study focuses on analyzing a set of Spanish target words. To achieve this, a 19th-century Spanish corpus is constructed, and a customizable pipeline for SSD tasks is developed. This pipeline helps find the senses of a word and measure their semantic change between two corpora using fine-tuned BERT-like models with old Spanish texts for both Latin American and general Spanish cases. The results provide valuable insights into the cultural and societal shifts reflected in language changes over time.

7/22/2024

A Survey on Contextualised Semantic Shift Detection

Stefano Montanelli, Francesco Periti

Semantic Shift Detection (SSD) is the task of identifying, interpreting, and assessing the possible change over time in the meanings of a target word. Traditionally, SSD has been addressed by linguists and social scientists through manual and time-consuming activities. In recent years, computational approaches based on Natural Language Processing and word embeddings gained increasing attention to automate SSD as much as possible. In particular, over the past three years, significant advancements have been made almost exclusively based on word contextualised embedding models, which can handle the multiple usages/meanings of the words and better capture the related semantic shifts. In this paper, we survey the approaches based on contextualised embeddings for SSD (i.e., CSSDetection) and we propose a classification framework characterised by meaning representation, time-awareness, and learning modality dimensions. The framework is exploited i) to review the measures for shift assessment, ii) to compare the approaches on performance, and iii) to discuss the current issues in terms of scalability, interpretability, and robustness. Open challenges and future research directions about CSSDetection are finally outlined.

6/12/2024

Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction

Laura Manrique-Gómez, Tony Montes, Rubén Manrique

This paper presents two significant contributions: first, a novel dataset of 19th-century Latin American press texts, which addresses the lack of specialized corpora for historical and linguistic analysis in this region. Second, it introduces a framework for OCR error correction and linguistic surface form detection in digitized corpora, utilizing a Large Language Model. This framework is adaptable to various contexts and, in this paper, is specifically applied to the newly created dataset.

7/19/2024

Survey in Characterization of Semantic Change

Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski

Living languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or semantic changes of words (new meanings for existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalism or slang), domains (e.g., technical terms), or periods. In computer science, these words are relevant to computational linguistics algorithms such as translation, information retrieval, question answering, etc. Semantic changes can potentially impact the quality of the outcomes of these algorithms. Therefore, it is important to understand and characterize these changes formally. The study of this impact is a recent problem that has attracted the attention of the computational linguistics community. Several approaches propose methods to detect semantic changes with good precision, but more effort is needed to characterize how the meaning of words changes and to reason about how to reduce the impact of semantic change. This survey provides an understandable overview of existing approaches to the characterization of semantic changes and also formally defines three classes of characterizations: whether the meaning of a word becomes more general or narrow (change in dimension); whether the word is used in a more pejorative or positive/ameliorated sense (change in orientation); and whether there is a trend to use the word in, for instance, a metaphoric or metonymic context (change in relation). We summarized the main aspects of the selected publications in a table and discussed the needs and trends in the research activities on semantic change characterization.

7/19/2024