Escaping the sentence-level paradigm in machine translation

2304.12959

Published 5/17/2024 by Matt Post, Marcin Junczys-Dowmunt

Abstract

It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation. It is therefore unfortunate that machine translation -- both research and production -- largely remains stuck in a decades-old sentence-level translation paradigm. It is also an increasingly glaring problem in light of competitive pressure from large language models, which are natively document-based. Much work in document-context machine translation exists, but for various reasons has been unable to catch hold. This paper suggests a path out of this rut by addressing three impediments at once: what architectures should we use? where do we get document-level information for training them? and how do we know whether they are any good? In contrast to work on specialized architectures, we show that the standard Transformer architecture is sufficient, provided it has enough capacity. Next, we address the training data issue by taking document samples from back-translated data only, where the data is not only more readily available, but is also of higher quality compared to parallel document data, which may contain machine translation output. Finally, we propose generative variants of existing contrastive metrics that are better able to discriminate among document systems. Results in four large-data language pairs (DE→EN, EN→DE, EN→FR, and EN→RU) establish the success of these three pieces together in improving document-level performance.

Overview

The paper discusses the importance of document context for machine translation, and how current machine translation systems are still largely stuck in a sentence-level paradigm.
The paper proposes a solution that addresses three key challenges: the architecture, the training data, and the evaluation metrics.
The key findings are threefold: the standard Transformer architecture is sufficient if it has enough capacity; document samples drawn only from back-translated data can address the training data issue; and generative variants of existing contrastive metrics discriminate better among document-level systems.

Plain English Explanation

Machine translation systems today typically work at the sentence level, without considering the broader context of the document. However, document context is vital for resolving translation ambiguities. The authors of this paper argue that to truly advance machine translation, we need to move beyond this sentence-level approach and develop systems that can handle translation at the document level.

To tackle this challenge, the paper proposes three key ideas:

Architecture: The authors show that the standard Transformer architecture, which is widely used in language models, can be sufficient for document-level translation, as long as it has enough capacity.
Training Data: Instead of relying on parallel document-level data, which can be scarce and may contain machine translation output, the authors take document samples from back-translated data only, which is more readily available and of higher quality.
Evaluation Metrics: The paper introduces new generative variants of existing contrastive metrics, which are better able to distinguish between document-level translation systems and identify the best-performing ones.

By addressing these three critical aspects together, the authors demonstrate significant improvements in document-level translation performance across multiple language pairs.

Technical Explanation

The paper starts by highlighting the importance of document context in machine translation, as it is crucial for resolving a range of translation ambiguities. However, the authors note that current machine translation systems, both in research and production, remain largely stuck in a sentence-level translation paradigm, despite the growing competitive pressure from large language models that are natively document-based.

To address this issue, the paper proposes a multi-pronged approach:

Architecture: The authors show that the standard Transformer architecture, which is widely used in language models, can be sufficient for document-level translation, as long as it has enough capacity. This is in contrast to prior work that focused on specialized architectures for document-context machine translation.
Training Data: The paper addresses the challenge of obtaining high-quality document-level training data by taking document samples from back-translated data only. This data is more readily available than parallel document data and also of higher quality, since parallel document data may contain machine translation output.
Evaluation Metrics: The authors propose generative variants of existing contrastive metrics, which are better able to discriminate among document-level translation systems and identify the best-performing ones.
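
The training-data idea can be pictured with a short sketch. It is not the authors' pipeline: it assumes each target-language document has already been split into sentences and back-translated sentence by sentence, and it simply packs consecutive sentence pairs into multi-sentence samples. The function name `build_doc_samples`, the `<sep>` separator, the whitespace token count, and the `max_tokens` budget are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical separator placed between sentences inside one sample.
SEP = " <sep> "

SentencePair = Tuple[str, str]  # (back-translated source, original target)


def build_doc_samples(
    doc: List[SentencePair],
    max_tokens: int = 256,  # arbitrary budget, not a value from the paper
    sep: str = SEP,
) -> List[SentencePair]:
    """Pack consecutive sentence pairs of one document into longer samples.

    Length is approximated by whitespace tokens; a real system would
    count subword tokens instead.
    """
    samples: List[SentencePair] = []
    src_buf: List[str] = []
    tgt_buf: List[str] = []
    length = 0

    for src, tgt in doc:
        n = max(len(src.split()), len(tgt.split()))
        # Flush the current sample if adding this pair would exceed the budget.
        if src_buf and length + n > max_tokens:
            samples.append((sep.join(src_buf), sep.join(tgt_buf)))
            src_buf, tgt_buf, length = [], [], 0
        # A single over-long sentence still becomes its own sample.
        src_buf.append(src)
        tgt_buf.append(tgt)
        length += n

    if src_buf:
        samples.append((sep.join(src_buf), sep.join(tgt_buf)))
    return samples
```

Because the target side is original human text and only the source side is synthetic, sentence order and cross-sentence links on the target side are preserved in each sample, which is the property the summary attributes to back-translated document data.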

The authors evaluate their approach on four large-data language pairs (DE→EN, EN→DE, EN→FR, and EN→RU) and demonstrate significant improvements in document-level translation performance.
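
The difference between a contrastive metric and a generative variant can be sketched as follows. The summary does not spell out how the generative variants are computed, so this is one plausible reading rather than the paper's definition: the contrastive score asks whether a model ranks the correct translation above wrong alternatives, while the generative score asks whether the model's own output contains the expected word. The `Example` class and the `score` and `translate` callables are hypothetical stand-ins for a real test set and a real system.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    source: str             # source sentence together with its context
    correct: str            # reference translation
    contrastive: List[str]  # minimally different wrong translations
    expected_word: str      # e.g. the pronoun that context disambiguates


def contrastive_accuracy(
    examples: List[Example],
    score: Callable[[str, str], float],  # hypothetical: model score of (source, hypothesis)
) -> float:
    """Fraction of examples where the correct translation outscores every wrong one."""
    hits = sum(
        all(score(ex.source, ex.correct) > score(ex.source, bad) for bad in ex.contrastive)
        for ex in examples
    )
    return hits / len(examples)


def generative_accuracy(
    examples: List[Example],
    translate: Callable[[str], str],  # hypothetical: the system's actual output
) -> float:
    """Fraction of examples where the generated translation contains the expected word."""
    hits = 0
    for ex in examples:
        tokens = re.findall(r"\w+", translate(ex.source).lower())
        hits += ex.expected_word.lower() in tokens
    return hits / len(examples)


# Toy data and toy systems, invented purely for illustration.
examples = [
    Example(
        source="The cat sat. It slept.",
        correct="Die Katze saß. Sie schlief.",
        contrastive=["Die Katze saß. Er schlief.", "Die Katze saß. Es schlief."],
        expected_word="sie",
    )
]


def toy_score(source: str, hypothesis: str) -> float:
    # Ranks the right pronoun highest when shown the candidates.
    return 1.0 if "Sie" in hypothesis else 0.0


def toy_translate(source: str) -> str:
    # Yet produces the wrong pronoun when translating on its own.
    return "Die Katze saß. Es schlief."
```

In this toy setup the same system passes the ranking test yet fails when it has to produce the translation itself, which illustrates why a generative check may separate document systems that a contrastive check scores alike.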

Critical Analysis

The paper presents a well-designed and comprehensive approach to addressing the long-standing challenge of document-level machine translation. By focusing on the key impediments – architecture, training data, and evaluation metrics – the authors have tackled the problem from multiple angles.

One potential limitation of the study is that it focuses on large-data language pairs. It would be interesting to see how the proposed methods perform on lower-resource language pairs, where high-quality training data may be scarcer.

Additionally, the paper does not delve into the specific reasons why previous approaches to document-level machine translation have been unable to gain traction. A deeper exploration of the historical context and the shortcomings of prior work could have provided further insights and a more robust foundation for the current research.

That said, the authors' use of the standard Transformer architecture, coupled with their approaches to training data and evaluation, represents a significant contribution to the field. Their findings suggest that machine translation systems can move to the document level without specialized architectures, which matters given the competitive pressure from large language models that are natively document-based.

Conclusion

This paper proposes a path out of the sentence-level paradigm in machine translation by addressing three key challenges: architecture, training data, and evaluation metrics. The authors demonstrate that the standard Transformer architecture, when provided with sufficient capacity, can effectively handle document-level translation, and that taking document samples from back-translated data can address the training data issue. Additionally, the introduction of generative variants of existing contrastive metrics enables better evaluation of document-level translation systems.

The findings of this study have important implications for the future of machine translation, as they suggest a path forward for moving beyond the sentence-level paradigm that has dominated the field for decades. By combining a standard Transformer of sufficient capacity, document samples from back-translated data, and generative evaluation metrics, the authors have laid the groundwork for more robust and accurate document-level machine translation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Recovering document annotations for sentence-level bitext

Rachel Wicks, Matt Post, Philipp Koehn

Data availability limits the scope of any given task. In machine translation, historical models were incapable of handling longer contexts, so the lack of document-level datasets was less noticeable. Now, despite the emergence of long-sequence methods, we remain within a sentence-level paradigm and without data to adequately approach context-aware machine translation. Most large-scale datasets have been processed through a pipeline that discards document-level metadata. In this work, we reconstruct document-level information for three (ParaCrawl, News Commentary, and Europarl) large datasets in German, French, Spanish, Italian, Polish, and Portuguese (paired with English). We then introduce a document-level filtering technique as an alternative to traditional bitext filtering. We present this filtering with analysis to show that this method prefers context-consistent translations rather than those that may have been sentence-level machine translated. Last we train models on these longer contexts and demonstrate improvement in document-level translation without degradation of sentence-level translation. We release our dataset, ParaDocs, and resulting models as a resource to the community.

6/7/2024

cs.CL

Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning

Menglong Cui, Jiangcun Du, Shaolin Zhu, Deyi Xiong

Large language models (LLMs) exhibit outstanding performance in machine translation via in-context learning. In contrast to sentence-level translation, document-level translation (DOCMT) by LLMs based on in-context learning faces two major challenges: firstly, document translations generated by LLMs are often incoherent; secondly, the length of demonstration for in-context learning is usually limited. To address these issues, we propose a Context-Aware Prompting method (CAP), which enables LLMs to generate more accurate, cohesive, and coherent translations via in-context learning. CAP takes into account multi-level attention, selects the most relevant sentences to the current one as context, and then generates a summary from these collected sentences. Subsequently, sentences most similar to the summary are retrieved from the datastore as demonstrations, which effectively guide LLMs in generating cohesive and coherent translations. We conduct extensive experiments across various DOCMT tasks, and the results demonstrate the effectiveness of our approach, particularly in zero pronoun translation (ZPT) and literary translation tasks.

6/12/2024

cs.CL

Adapting Large Language Models for Document-Level Machine Translation

Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari

Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning. This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs. We first investigate the impact of prompt strategies on translation performance and then conduct extensive experiments using two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our results show that specialized models can sometimes surpass GPT-4 in translation performance but still face issues like off-target translation due to error propagation in decoding. We provide an in-depth analysis of these LLMs tailored for DocMT, examining translation errors, discourse phenomena, training strategies, the scaling law of parallel documents, recent test set evaluations, and zero-shot crosslingual transfer. Our findings highlight the strengths and limitations of LLM-based DocMT models and provide a foundation for future research.

6/11/2024

cs.CL

Reconsidering Sentence-Level Sign Language Translation

Garrett Tanzer, Maximus Shengelia, Ken Harrenstien, David Uthus

Historically, sign language machine translation has been posed as a sentence-level task: datasets consisting of continuous narratives are chopped up and presented to the model as isolated clips. In this work, we explore the limitations of this task framing. First, we survey a number of linguistic phenomena in sign languages that depend on discourse-level context. Then as a case study, we perform the first human baseline for sign language translation that actually substitutes a human into the machine learning task framing, rather than provide the human with the entire document as context. This human baseline -- for ASL to English translation on the How2Sign dataset -- shows that for 33% of sentences in our sample, our fluent Deaf signer annotators were only able to understand key parts of the clip in light of additional discourse-level context. These results underscore the importance of understanding and sanity checking examples when adapting machine learning to new domains.

6/18/2024

cs.CL