Automatic Alignment of Discourse Relations of Different Discourse Annotation Frameworks

Read original: arXiv:2403.20196 - Published 4/9/2024 by Yingxue Fu

Automatic Alignment of Discourse Relations of Different Discourse Annotation Frameworks

Introduction

The summary is as follows:

Discourse relations are important for achieving coherence in text. Previous studies have shown the benefits of incorporating discourse relations in tasks like sentiment analysis, text summarization, and machine comprehension. Automatic discourse relation classification is a key part of discourse parsing, which is done using frameworks like Rhetorical Structure Theory (RST) and the Lexicalized Tree-Adjoining Grammar for Discourse (D-LTAG). These frameworks have been used to create large discourse corpora like the RST Discourse Treebank and the Penn Discourse Treebank.

Creating discourse corpora requires significant linguistic knowledge and is costly. However, these discourse frameworks share similar understandings of discourse relations and their role in discourse building. An approach to enlarge discourse corpora is to align existing corpora built on different frameworks so they can be used jointly. Aligning discourse relations across different annotation frameworks remains a challenging task.

The text provides an example of RST-style discourse annotation, showing how textual spans are connected by asymmetric discourse relations, with some relations represented by undirected parallel lines. The full text is recursively annotated with these discourse relations.

Figure 1: RST-style annotation (wsj_0624 in RST-DT).

This paper discusses the challenges in aligning discourse relations between two different annotation frameworks, RST-DT and PDTB. The key points are:

RST-DT and PDTB have different assumptions about discourse units and higher-level structures. RST uses clauses as elementary discourse units (EDUs), while PDTB focuses on semantic properties of arguments.
RST enforces a tree structure, while PDTB only considers local relations without commitment to a higher-level structure.
RST-DT uses 78 relation types that can be divided into subject matter and presentational relations, while PDTB has a three-level sense hierarchy and allows multiple sense labels per argument pair.
These differences make it challenging to investigate the relationship between discourse relations in the two frameworks, even when using corpora annotated in multiple frameworks in parallel.
Previous approaches have relied on string matching to align PDTB arguments and RST EDUs, but this is limited by the strong nuclearity hypothesis.
The paper proposes a fully automatic method using label embedding techniques to explore correlations between relations in RST-DT and PDTB without the need for argument matching.

Related Work

The provided text discusses research on mapping discourse relations across different frameworks. It categorizes existing research into three types:

a) Identifying commonly used relations across frameworks by analyzing definitions and examples. b) Introducing fundamental concepts for analyzing relations across frameworks. c) Directly mapping discourse relations based on corpora annotated in multiple frameworks.

The text then focuses on the third approach, summarizing studies in this direction. Key findings include:

Differences in annotation operationalization and relation granularity lead to many-to-many mappings between frameworks.
Segmentation differences present challenges, but using the strong nuclearity hypothesis can help address this.
Explicit discourse connectors in one framework can often be matched to relations in another framework.
Attempts have been made to induce implicit relations in one framework from annotations in another, but challenges remain due to segmentation differences.

The text then discusses label embeddings, which have been shown to be useful in computer vision and NLP tasks. Key points include:

Label embeddings can address issues with one-hot encoding, such as lack of robustness and failure to capture label semantics.
Label embeddings can be obtained from external sources or learned during model training, incorporating additional information such as label hierarchy or textual descriptions.
Label embeddings have been shown to be effective in data-imbalanced settings and zero-shot learning.

Method

This paper presents a method for learning label embeddings that can be used to correlate two discourse annotation frameworks D1 and D2. The key steps are:

Represent the input sequences X using a pre-trained language model like BERT. The [CLS] token representation is used as the sequence representation.
Explore different options for encoding the labels, including using BERT, RoBERTa, random initialization, and incorporating label descriptions. The label embeddings are learnable.
Apply an instance-centered contrastive loss and a label-centered contrastive loss to learn the label embeddings. These losses help mitigate issues with small batch sizes.
Add a label-embedding-based cross-entropy loss and a canonical multi-class cross-entropy loss to further improve the label embeddings and the input representations.

To evaluate the quality of the learnt label embeddings, the paper proposes computing a correlation matrix M between the label embeddings and class representation proxies. The average of the main diagonal values in M is used as an overall measure of embedding quality.

The paper also describes baseline approaches for relation classification and label embedding learning to compare against the proposed method.

$Figure 2: Illustration of the correlation matrix M. 𝐄1⁢…⁢ksubscript𝐄1…𝑘\mathbf{E}{1...k}bold_E start_POSTSUBSCRIPT 1 … italic_k end_POSTSUBSCRIPT represents the k𝑘kitalic_k learnt label embeddings and 𝐇1⁢…⁢ksubscript𝐇1…𝑘\mathbf{H}{1...k}bold_H start_POSTSUBSCRIPT 1 … italic_k end_POSTSUBSCRIPT denotes the k𝑘kitalic_k class representation proxies. After normalization, the average of the values at the diagonal (colored) is the overall measure of the quality of the learnt label embeddings.$

Figure 2: Illustration of the correlation matrix M. 𝐄1⁢…⁢ksubscript𝐄1…𝑘\mathbf{E}

{1...k}bold_E start_POSTSUBSCRIPT 1 … italic_k end_POSTSUBSCRIPT represents the k𝑘kitalic_k learnt label embeddings and 𝐇1⁢…⁢ksubscript𝐇1…𝑘\mathbf{H}

{1...k}bold_H start_POSTSUBSCRIPT 1 … italic_k end_POSTSUBSCRIPT denotes the k𝑘kitalic_k class representation proxies. After normalization, the average of the values at the diagonal (colored) is the overall measure of the quality of the learnt label embeddings.

Experiments

Here is a summary of the provided text:

The researchers focus on 16 relations for RST and PDTB L2 senses with more than 100 instances to learn label embeddings. They preprocess the data by binarizing the RST trees and mapping the 78 relations to 16 classes. For PDTB, they use sections 2-20 for training, 0-1 for validation, and 21-22 for testing.

The researchers run each model 3 times with different random seeds and report the mean and standard deviation. They use the AdamW optimizer, set the learning rate to 1e-5, and adopt early stopping. The temperature for the contrastive losses is set to 0.1.

The results show the performance on PDTB is better than RST. Adding label embeddings generally lowers F1 compared to using cross-entropy loss only, likely due to data sparsity. The random label encoder performs best on classification accuracy but has the lowest quality label embeddings.

To improve RST, the researchers use back translation data augmentation, which increases F1 and the quality of the label embeddings. Separate experiments on PDTB explicit and implicit relations show the label embeddings are more representative for explicit relations.

An ablation study finds the label-centered contrastive loss is most important, followed by the instance-centered contrastive loss and the canonical cross-entropy loss. This differs from prior work.

RST-PDTB Relation Mapping

The provided section of the paper presents the results of mapping the relations between Rhetorical Structure Theory (RST) and Penn Discourse Treebank (PDTB). The key points are:

Table 5 shows the results of mapping 11 RST relations and 12 PDTB explicit relations. The two relations with the highest cosine similarity scores (greater than 0.10) are presented.
From the RST perspective, a PDTB relation can be identified as having a much higher cosine similarity score (≥ 0.40) than the others for most RST relations.
From the PDTB perspective, the mapping results are not symmetric due to differences in relation distributions.
For the extrinsic evaluation, the authors relabeled PDTB explicit relations with RST labels based on the cosine similarity results and the relation distribution. They compared their results with those provided by Costa et al. (2023).
Using an ensemble model, the performance with the authors' method is slightly higher than the results reported in Costa et al. (2023), with an accuracy of 63.13% and F1 of 47.95%.

Conclusions

This paper describes a method for automatically aligning discourse relations from different frameworks. The approach uses label embeddings that are learned alongside input representations during a classification task. This helps address the challenge of segmentation differences, which has been a significant issue in prior studies. The results are evaluated both intrinsically and extrinsically. The authors note that the method's performance is affected by the amount of data available, and they had to exclude some relations due to insufficient training data to learn reliable label embeddings. The paper suggests comparing the method to a theoretical proposal like ISO 24617-8 as future work. Additionally, the authors state the method may extend beyond just discourse relation labeling to aligning any label sets, opening up potential applications in various scenarios.

Acknowledgments

The authors express gratitude to the anonymous reviewers for their helpful feedback and suggestions. They also thank Mark-Jan Nederhof for productive discussions and Craig Myles for recommending the use of the diagonal entries of the normalized correlation matrix as a metric.

Bibliographical References

There is no text provided to summarize.

Language Resource References

The text summarizes two discourse treebank resources: the RST Discourse Treebank and the Penn Discourse Treebank Version 3.0. The RST Discourse Treebank was distributed by the Linguistic Data Consortium (LDC) in 2002 and contains annotated text resources. The Penn Discourse Treebank Version 3.0 was distributed by the LDC in 2019 and is a newer version of the discourse treebank resource.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Automatic Alignment of Discourse Relations of Different Discourse Annotation Frameworks

Yingxue Fu

Existing discourse corpora are annotated based on different frameworks, which show significant dissimilarities in definitions of arguments and relations and structural constraints. Despite surface differences, these frameworks share basic understandings of discourse relations. The relationship between these frameworks has been an open research question, especially the correlation between relation inventories utilized in different frameworks. Better understanding of this question is helpful for integrating discourse theories and enabling interoperability of discourse corpora annotated under different frameworks. However, studies that explore correlations between discourse relation inventories are hindered by different criteria of discourse segmentation, and expert knowledge and manual examination are typically needed. Some semi-automatic methods have been proposed, but they rely on corpora annotated in multiple frameworks in parallel. In this paper, we introduce a fully automatic approach to address the challenges. Specifically, we extend the label-anchored contrastive learning method introduced by Zhang et al. (2022b) to learn label embeddings during a classification task. These embeddings are then utilized to map discourse relations from different frameworks. We show experimental results on RST-DT (Carlson et al., 2001) and PDTB 3.0 (Prasad et al., 2018).

4/9/2024

A Novel Dependency Framework for Enhancing Discourse Data Analysis

Kun Sun, Rong Wang

The development of different theories of discourse structure has led to the establishment of discourse corpora based on these theories. However, the existence of discourse corpora established on different theoretical bases creates challenges when it comes to exploring them in a consistent and cohesive way. This study has as its primary focus the conversion of PDTB annotations into dependency structures. It employs refined BERT-based discourse parsers to test the validity of the dependency data derived from the PDTB-style corpora in English, Chinese, and several other languages. By converting both PDTB and RST annotations for the same texts into dependencies, this study also applies ``dependency distance'' metrics to examine the correlation between RST dependencies and PDTB dependencies in English. The results show that the PDTB dependency data is valid and that there is a strong correlation between the two types of dependency distance. This study presents a comprehensive approach for analyzing and evaluating discourse corpora by employing discourse dependencies to achieve unified analysis. By applying dependency representations, we can extract data from PDTB, RST, and SDRT corpora in a coherent and unified manner. Moreover, the cross-linguistic validation establishes the framework's generalizability beyond English. The establishment of this comprehensive dependency framework overcomes limitations of existing discourse corpora, supporting a diverse range of algorithms and facilitating further studies in computational discourse analysis and language sciences.

7/18/2024

Multi-Label Classification for Implicit Discourse Relation Recognition

Wanqiu Long, N. Siddharth, Bonnie Webber

Discourse relations play a pivotal role in establishing coherence within textual content, uniting sentences and clauses into a cohesive narrative. The Penn Discourse Treebank (PDTB) stands as one of the most extensively utilized datasets in this domain. In PDTB-3, the annotators can assign multiple labels to an example, when they believe that multiple relations are present. Prior research in discourse relation recognition has treated these instances as separate examples during training, and only one example needs to have its label predicted correctly for the instance to be judged as correct. However, this approach is inadequate, as it fails to account for the interdependence of labels in real-world contexts and to distinguish between cases where only one sense relation holds and cases where multiple relations hold simultaneously. In our work, we address this challenge by exploring various multi-label classification frameworks to handle implicit discourse relation recognition. We show that multi-label classification methods don't depress performance for single-label prediction. Additionally, we give comprehensive analysis of results and data. Our work contributes to advancing the understanding and application of discourse relations and provide a foundation for the future study

6/10/2024

eRST: A Signaled Graph Theory of Discourse Relations and Organization

Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler

In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, non-projective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB) and Discourse Dependencies, and address these using constructs in the proposed theory. We provide annotation, search and visualization tools for data, and present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics and applications for data in our framework.

8/29/2024