RoTHP: Rotary Position Embedding-based Transformer Hawkes Process

Read original: arXiv:2405.06985 - Published 5/14/2024 by Anningzhe Gao, Shan Dai

Overview

Temporal Point Processes (TPPs) are a common way to model event sequences like financial transactions or user behavior on social media
The Hawkes Process, a type of TPP in which past events influence the likelihood of future ones, is widely used for this task
Neural Temporal Point Processes, including the Transformer Hawkes Process (THP), have shown improved performance over traditional methods
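However, THPs still struggle with the "sequence prediction" setting: training on past events and then predicting future ones
THP and its variants also rely on the sinusoidal embedding from the original Transformer, which makes their performance sensitive to timestamp shifts or noise in the data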

Plain English Explanation

Temporal Point Processes (TPPs) are a way to model sequences of events that happen over time, like the transactions in a financial market or the posts made by users on a social network. One popular type of TPP is called the Hawkes Process, which is good at capturing how new events can be influenced by past events.
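
To make the idea of "past events influencing new events" concrete, here is a minimal sketch of the conditional intensity of a classical univariate Hawkes process. It assumes the common textbook exponential kernel; the function name and the values of mu, alpha and beta are illustrative choices, not taken from the paper.

```python
import numpy as np

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t.

    lambda(t) = mu + alpha * sum over past events t_i < t of exp(-beta * (t - t_i))

    mu is the baseline rate, alpha the jump caused by each event and beta
    the decay rate. All three are illustrative values (assumptions).
    """
    history = np.asarray(history, dtype=float)
    past = history[history < t]  # only earlier events can excite the process
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = [1.0, 1.5, 4.0]
before = hawkes_intensity(0.5, events)      # no events yet: baseline only
just_after = hawkes_intensity(1.6, events)  # shortly after two events
later = hawkes_intensity(3.9, events)       # excitation has mostly decayed
```

In this sketch the intensity jumps right after events and decays back toward the baseline, which is the self-exciting behavior described above.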

As neural networks have become more powerful, researchers have developed "neural" versions of Hawkes Processes that can learn complex patterns in event data. One of these is called the Transformer Hawkes Process (THP), which uses a special type of neural network called a Transformer to make predictions.

While THPs have shown promise, they still have some limitations. One is the "sequence prediction" problem - they are trained on past event sequences but then need to make predictions about future events, which can be challenging. Another issue is that the way THPs represent the timing of events can make them sensitive to changes or noise in the timestamps.
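
The timestamp sensitivity can be illustrated with a small sketch of an absolute sinusoidal time encoding of the kind used in Transformers. The exact encoding in THP may differ; the dimension, base and shift below are assumptions chosen for illustration.

```python
import numpy as np

def sinusoidal_time_encoding(t, dim=8, base=10000.0):
    """Absolute sinusoidal encoding of timestamps t (shape [n]) -> [n, dim]."""
    t = np.asarray(t, dtype=float)[:, None]
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # dim/2 frequencies
    angles = t * freqs[None, :]
    enc = np.empty((t.shape[0], dim))
    enc[:, 0::2] = np.sin(angles)  # even feature indices
    enc[:, 1::2] = np.cos(angles)  # odd feature indices
    return enc

timestamps = np.array([0.3, 1.1, 2.0, 3.7])
shift = 5.0  # hypothetical offset, e.g. a later observation window

original = sinusoidal_time_encoding(timestamps)
shifted = sinusoidal_time_encoding(timestamps + shift)

# The gaps between events are identical, yet the encoding vectors
# that would be fed to the model are different.
max_change = np.abs(original - shifted).max()
```

Both sequences have the same inter-event gaps, but the encoded inputs differ, so a model trained on one time window sees unfamiliar encodings in another.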

Technical Explanation

To address these problems, the researchers propose a new architecture called the Rotary Position Embedding-based THP (RoTHP). Where conventional THP and its variants adopt the sinusoidal embedding of the original Transformer, RoTHP uses rotary position embeddings. The key innovation is that these act as "relative time embeddings": a way of encoding the timing of events that is invariant to translations in time. This, the researchers show, gives RoTHP better "sequence prediction flexibility" compared to previous THPs.
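
The following is a minimal sketch of how a rotary embedding driven by event timestamps could work. It assumes the rotation angle is the timestamp times a fixed set of frequencies, as in standard rotary position embeddings; the function names, dimensions and base are hypothetical, and the paper's actual formulation may differ.

```python
import numpy as np

def rotate_by_time(x, t, base=10000.0):
    """Rotate each consecutive feature pair of x (shape [n, dim], dim even)
    by an angle proportional to the event time t (shape [n])."""
    n, dim = x.shape
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    angles = np.asarray(t, dtype=float)[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def attention_scores(q, k, t):
    """Unnormalized attention scores between time-rotated queries and keys."""
    return rotate_by_time(q, t) @ rotate_by_time(k, t).T

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
timestamps = np.array([0.3, 1.1, 2.0, 3.7])

scores = attention_scores(q, k, timestamps)
scores_shifted = attention_scores(q, k, timestamps + 5.0)  # shift every event

# Scores depend only on time differences, so a global shift changes nothing.
invariant = np.allclose(scores, scores_shifted)
```

In this sketch, shifting every timestamp by the same constant leaves the attention scores unchanged, which is the translation invariance described above.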

Theoretically, the relative time embeddings used in RoTHP provide translation invariance: shifting every timestamp in a sequence by the same constant does not change the relative times the model sees, so its performance is less sensitive to timestamp shifts. The researchers also demonstrate empirically that RoTHP generalizes better than THP on sequence prediction tasks and on data with timestamp translations.
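
The invariance argument can be written out for a single two-dimensional feature pair. This is a sketch under the assumption that queries and keys are rotated by an angle proportional to the event time, as in standard rotary embeddings; the symbols q, k, \omega and c are introduced here for illustration and are not the paper's notation.

```latex
\[
R(\theta) =
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix},
\qquad
R(\theta)^{\top} R(\phi) = R(\phi - \theta).
\]
\[
\langle R(\omega t_i)\, q,\; R(\omega t_j)\, k \rangle
= q^{\top} R(\omega t_i)^{\top} R(\omega t_j)\, k
= q^{\top} R\big(\omega (t_j - t_i)\big)\, k .
\]
\[
t_i \mapsto t_i + c,\quad t_j \mapsto t_j + c
\;\Longrightarrow\;
(t_j + c) - (t_i + c) = t_j - t_i .
\]
```

The score between two events depends on their timestamps only through the difference t_j - t_i, so adding the same constant c to every timestamp leaves it unchanged.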

Critical Analysis

The paper provides a solid technical contribution by introducing the RoTHP architecture and demonstrating its advantages over previous Transformer-based Hawkes Process models. The use of relative time embeddings is an interesting and promising approach to address the sequence prediction and timestamp sensitivity issues.

That said, the paper does not delve deeply into the potential limitations or caveats of the RoTHP model. For example, it would be helpful to understand how the model scales to very long event sequences, or how it might perform on datasets with different characteristics than the ones tested.

Additionally, while the theoretical analysis of the translation invariance property is compelling, the researchers could further investigate the practical implications and tradeoffs of this design choice. It's possible that the relative time encoding comes at the expense of losing some temporal information that could be valuable in certain applications.

Overall, this is a well-executed piece of research that makes a worthwhile contribution to the field of neural temporal point processes.

Conclusion

The proposed Rotary Position Embedding-based Transformer Hawkes Process (RoTHP) offers an innovative solution to address limitations of previous neural Hawkes Process models. By using relative time embeddings, RoTHP demonstrates improved performance on sequence prediction tasks and better resilience to temporal changes in the data.

This research advances the state-of-the-art in neural temporal point processes and could have valuable applications in domains like finance, social media, and beyond, where modeling asynchronous event sequences is crucial. As the field continues to evolve, it will be important to further explore the practical tradeoffs and limitations of approaches like RoTHP to ensure they can be reliably deployed in real-world settings.
