Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting

2404.15772

Published 6/28/2024 by Aobo Liang, Xingguo Jiang, Yan Sun, Xiaohou Shi, Ke Li


Abstract

Long-term time series forecasting (LTSF) provides longer-range insight into future trends and patterns. Over the past few years, deep learning models, especially Transformers, have achieved advanced performance in LTSF tasks. However, LTSF faces inherent challenges such as capturing long-term dependencies and sparse semantic characteristics. Recently, a new state space model (SSM) named Mamba was proposed. With its selective handling of input data and its hardware-aware parallel computing algorithm, Mamba has shown great potential in balancing prediction performance and computational efficiency compared to Transformers. To enhance Mamba's ability to preserve historical information over a longer range, we design a novel Mamba+ block by adding a forget gate inside Mamba to selectively combine the new features with the historical features in a complementary manner. Furthermore, we apply Mamba+ both forward and backward and propose Bi-Mamba+, aiming to promote the model's ability to capture interactions among time series elements. Additionally, multivariate time series data in different scenarios may exhibit varying emphasis on intra- or inter-series dependencies. Therefore, we propose a series-relation-aware decider that controls whether a channel-independent or channel-mixing tokenization strategy is used for a given dataset. Extensive experiments on 8 real-world datasets show that our model achieves more accurate predictions compared with state-of-the-art methods.
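The series-relation-aware decider and the two tokenization strategies can be illustrated with a small NumPy sketch. The decision rule here is an assumption made for illustration: it measures pairwise Spearman rank correlation between variables and picks channel-mixing when a large share of pairs is strongly correlated. The thresholds (`corr_threshold`, `ratio_threshold`), the patch length, and every function name are hypothetical, not taken from the paper.

```python
import numpy as np


def spearman_matrix(x: np.ndarray) -> np.ndarray:
    """Spearman rank correlation between the columns of x (shape: time x variables).

    Ranks come from a double argsort, which assumes the data contain no ties.
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def choose_tokenization(x, corr_threshold=0.6, ratio_threshold=0.5):
    """Hypothetical decider: channel-mixing if most variable pairs are strongly correlated."""
    corr = spearman_matrix(x)
    upper = np.triu_indices(corr.shape[0], k=1)  # each unordered pair once
    strong_share = np.mean(np.abs(corr[upper]) >= corr_threshold)
    return "channel-mixing" if strong_share > ratio_threshold else "channel-independent"


def tokenize(x, patch_len, strategy):
    """Split a (time x variables) series into non-overlapping patch tokens."""
    n_time, n_vars = x.shape
    n_patches = n_time // patch_len
    # (patches, patch_len, vars); trailing steps that do not fill a patch are dropped
    patches = x[: n_patches * patch_len].reshape(n_patches, patch_len, n_vars)
    if strategy == "channel-independent":
        # one token sequence per variable: (vars, patches, patch_len)
        return patches.transpose(2, 0, 1)
    # channel-mixing: one token per patch holding every variable: (patches, vars * patch_len)
    return patches.transpose(0, 2, 1).reshape(n_patches, n_vars * patch_len)


rng = np.random.default_rng(0)
base = rng.standard_normal(96).cumsum()
correlated = np.stack([base + 0.1 * rng.standard_normal(96) for _ in range(4)], axis=1)
independent = rng.standard_normal((96, 4))

for name, series in [("correlated", correlated), ("independent", independent)]:
    strategy = choose_tokenization(series)
    print(name, strategy, tokenize(series, patch_len=16, strategy=strategy).shape)
```

Channel-independent tokens keep each variable's history in its own sequence, while channel-mixing tokens place all variables from the same time window into one token, so later layers can mix information across series.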


Overview

This paper introduces Bi-Mamba4TS (named Bi-Mamba+ in the title and abstract), a bidirectional time series forecasting model built on Mamba, a selective state space model (SSM) rather than a Transformer.
Bi-Mamba4TS runs its Mamba+ blocks, which add a forget gate inside Mamba to better preserve historical information, in both the forward and backward directions to capture interactions among time series elements.
The authors report more accurate predictions than state-of-the-art forecasting models on 8 real-world datasets.

Plain English Explanation

In this paper, the researchers have developed a new deep learning model called Bi-Mamba4TS for forecasting time series data. Time series data refers to a sequence of values collected over time, such as stock prices, weather measurements, or sales figures.

Forecasting time series data is an important problem in many industries, because it lets organizations plan and make decisions based on predicted future values. The researchers' model builds on Mamba, a state space model that has shown promise in balancing prediction accuracy and computational efficiency compared with Transformers.

The key innovation in Bi-Mamba4TS is that it processes the time series in both a forward and a backward direction, which helps the model capture interactions among time steps from both sides. Combined with a forget gate that helps retain information over longer ranges, this leads to more accurate forecasts: the researchers report that Bi-Mamba4TS outperforms other state-of-the-art forecasting models on 8 real-world datasets.

Technical Explanation

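The Bi-Mamba4TS model is built on Mamba, a selective state space model whose input-dependent parameters and hardware-aware parallel algorithm aim to balance forecasting performance and computational efficiency. Its building block, Mamba+, adds a forget gate inside Mamba that selectively combines new features with historical features in a complementary manner, so that historical information is preserved over a longer range. Bi-Mamba4TS then adds bidirectional processing, an idea also used in the Dual-Path Mamba model.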
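The abstract describes the Mamba+ forget gate as selectively combining new and historical features "in a complementary manner". A minimal NumPy sketch of one reading of that phrase follows, assuming complementary means the two weights are g and 1 - g with g = sigmoid(gate logits). The gate projection `W_gate`, the stand-in feature arrays, and the function names are hypothetical and do not reproduce the paper's exact block.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def forget_gate_combine(new_feat, hist_feat, gate_logits):
    """Blend new and historical features with complementary weights g and 1 - g.

    All inputs share one shape, e.g. (sequence_length, d_model). Because the
    weights sum to 1, each output element lies between the two inputs.
    """
    g = sigmoid(gate_logits)  # weight on the new features, in (0, 1)
    return g * new_feat + (1.0 - g) * hist_feat


rng = np.random.default_rng(0)
seq_len, d_model = 8, 4
x = rng.standard_normal((seq_len, d_model))
W_gate = 0.1 * rng.standard_normal((d_model, d_model))  # hypothetical gate projection
new_feat = np.tanh(x)   # stand-in for the SSM output ("new" features)
hist_feat = x           # stand-in for the historical features carried alongside it
out = forget_gate_combine(new_feat, hist_feat, x @ W_gate)
print(out.shape)
```

A strongly positive gate logit passes the new features through almost unchanged, while a strongly negative one keeps the historical features, so the gate learns per element how much history to retain.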

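In Bi-Mamba4TS, the input time series is processed in both a forward and a backward direction by separate Mamba+ blocks, and their outputs are combined to produce the final forecast. Processing the sequence in both directions is intended to help the model capture interactions among time series elements. For multivariate data, a series-relation-aware decider chooses, per dataset, between channel-independent tokenization (each variable handled separately) and channel-mixing tokenization (variables handled jointly), since datasets differ in how strongly they depend on intra-series versus inter-series relationships.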
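The forward/backward pattern can be sketched in a few lines of NumPy. In this sketch a toy exponential moving average (`causal_ema`, hypothetical) stands in for a Mamba+ block, since only the data flow is being shown, and the two directions are merged by summation; that fusion rule is an assumption, not the paper's confirmed design.

```python
import numpy as np


def causal_ema(x, decay=0.9):
    """Toy causal mixer standing in for a Mamba+ block.

    y[t] = decay * y[t-1] + (1 - decay) * x[t], so y[t] depends only on x[0..t].
    x has shape (sequence_length, features).
    """
    y = np.zeros_like(x, dtype=float)
    state = np.zeros(x.shape[1:], dtype=float)
    for t in range(x.shape[0]):
        state = decay * state + (1.0 - decay) * x[t]
        y[t] = state
    return y


def bidirectional(x, forward_block, backward_block):
    """Run one block left-to-right and another right-to-left, then sum the outputs."""
    fwd = forward_block(x)
    bwd = backward_block(x[::-1])[::-1]  # reverse time, process, restore original order
    return fwd + bwd


# An impulse at the final time step: the forward pass cannot see it at t = 0,
# but the backward pass carries it back to the start of the sequence.
x = np.zeros((5, 1))
x[-1] = 1.0
print(causal_ema(x)[:, 0])
print(bidirectional(x, causal_ema, causal_ema)[:, 0])
```

Passing two separate functions mirrors the use of separate blocks for each direction; in a trained model they would have independent parameters.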

The authors evaluate Bi-Mamba4TS on 8 real-world datasets and compare it with state-of-the-art forecasting methods. According to the abstract, Bi-Mamba4TS achieves more accurate predictions than these competing models.

Critical Analysis

The paper provides a thorough evaluation of the Bi-Mamba4TS model and compares its performance against strong baselines. However, there are a few potential limitations and areas for further research:

The paper does not delve deeply into the computational complexity and training requirements of the Bi-Mamba4TS model. Because the model runs Mamba+ blocks in both the forward and backward directions, it may have a higher computational burden than some simpler models.
The authors do not explore the interpretability of the Bi-Mamba4TS model, which is an important consideration for many real-world applications. Understanding the model's internal mechanisms and the factors driving its predictions could be valuable for gaining trust and insights.
The paper does address multivariate data: its series-relation-aware decider selects channel-independent or channel-mixing tokenization for each dataset. It would still be useful to see how sensitive the results are to this choice, and how reliably the decider transfers to datasets beyond the 8 studied.
The authors could have explored the model's performance in the presence of missing data or other common challenges in time series forecasting, which would provide a more comprehensive understanding of its robustness.

Conclusion

The Bi-Mamba4TS model proposed in this paper is a promising advance in time series forecasting. By combining a forget-gated Mamba+ block with bidirectional processing, the model aims to retain long-range history and capture interactions among time series elements, which the authors report leads to improved forecasting accuracy. The positive results on benchmark datasets suggest that Bi-Mamba4TS could be a valuable tool for organizations across various industries, from finance to supply chain management, to make more accurate predictions and better-informed decisions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting

Xiuding Cai, Yaoyao Zhu, Xueyao Wang, Yu Yao

In recent years, Transformers have become the de facto architecture for long-term sequence forecasting (LTSF), but they face challenges such as quadratic complexity and permutation-invariant bias. A recent model, Mamba, based on selective state space models (SSMs), has emerged as a competitive alternative to Transformers, offering comparable performance with higher throughput and linear complexity in sequence length. In this study, we analyze the limitations of current Mamba in LTSF and propose four targeted improvements, leading to MambaTS. We first introduce variable scan along time to arrange the historical information of all the variables together. We suggest that causal convolution in Mamba is not necessary for LTSF and propose the Temporal Mamba Block (TMB). We further incorporate a dropout mechanism for selective parameters of TMB to mitigate model overfitting. Moreover, we tackle the issue of variable scan order sensitivity by introducing variable permutation training. We further propose variable-aware scan along time to dynamically discover variable relationships during training and decode the optimal variable scan order by solving the shortest path visiting all nodes problem during inference. Extensive experiments conducted on eight public datasets demonstrate that MambaTS achieves new state-of-the-art performance.

5/28/2024

cs.LG cs.AI


Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting

Xiongxiao Xu, Yueqing Liang, Baixiang Huang, Zhiling Lan, Kai Shu

Time series forecasting is an important problem and plays a key role in a variety of applications including weather forecasting, stock market, and scientific simulations. Although transformers have proven to be effective in capturing dependency, the quadratic complexity of their attention mechanism prevents their further adoption in long-range time series forecasting, thus limiting them to short-range dependencies. Recent progress on state space models (SSMs) has shown impressive performance on modeling long-range dependency due to their subquadratic complexity. Mamba, as a representative SSM, enjoys linear time complexity and has achieved strong scalability on tasks that require scaling to long sequences, such as language, audio, and genomics. In this paper, we propose to leverage a hybrid framework Mambaformer that internally combines Mamba for long-range dependency, and Transformer for short-range dependency, for long-short range forecasting. To the best of our knowledge, this is the first paper to combine Mamba and Transformer architecture in time series data. We investigate possible hybrid architectures to combine Mamba layers and attention layers for long-short range time series forecasting. The comparative study shows that the Mambaformer family can outperform Mamba and Transformer on the long-short range time series forecasting problem. The code is available at https://github.com/XiongxiaoXu/Mambaformerin-Time-Series.

4/24/2024

cs.LG cs.AI

Is Mamba Effective for Time Series Forecasting?

Zihan Wang, Fanheng Kong, Shi Feng, Ming Wang, Xiaocui Yang, Han Zhao, Daling Wang, Yifei Zhang

In the realm of time series forecasting (TSF), it is imperative for models to adeptly discern and distill hidden patterns within historical time series data to forecast future states. Transformer-based models exhibit formidable efficacy in TSF, primarily attributed to their advantage in apprehending these patterns. However, the quadratic complexity of the Transformer leads to low computational efficiency and high costs, which somewhat hinders the deployment of the TSF model in real-world scenarios. Recently, Mamba, a selective state space model, has gained traction due to its ability to process dependencies in sequences while maintaining near-linear complexity. For TSF tasks, these characteristics enable Mamba to comprehend hidden patterns as well as the Transformer does while reducing computational overhead relative to it. Therefore, we propose a Mamba-based model named Simple-Mamba (S-Mamba) for TSF. Specifically, we tokenize the time points of each variate autonomously via a linear layer. A bidirectional Mamba layer is utilized to extract inter-variate correlations and a Feed-Forward Network is set to learn temporal dependencies. Finally, forecast outcomes are generated through a linear mapping layer. Experiments on thirteen public datasets prove that S-Mamba maintains low computational overhead and achieves leading performance. Furthermore, we conduct extensive experiments to explore Mamba's potential in TSF tasks. Our code is available at https://github.com/wzhwzhwzh0921/S-D-Mamba.

4/30/2024

cs.LG


Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu, Tri Dao

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
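The selection mechanism described in this abstract, making the SSM parameters functions of the input, can be illustrated with a sequential reference scan in NumPy. This sketch follows the commonly used simplified discretization (A discretized by zero-order hold, the input term approximated as delta * B * x) for a single sequence. The projection matrices `W_delta`, `W_B`, and `W_C` are hypothetical, the skip term and gating of the full Mamba block are omitted, and the hardware-aware parallel algorithm is not reproduced.

```python
import numpy as np


def selective_scan(x, A, W_delta, W_B, W_C):
    """Sequential reference scan of a selective SSM over one sequence.

    x: (L, D) inputs. A: (D, N) diagonal state-matrix entries (negative for stability).
    W_delta: (D, D), W_B: (D, N), W_C: (D, N) are hypothetical projections that make
    the step size delta and the matrices B and C depend on the current input.
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))  # one N-dimensional state per input channel
    y = np.zeros((L, D))
    for t in range(L):
        delta = np.logaddexp(0.0, x[t] @ W_delta)  # softplus -> positive step size, (D,)
        B = x[t] @ W_B                              # input-dependent B, (N,)
        C = x[t] @ W_C                              # input-dependent C, (N,)
        A_bar = np.exp(delta[:, None] * A)          # zero-order-hold discretization of A
        Bx = delta[:, None] * B[None, :] * x[t][:, None]  # simplified input term delta * B * x
        h = A_bar * h + Bx                          # selective state update
        y[t] = h @ C                                # read out, (D,)
    return y


rng = np.random.default_rng(0)
L, D, N = 6, 3, 4
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))
W_delta, W_B, W_C = (0.5 * rng.standard_normal((D, k)) for k in (D, N, N))
y = selective_scan(x, A, W_delta, W_B, W_C)
print(y.shape)
```

Because delta, B, and C are recomputed from each x[t], the model decides per step whether to keep its state (small delta keeps A_bar near 1 and shrinks the input term) or reset toward the current input (large delta drives A_bar toward 0). With input-independent parameters the same recurrence would be a linear time-invariant SSM.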

6/3/2024

cs.LG cs.AI