Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series

2401.03955

Published 4/10/2024 by Vijay Ekambaram, Arindam Jati, Nam H. Nguyen, Pankaj Dayama, Chandra Reddy, Wesley M. Gifford, Jayant Kalagnanam

cs.LG cs.AI


Abstract

Large pre-trained models for zero/few-shot learning excel in language and vision domains but encounter challenges in multivariate time series (TS) due to the diverse nature and scarcity of publicly available pre-training data. Consequently, there has been a recent surge in utilizing pre-trained large language models (LLMs) with token adaptations for TS forecasting. These approaches employ cross-domain transfer learning and surprisingly yield impressive results. However, these models are typically very slow and large (~billion parameters) and do not consider cross-channel correlations. To address this, we present Tiny Time Mixers (TTM), a significantly small model based on the lightweight TSMixer architecture. TTM marks the first success in developing fast and tiny general pre-trained models (<1M parameters), exclusively trained on public TS datasets, with effective transfer learning capabilities for forecasting. To tackle the complexity of pre-training on multiple datasets with varied temporal resolutions, we introduce several novel enhancements such as adaptive patching, dataset augmentation via downsampling, and resolution prefix tuning. Moreover, we employ a multi-level modeling strategy to effectively model channel correlations and infuse exogenous signals during fine-tuning, a crucial capability lacking in existing benchmarks. TTM shows significant accuracy gains (12-38%) over popular benchmarks in few/zero-shot forecasting. It also drastically reduces the compute needs as compared to LLM-TS methods, with a 14X cut in learnable parameters, 106X less total parameters, and substantial reductions in fine-tuning (65X) and inference time (54X). In fact, TTM's zero-shot often surpasses the few-shot results in many popular benchmarks, highlighting the efficacy of our approach. Models and source code are available at https://huggingface.co/ibm/TTM


Overview

Large pre-trained models excel in language and vision tasks but face challenges in multivariate time series (TS) forecasting due to limited public data for pre-training.
Recent approaches use pre-trained large language models (LLMs) with token adaptations for TS forecasting, yielding impressive results through cross-domain transfer learning.
However, these LLM-based models are slow, large, and do not consider cross-channel correlations.
To address these issues, the paper introduces "Tiny Time Mixers" (TTM), a small (< 1M parameters) general pre-trained model exclusively trained on public TS datasets, with effective transfer learning capabilities.

Plain English Explanation

Large AI models trained on huge datasets have become very good at tasks like understanding language and identifying objects in images. However, these models struggle when it comes to forecasting time series data, such as predicting future stock prices or weather patterns. This is largely because time series data is very diverse, and there are few publicly available time series datasets on which such models can be pre-trained.

To overcome this challenge, researchers have started using pre-trained language models and adapting them for time series forecasting. These adapted models can "transfer" the knowledge they've gained from language tasks to time series problems, and they've been shown to perform quite well. But the downside is that these language models are very large, slow, and don't consider the relationships between different variables in the time series data.

The paper introduces a new model called "Tiny Time Mixers" (TTM) that addresses these issues. TTM is a much smaller and faster model (less than 1 million parameters) that is trained specifically on time series datasets. By using several novel techniques, the researchers were able to create a general-purpose time series model that can be quickly fine-tuned to different forecasting problems and outperforms the larger, slower language-based models. In fact, TTM's "zero-shot" performance, where it makes predictions without any fine-tuning, is often better than the "few-shot" performance of the larger models, highlighting the effectiveness of the researchers' approach.

Technical Explanation

The paper presents "Tiny Time Mixers" (TTM), a small (< 1M parameters) general pre-trained model for multivariate time series (TS) forecasting. This addresses the challenges faced by large pre-trained models, which excel in language and vision but struggle with TS data due to the diverse nature and scarcity of public pre-training data.

The authors employ a lightweight TSMixer architecture and introduce several novel enhancements to effectively pre-train on multiple TS datasets with varied temporal resolutions:

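Adaptive Patching: Different levels of the TSMixer backbone operate on different patch lengths and numbers of patches, letting the model capture temporal patterns at several granularities. This helps when the pre-training data spans many resolutions.
Dataset Augmentation via Downsampling: High-resolution datasets are downsampled to coarser resolutions (for example, 10-minute data to hourly), adding both training samples and resolutions so the model learns representations that transfer across resolutions.
Resolution Prefix Tuning: A learned embedding of the data's resolution is prepended to the input as an extra patch, so the model can condition explicitly on resolution when pre-training on mixed-resolution data.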
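The three enhancements can be illustrated with a minimal, dependency-free Python sketch. It assumes mean-pooling for downsampling, non-overlapping patches, a hypothetical "halve each patch per level" schedule for adaptive patching, and a fixed lookup table standing in for the learned resolution embedding. All function names, the schedule, and the table values are illustrative assumptions, not taken from the TTM source code.

```python
from typing import Dict, List

Series = List[float]
Patches = List[List[float]]


def downsample(series: Series, factor: int) -> Series:
    """Mean-pool non-overlapping windows, e.g. 10-minute data -> hourly with factor=6."""
    usable = len(series) // factor * factor  # drop an incomplete trailing window
    return [sum(series[i:i + factor]) / factor for i in range(0, usable, factor)]


def patchify(series: Series, patch_len: int) -> Patches:
    """Split a series into non-overlapping patches of length patch_len."""
    usable = len(series) // patch_len * patch_len
    return [series[i:i + patch_len] for i in range(0, usable, patch_len)]


def split_patches(patches: Patches) -> Patches:
    """Hypothetical adaptive-patching step: halve each patch, doubling the patch count."""
    out: Patches = []
    for p in patches:
        half = len(p) // 2
        out.extend([p[:half], p[half:]])
    return out


# Hypothetical stand-in for a learned per-resolution embedding.
# Each vector must have the same length as the patches it is prepended to.
RESOLUTION_PREFIX: Dict[str, List[float]] = {
    "10min": [1.0, 0.0, 0.0, 0.0],
    "hourly": [0.0, 1.0, 0.0, 0.0],
}


def add_resolution_prefix(patches: Patches, resolution: str) -> Patches:
    """Prepend the resolution vector as an extra 'patch' so the model can condition on it."""
    return [RESOLUTION_PREFIX[resolution]] + patches


if __name__ == "__main__":
    ten_min = [float(i) for i in range(48)]   # 8 hours of 10-minute readings
    hourly = downsample(ten_min, 6)           # augmented copy at hourly resolution
    level0 = patchify(ten_min, 8)             # 6 patches of length 8
    level1 = split_patches(level0)            # 12 patches of length 4
    model_input = add_resolution_prefix(patchify(hourly, 4), "hourly")
    print(len(hourly), len(level0), len(level1), len(model_input))
```

The script prints `8 6 12 3`: eight hourly values, six coarse patches, twelve finer patches, and two hourly patches plus one prefix patch. In a real model the prefix would be a trainable embedding rather than a one-hot vector.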

Additionally, the authors employ a multi-level modeling strategy: the backbone is pre-trained in a channel-independent way, while cross-channel correlations and exogenous signals are incorporated through channel mixing during fine-tuning, a capability lacking in existing pre-trained forecasting models.
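A toy sketch of the difference between channel-independent and channel-mixing processing follows. The shared backbone sees each channel on its own; a mixing step then combines information across channels, and known future values of exogenous channels can be mixed into the target forecasts. The linear weights, function names, and example values are hypothetical; in a trained model these would be learned layers, not hand-set coefficients.

```python
from typing import Callable, List

Matrix = List[List[float]]  # rows = channels, columns = time steps (or patch features)


def channel_independent(x: Matrix, f: Callable[[List[float]], List[float]]) -> Matrix:
    """Apply one shared function to every channel separately (no cross-channel information)."""
    return [f(row) for row in x]


def channel_mix(x: Matrix, w: Matrix) -> Matrix:
    """out[c][t] = sum_k w[c][k] * x[k][t]: each output channel is a weighted mix of inputs."""
    steps = len(x[0])
    return [[sum(w_ck * x[k][t] for k, w_ck in enumerate(w_row)) for t in range(steps)]
            for w_row in w]


def infuse_exogenous(target_fc: Matrix, exog_future: Matrix, w: Matrix) -> Matrix:
    """Refine target forecasts by mixing them with known future exogenous values.

    w has one row per target channel and one column per (target + exogenous) channel.
    """
    return channel_mix(target_fc + exog_future, w)


def halve(row: List[float]) -> List[float]:
    """Stand-in for the shared, pre-trained backbone."""
    return [v * 0.5 for v in row]


if __name__ == "__main__":
    history = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]            # two input channels
    hidden = channel_independent(history, halve)               # each channel processed alone
    mixed = channel_mix(hidden, [[0.9, 0.1], [0.1, 0.9]])      # channels now share information
    refined = infuse_exogenous([[4.0, 5.0]], [[1.0, 0.0]], [[1.0, 2.0]])
    print(mixed, refined)                                      # refined == [[6.0, 5.0]]
```

Keeping the backbone channel-independent lets one pre-trained model serve datasets with any number of channels; the channel-specific mixing is deferred to fine-tuning, where the channel count and exogenous inputs are known.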

The results show that TTM achieves significant accuracy gains (12-38%) over popular benchmarks in few/zero-shot TS forecasting. Importantly, TTM's zero-shot performance often surpasses the few-shot results of larger LLM-based models, demonstrating the effectiveness of the proposed approach.
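Percentage gains of this kind are conventionally computed as the relative reduction in an error metric against a baseline. A minimal sketch, assuming the metric is MSE; the numbers below are made up for illustration and are not results from the paper:

```python
def relative_gain(baseline_error: float, model_error: float) -> float:
    """Percent reduction in error relative to the baseline (positive = model is better)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical MSE values for illustration only.
print(relative_gain(0.50, 0.40))  # about 20, i.e. roughly 20% lower error than the baseline
```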

Compared to LLM-TS methods, TTM drastically reduces compute needs, with a 14X cut in learnable parameters, 106X fewer total parameters, and substantial reductions in fine-tuning time (65X) and inference time (54X).
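A k-fold reduction corresponds to a percentage saving of 100 × (1 − 1/k). The short sketch below applies that conversion to the four factors quoted above:

```python
def fold_to_percent(k: float) -> float:
    """Convert a k-fold reduction into a percentage saving: 100 * (1 - 1/k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 100.0 * (1.0 - 1.0 / k)


for label, k in [("learnable parameters", 14), ("total parameters", 106),
                 ("fine-tuning time", 65), ("inference time", 54)]:
    print(f"{label}: {k}X reduction = {fold_to_percent(k):.1f}% less")
```

This prints savings of 92.9%, 99.1%, 98.5%, and 98.1% respectively, which shows that beyond roughly 10X, further fold increases change the percentage only slightly.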

Critical Analysis

The paper presents a promising approach to developing small, efficient, and effective pre-trained models for time series forecasting. The authors' focus on tackling the unique challenges of TS data, such as varied temporal resolutions and cross-channel correlations, is commendable.

However, the paper could have discussed some potential limitations or caveats of the TTM model. For example, it's unclear how well the model would perform on highly complex TS data or long forecast horizons, or how it would scale to even larger and more diverse pre-training datasets. Additionally, the paper offers little insight into the model's interpretability, that is, into the underlying patterns it has learned.

Furthermore, the paper could have discussed potential real-world deployment challenges, such as the availability and quality of the exogenous data the model can draw on during fine-tuning, or the model's robustness to noisy or missing data in practical TS forecasting scenarios.

Despite these minor limitations, the paper's contribution in developing a highly efficient and effective pre-trained model for TS forecasting is significant. The open-sourcing of the code and pre-trained models will likely spur further research and innovation in this area.

Conclusion

The paper introduces "Tiny Time Mixers" (TTM), a small and efficient pre-trained model for multivariate time series forecasting that addresses the limitations of large language models in this domain. By employing novel techniques like adaptive patching, dataset augmentation, and multi-level modeling, the researchers have created a general-purpose TS model that can be quickly fine-tuned to various forecasting tasks with impressive accuracy.

The key strengths of TTM are its small size, fast performance, and ability to outperform larger models in both few-shot and zero-shot scenarios. This highlights the potential of developing specialized pre-trained models for time series data, rather than relying solely on adapting language models. The open-sourcing of TTM's code and pre-trained models will likely accelerate progress in this area and lead to more efficient and effective time series forecasting solutions.
