Causal Discovery-Driven Change Point Detection in Time Series

Read original: arXiv:2407.07290 - Published 7/11/2024 by Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu


Overview

The paper presents a two-stage non-parametric algorithm for detecting changes in the distribution of specific components of a multivariate time series, assuming an underlying structural causal model.
The algorithm first learns parts of the causal structure using constraint-based discovery methods, then uses conditional relative Pearson divergence estimation to identify the change points.
By appealing to the Causal Markov Condition, the approach relaxes the assumption, common in conventional change point detection methods, that samples are independent and identically distributed (IID).

Plain English Explanation

Change point detection in time series is the process of identifying times when the probability distribution of a time series changes. This is useful in many areas, such as monitoring human activity or detecting anomalies in medical data.

When dealing with multivariate time series (i.e., time series with multiple components or variables), researchers typically look at the joint distribution of all the variables. If any one variable changes, the whole time series is assumed to have changed. However, in practice, we may only be interested in certain components of the time series and want to explore abrupt changes in their distributions, even if other components remain stable.

This paper proposes a novel approach to address this problem. It assumes an underlying structural causal model that governs the time series data generation. The algorithm first uses constraint-based discovery methods to learn parts of the causal structure. It then uses a statistical measure called conditional relative Pearson divergence to identify the change points in the components of interest, while taking into account the causal relationships between the variables.

This approach lets practitioners focus on the specific changes they care about, without being distracted by changes in other parts of the multivariate time series. It also relaxes the assumption of independent and identically distributed (IID) samples that conventional change point detection methods often make.

Technical Explanation

The key elements of the proposed algorithm are:

Causal Structure Learning: The first stage of the algorithm uses constraint-based discovery methods to learn parts of the causal structure that underlies the time series data generation process. This allows the algorithm to identify the causal relationships between the different components of the multivariate time series.
Conditional Relative Pearson Divergence Estimation: The second stage of the algorithm uses a statistical measure called conditional relative Pearson divergence to quantify the distribution disparity between consecutive segments in the time series. This metric focuses on the specific components of interest, while taking into account the causal relationships learned in the first stage.
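
As a rough illustration of the second stage, the sketch below estimates the (unconditional) relative Pearson divergence between two adjacent windows using a kernel least-squares density-ratio fit in the style of RuLSIF. It is not the paper's conditional estimator: conditioning on the causal parents learned in the first stage is omitted, and the function names, the kernel width `sigma`, the regularizer `lam`, the mixing weight `alpha`, and the number of kernel centers are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix between rows of x (n, d) and centers (b, d)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def relative_pearson_divergence(x_p, x_q, alpha=0.1, sigma=1.0, lam=0.1,
                                n_centers=50, seed=0):
    """Estimate PE_alpha(P || Q) from samples x_p ~ P and x_q ~ Q.

    The relative density ratio
        r(x) = p(x) / (alpha * p(x) + (1 - alpha) * q(x))
    is modeled as a weighted sum of Gaussian kernels and fitted by
    regularized least squares.
    """
    x_p = np.asarray(x_p, dtype=float).reshape(len(x_p), -1)
    x_q = np.asarray(x_q, dtype=float).reshape(len(x_q), -1)
    rng = np.random.default_rng(seed)

    # Kernel centers: a random subset of the P-samples (an assumption).
    n_b = min(n_centers, len(x_p))
    centers = x_p[rng.choice(len(x_p), size=n_b, replace=False)]

    k_p = gaussian_kernel(x_p, centers, sigma)  # shape (n_p, b)
    k_q = gaussian_kernel(x_q, centers, sigma)  # shape (n_q, b)

    # Closed-form ridge solution for the kernel weights theta.
    h_mat = (alpha * k_p.T @ k_p / len(x_p)
             + (1.0 - alpha) * k_q.T @ k_q / len(x_q))
    h_vec = k_p.mean(axis=0)
    theta = np.linalg.solve(h_mat + lam * np.eye(n_b), h_vec)

    # Plug the fitted ratio into the divergence approximation.
    r_p = k_p @ theta
    r_q = k_q @ theta
    return float(-alpha / 2.0 * np.mean(r_p ** 2)
                 - (1.0 - alpha) / 2.0 * np.mean(r_q ** 2)
                 + np.mean(r_p) - 0.5)


# Toy comparison: one window pair with no change, one with a mean shift.
rng = np.random.default_rng(1)
before = rng.normal(0.0, 1.0, size=200)
after_same = rng.normal(0.0, 1.0, size=200)
after_shift = rng.normal(3.0, 1.0, size=200)

score_same = relative_pearson_divergence(before, after_same)
score_shift = relative_pearson_divergence(before, after_shift)
print(f"no change: {score_same:.3f}, mean shift: {score_shift:.3f}")
```

Sliding two adjacent windows along a series and recording this score at each position gives a change-score curve whose peaks are candidate change points.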

The researchers argue that by incorporating the causal structure, their approach can relax the typical assumption of independent and identically distributed (IID) samples that is often made in conventional change point detection methods. This is based on the Causal Markov Condition, which states that a variable is independent of its non-descendants given its parents.
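
The Causal Markov Condition can be made concrete with a toy chain X → Z → Y, where Z is the only parent of Y and X is a non-descendant of Y. The sketch below checks conditional independence with a partial-correlation (Fisher z) test, one common building block of constraint-based discovery for linear-Gaussian data. The paper's actual independence test and search procedure are not reproduced here; the function name `partial_corr_test` and the simulated data are assumptions for illustration.

```python
import math

import numpy as np


def partial_corr_test(x, y, z=None):
    """Return (partial correlation of x and y given z, two-sided p-value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if z is None:
        z = np.empty((n, 0))  # empty conditioning set
    else:
        z = np.asarray(z, dtype=float).reshape(n, -1)

    # Regress x and y on z (plus an intercept) and correlate the residuals.
    design = np.column_stack([np.ones(n), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(res_x @ res_y / np.sqrt((res_x @ res_x) * (res_y @ res_y)))

    # Fisher z-transform: approximately standard normal under independence.
    stat = math.sqrt(n - z.shape[1] - 3) * math.atanh(r)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return r, p_value


# Simulated chain X -> Z -> Y (hypothetical data, linear with Gaussian noise).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)

r_marginal, p_marginal = partial_corr_test(x, y)
r_given_z, p_given_z = partial_corr_test(x, y, z)
print(f"X vs Y:           r = {r_marginal:.3f}, p = {p_marginal:.3g}")
print(f"X vs Y given Z:   r = {r_given_z:.3f}, p = {p_given_z:.3g}")
```

X and Y are strongly correlated on their own, but the partial correlation given Z should be close to zero, which is the pattern the condition predicts once the parent of Y is conditioned on.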

The proposed algorithm is evaluated on both synthetic and real-world datasets, and the authors report that the results validate the correctness and utility of the approach.

Critical Analysis

The paper presents a novel and theoretically grounded approach to change point detection in multivariate time series, which is a valuable contribution to the field. The incorporation of causal structure learning is a particularly interesting aspect, as it allows the algorithm to focus on the specific changes that are of interest, rather than being distracted by changes in other parts of the time series.

However, the paper does not discuss the limitations or potential drawbacks of the proposed approach in detail. For example, constraint-based causal discovery methods can be sensitive to the quality and completeness of the available data, and the algorithm's performance may suffer if the causal structure is learned inaccurately.

Additionally, the paper does not explore the computational complexity of the algorithm or its scalability to large-scale, high-dimensional time series data. These are important practical considerations that could impact the real-world applicability of the method.

Further research could also investigate the robustness of the approach to noise, missing data, or other common challenges in time series analysis, as well as compare its performance to other state-of-the-art change point detection techniques in a more comprehensive benchmarking exercise.

Conclusion

The paper presents a novel two-stage non-parametric algorithm for detecting changes in the distribution of specific components of a multivariate time series, assuming an underlying structural causal model. By first learning parts of the causal structure and then using conditional relative Pearson divergence estimation, the algorithm can focus on the changes that are of interest, while relaxing the typical assumption of independent and identically distributed (IID) samples.

The proposed approach could be a useful tool in applications such as human activity monitoring and medical data analysis, where identifying changes in specific components of a complex multivariate time series matters. Further research and real-world validation would help establish its place in the change point detection toolkit.



Related Papers


Causal Discovery-Driven Change Point Detection in Time Series

Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu

Change point detection in time series seeks to identify times when the probability distribution of time series changes. It is widely applied in many areas, such as human-activity sensing and medical science. In the context of multivariate time series, this typically involves examining the joint distribution of high-dimensional data: If any one variable changes, the whole time series is assumed to have changed. However, in practical applications, we may be interested only in certain components of the time series, exploring abrupt changes in their distributions in the presence of other time series. Here, assuming an underlying structural causal model that governs the time-series data generation, we address this problem by proposing a two-stage non-parametric algorithm that first learns parts of the causal structure through constraint-based discovery methods. The algorithm then uses conditional relative Pearson divergence estimation to identify the change points. The conditional relative Pearson divergence quantifies the distribution disparity between consecutive segments in the time series, while the causal discovery method enables a focus on the causal mechanism, facilitating access to independent and identically distributed (IID) samples. Theoretically, the typical assumption of samples being IID in conventional change point detection methods can be relaxed based on the Causal Markov Condition. Through experiments on both synthetic and real-world datasets, we validate the correctness and utility of our approach.

7/11/2024

Bayesian Autoregressive Online Change-Point Detection with Time-Varying Parameters

Ioanna-Yvonni Tsaknaki, Fabrizio Lillo, Piero Mazzarisi

Change points in real-world systems mark significant regime shifts in system dynamics, possibly triggered by exogenous or endogenous factors. These points define regimes for the time evolution of the system and are crucial for understanding transitions in financial, economic, social, environmental, and technological contexts. Building upon the Bayesian approach introduced in \cite{c:07}, we devise a new method for online change point detection in the mean of a univariate time series, which is well suited for real-time applications and is able to handle the general temporal patterns displayed by data in many empirical contexts. We first describe time series as an autoregressive process of an arbitrary order. Second, the variance and correlation of the data are allowed to vary within each regime driven by a scoring rule that updates the value of the parameters for a better fit of the observations. Finally, a change point is detected in a probabilistic framework via the posterior distribution of the current regime length. By modeling temporal dependencies and time-varying parameters, the proposed approach enhances both the estimate accuracy and the forecasting power. Empirical validations using various datasets demonstrate the method's effectiveness in capturing memory and dynamic patterns, offering deeper insights into the non-stationary dynamics of real-world systems.

7/24/2024


Predictive change point detection for heterogeneous data

Anna-Christina Glock, Florian Sobieczky, Johannes Furnkranz, Peter Filzmoser, Martin Jech

A change point detection (CPD) framework assisted by a predictive machine learning model called Predict and Compare is introduced and characterised in relation to other state-of-the-art online CPD routines which it outperforms in terms of false positive rate and out-of-control average run length. The method's focus is on improving standard methods from sequential analysis such as the CUSUM rule in terms of these quality measures. This is achieved by replacing typically used trend estimation functionals such as the running mean with more sophisticated predictive models (Predict step), and comparing their prognosis with actual data (Compare step). The two models used in the Predict step are the ARIMA model and the LSTM recursive neural network. However, the framework is formulated in general terms, so as to allow the use of other prediction or comparison methods than those tested here. The power of the method is demonstrated in a tribological case study in which change points separating the run-in, steady-state, and divergent wear phases are detected in the regime of very few false positives.

5/6/2024

Benchmarking changepoint detection algorithms on cardiac time series

Ayse Cakmak, Erik Reinertsen, Shamim Nemati, Gari D. Clifford

The pattern of state changes in a biomedical time series can be related to health or disease. This work presents a principled approach for selecting a changepoint detection algorithm for a specific task, such as disease classification. Eight key algorithms were compared, and the performance of each algorithm was evaluated as a function of temporal tolerance, noise, and abnormal conduction (ectopy) on realistic artificial cardiovascular time series data. All algorithms were applied to real data (cardiac time series of 22 patients with REM-behavior disorder (RBD) and 15 healthy controls) using the parameters selected on artificial data. Finally, features were derived from the detected changepoints to classify RBD patients from healthy controls using a K-Nearest Neighbors approach. On artificial data, Modified Bayesian Changepoint Detection algorithm provided superior positive predictive value for state change identification while Recursive Mean Difference Maximization (RMDM) achieved the highest true positive rate. For the classification task, features derived from the RMDM algorithm provided the highest leave one out cross validated accuracy of 0.89 and true positive rate of 0.87. Automatically detected changepoints provide useful information about subject's physiological state which cannot be directly observed. However, the choice of change point detection algorithm depends on the nature of the underlying data and the downstream application, such as a classification task. This work represents the first time change point detection algorithms have been compared in a meaningful way and utilized in a classification task, which demonstrates the effect of changepoint algorithm choice on application performance.

4/22/2024