CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting

Read original: arXiv:2409.18874 - Published 9/30/2024 by Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška

Overview

Network traffic anomaly detection is crucial for maintaining computer network security and identifying malicious activities.
Forecasting-based methods are a primary approach to anomaly detection, but real-world network datasets for forecasting and anomaly detection are lacking, which can lead to overestimation of algorithm performance.
This paper introduces a dataset of network traffic time series built from 40 weeks of traffic on the CESNET3 network, covering 275,000 active IP addresses.
The dataset provides a unique and authentic challenge for forecasting and anomaly detection models, with high variability among network entities due to the ISP origin of the data.
The dataset offers valuable insights into the practical deployment of forecast-based anomaly detection approaches.

Plain English Explanation

When computers are connected to a network, it's important to be able to detect if something unusual or suspicious is happening with the network traffic. One way to do this is by using forecasting-based anomaly detection. This involves making predictions about what the normal network traffic should look like, and then flagging any activity that doesn't match those predictions as a potential anomaly or problem.
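The predict-then-compare idea can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the paper: the rolling-median forecaster, the function names `rolling_median_forecast` and `flag_anomalies`, and the parameters `window` and `k` are all assumptions chosen for brevity, and the traffic values are made up.

```python
import statistics


def rolling_median_forecast(series, window):
    """Predict each point as the median of the `window` points before it."""
    return [statistics.median(series[i - window:i]) for i in range(window, len(series))]


def flag_anomalies(series, window=5, k=5.0):
    """Return indices whose forecast error is unusually large.

    Hypothetical helper: the error scale is estimated offline from all
    residuals using the median absolute deviation (MAD), which a single
    spike barely affects.
    """
    forecast = rolling_median_forecast(series, window)
    # Residual = observed value minus the value the forecaster expected.
    residuals = [series[i + window] - f for i, f in enumerate(forecast)]
    if not residuals:
        return []
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    # 1.4826 makes the MAD comparable to a standard deviation for normal data;
    # the floor avoids a zero threshold on perfectly flat series.
    scale = max(1.4826 * mad, 1e-9)
    return [i + window for i, r in enumerate(residuals) if abs(r - center) > k * scale]


# Made-up traffic counts with one obvious spike at index 8.
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 500, 101, 99, 102]
print(flag_anomalies(traffic))  # [8]
```

A median-based forecaster is used here only because one spike does not distort the predictions that follow it; any forecasting model could take its place, with the same residual check applied afterwards.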

However, the researchers found that there's a lack of real-world network traffic datasets that can be used to test and improve these forecasting-based anomaly detection methods. The datasets that do exist may not accurately reflect the full complexity and variability of actual network traffic, leading to inflated estimates of how well the anomaly detection techniques perform in the real world.

To address this gap, the researchers created a new dataset from 40 weeks of network traffic data collected from the CESNET3 network. This dataset includes information on the behavior of 275,000 active internet protocol (IP) addresses. The fact that the data comes from an internet service provider (ISP) network means there is a high degree of diversity and unpredictability in the network traffic, which provides a more realistic and challenging testbed for anomaly detection algorithms.

By making this dataset publicly available, the researchers hope it will help researchers and developers create more robust and effective anomaly detection systems that can better handle the complexity of real-world network traffic. This is an important step for improving the security and reliability of computer networks.

Technical Explanation

The paper introduces a new dataset for evaluating forecasting-based anomaly detection techniques in network traffic. The dataset was collected from the CESNET3 network over a 40-week period and contains time series data on the behavior of 275,000 active IP addresses.

The researchers highlight that comprehensive real-world network datasets are often lacking, leading to potential overestimation of anomaly detection algorithm performance when evaluated on more limited datasets. The CESNET-TimeSeries24 dataset aims to address this gap by providing a large-scale, diverse set of network traffic behavior data.

The high variability in the network entities included in the dataset, due to the ISP origin of the data, is intended to create a more authentic and challenging testbed for anomaly detection models. This diversity is expected to provide valuable insights into the practical deployment of forecast-based anomaly detection approaches.

The dataset is made publicly available to support further research and development of time series anomaly detection techniques in the context of network security and management.

Critical Analysis

The dataset introduced in this paper appears to be a valuable contribution to the field of network traffic anomaly detection. By providing a large-scale, real-world dataset with high variability in network entity behavior, the researchers have addressed an important limitation in the existing literature.

However, the paper does not discuss potential limitations or caveats of the dataset. For example, it is unclear if the dataset includes labeled anomalies or ground truth information, which would be crucial for evaluating the performance of anomaly detection algorithms. Additionally, the paper does not address potential biases or skew in the dataset, such as over-representation of certain types of network traffic or entities.

Future research could explore these areas and investigate how the characteristics of the CESNET-TimeSeries24 dataset impact the performance and generalization of different anomaly detection approaches. Comparing the dataset to other publicly available network traffic datasets could also provide valuable insights into its unique properties and suitability for specific research questions.

Conclusion

This paper introduces a new dataset of network traffic time series data collected from the CESNET3 network, which offers a unique and challenging testbed for forecasting-based anomaly detection techniques. The dataset's diversity and authentic representation of real-world network traffic variability make it a valuable resource for researchers and developers working to improve the security and reliability of computer networks. By releasing this dataset publicly, the researchers have taken an important step in advancing the state of the art in network anomaly detection.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting

Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška

Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. One of the primary approaches to anomaly detection are methods based on forecasting. Nevertheless, extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing performance overestimation of anomaly detection algorithms. This manuscript addresses this gap by introducing a dataset comprising time series data of network entities' behavior, collected from the CESNET3 network. The dataset was created from 40 weeks of network traffic of 275 thousand active IP addresses. The ISP origin of the presented data ensures a high level of variability among network entities, which forms a unique and authentic challenge for forecasting and anomaly detection models. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.

9/30/2024

Explainable Online Unsupervised Anomaly Detection for Cyber-Physical Systems via Causal Discovery from Time Series

Daniele Meli

Online unsupervised detection of anomalies is crucial to guarantee the correct operation of cyber-physical systems and the safety of humans interacting with them. State-of-the-art approaches based on deep learning via neural networks achieve outstanding performance at anomaly recognition, evaluating the discrepancy between a normal model of the system (with no anomalies) and the real-time stream of sensor time series. However, large training data and time are typically required, and explainability is still a challenge to identify the root of the anomaly and implement predictive maintenance. In this paper, we use causal discovery to learn a normal causal graph of the system, and we evaluate the persistency of causal links during real-time acquisition of sensor data to promptly detect anomalies. On two benchmark anomaly detection datasets, we show that our method has higher training efficiency, outperforms the accuracy of state-of-the-art neural architectures and correctly identifies the sources of >10 different anomalies. The code is at https://github.com/Isla-lab/causal_anomaly_detection.

7/30/2024

Time Series Anomaly Detection with CNN for Environmental Sensors in Healthcare-IoT

Mirza Akhi Khatun, Mangolika Bhattacharya, Ciarán Eising, Lubna Luxmi Dhirani

This research develops a new method to detect anomalies in time series data using Convolutional Neural Networks (CNNs) in healthcare-IoT. The proposed method creates a Distributed Denial of Service (DDoS) attack using an IoT network simulator, Cooja, which emulates environmental sensors such as temperature and humidity. CNNs detect anomalies in time series data, resulting in a 92% accuracy in identifying possible attacks.

7/31/2024

GNN-based Anomaly Detection for Encoded Network Traffic

Anasuya Chattopadhyay, Daniel Reti, Hans D. Schotten

The early research report explores the possibility of using Graph Neural Networks (GNNs) for anomaly detection in internet traffic data enriched with information. While recent studies have made significant progress in using GNNs for anomaly detection in finance, multivariate time-series, and biochemistry domains, there is limited research in the context of network flow data. In this report, we explore the idea that leverages information-enriched features extracted from network flow packet data to improve the performance of GNN in anomaly detection. The idea is to utilize feature encoding (binary, numerical, and string) to capture the relationships between the network components, allowing the GNN to learn latent relationships and better identify anomalies.

5/24/2024