Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

Read original: arXiv:2406.19827 - Published 7/1/2024 by Wenliang Zhong, Haoyu Tang, Qinghai Zheng, Mingzhu Xu, Yupeng Hu, Liqiang Nie


Overview

This paper presents a novel method for dataset distillation, which aims to compress large datasets into a small set of synthetic data points that can effectively train machine learning models.
The key idea is to change what the synthetic data is asked to match. Earlier trajectory matching methods imitate the raw training trajectory of an expert network trained with stochastic gradient descent (SGD); this paper instead matches a "convexified" trajectory built as a convex combination of expert trajectories, which the authors argue is more stable to follow and cheaper to store.
The authors call their approach Matching Convexified Trajectory (MCT) and report experiments on three public datasets in which it outperforms traditional Matching Training Trajectories (MTT) methods.

Plain English Explanation

Dataset distillation is a technique that allows you to take a large, complex dataset and boil it down to a much smaller set of synthetic data points. This can be useful when you want to train a machine learning model but don't have access to the full original dataset, or when you need to conserve storage space.

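The paper introduces a new way to do dataset distillation, called Matching Convexified Trajectory (MCT). It builds on trajectory matching: a model's training "trajectory" is the path its parameters take through parameter space during training, and the synthetic dataset is optimized so that a model trained on it follows the trajectory of an "expert" model trained on the real data. MCT changes which trajectory is matched: instead of the expert's raw, noisy path, it uses a smoother, convexified version.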
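To make the idea of "matching a trajectory" concrete, here is a minimal sketch of the normalized parameter-distance loss commonly used in trajectory matching. It is an illustration, not the authors' code: the function names and the toy parameter vectors are assumptions, and real implementations operate on flattened network weights and differentiate through the student's training steps.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flattened parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_matching_loss(student_end, expert_start, expert_target):
    """Distance from the student to the expert target, normalized by how
    far the expert itself moved between its start and target points."""
    return (squared_distance(student_end, expert_target)
            / squared_distance(expert_start, expert_target))


# Toy two-parameter "networks" (hypothetical values, for illustration only).
expert_start = [0.0, 0.0]
expert_target = [1.0, 2.0]

# A student that never left the starting point has loss 1.0.
loss_no_progress = trajectory_matching_loss(expert_start, expert_start, expert_target)

# A student that lands exactly on the expert target has loss 0.0.
loss_perfect = trajectory_matching_loss(expert_target, expert_start, expert_target)
```

The normalization makes the loss comparable across segments of the trajectory where the expert moves by very different amounts.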

By matching a convexified trajectory rather than the raw one produced by stochastic gradient descent, the authors argue that MCT gives the student model steadier guidance, so distillation converges quickly and stably. The convexified trajectory is also easier to store than the full expert trajectory, which reduces the disk space the distillation process needs.

The authors test MCT on three public datasets and report that it outperforms traditional MTT methods. This suggests that the choice of target trajectory, not just the matching procedure, matters for dataset distillation.

Technical Explanation

The paper introduces a dataset distillation method called Matching Convexified Trajectory (MCT). It starts from Matching Training Trajectories (MTT), which optimizes a synthetic dataset so that a student network trained on it replicates the training trajectory of an expert network trained on real data. The authors identify three limitations of MTT: the expert trajectory generated by stochastic gradient descent (SGD) is unstable, the distillation process converges slowly, and storing the expert trajectory is expensive.

To address these, the authors reinterpret the MTT objective through a simple transformation and, drawing on the linearized dynamics of Neural Tangent Kernel (NTK) methods, construct a convexified trajectory as a convex combination of expert trajectories. The student network is then guided along this trajectory instead of the raw SGD path, which the authors report leads to rapid and stable convergence.
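A convexified trajectory can be sketched as follows. This example assumes the simplest possible convex combination, a straight line between an expert's starting and final parameters; that choice, the function names, and the `span` parameter are illustrative assumptions, not details confirmed by this summary.

```python
import random


def convexified_point(theta_start, theta_end, beta):
    """Convex combination (1 - beta) * theta_start + beta * theta_end."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [(1.0 - beta) * s + beta * e for s, e in zip(theta_start, theta_end)]


def sample_segment(theta_start, theta_end, span, rng):
    """Pick a start position anywhere on the trajectory (continuous sampling)
    and return the (start, target) parameter pair that is `span` apart."""
    beta = rng.uniform(0.0, 1.0 - span)
    end_beta = min(1.0, beta + span)  # guard against floating-point overshoot
    return (convexified_point(theta_start, theta_end, beta),
            convexified_point(theta_start, theta_end, end_beta))


# Hypothetical flattened expert parameters at the start and end of training.
theta_start = [0.0, 4.0]
theta_end = [2.0, 0.0]

midpoint = convexified_point(theta_start, theta_end, 0.5)  # [1.0, 2.0]
start, target = sample_segment(theta_start, theta_end, 0.25, random.Random(0))
```

Because any value of `beta` yields a valid point, the student's starting position is not restricted to a discrete set of saved checkpoints, which is one way to read the continuous sampling strategy the authors describe.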

The authors evaluate MCT on image classification benchmarks, including CIFAR-10, CIFAR-100, and Tiny ImageNet. According to the paper's abstract, comprehensive experiments across three public datasets validate the superiority of MCT over traditional MTT methods.

The convexified trajectory is also easier to store than a full expert trajectory, and it enables a continuous sampling strategy during distillation, so the student can learn and fit the entire expert trajectory rather than only a fixed set of saved checkpoints.
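The storage argument can be illustrated with back-of-the-envelope arithmetic. The numbers below are hypothetical, and the sketch assumes that a dense expert trajectory keeps one checkpoint per epoch while a convexified trajectory can be rebuilt from two endpoint checkpoints; neither assumption is taken from the paper's reported figures.

```python
def checkpoint_bytes(num_checkpoints, num_params, bytes_per_param=4):
    """Storage needed for a set of checkpoints (float32 by default)."""
    return num_checkpoints * num_params * bytes_per_param


NUM_PARAMS = 1_000_000  # hypothetical network size
NUM_EPOCHS = 50         # hypothetical expert training length

# Dense trajectory: the initial weights plus one checkpoint per epoch.
dense_bytes = checkpoint_bytes(NUM_EPOCHS + 1, NUM_PARAMS)

# Convexified trajectory (assumed): only the two endpoints are stored.
convex_bytes = checkpoint_bytes(2, NUM_PARAMS)

savings_ratio = dense_bytes / convex_bytes  # 25.5 for these numbers
```

Under these assumptions the saving grows linearly with the number of stored epochs and is multiplied again by the number of expert trajectories kept on disk.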

Critical Analysis

The paper presents a compelling approach to dataset distillation that appears to improve on prior trajectory matching methods. Matching a convexified trajectory instead of the raw SGD trajectory is a novel and interesting idea, as it targets the instability, slow convergence, and storage cost associated with the expert trajectory.

One potential limitation is that the method relies on being able to accurately "convexify" the expert network's training trajectories. This may be challenging in practice, especially for more complex datasets or model architectures. The authors do not provide a detailed analysis of the robustness of their convexification approach.

Additionally, while the authors demonstrate the effectiveness of MCT on standard image classification benchmarks, it would be valuable to see how it performs on a wider range of tasks and datasets. Evaluating the method's scalability and generalization capabilities would help strengthen the claims about its broader applicability.

Finally, the paper does not delve into the potential real-world implications and use cases of its dataset distillation approach. It would be useful for the authors to discuss how MCT could be leveraged in practical machine learning scenarios, such as on-device training, data privacy, or resource-constrained environments.

Conclusion

Overall, the paper presents a novel and promising dataset distillation method called Matching Convexified Trajectory (MCT) that aims to make trajectory-based distillation more stable and more storage-efficient. By matching a convexified version of the expert trajectory, the authors report that MCT outperforms traditional MTT methods on three public datasets.

While the paper has a few limitations, such as the unexamined robustness of the convexification process and the need for further evaluations on a wider range of tasks, the core idea of giving the student a smoother, easier-to-store trajectory to follow is a valuable contribution to the field of dataset distillation. As machine learning models continue to grow in complexity and data requirements, techniques like MCT could play an important role in enabling more efficient and accessible model training.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

Wenliang Zhong, Haoyu Tang, Qinghai Zheng, Mingzhu Xu, Yupeng Hu, Liqiang Nie

The rapid evolution of deep learning and large language models has led to an exponential growth in the demand for training data, prompting the development of Dataset Distillation methods to address the challenges of managing large datasets. Among these, Matching Training Trajectories (MTT) has been a prominent approach, which replicates the training trajectory of an expert network on real data with a synthetic dataset. However, our investigation found that this method suffers from three significant limitations: 1. Instability of expert trajectory generated by Stochastic Gradient Descent (SGD); 2. Low convergence speed of the distillation process; 3. High storage consumption of the expert trajectory. To address these issues, we offer a new perspective on understanding the essence of Dataset Distillation and MTT through a simple transformation of the objective function, and introduce a novel method called Matching Convexified Trajectory (MCT), which aims to provide better guidance for the student trajectory. MCT leverages insights from the linearized dynamics of Neural Tangent Kernel methods to create a convex combination of expert trajectories, guiding the student network to converge rapidly and stably. This trajectory is not only easier to store, but also enables a continuous sampling strategy during distillation, ensuring thorough learning and fitting of the entire expert trajectory. Comprehensive experiments across three public datasets validate the superiority of MCT over traditional MTT methods.

7/1/2024

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao, Carsten Trinitis, Martin Schulz

Dataset Distillation is used to create a concise, yet informative, synthetic dataset that can replace the original dataset for training purposes. Some leading methods in this domain prioritize long-range matching, involving the unrolling of training trajectories with a fixed number of steps (NS) on the synthetic dataset to align with various expert training trajectories. However, traditional long-range matching methods suffer from an overfitting-like problem: the fixed step size NS forces the synthetic dataset to conform in a distorted way to the expert training trajectories it has seen, resulting in a loss of generality, especially for trajectories from unencountered architectures. We refer to this as the Accumulated Mismatching Problem (AMP), and propose a new approach, Automatic Training Trajectories (ATT), which dynamically and adaptively adjusts trajectory length NS to address the AMP. Our method outperforms existing methods particularly in tests involving cross-architectures. Moreover, owing to its adaptive nature, it exhibits enhanced stability in the face of parameter variations.

7/22/2024

Distilling Long-tailed Datasets

Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradients, leading to the synthesis of similarly imbalanced distilled datasets. Parameter matching, a common technique in DD, involves aligning the learning parameters of the distilled dataset with that of the original dataset. However, in the context of long-tailed datasets, matching biased experts leads to inheriting the imbalance present in the original data, causing the distilled dataset to inadequately represent tail classes. 2) The experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. To address these issues, we propose a novel long-tailed dataset distillation method, Long-tailed Aware Dataset distillation (LAD). Specifically, we propose Weight Mismatch Avoidance to avoid directly matching the biased expert trajectories. It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset. Moreover, we propose Adaptive Decoupled Matching, which jointly matches the decoupled backbone and classifier to improve the tail class performance and initialize reliable soft labels. This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.

8/28/2024

SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching

Yongmin Lee, Hye Won Chung

Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation methods across various IPC scales reveals that these methods struggle to incorporate the complex, rare features of harder samples into the synthetic dataset even with the increased IPC, resulting in a persistent coverage gap between easy and hard test samples. Motivated by such observations, we introduce SelMatch, a novel distillation method that effectively scales with IPC. SelMatch uses selection-based initialization and partial updates through trajectory matching to manage the synthetic dataset's desired difficulty level tailored to IPC scales. When tested on CIFAR-10/100 and TinyImageNet, SelMatch consistently outperforms leading selection-only and distillation-only methods across subset ratios from 5% to 30%.

6/28/2024