The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol

Read original: arXiv:2405.16610 - Published 5/28/2024 by Konstanty Subbotko, Wojciech Jablonski, Piotr Bilinski

The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol

Overview

This paper explores the issue of discretization discrepancy in Differentiable Neural Architecture Search (DNAS) and proposes a new single-stage searching protocol to address it.
DNAS is a popular technique for automatically designing neural network architectures, but it can suffer from discretization errors that lead to suboptimal performance.
The proposed single-stage searching protocol aims to robustify DNAS by avoiding these discretization issues, leading to more accurate and efficient architecture search.

Plain English Explanation

When designing neural networks, researchers often use an approach called Differentiable Neural Architecture Search (DNAS) to automatically explore different network architectures and find the most effective one for a given task. DNAS is appealing because it can explore a large space of possible architectures without requiring manual design.

However, the authors of this paper argue that DNAS can suffer from a problem called "discretization discrepancy." This means that when DNAS converts the continuous architecture parameters into discrete architectural choices, it can introduce errors that lead to suboptimal network designs.

To address this issue, the paper proposes a new "single-stage searching protocol" for DNAS. This approach aims to avoid the discretization step altogether, instead optimizing the architecture parameters directly without converting them to discrete choices. By eliminating the discretization step, the authors believe their method can produce more accurate and efficient neural network architectures.

The key idea is to use a specialized optimization technique that can handle the continuous architecture parameters directly, without the need for discretization. This allows the search process to focus on finding the best possible architecture without being hindered by discretization errors.

Technical Explanation

The paper first examines the problem of discretization discrepancy in DNAS, where the continuous architecture parameters are converted to discrete architectural choices, leading to suboptimal network designs. To address this, the authors propose a new single-stage searching protocol for DNAS.

The key elements of the proposed method are:

Continuous Architecture Representation: Instead of discretizing the architecture parameters, the method keeps them in continuous form throughout the search process.
Specialized Optimization: The authors use a custom optimization technique that can handle the continuous architecture parameters directly, without the need for discretization.
Single-Stage Search: The method performs a single-stage search, optimizing the architecture and model parameters simultaneously, rather than separating the search into multiple stages.

The authors conduct extensive experiments on various image classification tasks to evaluate the performance of their single-stage searching protocol. They compare it to standard DNAS approaches and find that their method can consistently achieve better accuracy while being more computationally efficient.

Critical Analysis

The paper provides a thoughtful analysis of the discretization discrepancy issue in DNAS and presents a novel approach to address it. The authors' single-stage searching protocol appears to be a promising solution, as it avoids the discretization step and optimizes the architecture parameters directly.

One potential limitation of the research is the reliance on specialized optimization techniques, which may not be as widely accessible or easy to implement as standard DNAS methods. Additionally, the authors do not explore the robustness of their approach to different search spaces or task domains, which could be an area for further investigation.

It would also be valuable to see a more detailed analysis of the computational and memory requirements of the single-stage searching protocol, as these factors can be crucial in practical applications of neural architecture search.

Conclusion

This paper identifies a significant problem in DNAS – the issue of discretization discrepancy – and proposes a novel single-stage searching protocol to address it. By avoiding the discretization step and optimizing the architecture parameters directly, the authors demonstrate that their method can achieve better accuracy and efficiency compared to standard DNAS approaches.

The insights and techniques presented in this paper could have important implications for the field of neural architecture search, potentially leading to more accurate and robust neural network designs. As the field continues to evolve, this work highlights the importance of addressing fundamental challenges, such as discretization errors, to unlock the full potential of automated neural network design.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol

Konstanty Subbotko, Wojciech Jablonski, Piotr Bilinski

Neural Architecture Search (NAS) has been widely adopted to design neural networks for various computer vision tasks. One of its most promising subdomains is differentiable NAS (DNAS), where the optimal architecture is found in a differentiable manner. However, gradient-based methods suffer from the discretization error, which can severely damage the process of obtaining the final architecture. In our work, we first study the risk of discretization error and show how it affects an unregularized supernet. Then, we present that penalizing high entropy, a common technique of architecture regularization, can hinder the supernet's performance. Therefore, to robustify the DNAS framework, we introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture. Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset and attains performance 1.1% higher than the optimal network of DCNAS on the non-dense search space comprising short connections. The entire training process takes only 5.5 GPU days due to the weight reuse, and yields a computationally efficient architecture. Additionally, we propose a new dataset split procedure, which substantially improves results and prevents architecture degeneration in DARTS.

5/28/2024

🧠

A Lightweight Neural Architecture Search Model for Medical Image Classification

Lunchen Xie, Eugenio Lomurno, Matteo Gambella, Danilo Ardagna, Manuel Roveri, Matteo Matteucci, Qingjiang Shi

Accurate classification of medical images is essential for modern diagnostics. Deep learning advancements led clinicians to increasingly use sophisticated models to make faster and more accurate decisions, sometimes replacing human judgment. However, model development is costly and repetitive. Neural Architecture Search (NAS) provides solutions by automating the design of deep learning architectures. This paper presents ZO-DARTS+, a differentiable NAS algorithm that improves search efficiency through a novel method of generating sparse probabilities by bi-level optimization. Experiments on five public medical datasets show that ZO-DARTS+ matches the accuracy of state-of-the-art solutions while reducing search times by up to three times.

5/7/2024

Graph is all you need? Lightweight data-agnostic neural architecture search without training

Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Chunhen Jiang, Jianxi Gao

Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and light-weight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NASBench-101, NASBench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.

5/3/2024

SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation

Chakkrit Termritthikun, Ayaz Umer, Suwichaya Suwanwimolkul, Feng Xia, Ivan Lee

Recent advancements in deep convolutional neural networks have significantly improved the performance of saliency prediction. However, the manual configuration of the neural network architectures requires domain knowledge expertise and can still be time-consuming and error-prone. To solve this, we propose a new Neural Architecture Search (NAS) framework for saliency prediction with two contributions. Firstly, a supernet for saliency prediction is built with a weight-sharing network containing all candidate architectures, by integrating a dynamic convolution into the encoder-decoder in the supernet, termed SalNAS. Secondly, despite the fact that SalNAS is highly efficient (20.98 million parameters), it can suffer from the lack of generalization. To solve this, we propose a self-knowledge distillation approach, termed Self-KD, that trains the student SalNAS with the weighted average information between the ground truth and the prediction from the teacher model. The teacher model, while sharing the same architecture, contains the best-performing weights chosen by cross-validation. Self-KD can generalize well without the need to compute the gradient in the teacher model, enabling an efficient training system. By utilizing Self-KD, SalNAS outperforms other state-of-the-art saliency prediction models in most evaluation rubrics across seven benchmark datasets while being a lightweight model. The code will be available at https://github.com/chakkritte/SalNAS

7/30/2024