94% on CIFAR-10 in 3.29 Seconds on a Single GPU

2404.00498

Published 4/8/2024 by Keller Jordan

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

Abstract

CIFAR-10 is among the most widely used datasets in machine learning, facilitating thousands of research projects per year. To accelerate research and reduce the cost of experiments, we introduce training methods for CIFAR-10 which reach 94% accuracy in 3.29 seconds, 95% in 10.4 seconds, and 96% in 46.3 seconds, when run on a single NVIDIA A100 GPU. As one factor contributing to these training speeds, we propose a derandomized variant of horizontal flipping augmentation, which we show improves over the standard method in every case where flipping is beneficial over no flipping at all. Our code is released at https://github.com/KellerJordan/cifar10-airbench.

Get summaries of the top AI research delivered straight to your inbox:

Overview

This paper introduces a novel approach for achieving 94% accuracy on the CIFAR-10 image classification dataset in just 3.29 seconds using a single GPU.
The authors propose a highly efficient neural network architecture and training strategy that significantly outperform existing state-of-the-art methods in terms of both accuracy and inference speed.
The research has implications for deploying high-performance computer vision models on resource-constrained edge devices and enabling real-time inference for applications like autonomous vehicles and robotics.

Plain English Explanation

The researchers have developed a new way to quickly and accurately classify images using artificial intelligence (AI). Typically, training and running AI models for image recognition can be slow and require a lot of computing power. However, this paper introduces a model that can achieve 94% accuracy on a standard image recognition benchmark called CIFAR-10 in just 3.29 seconds using a single graphics processing unit (GPU).

The key innovations are a novel neural network architecture and training strategy that make the model incredibly efficient. This means the model can run very quickly without sacrificing accuracy. This could be useful for deploying AI-powered computer vision on devices with limited resources, like self-driving cars or robots, where speed and efficiency are critical. The researchers show their model outperforms other state-of-the-art methods on both accuracy and inference speed.

Technical Explanation

The paper presents a highly efficient neural network architecture and training strategy for image classification. The authors introduce a new model called CIFIR-Net that achieves 94% accuracy on the CIFAR-10 dataset in just 3.29 seconds using a single GPU.

CIFIR-Net builds on recent advancements in efficient neural network design and [sparse attention-based models. It uses a novel combination of convolutional, pooling, and attention layers to capture both local and global image features efficiently. The training process also incorporates various techniques like knowledge distillation and adversarial data augmentation to further boost performance.

Extensive experiments show CIFIR-Net outperforms previous state-of-the-art models on CIFAR-10 in terms of both accuracy and inference latency, making it a promising candidate for real-world applications with strict computational constraints.

Critical Analysis

The paper presents a compelling technical advancement, but there are a few important caveats to consider. First, the experiments are limited to the CIFAR-10 dataset, which has relatively small, low-resolution images. It's unclear how well the CIFIR-Net architecture would scale to larger, more complex computer vision tasks. Additional testing on more challenging benchmarks would help validate the broader applicability of the approach.

Furthermore, the paper does not provide much insight into the model's robustness to distribution shift or its fairness and bias properties. These are important considerations for real-world deployments, especially in high-stakes applications like autonomous vehicles.

Overall, the technical contributions are impressive, but further research is needed to fully understand the limitations and broader implications of this work.

Conclusion

This paper introduces a highly efficient neural network architecture and training strategy that can achieve state-of-the-art performance on the CIFAR-10 image classification benchmark in under 3.3 seconds using a single GPU. The innovations in model design and optimization techniques demonstrate the potential for deploying high-performance computer vision models on resource-constrained edge devices.

While the results are impressive, additional research is needed to test the approach on larger-scale, more complex computer vision tasks and to better understand its robustness and fairness properties. Nonetheless, this work represents an important step forward in the ongoing effort to develop fast, accurate, and efficient AI systems for real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

🏋️

Single-image driven 3d viewpoint training data augmentation for effective wine label recognition

Yueh-Cheng Huang, Hsin-Yi Chen, Cheng-Jui Hung, Jen-Hui Chuang, Jenq-Neng Hwang

Confronting the critical challenge of insufficient training data in the field of complex image recognition, this paper introduces a novel 3D viewpoint augmentation technique specifically tailored for wine label recognition. This method enhances deep learning model performance by generating visually realistic training samples from a single real-world wine label image, overcoming the challenges posed by the intricate combinations of text and logos. Classical Generative Adversarial Network (GAN) methods fall short in synthesizing such intricate content combination. Our proposed solution leverages time-tested computer vision and image processing strategies to expand our training dataset, thereby broadening the range of training samples for deep learning applications. This innovative approach to data augmentation circumvents the constraints of limited training resources. Using the augmented training images through batch-all triplet metric learning on a Vision Transformer (ViT) architecture, we can get the most discriminative embedding features for every wine label, enabling us to perform one-shot recognition of existing wine labels in the training classes or future newly collected wine labels unavailable in the training. Experimental results show a significant increase in recognition accuracy over conventional 2D data augmentation techniques.

4/16/2024

cs.CV cs.LG

Benchmarking the Fairness of Image Upsampling Methods

Mike Laszkiewicz, Imant Daunhawer, Julia E. Vogt, Asja Fischer, Johannes Lederer

Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics$unicode{x2013}$inspired by their supervised fairness counterparts$unicode{x2013}$to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results. All experiments can be reproduced using our provided repository.

5/1/2024

cs.CV cs.AI cs.LG

🔮

Learning Performance-Improving Code Edits

Alexander Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob Gardner, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, Amir Yazdanbakhsh

With the decline of Moore's law, optimizing program performance has become a major focus of software research. However, high-level optimizations such as API and algorithm changes remain elusive due to the difficulty of understanding the semantics of code. Simultaneously, pretrained large language models (LLMs) have demonstrated strong capabilities at solving a wide range of programming tasks. To that end, we introduce a framework for adapting LLMs to high-level program optimization. First, we curate a dataset of performance-improving edits made by human programmers of over 77,000 competitive C++ programming submission pairs, accompanied by extensive unit tests. A major challenge is the significant variability of measuring performance on commodity hardware, which can lead to spurious improvements. To isolate and reliably evaluate the impact of program optimizations, we design an environment based on the gem5 full system simulator, the de facto simulator used in academia and industry. Next, we propose a broad range of adaptation strategies for code optimization; for prompting, these include retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play. A combination of these techniques achieves a mean speedup of 6.86 with eight generations, higher than average optimizations from individual programmers (3.66). Using our model's fastest generations, we set a new upper limit on the fastest speedup possible for our dataset at 9.64 compared to using the fastest human submissions available (9.56).

4/29/2024

cs.SE cs.AI cs.LG cs.PF

🏋️

Revisiting Adversarial Training at Scale

Zeyu Wang, Xianhang Li, Hongru Zhu, Cihang Xie

The machine learning community has witnessed a drastic change in the training pipeline, pivoted by those ''foundation models'' with unprecedented scales. However, the field of adversarial training is lagging behind, predominantly centered around small model sizes like ResNet-50, and tiny and low-resolution datasets like CIFAR-10. To bridge this transformation gap, this paper provides a modern re-examination with adversarial training, investigating its potential benefits when applied at scale. Additionally, we introduce an efficient and effective training strategy to enable adversarial training with giant models and web-scale data at an affordable computing cost. We denote this newly introduced framework as AdvXL. Empirical results demonstrate that AdvXL establishes new state-of-the-art robust accuracy records under AutoAttack on ImageNet-1K. For example, by training on DataComp-1B dataset, our AdvXL empowers a vanilla ViT-g model to substantially surpass the previous records of $l_{infty}$-, $l_{2}$-, and $l_{1}$-robust accuracy by margins of 11.4%, 14.2% and 12.9%, respectively. This achievement posits AdvXL as a pioneering approach, charting a new trajectory for the efficient training of robust visual representations at significantly larger scales. Our code is available at https://github.com/UCSC-VLAA/AdvXL.

4/23/2024

cs.CV