TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases

Read original: arXiv:2408.07579 - Published 8/15/2024 by Thibault Simonetto, Salah Ghamizi, Maxime Cordy


Overview

This paper introduces TabularBench, a benchmark for evaluating the adversarial robustness of tabular deep learning models in real-world use cases.
Tabular data is common in many industries, but current adversarial robustness research has largely focused on image and text data.
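TabularBench provides curated real-world tabular datasets augmented with realistic synthetic inputs, a standard attack for evaluation, more than 200 pre-trained models, and an open benchmark that accepts submissions of new models and defenses.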

Plain English Explanation

TabularBench aims to make deep learning models that work with tabular data more secure and reliable. Tabular data is very common in fields like finance, healthcare, and business.

Tabular data is organized in rows and columns, like a spreadsheet, and differs from the image and text data that most adversarial robustness research has focused on. The researchers created TabularBench to provide a standardized way to test how well tabular deep learning models withstand adversarial attacks: small, carefully crafted changes to the input data that trick the model into making mistakes. For tabular data, those changes must also stay realistic, for example by leaving immutable features untouched and respecting the relationships between features.
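To make the idea of a "small, carefully crafted change" concrete, here is a minimal Python sketch. It is not from the paper: it uses a hypothetical toy linear classifier, and the weights, bias, feature values, and epsilon are invented for illustration. Every feature moves by at most 0.2, yet the predicted class flips.

```python
# Toy illustration only: a linear classifier and a worst-case
# L-infinity perturbation. All numbers are hypothetical.

def score(weights, bias, row):
    """Linear decision score; the model predicts class 1 when score > 0."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def sign(v):
    """Return 1, 0, or -1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def linf_attack(weights, row, epsilon):
    """Move each feature by epsilon in the direction that lowers the score
    (features with zero weight are left alone). For a linear model this is
    the strongest change allowed by an L-infinity budget of epsilon; for a
    class-1 sample it matches the step FGSM would take."""
    return [x - epsilon * sign(w) for w, x in zip(weights, row)]


weights = [0.8, -0.5, 0.3]  # hypothetical trained weights
bias = -0.2
row = [0.6, 0.4, 0.5]       # hypothetical normalized feature values

adv_row = linf_attack(weights, row, epsilon=0.2)
print(score(weights, bias, row) > 0)      # True  -> predicted class 1
print(score(weights, bias, adv_row) > 0)  # False -> predicted class 0
```

Real tabular attacks cannot stop here: the perturbed row must also satisfy the domain constraints mentioned above, which this toy example ignores.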

By evaluating models on a diverse set of real-world tabular datasets and attack methods, TabularBench aims to help researchers and practitioners develop more robust and trustworthy AI systems for important applications.

Technical Explanation

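TabularBench covers five critical use cases in finance, healthcare, and security. For each use case, the authors curated a real dataset and augmented it with hundreds of thousands of realistic synthetic inputs. Rather than designing new attacks, the benchmark evaluates robustness with CAA, an ensemble of a gradient attack (CAPGD) and a search attack (MOEVA) that the same authors previously showed to be the most effective attack against tabular models. CAA is built to handle the properties that set tabular data apart, such as categorical features, immutable features, and constraints on the relationships between features.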
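One way to see why tabular attacks need special handling is a "repair" step that pulls a perturbed row back into the feasible set. The Python sketch below is hypothetical: the schema (feature names, bounds, integer and immutable flags, and the `total = part_a + part_b` relationship) and the `repair` function are invented for illustration and are not TabularBench's or CAA's actual constraint machinery.

```python
# Hypothetical sketch: repair a perturbed tabular row so it stays feasible.
# Schema, bounds, and the relationship constraint are invented examples.

SCHEMA = {
    "age":     {"immutable": True,  "integer": True,  "bounds": (18, 100)},
    "n_loans": {"immutable": False, "integer": True,  "bounds": (0, 20)},
    "part_a":  {"immutable": False, "integer": False, "bounds": (0.0, 1e6)},
    "part_b":  {"immutable": False, "integer": False, "bounds": (0.0, 1e6)},
    "total":   {"immutable": False, "integer": False, "bounds": (0.0, 2e6)},
}


def repair(original, perturbed, epsilon):
    """Keep immutable features fixed, clip each change to [-epsilon, epsilon],
    clip to the valid range, round integer features, then recompute the
    derived feature. Epsilon is in raw units here for simplicity; real
    attacks usually work on normalized features. Rounding and the recomputed
    total can land slightly outside the epsilon bound; this sketch does not
    reconcile that."""
    fixed = {}
    for name, spec in SCHEMA.items():
        if spec["immutable"]:
            fixed[name] = original[name]
            continue
        delta = max(-epsilon, min(epsilon, perturbed[name] - original[name]))
        low, high = spec["bounds"]
        value = max(low, min(high, original[name] + delta))
        fixed[name] = round(value) if spec["integer"] else value
    # Relationship constraint: the derived feature must equal its parts.
    fixed["total"] = fixed["part_a"] + fixed["part_b"]
    return fixed


original = {"age": 35, "n_loans": 2, "part_a": 1000.0, "part_b": 500.0, "total": 1500.0}
perturbed = {"age": 29, "n_loans": 2.4, "part_a": 1003.0, "part_b": 499.5, "total": 1400.0}
print(repair(original, perturbed, epsilon=2.0))
# {'age': 35, 'n_loans': 2, 'part_a': 1002.0, 'part_b': 499.5, 'total': 1501.5}
```

The attack's requested change to `age` is discarded, `part_a` is held to the budget, and `total` is recomputed so the row remains internally consistent.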

To evaluate robustness, the researchers implemented 7 robustification mechanisms inspired by state-of-the-art defenses from computer vision and trained more than 200 models, with and without data augmentation. They then measured each model's accuracy under attack and compared defended models against undefended ones.
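Accuracy under attack is usually reported as robust accuracy. The Python sketch below assumes a common convention for an ensemble such as CAA: a sample counts as robust only if the model classifies it correctly and no attack in the ensemble breaks it. The model, the attacks, and all helper names are hypothetical toys, not the benchmark's API.

```python
# Hypothetical sketch of clean vs. robust accuracy bookkeeping.
from typing import Callable, List, Optional, Sequence

Row = List[float]
Model = Callable[[Row], int]
Attack = Callable[[Model, Row, int], Optional[Row]]  # candidate row or None


def clean_accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int]) -> float:
    correct = sum(model(x) == y for x, y in zip(rows, labels))
    return correct / len(rows)


def attack_succeeds(model: Model, attack: Attack, x: Row, y: int) -> bool:
    """True if the attack returns a row that the model misclassifies."""
    adv = attack(model, x, y)
    return adv is not None and model(adv) != y


def robust_accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int],
                    attacks: Sequence[Attack]) -> float:
    """A sample is robust only if it is classified correctly and
    no attack in the ensemble breaks it."""
    robust = 0
    for x, y in zip(rows, labels):
        if model(x) != y:
            continue  # already misclassified without any attack
        if any(attack_succeeds(model, a, x, y) for a in attacks):
            continue
        robust += 1
    return robust / len(rows)


# Usage with a toy threshold model and two toy attacks.
def model(row):
    return int(row[0] > 0.5)


def no_op(model, row, label):
    return None  # an attack that never finds anything


def shift(model, row, label, eps=0.1):
    # Push the first feature by eps toward the opposite class.
    step = -eps if label == 1 else eps
    return [row[0] + step] + row[1:]


rows = [[0.9, 0.0], [0.55, 0.0], [0.2, 0.0], [0.45, 0.0]]
labels = [1, 1, 0, 1]
print(clean_accuracy(model, rows, labels))                   # 0.75
print(robust_accuracy(model, rows, labels, [no_op, shift]))  # 0.5
```

The gap between the two numbers is the accuracy drop that the benchmark reports for each model and defense.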

The results show that current tabular deep learning models are highly vulnerable to adversarial attacks, with significant drops in accuracy even for small perturbations to the input data. The researchers also found that adversarial training, a common defense technique, provided only limited improvements in robustness for these real-world tabular use cases.

Critical Analysis

The authors acknowledge several limitations of their work. First, the TabularBench datasets, while diverse, may not capture the full complexity of real-world tabular data. Second, the evaluation relies on CAA, which, although the strongest attack currently known against tabular models, may not be exhaustive; more sophisticated attack methods could still be developed.

The authors also note that the performance of adversarial defenses, such as adversarial training, could be improved with further research and development. The limited effectiveness of these techniques observed in the experiments suggests that new approaches may be needed to enhance the robustness of tabular deep learning models.

Further research is also needed to understand the underlying vulnerabilities of tabular deep learning models and develop more targeted and effective defense mechanisms. The authors encourage the community to build on the TabularBench framework to advance the state of the art in this important area.

Conclusion

TabularBench provides a valuable benchmark for assessing the adversarial robustness of tabular deep learning models in real-world use cases. The findings highlight the significant vulnerability of these models to adversarial attacks, which is a critical concern for the deployment of AI systems in high-stakes domains like finance, healthcare, and business.

By establishing a standardized evaluation framework and identifying key research challenges, this work lays the groundwork for developing more robust and trustworthy tabular deep learning models. Continued research in this area has the potential to improve the safety and reliability of AI-powered decision-making, with far-reaching implications for society.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases

Thibault Simonetto, Salah Ghamizi, Maxime Cordy

While adversarial robustness in computer vision is a mature research field, fewer researchers have tackled the evasion attacks against tabular deep learning, and even fewer investigated robustification mechanisms and reliable defenses. We hypothesize that this lag in the research on tabular adversarial attacks is in part due to the lack of standardized benchmarks. To fill this gap, we propose TabularBench, the first comprehensive benchmark of robustness of tabular deep learning classification models. We evaluated adversarial robustness with CAA, an ensemble of gradient and search attacks which was recently demonstrated as the most effective attack against a tabular model. In addition to our open benchmark (https://github.com/serval-uni-lu/tabularbench) where we welcome submissions of new models and defenses, we implement 7 robustification mechanisms inspired by state-of-the-art defenses in computer vision and propose the largest benchmark of robust tabular deep learning over 200 models across five critical scenarios in finance, healthcare and security. We curated real datasets for each use case, augmented with hundreds of thousands of realistic synthetic inputs, and trained and assessed our models with and without data augmentations. We open-source our library that provides API access to all our pre-trained robust tabular models, and the largest datasets of real and synthetic tabular inputs. Finally, we analyze the impact of various defenses on the robustness and provide actionable insights to design new defenses and robustification mechanisms.

8/15/2024

Constrained Adaptive Attack: Effective Adversarial Attack Against Deep Neural Networks for Tabular Data

Thibault Simonetto, Salah Ghamizi, Maxime Cordy

State-of-the-art deep learning models for tabular data have recently achieved acceptable performance to be deployed in industrial settings. However, the robustness of these models remains scarcely explored. Contrary to computer vision, there are no effective attacks to properly evaluate the adversarial robustness of deep tabular models due to intrinsic properties of tabular data, such as categorical features, immutability, and feature relationship constraints. To fill this gap, we first propose CAPGD, a gradient attack that overcomes the failures of existing gradient attacks with adaptive mechanisms. This new attack does not require parameter tuning and further degrades the accuracy, up to 81% points compared to the previous gradient attacks. Second, we design CAA, an efficient evasion attack that combines our CAPGD attack and MOEVA, the best search-based attack. We demonstrate the effectiveness of our attacks on five architectures and four critical use cases. Our empirical study demonstrates that CAA outperforms all existing attacks in 17 of the 20 settings, and leads to a drop in the accuracy by up to 96.1% points and 21.9% points compared to CAPGD and MOEVA respectively while being up to five times faster than MOEVA. Given the effectiveness and efficiency of our new attacks, we argue that they should become the minimal test for any new defense or robust architectures in tabular machine learning.

6/4/2024
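The abstract above says CAA combines a gradient attack with a search attack and is faster than MOEVA alone. A plausible reading, assumed here rather than confirmed by this summary, is a cascade: run the cheap attack on every sample and spend the expensive search only on the samples the first stage could not break. In the Python sketch below, `fast_attack` and `slow_attack` are hypothetical stand-ins, not CAPGD or MOEVA.

```python
# Hypothetical sketch of a two-stage attack cascade in the spirit of CAA.
from typing import Callable, List, Optional, Sequence, Tuple

Row = List[float]
Model = Callable[[Row], int]
Attack = Callable[[Model, Row, int], Optional[Row]]


def cascade(model: Model, rows: Sequence[Row], labels: Sequence[int],
            fast_attack: Attack, slow_attack: Attack) -> Tuple[List[Optional[Row]], int]:
    """Return one entry per sample (a misclassified row, or None if neither
    stage succeeded) and the number of times the slow attack was run."""
    results: List[Optional[Row]] = []
    slow_calls = 0
    for x, y in zip(rows, labels):
        if model(x) != y:
            results.append(x)  # the clean row already fools the model
            continue
        adv = fast_attack(model, x, y)
        if adv is not None and model(adv) != y:
            results.append(adv)
            continue
        slow_calls += 1  # only samples that survived the fast stage get here
        adv = slow_attack(model, x, y)
        results.append(adv if adv is not None and model(adv) != y else None)
    return results, slow_calls


def model(row):
    return int(row[0] > 0.5)


def fast_attack(model, row, label):
    # Stand-in for a gradient attack: one small fixed step toward the other class.
    step = -0.05 if label == 1 else 0.05
    return [row[0] + step] + row[1:]


def slow_attack(model, row, label, budget=0.2, n_steps=20):
    # Stand-in for a search attack: try many points within the same budget.
    direction = -1 if label == 1 else 1
    for k in range(1, n_steps + 1):
        candidate = [row[0] + direction * budget * k / n_steps] + row[1:]
        if model(candidate) != label:
            return candidate
    return None


results, slow_calls = cascade(model, [[0.52], [0.65], [0.9]], [1, 1, 1],
                              fast_attack, slow_attack)
print(slow_calls)   # 2
print(results[2])   # None
```

Only two of the three samples reach the slow stage, which is where the speed advantage of such a cascade would come from.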

Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis

Zhipeng He, Chun Ouyang, Laith Alzubaidi, Alistair Barros, Catarina Moreira

Adversarial attacks are a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied in unstructured data like images, applying them to tabular data poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular data, which differ from the image data. To account for this distinction, it is necessary to establish tailored imperceptibility criteria specific to tabular data. However, there is currently a lack of standardised metrics for assessing the imperceptibility of adversarial attacks on tabular data. To address this gap, we propose a set of key properties and corresponding metrics designed to comprehensively characterise imperceptible adversarial attacks on tabular data. These are: proximity to the original input, sparsity of altered features, deviation from the original data distribution, sensitivity in perturbing features with narrow distribution, immutability of certain features that should remain unchanged, feasibility of specific feature values that should not go beyond valid practical ranges, and feature interdependencies capturing complex relationships between data attributes. We evaluate the imperceptibility of five adversarial attacks, including both bounded attacks and unbounded attacks, on tabular data using the proposed imperceptibility metrics. The results reveal a trade-off between the imperceptibility and effectiveness of these attacks. The study also identifies limitations in current attack algorithms, offering insights that can guide future research in the area. The findings gained from this empirical analysis provide valuable direction for enhancing the design of adversarial attack algorithms, thereby advancing adversarial machine learning on tabular data.

8/22/2024
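Three of the properties listed above (proximity, sparsity, and immutability) are easy to sketch. The Python below uses simple, assumed definitions (Euclidean distance, a count of changed features, and a count of changed immutable features); the paper's exact formulas may differ, for example by normalizing features first.

```python
# Hypothetical sketch of three imperceptibility checks for one adversarial row.
import math
from typing import Iterable, Sequence


def proximity_l2(original: Sequence[float], adversarial: Sequence[float]) -> float:
    """Euclidean distance between the original and adversarial rows."""
    return math.sqrt(sum((a - o) ** 2 for o, a in zip(original, adversarial)))


def sparsity(original: Sequence[float], adversarial: Sequence[float],
             tol: float = 1e-9) -> int:
    """Number of features that were changed (lower means a sparser attack)."""
    return sum(abs(a - o) > tol for o, a in zip(original, adversarial))


def immutability_violations(original: Sequence[float], adversarial: Sequence[float],
                            immutable_idx: Iterable[int], tol: float = 1e-9) -> int:
    """Number of features that should have stayed fixed but changed."""
    return sum(abs(adversarial[i] - original[i]) > tol for i in immutable_idx)


original = [1.0, 2.0, 3.0, 4.0]
adversarial = [1.0, 2.3, 3.0, 3.6]
print(round(proximity_l2(original, adversarial), 6))           # 0.5
print(sparsity(original, adversarial))                         # 2
print(immutability_violations(original, adversarial, [0, 3]))  # 1
```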

A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets

Assaf Shmuel, Oren Glickman, Teddy Lazebnik

The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this area. Previous comparative benchmarks have shown that DL performance is frequently equivalent or even inferior to models such as Gradient Boosting Machines (GBMs). In this study, we introduce a comprehensive benchmark aimed at better characterizing the types of datasets where DL models excel. Although several important benchmarks for tabular datasets already exist, our contribution lies in the variety and depth of our comparison: we evaluate 111 datasets with 20 different models, including both regression and classification tasks. These datasets vary in scale and include both those with and without categorical variables. Importantly, our benchmark contains a sufficient number of datasets where DL models perform best, allowing for a thorough analysis of the conditions under which DL models excel. Building on the results of this benchmark, we train a model that predicts scenarios where DL models outperform alternative methods with 86.1% accuracy (AUC 0.78). We present insights derived from this characterization and compare these findings to previous benchmarks.

8/28/2024