Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

2405.13692

YC

0

Reddit

0

Published 5/24/2024 by Sergei Krutikov (Booking.com), Bulat Khaertdinov (Maastricht University), Rodion Kiriukhin (Booking.com), Shubham Agrawal (Booking.com), Kees Jan De Vries (Booking.com)

🔎

Abstract

Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection. Our study is additionally motivated by the problem of selection bias, often occurring in real-life fraud detection systems. It is caused by the production system affecting which subset of traffic becomes labeled. This issue is typically addressed by sampling randomly a small part of the whole production data, referred to as a Control Group. This subset follows a target distribution of production data and therefore is usually preferred for training classification models with standard ML algorithms. Our methodology leverages the capabilities of Transformers to learn transferable representations using all available data by means of SSL, giving it an advantage over classical methods. Furthermore, we conduct large-scale experiments, pre-training tabular Transformers on vast amounts of data instances and fine-tuning them on smaller target datasets. The proposed approach outperforms heavily tuned GBDTs by a considerable margin of the Average Precision (AP) score. Pre-trained models show more consistent performance than the ones trained from scratch when fine-tuning data is limited. Moreover, they require noticeably less labeled data for reaching performance comparable to their GBDT competitor that utilizes the whole dataset.

Create account to get full access

or

If you already have an account, we'll log you in

Overview

  • Transformers, powered by self-supervised learning, have demonstrated impressive performance across various domains.
  • However, existing literature suggests that tabular Transformers may struggle to outperform classical machine learning algorithms like Gradient Boosted Decision Trees (GBDT).
  • This paper aims to challenge GBDTs with tabular Transformers on a fraud detection task, a common problem in e-commerce.
  • The study also addresses the issue of selection bias, which often occurs in real-world fraud detection systems.

Plain English Explanation

Transformers, a type of neural network, have recently shown remarkable capabilities in many areas. However, some research indicates that when working with tabular data (structured data in rows and columns), Transformers may not always outperform traditional machine learning methods like Gradient Boosted Decision Trees (GBDTs).

In this paper, the researchers wanted to see if they could change that. They decided to pit tabular Transformers against GBDTs in a fraud detection task, which is a common problem faced by e-commerce companies. Fraud detection is challenging because of a problem called "selection bias." This means that the data used to train the models may not fully represent the real-world distribution of fraudulent and non-fraudulent transactions.

The researchers believed that Transformers, with their ability to learn from large amounts of unlabeled data through a technique called self-supervised learning, could have an advantage over classical methods in this scenario. By pre-training the Transformers on lots of data and then fine-tuning them on smaller, more targeted datasets, the researchers hoped to outperform the heavily tuned GBDTs.

Technical Explanation

The researchers conducted large-scale experiments, pre-training tabular Transformers on vast amounts of data and then fine-tuning them on smaller target datasets for the fraud detection task. The key idea was to leverage the Transformers' ability to learn transferable representations using self-supervised learning, giving them an advantage over classical methods like GBDTs.

The proposed approach outperformed heavily tuned GBDTs by a considerable margin in terms of the Average Precision (AP) score, a common metric for evaluating fraud detection models. The pre-trained Transformer models also showed more consistent performance than those trained from scratch when the fine-tuning data was limited. Moreover, the pre-trained Transformers required noticeably less labeled data to reach performance comparable to their GBDT counterpart that utilized the entire dataset.

Critical Analysis

The paper provides a compelling approach to addressing the challenge of selection bias in real-world fraud detection systems. By leveraging the power of self-supervised learning, the Transformer-based models were able to learn more robust and transferable representations from the available data, outperforming the traditional GBDT approach.

However, the paper does not explore the potential limitations of this approach. For example, it would be interesting to understand how the Transformer models perform on datasets with different characteristics, such as varying degrees of imbalance or feature sparsity. Additionally, the researchers did not delve into the interpretability of the Transformer-based models, which could be an important consideration for real-world deployment in high-stakes applications like fraud detection.

Further research could also investigate the trade-offs between the computational complexity and inference speed of the Transformer models compared to the GBDT approach, as this could be a critical factor in production environments.

Conclusion

This paper presents a promising approach to challenging traditional machine learning methods like GBDTs in the domain of fraud detection, a common problem faced by e-commerce companies. By leveraging the power of self-supervised learning, the researchers were able to develop tabular Transformer models that outperformed heavily tuned GBDTs, even when the fine-tuning data was limited.

While the results are encouraging, the paper also highlights the need for further exploration of the limitations and trade-offs of this approach. Nonetheless, the successful application of Transformers to tabular data tasks, such as fraud detection, represents an important advancement in the field of machine learning and its potential to solve real-world problems.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Credit Card Fraud Detection Using Advanced Transformer Model

Credit Card Fraud Detection Using Advanced Transformer Model

Chang Yu, Yongshun Xu, Jin Cao, Ye Zhang, Yinxin Jin, Mengran Zhu

YC

0

Reddit

0

With the proliferation of various online and mobile payment systems, credit card fraud has emerged as a significant threat to financial security. This study focuses on innovative applications of the latest Transformer models for more robust and precise fraud detection. To ensure the reliability of the data, we meticulously processed the data sources, balancing the dataset to address the issue of data sparsity significantly. We also selected highly correlated vectors to strengthen the training process.To guarantee the reliability and practicality of the new Transformer model, we conducted performance comparisons with several widely adopted models, including Support Vector Machine (SVM), Random Forest, Neural Network, and Logistic Regression. We rigorously compared these models using metrics such as Precision, Recall, and F1 Score. Through these detailed analyses and comparisons, we present to the readers a highly efficient and powerful anti-fraud mechanism with promising prospects. The results demonstrate that the Transformer model not only excels in traditional applications but also shows great potential in niche areas like fraud detection, offering a substantial advancement in the field.

Read more

6/26/2024

🧠

Tabdoor: Backdoor Vulnerabilities in Transformer-based Neural Networks for Tabular Data

Bart Pleiter, Behrad Tajalli, Stefanos Koffas, Gorka Abad, Jing Xu, Martha Larson, Stjepan Picek

YC

0

Reddit

0

Deep Neural Networks (DNNs) have shown great promise in various domains. Alongside these developments, vulnerabilities associated with DNN training, such as backdoor attacks, are a significant concern. These attacks involve the subtle insertion of triggers during model training, allowing for manipulated predictions. More recently, DNNs for tabular data have gained increasing attention due to the rise of transformer models. Our research presents a comprehensive analysis of backdoor attacks on tabular data using DNNs, mainly focusing on transformers. We also propose a novel approach for trigger construction: an in-bounds attack, which provides excellent attack performance while maintaining stealthiness. Through systematic experimentation across benchmark datasets, we uncover that transformer-based DNNs for tabular data are highly susceptible to backdoor attacks, even with minimal feature value alterations. We also verify that our attack can be generalized to other models, like XGBoost and DeepFM. Our results demonstrate up to 100% attack success rate with negligible clean accuracy drop. Furthermore, we evaluate several defenses against these attacks, identifying Spectral Signatures as the most effective. Nevertheless, our findings highlight the need to develop tabular data-specific countermeasures to defend against backdoor attacks.

Read more

4/29/2024

🤷

ExcelFormer: Can a DNN be a Sure Bet for Tabular Prediction?

Jintai Chen, Jiahuan Yan, Qiyuan Chen, Danny Ziyi Chen, Jian Wu, Jimeng Sun

YC

0

Reddit

0

Data organized in tabular format is ubiquitous in real-world applications, and users often craft tables with biased feature definitions and flexibly set prediction targets of their interests. Thus, a rapid development of a robust, effective, dataset-versatile, user-friendly tabular prediction approach is highly desired. While Gradient Boosting Decision Trees (GBDTs) and existing deep neural networks (DNNs) have been extensively utilized by professional users, they present several challenges for casual users, particularly: (i) the dilemma of model selection due to their different dataset preferences, and (ii) the need for heavy hyperparameter searching, failing which their performances are deemed inadequate. In this paper, we delve into this question: Can we develop a deep learning model that serves as a sure bet solution for a wide range of tabular prediction tasks, while also being user-friendly for casual users? We delve into three key drawbacks of deep tabular models, encompassing: (P1) lack of rotational variance property, (P2) large data demand, and (P3) over-smooth solution. We propose ExcelFormer, addressing these challenges through a semi-permeable attention module that effectively constrains the influence of less informative features to break the DNNs' rotational invariance property (for P1), data augmentation approaches tailored for tabular data (for P2), and attentive feedforward network to boost the model fitting capability (for P3). These designs collectively make ExcelFormer a sure bet solution for diverse tabular datasets. Extensive and stratified experiments conducted on real-world datasets demonstrate that our model outperforms previous approaches across diverse tabular data prediction tasks, and this framework can be friendly to casual users, offering ease of use without the heavy hyperparameter tuning.

Read more

5/27/2024

Gradient Boosted Filters For Signal Processing

Gradient Boosted Filters For Signal Processing

Jose A. Lopez, Georg Stemmer, Hector A. Cordourier

YC

0

Reddit

0

Gradient boosted decision trees have achieved remarkable success in several domains, particularly those that work with static tabular data. However, the application of gradient boosted models to signal processing is underexplored. In this work, we introduce gradient boosted filters for dynamic data, by employing Hammerstein systems in place of decision trees. We discuss the relationship of our approach to the Volterra series, providing the theoretical underpinning for its application. We demonstrate the effective generalizability of our approach with examples.

Read more

5/16/2024