GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

arXiv:2308.03314, published 5/7/2024, by Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, Yang Liu

Overview

  • Smart contracts, which are self-executing contracts with the terms of the agreement directly written into code, are prone to various vulnerabilities that can lead to substantial financial losses over time.
  • Current analysis tools primarily target vulnerabilities with fixed control- or data-flow patterns, such as re-entrancy and integer overflow. However, a recent study found that around 80% of Web3 security bugs cannot be audited by existing tools because those tools lack domain-specific property descriptions and checks.
  • The paper proposes GPTScan, which the authors describe as the first tool to combine large language models (LLMs) like GPT with static analysis for smart contract logic vulnerability detection.

Plain English Explanation

The paper explores how GPT (Generative Pre-training Transformer), a type of large language model, can help identify vulnerabilities in smart contracts. Smart contracts are self-executing digital agreements whose terms are written directly in code.

The problem is that these smart contracts can have vulnerabilities, or weaknesses, that can lead to significant financial losses. Current tools for analyzing smart contracts are limited because they mainly look for specific types of vulnerabilities with predictable patterns, like re-entrancy attacks or integer overflow issues. However, a recent study found that around 80% of the security bugs in Web3 (the decentralized internet built on blockchain technology) cannot be detected by these existing tools.

To address this, the researchers developed a new tool called GPTScan that combines GPT with traditional static code analysis. Relying on GPT alone to identify vulnerabilities could produce many false positives. GPTScan therefore breaks each type of logic vulnerability down into specific scenarios and properties, and uses GPT as a versatile code-understanding tool to check whether a piece of code matches them. Finally, GPTScan asks GPT to point out the key variables and statements behind each suspected issue and checks them with static analysis, which helps weed out false positives.
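
The scenario-then-property idea can be pictured as a two-stage filter. The following is a minimal Python sketch under stated assumptions, not GPTScan's implementation. `VulnType`, `match_candidates`, the question wording, and the "unchecked-transfer" vulnerability type are all hypothetical. The keyword-based `toy_oracle` is a stand-in for GPT so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class VulnType:
    """One logic-vulnerability type, split into two yes/no questions (hypothetical format)."""
    name: str
    scenario: str  # When is a function relevant to this vulnerability type?
    prop: str      # What behavior makes a relevant function vulnerable?

# Any callable that answers a yes/no question about a code snippet.
# A real system would wrap an LLM call here; injecting it keeps the sketch offline.
YesNoOracle = Callable[[str, str], bool]

def match_candidates(functions: Dict[str, str], vuln_types: List[VulnType],
                     ask: YesNoOracle) -> List[Tuple[str, str]]:
    """Return (function name, vulnerability name) pairs that pass both checks."""
    candidates = []
    for fname, code in functions.items():
        for vt in vuln_types:
            if not ask(vt.scenario, code):  # stage 1: skip functions outside the scenario
                continue
            if ask(vt.prop, code):          # stage 2: check the vulnerable property
                candidates.append((fname, vt.name))
    return candidates

# --- Offline demo with a keyword-based stand-in for the LLM (toy only) ---
UNCHECKED = VulnType(
    name="unchecked-transfer",  # hypothetical vulnerability type
    scenario="Does the function transfer tokens?",
    prop="Does it transfer without checking the caller's balance?",
)
TOY_RULES = {
    UNCHECKED.scenario: lambda code: "transfer" in code,
    UNCHECKED.prop: lambda code: "require" not in code,
}

def toy_oracle(question: str, code: str) -> bool:
    return TOY_RULES[question](code)

functions = {
    "safeSend": "function safeSend(uint a) { require(bal[msg.sender] >= a); token.transfer(to, a); }",
    "unsafeSend": "function unsafeSend(uint a) { token.transfer(to, a); }",
    "getOwner": "function getOwner() public view returns (address) { return owner; }",
}
print(match_candidates(functions, [UNCHECKED], toy_oracle))  # [('unsafeSend', 'unchecked-transfer')]
```

Filtering by scenario first means the property question is only asked for functions it actually applies to. In a full pipeline, the surviving candidates would then be passed to static confirmation.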

The researchers evaluated GPTScan on diverse datasets totaling around 400 smart contract projects and 3,000 Solidity files (Solidity is a programming language used for Ethereum smart contracts). GPTScan achieves high precision (over 90%) on token contracts and acceptable precision (57.14%) on larger projects, such as those in the Web3Bugs dataset. Importantly, it found 9 new vulnerabilities that human auditors had missed. GPTScan is also fast and inexpensive, taking only about 14 seconds and $0.01 to scan 1,000 lines of Solidity code.
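
The reported averages (14.39 seconds and $0.01 per thousand lines of Solidity) make a rough budget easy to compute. The sketch below assumes that time and cost scale linearly with line count. This is a simplification, because actual figures depend on how many functions pass each filter and on the model's pricing. `estimate_scan` is a hypothetical helper.

```python
# Per-1,000-line averages reported for GPTScan.
SECONDS_PER_KLOC = 14.39
USD_PER_KLOC = 0.01

def estimate_scan(lines_of_code: int) -> tuple[float, float]:
    """Linear (assumed) estimate of (seconds, USD) to scan `lines_of_code` lines."""
    kloc = lines_of_code / 1000
    return kloc * SECONDS_PER_KLOC, kloc * USD_PER_KLOC

# A hypothetical 50,000-line project: about 719.5 s and 0.50 USD.
secs, usd = estimate_scan(50_000)
print(f"~{secs / 60:.1f} min, ~${usd:.2f}")  # ~12.0 min, ~$0.50
```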

Technical Explanation

The paper proposes GPTScan, a novel tool that combines large language models (LLMs) like GPT with static analysis for smart contract logic vulnerability detection. Existing tools focus on fixed control- or data-flow patterns. GPTScan instead breaks each logic vulnerability type down into specific scenarios and properties, which lets it describe bugs that do not follow such fixed patterns.

GPTScan uses GPT as a versatile code-understanding tool rather than relying on it alone to identify vulnerabilities. Relying on GPT alone produces many false positives and is limited by GPT's pre-trained knowledge. GPTScan uses GPT to match candidate vulnerabilities against each type's scenarios and properties, drawing on the model's grasp of code semantics and context. To improve accuracy, GPTScan further instructs GPT to identify the key variables and statements involved, which are then validated through static confirmation.
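
The static-confirmation step can be illustrated with a deliberately simplified checker. This sketch is not GPTScan's code. It assumes the model returns one key variable and one key statement, and it rejects the claim unless the statement really occurs in the function and the variable is really assigned there. `LlmClaim` and `confirm` are hypothetical names. A real checker would work on a parsed AST or data-flow information rather than on strings and a regular expression.

```python
import re
from dataclasses import dataclass

@dataclass
class LlmClaim:
    """What the model is asked to point out (hypothetical format)."""
    variable: str   # variable the model says the vulnerable logic updates
    statement: str  # statement the model says performs the risky action

def confirm(claim: LlmClaim, func_src: str) -> bool:
    """Toy static confirmation: reject claims that do not match the code.

    1. The claimed statement must literally occur in the function.
    2. The claimed variable must be the target of an assignment
       (`=`, `+=`, `-=`, ..., optionally after one index such as `[k]`),
       not merely compared with `==`, `>=`, and so on.
    """
    if claim.statement not in func_src:
        return False  # the model pointed at code that does not exist
    assign = re.compile(
        rf"\b{re.escape(claim.variable)}\b(\[[^\]]*\])?\s*[-+*/]?=(?!=)"
    )
    return assign.search(func_src) is not None

SRC = """
function withdraw(uint amount) external {
    require(balances[msg.sender] >= amount);
    balances[msg.sender] -= amount;
    payable(msg.sender).transfer(amount);
}
"""
print(confirm(LlmClaim("balances", "payable(msg.sender).transfer(amount)"), SRC))     # True
print(confirm(LlmClaim("totalSupply", "payable(msg.sender).transfer(amount)"), SRC))  # False: never assigned
print(confirm(LlmClaim("balances", "token.burn(amount)"), SRC))                       # False: no such statement
```

The check is intentionally one-sided. It can only discard claims that the code contradicts, so it trims false positives without producing new reports.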

The researchers evaluated GPTScan on diverse datasets comprising around 400 contract projects and 3,000 Solidity files. GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects, such as those in Web3Bugs. It detects ground-truth logic vulnerabilities with a recall of over 70%, including 9 new vulnerabilities missed by human auditors. Scanning takes an average of 14.39 seconds and costs about $0.01 per thousand lines of Solidity code. The static confirmation step eliminates about two-thirds of the false positives.
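
Precision, recall, and the effect of discarding false positives are simple ratios. The sketch below shows that a precision of 57.14% corresponds to 4 true reports out of every 7. This is only the ratio; the paper's actual counts are not given here. The sketch also shows how removing two-thirds of false positives changes precision for an invented set of counts. It assumes the filter never discards a true positive, which the summary does not state.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported issues that are real."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of real issues that get reported."""
    return tp / (tp + fn)

def precision_after_filter(tp: int, fp: int, fp_removed: float = 2 / 3) -> float:
    """Precision after a filter discards a fraction of false positives.

    Assumes (hypothetically) that the filter discards no true positives.
    """
    return tp / (tp + fp * (1 - fp_removed))

print(f"{precision(4, 3):.2%}")                 # 57.14%: 4 real issues per 7 reports
print(f"{recall(7, 3):.2%}")                    # 70.00% (hypothetical counts)
print(f"{precision(10, 30):.2%}")               # 25.00% before filtering (hypothetical counts)
print(f"{precision_after_filter(10, 30):.2%}")  # 50.00% after removing two-thirds of FPs
```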

Critical Analysis

The paper presents a novel approach to smart contract vulnerability detection by leveraging the capabilities of large language models like GPT. The key strength of GPTScan is its ability to go beyond fixed vulnerability patterns and address a broader range of logic vulnerabilities, which constitute the majority of Web3 security bugs.

However, the paper acknowledges that GPTScan's performance, while promising, is still not perfect. The precision for larger projects is relatively lower (57.14%), indicating that there is room for improvement, especially in handling more complex code bases. Additionally, the paper does not provide a detailed analysis of the types of vulnerabilities that GPTScan struggles to detect or the specific scenarios where it falls short.

Further research could explore ways to enhance GPTScan's accuracy, potentially by incorporating more advanced techniques in the static analysis component or by fine-tuning the language model on a larger and more diverse dataset of smart contracts. Additionally, it would be beneficial to conduct a comprehensive comparison with other state-of-the-art vulnerability detection tools to better understand GPTScan's relative strengths and weaknesses.

While the paper demonstrates the potential of combining LLMs and static analysis for smart contract security, it is crucial to continue exploring the limitations and ethical implications of using such systems, especially in the context of high-stakes financial applications. The risk of false positives or missed vulnerabilities could have significant consequences, and the research community should remain vigilant in addressing these challenges.

Conclusion

The paper presents GPTScan, a novel tool that combines large language models like GPT with static analysis to detect logic vulnerabilities in smart contracts. By breaking vulnerability types down into specific scenarios and properties, GPTScan draws on GPT's code-understanding capabilities. It then uses static analysis to validate potential issues and reduce false positives.

The evaluation results show that GPTScan achieves high precision on token contracts and detects ground-truth logic vulnerabilities with a recall of over 70%, including 9 that human auditors missed. This approach represents a significant advancement in smart contract security, addressing the limitations of existing tools that focus on fixed vulnerability patterns.

While GPTScan shows promising results, its accuracy still needs improvement, especially for larger and more complex codebases. Improving the static analysis component and fine-tuning the language model could lead to more robust vulnerability detection. As the use of smart contracts grows, tools like GPTScan could play an important role in ensuring the security and reliability of decentralized applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, Yang Liu

Smart contracts are prone to various vulnerabilities, leading to substantial financial losses over time. Current analysis tools mainly target vulnerabilities with fixed control or data-flow patterns, such as re-entrancy and integer overflow. However, a recent study on Web3 security bugs revealed that about 80% of these bugs cannot be audited by existing tools due to the lack of domain-specific property description and checking. Given recent advances in Large Language Models (LLMs), it is worth exploring how Generative Pre-training Transformer (GPT) could aid in detecting logic vulnerabilities. In this paper, we propose GPTScan, the first tool combining GPT with static analysis for smart contract logic vulnerability detection. Instead of relying solely on GPT to identify vulnerabilities, which can lead to high false positives and is limited by GPT's pre-trained knowledge, we utilize GPT as a versatile code understanding tool. By breaking down each logic vulnerability type into scenarios and properties, GPTScan matches candidate vulnerabilities with GPT. To enhance accuracy, GPTScan further instructs GPT to intelligently recognize key variables and statements, which are then validated by static confirmation. Evaluation on diverse datasets with around 400 contract projects and 3K Solidity files shows that GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects like Web3Bugs. It effectively detects ground-truth logic vulnerabilities with a recall of over 70%, including 9 new vulnerabilities missed by human auditors. GPTScan is fast and cost-effective, taking an average of 14.39 seconds and 0.01 USD to scan per thousand lines of Solidity code. Moreover, static confirmation helps GPTScan reduce two-thirds of false positives.

5/7/2024

Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models

Elijah Pelofske, Vincent Urias, Lorie M. Liebrock

Generative Pre-Trained Transformer models have been shown to be surprisingly effective at a variety of natural language processing tasks -- including generating computer code. We evaluate the effectiveness of open source GPT models for the task of automatic identification of the presence of vulnerable code syntax (specifically targeting C and C++ source code). This task is evaluated on a selection of 36 source code examples from the NIST SARD dataset, which are specifically curated to not contain natural English that indicates the presence, or lack thereof, of a particular vulnerability. The NIST SARD source code dataset contains identified vulnerable lines of source code that are examples of one out of the 839 distinct Common Weakness Enumerations (CWE), allowing for exact quantification of the GPT output classification error rate. A total of 5 GPT models are evaluated, using 10 different inference temperatures and 100 repetitions at each setting, resulting in 5,000 GPT queries per vulnerable source code analyzed. Ultimately, we find that the GPT models that we evaluated are not suitable for fully automated vulnerability scanning because the false positive and false negative rates are too high to likely be useful in practice. However, we do find that the GPT models perform surprisingly well at automated vulnerability detection for some of the test cases, in particular surpassing random sampling, and being able to identify the exact lines of code that are vulnerable albeit at a low success rate. The best performing GPT model result found was Llama-2-70b-chat-hf with inference temperature of 0.1 applied to NIST SARD test case 149165 (which is an example of a buffer overflow vulnerability), which had a binary classification recall score of 1.0 and a precision of 1.0 for correctly and uniquely identifying the vulnerable line of code and the correct CWE number.

8/2/2024

AuditGPT: Auditing Smart Contracts with ChatGPT

Shihao Xia, Shuai Shao, Mengting He, Tingting Yu, Linhai Song, Yiying Zhang

To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each containing a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying smart contracts follow ERCs. Today's practices of such verification are to either manually audit each single contract or use expert-developed, limited-scope program-analysis tools, both of which are far from being effective in identifying ERC rule violations. This paper presents a tool named AuditGPT that leverages large language models (LLMs) to automatically and comprehensively verify ERC rules against smart contracts. To build AuditGPT, we first conduct an empirical study on 222 ERC rules specified in four popular ERCs to understand their content, their security impacts, their specification in natural language, and their implementation in Solidity. Guided by the study, we construct AuditGPT by separating the large, complex auditing process into small, manageable tasks and design prompts specialized for each ERC rule type to enhance LLMs' auditing performance. In the evaluation, AuditGPT successfully pinpoints 418 ERC rule violations and only reports 18 false positives, showcasing its effectiveness and accuracy. Moreover, AuditGPT beats an auditing service provided by security experts in effectiveness, accuracy, and cost, demonstrating its advancement over state-of-the-art smart-contract auditing practices.

4/9/2024

PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

Ye Liu, Yue Xue, Daoyuan Wu, Yuqiang Sun, Yi Li, Miaolei Shi, Yang Liu

With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for LLM-based in-context learning to generate a new property for a given code. While this basic process is relatively straightforward, ensuring that the generated properties are (i) compilable, (ii) appropriate, and (iii) runtime-verifiable presents challenges. To address (i), we use the compilation and static analysis feedback as an external oracle to guide LLMs in iteratively revising the generated properties. For (ii), we consider multiple dimensions of similarity to rank the properties and employ a weighted algorithm to identify the top-K properties as the final result. For (iii), we design a dedicated prover to formally verify the correctness of the generated properties. We have implemented these strategies into a novel system called PropertyGPT, with 623 human-written properties collected from 23 Certora projects. Our experiments show that PropertyGPT can generate comprehensive and high-quality properties, achieving an 80% recall compared to the ground truth. It successfully detected 26 CVEs/attack incidents out of 37 tested and also uncovered 12 zero-day vulnerabilities, resulting in $8,256 bug bounty rewards.

5/7/2024