Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models

Read original: arXiv:2404.15681 - Published 7/11/2024 by Elijah Pelofske, Vincent Urias, Lorie M. Liebrock


Overview

This research paper explores the automated creation of source code variants of a cryptographic hash function implementation using generative pre-trained transformer (GPT) models.
The researchers investigate whether GPT models can generate novel and correct rewrites of a reference C implementation of the SHA-1 cryptographic hash function, and how often the rewrites turn out to be incorrect or insecure.
The goal is to showcase the potential of GPT models in automating the software engineering task of creating source code variants, which can be useful for software testing, security analysis, and other applications.

Plain English Explanation

In this paper, the researchers used a type of artificial intelligence (AI) called a generative pre-trained transformer (GPT) model to automatically generate different versions of the source code for a cryptographic hash function. A cryptographic hash function is a mathematical algorithm that takes data of any size and converts it into a fixed-size digest. The digest is not guaranteed to be unique, but a good hash function makes it impractical to find two inputs with the same digest. These hash functions are used in many security and encryption applications.
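To make the "any size in, fixed size out" property concrete, here is a minimal Python sketch using the standard library's `hashlib`. The helper name `sha1_hex` is an illustrative assumption and is not taken from the paper, which works with a C implementation.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()

short_digest = sha1_hex(b"abc")            # 3 bytes of input
long_digest = sha1_hex(b"abc" * 100_000)   # 300,000 bytes of input

print(short_digest)                         # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(short_digest), len(long_digest))  # both are 40 hex characters (160 bits)
```

Both digests have the same length regardless of how much data went in.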

The researchers wanted to see if GPT models could be prompted to create many different versions of the source code for a hash function while keeping its behavior the same. Having a variety of code versions can help test software more thoroughly and expose security vulnerabilities that might not show up in a single version.

The generative AI approach used in this paper is similar to how GPT models generate human-like text. In this case, the models were given the original hash function source code as context and asked to rewrite each function, producing new code versions.

The researchers found that many GPT-generated variants were functionally equivalent to the original implementation despite syntactic and structural differences, but many others were not: some failed to compile, some computed the wrong hash, and some were correct for certain inputs yet wrong for others. This shows both the potential of AI-powered code generation for creating software variants and the need to verify every generated variant before relying on it.

Technical Explanation

The researchers used three GPT models: Llama-2-70b-chat-hf, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha. The models were run through a modified version of the localGPT framework together with langchain, which supplied word-embedding context drawn from the full source code and header files of a reference SHA-1 implementation.

The models were prompted to rewrite each C function of the reference implementation. This produced over 150,000 function-rewrite output text blocks, of which approximately 50,000 could be parsed as C code and subsequently compiled.
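Model replies are free text, so candidate code has to be pulled out of each reply before anything can be compiled. The following is a minimal sketch of that step, assuming replies wrap code in Markdown-style triple-backtick fences. That convention and the name `extract_code_blocks` are illustrative assumptions; the paper's actual parsing rules are not described in this summary.

```python
import re
from typing import List

# Built at runtime so the example itself contains no literal fence marker.
TICKS = "`" * 3
# Matches: opening fence, optional language tag, newline, body, closing fence.
FENCE = re.compile(TICKS + r"[A-Za-z0-9_+-]*\n(.*?)" + TICKS, re.DOTALL)

def extract_code_blocks(reply: str) -> List[str]:
    """Return the body of every fenced block found in a model reply."""
    return [body.strip() for body in FENCE.findall(reply)]

reply = (
    "Here is the rewritten function:\n"
    + TICKS + "c\nint f(void) { return 1; }\n" + TICKS
    + "\nLet me know if you need changes."
)
print(extract_code_blocks(reply))
```

Replies without any fenced block yield an empty list, which a pipeline could count as unparseable output.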

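The researchers analyzed the generated code for whether it compiles, correctness against the SHA-1 algorithm, memory leaks, stability under different compiler optimization settings, and character distance to the reference implementation. Many rewrites contained serious flaws, including memory leaks, integer overflows, out-of-bounds accesses, and use of uninitialised values. Several variants were correct for some test vectors but incorrect for others, which is a high implementation security risk, and many others did not implement SHA-1 correctly yet still produced output with some basic characteristics of a hash function.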
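A correctness check of this kind can be sketched as a comparison against a trusted reference over a set of test vectors. The sketch below is a Python stand-in for the idea, not the paper's C test harness. `flawed_variant` is a hypothetical buggy rewrite, invented here to show how a variant can pass short inputs and fail longer ones, and the test vectors are arbitrary choices.

```python
import hashlib
from typing import Callable, Dict, List

# Arbitrary test vectors of different lengths (an assumption for illustration).
TEST_VECTORS: List[bytes] = [b"", b"abc", b"a" * 55, b"a" * 56, b"a" * 1000]

def reference_sha1(msg: bytes) -> str:
    """Trusted reference: the standard library's SHA-1."""
    return hashlib.sha1(msg).hexdigest()

def flawed_variant(msg: bytes) -> str:
    """Hypothetical buggy rewrite that silently ignores data past 55 bytes."""
    return hashlib.sha1(msg[:55]).hexdigest()

def check_variant(candidate: Callable[[bytes], str]) -> Dict[int, bool]:
    """Map each test-vector length to whether the candidate matches the reference."""
    return {len(v): candidate(v) == reference_sha1(v) for v in TEST_VECTORS}

results = check_variant(flawed_variant)
print(results)  # {0: True, 3: True, 55: True, 56: False, 1000: False}
```

A test suite containing only the first three vectors would have accepted the flawed variant, which is why partially correct rewrites are dangerous.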

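To identify implementations that are equivalent without having identical syntax, the researchers clustered variants using compiler optimization settings and the SHA-256 checksums of the compiled binaries. With this clustering, over 100,000 novel and correct versions of the SHA-1 codebase were generated in which every component C function differs from the original code. This shows the potential of generative models like GPT to automate the creation of source code variants for software testing, security analysis, and other applications that need diverse code versions.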
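The paper's abstract describes using SHA-256 checksums of compiled binaries to cluster implementations that are equivalent but not syntactically identical. A minimal sketch of checksum-based grouping follows. The function name `cluster_by_checksum` and the byte strings standing in for compiled binaries are assumptions made for illustration; a real pipeline would read the compiler's output files for each optimization setting.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

def cluster_by_checksum(binaries: Dict[str, bytes]) -> List[List[str]]:
    """Group variant names whose binary contents have the same SHA-256 checksum."""
    clusters = defaultdict(list)
    for name, blob in binaries.items():
        clusters[hashlib.sha256(blob).hexdigest()].append(name)
    # Sort names within each cluster, then the clusters, for stable output.
    return sorted(sorted(names) for names in clusters.values())

# Placeholder byte strings standing in for compiled object code.
binaries = {
    "variant_a": b"\x7fELF-machine-code-1",
    "variant_b": b"\x7fELF-machine-code-1",  # different source, same binary
    "variant_c": b"\x7fELF-machine-code-2",
}
print(cluster_by_checksum(binaries))  # [['variant_a', 'variant_b'], ['variant_c']]
```

Identical checksums show that two source variants compiled to the same binary. Differing checksums do not prove the variants behave differently, so this grouping is conservative.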

Critical Analysis

The results come with significant caveats. Only about a third of the model outputs could be parsed as C code and compiled, and many of the rewrites contained serious flaws such as memory leaks, integer overflows, out-of-bounds accesses, and use of uninitialised values. Generated variants therefore cannot be trusted without independent verification, and more work is needed to ensure the reliability and robustness of generated code.

Additionally, the experiments are based on a single reference implementation of one hash function, SHA-1. It remains to be seen how well the GPT-based code generation approach would scale to more complex, real-world software systems with larger and more intricate codebases.

Another potential concern is the curse of recursion – the possibility that the GPT model could propagate and amplify certain biases or flaws present in the original training data, leading to undesirable or unexpected behavior in the generated code variants.

Overall, this research is a promising step forward in the application of generative AI techniques to software engineering tasks, but more work is needed to fully understand the capabilities and limitations of this approach.

Conclusion

This paper presents an approach to automating the creation of source code variants for a cryptographic hash function implementation using generative pre-trained transformer (GPT) models. The researchers show that GPT-based code generation can produce functionally equivalent, yet syntactically and structurally diverse, versions of the original SHA-1 source code, alongside a large number of incorrect or insecure versions that must be filtered out.

The implications of this research extend beyond the specific domain of cryptographic hash functions, as the ability to automatically generate code variants could be valuable for a wide range of software engineering and security applications. As AI-powered code generation continues to advance, it will be interesting to see how this technology is applied to other software development tasks and its impact on the field as a whole.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models

Elijah Pelofske, Vincent Urias, Lorie M. Liebrock

Generative pre-trained transformers (GPTs) are a type of large language machine learning model that are unusually adept at producing novel, and coherent, natural language. In this study the ability of GPT models to generate novel and correct versions, and notably very insecure versions, of implementations of the cryptographic hash function SHA-1 is examined. The GPT models Llama-2-70b-chat-hf, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The GPT models are prompted to re-write each function using a modified version of the localGPT framework and langchain to provide word embedding context of the full source code and header files to the model, resulting in over 150,000 function re-write GPT output text blocks, approximately 50,000 of which were able to be parsed as C code and subsequently compiled. The generated code is analyzed for being compilable, correctness of the algorithm, memory leaks, compiler optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants have a high implementation security risk of being correct for some test vectors, but incorrect for other test vectors. Additionally, many function implementations were not correct to the reference algorithm of SHA-1, but produced hashes that have some of the basic characteristics of hash functions. Many of the function re-writes contained serious flaws such as memory leaks, integer overflows, out of bounds accesses, use of uninitialised values, and compiler optimization instability. Compiler optimization settings and SHA-256 hash checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax - using this clustering over 100,000 novel and correct versions of the SHA-1 codebase were generated where each component C function of the reference implementation is different from the original code.

7/11/2024


Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models

Elijah Pelofske, Vincent Urias, Lorie M. Liebrock

Generative Pre-Trained Transformer models have been shown to be surprisingly effective at a variety of natural language processing tasks -- including generating computer code. We evaluate the effectiveness of open source GPT models for the task of automatic identification of the presence of vulnerable code syntax (specifically targeting C and C++ source code). This task is evaluated on a selection of 36 source code examples from the NIST SARD dataset, which are specifically curated to not contain natural English that indicates the presence, or lack thereof, of a particular vulnerability. The NIST SARD source code dataset contains identified vulnerable lines of source code that are examples of one out of the 839 distinct Common Weakness Enumerations (CWE), allowing for exact quantification of the GPT output classification error rate. A total of 5 GPT models are evaluated, using 10 different inference temperatures and 100 repetitions at each setting, resulting in 5,000 GPT queries per vulnerable source code analyzed. Ultimately, we find that the GPT models that we evaluated are not suitable for fully automated vulnerability scanning because the false positive and false negative rates are too high to likely be useful in practice. However, we do find that the GPT models perform surprisingly well at automated vulnerability detection for some of the test cases, in particular surpassing random sampling, and being able to identify the exact lines of code that are vulnerable albeit at a low success rate. The best performing GPT model result found was Llama-2-70b-chat-hf with inference temperature of 0.1 applied to NIST SARD test case 149165 (which is an example of a buffer overflow vulnerability), which had a binary classification recall score of 1.0 and a precision of 1.0 for correctly and uniquely identifying the vulnerable line of code and the correct CWE number.

8/2/2024


Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers

Elijah Pelofske, Vincent Urias, Lorie M. Liebrock

The task of accurate and efficient language translation is an extremely important information processing task. Machine learning enabled and automated translation that is accurate and fast is often a large topic of interest in the machine learning and data science communities. In this study, we examine using local Generative Pretrained Transformer (GPT) models to perform automated zero shot black-box, sentence wise, multi-natural-language translation into English text. We benchmark 16 different open-source GPT models, with no custom fine-tuning, from the Huggingface LLM repository for translating 50 different non-English languages into English using translated TED Talk transcripts as the reference dataset. These GPT model inference calls are performed strictly locally, on single A100 Nvidia GPUs. Benchmark metrics that are reported are language translation accuracy, using BLEU, GLEU, METEOR, and chrF text overlap measures, and wall-clock time for each sentence translation. The best overall performing GPT model for translating into English text for the BLEU metric is ReMM-v2-L2-13B with a mean score across all tested languages of $0.152$, for the GLEU metric is ReMM-v2-L2-13B with a mean score across all tested languages of $0.256$, for the chrF metric is Llama2-chat-AYT-13B with a mean score across all tested languages of $0.448$, and for the METEOR metric is ReMM-v2-L2-13B with a mean score across all tested languages of $0.438$.

4/24/2024


Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model

Rohit Pandey, Hetvi Waghela, Sneha Rakshit, Aparna Rangari, Anjali Singh, Rahul Kumar, Ratnadeep Ghosal, Jaydip Sen

This work delved into the realm of automatic text generation, exploring a variety of techniques ranging from traditional deterministic approaches to more modern stochastic methods. Through analysis of greedy search, beam search, top-k sampling, top-p sampling, contrastive searching, and locally typical searching, this work has provided valuable insights into the strengths, weaknesses, and potential applications of each method. Each text-generating method is evaluated using several standard metrics and a comparative study has been made on the performance of the approaches. Finally, some future directions of research in the field of automatic text generation are also identified.

4/3/2024