Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts

Read original: arXiv:2409.12447 - Published 9/20/2024 by Jenny T. Liang, Melissa Lin, Nikitha Rao, Brad A. Myers

Overview

Explores how developers build software containing prompts
Uses Straussian grounded theory methodology to understand prompt engineering practices
Identifies key challenges and best practices for developers working with prompts

Plain English Explanation

The paper examines how developers create software that utilizes prompts, which are instructions or templates used to guide language models in generating content.
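To make the idea of a prompt embedded in software concrete, here is a minimal sketch of a prompt stored as a template and filled in by application code. The template wording, the `SUMMARY_PROMPT` name, and the `build_summary_prompt` function are hypothetical illustrations, not taken from the paper.

```python
from string import Template

# Hypothetical prompt template; the wording and field names are illustrative only.
SUMMARY_PROMPT = Template(
    "You are a helpful assistant.\n"
    "Summarize the following $doc_type in at most $max_sentences sentences:\n\n"
    "$text"
)


def build_summary_prompt(text: str, doc_type: str = "article", max_sentences: int = 3) -> str:
    """Fill the template with application data to produce the final prompt string."""
    return SUMMARY_PROMPT.substitute(
        text=text, doc_type=doc_type, max_sentences=max_sentences
    )


if __name__ == "__main__":
    print(build_summary_prompt("Prompts are instructions given to a language model."))
```

The resulting string is what an application would send to a model API; the model call itself is left out because it depends on the provider.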

The researchers used a qualitative research method called Straussian grounded theory to gain insights into the prompt engineering process. They interviewed developers and analyzed their experiences to identify common themes and practices.

The findings highlight the unique challenges developers face when incorporating prompts into their software, such as ensuring prompt robustness and managing prompt-related technical debt. The paper also outlines best practices for prompt design, testing, and maintenance to help developers build more effective and reliable prompt-based applications.

Technical Explanation

The paper uses a Straussian grounded theory methodology to explore how developers build software containing prompts. The researchers interviewed 20 developers engaged in prompt development across a variety of contexts, models, domains, and prompt complexities, then analyzed the transcripts to identify recurring themes and patterns.

The analysis uncovered several key insights about prompt engineering practices. Developers reported challenges such as ensuring prompt robustness, managing prompt-related technical debt, and maintaining prompt-based systems over time. The researchers found that developers address these challenges through practices such as comprehensive prompt testing, modular prompt design, and dedicated prompt maintenance workflows.
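Modular prompt design and prompt testing can be sketched roughly as follows. This is an illustration under stated assumptions, not the workflow of any study participant: the prompt modules, the test cases, and `fake_model` (a stand-in for a real model call so the example runs offline) are all hypothetical.

```python
from typing import Callable

# Hypothetical prompt modules; each piece can be edited and tested on its own.
ROLE = "You are a support assistant for an online store."
OUTPUT_FORMAT = "Answer with exactly one word: POSITIVE, NEGATIVE, or NEUTRAL."


def build_prompt(review: str) -> str:
    """Compose the final prompt from the reusable modules plus user data."""
    return "\n\n".join([ROLE, OUTPUT_FORMAT, f"Review: {review}"])


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (an assumption, so the sketch runs offline)."""
    review = prompt.rsplit("Review: ", 1)[-1].lower()
    if "great" in review:
        return "POSITIVE"
    if "broken" in review:
        return "NEGATIVE"
    return "NEUTRAL"


def run_prompt_tests(model: Callable[[str], str], cases: list[tuple[str, str]]) -> list[str]:
    """Return the inputs whose model output differs from the expected label."""
    return [text for text, expected in cases if model(build_prompt(text)).strip() != expected]


# Hypothetical regression cases: (input review, expected label).
CASES = [
    ("Great product, fast shipping", "POSITIVE"),
    ("Arrived broken", "NEGATIVE"),
    ("It is a phone case", "NEUTRAL"),
]

if __name__ == "__main__":
    failures = run_prompt_tests(fake_model, CASES)
    print(f"{len(CASES) - len(failures)}/{len(CASES)} cases passed")
```

In practice `fake_model` would be replaced by a call to an actual model. Because model outputs can vary between runs, a real test suite would likely need repeated runs or looser matching than the exact comparison used here.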

The paper also discusses the implications of these findings for the broader field of prompt-based AI systems, highlighting the need for further research and tooling to support developers working at the intersection of language models and application development.

Critical Analysis

The paper provides valuable insights into the practical realities of building software with prompts, an area that has received relatively little attention in the research literature. By using a grounded theory approach, the authors were able to capture the nuanced experiences and pain points of developers, which can inform the design of better prompt engineering tools and practices.

However, the study is limited to a relatively small sample size, and the findings may not be generalizable to all types of prompt-based software development. Additionally, the paper does not delve deeply into the technical details of the prompt engineering process, which could be a fruitful area for further research.

Future studies may want to explore the impact of different prompt engineering approaches on the performance and robustness of the resulting AI-powered applications. Comparisons between prompt-based and traditional software development practices could also yield interesting insights.

Conclusion

This paper offers a rare glimpse into the often-overlooked world of prompt engineering, highlighting the unique challenges and best practices that developers encounter when building software with language models. The findings suggest that prompt-based development requires specialized skills and workflows, and that further research and tooling in this area could significantly benefit the broader AI application development ecosystem.

By shedding light on the practical realities of prompt engineering, this study lays the groundwork for more targeted support and guidance for developers working at the intersection of language models and software development.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts

Jenny T. Liang, Melissa Lin, Nikitha Rao, Brad A. Myers

The introduction of generative pre-trained models, like GPT-4, has introduced a phenomenon known as prompt engineering, whereby model users repeatedly write and revise prompts while trying to achieve a task. Using these AI models for intelligent features in software applications requires using APIs that are controlled through developer-written prompts. These prompts have powered AI experiences in popular software products, potentially reaching millions of users. Despite the growing impact of prompt-powered software, little is known about its development process and its relationship to programming. In this work, we argue that some forms of prompts are programs, and that the development of prompts is a distinct phenomenon in programming. We refer to this phenomenon as prompt programming. To this end, we develop an understanding of prompt programming using Straussian grounded theory through interviews with 20 developers engaged in prompt development across a variety of contexts, models, domains, and prompt complexities. Through this study, we contribute 14 observations about prompt programming. For example, rather than building mental models of code, prompt programmers develop mental models of the foundation model's (FM's) behavior on the prompt and its unique qualities by interacting with the model. While prior research has shown that experts have well-formed mental models, we find that prompt programmers who have developed dozens of prompts, each with many iterations, still struggle to develop reliable mental models. This contributes to a rapid and unsystematic development process. Taken together, our observations indicate that prompt programming is significantly different from traditional software development, motivating the creation of tools to support prompt programming. Our findings have implications for software engineering practitioners, educators, and researchers.

9/20/2024

The Prompt Report: A Systematic Survey of Prompting Techniques

Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, Denis Peskoff, Marine Carpuat, Jules White, Shyamal Anadkat, Alexander Hoyle, Philip Resnik

Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts, by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 vocabulary terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting.

7/16/2024

Prompt Engineering a Prompt Engineer

Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani

Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models on customized tasks. It requires complex reasoning to examine the model's errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that large language models can be meta-prompted to perform automatic prompt engineering, we argue that their potential is limited due to insufficient guidance for complex reasoning in the meta-prompt. We fill this gap by infusing into the meta-prompt three key components: detailed descriptions, context specification, and a step-by-step reasoning template. The resulting method, named PE2, exhibits remarkable versatility across diverse language tasks. It finds prompts that outperform "let's think step by step" by 6.3% on MultiArith and 3.1% on GSM8K, and outperforms competitive baselines on counterfactual tasks by 6.9%. Further, we show that PE2 can make targeted and highly specific prompt edits, rectify erroneous prompts, and induce multi-step plans for complex tasks.

7/4/2024

Can Developers Prompt? A Controlled Experiment for Code Documentation Generation

Hans-Alexander Kruse, Tim Puhlfürß, Walid Maalej

Large language models (LLMs) bear great potential for automating tedious development tasks such as creating and maintaining code documentation. However, it is unclear to what extent developers can effectively prompt LLMs to create concise and useful documentation. We report on a controlled experiment with 20 professionals and 30 computer science students tasked with code documentation generation for two Python functions. The experimental group freely entered ad-hoc prompts in a ChatGPT-like extension of Visual Studio Code, while the control group executed a predefined few-shot prompt. Our results reveal that professionals and students were unaware of or unable to apply prompt engineering techniques. Especially students perceived the documentation produced from ad-hoc prompts as significantly less readable, less concise, and less helpful than documentation from prepared prompts. Some professionals produced higher quality documentation by just including the keyword Docstring in their ad-hoc prompts. While students desired more support in formulating prompts, professionals appreciated the flexibility of ad-hoc prompting. Participants in both groups rarely assessed the output as perfect. Instead, they understood the tools as support to iteratively refine the documentation. Further research is needed to understand which prompting skills and preferences developers have and which support they need for certain tasks.

8/2/2024