A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

Read original: arXiv:2405.14753 - Published 5/24/2024 by Aral de Moor, Arie van Deursen, Maliheh Izadi

Overview

Developers often use code completion tools, but these tools can interrupt their work when they suggest too often, and the underlying models are costly to run.
The paper presents a machine learning model that can predict when to invoke a code completion tool based on the code context and available telemetry data.
The model was trained on a dataset of 200k developer interactions with a code completion plugin and was deployed in a real-world environment with 34 developers.

Plain English Explanation

Code completion tools, which suggest code snippets as developers type, can be very helpful for programming productivity. However, these tools can also be disruptive if they suggest completions too often, interrupting developers who are trying to focus on their work. Additionally, running these powerful language models can be computationally expensive for companies.

To address these issues, the researchers developed a machine learning model that can predict when a developer should receive code completion suggestions. The model was trained on a large dataset of how developers actually used a code completion plugin in practice. By understanding the right moments to provide suggestions, the model can avoid unnecessary interruptions while still offering helpful completions when needed.

The researchers found that their transformer-based model outperformed simpler baselines at this task, while still being fast enough to use in real-time. They also explored ways to enhance the model by incorporating additional telemetry data about the developer's workflow and context.

To validate their approach, the researchers deployed the model in a real-world setting with 34 developers and analyzed the resulting 74,000 code completion invocations. This provided insight into how the model performed in practice.

Technical Explanation

The researchers collected a dataset of 200,000 developer interactions with a cross-IDE code completion plugin. They then trained several machine learning models, including a small-scale transformer model, to predict when the code completion tool should be invoked based on the current code context and available telemetry data.
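The idea of an invocation filter can be illustrated with a small sketch. The Python below is not the paper's model: it replaces the trained transformer with a toy logistic scorer whose features and weights are invented for illustration. The `CompletionRequest` fields, the trigger characters, and the 0.5 threshold are all assumptions. It only shows the control flow, in which a cheap score is computed first and decides whether the expensive completion model should be called.

```python
import math
from dataclasses import dataclass


@dataclass
class CompletionRequest:
    """Hypothetical snapshot of editor state at a potential trigger point."""
    prefix: str                       # code before the cursor
    ms_since_last_completion: float   # example telemetry signal (assumed)


def invocation_score(req: CompletionRequest) -> float:
    """Toy stand-in for a trained filter; returns a probability in (0, 1).

    The features and weights are made up for illustration, not learned.
    """
    last_char = req.prefix[-1:]  # empty string if the prefix is empty
    ends_with_trigger = 1.0 if last_char in {".", "(", "="} else 0.0
    recently_shown = 1.0 if req.ms_since_last_completion < 1000 else 0.0
    logit = -1.0 + 2.5 * ends_with_trigger - 1.5 * recently_shown
    return 1.0 / (1.0 + math.exp(-logit))


def should_invoke(req: CompletionRequest, threshold: float = 0.5) -> bool:
    """Gate the expensive completion call on the filter's score."""
    return invocation_score(req) >= threshold
```

In an editor plugin, `should_invoke` would run on each candidate trigger, and the completion model would be queried only when it returns `True`. Raising the threshold trades fewer interruptions for more missed suggestions.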

Their results showed that the small-scale transformer model significantly outperformed the baseline while keeping latency low enough for real-time use. The researchers also explored ways of integrating additional telemetry signals, such as developer workflow information, directly into a pre-trained transformer, with promising results.
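One simple way to picture combining telemetry with a code representation is to concatenate the two feature vectors before a classification head. The sketch below assumes this concatenation strategy purely for illustration. The summary does not say which integration strategies the paper evaluated, and the function name, vector sizes, and weights here are hypothetical.

```python
import math
from typing import Sequence


def fuse_and_score(
    context_embedding: Sequence[float],
    telemetry: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Concatenate code-context and telemetry features, then apply a
    linear layer followed by a sigmoid.

    Assumption: in a real system, context_embedding would come from a
    pre-trained transformer and weights/bias would be learned.
    """
    features = list(context_embedding) + list(telemetry)
    if len(features) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))
```

With all-zero inputs and zero bias the score is exactly 0.5. A positive weight on a telemetry feature raises the score as that feature grows, which is how an extra signal could shift the filter's decision.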

To further evaluate their approach, the researchers deployed the model in an online environment with 34 developers, resulting in 74,000 actual code completion invocations. This real-world deployment provided valuable insights into the practical performance and impact of their model.

Critical Analysis

The researchers have made a thoughtful attempt to address the challenge of optimizing when code completion tools should be invoked, which is an important practical concern. By training their model on real-world developer interactions, they have gained insights into how these tools are used in practice, which is often overlooked in existing research.

However, the paper does not provide detailed analysis of the specific factors the model uses to make its predictions, nor does it explore potential biases or limitations of the training data. Additionally, while the real-world deployment is a valuable contribution, the sample size of 34 developers may be too small to draw strong conclusions about the model's broader applicability.

Further research could explore how the model's performance varies across different development environments, task types, and developer skill levels. Investigating the model's interpretability and potential ethical considerations around selectively providing code completion suggestions would also be worthwhile.

Conclusion

This research presents a promising approach to optimizing the use of code completion tools, balancing their benefits with the need to avoid disrupting developers. By training a machine learning model to predict the right moments to provide suggestions, the researchers have taken an important step towards enhancing the interactive code generation experience.

While further research is needed to fully understand the model's performance and limitations, this work demonstrates the potential for AI-powered tools to improve developer productivity and workflow in practical, real-world settings.

Related Papers

A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

Aral de Moor, Arie van Deursen, Maliheh Izadi

Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be intrusive, especially when they suggest too often and interrupt developers who are concentrating on their work. Current research largely overlooks how these models interact with developers in practice and neglects to address when a developer should receive completion suggestions. To tackle this issue, we developed a machine learning model that can accurately predict when to invoke a code completion tool given the code context and available telemetry data. To do so, we collect a dataset of 200k developer interactions with our cross-IDE code completion plugin and train several invocation filtering models. Our results indicate that our small-scale transformer model significantly outperforms the baseline while maintaining low enough latency. We further explore the search space for integrating additional telemetry data into a pre-trained transformer directly and obtain promising results. To further demonstrate our approach's practical potential, we deployed the model in an online environment with 34 developers and provided real-world insights based on 74k actual invocations.

5/24/2024

Full Line Code Completion: Bringing AI to Desktop

Anton Semenkin, Vitaliy Bibaev, Yaroslav Sokolov, Kirill Krylov, Alexey Kalina, Anna Khannanova, Danila Savenkov, Darya Rovdo, Igor Davidenko, Kirill Karnaukhov, Maxim Vakhrushev, Mikhail Kostyukov, Mikhail Podvitskii, Petr Surkov, Yaroslav Golubev, Nikita Povarov, Timofey Bryksin

In recent years, several industrial solutions for the problem of multi-token code completion have appeared, each making a great advance in the area but mostly focusing on cloud-based runtime and avoiding working on the end user's device. In this work, we describe our approach for building a multi-token code completion feature for the JetBrains' IntelliJ Platform, which we call Full Line Code Completion. The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happens on the end user's machine. We share important time and memory-consumption restrictions, as well as design principles that a code completion engine should satisfy. Working entirely on the end user's device, our code completion engine enriches user experience while being not only fast and compact but also secure. We share a number of useful techniques to meet the stated development constraints and also describe offline and online evaluation pipelines that allowed us to make better decisions. Our online evaluation shows that the usage of the tool leads to 1.5 times more code in the IDE being produced by code completion. The described solution was initially started with the help of researchers and was bundled into two JetBrains' IDEs - PyCharm Pro and DataSpell - at the end of 2023, so we believe that this work is useful for bridging academia and industry, providing researchers with the knowledge of what happens when complex research-based solutions are integrated into real products.

5/15/2024

Leveraging Large Language Models for Software Model Completion: Results from Industrial and Public Datasets

Christof Tinnes, Alisa Welter, Sven Apel

Modeling structure and behavior of software systems plays a crucial role in the industrial practice of software engineering. As with other software engineering artifacts, software models are subject to evolution. Supporting modelers in evolving software models with recommendations for model completions is still an open problem, though. In this paper, we explore the potential of large language models for this task. In particular, we propose an approach that leverages large language models, model histories, and retrieval-augmented generation for model completion. Through experiments on three datasets, including an industrial application, one public open-source community dataset, and one controlled collection of simulated model repositories, we evaluate the potential of large language models for model completion with retrieval-augmented generation. We found that large language models are indeed a promising technology for supporting software model evolution (62.30% semantically correct completions on real-world industrial data and up to 86.19% type-correct completions). The general inference capabilities of large language models are particularly useful when dealing with concepts for which there are few, noisy, or no examples at all.

6/28/2024

Retrieval-augmented code completion for local projects using large language models

Marko Hostnik, Marko Robnik-Šikonja

The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the use of LLMs. In this work, we focus on using LLMs with around 160 million parameters that are suitable for local execution and augmentation with retrieval from local projects. We train two models based on the transformer architecture, the generative model GPT-2 and the retrieval-adapted RETRO model, on open-source Python files, and empirically evaluate and compare them, confirming the benefits of vector embedding based retrieval. Further, we improve our models' performance with In-context retrieval-augmented generation, which retrieves code snippets based on the Jaccard similarity of tokens. We evaluate In-context retrieval-augmented generation on larger models and conclude that, despite its simplicity, the approach is more suitable than using the RETRO architecture. We highlight the key role of proper tokenization in achieving the full potential of LLMs in code completion.

8/12/2024