AgentKit: Structured LLM Reasoning with Dynamic Graphs

Read original: arXiv:2404.11483 - Published 7/26/2024 by Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

Overview

AgentKit is an LLM prompting framework that lets developers build multifunctional agents from simple natural language prompts, with little or no traditional coding.
It uses a graph-based approach: each node holds a prompt for one subtask, and nodes are chained together to define and orchestrate the agent's workflow, making agents easier to design, deploy, and maintain.
The key features of AgentKit include node components, a skill library, prompts, and a knowledge base, which work together to enable flow engineering.

Plain English Explanation

AgentKit is a tool that helps developers build LLM agents without having to write much traditional computer code. Instead, it uses a graph-based approach. Developers create "nodes" that each represent one subtask, and then connect those nodes together to define the agent's overall workflow.

Each node is a self-contained component that handles a specific subtask. For the task of writing a paper, for example, one node might identify the core message and another might identify gaps in prior research. Developers can choose from a library of pre-built skills, or create their own custom nodes. Each node carries a natural language "prompt" describing its subtask, and the links between nodes determine how the different parts of the workflow interact.

The framework also includes a knowledge base, which acts like a database of information that the application can draw upon. This knowledge base can be used to provide context and background information to the different nodes in the workflow.

Overall, AgentKit is designed to make it easier for developers to design, deploy, and maintain LLM agents. Because an agent is specified as a graph of prompts rather than as hand-written control code, it may take less time to build and leave fewer places for bugs to hide.

Technical Explanation

AgentKit is a flow engineering framework that represents an agent-based workflow as a graph and uses Kahn's algorithm, a topological sort, to determine the order in which the nodes are evaluated. At the core of AgentKit are node components: self-contained units that encapsulate specific functionality. These nodes are connected through their prompts to define the flow of the application.
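
The ordering step can be illustrated with a minimal sketch of Kahn's algorithm, assuming the workflow is given as a list of node names and a list of dependency edges. The function name `kahn_order` and the four subtask names are hypothetical and chosen only for illustration. This is not AgentKit's own code or API.

```python
from collections import deque


def kahn_order(nodes, edges):
    """Return the nodes in dependency order using Kahn's algorithm.

    nodes: iterable of node names
    edges: list of (src, dst) pairs, meaning dst depends on src
    """
    nodes = list(nodes)
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Start with every node that has no unmet dependencies.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Evaluating a node satisfies one dependency of each child.
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any node left unordered sits on a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical subtask nodes (names are assumptions, not from the paper).
nodes = ["observe", "plan", "reflect", "act"]
edges = [("observe", "plan"), ("observe", "reflect"),
         ("plan", "act"), ("reflect", "act")]
order = kahn_order(nodes, edges)
print(order)  # ['observe', 'plan', 'reflect', 'act']
```

A node is evaluated only after every node it depends on has produced an answer, which is why "act" comes last here. The cycle check matters because a graph with circular dependencies has no valid evaluation order.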

The framework also includes a skill library - a collection of pre-built node components that developers can use to quickly assemble workflows. Additionally, AgentKit provides a knowledge base that stores contextual information that can be accessed by the nodes during execution.
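
One way to picture a knowledge base that nodes read during execution is a small shared store whose entries are prepended to a node's prompt. The sketch below makes that assumption explicit. The class `KnowledgeBase`, the function `build_prompt`, and the stored fact are all hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgeBase:
    """Hypothetical shared store mapping a topic to a short fact."""
    facts: Dict[str, str] = field(default_factory=dict)

    def add(self, topic: str, fact: str) -> None:
        self.facts[topic] = fact

    def lookup(self, topics: List[str]) -> List[str]:
        # Unknown topics are skipped rather than raising.
        return [self.facts[t] for t in topics if t in self.facts]


def build_prompt(question: str, kb: KnowledgeBase, topics: List[str]) -> str:
    """Prepend any stored facts on the given topics to a node's question."""
    known = kb.lookup(topics)
    if known:
        header = "Known facts:\n" + "\n".join(f"- {fact}" for fact in known)
    else:
        header = "Known facts: none"
    return f"{header}\n\nQuestion: {question}"


kb = KnowledgeBase()
# Illustrative placeholder fact, not a result from the paper.
kb.add("table", "A table is needed before crafting tools.")
prompt = build_prompt("What should I build next?", kb, ["table", "furnace"])
print(prompt)
```

Keeping the store separate from the nodes means several nodes can draw on the same context without repeating it in each prompt.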

AgentKit's UI allows developers to visually design and configure workflows using a graph-based interface. This approach simplifies the development process by abstracting away the underlying complexity of traditional coding.

The key innovation of AgentKit is its use of a declarative, graph-based model to define application flows, rather than relying on imperative coding. This shift in paradigm enables greater extensibility and encapsulation, as well as improved observability and reusability of workflow components.

Critical Analysis

The paper provides a thorough overview of the AgentKit framework and its key components. The graph-based approach and use of Kahn's algorithm are well-explained and seem to offer a compelling alternative to traditional coding-based workflow management.

One potential limitation is the reliance on a pre-defined "skill library" of node components. While this can simplify development, it may also constrain the flexibility of the system and limit the ability to create entirely custom functionality. The authors acknowledge this and discuss the possibility of allowing developers to create their own node components.

Additionally, the paper does not provide detailed performance or scalability analysis of the AgentKit framework. As applications grow in complexity, it will be important to understand how the graph-based approach and Kahn's algorithm scale, particularly in terms of execution time and resource utilization.

Finally, the paper would benefit from a more comprehensive discussion of the framework's potential challenges and limitations, such as how it handles errors, versioning, and integration with existing systems. These aspects are briefly mentioned but could be explored in greater depth.

Conclusion

AgentKit presents a novel approach to flow engineering that leverages a graph-based model and agent-based architecture to simplify the development of complex applications. By abstracting away the underlying complexity of traditional coding, the framework has the potential to improve productivity, extensibility, and maintainability for developers.

While the paper provides a solid technical foundation for AgentKit, further research and real-world deployment will be necessary to fully assess the framework's strengths, weaknesses, and long-term viability. Nonetheless, the concepts and ideas presented in this work offer an interesting and potentially impactful contribution to the field of workflow management and application development.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

AgentKit: Structured LLM Reasoning with Dynamic Graphs

Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex thought process from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured thought process. For example, for the task of writing a paper, one may start with the thought process of 1) identify a core message, 2) identify prior research gaps, etc. The nodes in AgentKit can be designed and combined in different ways to implement multiple advanced capabilities including on-the-fly hierarchical planning, reflection, and learning from interactions. In addition, due to the modular nature and the intuitive design to simulate explicit human thought process, a basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience. Quantitatively, we show that agents designed through AgentKit achieve SOTA performance on WebShop and Crafter. These advances underscore AgentKit's potential in making LLM agents effective and accessible for a wider range of applications. https://github.com/holmeswww/AgentKit
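
The abstract's remark that a basic agent can be as simple as a list of prompts can be sketched as follows, assuming each prompt is answered in turn and earlier answers are carried forward as context. The names `run_chain` and `echo_llm` are hypothetical, and `echo_llm` is a stand-in that makes no real model call. This is an illustration of the idea, not the AgentKit API.

```python
from typing import Callable, List


def run_chain(prompts: List[str], llm: Callable[[str], str], task: str) -> List[str]:
    """Answer each subtask prompt in order, feeding earlier answers forward."""
    answers = []
    context = f"Task: {task}"
    for prompt in prompts:
        query = f"{context}\n\n{prompt}"
        answer = llm(query)
        answers.append(answer)
        # Later nodes see the questions and answers of earlier nodes.
        context += f"\n\nQ: {prompt}\nA: {answer}"
    return answers


def echo_llm(query: str) -> str:
    """Stand-in for a real model: echoes the last line of the query."""
    last_line = query.strip().splitlines()[-1]
    return f"[answer to: {last_line}]"


# The two subtasks follow the paper-writing example in the abstract.
prompts = ["Identify a core message.", "Identify prior research gaps."]
answers = run_chain(prompts, echo_llm, task="write a paper")
print(answers)
```

Swapping `echo_llm` for a function that calls an actual model would leave `run_chain` unchanged, which is the sense in which the agent's design lives in the list of prompts rather than in the code.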

7/26/2024

Language Agents as Optimizable Graphs

Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jurgen Schmidhuber

Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. The code can be found at https://github.com/metauto-ai/gptswarm.

8/23/2024

LLM-Based Open-Domain Integrated Task and Knowledge Assistants with Programmable Policies

Harshit Joshi, Shicheng Liu, James Chen, Robert Weigle, Monica S. Lam

Programming LLM-based knowledge and task assistants that faithfully conform to developer-provided policies is challenging. These agents must retrieve and provide consistent, accurate, and relevant information to address user's queries and needs. Yet such agents generate unfounded responses (hallucinate). Traditional dialogue trees can only handle a limited number of conversation flows, making them inherently brittle. To this end, we present KITA - a programmable framework for creating task-oriented conversational agents that are designed to handle complex user interactions. Unlike LLMs, KITA provides reliable grounded responses, with controllable agent policies through its expressive specification, KITA Worksheet. In contrast to dialog trees, it is resilient to diverse user queries, helpful with knowledge sources, and offers ease of programming policies through its declarative paradigm. Through a real-user study involving 62 participants, we show that KITA beats the GPT-4 with function calling baseline by 26.1, 22.5, and 52.4 points on execution accuracy, dialogue act accuracy, and goal completion rate, respectively. We also release 22 real-user conversations with KITA manually corrected to ensure accuracy.

7/9/2024

CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting

Zukang Yang, Zixuan Zhu

In the field of Question Answering (QA), unifying large language models (LLMs) with external databases has shown great success. However, these methods often fall short in providing the advanced reasoning needed for complex QA tasks. To address these issues, we improve over a novel approach called Knowledge Graph Prompting (KGP), which combines knowledge graphs with a LLM-based agent to improve reasoning and search accuracy. Nevertheless, the original KGP framework necessitates costly fine-tuning with large datasets yet still suffers from LLM hallucination. Therefore, we propose a reasoning-infused LLM agent to enhance this framework. This agent mimics human curiosity to ask follow-up questions to more efficiently navigate the search. This simple modification significantly boosts the LLM performance in QA tasks without the high costs and latency associated with the initial KGP framework. Our ultimate goal is to further develop this approach, leading to more accurate, faster, and cost-effective solutions in the QA domain.

4/16/2024