Efficient Enumeration of Recursive Plans in Transformation-based Query Optimizers

Read original: arXiv:2312.02572 - Published 4/4/2024 by Amela Fejza (TYREX), Pierre Genev`es (TYREX), Nabil Layaida (TYREX)

✅

Overview

Query optimizers in many database systems are built on the transformation-based Volcano/Cascades framework
Previous transformations in this framework only focused on queries without recursion
This paper proposes a new data structure called the recursive logical query dag (RLQDAG) that can capture and transform recursive queries

Plain English Explanation

Databases are like giant warehouses that store and manage all kinds of information. When you ask a database a question, it has to figure out the best way to find the answer. This process is called query optimization, and many databases use a framework called Volcano/Cascades to do it.

The Volcano/Cascades framework works by breaking down the query into smaller pieces, called a logical query dag (LQDAG), and then trying different ways to rearrange and transform those pieces to find the most efficient way to get the answer. However, the original LQDAG could only handle queries that didn't involve repeating the same steps over and over (recursion).

This paper proposes a new and improved version of the LQDAG called the recursive logical query dag (RLQDAG). The RLQDAG can capture and transform queries that do involve recursion, which opens up a whole new world of complex queries that databases can handle more efficiently.

The key ideas in the RLQDAG are:

Capturing recursion: The RLQDAG can represent sets of repeating query steps, rather than just individual steps
Guiding transformations: The RLQDAG uses special "annotated equivalence nodes" to help the optimizer figure out how to rearrange the recursive parts of the query
Grouped transformations: The RLQDAG can transform whole groups of repeating query steps at once, instead of just one step at a time
Incremental updates: The RLQDAG can keep track of the changes needed as it transforms the query, rather than having to start over from scratch

By incorporating these ideas, the RLQDAG provides a more powerful and flexible way for databases to optimize complex, recursive queries.

Technical Explanation

The key innovation in this paper is the recursive logical query dag (RLQDAG), which extends the existing logical query dag (LQDAG) data structure used in transformation-based query optimizers like Volcano/Cascades.

The LQDAG represents a query as a directed acyclic graph (DAG) of logical relational algebra operators. Previous work on the LQDAG focused only on queries without recursion. The RLQDAG introduced in this paper adds the ability to capture and transform recursive queries.

The RLQDAG achieves this through several key extensions:

Capturing recursion: The RLQDAG can represent sets of recursive relational algebra terms, rather than just individual terms.
Annotated equivalence nodes: The RLQDAG uses special annotated nodes to guide the more complex transformations required for recursive queries.
Grouped transformations: The RLQDAG can transform entire groups of recursive subterms at once, rather than transforming individual terms sequentially.
Incremental updates: The RLQDAG can incrementally update the necessary annotations as transformations are applied, avoiding the need to recompute everything from scratch.

The authors formally define the syntax and semantics of the RLQDAG, with a focus on capturing subterm sharing and recursion. They then implement the RLQDAG in a prototype and evaluate it against the state-of-the-art, showing significant performance improvements for recursive queries.

Critical Analysis

The proposed RLQDAG represents an important advance in query optimization capabilities, as it enables more efficient exploration of plan spaces for recursive queries. This is a valuable contribution, as many real-world applications involve complex, recursive data and queries.

That said, the paper does not extensively explore the limitations or potential issues with the RLQDAG approach. For example, it's unclear how the RLQDAG would scale to extremely large or highly complex recursive queries, or how it might perform compared to other optimization techniques like materialized views or the use of specialized recursive query processing algorithms.

Additionally, the paper's evaluation focuses only on performance metrics, without considering other important factors like memory usage, ease of implementation, or the generality of the approach. Further research could explore these areas and provide a more holistic assessment of the RLQDAG's strengths and weaknesses.

Overall, the RLQDAG represents a promising step forward in query optimization for recursive queries, but there is likely room for continued refinement and exploration of its capabilities and limitations.

Conclusion

This paper introduces the recursive logical query dag (RLQDAG), a novel data structure that extends the transformation-based Volcano/Cascades framework to enable more efficient optimization of recursive queries in database systems. By capturing recursion, guiding transformations, performing grouped transformations, and updating annotations incrementally, the RLQDAG provides a powerful and flexible way to explore plan spaces for complex, recursive queries.

The authors' implementation and evaluation of the RLQDAG demonstrates significant performance improvements over the state-of-the-art, suggesting that this approach could have significant real-world impact in enabling database systems to handle a broader range of queries and use cases. As the demand for complex data analysis and processing continues to grow, innovations like the RLQDAG will play an increasingly important role in pushing the boundaries of what's possible with database technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

✅

Efficient Enumeration of Recursive Plans in Transformation-based Query Optimizers

Amela Fejza (TYREX), Pierre Genev`es (TYREX), Nabil Layaida (TYREX)

Query optimizers built on the transformation-based Volcano/Cascades framework are used in many database systems. Transformations proposed earlier on the logical query dag (LQDAG) data structure, which is key in such a framework, focus only on recursion-free queries. In this paper, we propose the recursive logical query dag (RLQDAG) which extends the LQDAG with the ability to capture and transform recursive queries, leveraging recent developments in recursive relational algebra. Specifically, this extension includes: (i) the ability of capturing and transforming sets of recursive relational terms thanks to (ii) annotated equivalence nodes used for guiding transformations that are more complex in the presence of recursion; and (iii) RLQDAG rewrite rules that transform sets of subterms in a grouped manner, instead of transforming individual terms in a sequential manner; and that (iv) incrementally update the necessary annotations. Core concepts of the RLQDAG are formalized using a syntax and formal semantics with a particular focus on subterm sharing and recursion. The result is a clean generalization of the LQDAG transformation-based approach, enabling more efficient explorations of plan spaces for recursive queries. An implementation of the proposed approach shows significant performance gains compared to the state-of-the-art.

4/4/2024

Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond

George-Octavian Bu{a}rbulescu, Taiyi Wang, Zak Singh, Eiko Yoneki

Query rewrite systems perform graph substitutions using rewrite rules to generate optimal SQL query plans. Rewriting logical and physical relational query plans is proven to be an NP-hard sequential decision-making problem with a search space exponential in the number of rewrite rules. In this paper, we address the query rewrite problem by interleaving Equality Saturation and Graph Reinforcement Learning (RL). The proposed system, Aurora, rewrites relational queries by guiding Equality Saturation, a method from compiler literature to perform non-destructive graph rewriting, with a novel RL agent that embeds both the spatial structure of the query graph as well as the temporal dimension associated with the sequential construction of query plans. Our results show Graph Reinforcement Learning for non-destructive graph rewriting yields SQL plans orders of magnitude faster than existing equality saturation solvers, while also achieving competitive results against mainstream query optimisers.

7/19/2024

A Novel Technique for Query Plan Representation Based on Graph Neural Networks

Baoming Chang, Amin Kamali, Verena Kantere

Learning representations for query plans play a pivotal role in machine learning-based query optimizers of database management systems. To this end, particular model architectures are proposed in the literature to transform the tree-structured query plans into representations with formats learnable by downstream machine learning models. However, existing research rarely compares and analyzes the query plan representation capabilities of these tree models and their direct impact on the performance of the overall optimizer. To address this problem, we perform a comparative study to explore the effect of using different state-of-the-art tree models on the optimizer's cost estimation and plan selection performance in relatively complex workloads. Additionally, we explore the possibility of using graph neural networks (GNNs) in the query plan representation task. We propose a novel tree model BiGG employing Bidirectional GNN aggregated by Gated recurrent units (GRUs) and demonstrate experimentally that BiGG provides significant improvements to cost estimation tasks and relatively excellent plan selection performance compared to the state-of-the-art tree models.

6/6/2024

Enhancing Tabular Data Optimization with a Flexible Graph-based Reinforced Exploration Strategy

Xiaohan Huang, Dongjie Wang, Zhiyuan Ning, Ziyue Qiao, Qingqing Long, Haowei Zhu, Min Wu, Yuanchun Zhou, Meng Xiao

Tabular data optimization methods aim to automatically find an optimal feature transformation process that generates high-value features and improves the performance of downstream machine learning tasks. Current frameworks for automated feature transformation rely on iterative sequence generation tasks, optimizing decision strategies through performance feedback from downstream tasks. However, these approaches fail to effectively utilize historical decision-making experiences and overlook potential relationships among generated features, thus limiting the depth of knowledge extraction. Moreover, the granularity of the decision-making process lacks dynamic backtracking capabilities for individual features, leading to insufficient adaptability when encountering inefficient pathways, adversely affecting overall robustness and exploration efficiency. To address the limitations observed in current automatic feature engineering frameworks, we introduce a novel method that utilizes a feature-state transformation graph to effectively preserve the entire feature transformation journey, where each node represents a specific transformation state. During exploration, three cascading agents iteratively select nodes and idea mathematical operations to generate new transformation states. This strategy leverages the inherent properties of the graph structure, allowing for the preservation and reuse of valuable transformations. It also enables backtracking capabilities through graph pruning techniques, which can rectify inefficient transformation paths. To validate the efficacy and flexibility of our approach, we conducted comprehensive experiments and detailed case studies, demonstrating superior performance in diverse scenarios.

6/12/2024