Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis

2406.09606

Published 7/1/2024 by Zongyue Qin, Yunsheng Bai, Atefeh Sohrabizadeh, Zijian Ding, Ziniu Hu, Yizhou Sun, Jason Cong

Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis

Abstract

In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design with low-level hardware description languages that eventually synthesize DSAs on circuits. However, creating a high-quality HLS design still demands significant domain knowledge, particularly in microarchitecture decisions expressed as textit{pragmas}. Thus, it is desirable to automate such decisions with the help of machine learning for predicting the quality of HLS designs, requiring a deeper understanding of the program that consists of original code and pragmas. Naturally, these programs can be considered as sequence data. In addition, these programs can be compiled and converted into a control data flow graph (CDFG). But existing works either fail to leverage both modalities or combine the two in shallow or coarse ways. We propose ProgSG, a model that allows interaction between the source code sequence modality and the graph modality in a deep and fine-grained way. To alleviate the scarcity of labeled designs, a pre-training method is proposed based on a suite of compiler's data flow analysis tasks. Experimental results show that ProgSG reduces the RMSE of design performance predictions by up to $22%$, and identifies designs with an average of $1.10times$ and $1.26times$ (up to $8.17times$ and $13.31times$) performance improvement in design space exploration (DSE) task compared to HARP and AutoDSE, respectively.

Create account to get full access

Overview

This paper presents a novel approach for cross-modality program representation learning in the context of Electronic Design Automation (EDA) with High-Level Synthesis (HLS).
The researchers develop a framework that can effectively capture and leverage the connections between different program representations (e.g., source code, high-level synthesis constraints, and hardware designs) to improve the performance of various EDA tasks.
The proposed method aims to address the challenges of traditional HLS approaches, which often struggle with efficiently handling the complex relationships between program representations.

Plain English Explanation

When designing electronic systems, engineers often use a process called High-Level Synthesis (HLS) to translate high-level programming code into low-level hardware designs. This process can be challenging because the program code, the constraints for the hardware, and the final hardware design are all closely related but exist in different "modalities" or representations.

The researchers in this paper have developed a new way to capture the connections between these different representations. By learning how the code, constraints, and hardware designs are linked, their framework can better understand the overall program and improve the performance of various EDA tasks, such as [LINK: https://aimodels.fyi/papers/arxiv/skip-benchmark-generating-system-level-high-level] benchmark generation or [LINK: https://aimodels.fyi/papers/arxiv/hysynth-context-free-llm-approximation-guiding-program] program synthesis.

The key innovation is the ability to learn a shared representation that can capture the relationships between the different modalities, rather than treating them as completely separate. This allows the system to better leverage the information contained in each representation to improve its understanding and performance on EDA problems.

Technical Explanation

The researchers propose a [LINK: https://aimodels.fyi/papers/arxiv/hlsfactory-framework-empowering-high-level-synthesis-datasets] framework for cross-modality program representation learning in the context of EDA and HLS. The core of their approach is a neural network-based model that can learn a shared latent representation capturing the connections between source code, HLS constraints, and hardware designs.

This shared representation is learned by training the model to perform various tasks that require understanding the relationships between the different modalities, such as [LINK: https://aimodels.fyi/papers/arxiv/codegrag-extracting-composed-syntax-graphs-retrieval-augmented] code-to-constraint prediction and hardware design reconstruction from source code.

By learning this cross-modality representation, the model can then be used to improve the performance of other EDA tasks, such as [LINK: https://aimodels.fyi/papers/arxiv/explaining-eda-synthesis-errors-llms] synthesis error explanation, where the shared representation can provide valuable insights into the root causes of errors.

The researchers evaluate their approach on several benchmark datasets and demonstrate significant improvements over traditional HLS methods, highlighting the benefits of their cross-modality representation learning framework.

Critical Analysis

The researchers acknowledge several limitations and areas for future work in their paper. One key challenge is the need for large, high-quality datasets that capture the complex relationships between program representations in the EDA domain. The authors note that the available datasets may not fully reflect the real-world complexity of industrial-scale design workflows.

Additionally, the proposed framework relies on the availability of labeled data for the various cross-modality tasks, which may not always be easy to obtain. Exploring unsupervised or semi-supervised learning approaches could help address this limitation and expand the applicability of the method.

Another potential issue is the interpretability of the learned cross-modality representations. While the model's performance on various tasks is impressive, it may be beneficial to investigate ways to make the learned representations more interpretable and explainable, which could further enhance their usefulness in EDA workflows.

Conclusion

This paper presents a novel approach for cross-modality program representation learning in the context of Electronic Design Automation and High-Level Synthesis. By learning a shared latent representation that captures the connections between source code, HLS constraints, and hardware designs, the researchers have developed a framework that can significantly improve the performance of various EDA tasks.

The key contribution of this work is the ability to leverage the information contained in different program representations, rather than treating them as separate entities. This cross-modality understanding has the potential to streamline and enhance EDA workflows, ultimately leading to more efficient and effective electronic system design processes.

While the proposed method shows promising results, the researchers have identified several areas for future research, such as the need for larger and more diverse datasets, as well as the exploration of more interpretable representation learning techniques. Addressing these challenges could further strengthen the impact of this cross-modality program representation learning approach in the field of Electronic Design Automation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

📊

Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning

Yuchao Liao, Tosiron Adegbija, Roman Lysecky, Ravi Tandon

High-Level Synthesis (HLS) Design Space Exploration (DSE) is a widely accepted approach for efficiently exploring Pareto-optimal and optimal hardware solutions during the HLS process. Several HLS benchmarks and datasets are available for the research community to evaluate their methodologies. Unfortunately, these resources are limited and may not be sufficient for complex, multi-component system-level explorations. Generating new data using existing HLS benchmarks can be cumbersome, given the expertise and time required to effectively generate data for different HLS designs and directives. As a result, synthetic data has been used in prior work to evaluate system-level HLS DSE. However, the fidelity of the synthetic data to real data is often unclear, leading to uncertainty about the quality of system-level HLS DSE. This paper proposes a novel approach, called Vaegan, that employs generative machine learning to generate synthetic data that is robust enough to support complex system-level HLS DSE experiments that would be unattainable with only the currently available data. We explore and adapt a Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) for this task and evaluate our approach using state-of-the-art datasets and metrics. We compare our approach to prior works and show that Vaegan effectively generates synthetic HLS data that closely mirrors the ground truth's distribution.

4/24/2024

cs.LG cs.AI cs.AR

HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis

Shraddha Barke, Emmanuel Anaya Gonzalez, Saketh Ram Kasibatla, Taylor Berg-Kirkpatrick, Nadia Polikarpova

Many structured prediction and reasoning tasks can be framed as program synthesis problems, where the goal is to generate a program in a domain-specific language (DSL) that transforms input data into the desired output. Unfortunately, purely neural approaches, such as large language models (LLMs), often fail to produce fully correct programs in unfamiliar DSLs, while purely symbolic methods based on combinatorial search scale poorly to complex problems. Motivated by these limitations, we introduce a hybrid approach, where LLM completions for a given task are used to learn a task-specific, context-free surrogate model, which is then used to guide program synthesis. We evaluate this hybrid approach on three domains, and show that it outperforms both unguided search and direct sampling from LLMs, as well as existing program synthesizers.

5/28/2024

cs.PL cs.AI

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao

Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks curated open-source HLS designs. We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking. Code at https://github.com/sharc-lab/HLSFactory.

5/20/2024

cs.AR cs.LG

CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation

Kounianhua Du, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary existing among natural language and different programming languages. In addition, programming languages are inherently logical and complex, making them hard to be correctly generated. Existing methods rely on multiple prompts to the large language model to explore better solutions, which is expensive. In this paper, we propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks. CodeGRAG extracts and summarizes the control flow and data flow of code blocks to fill the gap between programming languages and natural language. The extracted external structural knowledge models the inherent flows of code blocks, which can facilitate LLMs for better understanding of code syntax and serve as a bridge among different programming languages. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation, e.g., C++ for Python.

5/7/2024

cs.SE cs.AI