RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

Read original: arXiv:2406.13839 - Published 6/21/2024 by Rishabh Anand, Chaitanya K. Joshi, Alex Morehead, Arian R. Jamasb, Charles Harris, Simon V. Mathis, Kieran Didi, Bryan Hooi, Pietro Liò

Overview

  • This paper introduces RNA-FrameFlow, which its authors describe as the first generative model for 3D RNA backbone design.
  • It builds on SE(3) flow matching for protein backbone generation, a line of work that includes FoldFlow and the sequence-augmented FoldFlow-2.
  • The method represents each nucleotide as a rigid-body frame and trains a flow matching model to generate RNA backbones of 40-150 nucleotides.

Plain English Explanation

RNA-FrameFlow is a method for generating new 3D backbone shapes for RNA molecules. RNA is a molecule related to DNA that plays important roles in the body, but working out or designing its 3D shape is challenging.

RNA-FrameFlow uses a machine learning technique called "flow matching" to generate these shapes. It builds on previous work that used similar techniques to design 3D structures for proteins. The key idea is to train a neural network on known RNA structures so that it learns to turn a random starting arrangement into a realistic backbone, step by step. Once trained, the model can generate new RNA backbones on demand.

This is valuable because the 3D structure of an RNA molecule provides important insight into its function and how it might interact with other molecules in the body. Being able to design RNA backbones computationally could accelerate biomedical research and drug discovery.

Technical Explanation

The RNA-FrameFlow pipeline consists of several key components:

  1. RNA Structural Encoding: Each nucleotide is represented as a rigid-body frame, a rotation plus a translation that captures its orientation and position in 3D space. The associated loss functions account for the larger, more conformationally flexible RNA backbone (13 atoms per nucleotide, versus 4 atoms per residue in proteins).

  2. SE(3) Flow Matching: Building on SE(3) flow matching for protein backbone generation, the model learns to transport randomly initialized frames toward frames that form a realistic RNA backbone.

  3. Data Preparation: To counter the limited diversity of 3D RNA datasets, the authors explore training with structural clustering and cropping augmentations.

  4. Evaluation Protocol: Generated backbones are checked for global self-consistency (inverse folding to a sequence, then forward folding back to a structure) and for local recovery of RNA-specific structural descriptors.
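The frame representation and the flow matching path from the first two components can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the choice of which three backbone atoms define a frame is left to the caller, and only the translation part of the flow is shown (in SE(3) flow matching the rotations are interpolated separately on the rotation group, which is omitted here).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return tuple(x / n for x in v)

def frame_from_atoms(origin, atom_a, atom_b):
    """Build one rigid-body frame from three atom positions (hypothetical helper).

    Returns (rotation, translation): rotation is three orthonormal axes
    obtained by Gram-Schmidt, translation is the origin atom's position.
    Which atoms play these roles is an assumption left to the caller.
    """
    e1 = _normalize(_sub(atom_a, origin))
    v2 = _sub(atom_b, origin)
    proj = _dot(v2, e1)
    # Remove the component of v2 along e1, then normalize.
    e2 = _normalize(tuple(x - proj * y for x, y in zip(v2, e1)))
    e3 = _cross(e1, e2)  # right-handed third axis
    return (e1, e2, e3), origin

def interpolate_translation(x0, x1, t):
    """Linear flow matching path: x_t = (1 - t) * x0 + t * x1."""
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))

def target_velocity(x0, x1):
    """Velocity of the linear path, the regression target for the network."""
    return _sub(x1, x0)
```

For example, `frame_from_atoms((0, 0, 0), (2, 0, 0), (1, 3, 0))` yields the identity axes with a translation at the origin, and `interpolate_translation(noise, data, 0.5)` returns the midpoint between a noise sample and a data sample.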

The authors report that the best-performing version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass their validity criterion: a self-consistency TM-score of at least 0.45, the threshold at which two RNAs are taken to share the same global fold. Since the paper presents RNA-FrameFlow as the first generative model for 3D RNA backbone design, these results are best read as a baseline for future methods.
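The paper's abstract defines validity through a self-consistency TM-score of at least 0.45, computed by inverse folding a generated backbone and then forward folding the result. The bookkeeping for that metric might look like the sketch below. The function name is hypothetical, and scoring each backbone by the best of several candidate sequences is an assumption made for illustration; the TM-scores themselves would come from external inverse folding, forward folding, and alignment tools that are not shown.

```python
def validity_fraction(per_backbone_scores, threshold=0.45):
    """Fraction of generated backbones judged valid.

    per_backbone_scores: one list of self-consistency TM-scores per
    generated backbone (one score per candidate sequence).
    A backbone counts as valid if its best score reaches the threshold
    (taking the best candidate is an assumption of this sketch).
    """
    if not per_backbone_scores:
        raise ValueError("need at least one backbone")
    valid = sum(1 for scores in per_backbone_scores
                if max(scores) >= threshold)
    return valid / len(per_backbone_scores)
```

With the illustrative input `[[0.2, 0.5], [0.3, 0.1], [0.45], [0.44, 0.40]]`, two of the four backbones reach the threshold, so the function returns 0.5.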

Critical Analysis

Several limitations of the current RNA-FrameFlow approach stand out:

  • The model is trained on a limited set of known RNA structures, which may not capture the full diversity of possible 3D conformations. The authors explore structural clustering and cropping augmentations to mitigate this.
  • The flow-based generative model may struggle to capture complex long-range interactions and correlations within the 3D structure.
  • Just over 40% of generated backbones pass the validity criterion, so most samples still fail the self-consistency check.

Additionally, this summary does not cover how RNA-FrameFlow relates to other RNA design methods, such as the inverse folding model gRNAde or the protein-conditioned RNAFlow, which may offer complementary strengths and weaknesses.

Further research could explore ways to improve the model's ability to capture the nuances of RNA structure, as well as integrate it with other computational tools for a more comprehensive RNA design pipeline.

Conclusion

The RNA-FrameFlow method represents a promising approach for computationally designing 3D RNA backbone structures. By combining a rigid-body frame representation with flow-based generative modeling, the method can generate locally realistic RNA backbones, a substantial share of which pass a self-consistency check based on inverse folding followed by forward folding.

This work builds on recent advancements in protein backbone generation and could have significant implications for applications in biomedical research, drug discovery, and synthetic biology, where the ability to design custom RNA molecules with desired 3D structures could unlock new avenues of exploration and innovation.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

Rishabh Anand, Chaitanya K. Joshi, Alex Morehead, Arian R. Jamasb, Charles Harris, Simon V. Mathis, Kieran Didi, Bryan Hooi, Pietro Liò

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design

6/21/2024

RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching

Divya Nori, Wengong Jin

The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures. The integration of inverse folding in the structure denoising process allows us to simplify training by fixing the structure prediction network. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.

6/11/2024

SE(3)-Stochastic Flow Matching for Protein Backbone Generation

Avishek Joey Bose, Tara Akhound-Sadegh, Guillaume Huguet, Kilian Fatras, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong

The computational design of novel protein structures has the potential to impact numerous scientific disciplines greatly. Toward this goal, we introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions -- i.e. the group $\text{SE}(3)$ -- enabling accurate modeling of protein backbones. We first introduce FoldFlow-Base, a simulation-free approach to learning deterministic continuous-time dynamics and matching invariant target distributions on $\text{SE}(3)$. We next accelerate training by incorporating Riemannian optimal transport to create FoldFlow-OT, leading to the construction of both more simple and stable flows. Finally, we design FoldFlow-SFM, coupling both Riemannian OT and simulation-free training to learn stochastic continuous-time dynamics over $\text{SE}(3)$. Our family of FoldFlow generative models offers several key advantages over previous approaches to the generative modeling of proteins: they are more stable and faster to train than diffusion-based approaches, and our models enjoy the ability to map any invariant source distribution to any invariant target distribution over $\text{SE}(3)$. Empirically, we validate FoldFlow on protein backbone generation of up to $300$ amino acids leading to high-quality designable, diverse, and novel samples.

4/12/2024

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

Guillaume Huguet, James Vuckovic, Kilian Fatras, Eric Thibodeau-Laufer, Pablo Lemos, Riashat Islam, Cheng-Hao Liu, Jarrid Rector-Brooks, Tara Akhound-Sadegh, Michael Bronstein, Alexander Tong, Avishek Joey Bose

Proteins are essential for almost all biological processes and derive their diverse functions from complex 3D structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. FoldFlow-2 presents substantial new architectural features over the previous FoldFlow family of models including a protein large language model to encode sequence, a new multi-modal fusion trunk that combines structure and sequence representations, and a geometric transformer based decoder. To increase diversity and novelty of generated samples -- crucial for de-novo drug design -- we train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works, containing both known proteins in PDB and high-quality synthetic structures achieved through filtering. We further demonstrate the ability to align FoldFlow-2 to arbitrary rewards, e.g. increasing secondary structures diversity, by introducing a Reinforced Finetuning (ReFT) objective. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in terms of unconditional generation across all metrics including designability, diversity, and novelty across all protein lengths, as well as exhibiting generalization on the task of equilibrium conformation sampling. Finally, we demonstrate that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for the VHH nanobody.

5/31/2024