Motorway: Seamless high speed BFT

2401.10369

Published 5/13/2024 by Neil Giridharan, Florian Suri-Payer, Ittai Abraham, Lorenzo Alvisi, Natacha Crooks

🏋️

Abstract

Today's practical, high performance Byzantine Fault Tolerant (BFT) consensus protocols operate in the partial synchrony model. However, existing protocols are inefficient when deployments are indeed partially synchronous. They deliver either low latency during fault-free, synchronous periods (good intervals) or robust recovery from events that interrupt progress (blips). At one end, traditional, view-based BFT protocols optimize for latency during good intervals, but, when blips occur, can suffer from performance degradation (hangovers) that can last beyond the return of a good interval. At the other end, modern DAG-based BFT protocols recover more gracefully from blips, but exhibit lackluster latency during good intervals. To close the gap, this work presents Motorway, a novel high-throughput BFT protocol that offers both low latency and seamless recovery from blips. By combining a highly parallel asynchronous data dissemination layer with a low-latency, partially synchronous consensus mechanism, Motorway (i) avoids the hangovers incurred by traditional BFT protocols and (ii) matches the throughput of state of the art DAG-based BFT protocols while cutting their latency in half, matching the latency of traditional BFT protocols.

Create account to get full access

Overview

Existing Byzantine Fault Tolerant (BFT) consensus protocols struggle to balance low latency during normal operation with robust recovery from disruptions
Traditional view-based BFT protocols optimize for low latency but suffer "hangovers" after disruptions
Modern DAG-based BFT protocols recover better from disruptions but have higher latency during normal operation
This work presents Motorway, a novel BFT protocol that aims to provide both low latency and seamless recovery from disruptions

Plain English Explanation

Motorway is a new way for computers to agree on things, even when some of them are behaving badly (Byzantine Fault Tolerance). Existing approaches have tradeoffs - some are fast during normal operation, but struggle to recover when something goes wrong. Others handle disruptions better, but are slower overall.

Motorway tries to get the best of both worlds. It uses a two-part system: an asynchronous data sharing layer to spread information quickly, combined with a low-latency consensus mechanism to finalize decisions. This allows Motorway to match the throughput of state-of-the-art DAG-based BFT protocols while cutting their latency in half, matching the latency of traditional BFT protocols.

Importantly, Motorway also avoids the "hangovers" that traditional BFT protocols can suffer after disruptions, recovering seamlessly instead. This makes it a practical, high-performance solution for real-world applications that need to keep running even when things go wrong.

Technical Explanation

Motorway combines a highly parallel asynchronous data dissemination layer with a low-latency, partially synchronous consensus mechanism. This allows it to:

Avoid the "hangovers" that traditional, view-based BFT protocols can suffer after disruptions (e.g. Probabilistic Byzantine Fault Tolerance)
Match the throughput of state-of-the-art DAG-based BFT protocols while cutting their latency in half, reaching the latency of traditional BFT protocols

The asynchronous data dissemination layer spreads transactions quickly in parallel, while the partially synchronous consensus mechanism finalized them with low latency. This combination allows Motorway to be both fast and resilient, in contrast to existing approaches that prioritize one or the other.

Critical Analysis

The paper does not go into extensive detail on the specific algorithms and implementation of Motorway. While the high-level ideas are compelling, more technical information would be needed to fully evaluate the protocol.

Additionally, the performance claims are based on analytical modeling and simulations, not real-world deployments. Practical considerations around network dynamics, node heterogeneity, and other deployment factors could impact Motorway's behavior in production environments.

Further research and experimentation would be needed to validate Motorway's capabilities and understand its limitations. Nonetheless, the core concept of combining asynchronous and synchronous elements is an interesting approach that could inform the development of future BFT protocols.

Conclusion

Motorway represents a novel attempt to bridge the gap between low-latency BFT protocols and those that can better withstand disruptions. By combining asynchronous and partially synchronous components, it aims to deliver both high performance and robust recovery.

While more research is needed to fully evaluate Motorway, the underlying ideas offer a promising direction for practical, high-performance Byzantine fault tolerance. As the need for reliable, decentralized systems continues to grow, innovations like Motorway could play an important role in making that vision a reality.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Shoal++: High Throughput DAG BFT Can Be Fast!

Balaji Arun, Zekun Li, Florian Suri-Payer, Sourav Das, Alexander Spiegelman

Today's practical partially synchronous Byzantine Fault Tolerant (BFT) consensus protocols trade off low latency and high throughput. On the one end, traditional BFT protocols such as PBFT and its derivatives optimize for latency. They require, in fault-free executions, only 3 message exchanges to commit, the optimum for BFT consensus. However, this class of protocols typically relies on a single leader, hampering throughput scalability. On the other end, a new class of so-called DAG-BFT protocols demonstrates how to achieve highly scalable throughput by separating data dissemination from consensus, and using every replica as proposer. Unfortunately, existing DAG-BFT protocols pay a steep latency premium, requiring on average 10.5 message exchanges to commit a transactions. This work aims to soften this tension and proposes Shoal++, a novel DAG-based BFT consensus system that offers the throughput of DAGs while reducing commit latency to an average of 4.5 message exchanges. Our empirical findings are encouraging, showing that Shoal++ achieves throughput comparable to state-of-the-art DAG BFT solutions while reducing latency by up to 60%.

6/3/2024

cs.DC

Probabilistic Byzantine Fault Tolerance (Extended Version)

Diogo Avel~as, Hasan Heydari, Eduardo Alchieri, Tobias Distler, Alysson Bessani

Consensus is a fundamental building block for constructing reliable and fault-tolerant distributed services. Many Byzantine fault-tolerant consensus protocols designed for partially synchronous systems adopt a pessimistic approach when dealing with adversaries, ensuring safety in a deterministic way even under the worst-case scenarios that adversaries can create. Following this approach typically results in either an increase in the message complexity (e.g., PBFT) or an increase in the number of communication steps (e.g., HotStuff). In practice, however, adversaries are not as powerful as the ones assumed by these protocols. Furthermore, it might suffice to ensure safety and liveness properties with high probability. In order to accommodate more realistic and optimistic adversaries and improve the scalability of the BFT consensus, we propose ProBFT (Probabilistic Byzantine Fault Tolerance). ProBFT is a leader-based probabilistic consensus protocol with a message complexity of $O(nsqrt{n})$ and an optimal number of communication steps that tolerates Byzantine faults in permissioned partially synchronous systems. It is built on top of well-known primitives, such as probabilistic Byzantine quorums and verifiable random functions. ProBFT guarantees safety and liveness with high probabilities even with faulty leaders, as long as a supermajority of replicas is correct, and using only a fraction of messages employed in PBFT (e.g., $20%$). We provide a detailed description of ProBFT's protocol and its analysis.

6/12/2024

cs.DC

BBCA-CHAIN: Low Latency, High Throughput BFT Consensus on a DAG

Dahlia Malkhi, Chrysoula Stathakopoulou, Maofan Yin

This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra types of messages or additional communication costs to BCB. BBCA is harnessed into BBCA-CHAIN to make direct commits on a chained backbone of a causally ordered graph of blocks, without any additional voting blocks or artificial layering. With the help of Complete-Adopt, the additional knowledge gained from the underlying BCB completely removes the voting latency in popular DAG-based protocols. At the same time, causal ordering allows nodes to propose blocks in parallel and achieve high throughput. BBCA-CHAIN thus closes up the gap between protocols built by consistent broadcasts (e.g., Bullshark) to those without such an abstraction (e.g., PBFT/HotStuff), emphasizing their shared fundamental principles. Using a Bracha-style BCB as an example, we fully specify BBCA-CHAIN with simplicity, serving as a solid basis for high-performance replication systems (and blockchains).

5/28/2024

cs.DC

Moonshot: Optimizing Chain-Based Rotating Leader BFT via Optimistic Proposals

Isaac Doidge, Raghavendra Ramesh, Nibesh Shrestha, Joshua Tobkin

Existing chain-based rotating-leader BFT SMR protocols for the partially synchronous network model with constant commit latencies incur block periods of at least $2delta$ (where $delta$ is the message transmission latency). While a protocol with a block period of $delta$ exists under the synchronous model, its commit latency is linear in the size of the system. To close this gap, we present the first chain-based BFT SMR protocols with $delta$ delay between the proposals of consecutive honest leaders and commit latencies of $3delta$. We present three protocols for the partially synchronous model under different notions of optimistic responsiveness, two of which implement pipelining. All of our protocols achieve reorg resilience and two have short view lengths; properties that many existing chain-based BFT SMR protocols lack. We present an evaluation of our protocols in a wide-area network wherein they demonstrate significant increases in throughput and reductions in latency compared to the state-of-the-art, Jolteon. Our results also demonstrate that techniques commonly employed to reduce communication complexity$unicode{x2014}$such as vote-pipelining and the use of designated vote-aggregators$unicode{x2014}$actually reduce practical performance in many settings.

4/22/2024

cs.DC cs.NI