Synchronous Consensus in Partial Synchrony

2312.12677

Published 5/16/2024 by Ivan Klianev

Synchronous Consensus in Partial Synchrony

Abstract

We demonstrate a deterministic Byzantine consensus algorithm with synchronous operation in partial synchrony. It is naturally leaderless, tolerates any number of $ f<n/2 $ Byzantine processes with 2 rounds of exchange of originator-only signed messages, and terminates within a bounded interval of time. The algorithm is resilient to transient faults and asynchrony in a fraction of links with known size per number of faulty processes. It circumvents asynchronous and faulty links with 3-hop epidemic dissemination. Key finding: the resilience to asynchrony of links and the enabled by it leaderless consensus in partial synchrony ensure algorithm operation with simultaneous validity, safety, and bounded liveness.

Create account to get full access

Overview

This paper explores the problem of reaching consensus in a distributed system with partial synchrony, where the timing of message delivery is not guaranteed.
The authors propose a new algorithm that can achieve synchronous consensus in the presence of Byzantine faults, where some nodes in the network may behave maliciously.
The algorithm provides improved upper bounds on the number of tolerable faults compared to previous approaches, making it more practical for real-world applications.

Plain English Explanation

Reaching agreement, or consensus, is a fundamental challenge in distributed systems, where multiple computers need to work together. This is especially difficult when some of the computers may be malfunctioning or even trying to sabotage the process.

The authors of this paper focus on a setting called "partial synchrony," where the timing of message delivery between computers is not completely predictable. This is a more realistic model than the traditional "synchronous" setting, where messages are guaranteed to arrive within a fixed time.

The authors develop a new algorithm that can still achieve consensus even when some of the computers are behaving badly (the "Byzantine" setting). Importantly, their algorithm is able to tolerate more faulty computers than previous approaches, making it more practical for real-world use.

To explain this in more concrete terms, imagine a group of people trying to decide on a dinner plan. In a synchronous system, everyone would respond instantly, and the group could easily reach a consensus. But in the real world, people might take different amounts of time to respond, and some might even try to derail the process. The authors' algorithm provides a way for the group to still come to a decision, even with these challenges.

Technical Explanation

The authors build on prior work on consensus in partial synchrony and probabilistic Byzantine fault tolerance. They propose a new synchronous consensus algorithm, called SynCon, that can tolerate a greater number of Byzantine faults than previous approaches.

SynCon works by having nodes in the network exchange messages in a series of rounds. In each round, nodes try to reach agreement on a common value. The key innovation is a mechanism that allows nodes to certify that they have received enough messages from other honest nodes to reliably determine the common value, even in the presence of Byzantine faults.

The authors prove that SynCon can tolerate up to f Byzantine faults, where f < n/3, and n is the total number of nodes. This improves upon the f < n/5 bound of previous algorithms, as shown in the Fault-Tolerant Consensus in Anonymous Dynamic Networks paper.

Furthermore, the authors show that SynCon can be combined with techniques from Asymmetric Distributed Trust to achieve consensus even in the presence of adaptive omissions, where faulty nodes can selectively omit messages.

Critical Analysis

The authors present a thorough theoretical analysis of their SynCon algorithm, proving its correctness and upper bounds on the number of tolerable faults. However, the paper does not include any experimental evaluation, which would be helpful to understand the algorithm's practical performance and scalability.

Additionally, the paper does not discuss the potential limitations or edge cases of the SynCon algorithm. For example, it would be useful to know how the algorithm behaves when the proportion of Byzantine faults approaches the theoretical upper bound, or how it handles network partitions or other realistic failure scenarios.

Finally, while the authors compare their results to previous work, they do not discuss the broader implications of their findings or potential future research directions. Exploring how SynCon could be adapted or combined with other techniques to address additional challenges in distributed systems would strengthen the paper's contribution.

Conclusion

This paper presents a new synchronous consensus algorithm, SynCon, that can tolerate a greater number of Byzantine faults compared to previous approaches in a partially synchronous setting. The authors provide a rigorous theoretical analysis, demonstrating how SynCon improves upon the state of the art.

The algorithm's ability to achieve consensus in the presence of faulty nodes has important practical applications, such as in the design of robust and secure distributed systems. While the paper lacks experimental validation and does not fully explore the algorithm's limitations, it represents a significant advancement in the field of fault-tolerant distributed computing.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

📈

Partial Synchrony for Free? New Upper Bounds for Byzantine Agreement

Pierre Civit, Muhammad Ayaz Dzulfikar, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira, Igor Zablotchi

Byzantine agreement allows n processes to decide on a common value, in spite of arbitrary failures. The seminal Dolev-Reischuk bound states that any deterministic solution to Byzantine agreement exchanges Omega(n^2) bits. In synchronous networks, solutions with optimal O(n^2) bit complexity, optimal fault tolerance, and no cryptography have been established for over three decades. However, these solutions lack robustness under adverse network conditions. Therefore, research has increasingly focused on Byzantine agreement for partially synchronous networks. Numerous solutions have been proposed for the partially synchronous setting. However, these solutions are notoriously hard to prove correct, and the most efficient cryptography-free algorithms still require O(n^3) exchanged bits in the worst case. In this paper, we introduce Oper, the first generic transformation of deterministic Byzantine agreement algorithms from synchrony to partial synchrony. Oper requires no cryptography, is optimally resilient (n >= 3t+1, where t is the maximum number of failures), and preserves the worst-case per-process bit complexity of the transformed synchronous algorithm. Leveraging Oper, we present the first partially synchronous Byzantine agreement algorithm that (1) achieves optimal O(n^2) bit complexity, (2) requires no cryptography, and (3) is optimally resilient (n >= 3t+1), thus showing that the Dolev-Reischuk bound is tight even in partial synchrony. Moreover, we adapt Oper for long values and obtain several new partially synchronous algorithms with improved complexity and weaker (or completely absent) cryptographic assumptions.

4/8/2024

cs.DC

🌐

Fault-tolerant Consensus in Anonymous Dynamic Network

Qinzi Zhang, Lewis Tseng

This paper studies the feasibility of reaching consensus in an anonymous dynamic network. In our model, $n$ anonymous nodes proceed in synchronous rounds. We adopt a hybrid fault model in which up to $f$ nodes may suffer crash or Byzantine faults, and the dynamic message adversary chooses a communication graph for each round. We introduce a stability property of the dynamic network -- $(T,D)$-dynaDegree for $T geq 1$ and $n-1 geq D geq 1$ -- which requires that for every $T$ consecutive rounds, any fault-free node must have incoming directed links from at least $D$ distinct neighbors. These links might occur in different rounds during a $T$-round interval. $(1,n-1)$-dynaDegree means that the graph is a complete graph in every round. $(1,1)$-dynaDegree means that each node has at least one incoming neighbor in every round, but the set of incoming neighbor(s) at each node may change arbitrarily between rounds. We show that exact consensus is impossible even with $(1,n-2)$-dynaDegree. For an arbitrary $T$, we show that for crash-tolerant approximate consensus, $(T,lfloor n/2 rfloor)$-dynaDegree and $n > 2f$ are together necessary and sufficient, whereas for Byzantine approximate consensus, $(T,lfloor (n+3f)/2 rfloor)$-dynaDegree and $n > 5f$ are together necessary and sufficient.

5/7/2024

cs.DC

Probabilistic Byzantine Fault Tolerance (Extended Version)

Diogo Avel~as, Hasan Heydari, Eduardo Alchieri, Tobias Distler, Alysson Bessani

Consensus is a fundamental building block for constructing reliable and fault-tolerant distributed services. Many Byzantine fault-tolerant consensus protocols designed for partially synchronous systems adopt a pessimistic approach when dealing with adversaries, ensuring safety in a deterministic way even under the worst-case scenarios that adversaries can create. Following this approach typically results in either an increase in the message complexity (e.g., PBFT) or an increase in the number of communication steps (e.g., HotStuff). In practice, however, adversaries are not as powerful as the ones assumed by these protocols. Furthermore, it might suffice to ensure safety and liveness properties with high probability. In order to accommodate more realistic and optimistic adversaries and improve the scalability of the BFT consensus, we propose ProBFT (Probabilistic Byzantine Fault Tolerance). ProBFT is a leader-based probabilistic consensus protocol with a message complexity of $O(nsqrt{n})$ and an optimal number of communication steps that tolerates Byzantine faults in permissioned partially synchronous systems. It is built on top of well-known primitives, such as probabilistic Byzantine quorums and verifiable random functions. ProBFT guarantees safety and liveness with high probabilities even with faulty leaders, as long as a supermajority of replicas is correct, and using only a fraction of messages employed in PBFT (e.g., $20%$). We provide a detailed description of ProBFT's protocol and its analysis.

6/12/2024

cs.DC

🤿

Asymmetric Distributed Trust

Orestis Alpos, Christian Cachin, Bjorn Tackmann, Luca Zanolini

Quorum systems are a key abstraction in distributed fault-tolerant computing for capturing trust assumptions. They can be found at the core of many algorithms for implementing reliable broadcasts, shared memory, consensus and other problems. This paper introduces asymmetric Byzantine quorum systems that model subjective trust. Every process is free to choose which combinations of other processes it trusts and which ones it considers faulty. Asymmetric quorum systems strictly generalize standard Byzantine quorum systems, which have only one global trust assumption for all processes. This work also presents protocols that implement abstractions of shared memory, broadcast primitives, and a consensus protocol among processes prone to Byzantine faults and asymmetric trust. The model and protocols pave the way for realizing more elaborate algorithms with asymmetric trust.

5/3/2024

cs.DC cs.CR