Can My Microservice Tolerate an Unreliable Database? Resilience Testing with Fault Injection and Visualization

Read original: arXiv:2404.01886 - Published 4/3/2024 by Michael Assad, Christopher Meiklejohn, Heather Miller, Stephan Krusche


Overview

Ensuring resilience in microservice applications during database or service disruptions is a significant challenge.
While tools exist for resilience testing of service failures, there is a lack of tools specifically designed for resilience testing of database failures.
Researchers have developed an extension for fault injection in database clients, integrated into an existing tool called Filibuster, to address this gap.

Plain English Explanation

Microservice applications are complex systems made up of many interconnected services. When one of these services or the database it relies on experiences an outage or disruption, it can have a cascading effect, causing the entire application to fail. Developers need ways to test and ensure their applications can withstand these types of failures.

The researchers recognized that while tools exist for testing how well a microservice application handles failures in the services themselves, there was a lack of options specifically for testing database failures. To address this, they created an extension to Filibuster, an existing tool that lets developers deliberately inject faults into the services of a microservice application to see how the application responds.

The new extension adds the ability to simulate failures in the database layer as well. This means developers can test how their application handles situations such as the database suddenly becoming unavailable or a database operation failing with an error. The tool supports a range of database technologies, including both SQL and NoSQL databases.
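As a rough illustration of the idea, the Python sketch below makes a database call fail on purpose and checks how the calling code reacts. Everything in it is hypothetical: `DatabaseUnavailable`, `InMemoryDB`, `FaultyDB`, and `display_name` are illustrative stand-ins, not Filibuster's API or any real database client.

```python
class DatabaseUnavailable(Exception):
    """Hypothetical error a database client raises when the server cannot be reached."""


class InMemoryDB:
    """Hypothetical stand-in for a real database client, backed by a dict."""

    def __init__(self):
        self._data = {"user:1": "Alice"}

    def get(self, key):
        return self._data.get(key)


class FaultyDB:
    """Wraps a client; if a fault is configured, raises it instead of calling the client."""

    def __init__(self, inner, fault=None):
        self._inner = inner
        self._fault = fault

    def get(self, key):
        if self._fault is not None:
            raise self._fault  # simulate the database failing on this call
        return self._inner.get(key)


def display_name(db, user_id):
    """Code under test: degrade to a placeholder instead of crashing when the database fails."""
    try:
        name = db.get(f"user:{user_id}")
    except DatabaseUnavailable:
        return "Guest"
    return name if name is not None else "Unknown"


if __name__ == "__main__":
    db = InMemoryDB()
    print(display_name(FaultyDB(db), 1))  # normal run: prints Alice
    print(display_name(FaultyDB(db, DatabaseUnavailable("connection refused")), 1))  # injected outage: prints Guest
```

The wrapper leaves the application code untouched: the same `display_name` function runs against a healthy client and a failing one, which is what lets a test observe how the application behaves under the fault.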

A key benefit is that the tool can be used directly in the development process. Developers can run it during normal coding and testing to get immediate feedback on how resilient their application is to different types of database failures. An IntelliJ IDE plugin also visualizes the injected faults, showing their types, locations, and impacts.

Technical Explanation

The researchers developed an extension to the existing Filibuster fault injection tool to enable comprehensive testing of database failures in microservice applications. Filibuster already provided the ability to simulate service failures, but lacked specific support for database disruptions.

The new extension adds fault injection for database clients. It lets developers systematically simulate database disruptions, such as the database becoming unavailable or a client call failing with an error. The tool supports both SQL and NoSQL databases, including Redis, Apache Cassandra, CockroachDB, PostgreSQL, and DynamoDB.
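A simplified sketch of what systematic fault injection can look like, assuming a single-fault strategy: run the test once without faults to discover which database calls it makes, then rerun it once per (call site, fault) pair with only that fault injected. The exception classes, the `FAULTS_BY_CLIENT` catalog, `Injector`, and `checkout` are all hypothetical; the per-client catalog stands in for the idea that a Redis client and a PostgreSQL client can fail in different ways. Filibuster's actual search strategy and fault types may differ.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ConnectionFailure(Exception):
    """Hypothetical: the client could not reach the database."""


class QueryTimeout(Exception):
    """Hypothetical: the database did not answer in time."""


# Hypothetical fault catalog: which exceptions to try for each kind of database client.
FAULTS_BY_CLIENT: Dict[str, List[type]] = {
    "redis": [ConnectionFailure, QueryTimeout],
    "postgres": [ConnectionFailure],
}


class Injector:
    """Records database call sites; raises a fault at one chosen (site, fault) pair if configured."""

    def __init__(self, target: Optional[Tuple[str, type]] = None):
        self.target = target
        self.sites: List[Tuple[str, str]] = []  # (call site, client kind), in call order

    def call(self, site: str, client: str, op: Callable[[], object]) -> object:
        self.sites.append((site, client))
        if self.target is not None and self.target[0] == site:
            raise self.target[1](f"injected at {site}")
        return op()


def checkout(inj: Injector) -> str:
    """Hypothetical code under test: a Redis cache lookup, then a PostgreSQL insert."""
    try:
        price = inj.call("cache.get_price", "redis", lambda: 10)
    except (ConnectionFailure, QueryTimeout):
        price = 10  # fallback: recompute the price without the cache
    inj.call("db.insert_order", "postgres", lambda: None)  # no fallback here
    return f"ordered at {price}"


def explore(test: Callable[[Injector], str]) -> Dict[Tuple[str, str], bool]:
    """Run once without faults to discover call sites, then once per (site, fault) pair.

    Here a run "passes" if no unhandled exception escapes; a real test would also assert on results.
    """
    baseline = Injector()
    test(baseline)
    results: Dict[Tuple[str, str], bool] = {}
    for site, client in baseline.sites:
        for fault in FAULTS_BY_CLIENT[client]:
            try:
                test(Injector(target=(site, fault)))
                results[(site, fault.__name__)] = True
            except Exception:
                results[(site, fault.__name__)] = False
    return results


if __name__ == "__main__":
    for (site, fault), passed in explore(checkout).items():
        print(f"{site:<18} {fault:<18} {'passes' if passed else 'FAILS'}")
```

Running it reports that both cache faults are absorbed by the fallback, while a connection failure on the order insert makes the test fail. Surfacing that kind of unhandled database failure during development is the point of this style of testing.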

A key feature is the integration with the IntelliJ IDE: a plugin gives developers visual feedback on the types, locations, and impacts of the injected faults. This lets developers identify and address resilience issues during development, rather than waiting for problems to surface in production.
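The plugin presents this information graphically. As a text-only stand-in, the sketch below models each injected fault by the three properties the paper says the plugin shows (type, location, and impact) and groups them by location. The `InjectedFault` record, the `summarize` function, and the file:line locations are hypothetical and do not reflect the plugin's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InjectedFault:
    fault_type: str    # type: which exception was injected
    location: str      # location: where the faulting database call was made (hypothetical format)
    test_passed: bool  # impact: did the test still pass with this fault injected?


def summarize(faults: List[InjectedFault]) -> List[str]:
    """Group injected faults by location and describe their impact, one line per location."""
    by_location: Dict[str, List[InjectedFault]] = {}
    for fault in faults:
        by_location.setdefault(fault.location, []).append(fault)
    lines = []
    for location, group in by_location.items():
        failed = [f.fault_type for f in group if not f.test_passed]
        status = "handled" if not failed else "test FAILS on " + ", ".join(failed)
        lines.append(f"{location}: {len(group)} fault(s) injected, {status}")
    return lines


if __name__ == "__main__":
    report = summarize([
        InjectedFault("ConnectionFailure", "CartService.java:42", True),
        InjectedFault("QueryTimeout", "CartService.java:42", True),
        InjectedFault("ConnectionFailure", "OrderRepository.java:17", False),
    ])
    print("\n".join(report))
```

Keying the summary by location mirrors the idea of pointing developers at the exact call site that lacks failure handling, rather than only reporting that some test failed.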

The researchers demonstrate the tool in a video (https://youtu.be/bvaUVCy1m1s) that simulates different database failure scenarios and shows how the application responds. The demonstration illustrates how the tool can be used to systematically test the resilience of microservice applications to database disruptions.

Critical Analysis

The paper presents a valuable contribution in addressing a notable gap in resilience testing tools for microservice applications. By expanding the fault injection capabilities of Filibuster to cover database failures, the researchers have provided developers with a much-needed solution for proactively evaluating application resilience.

One potential area for further research could be integrating the tool with IDE platforms beyond IntelliJ, making it accessible to a broader developer audience. Additionally, while the paper highlights the tool's support for a range of database technologies, it may be worthwhile to assess its performance and effectiveness across a larger sample of real-world database systems and application architectures.

Overall, this work represents a valuable step forward in enabling more comprehensive resilience testing for microservice applications, which is essential given the complexity and distributed nature of these systems. Empowering developers to identify and address database-related failure modes during development can lead to more robust and reliable microservice applications in production.

Conclusion

The researchers have developed a valuable extension to the Filibuster fault injection tool that specifically addresses the challenge of testing database failures in microservice applications. By enabling systematic simulation of a wide range of database disruptions, the tool helps developers proactively evaluate the resilience of their applications during the development phase.

The integration with the IntelliJ IDE provides a user-friendly way for developers to visualize and understand the impacts of injected faults, facilitating more effective identification and resolution of resilience issues. The tool's support for both SQL and NoSQL databases further enhances its versatility and applicability across a broad range of microservice architectures.

Overall, this work represents an important advancement in the field of microservice resilience testing, helping to bridge a critical gap and empower developers to build more robust and reliable distributed applications that can withstand database-related failures.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Can My Microservice Tolerate an Unreliable Database? Resilience Testing with Fault Injection and Visualization

Michael Assad, Christopher Meiklejohn, Heather Miller, Stephan Krusche

In microservice applications, ensuring resilience during database or service disruptions constitutes a significant challenge. While several tools address resilience testing for service failures, there is a notable gap in tools specifically designed for resilience testing of database failures. To bridge this gap, we have developed an extension for fault injection in database clients, which we integrated into Filibuster, an existing tool for fault injection in services within microservice applications. Our tool systematically simulates database disruptions, thereby enabling comprehensive testing and evaluation of application resilience. It is versatile, supporting a range of both SQL and NoSQL database systems, such as Redis, Apache Cassandra, CockroachDB, PostgreSQL, and DynamoDB. A defining feature is its integration during the development phase, complemented by an IntelliJ IDE plugin, which offers developers visual feedback on the types, locations, and impacts of injected faults. A video demonstration of the tool's capabilities is accessible at https://youtu.be/bvaUVCy1m1s.

4/3/2024

A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks

Adriano Vogel, Soren Henning, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser

Nowadays, several software systems rely on stream processing architectures to deliver scalable performance and handle large volumes of data in near real-time. Stream processing frameworks facilitate scalable computing by distributing the application's execution across multiple machines. Although performance has been studied extensively, fault tolerance, a key feature offered by stream processing frameworks, has still not been measured properly with updated and comprehensive testbeds. Moreover, the impact that fault recovery can have on performance is mostly ignored. This paper provides a comprehensive analysis of fault recovery performance, stability, and recovery time in a cloud-native environment with modern open-source frameworks, namely Flink, Kafka Streams, and Spark Structured Streaming. Our benchmarking analysis injects failures in a manner inspired by chaos engineering. Generally, our results indicate that much has changed compared to previous studies on fault recovery in distributed stream processing. In particular, the results indicate that Flink is the most stable and has some of the best fault recovery. Moreover, Kafka Streams shows performance instabilities after failures, which is due to its current rebalancing strategy that can be suboptimal in terms of load balancing. Spark Structured Streaming shows suitable fault recovery performance and stability, but with higher event latency. Our study intends to (i) help industry practitioners choose the most suitable stream processing framework for efficient and reliable execution of data-intensive applications; (ii) support researchers in applying and extending our research method as well as our benchmark; (iii) identify, prevent, and assist in solving potential issues in production deployments.

5/30/2024

FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer

Tiago Dias, Eva Maia, Isabel Praça

The pervasive impact of software and the increasing reliance on it in the era of digital transformation raise concerns about vulnerabilities, emphasizing the need for software security. Fuzz testing is a dynamic-analysis testing technique that consists of feeding faulty input data to a System Under Test (SUT) and observing its behavior. For black-box RESTful API testing specifically, recent literature has attempted to automate this technique, using heuristics to perform the input search and HTTP response status codes for classification. However, most approaches do not keep track of code coverage, which is important to validate the solution. This work introduces a black-box RESTful API fuzz testing tool that employs Reinforcement Learning (RL) for vulnerability detection. The fuzzer operates on the OpenAPI Specification (OAS) file and a scenarios file, which contain, respectively, the information needed to communicate with the SUT and the sequences of functionalities to test. To evaluate its effectiveness, the tool was tested on the Petstore API. The tool found a total of six unique vulnerabilities and achieved 55% code coverage.

7/22/2024


A Micro Architectural Events Aware Real-Time Embedded System Fault Injector

Enrico Magliano, Alessio Carpegna, Alessandro Savino, Stefano Di Carlo

Today, the increasing complexity of systems poses significant challenges to the reliability, trustworthiness, and security of SACRES. Key issues include susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in bit-flipping, soft errors, and transient corruption of data stored in memory. Soft errors, in turn, may lead to system faults that can push the system into a hazardous state. Particularly in critical sectors like automotive, avionics, or aerospace, such malfunctions can have real-world consequences, potentially causing harm to individuals. This paper introduces a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. This is achieved by harnessing the microprocessor's Performance Monitoring Unit (PMU) and its debugging interface, with a specific focus on ensuring the repeatability of fault injections. The fault injection methodology targets bit-flipping within the memory system, affecting CPU registers and RAM. The outcomes of these fault injections enable a thorough analysis of the impact of soft errors and establish a robust correlation between the identified faults and the essential timing predictability demanded by SACRES.

6/12/2024