A microservices architecture has many moving parts and is distributed by nature. Distributed applications communicate over networks, which are inherently unreliable. Failures are inevitable when your services communicate with external components like databases, APIs, and remote services. No matter how well you design your cloud-native microservices architecture, it is bound to fail at some point, making resiliency a critical feature to design for.
This article looks at architectural patterns for designing resilient data-intensive applications. We will discuss circuit breakers, rate limiting, throttling, retry, and timeout patterns, all of which can work together to ensure high availability and the stability of your microservices architecture. We will also examine the circuit breaker design pattern in order to understand its benefits, use cases, and implementation.
Failures in Distributed Systems
One of the Eight Fallacies of Distributed Computing is the assumption that the network is homogeneous. While a network of systems using similar configurations and the same communication protocol can be considered homogeneous, once you integrate your application with internal and external services over the network, the network is very likely to be heterogeneous.
Your systems can experience hardware failures, network failures, and application issues. When it comes to architecting distributed systems, a key design principle is ensuring that your system can recover from these failures.
Transient vs. Non-Transient Errors
Transient errors are temporary in nature. Since they typically resolve themselves once the operation is retried, they don’t tend to require any application-level changes. These self-correcting faults include brief database connectivity problems, network glitches, and temporary unavailability of services.
Non-transient errors continue to occur until their root causes are addressed. If they persist after multiple retry attempts, application code changes are necessary.
Design Patterns for High Availability and Resiliency
Discussed below are five design patterns beneficial for building a reliable and resilient microservices architecture: the retry pattern, the timeout pattern, throttling, message queues, and the use of a circuit breaker.
Retry Patterns
The retry pattern is the preferred solution for handling transient faults. If there is a temporary glitch, you can leverage this pattern to automatically retry the connection in an attempt to keep your service responsive. Remember that you should not retry indefinitely. Define a retry strategy that caps the number of attempts and spaces them out, for example with a fixed delay or exponential backoff between attempts.
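As a minimal sketch of this idea (the `Retry` class and `withRetry` helper below are illustrative, not a specific library API), a bounded retry with exponential backoff might look like:

```java
import java.util.concurrent.Callable;

// Minimal retry helper: retries a task up to maxAttempts times,
// doubling the delay between attempts (exponential backoff).
public class Retry {
    public static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;                 // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);  // back off before the next attempt
                    delay *= 2;           // exponential backoff
                }
            }
        }
        throw last; // all attempts exhausted: surface the last error
    }
}
```

A caller would wrap a flaky operation, e.g. `Retry.withRetry(() -> httpClient.fetch(url), 5, 100)`, so a transient glitch is absorbed while a persistent failure still surfaces after the fifth attempt.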
Timeout Patterns
The timeout pattern provides an upper limit to the latency introduced by external components. By setting a default timeout period in your application, you eliminate the scenario where there is no response received from an underlying service for an indefinite period of time. Instead, you throw an exception message back to the application to ensure that system resources are used properly.
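A simple way to sketch this with the JDK alone (the `WithTimeout` wrapper below is illustrative) is to run the call on a background thread and bound the wait:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Minimal timeout wrapper: runs a task on a background thread and gives up
// after an upper time limit instead of waiting indefinitely.
public class WithTimeout {
    public static <T> T call(Callable<T> task, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            // get() with a deadline throws TimeoutException instead of blocking forever
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            executor.shutdownNow(); // interrupt the task if it is still running
        }
    }
}
```

The `TimeoutException` this raises is the "exception message back to the application" the pattern calls for: the caller can fail fast or fall back instead of holding a thread open indefinitely.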
Throttling
If your cloud-based service receives a sudden increase in user traffic, you can leverage auto-scaling policies to automatically spin up multiple instances of your service and cater to the request spike. However, you don’t want to completely exhaust your system resources to accommodate an unpredictable increase in traffic. Throttling ensures that API consumers are being good neighbors by not exceeding their allowed usage and adversely impacting other consumers.
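One common way to enforce such a limit is a token bucket. The sketch below is illustrative (not a specific library's API): each consumer gets a bucket of tokens that refills at a steady rate, and a request is admitted only if a token remains.

```java
// Minimal token-bucket throttle: the bucket holds up to `capacity` tokens and
// refills at `refillPerSecond`; each admitted request spends one token.
public class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Add tokens proportional to the elapsed time, capped at capacity.
        tokens = Math.min(capacity,
                tokens + (now - lastRefillNanos) / 1e9 * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;   // spend one token for this request
            return true;
        }
        return false;      // over the allowed rate: reject (e.g. return HTTP 429)
    }
}
```

An API gateway would keep one bucket per consumer and turn a `false` result into an HTTP 429 response, keeping any single consumer from starving the others.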
Message Queues
Message queues foster better performance by helping to reduce tight coupling and dependencies between components. Instead of having services directly call each other, you can leverage queues to asynchronously communicate in a “fire-and-forget” manner. This increases system stability and resiliency when there are issues with dependent services. When the number of messages in the queue is high, you can scale the dependent services to meet the request volume. And, in case the dependent service is not available, you can hold off on processing until the service is up. This ensures that none of the messages are lost and that they are processed at a later point in time. This functionality is critical to having a successful microservices architecture.
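In-process, the shape of this decoupling can be sketched with a `BlockingQueue` (the `QueueDemo` class is illustrative; a production system would use a broker such as a managed queue service):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Fire-and-forget messaging sketch: the producer enqueues messages and moves on,
// while a consumer drains the queue independently. A slow or temporarily
// unavailable consumer does not block the producer (up to the queue's capacity).
public class QueueDemo {
    static final String STOP = "STOP"; // sentinel that tells the consumer to exit

    // Consumer loop: take messages until the sentinel, record what was processed.
    public static List<String> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<String> processed = new ArrayList<>();
        while (true) {
            String msg = queue.take();    // blocks until a message is available
            if (msg.equals(STOP)) break;
            processed.add(msg);           // "process" the message
        }
        return processed;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        // Producer: enqueue work without waiting for it to be handled.
        for (int i = 1; i <= 3; i++) queue.put("order-" + i);
        queue.put(STOP);
        System.out.println(consume(queue));
    }
}
```

The key property is that `put` returns as soon as the message is buffered; if the consumer goes down, messages simply accumulate and are processed once it recovers.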
Circuit Breakers
The circuit breaker design pattern can improve the stability and resiliency of your applications. For non-transient errors, you can implement a circuit breaker to safeguard underlying services which might be experiencing performance issues or failures. The idea is for them to fail fast and not consume system resources waiting for the degraded service to respond. You also need to provide time for your dependent service to recover and behave normally before you route requests to it again. The remainder of this post will be dedicated to exploring the intricacies of the circuit breaker pattern.
The Advantages of Using Circuit Breakers
Circuit breakers offer a number of advantages, including the following:
- They help to safeguard the system from cascading failures by ensuring that one failing component does not bring down the entire system.
- They ensure a high application uptime by limiting the impact of latency, failures, and other non-transient errors.
- They help to automatically degrade application functionality when the system is under load by implementing a fallback mechanism.
- They shield the downstream system by limiting traffic and only sending a request volume that can be successfully managed.
The Circuit Breaker Pattern in Microservices Architecture
The circuit breaker is a common design pattern that can ensure the resilience, responsiveness, and fault tolerance of your microservices architecture. One of the ways it does this is by safeguarding your system from cascading failures. When you have a number of dependent services, failure in one component might have a wider impact on a number of components, as seen in the image below.
Figure 1: Cascading Failures in Microservice Architecture
Implementing the Circuit Breaker Pattern
Circuit breaker functionality uses the following three states:
- Closed - In this state, all connection requests are allowed, and service communication is intact. During normal processing, the circuit breaker is in a closed state. However, if set failure thresholds are exceeded, the state changes to “open.”
- Open - In this state, all connection requests are blocked to prevent the recovering service from being flooded with requests. After a cool-down period, the state changes to “half-open.”
- Half-open - In this state, a small number of connections are allowed to pass through at regular intervals to test the service’s availability. If the requests are successful, then the circuit breaker assumes that the service issue has been resolved and switches it to the closed state. If the requests are not successful, then the circuit breaker assumes that the service issue still exists and switches back to the open state.
Figure 2: Circuit Breaker States
Implementation of the circuit breaker design pattern is simple: you wrap the external service calls in a circuit breaker object that monitors the target service for failures. When there is an issue with a dependent service, the circuit breaker trips, and requests stop being routed to the failing service until it has recovered. At a periodic interval, a limited number of requests are sent to the target service as a health check. If these requests succeed, the circuit breaker transitions back to the closed state, and the traffic flow resumes.
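The three states and the wrapping described above can be sketched as follows (the `CircuitBreaker` class is a minimal illustration, not a library API; the libraries below handle thread safety, metrics, and configuration far more thoroughly):

```java
import java.util.concurrent.Callable;

// Minimal circuit breaker with the three states described above.
// CLOSED:    calls pass through; consecutive failures are counted.
// OPEN:      calls fail fast until the cool-down period has elapsed.
// HALF_OPEN: one trial call is allowed; success closes the circuit, failure reopens it.
public class CircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long coolDownMs;
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    public CircuitBreaker(int failureThreshold, long coolDownMs) {
        this.failureThreshold = failureThreshold;
        this.coolDownMs = coolDownMs;
    }

    public synchronized <T> T call(Callable<T> task) throws Exception {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= coolDownMs) {
                state = State.HALF_OPEN;  // cool-down over: allow a trial request
            } else {
                throw new IllegalStateException("circuit open: failing fast");
            }
        }
        try {
            T result = task.call();
            failures = 0;
            state = State.CLOSED;         // success (including a half-open trial) closes the circuit
            return result;
        } catch (Exception e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;       // trip the breaker
                openedAt = System.currentTimeMillis();
            }
            throw e;
        }
    }

    public synchronized State state() { return state; }
}
```

Every remote call is routed through `call(...)`; once the failure threshold is reached, callers get an immediate exception instead of tying up resources waiting on a degraded service.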
Third Party Libraries for Circuit Breaker Implementation
There are a number of third party libraries designed to help you implement circuit breaker functionality in your applications. Polly, Netflix Hystrix, and Istio Circuit Breaker are three of the most popular ones available.
Polly
Polly is a .NET library that allows developers to implement design patterns like retry, timeout, circuit breaker, and fallback to ensure better resilience and fault tolerance.
The code snippet below creates a circuit breaker policy that breaks after five consecutive exceptions of the HttpRequestException type are thrown. It will then remain in an open state for 30 seconds.
// Define a circuit breaker policy: break after 5 consecutive failures for a duration of 30 seconds
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreaker(exceptionsAllowedBeforeBreaking: 5, durationOfBreak: TimeSpan.FromSeconds(30));
Netflix Hystrix
Hystrix is a latency and fault tolerance library developed by Netflix to make distributed systems resilient and avoid cascading failures.
The Hystrix command can be used to wrap any call that has a remote dependency (databases, microservices, third party APIs) over the network and create a method for circuit breaker implementation. A fallback method, which is invoked if the actual call fails, can also be added. Such an implementation stops cascading failures in your system and gives you an ability to fail fast and have a graceful degradation strategy.
// Configure @HystrixCommand with a timeout and a fallback method
@HystrixCommand(fallbackMethod = "Epsagon_welcomeFallback", commandProperties = {
    @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "2000")
})
public String welcome() throws InterruptedException {
    Thread.sleep(500);
    return "Welcome Epsagon";
}

// Configure a fallback implementation, invoked when welcome() fails or times out
private String Epsagon_welcomeFallback() {
    return "Request has failed. Please retry after some time";
}
As shown above, you can leverage Hystrix to implement a circuit breaker pattern which safeguards against cascading failures and provides fallback behavior for possibly failing calls.
Istio Circuit Breaker
In a service mesh architecture, Istio helps implement fault tolerance in your applications without the need for any code changes. Istio’s data plane is built on Envoy, an open-source proxy that provides a number of failure recovery features. You can implement circuit breaker capability by placing limits on the number of concurrent connections and pending requests to upstream services so that they are not overwhelmed with a large number of requests.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: OrderService
spec:
  host: OrderService
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 10
        maxRequestsPerConnection: 1
      tcp:
        maxConnections: 1
Figure 3: Circuit Breaker in Service Mesh Architecture
Conclusion
The circuit breaker design pattern allows you to gracefully handle failures in your microservices architecture and ensures that no cascading failures occur. This article explored how to use a combination of design principles to handle transient and non-transient errors in a microservices architecture. These design principles enhance resiliency and availability while also ensuring a good user experience.
Check out Epsagon for FREE!