Published on 00/00/0000
Last updated on 00/00/0000
PRODUCT
11 min read
Want to know more? Get in touch with us, or delve into the details of the latest release. Or just take a look at some of the Istio features that Backyards automates and simplifies for you, and which we've already blogged about.
Service interruptions caused by outages can have severe business consequences, so it's important that we build, run and test resilient systems. Resiliency can be implemented and tested at multiple levels, from the bottom infrastructure layer all the way up to the application. While building our container management platform, Pipeline, implementing that type of comprehensive resiliency was one of our key considerations. In this post we'll take a deep dive into the fault injection feature of Istio (and the Banzai Cloud Istio operator), and show how users of our automated service mesh, Backyards (now Cisco Service Mesh Manager), can use it simply and effectively. Note that Backyards, while integrated into Pipeline, is also available as a standalone product, and features a practical, easy-to-use management UI, CLI and GraphQL API built on top of our Istio operator.
Some of the related Backyards features we have already blogged about are:
In this post, we'll be focusing on Istio's fault injection feature.
The resiliency of a system is derived from the resiliency of its parts: every part of the system must be able to handle a certain number of errors or faults. Whether the problem is service unavailability, network latency or data availability, distributed systems are full of implicit non-functional requirements for handling such errors properly. Fault injection is a system testing method that involves the deliberate introduction of faults and errors into a system. It can be used to identify design or configuration weaknesses and to ensure that the system is able to handle faults and recover from error conditions. Faults can be introduced with compile-time injection (modifying the source code of the software) or with runtime injection, in which software triggers cause faults during specific scenarios.
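As a rough illustration of runtime injection, the sketch below wraps an ordinary function so that a configurable share of calls is delayed and/or aborted. Everything in it (`inject_faults`, `InjectedFault`, the parameter names) is hypothetical and made up for this example; it is not how Istio or Envoy implement the feature, only a minimal picture of the idea.

```python
import random
import time
from functools import wraps


class InjectedFault(Exception):
    """Raised when the wrapper aborts a call on purpose (hypothetical name)."""


def inject_faults(abort_percent=0.0, delay_percent=0.0, delay_seconds=0.0,
                  rng=None, sleep=time.sleep):
    """Wrap a function so that a share of its calls is delayed and/or aborted."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Delay and abort are sampled separately, so one call can get both.
            if rng.random() < delay_percent / 100:
                sleep(delay_seconds)
            if rng.random() < abort_percent / 100:
                raise InjectedFault("injected abort")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_faults(abort_percent=100)
def always_aborted():
    return "ok"


@inject_faults(abort_percent=0, delay_percent=0)
def never_touched():
    return "ok"
```

The `rng` and `sleep` parameters exist only so that the wrapper can be exercised deterministically, without real waiting.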
To protect a system from cascading failures caused by slow response or failing services, it's good practice to use circuit breakers.
With Istio, failures can be injected at the application layer to test the resiliency of the services. You can configure faults to be injected into requests that match specific conditions to simulate service failures and higher latency between services. Fault injection is part of Istio's routing configuration and can be set in the fault field under an HTTP route of the VirtualService Istio custom resource. Faults include aborting HTTP requests from a downstream service, and/or delaying the proxying of requests. A fault rule must have either a delay or abort (or both). Delay can delay requests before forwarding, emulating various failures such as network issues, an overloaded upstream service, etc. Abort can abort HTTP request attempts and return error codes to a downstream service, giving the impression that the upstream service is faulty.
Delay and abort faults are independent of one another, even if both are set to occur simultaneously.
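The "either a delay or abort (or both)" rule can be sketched as a small builder that produces a dictionary shaped like the fault field. The helper names `build_fault` and `_check_percent` are hypothetical; only the key names (`percentage`, `value`, `fixedDelay`, `httpStatus`) come from the VirtualService example in this post, and the validation is an assumption about what a sensible client-side check might look like, not Istio's own validation logic.

```python
import json


def _check_percent(value):
    # Percentages outside 0..100 make no sense for a fault rule.
    if not 0 <= value <= 100:
        raise ValueError(f"percentage must be between 0 and 100, got {value}")


def build_fault(delay_percent=None, fixed_delay=None,
                abort_percent=None, http_status=None):
    """Return a dict shaped like the `fault` field of an HTTP route."""
    fault = {}
    if delay_percent is not None:
        _check_percent(delay_percent)
        if fixed_delay is None:
            raise ValueError("a delay needs a fixedDelay such as '5s'")
        fault["delay"] = {"percentage": {"value": delay_percent},
                          "fixedDelay": fixed_delay}
    if abort_percent is not None:
        _check_percent(abort_percent)
        if http_status is None:
            raise ValueError("an abort needs an httpStatus such as 503")
        fault["abort"] = {"percentage": {"value": abort_percent},
                          "httpStatus": http_status}
    if not fault:
        # Mirrors the rule that a fault must have a delay, an abort, or both.
        raise ValueError("a fault rule needs a delay, an abort, or both")
    return fault


print(json.dumps(build_fault(abort_percent=10, http_status=503,
                             delay_percent=40, fixed_delay="5s"), indent=2))
```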
Let's take a look at an example VirtualService:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
  - reviews.prod.svc.cluster.local
  http:
  - match:
    - sourceLabels:
        env: prod
    route:
    - destination:
        host: reviews.prod.svc.cluster.local
        subset: v1
    fault:
      abort:
        percentage:
          value: 10
        httpStatus: 503
      delay:
        percentage:
          value: 40
        fixedDelay: 5s
When this service is called, 10% of the calls will return 503 responses and 40% will experience a five second delay before they send a response.
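Because the two faults are sampled independently, the 10% and 40% figures overlap for some requests. The following back-of-the-envelope calculation spells out the resulting shares; it assumes strict statistical independence, as described above, and only concerns requests that match the rule. The function name `fault_outcomes` is hypothetical.

```python
from fractions import Fraction


def fault_outcomes(abort_percent, delay_percent):
    """Split matching requests into the four possible outcomes."""
    abort = Fraction(abort_percent, 100)
    delay = Fraction(delay_percent, 100)
    return {
        "delayed and aborted": abort * delay,
        "aborted only": abort * (1 - delay),
        "delayed only": (1 - abort) * delay,
        "untouched": (1 - abort) * (1 - delay),
    }


for outcome, share in fault_outcomes(10, 40).items():
    print(f"{outcome}: {float(share):.0%}")
# delayed and aborted: 4%
# aborted only: 6%
# delayed only: 36%
# untouched: 54%
```

In other words, under this assumption a little over half of the matching requests pass through without any injected fault.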
Under the hood this feature uses Envoy's fault injection feature.
Backyards provides a simple and intuitive way to configure routing within a service mesh, and part of that feature (among many others) is its ability to set fault injection settings. When using Backyards, you don't need to manually edit the VirtualService resource to modify fault injection configurations. Instead, you can achieve the same result via a convenient UI, or, if you prefer, through the Backyards CLI command line tool.
The above is just one example of Backyards' HTTP routing features. There are lots more!
On top of this, you can see visualizations of, and live dashboards for, your services and requests, so it's easy for you to tell what's going on.
First, we'll need a Kubernetes cluster.
I created a Kubernetes cluster on GKE via the free developer version of the Pipeline platform. If you'd like to do likewise, go ahead and create your cluster on any of the five cloud providers we support, or on-premise, using Pipeline. Otherwise bring your own Kubernetes cluster.
By far the easiest way of installing Istio, Backyards, and a demo application on a brand new cluster is to use the Backyards CLI. You just need to issue one command (note that KUBECONFIG must be set for your cluster):
❯ backyards install -a --run-demo
This command first installs Istio with our open-source Istio operator, then installs Backyards itself, as well as a demo application for demonstration purposes. After the installation of each component has finished, the Backyards UI will automatically open and send some traffic to the demo application. By issuing this one simple command you can watch Backyards start a brand new Istio cluster in just a few minutes! Give it a try!
You can also perform these steps one at a time. Backyards requires an Istio cluster - if you don't have one, you can install Istio with backyards istio install. Once you have Istio installed, you can install Backyards with backyards install. Finally, you can deploy the demo application with backyards demoapp install.
Tip: Backyards is a core component of the Pipeline platform. Try the hosted developer version here: https://try.pipeline.banzai.cloud/ (Service Mesh tab).
The demo application contains several microservice deployments, so that you can try out the various features of Backyards. To test how the system behaves when a service fails, introduce an HTTP abort fault into the payments service.
❯ backyards routing fault-injection set backyards-demo/payments -m any
? Percentage of requests on which the delay will be injected 0
? Add a fixed delay before forwarding the request. Format: 1h/1m/1s/1ms. MUST be >1ms. 5s
? Percentage of requests on which the abort will be injected 100
? HTTP status code to use to abort the HTTP request 503
INFO[0016] fault injection for backyards-demo/payments set successfully
Fault injection settings for backyards-demo/payments
Matches  Delay percentage  Fixed delay  Abort percentage  Abort http status code
any      -                 -            100               503
Send a load to the demo application with the following command:
❯ backyards demoapp load
The payments service now misbehaves and starts returning 503 errors. Remove the 503 abort injection by running the following command, and the payments service starts behaving correctly again.
❯ backyards routing fault-injection delete backyards-demo/payments -m any
Fault injection settings for backyards-demo/payments
Matches  Delay percentage  Fixed delay  Abort percentage  Abort http status code
any      -                 -            100               503
? Do you want to DELETE the fault injection? Yes
INFO[0005] fault injection set to backyards-demo/payments successfully deleted
The most insidious of distributed computing faults is not a "down" service but a service that responds slowly, potentially causing a cascading failure across a network of services. The normal latency of the system is pretty low, as can be seen on the UI. Now inject a 5 second delay into requests to the payments service:
❯ backyards routing fault-injection set backyards-demo/payments -m any
? Percentage of requests on which the delay will be injected 100
? Add a fixed delay before forwarding the request. Format: 1h/1m/1s/1ms. MUST be >1ms. 5s
? Percentage of requests on which the abort will be injected 0
? HTTP status code to use to abort the HTTP request 503
INFO[0007] fault injection for backyards-demo/payments set successfully
Fault injection settings for backyards-demo/payments
Matches  Delay percentage  Fixed delay  Abort percentage  Abort http status code
any      100               5s           0                 503
As you can see, the injected delay propagates throughout the whole system.
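Why a single slow service slows down everything upstream can be sketched with a toy model. The call chain below (frontpage calling bookings, which calls payments), the latency numbers, and the function name are all assumptions made for illustration, and the model assumes each service waits synchronously for its callees; the real demo application may be wired differently.

```python
def end_to_end_latency_ms(service, calls, own_latency_ms, injected_delay_ms=None):
    """Latency seen by a caller of `service`, assuming synchronous calls."""
    injected_delay_ms = injected_delay_ms or {}
    # Time spent in the service itself plus any delay injected in front of it.
    total = own_latency_ms[service] + injected_delay_ms.get(service, 0)
    # Every downstream call adds its full latency to the caller's latency.
    for callee in calls.get(service, []):
        total += end_to_end_latency_ms(callee, calls, own_latency_ms,
                                       injected_delay_ms)
    return total


# Hypothetical topology and per-service processing times in milliseconds.
calls = {"frontpage": ["bookings"], "bookings": ["payments"]}
own_latency_ms = {"frontpage": 10, "bookings": 20, "payments": 30}

print(end_to_end_latency_ms("frontpage", calls, own_latency_ms))
# 60
print(end_to_end_latency_ms("frontpage", calls, own_latency_ms,
                            {"payments": 5000}))
# 5060
```

In this model the 5 second delay in front of payments shows up unchanged in the latency of every service above it in the chain.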
To protect the system from cascading failures caused by slowly responding or failing services, it is also a good practice to use circuit breakers.
Besides fault injection, Istio also provides failure recovery features that you can configure dynamically at runtime. Using these features helps your applications operate reliably, ensuring that the service mesh can tolerate failing services and preventing localized failures from propagating to other services. Like fault injection settings, the retry policy and timeout can also be set in a VirtualService resource.
This setting describes the retry policy that's used when an HTTP request fails. For example, the following rule sets the maximum number of retries to three when calling the payments service, with a 2s timeout per retry attempt.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payments
spec:
  hosts:
  - payments
  http:
  - route:
    - destination:
        host: payments
    retries:
      attempts: 3
      perTryTimeout: 2s
    timeout: 10s
The configuration above specifies a 10 second timeout for calls to the payments service and also configures a maximum of 3 retries to connect to this service after an initial call failure, each with a 2 second timeout.
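It can help to check how the per-try timeout and the overall timeout interact. The small calculation below assumes that attempts counts retries made in addition to the initial call (as described above), that the overall timeout caps the whole exchange, and that the pauses between retries are negligible; the function name is hypothetical.

```python
def worst_case_seconds(retries, per_try_timeout, overall_timeout):
    """Upper bound on how long a caller waits, ignoring pauses between tries."""
    tries = retries + 1  # the initial call plus every retry
    return min(tries * per_try_timeout, overall_timeout)


# The configuration above: 3 retries, 2s per try, 10s overall.
print(worst_case_seconds(3, 2, 10))
# 8

# With 5 retries the overall timeout becomes the limiting factor.
print(worst_case_seconds(5, 2, 10))
# 10
```

Under these assumptions, the example configuration leaves all four tries enough room to run to their per-try limit before the overall timeout is reached.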
Under the hood this feature uses the automatic retries feature of Envoy.
A retry setting specifies the maximum number of times an Envoy proxy attempts to connect to a service if the initial call fails. Retries can enhance service availability and application performance by making sure that calls don't fail permanently because of transient problems such as a temporarily overloaded service or network. The interval between retries (25ms+) is variable and determined automatically by Istio, preventing the called service from being overwhelmed with requests. By default, the Envoy proxy doesn't attempt to reconnect to services after a first failure.
❯ backyards routing route set backyards-demo/bookings -m any --retry-on 5xx --retry-attempts 5
INFO[0001] routing for backyards-demo/bookings set successfully
Settings for backyards-demo/bookings
Matches  Routes         Redirect  Timeout  Retry
any      100% bookings  -         -        5x (2s ptt) on 5xx
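For readers who want to see the retry idea in code, here is a minimal client-side sketch with a randomized, growing pause between tries. It is only loosely modeled on a jittered exponential backoff with a 25ms base interval; the exact schedule used by the proxy is not taken from this post, and `call_with_retries` and its parameters are hypothetical.

```python
import random
import time


def call_with_retries(func, retries=3, base_interval=0.025,
                      rng=None, sleep=time.sleep):
    """Call `func`, retrying up to `retries` times after a failure."""
    rng = rng or random.Random()
    last_error = None
    for attempt in range(retries + 1):
        if attempt > 0:
            # Random pause whose upper bound grows with each retry, so that
            # a struggling service is not hit by synchronized retries.
            sleep(rng.uniform(0, base_interval * (2 ** attempt - 1)))
        try:
            return func()
        except Exception as error:  # a real client would retry selectively
            last_error = error
    raise last_error


attempts_seen = {"count": 0}


def flaky():
    """Fails twice with a transient error, then succeeds."""
    attempts_seen["count"] += 1
    if attempts_seen["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(call_with_retries(flaky))
# ok
```

Retrying on every exception is a simplification; the CLI example above restricts retries to 5xx responses with --retry-on.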
A timeout is the amount of time that an Envoy proxy should wait for replies from a given service, ensuring that services don't hang around waiting for replies indefinitely and that calls succeed or fail within a predictable timeframe. The default timeout for HTTP requests is 15 seconds, which means that if the service doesn't respond within 15 seconds, the call fails. The following command sets the timeout for the bookings service to 5 seconds:
❯ backyards routing route set backyards-demo/bookings -m any -t 5s
INFO[0002] routing for backyards-demo/bookings set successfully
Settings for backyards-demo/bookings
Matches  Routes         Redirect  Timeout  Retry
any      100% bookings  -         5s       -
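The effect of a timeout, a call that either answers in time or fails predictably, can be sketched on the client side as follows. This is an illustration of the concept using Python's standard library, not a description of how the proxy enforces the setting; `call_with_timeout` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, timeout_seconds):
    """Run `func` in a worker thread and give up after `timeout_seconds`."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        raise TimeoutError(f"no reply within {timeout_seconds}s") from None
    finally:
        # Do not block on the worker; the caller has already moved on.
        pool.shutdown(wait=False)


def slow_service():
    time.sleep(0.3)
    return "late reply"


try:
    call_with_timeout(slow_service, 0.05)
except TimeoutError as error:
    print(f"call failed: {error}")
```

Note that the slow call itself keeps running in the background in this sketch; only the caller stops waiting for it.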
To remove the demo application, Backyards, and Istio from your cluster, you only need to issue one command, which removes each component in the correct order:
❯ backyards uninstall -a
With Backyards, you don't necessarily need to be familiar with Istio's Custom Resources, and don't have to edit them manually to set fault injection rules, retry policies or timeouts. Instead, you can easily configure these rules from a convenient UI or with the Backyards CLI command line tool. You can then check the visualized traffic flow to make sure that the rules and your services are working as expected.
Banzai Cloud’s Backyards (now Cisco Service Mesh Manager) is a multi and hybrid-cloud enabled service mesh platform for constructing modern applications. Built on Kubernetes and our Istio operator, it gives you flexibility, portability, and consistency across on-premise datacenters and cloud environments. Use our simple, yet extremely powerful UI and CLI, and experience automated canary releases, traffic shifting, routing, secure service communication, in-depth observability and more, for yourself.
Banzai Cloud is changing how private clouds are built: simplifying the development, deployment, and scaling of complex applications, and putting the power of Kubernetes and Cloud Native technologies in the hands of developers and enterprises, everywhere. #multicloud #hybridcloud #BanzaiCloud