Published on 12/15/2019
Last updated on 06/18/2024

Envoy protocol filter for Kafka, meshed


A while ago we published some benchmarks and sizing information about our experience of running Apache Kafka over a service mesh with Koperator and the Istio operator, orchestrated by our automated and operationalized service mesh, Backyards (now Cisco Service Mesh Manager). We had many reasons for such a setup, and the Running Apache Kafka over Istio - benchmark post covers them in more detail, but let me recap some of our initial reasons and how we have evolved since.
  • Running Kafka over Istio does not add performance overhead (quite the opposite in the case of mTLS)
  • Out-of-the-box support for multiple network topologies
  • Resilience to network failures
  • Observability and metrics-based alerts and decisions
While these were already good enough reasons, things have changed quickly since we published the benchmarks. The Envoy community has merged the Kafka protocol 2.0 codec, so instead of treating Kafka traffic as plain TCP, Envoy can now understand Kafka semantics at the protocol level. While that pull request was essential, other important pieces of the puzzle were still missing, such as Envoy's Kafka protocol filter.
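To make "understanding Kafka semantics at the protocol level" a bit more concrete, here is a minimal Python sketch of the first thing such a proxy has to do: read the size prefix and request header of each Kafka request. It assumes request header v1 (int16 api_key, int16 api_version, int32 correlation_id, nullable string client_id); newer flexible request versions use header v2, which appends tagged fields that this sketch ignores. It is an illustration only, not the Envoy codec, and `encode_request` is a hypothetical helper for trying it out.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RequestHeader:
    api_key: int          # e.g. 0 = Produce, 1 = Fetch
    api_version: int
    correlation_id: int
    client_id: Optional[str]


def parse_request(frame: bytes) -> Tuple[RequestHeader, bytes]:
    """Parse one size-prefixed Kafka request, assuming request header v1."""
    if len(frame) < 4:
        raise ValueError("incomplete size prefix")
    (size,) = struct.unpack_from(">i", frame, 0)  # big-endian int32 length
    if len(frame) < 4 + size:
        raise ValueError("incomplete request")
    body = frame[4:4 + size]
    # Header v1: int16 api_key, int16 api_version, int32 correlation_id,
    # then client_id as a nullable string (int16 length, -1 meaning null).
    api_key, api_version, correlation_id, client_id_len = struct.unpack_from(">hhih", body, 0)
    offset = 10
    client_id = None
    if client_id_len >= 0:
        client_id = body[offset:offset + client_id_len].decode("utf-8")
        offset += client_id_len
    # Whatever follows the header is the API-specific request body.
    return RequestHeader(api_key, api_version, correlation_id, client_id), body[offset:]


def encode_request(api_key: int, api_version: int, correlation_id: int,
                   client_id: str, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build a frame in the same layout to exercise the parser."""
    cid = client_id.encode("utf-8")
    body = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid + payload
    return struct.pack(">i", len(body)) + body
```

Once a proxy can decode this header for every frame, it knows which client (`client_id`) is calling which API at which version, which is the foundation for the metrics, validation, and translation features discussed below.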


  • The Envoy community and adamkotwasinski have been working on the Kafka protocol filter for Envoy
  • The filter is almost ready (in Adam's fork), and you can now take it for a test ride
  • We built a custom Envoy version with the filter included
  • We automated the Kafka setup on Istio, including the custom Envoy version
  • Would you like to run Apache Kafka over Istio the easy way? Try Supertubes.
Check out Supertubes in action on your own clusters: register for an evaluation version and run a simple install command! As you might know, Cisco has recently acquired Banzai Cloud. We are currently in a transitional period and are moving our infrastructure, so evaluation downloads are temporarily suspended. Contact us so we can discuss your needs and requirements, and organize a live demo.
supertubes install -a --no-demo-cluster --kubeconfig <path-to-k8s-cluster-kubeconfig-file>
or read the documentation for details. Take a look at some of the Kafka features that we've automated and simplified through Supertubes and the Koperator, which we've already blogged about:

Kafka protocol support in Envoy

Envoy is a next-generation network proxy built for the cloud native era. It supports a wide variety of application protocols (ZooKeeper, MongoDB, etc.) and has recently added Kafka support. A network proxy that understands higher-level protocols brings huge benefits. In the case of Kafka, these include:
  • Out of the box tracing and monitoring within a Kafka mesh
  • Consumer group metrics
  • Information about apps and their version of the client libraries
  • Request validation
  • Protocol version translations
  • Automatic topic name conversions without having to modify the clients
  • Mirroring topics to other clusters (we run many hybrid Kubernetes clusters)
  • Functional parity across runtimes
Now let's dig into some of the above.

Metrics and monitoring

Koperator has always provided server-side metrics, but running in an Istio service mesh managed by Backyards (now Cisco Service Mesh Manager) adds metrics from the Envoy sidecar as well. This opens up a totally new perspective: without having to modify Kafka clients, we now have insight into the clients and how they behave. For example, it's easy to query which client is writing to a topic and what the byte rate per client is.
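As a rough illustration of the "byte rate per client" query, the following Python sketch aggregates produce traffic per client and topic over a time window. It assumes, hypothetically, that the sidecar's observations are available as `ProduceSample` records (client_id, topic, bytes, timestamp); in a real setup you would query the exported metrics in your monitoring system instead, so treat this as the arithmetic behind such a query, not as an Envoy or Backyards API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ProduceSample:
    # Hypothetical record shape for one observed produce request.
    client_id: str
    topic: str
    num_bytes: int
    timestamp: float  # seconds


def byte_rate_per_client(samples: Iterable[ProduceSample],
                         window_start: float,
                         window_end: float) -> Dict[Tuple[str, str], float]:
    """Average bytes/second per (client_id, topic) in [window_start, window_end)."""
    if window_end <= window_start:
        raise ValueError("window must have a positive length")
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for s in samples:
        if window_start <= s.timestamp < window_end:
            totals[(s.client_id, s.topic)] += s.num_bytes
    duration = window_end - window_start
    return {key: total / duration for key, total in totals.items()}
```

For example, two produce requests of 1000 and 3000 bytes from the same client to the same topic within a 10-second window come out as 400 bytes/second for that client and topic.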

Functional parity across runtimes

In Kafka, the client SDK is often responsible for too many things. The historical reason behind this was to keep the brokers as lightweight and simple as possible. Kafka was initially written in Scala, but with the later shift to Java, the Java client SDKs are now the full-featured ones, and the non-JVM clients are missing quite a few features. With the help of Envoy, this could change in the future, because some client responsibilities could be shifted into the sidecar proxy. This would bring the same functionality to all clients, no matter what language they're written in.

Request validation

As Kafka is content-agnostic, misbehaving clients can write nearly anything to the brokers. The Envoy proxy can now validate requests at the protocol level, checking that they contain all the required information (and nothing extraneous) before forwarding them to the brokers.
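A minimal Python sketch of what protocol-level validation could look like, assuming the request header has already been decoded. The supported version ranges in `SUPPORTED_VERSIONS` and the `MAX_REQUEST_BYTES` cap are hypothetical policy values chosen for illustration; they are not defaults of Kafka or of the Envoy filter.

```python
from typing import List, Optional

# Hypothetical policy: accepted api_version range per api_key.
SUPPORTED_VERSIONS = {
    0: (3, 9),   # Produce
    1: (4, 11),  # Fetch
    3: (1, 9),   # Metadata
}
MAX_REQUEST_BYTES = 1_048_576  # hypothetical 1 MiB cap


def validate_request(api_key: int, api_version: int,
                     client_id: Optional[str], size: int) -> List[str]:
    """Return a list of problems; an empty list means the request may be forwarded."""
    problems: List[str] = []
    if api_key not in SUPPORTED_VERSIONS:
        problems.append(f"unknown api_key {api_key}")
    else:
        low, high = SUPPORTED_VERSIONS[api_key]
        if not low <= api_version <= high:
            problems.append(
                f"api_version {api_version} outside [{low}, {high}] for api_key {api_key}")
    if not client_id:
        problems.append("missing client_id")
    if size > MAX_REQUEST_BYTES:
        problems.append(f"request of {size} bytes exceeds {MAX_REQUEST_BYTES}")
    return problems
```

Returning all problems at once, rather than failing on the first one, makes it easier to log or count why a given client's requests are being rejected.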

Rewrapping old Kafka protocols

The Kafka client SDK is a sensitive component. We've seen clusters that could not be upgraded in time because clients were using older protocol versions. The Envoy filter can unwrap messages of older versions and translate them to the latest and greatest version at the protocol level.
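One common way to structure this kind of translation is a chain of single-step upgrades, each converting version N to N+1, applied until the request reaches the latest version. The Python sketch below shows that pattern on requests represented as dictionaries. The version numbers and field names (`transactional_id`, `timeout` renamed to `timeout_ms`) are hypothetical and do not describe Kafka's real request schemas or how the Envoy filter implements translation.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]


def _v0_to_v1(req: Request) -> Request:
    # Hypothetical: v1 introduced an optional "transactional_id", defaulting to None.
    return {**req, "version": 1, "transactional_id": None}


def _v1_to_v2(req: Request) -> Request:
    # Hypothetical: v2 renamed "timeout" to "timeout_ms".
    out = {k: v for k, v in req.items() if k != "timeout"}
    out["timeout_ms"] = req["timeout"]
    out["version"] = 2
    return out


UPGRADES: Dict[int, Callable[[Request], Request]] = {0: _v0_to_v1, 1: _v1_to_v2}
LATEST = 2


def upgrade(req: Request) -> Request:
    """Apply single-step upgrades until the request reaches LATEST."""
    version = req["version"]
    while version < LATEST:
        step = UPGRADES.get(version)
        if step is None:
            raise ValueError(f"no upgrade path from version {version}")
        req = step(req)
        version = req["version"]
    return req
```

Keeping each step small means supporting a new protocol version only requires adding one function, while every older client automatically gets translated through the whole chain.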

Envoy protocol filter for Kafka in action

This is all nice and handy, but there's still a missing piece: the Envoy protocol filter for Kafka. As mentioned earlier, the Envoy community and Adam Kotwasinski are working hard to finish it. We took Adam's branch, built a custom Envoy version with the Kafka filter included, and automated a Kafka cluster setup on Istio, orchestrated by Backyards (now Cisco Service Mesh Manager). Under the hood, the major components are:

Install a Kafka cluster on Istio

The first prerequisite is to have a Kubernetes cluster.
You can create a Kubernetes cluster on five different cloud providers, or on-premises, via the free developer version of the Pipeline platform. You can also bring your own cluster.
Once you have a cluster, grab this experimental build of the Backyards CLI.
This is an experimental feature, so make sure you download the appropriate release.
Set the KUBECONFIG environment variable to point to your Kubernetes cluster, and run the following two commands. They install all the components necessary to try out the Envoy Kafka protocol filter.
backyards istio install --set spec.proxy.image=banzaicloud/proxyv2:devfilter
backyards install --with-kafka-cluster
Backyards (now Cisco Service Mesh Manager) will install and configure an Istio service mesh and an Apache Kafka cluster using Banzai Cloud's operators (Koperator and the Istio operator). It will also configure the Envoy Kafka protocol filter through a custom resource called EnvoyFilter. If you are more of a visual type, the following diagram represents the architecture:
Figure: EnvoyFilter
To see some metrics, you will need some load on your Kafka cluster. You can use your own tooling, or you can issue the following command, which starts a small performance tool that sends some load to Kafka:
backyards kafka load
Then you can open the Grafana dashboard for the Kafka cluster:
backyards kafka dashboard
Figure: Backyards View

Kafka protocol filter metrics

The sample dashboards show information about various Kafka protocol messages. Even this early version of the filter produces some of the most important metrics, like the average response latency, the number of failed responses, and the number of topics.
Figure: Kafka protocol filter metrics
These metrics can help you keep the cluster healthy. You can set up alerts based on them that trigger when something starts to behave incorrectly. For example, the Produce Buffer metric can tell you if the cluster is nearing its limits and an intervention is needed.
Figure: Produce Buffer
You can also use these metrics to build custom logic that helps you manage the cluster. For example, you can leverage the Produce requests metric when setting up autoscaling for the Kafka cluster: passing a certain threshold of the average response time could trigger an automatic Kafka cluster upscale.
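The threshold-based upscale idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the class name `UpscalePolicy`, the 50 ms threshold, the window size, and the broker cap are all hypothetical, and the average produce latency values are assumed to be read from the filter's metrics by some external loop. It only decides on a desired broker count; actually adding brokers would be up to the operator managing the cluster.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class UpscalePolicy:
    latency_threshold_ms: float = 50.0  # hypothetical threshold
    window_size: int = 5                # number of recent samples averaged
    max_brokers: int = 9                # hypothetical upper bound
    _samples: Deque[float] = field(default_factory=deque, repr=False)

    def observe(self, avg_produce_latency_ms: float) -> None:
        """Record one average-latency reading, keeping only the last window_size."""
        self._samples.append(avg_produce_latency_ms)
        if len(self._samples) > self.window_size:
            self._samples.popleft()

    def desired_brokers(self, current_brokers: int) -> int:
        """Suggest one more broker when the windowed average exceeds the threshold."""
        if len(self._samples) < self.window_size:
            return current_brokers  # not enough data yet
        average = sum(self._samples) / len(self._samples)
        if average > self.latency_threshold_ms and current_brokers < self.max_brokers:
            return current_brokers + 1
        return current_brokers
```

Averaging over a window instead of reacting to a single reading avoids scaling up on a short latency spike, and the broker cap keeps a misbehaving metric from growing the cluster indefinitely.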

About Banzai Cloud

Banzai Cloud is changing how private clouds are built: simplifying the development, deployment, and scaling of complex applications, and putting the power of Kubernetes and Cloud Native technologies in the hands of developers and enterprises, everywhere.