Kafka on Kubernetes, the easy way
Published on 03/10/2019
Last updated on 03/21/2024
One of the key features of the Pipeline platform is its ability to automatically provision, manage, and operate different application frameworks through what we call spotguides. Among the many spotguides we support on Kubernetes (Spark, Zeppelin, NodeJS, Golang, even custom frameworks - to name a few), Apache Kafka is among the most popular.
We are heavily invested in making it as easy and straightforward as possible to operate Apache Kafka automatically on Kubernetes, and we believe that our current Apache Kafka Spotguide does just that. We're not stopping there, and we highly encourage you to read the roadmap section at the end of this blog.
Some of our older posts about Apache Kafka on Kubernetes:
- Kafka on Kubernetes - using etcd
- Monitoring Apache Kafka with Prometheus
- Kafka on Kubernetes with Local Persistent Volumes
- Kafka on Kubernetes the easy way
Kafka on Kubernetes - the way it should be
There are a few solutions out there for people who want to use Kafka on Kubernetes, but I'd argue that none of them provide an end-to-end method of creating, operating, and deploying Kafka to Kubernetes without the use of specialized skillsets. Most of these solutions (if not all of them) require, as a prerequisite, a preexisting Kubernetes cluster, a Helm or K8s deployment, knowledge of yaml, logging and monitoring systems that are pre-deployed and pre-configured, possibly a CI/CD system, the marriage of Kafka and K8s security measures, and, ultimately, Kafka experience. These prerequisites don't usually overlap, so our aim was to automate them and to fast track the Kafka on Kubernetes experience by:
- Automating the creation of Kubernetes clusters on six cloud providers as well as on-premise
- Deploying Kafka's components and creating brokers and a Zookeeper cluster
- Pre-configuring Prometheus to monitor all Kafka components, with useful default Grafana dashboards
- Centralizing log collection (in object storage, Elastic, etc) using the fluentd/fluent-bit ecosystem
- Externalizing access to Kafka using a dynamically (re)configured Envoy proxy
- Reproducing environments using the built-in Pipeline CI/CD subsystem and storing state in Git
A competent Java/Kafka developer might lack any of the skills listed above - however, they can all be automated using Banzai Cloud's Koperator for Kubernetes and our Kafka Spotguide. You can kickstart your Kafka experience in less than 5 minutes through the Pipeline UI.


Kafka in action
Once you're logged in to the Pipeline platform, you can proceed directly to the Spotguides section (also, please check our documentation for details on how to add your cloud credentials).
An overview (including the automation flow) follows:
The first screen of the wizard/questionnaire will be a request for general information and for a handful of broker properties, which we will apply ourselves. Broker properties can contain anything covered in the original Kafka broker documentation. You can modify this snippet however you want, but it's recommended that you take note of the following (we highly recommend you keep these as-is):
- zookeeper.connect: keep this or add your own pre-existing Zookeeper cluster endpoint
- broker.id: this is populated by the Spotguide, based on the pod ordinal generated by the StatefulSet
- advertised.listeners: this is populated by the Spotguide and cannot be changed in this release (we'll add that option later)
- listeners: this is also populated by the Spotguide and cannot be changed in this release (option to be added later)
- log.dirs: this is likewise populated by the Spotguide and cannot be changed in this release (option to be added later)
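For reference, here is a minimal sketch of what such a broker properties snippet might look like. All values below are illustrative placeholders, not what the Spotguide actually generates - in this release, everything except zookeeper.connect is managed for you:
# illustrative placeholder values only - the Spotguide fills in the managed properties
zookeeper.connect=zookeeper-headless:2181
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://broker-0.example.com:9092
log.dirs=/kafka-logs
offsets.topic.replication.factor=3
default.replication.factor=3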



Once you pass these generic configs, you will reach the Kubernetes cluster create option. Once that's done, the cluster is created, and Kafka and all its components are deployed and made ready to use (they're monitored, their logs are collected, etc.).
Now your Kafka cluster is ready!
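If you have kubectl access to the cluster, a quick sanity check might look like the following. The label selectors here are assumptions and will depend on how the deployment names things:
# confirm the broker and Zookeeper pods are Running
kubectl get pods -l app=kafka
kubectl get pods -l app=zookeeper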
Let's try it out by producing and consuming some messages. For practicality's sake, these tests will be conducted using the well-known Kafka tool, kafkacat.
- To produce messages:
kafkacat -P -b <bootstrap-server:port> -t test_topic
- To consume those messages:
kafkacat -C -b <bootstrap-server:port> -t test_topic
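A couple of other handy kafkacat invocations (the bootstrap server is the same placeholder as above):
# consume test_topic from the beginning and exit once the end of the topic is reached
kafkacat -C -b <bootstrap-server:port> -t test_topic -o beginning -e
# list the brokers, topics, and partitions the cluster knows about
kafkacat -L -b <bootstrap-server:port>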
You might want to check the logs as well, to see that the messages are arriving in the location you have specified.
After a while, we may begin to have some problems, in the form of under-replicated partitions. This happens when a broker runs out of disk space and fails; Kubernetes tries to save the broker by continuously restarting it. Luckily, we can still consume all the transferred messages (no offline partitions exist yet), thanks to our well-chosen replication factor. All of this information is made available in the metrics and default charts we provide.
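You can also ask Kafka directly which partitions are under-replicated, using the stock kafka-topics.sh tool (the Zookeeper endpoint is a placeholder; run this from a shell on one of the broker pods):
# list partitions whose in-sync replica set has fallen below the replication factor
kafka-topics.sh --describe --under-replicated-partitions --zookeeper <zookeeper-host:port>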
Roadmap
So what's next? What we've just demonstrated is already faster and more convenient than most other options, but it's still far from perfect: when shit hits the proverbial fan, it requires manual intervention, and it lacks a few features that some Kafka developers may require. It also involves some constraints that, when running on Kubernetes, we believe should be handled differently. The following are all works in progress that will soon be open sourced as part of our Kubernetes operator for Apache Kafka:
- Ability to enable or disable a Schema Registry when creating a Spotguide
- Support for multiple open-source Kafka connectors
- Fine Grained Broker Config support
- Fine Grained Broker Volume support
- Fine Grained upscale and downscale support (this will involve radically different Kubernetes technology and a different approach from all existing Kafka solutions, including the current state of our operator)
- Intelligent failure management including nodes, brokers and disks
- Istio-backed external access using the Banzai Cloud Istio operator
- Open-source Kafka UI support, like this
About Banzai Cloud Pipeline
Banzai Cloud’s Pipeline provides a platform for enterprises to develop, deploy, and scale container-based applications. It leverages best-of-breed cloud components, such as Kubernetes, to create a highly productive, yet flexible environment for developers and operations teams alike. Strong security measures — multiple authentication backends, fine-grained authorization, dynamic secret management, automated secure communications between components using TLS, vulnerability scans, static code analysis, CI/CD, and so on — are default features of the Pipeline platform.