Published on 00/00/0000
Last updated on 00/00/0000
Kubernetes has become the go-to platform for hosting container-based applications. Although Kubernetes is widely adopted, it still has some “secret” benefits for the applications you run on it. This post shows why Kubernetes events are so important and how they help tackle both simple and complex problems. Before we dig deeper, let’s get an overview of what Kubernetes events are!
We have an earlier blog post about Kubernetes events to get your feet wet, but here's just a quick overview of what events are. The foundation of Kubernetes is that there are several different controllers that keep the state of the system in sync with the resource definitions. These controllers communicate with the users via events. The most basic example is when you create a deployment:
The controller manager creates a replicaset for that deployment.
The scheduler (another controller) assigns pods to nodes.
The kubelet (a per-node controller) executes the containers.
As you can see, a simple deployment goes through several controllers, and a lot can go wrong during this process. Both the success and the failure of an operation result in an event in Kubernetes. You can check those events via kubectl or your preferred GUI or CLI tool. If you don’t want to filter events, you can simply use:
kubectl get events
and the result will be something similar to this:
LAST SEEN TYPE REASON OBJECT SUBOBJECT SOURCE MESSAGE FIRST SEEN COUNT NAME
3m7s Normal LeaderElection configmap/banzaicloud-thanos-operator one-eye-thanos-operator-6467d7bd65-8xb27_01984c3f-b24d-4ebe-8156-d9a321a3a5d5 one-eye-thanos-operator-6467d7bd65-8xb27_01984c3f-b24d-4ebe-8156-d9a321a3a5d5 became leader 3m7s 1 banzaicloud-thanos-operator.16cb552d7b33e74b
3m7s Normal LeaderElection lease/banzaicloud-thanos-operator one-eye-thanos-operator-6467d7bd65-8xb27_01984c3f-b24d-4ebe-8156-d9a321a3a5d5 one-eye-thanos-operator-6467d7bd65-8xb27_01984c3f-b24d-4ebe-8156-d9a321a3a5d5 became leader 3m7s 1 banzaicloud-thanos-operator.16cb552d7b341142
2m39s Normal LeaderElection configmap/banzaicloud-thanos-operator one-eye-thanos-operator-6467d7bd65-8xb27_6b649c1c-1cc3-47cf-ae12-894b19b4ee99 one-eye-thanos-operator-6467d7bd65-8xb27_6b649c1c-1cc3-47cf-ae12-894b19b4ee99 became leader 2m39s 1 banzaicloud-thanos-operator.16cb5533d626f885
2m39s Normal LeaderElection lease/banzaicloud-thanos-operator one-eye-thanos-operator-6467d7bd65-8xb27_6b649c1c-1cc3-47cf-ae12-894b19b4ee99 one-eye-thanos-operator-6467d7bd65-8xb27_6b649c1c-1cc3-47cf-ae12-894b19b4ee99 became leader 2m3
If you want to look at a single event in detail, for example one related to a particular resource, you can fetch it by name and print it as YAML:
kubectl get event one-eye-thanos-operator.16cb552b0653a67d -o yaml
apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2022-01-18T10:02:12Z"
involvedObject:
  apiVersion: apps/v1
  kind: Deployment
  name: one-eye-thanos-operator
  namespace: default
  resourceVersion: "4521231"
  uid: ee22d555-1bdf-4424-a1ea-19a1382c958d
kind: Event
lastTimestamp: "2022-01-18T10:02:12Z"
message: Scaled up replica set one-eye-thanos-operator-6467d7bd65 to 1
metadata:
  creationTimestamp: "2022-01-18T10:02:12Z"
  name: one-eye-thanos-operator.16cb552b0653a67d
  namespace: default
  resourceVersion: "4521234"
  selfLink: /api/v1/namespaces/default/events/one-eye-thanos-operator.16cb552b0653a67d
  uid: e5cf909c-53c2-4e5e-be4c-af92a956a12c
reason: ScalingReplicaSet
reportingComponent: ""
reportingInstance: ""
source:
  component: deployment-controller
type: Normal
As you can see, Kubernetes events are essentially resources, similar to deployments or pods. They have the same version and metadata fields. However, they have a couple of event-specific fields as well. Let's see the most important ones:
eventTime: The timestamp of an atomic event
firstTimestamp: The first timestamp of a continuous event
lastTimestamp: The last timestamp of a continuous event
count: The number of times this event was triggered
message: Human-readable message
reason: Short description of the event
involvedObject: Reference to the Kubernetes resource the event is related to
metadata: The event's own metadata, including name, uid, etc.
source: The source of the event, such as the reporting component
type: Event type, such as Normal or Warning
Events are garbage collected by the Kubernetes API Server after a short period of time. This TTL is configurable; a typical value is an hour, but there are exceptions, such as 5 minutes in the case of EKS. However, events can be really useful when debugging what happened in your cluster, which is why storing events is a common practice. The problem is that events do not fit neatly into a single category: they are not really metrics, they differ a bit from logs, and they have some trace-like properties as well.
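Coming back to the timestamp fields listed above: an event carries either eventTime or the firstTimestamp/lastTimestamp pair, so code that consumes stored events usually has to decide which timestamp to use. The Python sketch below shows one possible way to do that. The helper names `parse_time` and `effective_timestamp` and the fallback order are assumptions made for illustration; they are not part of any Kubernetes client library.

```python
from datetime import datetime
from typing import Optional


def parse_time(value: Optional[str]) -> Optional[datetime]:
    """Parse an RFC 3339 timestamp from an event field; empty values stay None."""
    if not value:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def effective_timestamp(event: dict) -> Optional[datetime]:
    """Return eventTime if set, otherwise lastTimestamp, otherwise firstTimestamp."""
    for field in ("eventTime", "lastTimestamp", "firstTimestamp"):
        parsed = parse_time(event.get(field))
        if parsed is not None:
            return parsed
    return None


# A subset of the fields from the example event shown earlier.
example_event = {
    "eventTime": None,
    "firstTimestamp": "2022-01-18T10:02:12Z",
    "lastTimestamp": "2022-01-18T10:02:12Z",
    "count": 1,
    "reason": "ScalingReplicaSet",
    "type": "Normal",
}

if __name__ == "__main__":
    print(effective_timestamp(example_event))
```

For the example event, eventTime is null, so the sketch falls back to lastTimestamp.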
A trivial approach is to store events as logs. However, this approach has some problems: events have fields that make connections between different components. If you treat events like standard log lines and ingest them into a log database like Loki, you miss a lot of information. Of course, it is possible to retrieve that information later at query time, but you need to be prepared to parse those fields from your raw data.
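To make the query-time parsing concrete, here is a small Python sketch. It assumes that each log line is a JSON document that wraps the Kubernetes event under an `event` key, which is the layout the `event.involvedObject.*` field paths later in this post suggest. The function name `involved_object_ref` and the sample line are hypothetical.

```python
import json


def involved_object_ref(log_line: str) -> str:
    """Recover "Kind/namespace/name" of the involved object from a raw JSON log line."""
    record = json.loads(log_line)
    obj = record["event"]["involvedObject"]
    return f'{obj["kind"]}/{obj["namespace"]}/{obj["name"]}'


# A shortened, hand-written log line used only for illustration.
sample_line = json.dumps({
    "event": {
        "reason": "ScalingReplicaSet",
        "involvedObject": {
            "kind": "Deployment",
            "namespace": "default",
            "name": "one-eye-thanos-operator",
        },
    }
})

if __name__ == "__main__":
    print(involved_object_ref(sample_line))
```

The point of the sketch is that the link between the event and its resource is only recoverable if you parse the structured payload; a plain-text search over the line would not give you that reference.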
As events have a lot of simple attributes (like reason), they can be translated into metrics. A good transformation would be to use the reason field as the metric name and the count field as the value. All the other relevant attributes can be labels on the metric. From this information you can create a nice overview of what's happening in your cluster. This seems like a good idea, and it provides you with an overall health indicator, yet you lose a lot of important information. Time series databases don't handle high-cardinality information well. If you need more than aggregated values, such as the message and/or the name field, every event becomes its own time series. That does not sound good, does it?
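The transformation and its cardinality problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `events_to_samples`, the choice of labels, and the sample events (including the pod names) are made up for this example, and the object name stands in for any per-event attribute.

```python
from collections import Counter
from typing import Iterable


def events_to_samples(events: Iterable[dict], include_name: bool = False) -> Counter:
    """Aggregate events into (metric name, labels) -> value samples.

    The metric name comes from `reason`, the value from `count`, and a few
    low-cardinality attributes become labels. With include_name=True the
    involved object's name is added as a label as well.
    """
    samples: Counter = Counter()
    for event in events:
        obj = event.get("involvedObject", {})
        labels = [
            ("type", event.get("type", "")),
            ("kind", obj.get("kind", "")),
            ("namespace", obj.get("namespace", "")),
        ]
        if include_name:
            labels.append(("name", obj.get("name", "")))
        # Treat a missing or null count as a single occurrence.
        samples[(event["reason"], tuple(labels))] += event.get("count") or 1
    return samples


# Hypothetical events for two different pods with the same reason.
sample_events = [
    {"reason": "BackOff", "type": "Warning", "count": 3,
     "involvedObject": {"kind": "Pod", "namespace": "default", "name": "web-1"}},
    {"reason": "BackOff", "type": "Warning", "count": 2,
     "involvedObject": {"kind": "Pod", "namespace": "default", "name": "web-2"}},
]

if __name__ == "__main__":
    print(len(events_to_samples(sample_events)))
    print(len(events_to_samples(sample_events, include_name=True)))
```

Without the name label, both events collapse into a single series; with it, each object gets its own series, which is exactly the growth a time series database struggles with.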
An interesting approach is to store events as traces. Traces can not only show individual events, but also represent hierarchy and time ranges visually. Kspan is a proof of concept of how to represent events as traces. I don't want to go into details here; you can follow up on the kspan project page.
Figure: screenshot from the kspan project.
All the above solutions have their pros and cons, but we wanted something truly useful. Most of the time you need events tied to a resource. This can happen when you investigate the behavior of an application because you got a response time alert. In that case, you want to filter events for the related objects and see timelines for the period when the alert was active. Eventually, events become another aspect of correlation. In a following post we will discuss the correlation feature of MCOM as well, so we won't leave you hanging.
Let's talk about how Cisco MCOM handles events. First of all, we need to collect them all. To extract events from Kubernetes we use a modified version of Heptio's eventrouter. This simple yet great tool is able to fetch events from the Kubernetes API server and print them to the container's standard output. From there we have just the right tool to parse and send them to OpenSearch. Cisco MCOM provides the Flow and Output resources out of the box for ingesting Kubernetes events. We decided to use OpenSearch as our event backend because of the extensive query language it provides.
As previously mentioned, we store historical event data in OpenSearch and leverage its Elasticsearch-compatible Query DSL to filter and aggregate results. Query DSL can express complex queries using a tree of clauses encoded as JSON. When fetching events for correlation, we use it to filter events by involved object and time range, and also to sort and aggregate them before they leave OpenSearch, all in a single expression. Let's see an example:
{
  "collapse": {
    "field": "event.metadata.name.keyword"
  },
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "event.involvedObject.apiVersion.keyword": "v1"
              }
            },
            {
              "term": {
                "event.involvedObject.kind.keyword": "Pod"
              }
            },
            {
              "term": {
                "event.involvedObject.namespace.keyword": "default"
              }
            },
            {
              "term": {
                "event.involvedObject.name.keyword": "nginx-558bd4d5db-6v9sc"
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "range": {
                      "event.eventTime": {
                        "from": "2022-02-14T01:00:00Z",
                        "include_lower": true,
                        "include_upper": true,
                        "to": null
                      }
                    }
                  },
                  {
                    "bool": {
                      "must": {
                        "range": {
                          "event.lastTimestamp": {
                            "from": "2022-02-14T01:00:00Z",
                            "include_lower": true,
                            "include_upper": true,
                            "to": null
                          }
                        }
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "size": 10000,
  "sort": [
    {
      "event.lastTimestamp": {
        "missing": "_first",
        "order": "desc",
        "unmapped_type": "date"
      }
    },
    {
      "event.eventTime": {
        "missing": "_first",
        "order": "desc",
        "unmapped_type": "date"
      }
    }
  ]
}
As you can see, there's quite a hierarchy of objects to express all these conditions and transformations, but we'll take it clause by clause. The collapse clause collapses the results by event name, so only one hit is returned per event. The query clause describes the logical combination of different filters. In our case, the first four term clauses filter the events by involved object, and the last clause defines the time range predicate for both event kinds: events with eventTime, and events with firstTimestamp and lastTimestamp. Lastly, the size clause limits the result set size, and the sort clause specifies an ordering by event timestamp. And that's it! Not so complicated after all. The results are then represented on a timeline in our correlation view.
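If you need the same query for many different objects, it can help to generate it instead of writing the JSON by hand. The following Python sketch builds an equivalent request body from an object reference and a start time. The function name `build_event_query` and its parameters are hypothetical, and the sketch uses the shorter `gte` form of the range clause in place of `from` with `include_lower`; treat it as one possible way to assemble the query, not as the code MCOM actually runs.

```python
import json


def build_event_query(api_version: str, kind: str, namespace: str, name: str,
                      since: str, size: int = 10000) -> dict:
    """Build a search body selecting the events of one object newer than `since`."""
    # One term filter per involved-object attribute.
    object_filters = [
        {"term": {f"event.involvedObject.{field}.keyword": value}}
        for field, value in (
            ("apiVersion", api_version),
            ("kind", kind),
            ("namespace", namespace),
            ("name", name),
        )
    ]
    # Match either kind of event: one with eventTime or one with lastTimestamp.
    time_filter = {
        "bool": {
            "should": [
                {"range": {"event.eventTime": {"gte": since}}},
                {"range": {"event.lastTimestamp": {"gte": since}}},
            ]
        }
    }
    sort_spec = {"missing": "_first", "order": "desc", "unmapped_type": "date"}
    return {
        "collapse": {"field": "event.metadata.name.keyword"},
        "query": {
            "constant_score": {
                "filter": {"bool": {"must": object_filters + [time_filter]}}
            }
        },
        "size": size,
        "sort": [
            {"event.lastTimestamp": dict(sort_spec)},
            {"event.eventTime": dict(sort_spec)},
        ],
    }


if __name__ == "__main__":
    body = build_event_query("v1", "Pod", "default", "nginx-558bd4d5db-6v9sc",
                             "2022-02-14T01:00:00Z")
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a search request by whichever OpenSearch client you use.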
All steps are manually reproducible but there is quite a bit of configuration required. To simplify the deployment, Cisco MCOM provides command-line options to deploy the event backend as described above.
one-eye logging install -us
one-eye opensearch install
one-eye event-backend install
And we are ready to browse our Events! In a future post we will show a practical example about logs, metrics and events in the correlation view!