by Chinmay Gaikwad
Published on 12/21/2021
Last updated on 02/05/2024
Prometheus Metrics for Kubernetes
As a CNCF project, Prometheus has been widely adopted for monitoring Kubernetes applications because it collects metrics efficiently from a wide variety of services. Its instrumentation framework can process large volumes of data, making it well suited to complex, distributed workloads.
Prometheus collects performance data with a pull-based model: it sends an HTTP request to each target according to that target's scrape configuration, then parses the metric data from the response. Exporters make sure that the scraped data is correctly exposed and formatted. Prometheus' service discovery covers multiple components of a Kubernetes cluster.
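To make the pull model concrete, here is a minimal sketch using only the Python standard library. It assumes a target that serves the plain-text exposition format at `/metrics`; the metric `demo_requests_total`, the handler, and the helper names are hypothetical and exist only for illustration. A real Prometheus server does far more, including service discovery, labels, and storage.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload an instrumented target might expose (illustrative values).
METRICS_BODY = (
    "# HELP demo_requests_total Total requests handled (illustrative).\n"
    "# TYPE demo_requests_total counter\n"
    "demo_requests_total 42\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the exposition text at /metrics, as a scrape target would."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_BODY.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def parse_samples(text):
    """Return {name: value} for unlabeled samples, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape(url):
    """Pull once: issue the HTTP GET and parse the response body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_samples(resp.read().decode("utf-8"))


def run_demo():
    """Start a throwaway target on a free local port and scrape it once."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

The target never needs to know where the scraper lives. It only answers GET requests, which is the property the pull model relies on.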
Benefits of Prometheus
Though use cases vary between organizations, Prometheus offers a range of observability benefits, including:
- A multidimensional data model: Prometheus stores data as time series identified by a metric name and key-value pairs (labels), similar to how Kubernetes component metadata is configured in YAML files. The Prometheus Query Language (PromQL) uses these labels to query time-series data flexibly and accurately.
- Simple data formats and protocols: Metrics are exposed in a self-explanatory, human-readable format over standard HTTP, which makes exposing and checking metrics straightforward.
- Built-in alerting: Developers can specify rules for Prometheus notifications and alerts. This reduces disruptions and your developers' workload since there is no need to source an external system or API for notification.
- Whitebox and blackbox monitoring: Exporters and client libraries enable monitoring of both performance and user experience. Prometheus can use labels and annotations in configuration files to monitor and track component status efficiently. It can also collect metrics exposed by each component's internals, such as logs, interfaces, and internal HTTP handlers.
- Pull-based metrics: Teams can simply expose metrics as HTTP endpoints and use Prometheus without the services needing to know where the monitoring system is located.
Metric Types
Prometheus client libraries support four core metric types:
- Counter: A cumulative metric whose value either rises monotonically or is reset to zero when the instrumented process restarts. It represents indicators such as tasks completed, errors, or number of requests received.
- Gauge: A single numeric value that can rise or fall arbitrarily. It exposes measured values such as memory usage or temperature, as well as counts that can go up and down.
- Histogram: Samples observations and counts them in configurable, cumulative buckets to represent their frequency distribution. It also provides the sum and total count of all observations.
- Summary: Similar to a Histogram, it samples observations and provides their total count and sum. In contrast to a Histogram, it calculates configurable quantiles over a sliding time window.
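The following sketch illustrates the semantics of three of these types in plain Python. It is not the Prometheus client library; the class and method names are assumptions chosen for readability, and Summary is left out because its sliding-window quantiles need more machinery than fits here.

```python
import bisect
import math


class Counter:
    """Cumulative value that only goes up (a restart begins again at zero)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount


class Gauge:
    """Single numeric value that may rise or fall."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Counts observations into buckets keyed by their upper bound."""

    def __init__(self, bounds):
        # A final +Inf bucket catches every observation.
        self.bounds = sorted(bounds) + [math.inf]
        self.per_bucket = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        self.per_bucket[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return [(upper_bound, observations <= upper_bound), ...]."""
        running, out = 0, []
        for bound, n in zip(self.bounds, self.per_bucket):
            running += n
            out.append((bound, running))
        return out
```

For example, observing 0.05, 0.3, 0.3, and 2.0 in a histogram with bounds 0.1, 0.5, and 1.0 yields cumulative bucket counts of 1, 3, 3, and 4, with the last count belonging to the +Inf bucket.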
Prometheus Metrics & KPIs for Kubernetes
Prometheus exposes metrics that help you observe various components of a Kubernetes ecosystem. These include the four groups of metrics reviewed below.
Cluster & Node Metrics
These indicators focus on the health status of an entire cluster or a specific node. They include:
- Node resource metrics: disk & memory utilization, network bandwidth, and CPU usage.
- Number of nodes
- Number of running pods per node
- Memory/CPU requests and limits
Deployment and Pod Metrics
These include:
- Current deployments and DaemonSets
- Missing and failed pods
- Pod restarts
- Pods in CrashLoopBackOff
- Running vs. desired pods
- Pod resource usage vs. requests and limits
- Available and unavailable pods
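As an illustration, several of the indicators above can be expressed as PromQL over metrics exported by kube-state-metrics. The sketch below assumes kube-state-metrics is being scraped and that a Prometheus server is reachable at a base URL you supply. The dictionary keys and the helper name are hypothetical, and the queries are starting points rather than tuned alerts.

```python
from urllib.parse import urlencode

# PromQL expressions over kube-state-metrics series (assumed to be scraped).
QUERIES = {
    # Container restarts over the last hour.
    "pod_restarts_last_hour": "increase(kube_pod_container_status_restarts_total[1h])",
    # Containers currently waiting in CrashLoopBackOff.
    "pods_in_crashloopbackoff": (
        'kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1'
    ),
    # Gap between desired and available replicas per deployment.
    "desired_minus_available": (
        "kube_deployment_spec_replicas - kube_deployment_status_replicas_available"
    ),
    # Replicas reported as unavailable.
    "unavailable_replicas": "kube_deployment_status_replicas_unavailable",
}


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"
```

Calling `instant_query_url("http://localhost:9090", QUERIES["unavailable_replicas"])` returns a URL that can be fetched with any HTTP client. The address here is only a placeholder for wherever your Prometheus server listens.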
Container Metrics
These help teams establish how close container resource consumption is to the configured limits. Such metrics include:
- Container CPU usage
- Container memory utilization
- Network usage
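The "how close to the limit" check comes down to a simple ratio. The sketch below assumes you already have a usage value and a configured limit in the same unit; the function names and the 0.9 threshold are illustrative choices, not values from Prometheus or Epsagon.

```python
def utilization(usage, limit):
    """Return usage as a fraction of the limit, or None when no limit is set."""
    if limit is None or limit <= 0:
        return None
    return usage / limit


def near_limit(usage, limit, threshold=0.9):
    """True when usage has reached the given fraction of the limit."""
    ratio = utilization(usage, limit)
    return ratio is not None and ratio >= threshold
```

For instance, a container using 450 MiB against a 512 MiB memory limit sits at roughly 88% of its limit, which is below a 0.9 threshold.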
Application Metrics
These measure whether the applications running in pods are healthy and available. They include:
- Application availability
- Application health and performance
Troubleshooting Using Prometheus Metrics in Epsagon
Epsagon offers observability support for clusters running on all open-source Kubernetes distributions. The platform integrates with Prometheus to automatically discover and generate metrics for an entire application workload. Epsagon also provides access to cluster logs and traces for monitoring dynamic, containerized environments. With complete visibility into cluster health and application performance issues, organizations can detect bottlenecks, troubleshoot issues, and optimize resource configuration to help developers enhance productivity.
Epsagon's integration with Prometheus lets you collect and analyze various data for actionable intelligence and troubleshooting, including:
- Container logs
- Cluster performance metrics, insights, and alerts
- Detailed mappings of cluster components for health verification
Installing the Epsagon Agent
You can install the Epsagon agent in your Kubernetes cluster using the Helm package manager. If Helm is not installed already, install it using the following commands:
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
- Generate an Epsagon token to connect your application with an associated account.
- Create a simple cluster name that will be shown on the Epsagon dashboard.
- Complete the installation using the command:
$ helm repo add epsagon https://helm.epsagon.com
$ helm install <RELEASE_NAME> \
    --set epsagonToken=<EPSAGON_TOKEN> --set clusterName=<CLUSTER_NAME> epsagon/cluster-agent
Setting up Prometheus to Send Metrics to Epsagon
Before collecting Prometheus metrics, Prometheus must be installed in the cluster. You can install it from the community Helm chart using the commands:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install [RELEASE_NAME] prometheus-community/prometheus \
    --set serviceAccounts.alertmanager.create=false \
    --set serviceAccounts.nodeExporter.create=false \
    --set serviceAccounts.pushgateway.create=false \
    --set alertmanager.enabled=false \
    --set nodeExporter.enabled=false \
    --set pushgateway.enabled=false \
    --set server.persistentVolume.size=10Gi
Next, point Prometheus at the Epsagon collector and label the outgoing series with your cluster name. The relevant configuration entries are:
- url: https://collector.epsagon.com/ingestion?<EPSAGON_TOKEN>
- target_label: cluster_name
  replacement: <CLUSTER_NAME>
Then install kube-state-metrics so that metrics about the state of Kubernetes objects are available:
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install [RELEASE_NAME] bitnami/kube-state-metrics
Enabling Trace-to-Log Correlation
The Epsagon platform automatically correlates logs and traces, allowing developers to view logs for a specific time span. This eliminates the need to search logs manually or inject Span IDs into them. To enable trace-to-log correlation, developers have to:
- Trace their containers. Note: Only applications in Java, Python, and Node.js support log correlation.
- Set up Fluentd as a DaemonSet to send logs to AWS CloudWatch.
- View a trace’s logs by opening the trace, selecting a node, and accessing the logs in one click.
How to Send Metrics
Teams can use the Prometheus StatsD exporter to translate StatsD metrics into Prometheus metrics using pre-configured mapping rules. To do so, download and install the exporter in the cluster.
To send custom metrics to Prometheus, instrument your application with a native Prometheus client library. The metrics can then be pushed through the Prometheus Pushgateway or scraped directly from the client.
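To show what translating StatsD metrics with mapping rules means in practice, here is a simplified sketch in plain Python. It is not the StatsD exporter's code or its configuration format. The rule layout, the `app.*.requests` pattern, and the function names are assumptions made for illustration, and only the basic `name:value|type` line shape is handled.

```python
import re

# Hypothetical mapping rules: (glob pattern, Prometheus name, label templates).
MAPPINGS = [
    ("app.*.requests", "app_requests_total", {"service": "$1"}),
]

STATSD_TYPES = {"c": "counter", "g": "gauge", "ms": "timer"}


def parse_statsd(line):
    """Split 'name:value|type' into (name, value, type name)."""
    name, rest = line.split(":", 1)
    value, kind = rest.split("|")[:2]  # ignore an optional sample-rate part
    return name, float(value), STATSD_TYPES[kind]


def translate(line):
    """Return (prometheus_name, labels, value, type) for one StatsD line."""
    name, value, kind = parse_statsd(line)
    for glob, prom_name, label_templates in MAPPINGS:
        # Each '*' captures exactly one dot-separated component.
        pattern = "^" + re.escape(glob).replace(r"\*", r"([^.]+)") + "$"
        match = re.match(pattern, name)
        if match:
            labels = {
                key: re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), tmpl)
                for key, tmpl in label_templates.items()
            }
            return prom_name, labels, value, kind
    # No rule matched: replace characters that are invalid in metric names.
    return re.sub(r"[^a-zA-Z0-9_:]", "_", name), {}, value, kind
```

With these rules, the line `app.checkout.requests:1|c` becomes the counter `app_requests_total` with the label `service="checkout"`, while an unmapped line such as `queue.depth:17|g` falls back to the gauge `queue_depth` with no labels.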
Summary
As a workload's ecosystem grows, a single Prometheus instance is often not enough to handle the increasing volume of time-series data. While deploying multiple instances of Prometheus is one option, federating the data from those instances through a common, centralized channel such as Epsagon is often the more practical solution. Having deployed Prometheus alongside the Epsagon agent, organizations can efficiently track the overall health, performance, and behavior of their Kubernetes clusters.
Prometheus is a metrics-based application monitoring system that enables DevOps teams to observe, repair, and maintain distributed, microservices-based Kubernetes workloads. With Epsagon, teams can access a comprehensive dashboard of logs, traces, and metrics that enhances observability and simplifies troubleshooting. It also integrates with a wide range of data sources and lets you create custom dashboards.
Check out our demo environment or try Epsagon for FREE for up to 10 million traces per month!