Published on 00/00/0000
Last updated on 00/00/0000
Running multiple Kubernetes clusters within a company, or even within a single team, is common practice: you may need separate test environments, or want to separate workloads between customers. Prometheus is an awesome tool for monitoring a single cluster, but if you want to query multiple clusters, you need the help of other tools. If you are a regular reader, you know that our choice for this task is Thanos. If you are not familiar with Thanos, read our Multi cluster monitoring with Thanos blog post first.
Using operators in a Kubernetes environment can be a huge operational benefit. There are always debates about using simple Helm charts (or other deployment tools for templating Kubernetes YAML files) versus installing an operator and managing custom resources. Each approach has its pros and cons. In my opinion, if you are managing stateful applications, interacting with different components, and/or need to change the configuration frequently, it's nice to have an operator abstracting away the configuration complexities. The following use case relies on our Thanos Operator, and demonstrates the benefits of having a deterministic way of managing multi-component software.
Note: To simplify the process, we will use the one-eye command line tool. This tool is available through Cisco's Emerging Technologies Design Partner Program.
In this example, we will set up Prometheus and Thanos to have a single-dashboard, multi-query architecture. What does this mean? In short: you can grab metrics from an application no matter which cluster it is running on.
Note: Although this use case does not enable long-term storage, it should be trivial to configure that as well.
Let's outline the steps required to achieve this. To simplify the explanation, I'll call the management cluster the Observer cluster, and all the other clusters Peer clusters.
Note: This is only one way to achieve this functionality. Other solutions (like using a reverse proxy for TLS configuration on the Observer cluster) are also perfectly fine. However, we used the tooling we already had at hand via the Thanos Operator.
So let's start!
As we will use more than one cluster, we will use the --context switch to choose between them. Please take note of which commands we apply on which cluster.
This walkthrough assumes that you have a kubeconfig file (with its path set in the KUBECONFIG environment variable) that contains all the contexts required to connect to the specific clusters. We will refer to the context names using the ${OBSERVER_CONTEXT} and ${PEER_CONTEXT} shell variables, and assume they have been exported as in the following example.
export OBSERVER_CONTEXT="mcom-observer"
export PEER_CONTEXT="mcom-peer-1"
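Before running anything, it can help to confirm that both contexts actually exist in your kubeconfig. The following is a minimal sketch, not part of the original walkthrough: require_contexts is a hypothetical function name, it relies only on kubectl config get-contexts -o name, and the KUBECTL override is an assumption added so the function can be pointed at another binary or a test double.

```shell
# Hypothetical sanity check: verify that every context name passed as an
# argument exists in the kubeconfig that kubectl reads (see KUBECONFIG).
# KUBECTL may name another binary or shell function (defaults to kubectl).
require_contexts() {
  local known ctx
  # One context name per line.
  known=$(${KUBECTL:-kubectl} config get-contexts -o name) || return 1
  for ctx in "$@"; do
    # -x: match the whole line, -F: treat the context name as a fixed string.
    if ! printf '%s\n' "$known" | grep -qxF -- "$ctx"; then
      echo "context not found in kubeconfig: $ctx" >&2
      return 1
    fi
  done
}
```

For example: require_contexts "${OBSERVER_CONTEXT}" "${PEER_CONTEXT}" || exit 1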
First, grab the name of the peer cluster. The snippet below switches to the Peer context and derives the endpoint name from the context name (the part after the @ character).
kubectx "${PEER_CONTEXT}"
export PEER_ENDPOINT=$(kubectl config current-context | cut -d '@' -f 2)
Note: Depending on your context name, the delimiter might be different, so check the $PEER_ENDPOINT value.
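If you prefer not to switch the current context just to read its name, the same cut-based extraction can be applied to the context name directly. The sketch below assumes the endpoint name is whatever follows the delimiter; endpoint_from_context and is_valid_k8s_name are hypothetical helper names. Because the endpoint name is reused in resource and secret names, the second helper checks it against the Kubernetes naming rules for DNS subdomains.

```shell
# Hypothetical helper: same result as `cut -d '@' -f 2` on the context name,
# without switching the current context. If the delimiter does not occur,
# cut prints the whole name unchanged.
endpoint_from_context() {
  # $1: context name, $2: delimiter (default: '@')
  printf '%s\n' "$1" | cut -d "${2:-@}" -f 2
}

# Hypothetical helper: succeed if $1 is a valid Kubernetes object name
# (DNS-1123 subdomain): at most 253 characters, dot-separated labels of
# lowercase alphanumerics and '-', each starting and ending with an
# alphanumeric character.
is_valid_k8s_name() {
  [ -n "$1" ] && [ "${#1}" -le 253 ] &&
    printf '%s\n' "$1" |
      LC_ALL=C grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}
```

For example, export PEER_ENDPOINT=$(endpoint_from_context "${PEER_CONTEXT}"), followed by is_valid_k8s_name "${PEER_ENDPOINT}" to catch an unexpected delimiter early. Keep in mind that the kubectx step also switches the current context, which matters for any later command that omits --context.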
After these preparations, deploy the components on the Observer cluster.
one-eye --context "${OBSERVER_CONTEXT}" cert-manager install -us
one-eye --context "${OBSERVER_CONTEXT}" prometheus install -us
one-eye --context "${OBSERVER_CONTEXT}" grafana install -us
one-eye --context "${OBSERVER_CONTEXT}" thanos install --operator-only -us
one-eye --context "${OBSERVER_CONTEXT}" observer reconcile
Note: To reduce the number of reconciliations, we use the -s/--skip-reconcile and -u/--update flags (combined as -us) to initialize the observer configuration. We will do an explicit reconcile at the end.
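Since the same install-then-reconcile pattern is repeated on every cluster, it can be wrapped into a function for scripting. This is a hypothetical helper (install_and_reconcile is not a one-eye command): it only issues the one-eye invocations shown in this walkthrough, and the ONE_EYE override is an assumption added for dry runs.

```shell
# Hypothetical wrapper: run the one-eye installs from this walkthrough on one
# cluster with -us (update the configuration, skip the reconcile), then
# reconcile once at the end. Set ONE_EYE=echo to print the commands instead.
install_and_reconcile() {
  local ctx="$1" component
  shift
  for component in "$@"; do
    if [ "$component" = thanos ]; then
      # Thanos is installed operator-only, as in the walkthrough.
      ${ONE_EYE:-one-eye} --context "$ctx" thanos install --operator-only -us || return 1
    else
      ${ONE_EYE:-one-eye} --context "$ctx" "$component" install -us || return 1
    fi
  done
  ${ONE_EYE:-one-eye} --context "$ctx" observer reconcile
}
```

For example, install_and_reconcile "${OBSERVER_CONTEXT}" cert-manager prometheus grafana thanos reproduces the Observer commands above, and install_and_reconcile "${PEER_CONTEXT}" prometheus thanos ingress covers the Peer cluster preparation.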
If you already have a way to create client and server certificates, you can skip this part. In this section, we set up a self-signed certificate using cert-manager.
Note: Self-signed certificates are for demonstration purposes. Use a proper CA for production setup.
The first step is to set up a self-signed issuer. You can do that by applying the following YAML on the Observer cluster (where cert-manager is installed):
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: default
spec:
  selfSigned: {}
The next step is to generate the certificates. For simplicity, we use the same certificate for client and server authentication. The following YAML, also applied on the Observer cluster, creates the certificate and stores it in the ${PEER_ENDPOINT}-tls secret (mcom-peer-1-tls in our example).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mcom-peer-1-tls
  namespace: default
spec:
  commonName: peer-endpoint.cluster.notld
  dnsNames:
    - "${PEER_ENDPOINT}"
  issuerRef:
    name: selfsigned
  secretName: ${PEER_ENDPOINT}-tls
  usages:
    - server auth
    - client auth
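Note that kubectl does not expand shell variables such as ${PEER_ENDPOINT} inside a manifest, so the placeholders have to be substituted before applying. Below is a minimal sketch using an unquoted shell heredoc; render_peer_tls_manifests is a hypothetical function name, and it assumes the Certificate name is also derived from the endpoint name (mcom-peer-1-tls in the example). A tool like envsubst would work as well.

```shell
# Hypothetical helper: print the Issuer and Certificate manifests with the
# endpoint name substituted by the shell (the heredoc delimiter is unquoted,
# so $1 is expanded), ready to be piped into kubectl.
render_peer_tls_manifests() {
  # $1: peer endpoint name, e.g. mcom-peer-1
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: default
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: $1-tls
  namespace: default
spec:
  commonName: peer-endpoint.cluster.notld
  dnsNames:
    - "$1"
  issuerRef:
    name: selfsigned
  secretName: $1-tls
  usages:
    - server auth
    - client auth
EOF
}
```

Usage: render_peer_tls_manifests "${PEER_ENDPOINT}" | kubectl --context "${OBSERVER_CONTEXT}" apply -f -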
After generating the proper certificates, we can jump over to the Peer cluster to prepare the environment there.
Note on the cluster label in Prometheus: If Prometheus is already installed on the Peer cluster, make sure that it properly sets the cluster label on the collected metrics. The label must be present and unique, otherwise the metrics of the different clusters become mixed up. Moreover, the Thanos Sidecar must be enabled for Prometheus. If Multi Cloud Observability Manager (formerly called One Eye) is already installed on the Peer cluster, make sure that the spec.clusterName field of the observer custom resource is different on the Observer and the Peer clusters. Multi Cloud Observability Manager version 0.5.0 and later tries to detect it automatically from the context.
Copy the certificates to the Peer cluster:
kubectl --context "${OBSERVER_CONTEXT}" get secret "${PEER_ENDPOINT}-tls" -o yaml | kubectl --context "${PEER_CONTEXT}" create -f-
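kubectl get -o yaml also prints fields that the API server populates, such as metadata.uid, metadata.resourceVersion and metadata.creationTimestamp. If the create command on the Peer cluster rejects the copied Secret because of them, one option is to filter them out first. The sketch below is a hypothetical sed-based filter that assumes kubectl's default two-space YAML indentation; a YAML-aware tool would be more robust.

```shell
# Hypothetical filter: drop server-populated metadata fields from a manifest
# printed by `kubectl get -o yaml`, so it can be created on another cluster.
# Assumes kubectl's default layout, where metadata fields are indented by
# exactly two spaces. Caveat: any other two-space-indented key with one of
# these names (for example a data key called "uid") would be removed too.
strip_server_metadata() {
  sed -e '/^  creationTimestamp:/d' \
      -e '/^  resourceVersion:/d' \
      -e '/^  selfLink:/d' \
      -e '/^  uid:/d'
}
```

Usage: kubectl --context "${OBSERVER_CONTEXT}" get secret "${PEER_ENDPOINT}-tls" -o yaml | strip_server_metadata | kubectl --context "${PEER_CONTEXT}" create -f-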
Prepare the Peer cluster for monitoring.
one-eye --context "${PEER_CONTEXT}" prometheus install -us
one-eye --context "${PEER_CONTEXT}" thanos install --operator-only -us
one-eye --context "${PEER_CONTEXT}" ingress install -us
one-eye --context "${PEER_CONTEXT}" observer reconcile
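As the note on the cluster label explains, every cluster needs a name that is present and unique, otherwise metrics from different clusters get mixed up. A minimal check is sketched below; check_cluster_names is a hypothetical helper, and how you collect the names (for example, from the spec.clusterName field of the observer resource on each cluster) is left to your setup.

```shell
# Hypothetical check: fail if no names are given, or if any cluster name is
# empty or occurs more than once. Pass one name per cluster.
check_cluster_names() {
  local name dups
  [ "$#" -gt 0 ] || { echo "no cluster names given" >&2; return 1; }
  for name in "$@"; do
    [ -n "$name" ] || { echo "empty cluster name" >&2; return 1; }
  done
  # After sorting, uniq -d prints each name that occurs more than once.
  dups=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicate cluster name(s): $dups" >&2
    return 1
  fi
}
```

For example, check_cluster_names mcom-observer mcom-peer-1 succeeds, while passing the same name twice fails.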
For the ingress, Cisco MCOM installs the official Kubernetes Nginx Ingress Controller. Since Thanos components communicate over gRPC, we need an ingress that supports HTTP/2. Because most of the HTTP/2 configuration is done through annotations, the Thanos Operator currently supports only the Nginx Ingress Controller, but this can be extended in the future.
Create the ThanosEndpoint on the Peer cluster. This command performs several tasks. First, it deploys a Thanos Query to provide an interface for the other components. After that, it creates an Nginx ingress that exposes a gRPC endpoint with TLS configured. The following command first generates the YAML for the endpoint, then applies it.
one-eye --context "${PEER_CONTEXT}" thanos endpoint generate "${PEER_ENDPOINT}" --cert-secret-name "${PEER_ENDPOINT}-tls" --ca-bundle-secret-name "${PEER_ENDPOINT}-tls" | kubectl --context "${PEER_CONTEXT}" apply -f-
Note: You can use the generate command to create yaml files that you can later use in your CI/CD environment as well.
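For example, instead of piping the generated YAML straight into kubectl, you could write it to a file for review and version control. This is a minimal sketch: save_endpoint_manifest, the manifests directory, and the file naming scheme are all hypothetical choices, and it reuses the generate flags from the command above.

```shell
# Hypothetical helper: write the generated ThanosEndpoint manifest to
# <dir>/<endpoint>-thanosendpoint.yaml (default dir: ./manifests) so a CI/CD
# pipeline can apply it later. Output goes to a temporary file first, so a
# failed generate does not leave a truncated manifest behind.
# Set ONE_EYE to override the one-eye binary (e.g. ONE_EYE=echo for a dry run).
save_endpoint_manifest() {
  local endpoint="$1" dir="${2:-manifests}" out
  out="$dir/$endpoint-thanosendpoint.yaml"
  mkdir -p "$dir" || return 1
  if ${ONE_EYE:-one-eye} thanos endpoint generate "$endpoint" \
      --cert-secret-name "$endpoint-tls" \
      --ca-bundle-secret-name "$endpoint-tls" > "$out.tmp"; then
    mv "$out.tmp" "$out"
  else
    rm -f "$out.tmp"
    return 1
  fi
}
```

The pipeline can then apply the stored files, for example with kubectl --context "${PEER_CONTEXT}" apply -f manifests/.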
Example ThanosEndpoint configuration
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ThanosEndpoint
metadata:
  name: mcom-peer-1
  namespace: default
spec:
  caBundle: mcom-peer-1-tls
  certificate: mcom-peer-1-tls
  ingressClassName: one-eye-nginx-external
  metaOverrides: {}
After a successful reconcile, the status of the ThanosEndpoint resource holds the public address of the ingress. This address is required to set up the peer resource on the Observer cluster. You can check it on the Peer cluster:
$ kubectl --context "${PEER_CONTEXT}" get thanosendpoint
NAME          ENDPOINT ADDRESS
mcom-peer-1   xxxxxxxxxxxxxxxxxxxxx-zzzzzzzzzzzz.eu-west-1.elb.amazonaws.com:443
We can use the following one-liner to save the address in a variable.
export ENDPOINT_ADDRESS=$(one-eye --context "${PEER_CONTEXT}" thanos endpoint address "${PEER_ENDPOINT}")
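On cloud providers, the load balancer behind the ingress can take a while to be provisioned, so the address may not be available right after the ThanosEndpoint is created. Below is a hedged sketch of a retry loop: wait_for_output is a hypothetical helper that makes no assumption about how one-eye reports a missing address; it simply retries until the command succeeds with non-empty output.

```shell
# Hypothetical helper: run a command until it exits successfully with
# non-empty output, at most ATTEMPTS times, sleeping INTERVAL seconds between
# attempts. Prints the output on success; returns 1 if it never arrives.
# Usage: wait_for_output ATTEMPTS INTERVAL COMMAND [ARGS...]
wait_for_output() {
  local attempts="$1" interval="$2" i=1 out
  shift 2
  while :; do
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$interval"
  done
}
```

Usage: export ENDPOINT_ADDRESS=$(wait_for_output 30 10 one-eye --context "${PEER_CONTEXT}" thanos endpoint address "${PEER_ENDPOINT}")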
Now it's time to create the peer resource on the Observer cluster. We need to specify the endpoint address and the secret holding the certificates. The following command creates and configures a Thanos Query with TLS authentication that connects to the Peer cluster. Moreover, it automatically creates a datasource resource for the Grafana operator.
one-eye --context "${OBSERVER_CONTEXT}" thanos peer generate "${PEER_ENDPOINT}" --endpoint-address "${ENDPOINT_ADDRESS}" --cert-secret-name "${PEER_ENDPOINT}-tls" --ca-bundle-secret-name "${PEER_ENDPOINT}-tls" | kubectl --context "${OBSERVER_CONTEXT}" apply -f-
Example ThanosPeer configuration
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ThanosPeer
metadata:
  name: mcom-peer-1
  namespace: default
spec:
  endpointAddress: xxxxxxxxxxxxxxxxxxxxx-zzzzzzzzzzzz.eu-west-1.elb.amazonaws.com:443
  peerEndpointAlias: mcom-peer-1
status:
  queryHTTPServiceURL: http://mcom-peer-1-peer-query.default.svc:10902
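The status.queryHTTPServiceURL field points to the Thanos Query created for the peer. To quickly check that this query is up, you could port-forward to its service and call the /-/ready endpoint that Thanos components serve over HTTP. The sketch below only parses the URL; parse_service_url is hypothetical, and the usage in the comments assumes the ThanosPeer CRD can be addressed as thanospeer with kubectl.

```shell
# Hypothetical helper: split an in-cluster service URL of the form
# http://<service>.<namespace>.svc:<port> (as in status.queryHTTPServiceURL)
# into "<service> <namespace> <port>". A port must be present in the URL.
parse_service_url() {
  local hostport host rest
  hostport=${1#*://}        # drop the scheme
  hostport=${hostport%%/*}  # drop any path
  host=${hostport%:*}       # e.g. mcom-peer-1-peer-query.default.svc
  rest=${host#*.}           # e.g. default.svc
  printf '%s %s %s\n' "${host%%.*}" "${rest%%.*}" "${hostport##*:}"
}

# Example usage (assumes kubectl can address the CRD as "thanospeer"):
#   set -- $(parse_service_url "$(kubectl --context "$OBSERVER_CONTEXT" \
#     get thanospeer "$PEER_ENDPOINT" -o jsonpath='{.status.queryHTTPServiceURL}')")
#   kubectl --context "$OBSERVER_CONTEXT" -n "$2" port-forward "svc/$1" "$3:$3" &
#   # give the port-forward a moment to start, then:
#   curl -fsS "http://localhost:$3/-/ready"
```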
The final step is to configure the central Query instance on the Observer cluster, which aggregates all of the configured peer queries. First, create an aggregator Query called central-query.
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  labels:
    app.kubernetes.io/instance: central-query
    app.kubernetes.io/managed-by: thanos-operator
    app.kubernetes.io/name: query
  name: central-query
spec:
  queryDiscovery: true
  query:
    grafanaDatasource: true
    metrics:
      serviceMonitor: true
Then create a StoreEndpoint definition with an empty selector. This aggregates all endpoints that use the Thanos Store protocol. The thanos attribute references the query instance created in the previous step.
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: StoreEndpoint
metadata:
  name: all-endpoint
spec:
  thanos: central-query
  selector: {}
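At this point in the walkthrough, the current context still points to the Peer cluster (kubectx switched it earlier), while these two resources belong on the Observer cluster. One way to make the target explicit is to render both manifests and pipe them to kubectl with --context. This is a minimal sketch with a hypothetical render_central_query function; it assumes metrics is a field of the query section.

```shell
# Hypothetical helper: print the central Thanos Query and the StoreEndpoint
# manifests, with the query name as an optional parameter
# (default: central-query), separated by a YAML document marker.
render_central_query() {
  local name="${1:-central-query}"
  cat <<EOF
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  labels:
    app.kubernetes.io/instance: $name
    app.kubernetes.io/managed-by: thanos-operator
    app.kubernetes.io/name: query
  name: $name
spec:
  queryDiscovery: true
  query:
    grafanaDatasource: true
    metrics:
      serviceMonitor: true
---
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: StoreEndpoint
metadata:
  name: all-endpoint
spec:
  thanos: $name
  selector: {}
EOF
}
```

Usage: render_central_query | kubectl --context "${OBSERVER_CONTEXT}" apply -f -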
Now we are all set. Just check the Grafana dashboard and query whatever you need! At first, these steps may look like a lot, but installing, configuring, and then reconfiguring all the components by hand so that they work together can take much more time. Moreover, these steps are easy to automate, so you can easily build a CD pipeline with these tools. Remember, Cisco Multi Cloud Observability Manager is both an operator and a CLI tool, and they can work simultaneously: you can install the operator on every cluster and configure it via the CLI tool as a step of a delivery pipeline. Another benefit of this approach is that we only use resources provided by Kubernetes. There is no custom logic behind the service discovery and there are no hand-configured proxies. These are standard Kubernetes resources that you would use for other applications as well.
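As a starting point for such a pipeline, the per-peer steps can be wrapped into one function. This is a hypothetical sketch, not a Multi Cloud Observability Manager feature: connect_peer only chains the one-eye and kubectl commands used in this walkthrough, assumes the <endpoint>-tls secret already exists on both clusters, and exposes ONE_EYE and KUBECTL overrides (an assumption) so it can be dry-run.

```shell
# Hypothetical CD pipeline step: connect one Peer cluster to the Observer
# with the same one-eye and kubectl commands as the walkthrough.
# Assumes the "<endpoint>-tls" secret already exists on both clusters.
# Note: without `set -o pipefail`, a pipeline's status is kubectl's only.
connect_peer() {
  local observer="$1" peer="$2" endpoint="$3" address
  local one_eye_cmd="${ONE_EYE:-one-eye}" kubectl_cmd="${KUBECTL:-kubectl}"

  # 1. ThanosEndpoint (Thanos Query + TLS gRPC ingress) on the Peer cluster.
  $one_eye_cmd --context "$peer" thanos endpoint generate "$endpoint" \
      --cert-secret-name "$endpoint-tls" --ca-bundle-secret-name "$endpoint-tls" |
    $kubectl_cmd --context "$peer" apply -f- || return 1

  # 2. Public address of the endpoint's ingress (may be empty if not ready).
  address=$($one_eye_cmd --context "$peer" thanos endpoint address "$endpoint") || return 1
  [ -n "$address" ] || { echo "no address yet for $endpoint" >&2; return 1; }

  # 3. ThanosPeer (TLS-authenticated Thanos Query) on the Observer cluster.
  $one_eye_cmd --context "$observer" thanos peer generate "$endpoint" \
      --endpoint-address "$address" \
      --cert-secret-name "$endpoint-tls" --ca-bundle-secret-name "$endpoint-tls" |
    $kubectl_cmd --context "$observer" apply -f-
}
```

Usage: connect_peer "${OBSERVER_CONTEXT}" "${PEER_CONTEXT}" "${PEER_ENDPOINT}", called once per Peer cluster. If the address is not ready yet, the function fails and can simply be rerun.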