Multi-cluster service meshes have emerged as an architecture pattern for enabling high availability, fault isolation, and failover for distributed applications. In my experience, this setup can also empower teams to run services across multiple cloud providers and on-premise resources. Why would this be advantageous? Well, aside from mitigating potential vendor lock-in, it can enable teams to take advantage of differences in infrastructure resource availability, scalability, and cost. Within a hybrid-cloud mesh, for instance, we can manage the tradeoff between scalability and cost by splitting or replicating workloads across on-prem and cloud-hosted clusters. In this blog, we will explore a dynamic, event-driven method for replicating workloads across a multi-cluster service mesh. We will create an event-driven autoscaler using service-mesh properties and APIs from Calisti, along with Kubernetes' client-go. This model may be of particular interest for hybrid-cloud meshes implementing cloud bursting, where demand spikes trigger a burst of on-prem services into the cloud. With this in mind, we will design the autoscaler for a primary/peer service mesh setup. Building on the concept of Kubernetes' horizontal-pod-autoscaler, we can ingest host-level as well as application-level metrics to inform scaling events.
Let us begin by installing Calisti on our primary and peer clusters, creating a multi-cluster, single mesh control plane. On our peer cluster:
smm install -a
On our primary cluster:
smm install -a
smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE>
smm istio cluster status
Check out the Calisti docs for more installation and usage details. Once both clusters are attached, forming a single mesh, we can deploy an application spanning the mesh. As we will see, cross-cluster service discovery enables microservices in both clusters to communicate with each other. This is a feature of the namespace sameness concept, which dictates that all services across the clusters are shared by default.
smm demoapp install -s frontpage,catalog,bookings,postgresql
smm -c <PEER_CLUSTER_KUBECONFIG_FILE> demoapp install -s movies,payments,notifications,analytics,database,mysql --peer
Now that our mesh is set up, we can turn to implementing the autoscaler. As mentioned, we will utilize two primary building blocks: Calisti and Kubernetes' client-go. We will define the autoscaler as a control loop with a listener, an informer, and a work queue. To utilize service-level metrics, the listener will periodically query Calisti's graphql API. The informer will then compare these metrics against scaling and reconciliation policies to determine whether our microservices need to be replicated across the mesh.
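To make this structure concrete, here is a minimal sketch of that control loop. This is an assumption-laden outline, not the finished controller: ScaleEvent and EventPolicy are our own hypothetical types, and fetchServiceRPS and evaluatePolicy are helpers we sketch in the sections below.
import (
	"context"
	"time"
)

// ScaleEvent is our hypothetical work item (not a client-go type).
type ScaleEvent struct {
	App   string
	Burst bool // true: replicate to the peer; false: reconcile back
}

func runControlLoop(ctx context.Context, ns string, policies map[string]EventPolicy, events chan<- ScaleEvent) {
	ticker := time.NewTicker(5 * time.Second) // listener poll interval
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for app, policy := range policies {
				// listener: pull service-level metrics from Calisti's graphql API
				rps, err := fetchServiceRPS(app, ns)
				if err != nil {
					continue
				}
				// informer: compare metrics against the scaling policy
				if ev, ok := evaluatePolicy(app, rps, policy); ok {
					events <- ev // work queue: drained by the replication controller
				}
			}
		}
	}
}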
Before we implement these components, we will first define an event policy configuration. Since we aim to prioritize resource availability and cost, our policies could take into account three levels of metrics: provider metrics (cost), host metrics (resource availability), and service metrics (health). For the following examples, we will create a policy for service metrics, namely service requests-per-second.
kind: multi-cluster-autoscaler
metadata:
  namespace: smm-demo
spec:
  groupVersionKind:
    kind: Deployment
  selector:
    app: bookings
  policy:
    type: "rps-burst"
    burst-value: 100
    reconcile-value: 60
    throttle-delay: 120
This event policy indicates that our control loop should watch requests-per-second metrics for the bookings microservice, of type Deployment. The event is triggered if over 100 requests-per-second (rps) are measured for a period of 120 seconds, at which point the controller should replicate, or burst, bookings into the peer cluster. Conversely, if the bookings deployments receive less than 60 rps for 120 seconds, the traffic and microservice should be scaled back to the primary cluster. Note that these policies could be implemented as Kubernetes custom-resource-definitions (CRDs), but for simplicity, we will stick with yaml configs.
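As a hedged illustration, the spec.policy block above might map to a Go struct like the following, with a simple threshold check the informer can call. The field names are assumptions mirroring the yaml; a production informer would also honor throttle-delay by requiring a threshold to hold for the configured period before enqueuing.
// EventPolicy mirrors the spec.policy block of the yaml config above
// (hypothetical field names for this sketch).
type EventPolicy struct {
	Type           string  `yaml:"type"`
	BurstValue     float64 `yaml:"burst-value"`
	ReconcileValue float64 `yaml:"reconcile-value"`
	ThrottleDelay  int     `yaml:"throttle-delay"` // seconds
}

// evaluatePolicy reports whether the observed rps warrants a scaling event.
func evaluatePolicy(app string, rps float64, p EventPolicy) (ScaleEvent, bool) {
	switch {
	case rps > p.BurstValue:
		return ScaleEvent{App: app, Burst: true}, true // burst into the peer
	case rps < p.ReconcileValue:
		return ScaleEvent{App: app, Burst: false}, true // scale back to primary
	}
	return ScaleEvent{}, false
}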
For all host- and service-level metrics, we can utilize Calisti's graphql API. For instance, we can retrieve the requests-per-second received by the bookings microservice by sending a query to http://127.0.0.1:50500/api/graphql. Note that the Calisti dashboard must be running to access the API via localhost.
{
  service(name: "bookings", namespace: "smm-demo") {
    metrics(evaluationDurationSeconds: 5) {
      latencyP50
      rps
    }
  }
}
...
{
  "data": {
    "service": {
      "metrics": {
        "latencyP50": 0.08447085452073383,
        "rps": 30.281078348468693
      }
    }
  }
}
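For illustration, the listener's metrics fetch could be a small HTTP POST against this endpoint. This is a sketch assuming the dashboard is reachable on localhost as above; the authentication cookie needed outside the graphql console is covered later in this post.
import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// fetchServiceRPS posts the graphql query above to the local dashboard
// endpoint and extracts the rps field from the response.
func fetchServiceRPS(name, ns string) (float64, error) {
	q := fmt.Sprintf(`{"query":"{ service(name: \"%s\", namespace: \"%s\") { metrics(evaluationDurationSeconds: 5) { rps } } }"}`, name, ns)
	resp, err := http.Post("http://127.0.0.1:50500/api/graphql", "application/json", strings.NewReader(q))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// decode only the fields we need from the response shown above
	var out struct {
		Data struct {
			Service struct {
				Metrics struct {
					RPS float64 `json:"rps"`
				} `json:"metrics"`
			} `json:"service"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	return out.Data.Service.Metrics.RPS, nil
}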
If our informer determines the policy is met, enqueuing an event, we will employ our own multi-cluster replication controller to replicate or reconcile runtime objects across clusters. Additionally, we will create a Calisti virtual service and route rule to split application traffic accordingly. The core of the autoscaler implementation lies in the Kubernetes multi-cluster replication controller. We will discuss two implementations: one solely utilizes Kubernetes' client-go scaffolding, while the other builds upon Calisti's internal cluster-registry-controller.
Let's first look at how we can create a multi-cluster replication controller using Kubernetes' client-go library. Our control flow will be as follows: upon replication or scale-out, retrieve the desired runtime spec from the primary cluster, then apply the in-memory spec to the peer cluster. Upon scale-back, simply remove the resources from the peer cluster. The following examples show sample code from the Deployment multi-cluster replication handler. Note that the implementation is practically identical for all k8s core types, given their distinct clientset interfaces. Given an app label or deployment name from the informer, we can retrieve the desired runtime object spec.
// GetDeploymentsByAppLabel lists the deployments in namespace ns that match
// the given app label (appsv1 aliases k8s.io/api/apps/v1).
func (d *DeploymentHandler) GetDeploymentsByAppLabel(
	cl *kubernetes.Clientset,
	ns string,
	app string) (*appsv1.DeploymentList, error) {
	client := cl.AppsV1().Deployments(ns)
	deployments, err := client.List(context.TODO(), metav1.ListOptions{
		LabelSelector: fmt.Sprintf("app=%s", app),
	})
	if err != nil {...}
	return deployments, nil
}
Next, we must be able to create a deployment in the peer cluster from the in-memory specification we just retrieved.
// createDeployment applies the in-memory spec to the target cluster and
// blocks until all replicas are available or the timeout elapses.
func (d *DeploymentHandler) createDeployment(
	cl *kubernetes.Clientset,
	ns string,
	deployment *appsv1.Deployment,
) error {
	client := cl.AppsV1().Deployments(ns)
	_, err := client.Create(context.TODO(),
		deployment, metav1.CreateOptions{})
	if err != nil {...}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second*60)
	defer cancel()
	// signal when pods are available
	err = WaitForRCPods(cl, ctx, deployment.Spec.Template.Labels["app"], ns, int(*deployment.Spec.Replicas))
	if err != nil {...}
	log.Println("all replicas up")
	return nil
}
Given a clientset and a deployment spec, we create the deployment and wait for all pod replicas to become available, or time out after 60 seconds. We can ensure pods are up by watching the status of the pods that belong to the k8s replication controller, in this case, the deployment.
// inside WaitForRCPods: watch pods carrying the deployment's app label,
// reusing the caller's timeout context
watch, err := cl.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
	LabelSelector: fmt.Sprintf("app=%s", rcLabel),
})
if err != nil {...}
defer watch.Stop()
...
for event := range watch.ResultChan() {
	p, ok := event.Object.(*cv1.Pod)
	if !ok {...}
	// check status of pods
	switch p.Status.Phase {
	case "Pending":
		// still scheduling or pulling images; keep waiting
	case "Running":
		// count pods that reached Running; return once the
		// expected number of replicas is up
	}
}
We will now tie these two primary functionalities together to complete the cross-cluster replication. First, retrieve the desired spec using the primary cluster’s clientset, then apply the spec to the peer cluster using the peer cluster’s clientset.
func (d *DeploymentHandler) Replicate(
clSource,
clTarget *kubernetes.Clientset,
ns string,
application string) []error {
deployments, err := d.GetDeploymentsByAppLabel(clSource, ns, application)
if err != nil {...}
for _, deployment := range deployments.Items {
deepCpy := deployment.DeepCopy()
deepCpy.ResourceVersion = "" // could add uuid tag or peer-cluster id
err = d.createDeployment(clTarget, ns, deepCpy)
if err != nil {...}
}
return nil
}
Each time a runtime object is replicated to a peer cluster, we must also replicate any corresponding services. This will enable our virtual service to seamlessly split traffic between the primary and replicated services. Service-type replication can be done in the same manner as our Deployment handler examples, as the sketch below shows. With these client methods, we can dynamically move Kubernetes resources between participating clusters.
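For illustration, a Service handler might look like the following. ServiceHandler is our own hypothetical type, mirroring the Deployment handler; note that cluster-scoped fields such as ClusterIP must be cleared so the peer cluster can assign its own.
// Replicate copies all Services matching the app label from the source
// cluster into the target cluster (sketch following the Deployment handler).
func (s *ServiceHandler) Replicate(
	clSource,
	clTarget *kubernetes.Clientset,
	ns string,
	application string) error {
	services, err := clSource.CoreV1().Services(ns).List(context.TODO(), metav1.ListOptions{
		LabelSelector: fmt.Sprintf("app=%s", application),
	})
	if err != nil {...}
	for _, svc := range services.Items {
		deepCpy := svc.DeepCopy()
		deepCpy.ResourceVersion = "" // must be unset for Create
		deepCpy.Spec.ClusterIP = ""  // cluster-specific; let the peer assign its own
		deepCpy.Spec.ClusterIPs = nil
		_, err := clTarget.CoreV1().Services(ns).Create(context.TODO(),
			deepCpy, metav1.CreateOptions{})
		if err != nil {...}
	}
	return nil
}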
As mentioned, we can also achieve the cross-cluster replication functionality by building a control layer on top of Calisti's internal cluster-registry-controller. The registry controller is responsible for synchronizing Kubernetes resources across clusters according to certain rules, defined by a custom-resource-definition (CRD). For instance, the following ResourceSyncRule CRD may be used to synchronize, or copy, the matched Secret to all participating clusters.
apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
kind: ResourceSyncRule
metadata:
  name: test-secret-sink
spec:
  groupVersionKind:
    kind: Secret
    version: v1
  rules:
    - match:
        - objectKey:
            name: test-secret
            namespace: cluster-registry
Using these rules, we can redefine our multi-cluster replication control flow: upon replication or scale-out, create a ResourceSyncRule for the desired runtime objects and associated services on the primary cluster, which synchronizes these objects to the peer cluster(s). Upon scale-back, remove the ResourceSyncRules on the primary cluster and remove the associated resources on the peer cluster. To utilize the cluster-registry-controller for replication, we will first generate a clientset for the cluster-registry public CRDs using Kubernetes' code-generator. This gives us a type-safe method for listing, creating, and deleting the defined custom resources. With the generated ResourceSyncRule clientset, creating and deleting the CRD is no different from core Kubernetes objects.
ruleSpec := clusterregistryv1alpha1.ResourceSyncRule{...}
rule, err := ruleCRDClient.Create(context.TODO(), &ruleSpec, metav1.CreateOptions{})
The cluster-registry-controller takes on the burden of replicating resources into the peer cluster, but we will still use client-go to remove objects upon reconcile. We can reuse a helper function, shared by both implementations, to retrieve the correct deletion function for each resource type.
// GetDeleter returns the typed Delete function for the given resource kind.
func GetDeleter(cl *cls.Clientsets, kind, ns string) (func(context.Context, string, metav1.DeleteOptions) error, error) {
	var deleter func(context.Context, string, metav1.DeleteOptions) error
	switch kind {
	case "ResourceSyncRule":
		deleter = cl.ResourceSyncRuleV1(ns).Delete
	case "Deployment":
		deleter = cl.AppsV1().Deployments(ns).Delete
	case "StatefulSet":
		deleter = cl.AppsV1().StatefulSets(ns).Delete
	case "DaemonSet":
		deleter = cl.AppsV1().DaemonSets(ns).Delete
	case "Pod":
		deleter = cl.CoreV1().Pods(ns).Delete
	case "Service":
		deleter = cl.CoreV1().Services(ns).Delete
	default:
		return nil, fmt.Errorf("unsupported kind: %v", kind)
	}
	return deleter, nil
}
Upon a reconcile policy trigger, we call the deleter for the ResourceSyncRule in the primary cluster and for the replicated core-type resources in the peer cluster.
func (r *ResourceSyncHandler) Reconcile(clPrimary, clPeer *cls.Clientsets, resourceName, kind, ns string) error {
deleter, err := GetDeleter(clPrimary, "ResourceSyncRule", ns)
if err != nil {...}
ruleName := rulePrefix + resourceName
err = deleter(context.TODO(), ruleName, metav1.DeleteOptions{})
if err != nil {...}
deleter, err = GetDeleter(clPeer, kind, ns)
if err != nil {...}
err = deleter(context.TODO(), resourceName, metav1.DeleteOptions{})
return err
}
The final piece of this autoscaler implementation is traffic shifting. When a policy is met and resources are replicated, we will create a Calisti virtual service. The virtual service will split traffic between the microservice in the primary cluster and the replicated version in the peer cluster. We can define destination weights to tell the virtual service how much traffic to send to each of the two microservices. This can be accomplished with a graphql mutation query. Here is a sample mutation that creates a virtual service with two service destinations and their weights. Note that these services are in separate clusters.
applyHTTPRoute(
  input: {
    selector: {
      namespace: "smm-demo"
      hosts: ["bookings"]
    }
    rule: {
      route: [
        {
          destination: { host: "bookings", port: { number: 8080 } }
          weight: 75
        }
        {
          destination: { host: "bookings-repl", port: { number: 8080 } }
          weight: 25
        }
      ]
    }
  }
)
In our autoscaler controller, we can either use a Go graphql client or marshal this query into JSON and send an HTTP POST request to the Calisti graphql API, as in the sketch below. Note that when sending a request to the API outside of the graphql console, we will need to provide the authentication cookie generated by the smm dashboard command. To acquire this cookie, we can inspect any request from the Calisti UI to the Calisti graphql API. Once the virtual service takes effect, we should see the replicated microservice appear in the service mesh as it handles application traffic.
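As a sketch of the second option, we wrap the route rule in a mutation operation, marshal it into a JSON query payload, and attach the session cookie. The SMM_AUTH_COOKIE environment variable is our own convention here for carrying the cookie value copied from the browser's dev tools.
import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// applyTrafficSplit posts a graphql mutation (e.g. "mutation { applyHTTPRoute(...) }")
// to the Calisti graphql API with the dashboard session cookie attached.
func applyTrafficSplit(mutation string) error {
	body, err := json.Marshal(map[string]string{"query": mutation})
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, "http://127.0.0.1:50500/api/graphql", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Cookie", os.Getenv("SMM_AUTH_COOKIE")) // copied from an authenticated UI request
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("graphql request failed: %s", resp.Status)
	}
	return nil
}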
Now that we have defined the core components, let's run the completed autoscaler and apply a sample requests-per-second event policy for the bookings microservice. For demonstration purposes, we will choose rps values relative to Calisti's demo-app traffic generator: a burst-value of 40 rps and a reconcile-value of 20 rps. Prior to execution, we can confirm that the bookings microservice is within our primary cluster.
replctl apply bookings-controller.yaml
To quickly test the controller, we can force a cross-cluster replication event by generating additional load on the bookings service via Calisti's per-service HTTP load generator. Specifying the service, port, and method, we will generate 100 requests-per-second for a period of 30 seconds. Checking the controller logs, we should eventually see an event triggered as the service's 5-second average for requests-per-second surpasses the burst-value.
…
burst triggered for app=bookings
2022/03/27 14:21:55 deployments created and being evaluated
2022/03/27 14:21:55 Waiting for 2 pods to be running.
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:58 pod status: Pending
2022/03/27 14:21:59 pod status: Pending
2022/03/27 14:22:01 all replicas up
2022/03/27 14:22:01 setting v-service route...
2022/03/27 14:22:01 route set.
We can verify that the virtual service and destination rules were added to the bookings services. We can see that there is a route policy splitting traffic between the bookings service and the bookings-repl service in the peer cluster. If we again check the topology of the mesh, we should see the new bookings deployment in the peer cluster. The topology confirms that the new deployment is up and is routing traffic to downstream microservices in the peer cluster. Since we added a short burst of artificial HTTP load, the received requests-per-second will eventually fall back below our event policy's reconcile-value. This will trigger a reconcile, or scale-back, event, removing the traffic rule and deployment from the peer cluster.
This blog highlighted how a service mesh framework, namely Calisti, can be leveraged to dynamically scale or replicate services across a multi-cluster mesh. Using Calisti's graphql API, we were able to seamlessly extract service-level metrics to inform scaling events. Furthermore, utilizing Kubernetes' client-go and Calisti's cluster-registry-controller, we were able to replicate and reconcile Kubernetes objects across clusters. This is intended to be a starting point for anyone interested in service meshes, cloud-native automation, and of course, Kubernetes.

References:
- Calisti docs
- k8s-client-go
- cluster-registry-api