# KubeCon 2019 Takeaway Part 2: It's a Multi-Cluster, Multi-Distro World

This blog was co-authored by Rupinder (Robbie) Gill and Haseeb Budhani.

In a previous blog post, we discussed why companies are choosing to operate many small clusters.

As we’ve been following up with engineers we met at KubeCon 2019, we are finding that a high percentage of teams run two or more Kubernetes distributions across their public cloud and on-premises footprints. The common belief seems to be that cloud providers will do their best to optimize their Kubernetes offerings for their own infrastructure, so it's best to use EKS in AWS, GKE in GCP, AKS in Azure, and OpenShift or PKS on premises.

At Rafay, we follow the same methodology: we use the resident managed Kubernetes service in our cloud provider of choice instead of, for example, spinning up v1.16.1 ourselves on virtual machines. And, of course, we run many small clusters.

There are many hurdles to clear in keeping modern applications operational, and if the public cloud (or VMware and Red Hat) can take away the pain of keeping the Kubernetes control plane up and running, why would anyone not take advantage of that? What’s more, the cost of running Kubernetes in public clouds is fast approaching zero: you pay for the worker node VMs, while the master node costs are a rounding error or downright free.

Operating services across multiple clusters and multiple distros simplifies the development process. Ongoing complexity is reduced because developers no longer have to add complex logic to each service to account for environmental characteristics such as service meshes, storage classes, and admission controllers.

The burden shifts to operations instead. With a growing cluster fleet that may span multiple clouds and data centers and use multiple Kubernetes distributions, SRE/Ops teams need tooling to manage that fleet. They must now solve for:

  1. Complete visibility and governance across the company’s fleet of Kubernetes clusters, regardless of distribution. Teams must be able to quickly determine where a given service is currently running, which app is experiencing restarts in a given cluster, which apps have been upgraded across the fleet in the last month, and much more.
  2. On-demand cluster bring-up and customization in any cloud environment or on premises. Beyond simply bringing up EKS, GKE, etc., SRE/Ops teams are responsible for ensuring that each cluster is customized appropriately and conforms to the business’s security and compliance requirements.
  3. Continuous deployment across the entire cluster fleet without requiring multiple deployment tools. SRE/Ops teams need deployment tools that work with any cluster type, in any environment, and that do not require the scripting and coding investments that traditionally fall on DevOps teams.

We live in a multi-distro, multi-cluster world, but cloud providers have little incentive to solve this problem for you. When engaging with a vendor focused on helping your SRE/Ops teams manage multiple clusters running a variety of distros, be sure to ask how they plan to address the three requirements above. Rafay delivers these capabilities as a service today and can simplify ongoing operations for your Kubernetes environments. Feel free to get in touch to learn more.

Posted by Haseeb Budhani