# KubeCon 2019 Takeaway Part 1: We Live in a Multi-Cluster World

This blog was co-authored by Rupinder (Robbie) Gill and Haseeb Budhani.

KubeCon 2019 was an incredibly engaging event where we had the pleasure of speaking with Kubernetes enthusiasts from a variety of verticals. Here’s a trend we noticed across the board:

Development teams that are early in their Kubernetes journey build out larger clusters and use Kubernetes namespaces to implement multi-tenancy.

This seems like a logical choice, given that the namespace concept is designed to do exactly this. But teams that have been at it for some time and have experienced multiple Kubernetes version upgrades tend to spin up many, smaller clusters and choose to group fewer services into the same cluster.

Why the difference in opinion? Experience.

Engineers we spoke to shared the following technical reasons (in no particular order) for choosing to go with the many, smaller clusters approach:

  • Blast radius: Every time Kubernetes or a supporting component (e.g. service mesh or metrics collection packages) is upgraded, each service may need to be updated to work with the new version. Someone needs to make sure that all services in a given cluster are ready to work with the new APIs, in case older APIs have been deprecated in the new version. This type of upgrade can impact schedules across multiple teams. It is best to let teams run smaller clusters, where the impact can be broken up into smaller morsels.
  • Security requirements: A set of services may have unique hardening and data retention requirements, and it may make sense to deploy these services in a hardened cluster with stringent auditing, auth and logging policies. But doing this across the board may lead to unnecessary slowdowns and overhead.
  • Scaling requirements: If a few services have massive scaling requirements, it may be best to deploy them into dedicated clusters to protect against other services experiencing “pod pending” events due to busier services taking up an inordinate percentage of available resources.
  • Integration requirements: Some services may need a special admission controller, high-speed storage, and so on, while others may not. Such special integration requirements may also apply to service meshes and key-management services. Services that need such integrations may be best grouped together in clusters that are pre-configured with required packages or the right storage class, while other services can be deployed on clusters running vanilla Kubernetes.
  • Custom enhancements: Some services may lead the DevOps team to develop enhancements to Kubernetes. To protect against unforeseen side effects (bugs) from such enhancements, services that need these enhancements can be deployed on customized clusters, while other services can be deployed on clusters running vanilla Kubernetes.
  • Network load requirements: Services that are expected to drive high network load (by way of Kubernetes API calls) are best deployed on separate clusters to protect other services that may get starved otherwise.
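To make the blast radius point concrete, here is a minimal Python sketch, not a real tool. It assumes you already have an inventory of which apiVersion each service's Deployment manifest uses; the cluster names, service names, and inventory layout are hypothetical. The only real-world fact it relies on is that Kubernetes 1.16 stopped serving Deployments under `extensions/v1beta1`, `apps/v1beta1`, and `apps/v1beta2`.

```python
# Illustrative sketch: the inventories below are made up for this example.
# apiVersions under which Deployments stopped being served in Kubernetes 1.16.
REMOVED_FOR_DEPLOYMENTS_IN_1_16 = {
    "extensions/v1beta1",
    "apps/v1beta1",
    "apps/v1beta2",
}


def services_needing_update(inventory, removed_apis):
    """Return {cluster: sorted services whose manifests use a removed apiVersion}."""
    affected = {}
    for cluster, services in inventory.items():
        hits = sorted(
            name
            for name, api_versions in services.items()
            if removed_apis & set(api_versions)
        )
        if hits:
            affected[cluster] = hits
    return affected


def blocked_services(inventory, removed_apis):
    """Count services that sit in a cluster whose upgrade must wait on a fix."""
    affected = services_needing_update(inventory, removed_apis)
    return sum(len(inventory[cluster]) for cluster in affected)


# Hypothetical inventory: one large shared cluster...
one_large_cluster = {
    "shared": {
        "billing": ["apps/v1"],
        "search": ["extensions/v1beta1"],
        "checkout": ["apps/v1beta2"],
        "reports": ["apps/v1"],
    }
}

# ...versus the same services split across smaller per-team clusters.
many_small_clusters = {
    "team-a": {"billing": ["apps/v1"], "reports": ["apps/v1"]},
    "team-b": {"search": ["extensions/v1beta1"]},
    "team-c": {"checkout": ["apps/v1beta2"]},
}

if __name__ == "__main__":
    for label, inventory in [
        ("one large cluster", one_large_cluster),
        ("many small clusters", many_small_clusters),
    ]:
        print(label, services_needing_update(inventory, REMOVED_FOR_DEPLOYMENTS_IN_1_16))
        print("  services blocked:", blocked_services(inventory, REMOVED_FOR_DEPLOYMENTS_IN_1_16))
```

In this made-up inventory, the shared cluster cannot be upgraded until both the search and checkout teams fix their manifests, so all four services wait. With per-team clusters, only the two affected services are held back, and team-a can upgrade on its own schedule.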

Engineers also listed business reasons for running many, smaller clusters:

  • Compliance: If end users are distributed globally, it’s better to deploy clusters in target geographies to comply with data sovereignty or other regional regulations instead of implementing complex data management strategies centrally.
  • Hybrid or multi-cloud strategies: Many companies find themselves needing to manage a mix of environments for a variety of reasons, ranging from pre-existing assets (colo contracts and servers) and M&A activity to demanding customers (“I don’t do business with vendors that run their apps on AWS.”).
  • Performance: If the end user population is spread across geographies and the application is designed appropriately, it may make sense to deploy the web and application tiers in multiple regions, i.e. across multiple clusters. And if you’re thinking of running a cross-region cluster, be ready to address etcd sync issues and cross-pod traffic management across the WAN.

The fact that companies such as VMware (see Tanzu Mission Control) and Microsoft (see Azure Arc) recently announced tech previews of products to help companies manage clusters across hybrid environments suggests they also recognize this trend.

At Rafay, we run our SaaS controller as a cloud-native service that leverages a variety of open source and home-grown components - we are living the gospel we preach. We operate many, small clusters. And so should you, if you aren’t doing it already.

In a follow-on blog, we’ll discuss how your peers are not only running multiple clusters, but are also leveraging more than one Kubernetes distribution to boot. In the meantime, if you’d like to see how Rafay can help you operate a fleet of clusters across any environment, please feel free to get in touch.

Posted by Haseeb Budhani