In VCF 9.0, vSphere Kubernetes Service (VKS) was an excellent vSphere-native Kubernetes runtime, but operating the platform was still platform-team work. Provisioning a clean namespace and attaching quotas, a registry, ingress and identity to it was a sequence of manual steps that application teams could not trigger on their own. The gap between “operated cluster” and “self-service platform” was bridged with homegrown scripts.
VCF 9.1 attacks that gap head-on. Four changes — linked clones for VKS, scale to 500 clusters per Supervisor, simplified Container-as-a-Service, native object storage in Tech Preview — shift the boundary: what the platform team did by hand becomes a consumable construct. This article decodes each one, with the 9.0 → 9.1 delta and the concrete architecture impact.
Series 'What's new in VCF 9.1' — 3/4
A mini-series on what’s new in VMware Cloud Foundation 9.1:
- Infrastructure efficiency & TCO
- Networking & scale
- Kubernetes & self-service (this article)
- Security & resilience
Visual credits
Visuals © Broadcom, sourced from the official VCF blog (links at the end of the article). Synthesis and analysis are my own.
VKS & VM Fast-Deploy: linked clones change the scale
Up to 9.0, every VKS node, control plane and worker alike, was a full clone of the vSphere Kubernetes release (VKr) OVA: the source disk was copied in full to the datastore before first boot. On a 6-node cluster, that means six multi-GB copies, each bounded by datastore throughput. Cluster deployment time, and even more so rolling upgrade time (where every node is recreated), was dominated by that copy.
9.1 introduces Fast-Deploy via linked clones for VKS and VM clusters. The mechanics: a read-only parent disk (from the VKr) is instantiated once, then each node creates a delta disk that stores only the blocks changed relative to the parent. The node boots in seconds instead of waiting for a full copy. Conceptually, it’s copy-on-write applied to Kubernetes node provisioning.
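To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is a toy model, not how vSphere implements delta disks: the `ParentDisk` and `LinkedClone` classes and the four-block "image" are hypothetical and exist only to show that reads fall through to the shared parent while writes stay in the node's own delta.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParentDisk:
    """Read-only base image shared by all nodes (hypothetical model)."""
    blocks: tuple[bytes, ...]


@dataclass
class LinkedClone:
    """A node disk: a delta layer that stores only blocks changed by this node."""
    parent: ParentDisk
    delta: dict[int, bytes] = field(default_factory=dict)

    def _check(self, index: int) -> None:
        if not 0 <= index < len(self.parent.blocks):
            raise IndexError(f"block {index} is outside the disk")

    def read(self, index: int) -> bytes:
        # Changed blocks come from the delta, everything else from the parent.
        self._check(index)
        return self.delta.get(index, self.parent.blocks[index])

    def write(self, index: int, data: bytes) -> None:
        # Writes never touch the parent; they land in this node's delta.
        self._check(index)
        self.delta[index] = data


parent = ParentDisk(blocks=(b"boot", b"kernel", b"kubelet", b"config"))
node_a = LinkedClone(parent)
node_b = LinkedClone(parent)

node_a.write(3, b"config-a")

print(node_a.read(3))     # node A sees its own change
print(node_b.read(3))     # node B still reads the shared parent block
print(len(node_a.delta))  # node A stores a single changed block
```

Creating a node in this model costs nothing up front: no block is copied until the node writes, which is why boot no longer waits on a full image copy.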
| Aspect | VKS 9.0 (full clone) | VKS 9.1 (linked clone) |
|---|---|---|
| Node creation | Full source-disk copy | Delta disk over a shared parent |
| 6-node cluster deploy | Dominated by N sequential copies | Near-parallel, drastically reduced time |
| Rolling upgrade | Each node = full clone recreated | Each node = delta over existing parent |
| Initial storage footprint | N × image size | 1 × parent + N deltas (growing) |
| I/O coupling | Long copy, datastore saturated | Shared parent read, delta write |
Architect impact. Linked clones aren’t just a speed optimization: they make upgrades operationally trivial. When recreating 200 nodes costs minutes instead of hours, the change window shrinks and the Kubernetes patch SLA becomes tenable without negotiating long maintenance windows. The trade-off to account for: the shared parent disk becomes a read hotspot, and delta disks grow over time. Storage sizing now reasons in parent + delta growth, not in N × a fixed image.
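The sizing shift can be worked through with a few lines of arithmetic. The sketch below compares the two footprint formulas from the table and computes the average delta size at which linked clones stop saving space. The node count, image size and delta sizes are illustrative assumptions, not VCF figures, and the model ignores metadata overhead.

```python
def full_clone_footprint_gb(nodes: int, image_gb: float) -> float:
    """Full clones: every node holds a complete copy of the image."""
    return nodes * image_gb


def linked_clone_footprint_gb(nodes: int, image_gb: float, delta_gb: float) -> float:
    """Linked clones: one shared parent plus one delta per node."""
    return image_gb + nodes * delta_gb


def break_even_delta_gb(nodes: int, image_gb: float) -> float:
    """Average delta size at which linked clones stop saving space."""
    # Solve image + nodes * delta = nodes * image for delta.
    return image_gb * (nodes - 1) / nodes


# Illustrative values only, not VCF figures.
NODES, IMAGE_GB = 6, 10.0

print(full_clone_footprint_gb(NODES, IMAGE_GB))
for delta_gb in (0.0, 2.0, 5.0):
    print(delta_gb, linked_clone_footprint_gb(NODES, IMAGE_GB, delta_gb))
print(round(break_even_delta_gb(NODES, IMAGE_GB), 2))
```

With these assumed numbers the linked-clone footprint starts at one image and approaches the full-clone footprint as deltas grow, which is why delta growth belongs in the capacity model and in monitoring.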
Scale: 500 Kubernetes clusters per Supervisor
In 9.0, a Supervisor could comfortably run well under a hundred workload clusters: enough for one team, tight for an internal cloud provider serving dozens of tenants. 9.1 raises the limit to 500 Kubernetes clusters per Supervisor.
| Dimension | VKS 9.0 | VKS 9.1 |
|---|---|---|
| Clusters / Supervisor | Low limit (tens) | Up to 500 |
| Target model | Team / small multi-tenant | Large-scale internal cloud provider |
| Density per vSphere cluster | 1 platform = 1 narrow Supervisor | 1 Supervisor = full Kubernetes fleet |
What it unlocks: a single Supervisor can now carry an entire organization’s Kubernetes fleet, instead of fragmenting into several Supervisors (and therefore several governance perimeters, several control planes to operate). For a multi-tenant model where each tenant gets one or more dedicated clusters, 500 clusters/Supervisor changes the map: fewer Supervisors, simpler governance, but a wider blast radius.
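The consolidation trade-off is easy to quantify. The sketch below is a rough model under stated assumptions: clusters are spread evenly across Supervisors, and the fleet size and the 100-cluster planning limit are hypothetical values chosen for comparison, not published 9.0 limits.

```python
import math


def supervisors_needed(fleet_clusters: int, clusters_per_supervisor: int) -> int:
    """Minimum number of Supervisors required to host the whole fleet."""
    if fleet_clusters <= 0 or clusters_per_supervisor <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(fleet_clusters / clusters_per_supervisor)


def largest_share(fleet_clusters: int, clusters_per_supervisor: int) -> float:
    """Share of the fleet hosted by the busiest Supervisor, assuming an even spread."""
    count = supervisors_needed(fleet_clusters, clusters_per_supervisor)
    busiest = math.ceil(fleet_clusters / count)
    return busiest / fleet_clusters


FLEET = 400  # hypothetical fleet size

for limit in (100, 500):  # hypothetical planning limit vs. the 9.1 ceiling
    print(limit, supervisors_needed(FLEET, limit), largest_share(FLEET, limit))
```

In this example, moving from four Supervisors to one removes three control planes to operate, but a single Supervisor incident now touches the whole fleet instead of a quarter of it.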
Architect impact. 500 clusters is not a free quota. The Supervisor remains a shared control plane: its etcd, its Cluster API (CAPI) controllers and its namespace quotas now absorb a far higher reconciliation load. Five hundred clusters means at least five hundred Cluster objects under continuous CAPI reconciliation, plus the Machine and related objects behind every node. That can add up to hundreds of thousands of objects in the Supervisor API server, served by an etcd whose latency becomes critical. The rule: treat the figure as a validated architectural ceiling, not a target. Size the Supervisor (HA control plane, etcd IOPS, control-plane observability) before approaching that density, and keep headroom.
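A back-of-the-envelope estimate helps turn "keep headroom" into a number. The sketch below is an assumption-laden model, not a VCF sizing tool: the per-cluster and per-node object counts are placeholders to replace with values measured on your own Supervisor, and the 20% headroom is an arbitrary example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetEstimate:
    """Rough count of management objects held by the Supervisor (hypothetical model)."""
    clusters: int
    nodes_per_cluster: int
    objects_per_cluster: int  # cluster-level objects, placeholder value
    objects_per_node: int     # node-level objects, placeholder value

    def supervisor_objects(self) -> int:
        per_cluster = self.objects_per_cluster + self.nodes_per_cluster * self.objects_per_node
        return self.clusters * per_cluster


def within_headroom(planned_clusters: int, ceiling: int = 500, headroom_pct: int = 20) -> bool:
    """True if the plan stays below the ceiling minus the reserved headroom."""
    usable = ceiling * (100 - headroom_pct) // 100
    return planned_clusters <= usable


estimate = FleetEstimate(clusters=500, nodes_per_cluster=20,
                         objects_per_cluster=20, objects_per_node=10)
print(estimate.supervisor_objects())
print(within_headroom(400), within_headroom(450))
```

The useful output is less the absolute number than how it scales: node count per cluster multiplies the load, so a fleet of large clusters reaches the practical limit well before a fleet of small ones.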
Simplified Container-as-a-Service
In 9.0, making a namespace available to an application team looked like this: create the vSphere Namespace, manually attach storage policies, configure CPU/RAM/storage quotas, wire in a registry, deploy or attach an ingress controller, then plumb identity (SSO mapping, RBAC). Six steps, six chances of divergence between tenants, and a platform team in the critical path of every onboarding.
9.1 turns this into a true self-service Container-as-a-Service. Provisioning a namespace becomes a consumable action that automatically inherits VCF constructs: registry, ingress, quotas and identity are derived from the tenant’s VCF perimeter rather than rewired by hand. The application team requests a namespace; it receives an already-governed namespace, with its registry, its ingress entry point and its identity aligned to enterprise SSO.
Figure source: Broadcom, VCF Blog
| Namespace onboarding step | 9.0 flow (manual) | 9.1 flow (CaaS) |
|---|---|---|
| Namespace creation | Platform-team action | Consumable self-service |
| Storage policy | Manual attachment | Inherited from VCF construct |
| Registry | Separate wiring | Provisioned with the namespace |
| Ingress | Manual deploy / attach | Included in the construct |
| Quotas | Defined by hand per profile | Derived from tenant perimeter |
| Identity / RBAC | Manual SSO mapping | Inherited from VCF identity |
Architect impact. The platform team steps out of the onboarding critical path without losing governance: guardrails (quotas, identity, policies) are encoded in the construct, not applied after the fact. The platform team’s role shifts from repetitive execution to defining tenant profiles. This is exactly the move we were trying to script in 9.0 — except here it’s native and consistent by default.
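The "guardrails encoded in the construct" idea can be sketched as data. In the model below, the platform team defines a tenant profile once, and every namespace request inherits it and is checked against its quota ceiling. All names here (`TenantProfile`, `Namespace`, `request_namespace`, the field names and sample values) are hypothetical and do not correspond to a VCF API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantProfile:
    """Guardrails defined once by the platform team (hypothetical model)."""
    tenant: str
    storage_policy: str
    registry: str
    ingress_class: str
    identity_group: str
    max_cpu_cores: int
    max_memory_gib: int


@dataclass(frozen=True)
class Namespace:
    """A namespace that carries its tenant's guardrails from creation."""
    name: str
    profile: TenantProfile
    cpu_cores: int
    memory_gib: int


def request_namespace(profile: TenantProfile, name: str,
                      cpu_cores: int, memory_gib: int) -> Namespace:
    # Quotas are enforced at request time, not applied after the fact.
    if cpu_cores > profile.max_cpu_cores or memory_gib > profile.max_memory_gib:
        raise ValueError(f"request for {name!r} exceeds the {profile.tenant!r} quota")
    return Namespace(name, profile, cpu_cores, memory_gib)


payments = TenantProfile(
    tenant="payments", storage_policy="gold", registry="registry.example.internal/payments",
    ingress_class="tenant-payments", identity_group="sso-payments-devs",
    max_cpu_cores=64, max_memory_gib=256,
)

ns = request_namespace(payments, "payments-staging", cpu_cores=16, memory_gib=64)
print(ns.profile.registry, ns.profile.identity_group)
```

The application team supplies only a name and a size. Everything else comes from the profile, which is the single place where the platform team makes changes.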
Native Object Storage (Tech Preview)
Block storage (PVC → VMDK via CNS) and file storage were already self-service in 9.0. The missing piece was S3-compatible object storage, which developers want for artifacts, application backups, datasets and cloud-native application state. In 9.0 you had to leave the platform (external bucket, self-managed MinIO) and therefore break the governance model.
9.1 introduces native, self-service, S3-compatible object storage as a Tech Preview. Developers provision buckets through the same deploy / scale / manage workflow as block and file; IT keeps the guardrails (quotas, policies, identity) without becoming a bottleneck. The promise: close the last missing storage category so the platform covers all three axes (block, file, object) under unified governance.
Tech Preview ≠ production
Native object storage ships as Tech Preview in VCF 9.1. That means: no production support, no stable API guarantee, functionality liable to change or be removed before GA. Use it to evaluate and prepare the target architecture — never to carry a critical application workload or data without a fallback plan to a supported object solution.
| Storage category | VKS 9.0 | VKS 9.1 |
|---|---|---|
| Block (PVC) | Self-service via CNS | Self-service via CNS |
| File | Self-service | Self-service |
| Object (S3) | Off-platform | Native self-service (Tech Preview) |
Architect impact. The point isn’t to use this in production now — it’s to scope the target today. If your teams consume external S3 or self-managed MinIO, the Tech Preview lets you prototype the migration and measure the governance delta, so you’re ready on GA day without rewriting access patterns.
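One way to keep access patterns stable across a backend change is to put object access behind a thin interface. The sketch below is a generic pattern rather than anything VCF-specific: `ObjectStore`, `InMemoryStore` and `archive_build` are hypothetical names, and the in-memory backend stands in for whichever S3-compatible endpoint you prototype against.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Minimal object-storage contract the application depends on (hypothetical)."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    def get(self, bucket: str, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for tests and prototypes."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def archive_build(store: ObjectStore, build_id: str, artifact: bytes) -> str:
    """Application code: it knows buckets and keys, not which backend serves them."""
    key = f"builds/{build_id}/artifact.tar"
    store.put("artifacts", key, artifact)
    return key


store = InMemoryStore()
key = archive_build(store, "42", b"tarball-bytes")
print(key, store.get("artifacts", key))
```

A production backend would implement the same two methods on top of an S3 client pointed at a configurable endpoint, so moving from external S3 or MinIO to the native service becomes a configuration change rather than a code change.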
From operated cluster to self-service platform
Taken in isolation, each of these four changes is an improvement. Taken together, they close the self-service gap end to end.
Linked clones make provisioning and upgrades fast enough to be self-service — without them, exposing cluster creation to teams would saturate the datastore. Scale to 500 clusters/Supervisor makes multi-tenant scale reachable without fragmenting governance — it’s the Kubernetes counterpart to the networking & scale work covered in the previous article. Simplified CaaS encodes governance into the construct rather than into runbooks. Object storage completes the coverage so developers no longer have to leave the platform.
The result: what we built by hand on top of VKS in 9.0 — a journey described step by step in Deploying your first VKS cluster on VCF 9 — becomes native platform behavior in 9.1. The platform team doesn’t disappear; it moves from repetitive execution to defining profiles, guardrails and quotas. This is the shift from the operated cluster to the self-service platform.
Pitfalls & points of attention
- Linked clones: storage / I/O coupling to watch
- 500 clusters: etcd capacity is not free
- CaaS: quota governance is defined upfront
- Object Storage: Tech Preview, not GA
- Identity inheritance: SSO consistency to validate
- Blast radius: fewer Supervisors = wider perimeter
Conclusion
Provisioning ready for self-service
Linked clones make deploy and upgrade fast enough to expose to teams without saturating storage. Upgrades become operationally trivial.
Multi-tenant scale
500 clusters per Supervisor reach internal-cloud-provider scale without fragmenting governance — provided etcd and the control plane are sized for it.
Encoded governance
CaaS and object storage (Tech Preview) move governance into the construct. The platform team defines profiles instead of executing each onboarding by hand.
Next step. The fourth and final article in the series covers VCF 9.1 security & resilience — the defensive counterpart to this self-service opening: the more you expose the platform, the more security guardrails and resilience mechanisms become structural. Read it alongside the networking & scale article, which lays the network foundations of this same fleet.
Further reading and resources:
- VCF 9.1 announcement — official Broadcom post
- VCF 9.1 Release Notes — detailed changes and limits
- VKS documentation — official vSphere Kubernetes Service reference
- William Lam — community walkthroughs and deep-dives