VCF 9.1: Kubernetes & self-service, the platform takes over

In 9.0, VKS was an excellent vSphere-native Kubernetes runtime, but operating the platform was still platform-team work. Provisioning a clean namespace, attaching quotas, registry, ingress and identity to it — that was a sequence of manual steps application teams could not trigger on their own. The gap between “operated cluster” and “self-service platform” was bridged with homegrown scripts.

VCF 9.1 attacks that gap head-on. Four changes — linked clones for VKS, scale to 500 clusters per Supervisor, simplified Container-as-a-Service, native object storage in Tech Preview — shift the boundary: what the platform team did by hand becomes a consumable construct. This article decodes each one, with the 9.0 → 9.1 delta and the concrete architecture impact.

VKS 9.0 → Linked clones + scale → Self-service 9.1

Series 'What's new in VCF 9.1' — 3/4

A mini-series on what’s new in VMware Cloud Foundation 9.1:

Infrastructure efficiency & TCO
Networking & scale
Kubernetes & self-service (this article)
Security & resilience

Visual credits

Visuals © Broadcom, sourced from the official VCF blog (links at the end of the article). Synthesis and analysis are my own.

VKS & VM Fast-Deploy: linked clones change the scale

Up to 9.0, every VKS node — control plane and worker alike — was a full clone of the VKr OVA: a complete copy of the source disk to the datastore before first boot. On a 6-node cluster, that’s six multi-GB copies to provision, sequentially constrained by datastore throughput. Cluster deployment time — and even more so rolling upgrade time (every node recreated) — was dominated by that copy.

9.1 introduces Fast-Deploy via linked clones for VKS and VM clusters. The mechanics: a read-only parent disk (from the VKr) is instantiated once, then each node creates a delta disk that stores only the blocks changed relative to the parent. The node boots in seconds instead of waiting for a full copy. Conceptually, it’s copy-on-write applied to Kubernetes node provisioning.
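The copy-on-write idea can be illustrated with a toy model. This is a minimal sketch, not how vSphere implements linked clones: `ParentDisk` and `LinkedClone` are hypothetical names, and a "disk" here is just a tuple of byte blocks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParentDisk:
    """Read-only parent disk shared by every linked clone (toy model)."""
    blocks: tuple[bytes, ...]


@dataclass
class LinkedClone:
    """A node disk that stores only the blocks it has changed."""
    parent: ParentDisk
    delta: dict[int, bytes] = field(default_factory=dict)

    def read(self, index: int) -> bytes:
        # Serve the block from the delta if this clone has written it,
        # otherwise fall through to the shared parent.
        return self.delta.get(index, self.parent.blocks[index])

    def write(self, index: int, data: bytes) -> None:
        # Writes never touch the parent; they land in the clone's delta.
        if not 0 <= index < len(self.parent.blocks):
            raise IndexError(f"block {index} is outside the disk")
        self.delta[index] = data


parent = ParentDisk(blocks=(b"boot", b"kernel", b"rootfs"))
node_a = LinkedClone(parent)
node_b = LinkedClone(parent)

# Only node_a diverges; node_b still reads everything from the parent.
node_a.write(2, b"rootfs-node-a")
```

Creating a clone copies nothing, which is why the node can boot without waiting for a disk copy; the cost shows up later as reads against the shared parent and growth of each delta.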

Aspect	VKS 9.0 (full clone)	VKS 9.1 (linked clone)
Node creation	Full source-disk copy	Delta disk over a shared parent
6-node cluster deploy	Dominated by N sequential copies	Near-parallel, drastically reduced time
Rolling upgrade	Each node = full clone recreated	Each node = delta over existing parent
Initial storage footprint	N × image size	1 × parent + N deltas (growing)
I/O coupling	Long copy, datastore saturated	Shared parent read, delta write

Architect impact. Linked clones aren’t just a speed optimization: they make upgrades operationally trivial. When recreating 200 nodes takes minutes instead of hours, the change window shrinks and the Kubernetes patch SLA becomes tenable without negotiating long maintenance windows. The trade-off to account for: the shared parent disk becomes a read hotspot, and delta disks grow over time. Size storage on one parent plus delta growth, not on N × a fixed image.
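To make the sizing shift concrete, here is a rough sketch of the two footprint formulas from the table (N × image size versus one parent plus N deltas). The node count, image size and delta sizes are hypothetical inputs chosen for illustration, not measured values.

```python
def full_clone_footprint_gb(nodes: int, image_gb: float) -> float:
    """Footprint when every node is a full copy of the image."""
    return nodes * image_gb


def linked_clone_footprint_gb(nodes: int, image_gb: float, delta_gb: float) -> float:
    """Footprint with one shared parent plus one delta disk per node."""
    return image_gb + nodes * delta_gb


def break_even_delta_gb(nodes: int, image_gb: float) -> float:
    """Average delta size at which linked clones use as much space as full clones."""
    return image_gb * (nodes - 1) / nodes


# Hypothetical fleet: 200 nodes built from a 20 GB image.
nodes, image_gb = 200, 20.0
full = full_clone_footprint_gb(nodes, image_gb)
linked_small_deltas = linked_clone_footprint_gb(nodes, image_gb, delta_gb=0.5)
linked_grown_deltas = linked_clone_footprint_gb(nodes, image_gb, delta_gb=5.0)
break_even = break_even_delta_gb(nodes, image_gb)
```

The break-even value is the useful number for capacity planning: as average delta size approaches it, the storage advantage of linked clones disappears even though the provisioning speed advantage remains.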

Scale: 500 Kubernetes clusters per Supervisor

In 9.0, the Supervisor topped out well below a hundred comfortably usable workload clusters — enough for one team, tight for an internal cloud provider serving dozens of tenants. 9.1 raises the bar to 500 Kubernetes clusters per Supervisor.

Dimension	VKS 9.0	VKS 9.1
Clusters / Supervisor	Low limit (tens)	Up to 500
Target model	Team / small multi-tenant	Large-scale internal cloud provider
Density per vSphere cluster	1 platform = 1 narrow Supervisor	1 Supervisor = full Kubernetes fleet

What it unlocks: a single Supervisor can now carry an entire organization’s Kubernetes fleet, instead of fragmenting into several Supervisors (and therefore several governance perimeters, several control planes to operate). For a multi-tenant model where each tenant gets one or more dedicated clusters, 500 clusters/Supervisor changes the map: fewer Supervisors, simpler governance, but a wider blast radius.

Architect impact. 500 clusters does not come for free. The Supervisor remains a shared control plane: its etcd, its CAPI controllers and its namespace quotas now absorb a far higher reconciliation load. Five hundred clusters means five hundred CAPI reconciliation loops, hundreds of thousands of objects in the Supervisor API server, and an etcd whose latency becomes critical. The rule: treat the figure as a validated architectural ceiling, not a target. Size the Supervisor (HA control plane, etcd IOPS, control-plane observability) before approaching that density, and keep headroom.
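A back-of-the-envelope check along these lines might look as follows. Only the 500-cluster ceiling comes from the article; the 70% utilization threshold and the objects-per-cluster figure are assumptions to replace with your own policy and measurements.

```python
SUPERVISOR_CLUSTER_CEILING = 500  # ceiling stated for VCF 9.1 in this article


def planned_utilization(planned_clusters: int,
                        ceiling: int = SUPERVISOR_CLUSTER_CEILING) -> float:
    """Fraction of the validated ceiling that the plan would consume."""
    return planned_clusters / ceiling


def within_headroom(planned_clusters: int, max_utilization: float = 0.7) -> bool:
    # max_utilization is an assumed internal policy, not a Broadcom figure.
    return planned_utilization(planned_clusters) <= max_utilization


def estimated_objects(clusters: int, objects_per_cluster: int) -> int:
    """Rough count of API objects the Supervisor would hold for the fleet."""
    return clusters * objects_per_cluster


# Hypothetical plan: 300 clusters, about 400 API objects each.
plan_ok = within_headroom(300)
plan_objects = estimated_objects(300, objects_per_cluster=400)
```

The point of the sketch is the order of operations: decide the headroom policy first, then check the planned fleet against it, rather than discovering the ceiling in production.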

Simplified Container-as-a-Service

In 9.0, making a namespace available to an application team looked like this: create the vSphere Namespace, manually attach storage policies, configure CPU/RAM/storage quotas, wire in a registry, deploy or attach an ingress controller, then plumb identity (SSO mapping, RBAC). Six steps, six chances of divergence between tenants, and a platform team in the critical path of every onboarding.

9.1 turns this into a true self-service Container-as-a-Service. Provisioning a namespace becomes a consumable action that automatically inherits VCF constructs: registry, ingress, quotas and identity are derived from the tenant’s VCF perimeter rather than rewired by hand. The application team requests a namespace; it receives an already-governed namespace, with its registry, its ingress entry point and its identity aligned to enterprise SSO.

Self-service Container-as-a-Service in VCF 9.1

Source: Broadcom, VCF Blog

Namespace onboarding step	9.0 flow (manual)	9.1 flow (CaaS)
Namespace creation	Platform-team action	Consumable self-service
Storage policy	Manual attachment	Inherited from VCF construct
Registry	Separate wiring	Provisioned with the namespace
Ingress	Manual deploy / attach	Included in the construct
Quotas	Defined by hand per profile	Derived from tenant perimeter
Identity / RBAC	Manual SSO mapping	Inherited from VCF identity

Architect impact. The platform team steps out of the onboarding critical path without losing governance: guardrails (quotas, identity, policies) are encoded in the construct, not applied after the fact. The platform team’s role shifts from repetitive execution to defining tenant profiles. This is exactly the move we were trying to script in 9.0 — except here it’s native and consistent by default.
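The shift from "apply guardrails after the fact" to "guardrails encoded in the construct" can be sketched as a data model. Everything below is hypothetical: `TenantProfile`, `GovernedNamespace`, `provision_namespace` and the example hostnames are illustrative names, not VCF or VKS APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    cpu_cores: int
    memory_gib: int
    storage_gib: int


@dataclass(frozen=True)
class TenantProfile:
    """Guardrails defined once by the platform team (hypothetical model)."""
    tenant: str
    registry: str
    ingress_domain: str
    storage_policy: str
    sso_groups: tuple[str, ...]
    namespace_quota: Quota


@dataclass(frozen=True)
class GovernedNamespace:
    """What the application team receives: a namespace with guardrails attached."""
    name: str
    registry: str
    ingress_host: str
    storage_policy: str
    sso_groups: tuple[str, ...]
    quota: Quota


def provision_namespace(profile: TenantProfile, name: str) -> GovernedNamespace:
    # Every governed attribute is derived from the profile; the requester
    # supplies only the namespace name.
    return GovernedNamespace(
        name=f"{profile.tenant}-{name}",
        registry=f"{profile.registry}/{profile.tenant}/{name}",
        ingress_host=f"{name}.{profile.ingress_domain}",
        storage_policy=profile.storage_policy,
        sso_groups=profile.sso_groups,
        quota=profile.namespace_quota,
    )


payments = TenantProfile(
    tenant="payments",
    registry="registry.example.internal",
    ingress_domain="payments.apps.example.internal",
    storage_policy="gold",
    sso_groups=("payments-devs",),
    namespace_quota=Quota(cpu_cores=16, memory_gib=64, storage_gib=500),
)
ns = provision_namespace(payments, "checkout")
```

Because the requester cannot pass a quota, registry or group list, two namespaces from the same tenant cannot diverge; changing governance means changing the profile.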

Native Object Storage (Tech Preview)

Block storage (PVC → VMDK via CNS) and file storage were already self-service in 9.0. The missing piece was S3-compatible object storage, which developers want for artifacts, application backups, datasets and cloud-native application state. In 9.0 you had to leave the platform (external bucket, self-managed MinIO) and therefore break the governance model.

9.1 introduces native, self-service, S3-compatible object storage as a Tech Preview. Developers provision buckets through the same deploy / scale / manage workflow as block and file; IT keeps the guardrails (quotas, policies, identity) without becoming a bottleneck. The promise: close the last missing storage category so the platform covers all three axes (block, file, object) under unified governance.

Tech Preview ≠ production

Native object storage ships as Tech Preview in VCF 9.1. That means: no production support, no stable API guarantee, functionality liable to change or be removed before GA. Use it to evaluate and prepare the target architecture — never to carry a critical application workload or data without a fallback plan to a supported object solution.

Storage category	VKS 9.0	VKS 9.1
Block (PVC)	Self-service via CNS	Self-service via CNS
File	Self-service	Self-service
Object (S3)	Off-platform	Native self-service (Tech Preview)

Architect impact. The point isn’t to use this in production now — it’s to scope the target today. If your teams consume external S3 or self-managed MinIO, the Tech Preview lets you prototype the migration and measure the governance delta, so you’re ready on GA day without rewriting access patterns.
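One way to avoid rewriting access patterns later is to keep application code behind a small storage interface while prototyping. The sketch below is an assumption about how a team might structure this, not a VCF API: `ObjectStore`, `InMemoryObjectStore` and `archive_build` are hypothetical names, and a real backend would wrap whichever S3 client you already use.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Minimal access pattern the application depends on (hypothetical)."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    def get(self, bucket: str, key: str) -> bytes: ...


class InMemoryObjectStore:
    """Stand-in backend for prototyping; real backends would wrap an S3 client."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        try:
            return self._objects[(bucket, key)]
        except KeyError:
            raise KeyError(f"{bucket}/{key} not found") from None


def archive_build(store: ObjectStore, build_id: str, artifact: bytes) -> str:
    # Application code talks to the interface, never to a specific backend.
    key = f"builds/{build_id}.tar"
    store.put("artifacts", key, artifact)
    return key


store = InMemoryObjectStore()
key = archive_build(store, "1234", b"payload")
```

With this shape, moving from external S3 or MinIO to the native service would mean adding one backend class and changing configuration, while `archive_build` and its callers stay untouched.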

From operated cluster to self-service platform

Taken in isolation, each of these four changes is an improvement. Taken together, they close the self-service gap end to end.

Linked clones make provisioning and upgrades fast enough to be self-service — without them, exposing cluster creation to teams would saturate the datastore. Scale to 500 clusters/Supervisor makes multi-tenant scale reachable without fragmenting governance — it’s the Kubernetes counterpart to the networking & scale work covered in the previous article. Simplified CaaS encodes governance into the construct rather than into runbooks. Object storage completes the coverage so developers no longer have to leave the platform.

The result: what we built by hand on top of VKS in 9.0 — a journey described step by step in Deploying your first VKS cluster on VCF 9 — becomes native platform behavior in 9.1. The platform team doesn’t disappear; it moves from repetitive execution to defining profiles, guardrails and quotas. This is the shift from the operated cluster to the self-service platform.

Pitfalls & points of attention

Linked clones: storage / I/O coupling to watch

The shared parent disk becomes a read hotspot when many nodes boot simultaneously, and delta disks grow with usage. Size the datastore on parent + delta growth, not on N × a fixed image. Watch parent read latency during mass rolling upgrades: the speed gain can shift into I/O pressure if the datastore is undersized.
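If parent read latency does become a problem, one possible mitigation is to cap how many nodes are recreated at the same time. The helper below is a sketch of that idea only; `upgrade_waves` and the `max_parallel` value are hypothetical, and the platform's own rolling-upgrade controls should take precedence where they exist.

```python
def upgrade_waves(nodes: list[str], max_parallel: int) -> list[list[str]]:
    """Split nodes into consecutive waves of at most max_parallel nodes."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # Each wave bounds the number of clones reading the parent disk at once.
    return [nodes[i:i + max_parallel] for i in range(0, len(nodes), max_parallel)]


nodes = [f"worker-{i}" for i in range(1, 6)]
waves = upgrade_waves(nodes, max_parallel=2)
```

The right wave size is an empirical question: start small, watch parent read latency on the datastore, and widen the waves until latency starts to climb.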

500 clusters: etcd capacity is not free

The figure is a validated architectural ceiling, not a target to aim for. Five hundred clusters means as many CAPI reconciliation loops and a massive etcd load on the Supervisor. Size the Supervisor control plane (HA, etcd IOPS, API server memory) and instrument it before approaching that density. Keep headroom: a Supervisor at 95% of its reconciliation capacity is fragile.

CaaS: quota governance is defined upfront

Self-service inherits quotas from the tenant construct. If tenant profiles are poorly calibrated, self-service propagates quotas that are too wide (loss of governance) or too tight (blocked provisioning) at scale. Scoping the profiles becomes critical work: that's now where governance is decided, not in manual execution.
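A profile can be sanity-checked for both failure modes before self-service is opened. The following sketch looks at CPU only and uses hypothetical names and thresholds (`Profile`, `calibration_issues`, the capacity figures); it illustrates the reasoning rather than any built-in VCF validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    cpu_per_namespace: int      # cores granted to each self-service namespace
    max_namespaces: int         # how many namespaces the tenant may create
    smallest_workload_cpu: int  # cores needed by the smallest expected workload


def calibration_issues(profile: Profile, tenant_cpu_capacity: int) -> list[str]:
    """Return human-readable findings; an empty list means no issue was found."""
    issues = []
    worst_case = profile.cpu_per_namespace * profile.max_namespaces
    if worst_case > tenant_cpu_capacity:
        # Too wide: self-service could grant more than the tenant owns.
        issues.append(
            f"too wide: {worst_case} cores grantable, {tenant_cpu_capacity} available"
        )
    if profile.cpu_per_namespace < profile.smallest_workload_cpu:
        # Too tight: even the smallest workload cannot be scheduled.
        issues.append(
            f"too tight: {profile.cpu_per_namespace} cores per namespace, "
            f"{profile.smallest_workload_cpu} needed"
        )
    return issues


wide = calibration_issues(Profile("wide", 32, 10, 4), tenant_cpu_capacity=200)
tight = calibration_issues(Profile("tight", 2, 10, 4), tenant_cpu_capacity=200)
balanced = calibration_issues(Profile("balanced", 8, 10, 4), tenant_cpu_capacity=200)
```

Running a check like this in the review pipeline for tenant profiles puts the governance decision where the article says it now lives: in the profile definition, before any namespace is created.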

Object Storage: Tech Preview, not GA

No production support, unstabilized API, functionality liable to change or be removed before GA. Use only to evaluate and prepare the target. Don't carry critical workloads or data on it without a fallback to a supported object solution. Track the release notes for the GA date and any API changes.

Identity inheritance: SSO consistency to validate

The CaaS namespace inherits the identity of the tenant's VCF perimeter. If the enterprise SSO mapping (groups, roles) isn't clean upstream, inheritance propagates inconsistent permissions to every new namespace. Validate identity federation and group mapping before opening self-service — fixing it afterward across dozens of namespaces is expensive.
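A simple pre-flight comparison between the groups that exist in the identity provider and the groups referenced by role bindings catches the most common inconsistencies. This is a sketch under assumptions: `mapping_gaps` is a hypothetical helper, and the group and role names are invented for the example.

```python
def mapping_gaps(sso_groups: set[str], role_bindings: dict[str, str]) -> dict[str, set[str]]:
    """Compare SSO groups with role bindings (group name -> role)."""
    bound = set(role_bindings)
    return {
        # Groups that exist in SSO but would inherit no permissions.
        "groups_without_role": sso_groups - bound,
        # Bindings that point at a group SSO does not know (typo, stale group).
        "bindings_to_unknown_groups": bound - sso_groups,
    }


gaps = mapping_gaps(
    sso_groups={"payments-devs", "payments-ops"},
    role_bindings={"payments-devs": "edit", "paymnts-ops": "admin"},
)
```

In this example the misspelled `paymnts-ops` binding shows up on both sides: the real group has no role, and the binding targets a group that does not exist. Fixing it once in the tenant mapping is far cheaper than correcting every namespace that inherited it.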

Blast radius: fewer Supervisors = wider perimeter

Consolidating the fleet onto a single Supervisor at 500 clusters simplifies governance but widens the blast radius: a control-plane incident potentially impacts the organization's entire Kubernetes fleet. Make the trade-off between consolidation and isolation an explicit decision, and treat the Supervisor as a maximum-criticality target in the resilience plan.

Conclusion

Self-service-ready provisioning

Linked clones make deploy and upgrade fast enough to expose to teams without saturating storage. Upgrades become operationally trivial.

Multi-tenant scale

500 clusters per Supervisor brings internal-cloud-provider scale within reach without fragmenting governance, provided etcd and the control plane are sized for it.

Encoded governance

CaaS and object storage (Tech Preview) move governance into the construct. The platform team defines profiles rather than executing them by hand.

Next step. The fourth and final article in the series covers VCF 9.1 security & resilience, the defensive counterpart to opening the platform to self-service: the more you expose the platform, the more structural security guardrails and resilience mechanisms become. Read it alongside the networking & scale article, which lays the network foundations of this same fleet.

Further reading:

VCF 9.1 announcement — official Broadcom post
VCF 9.1 Release Notes — detailed changes and limits
VKS documentation — official vSphere Kubernetes Service reference
William Lam — community walkthroughs and deep-dives
