Kubernetes News
-
Kubernetes v1.36: Moving Volume Group Snapshots to GA
Volume group snapshots were introduced as an Alpha feature with the Kubernetes v1.27 release, moved to Beta in v1.32, and to a second Beta in v1.34. We are excited to announce that in the Kubernetes v1.36 release, support for volume group snapshots has reached General Availability (GA).
The support for volume group snapshots relies on a set of extension APIs for group snapshots. These APIs allow users to take crash-consistent snapshots for a set of volumes. Behind the scenes, Kubernetes uses a label selector to group multiple PersistentVolumeClaim objects for snapshotting. A key aim is to allow you to restore that set of snapshots to new volumes and recover your workload based on a crash-consistent recovery point. This feature is only supported for CSI volume drivers.
An overview of volume group snapshots
Some storage systems provide the ability to create a crash-consistent snapshot of multiple volumes. A group snapshot represents copies made from multiple volumes that are taken at the same point-in-time. A group snapshot can be used either to rehydrate new volumes (pre-populated with the snapshot data) or to restore existing volumes to a previous state (represented by the snapshots).
Why add volume group snapshots to Kubernetes?
The Kubernetes volume plugin system already provides a powerful abstraction that automates the provisioning, attaching, mounting, resizing, and snapshotting of block and file storage. Underpinning all these features is the Kubernetes goal of workload portability.
There was already a VolumeSnapshot API that provides the ability to take a snapshot of a persistent volume to protect against data loss or data corruption. However, some storage systems support consistent group snapshots that allow a snapshot to be taken from multiple volumes at the same point-in-time to achieve write order consistency. This is extremely useful for applications that contain multiple volumes. For example, an application may have data stored in one volume and logs stored in another. If snapshots for these volumes are taken at different times, the application will not be consistent and will not function properly if restored from those snapshots.
While you can quiesce the application first and take individual snapshots sequentially, this process can be time-consuming or sometimes impossible. Consistent group support provides crash consistency across all volumes in the group without the need for application quiescence.
Kubernetes APIs for volume group snapshots
Kubernetes' support for volume group snapshots relies on three API kinds that are used for managing snapshots:
- VolumeGroupSnapshot: Created by a Kubernetes user (or automation) to request creation of a volume group snapshot for multiple persistent volume claims.
- VolumeGroupSnapshotContent: Created by the snapshot controller for a dynamically created VolumeGroupSnapshot. It contains information about the provisioned cluster resource (a group snapshot). The object binds to the VolumeGroupSnapshot for which it was created with a one-to-one mapping.
- VolumeGroupSnapshotClass: Created by cluster administrators to describe how volume group snapshots should be created, including the driver information, the deletion policy, etc.
These three API kinds are defined as CustomResourceDefinitions (CRDs). For the GA release, the API version has been promoted to v1.
What's new in GA?
- The API version for VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass is promoted to groupsnapshot.storage.k8s.io/v1.
- Enhanced stability and bug fixes based on feedback from the beta releases, including the improvements introduced in v1beta2 for accurate restoreSize reporting.
How do I use Kubernetes volume group snapshots?
Creating a new group snapshot with Kubernetes
Once a VolumeGroupSnapshotClass object is defined and you have volumes you want to snapshot together, you may request a new group snapshot by creating a VolumeGroupSnapshot object.

Label the PVCs you wish to group:

```
% kubectl label pvc pvc-0 group=myGroup
persistentvolumeclaim/pvc-0 labeled

% kubectl label pvc pvc-1 group=myGroup
persistentvolumeclaim/pvc-1 labeled
```

For dynamic provisioning, a selector must be set so that the snapshot controller can find PVCs with the matching labels to be snapshotted together.
```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1
kind: VolumeGroupSnapshot
metadata:
  name: snapshot-daily-20260422
  namespace: demo-namespace
spec:
  volumeGroupSnapshotClassName: csi-groupSnapclass
  source:
    selector:
      matchLabels:
        group: myGroup
```

The VolumeGroupSnapshotClass is required for dynamic provisioning:

```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1
kind: VolumeGroupSnapshotClass
metadata:
  name: csi-groupSnapclass
driver: example.csi.k8s.io
deletionPolicy: Delete
```
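Once the VolumeGroupSnapshot has been created, you can check whether it is ready and list the individual VolumeSnapshot objects that the snapshot controller created for the group's members (the member snapshot names are generated, so the output is only illustrative):

```
% kubectl get volumegroupsnapshot snapshot-daily-20260422 -n demo-namespace
% kubectl get volumesnapshot -n demo-namespace
```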
How to use group snapshot for restore

At restore time, request a new PersistentVolumeClaim to be created from a VolumeSnapshot object that is part of a VolumeGroupSnapshot. Repeat this for all volumes that are part of the group snapshot.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: examplepvc-restored-2026-04-22
  namespace: demo-namespace
spec:
  storageClassName: example-sc
  dataSource:
    name: snapshot-0962a745b2bf930bb385b7b50c9b08af471f1a16780726de19429dd9c94eaca0
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOncePod
  resources:
    requests:
      storage: 100Mi
```

As a storage vendor, how do I add support for group snapshots?
To implement the volume group snapshot feature, a CSI driver must:
- Implement a new group controller service.
- Implement group controller RPCs: CreateVolumeGroupSnapshot, DeleteVolumeGroupSnapshot, and GetVolumeGroupSnapshot.
- Add the group controller capability CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT.
See the CSI spec and the Kubernetes-CSI Driver Developer Guide for more details.
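As a very rough Go sketch of the group controller service, the stub below wires up the three RPCs listed above and leaves the storage-specific work unimplemented. The import path, request/response type names, and registration helper are assumptions based on the CSI spec's generated Go bindings; check the spec repository for the authoritative signatures.

```go
package main

import (
	"context"

	csi "github.com/container-storage-interface/spec/lib/go/csi" // assumed import path for the CSI Go bindings
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// groupControllerServer is a skeleton group controller service. A real driver
// would call its storage backend in each RPC and also advertise the
// CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability.
type groupControllerServer struct {
	csi.UnimplementedGroupControllerServer // assumed embeddable default that satisfies the remaining RPCs
}

func (s *groupControllerServer) CreateVolumeGroupSnapshot(ctx context.Context, req *csi.CreateVolumeGroupSnapshotRequest) (*csi.CreateVolumeGroupSnapshotResponse, error) {
	// Take a crash-consistent snapshot of every volume in the request, then
	// return the group snapshot together with its member snapshots.
	return nil, status.Error(codes.Unimplemented, "not implemented")
}

func (s *groupControllerServer) DeleteVolumeGroupSnapshot(ctx context.Context, req *csi.DeleteVolumeGroupSnapshotRequest) (*csi.DeleteVolumeGroupSnapshotResponse, error) {
	return nil, status.Error(codes.Unimplemented, "not implemented")
}

func (s *groupControllerServer) GetVolumeGroupSnapshot(ctx context.Context, req *csi.GetVolumeGroupSnapshotRequest) (*csi.GetVolumeGroupSnapshotResponse, error) {
	return nil, status.Error(codes.Unimplemented, "not implemented")
}

func main() {
	srv := grpc.NewServer()
	csi.RegisterGroupControllerServer(srv, &groupControllerServer{}) // assumed registration helper
	_ = srv                                                          // a real driver would listen on its CSI socket and call srv.Serve(...)
}
```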
How can I learn more?
- The design spec for the volume group snapshot feature.
- The code repository for volume group snapshot APIs and controller.
- CSI documentation on the group snapshot feature.
How do I get involved?
This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together. On behalf of SIG Storage, I would like to offer a huge thank you to all the contributors who stepped up over the years to help the project reach GA:
- Ben Swartzlander (bswartz)
- Cici Huang (cici37)
- Darshan Murthy (darshansreenivas)
- Hemant Kumar (gnufied)
- James Defelice (jdef)
- Jan Šafránek (jsafrane)
- Madhu Rajanna (Madhu-1)
- Manish M Yathnalli (manishym)
- Michelle Au (msau42)
- Niels de Vos (nixpanic)
- Leonardo Cecchi (leonardoce)
- Rakshith R (Rakshith-R)
- Raunak Shah (RaunakShah)
- Saad Ali (saad-ali)
- Wei Duan (duanwei33)
- Xing Yang (xing-yang)
- Yati Padia (yati1998)
For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We always welcome new contributors.
We also hold regular Data Protection Working Group meetings. New attendees are welcome to join our discussions.
-
Kubernetes v1.36: More Drivers, New Features, and the Next Era of DRA
Dynamic Resource Allocation (DRA) has fundamentally changed how platform administrators handle hardware accelerators and specialized resources in Kubernetes. In the v1.36 release, DRA continues to mature, bringing a wave of feature graduations, critical usability improvements, and new capabilities that extend the flexibility of DRA to native resources like memory and CPU, and support for ResourceClaims in PodGroups.
Driver availability continues to expand. Beyond specialized compute accelerators, the ecosystem includes support for networking and other hardware types, reflecting a move toward a more robust, hardware-agnostic infrastructure.
Whether you are managing massive fleets of GPUs, need better handling of failures, or simply looking for better ways to define resource fallback options, the upgrades to DRA in 1.36 have something for you. Let's dive into the new features and graduations!
Feature graduations
The community has been hard at work stabilizing core DRA concepts. In Kubernetes 1.36, several highly anticipated features have graduated to Beta and Stable.
Prioritized list (stable)
Hardware heterogeneity is a reality in most clusters. With the Prioritized list feature, you can confidently define fallback preferences when requesting devices. Instead of hardcoding a request for a specific device model, you can specify an ordered list of preferences (e.g., "Give me an H100, but if none are available, fall back to an A100"). The scheduler will evaluate these requests in order, drastically improving scheduling flexibility and cluster utilization.
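As a rough sketch (the device class names are placeholders and the exact field layout should be checked against the resource.k8s.io API version in your cluster), a ResourceClaim expressing such a fallback preference could look like this:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: training-gpu
spec:
  devices:
    requests:
      - name: gpu
        # Ordered alternatives: the scheduler tries each subrequest in turn.
        firstAvailable:
          - name: preferred
            deviceClassName: h100.example.com   # hypothetical device class
          - name: fallback
            deviceClassName: a100.example.com   # hypothetical device class
```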
Extended resource support (beta)
As DRA becomes the standard for resource allocation, bridging the gap with legacy systems is crucial. The DRA Extended resource feature allows users to request resources via traditional extended resources on a Pod. This allows for a gradual transition to DRA, meaning cluster operators can migrate clusters to DRA but let application developers adopt the ResourceClaim API on their own schedule.
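From the application developer's point of view, nothing changes: the Pod keeps requesting a traditional extended resource, and the cluster operator maps that name to a DRA device class behind the scenes. A minimal sketch, where the resource name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-style-accelerator-pod
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # illustrative image
      resources:
        limits:
          example.com/gpu: 1   # extended resource name that the operator has mapped to a DRA device class
```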
Partitionable devices (beta)
Hardware accelerators are powerful, and sometimes a single workload doesn't need an entire device. The Partitionable devices feature provides native DRA support for dynamically carving physical hardware into smaller, logical instances (such as Multi-Instance GPUs) based on workload demands. This allows administrators to safely and efficiently share expensive accelerators across multiple Pods.
Device taints (beta)
Just as you can taint a Kubernetes Node, you can apply taints directly to specific DRA devices. Device taints and tolerations empower cluster administrators to manage hardware more effectively. You can taint faulty devices to prevent them from being allocated to standard claims, or reserve specific hardware for dedicated teams, specialized workloads, and experiments. Ultimately, only Pods with matching tolerations are permitted to claim these tainted devices.
Device binding conditions (beta)
To improve scheduling reliability, the Kubernetes scheduler can use the Binding conditions feature to delay committing a Pod to a Node until its required external resources—such as attachable devices or FPGAs—are fully prepared. By explicitly modeling resource readiness, this prevents premature assignments that can lead to Pod failures, ensuring a much more robust and predictable deployment process.
Resource health status (beta)
Knowing when a device has failed or become unhealthy is critical for workloads running on specialized hardware. With Resource health status, Kubernetes exposes device health information directly in the Pod status, giving users and controllers crucial visibility to quickly identify and react to hardware failures. The feature includes support for human-readable health status messages, making it significantly easier to diagnose issues without needing to dive into complex driver logs.
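For example, an unhealthy device could surface in the Pod status roughly as in the excerpt below; the field shape follows the resource health KEP, and the claim, request, and device names are illustrative.

```yaml
# Excerpt of `kubectl get pod training-pod -o yaml` (illustrative values)
status:
  containerStatuses:
    - name: trainer
      allocatedResourcesStatus:
        - name: claim:training-gpu/gpu        # ResourceClaim name / request name
          resources:
            - resourceID: gpu-0
              health: Unhealthy
```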
New Features
Beyond stabilizing existing capabilities, v1.36 introduces foundational new features that expand what DRA can do. These are alpha features, so they are behind feature gates that are disabled by default.
ResourceClaim support for workloads
To optimize large-scale AI/ML workloads that rely on strict topological scheduling, the ResourceClaim support for workloads feature enables Kubernetes to seamlessly manage shared resources across massive sets of Pods. By associating ResourceClaims or ResourceClaimTemplates with PodGroups, this feature eliminates previous scaling bottlenecks, such as the limit on the number of pods that can share a claim, and removes the burden of manual claim management from specialized orchestrators.
Node allocatable resources
Why should DRA only be for external accelerators? In v1.36, we are introducing the first iteration of using the DRA APIs to manage node allocatable infrastructure resources (like CPU and memory). By bringing CPU and memory allocation under the DRA umbrella with the DRA Node allocatable resources feature, users can leverage DRA's advanced placement, NUMA-awareness, and prioritization semantics for standard compute resources, paving the way for incredibly fine-grained performance tuning.
DRA resource availability visibility
One of the most requested features from cluster administrators has been better visibility into hardware capacity. The new Resource pool status feature allows you to query the availability of devices in DRA resource pools. By creating a ResourcePoolStatusRequest object, you get a point-in-time snapshot of device counts (total, allocated, available, and unavailable) for each pool managed by a given driver. This enables better integration with dashboards and capacity planning tools.

List types for attributes

ResourceClaim constraint evaluation has changed to work better with scalar and list values: matchAttribute now checks for a non-empty intersection, and distinctAttribute checks for pairwise disjoint values. An includes() function in CEL has also been introduced, which lets device selectors keep working more easily when an attribute changes between scalar and list representations. (The includes() function is only available in DRA contexts for expression evaluation.)
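For instance, a device selector could use includes() as in the hedged sketch below; the driver domain and attribute name are hypothetical, and the same expression keeps working whether the attribute is published as a scalar or a list.

```yaml
# Sketch of a device selector using the DRA-only includes() function.
# "driver.example.com" and "supportedModes" are illustrative, not a real driver's attributes.
selectors:
  - cel:
      expression: device.attributes["driver.example.com"].supportedModes.includes("time-sliced")
```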
Deterministic device selection

The Kubernetes scheduler has been updated to evaluate devices using lexicographical ordering based on resource pool and ResourceSlice names. This change empowers drivers to proactively influence the scheduling process, leading to improved throughput and more optimal scheduling decisions. The ResourceSlice controller toolkit automatically generates names that reflect the exact device ordering specified by the driver author.
Discoverable device metadata in containers
Workloads running on nodes with DRA devices often need to discover details about their allocated devices, such as PCI bus addresses or network interface configuration, without querying the Kubernetes API. With Device metadata, Kubernetes defines a standard protocol for how DRA drivers expose device attributes to containers as versioned JSON files at well-known paths. Drivers built with the DRA kubelet plugin library get this behavior transparently; they just provide the metadata and the library handles file layout, CDI bind-mounts, versioning, and lifecycle. This gives applications a consistent, driver-independent way to discover and consume device metadata, eliminating the need for custom controllers or looking up ResourceSlice objects to get metadata via attributes.
What’s next?
This release introduced a wealth of new Dynamic Resource Allocation (DRA) features, and the momentum is only building. As we look ahead, our roadmap focuses on maturing existing features toward beta and stable releases while hardening DRA’s performance, scalability, and reliability. A key priority over the coming cycles will be deep integration with workload aware and topology aware scheduling.
A big goal for us is to migrate users from Device Plugin to DRA, and we want you involved. Whether you are currently maintaining a driver or are just beginning to explore the possibilities, your input is vital. Partner with us to shape the next generation of resource management. Reach out today to collaborate on development, share feedback, or start building your first DRA driver.
Getting involved
A good starting point is joining the WG Device Management Slack channel and meetings, which happen at Americas/EMEA and EMEA/APAC friendly time slots.
Not all enhancement ideas are tracked as issues yet, so come talk to us if you want to help or have some ideas yourself! We have work to do at all levels, from difficult core changes to usability enhancements in kubectl, which could be picked up by newcomers.
-
Kubernetes v1.36: Server-Side Sharded List and Watch
As Kubernetes clusters grow to tens of thousands of nodes, controllers that watch high-cardinality resources like Pods face a scaling wall. Every replica of a horizontally scaled controller receives the full stream of events from the API server, paying the CPU, memory, and network cost to deserialize everything, only to discard the objects it is not responsible for. Scaling out the controller does not reduce per-replica cost; it multiplies it.
Kubernetes v1.36 introduces server-side sharded list and watch as an alpha feature (KEP-5866). With this feature enabled, the API server filters events at the source so that each controller replica receives only the slice of the resource collection it owns.
The problem with client-side sharding
Some controllers, such as kube-state-metrics, already support horizontal sharding. Each replica is assigned a portion of the keyspace and discards objects that do not belong to it. While this works functionally, it does not reduce the volume of data flowing from the API server:
- N replicas x full event stream: every replica deserializes and processes every event, then throws away what it does not need.
- Network bandwidth scales with replicas, not with shard size.
- CPU spent on deserialization is wasted for the discarded fraction.
Server-side sharded list and watch solves this by moving the filtering upstream into the API server. Each replica tells the API server which hash range it owns, and the API server only sends matching events.
How it works
The feature adds a shardSelector field to ListOptions. Clients specify a hash range using the shardRange() function:

```
shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')
```

The API server computes a deterministic 64-bit FNV-1a hash of the specified field and returns only objects whose hash falls within the range [start, end). This applies to both list responses and watch event streams. The hash function produces the same result across all API server instances, so the feature is safe to use with multiple API server replicas.

Currently supported field paths are object.metadata.uid and object.metadata.namespace.
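Because the hash is deterministic, a client can reproduce it locally, for example to pre-compute which shard a given object lands in. A minimal Go sketch, assuming the server hashes the raw bytes of the selected field with 64-bit FNV-1a (the authoritative canonicalization is defined by the KEP):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inShard reports whether uid's 64-bit FNV-1a hash falls in [start, end),
// mirroring the server-side shardRange() check under the stated assumption.
func inShard(uid string, start, end uint64) bool {
	h := fnv.New64a()
	h.Write([]byte(uid))
	v := h.Sum64()
	return v >= start && v < end
}

func main() {
	uid := "7f6f2c1e-0c1a-4f8e-9b6a-3a2d6c1e9f00" // illustrative UID
	fmt.Println(inShard(uid, 0x0000000000000000, 0x8000000000000000))
}
```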
Using sharded watches in controllers

Controllers typically use informers to list and watch resources. To shard the workload, each replica injects the shardSelector into the ListOptions used by its informers via WithTweakListOptions:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
)

shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"

factory := informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod,
	informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
		opts.ShardSelector = shardSelector
	}),
)
```

For a 2-replica deployment, the selectors split the hash space in half:

```go
// Replica 0: lower half of the hash space
"shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"

// Replica 1: upper half of the hash space
"shardRange(object.metadata.uid, '0x8000000000000000', '0x10000000000000000')"
```

A single replica can also cover non-contiguous ranges using ||:

```go
"shardRange(object.metadata.uid, '0x0000000000000000', '0x4000000000000000') || " +
"shardRange(object.metadata.uid, '0x8000000000000000', '0xc000000000000000')"
```

Verifying server support
When the API server honors a shard selector, the list response includes a shardInfo field in the response metadata that echoes back the applied selector:

```json
{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "10245",
    "shardInfo": {
      "selector": "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"
    }
  },
  "items": [...]
}
```

If shardInfo is absent, the server did not honor the shard selector and the client received the complete, unfiltered collection. In this case, the client should be prepared to handle the full result set, for example by applying client-side filtering to discard objects outside its assigned shard range.

Getting involved
This feature is in alpha and requires enabling the ShardedListAndWatch feature gate on the API server. We are looking for feedback from controller authors and operators running large clusters.

If you have questions or feedback, join the #sig-api-machinery channel on Kubernetes Slack.
-
Kubernetes v1.36: Declarative Validation Graduates to GA
In Kubernetes v1.36, Declarative Validation for Kubernetes native types has reached General Availability (GA).
For users, this means more reliable, predictable, and better-documented APIs. By moving to a declarative model, the project also unlocks the future ability to publish validation rules via OpenAPI and integrate with ecosystem tools like Kubebuilder. For contributors and ecosystem developers, this replaces thousands of lines of handwritten validation code with a unified, maintainable framework.
This post covers why this migration was necessary, how the declarative validation framework works, and what new capabilities come with this GA release.
The Motivation: Escaping the "Handwritten" Technical Debt
For years, the validation of Kubernetes native APIs relied almost entirely on handwritten Go code. If a field needed to be bounded by a minimum value, or if two fields needed to be mutually exclusive, developers had to write explicit Go functions to enforce those constraints.
As the Kubernetes API surface expanded, this approach led to several systemic issues:
- Technical Debt: The project accumulated roughly 18,000 lines of boilerplate validation code. This code was difficult to maintain, error-prone, and required intense scrutiny during code reviews.
- Inconsistency: Without a centralized framework, validation rules were sometimes applied inconsistently across different resources.
- Opaque APIs: Handwritten validation logic was difficult to discover or analyze programmatically. This meant clients and tooling couldn't predictably know validation rules without consulting the source code or encountering errors at runtime.
The solution proposed by SIG API Machinery was Declarative Validation: using Interface Definition Language (IDL) tags (specifically +k8s: marker tags) directly within types.go files to define validation rules.

Enter validation-gen

At the core of the declarative validation feature is a new code generator called validation-gen. Just as Kubernetes uses generators for deep copies, conversions, and defaulting, validation-gen parses +k8s: tags and automatically generates the corresponding Go validation functions.

These generated functions are then registered seamlessly with the API scheme. The generator is designed as an extensible framework, allowing developers to plug in new "Validators" by describing the tags they parse and the Go logic they should produce.
A Comprehensive Suite of +k8s: Tags
The declarative validation framework introduces a comprehensive suite of marker tags that provide rich validation capabilities highly optimized for Go types. For a full list of supported tags, check out the official documentation. Here is a catalog of some of the most common tags you will now see in the Kubernetes codebase:
- Presence: +k8s:optional, +k8s:required
- Basic Constraints: +k8s:minimum=0, +k8s:maximum=100, +k8s:maxLength=16, +k8s:format=k8s-short-name
- Collections: +k8s:listType=map, +k8s:listMapKey=type
- Unions: +k8s:unionMember, +k8s:unionDiscriminator
- Immutability: +k8s:immutable, +k8s:update=[NoSet, NoModify, NoClear]
Example Usage:
```go
type ReplicationControllerSpec struct {
	// +k8s:optional
	// +k8s:minimum=0
	Replicas *int32 `json:"replicas,omitempty"`
}
```

By placing these tags directly above the field definitions, the constraints are self-documenting and immediately visible to anyone reading the type definitions.
Advanced Capabilities: "Ambient Ratcheting"
One of the most substantial outcomes of this work is that validation ratcheting is now a standard, ambient part of the API. In the past, if we needed to tighten validation, we had to first add handwritten ratcheting code, wait a release, and then tighten the validation to avoid breaking existing objects.
With declarative validation, this safety mechanism is built-in. If a user updates an existing object, the validation framework compares the incoming object with the oldObject. If a specific field's value is semantically equivalent to its prior state (i.e., the user didn't change it), the new validation rule is bypassed. This "ambient ratcheting" means we can loosen or tighten validation immediately and in the least disruptive way possible.

Scaling API Reviews with kube-api-linter

Reaching GA required absolute confidence in the generated code, but our vision extends beyond just validation. Declarative validation is a key part of a comprehensive approach to making API review easier, more consistent, and highly scalable.
By moving validation rules out of opaque Go functions and into structured markers, we are empowering tools like kube-api-linter. This linter can now statically analyze API types and enforce API conventions automatically, significantly reducing the manual burden on SIG API Machinery reviewers and providing immediate feedback to contributors.

What's next?
With the release of Kubernetes v1.36, Declarative Validation graduates to General Availability (GA). As a stable feature, the associated DeclarativeValidation feature gate is now enabled by default. It has become the primary mechanism for adding new validation rules to Kubernetes native types.
Beyond the core migration, declarative validation also unlocks an exciting future for the broader ecosystem. Because validation rules are now defined as structured markers rather than opaque Go code, they can be parsed and reflected in the OpenAPI schemas published by the Kubernetes API server. This paves the way for tools like kubectl, client libraries, and IDEs to perform rich client-side validation before a request is ever sent to the cluster. The same declarative framework can also be consumed by ecosystem tools like Kubebuilder, enabling a more consistent developer experience for authors of Custom Resource Definitions (CRDs).

Getting involved
The migration to declarative validation is an ongoing effort. While the framework itself is GA, there is still work to be done migrating older APIs to the new declarative format.
If you are interested in contributing to the core of Kubernetes API Machinery, this is a fantastic place to start. Check out the validation-gen documentation, look for issues tagged with sig/api-machinery, and join the conversation in the #sig-api-machinery and #sig-api-machinery-dev-tools channels on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/). You can also attend the SIG API Machinery meetings to get involved directly.
A huge thank you to everyone who helped bring this feature to GA:
- Tim Hockin
- Joe Betz
- Aaron Prindle
- Lalit Chauhan
- David Eads
- Darshan Murthy
- Jordan Liggitt
- Patrick Ohly
- Maciej Szulik
- Wojciech Tyczynski
- Joel Speed
- Bryce Palmer
And the many others across the Kubernetes community who contributed along the way.
Welcome to the declarative future of Kubernetes validation!
-
Kubernetes v1.36: Admission Policies That Can't Be Deleted
If you've ever tried to enforce a security policy across a fleet of Kubernetes clusters, you've probably run into a frustrating chicken-and-egg problem. Your admission policies are API objects, which means they don't exist until someone creates them, and they can be deleted by anyone with the right permissions. There's always a window during cluster bootstrap where your policies aren't active yet, and there's no way to prevent a privileged user from removing them.
Kubernetes v1.36 introduces an alpha feature that addresses this: manifest-based admission control. It lets you define admission webhooks and CEL-based policies as files on disk, loaded by the API server at startup, before it serves any requests.
The gap we're closing
Most Kubernetes policy enforcement today works through the API. You create a ValidatingAdmissionPolicy or a webhook configuration as an API object, and the admission controller picks it up. This works well in steady state, but it has some fundamental limitations.
During cluster bootstrap, there's a gap between when the API server starts serving requests and when your policies are created and active. If you're restoring from a backup or recovering from an etcd failure, that gap can be significant.
There's also a self-protection problem. Admission webhooks and policies can't intercept operations on their own configuration resources. Kubernetes skips invoking webhooks on types like ValidatingWebhookConfiguration to avoid circular dependencies. That means a sufficiently privileged user can delete your critical admission policies, and there's nothing in the admission chain to stop them.
We - Kubernetes SIG API Machinery - wanted a way to say "these policies are always on, full stop."
How it works
You add a staticManifestsDir field to the AdmissionConfiguration file that you already pass to the API server via --admission-control-config-file. Point it at a directory, drop your policy YAML files in there, and the API server loads them before it starts serving.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: ValidatingAdmissionPolicy
    configuration:
      apiVersion: apiserver.config.k8s.io/v1
      kind: ValidatingAdmissionPolicyConfiguration
      staticManifestsDir: "/etc/kubernetes/admission/validating-policies/"
```

The manifest files are standard Kubernetes resource definitions. The only requirement is that all the objects that these manifests define must have names ending in .static.k8s.io. This reserved suffix prevents collisions with API-based configurations and makes it easy to tell where an admission decision came from when you're looking at metrics or audit logs.

Here's a complete example that denies privileged containers outside kube-system:
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "deny-privileged.static.k8s.io"
  annotations:
    kubernetes.io/description: "Deny launching privileged pods, anywhere this policy is applied"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  variables:
    - name: allContainers
      expression: >-
        object.spec.containers +
        (has(object.spec.initContainers) ? object.spec.initContainers : []) +
        (has(object.spec.ephemeralContainers) ? object.spec.ephemeralContainers : [])
  validations:
    - expression: >-
        !variables.allContainers.exists(c,
          has(c.securityContext) &&
          has(c.securityContext.privileged) &&
          c.securityContext.privileged == true)
      message: "Privileged containers are not allowed"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "deny-privileged-binding.static.k8s.io"
  annotations:
    kubernetes.io/description: "Bind deny-privileged policy to all namespaces except kube-system"
spec:
  policyName: "deny-privileged.static.k8s.io"
  validationActions:
    - Deny
  matchResources:
    namespaceSelector:
      matchExpressions:
        - key: "kubernetes.io/metadata.name"
          operator: NotIn
          values: ["kube-system"]
```

Protecting what couldn't be protected before
The part we're most excited about is the ability to intercept operations on admission configuration resources themselves.
With API-based admission, webhooks and policies are never invoked on types like ValidatingAdmissionPolicy or ValidatingWebhookConfiguration. That restriction exists for good reason: if a webhook could reject changes to its own configuration, you could end up locked out with no way to fix it through the API.
Manifest-based policies don't have that problem. If a bad policy is blocking something it shouldn't, you fix the file on disk and the API server picks up the change. There's no circular dependency because the recovery path doesn't go through the API.
This means you can write a manifest-based policy that prevents deletion of your critical API-based admission policies. For platform teams managing shared clusters, this is a significant improvement. You can now guarantee that your baseline security policies can't be removed by a cluster admin, accidentally or otherwise.
Here's what that looks like in practice. This policy prevents any modification or deletion of admission resources that carry the platform.example.com/protected: "true" label:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "protect-policies.static.k8s.io"
  annotations:
    kubernetes.io/description: "Prevent modification or deletion of protected admission resources"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["admissionregistration.k8s.io"]
        apiVersions: ["*"]
        operations: ["DELETE", "UPDATE"]
        resources:
          - "validatingadmissionpolicies"
          - "validatingadmissionpolicybindings"
          - "validatingwebhookconfigurations"
          - "mutatingwebhookconfigurations"
  validations:
    - expression: >-
        !has(oldObject.metadata.labels) ||
        !('platform.example.com/protected' in oldObject.metadata.labels) ||
        oldObject.metadata.labels['platform.example.com/protected'] != 'true'
      message: "Protected admission resources cannot be modified or deleted"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "protect-policies-binding.static.k8s.io"
  annotations:
    kubernetes.io/description: "Bind protect-policies policy to all admission resources"
spec:
  policyName: "protect-policies.static.k8s.io"
  validationActions:
    - Deny
```

With this in place, any API-based admission policy or webhook configuration labeled platform.example.com/protected: "true" is shielded from tampering. The protection itself lives on disk and can't be removed through the API.

A few things to know
Manifest-based configurations are intentionally self-contained. They can't reference API resources, which means no paramKind for policies, no Service references for admission webhooks (instead they are URL-only), and bindings may only reference policies in the same manifest set. These restrictions exist because the configurations need to work without any cluster state, including at startup before etcd is available.
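For illustration, a URL-only webhook manifest that satisfies these constraints could look roughly like the sketch below; the endpoint URL is hypothetical and, like every static object, the configuration's name must end in .static.k8s.io.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: "image-policy.static.k8s.io"
webhooks:
  - name: "image-policy.example.com"
    clientConfig:
      # URL-only: Service references are not available for static manifests.
      url: "https://admission.internal.example.com/validate"
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
```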
Files are watched for changes at runtime, so you don't need to restart the API server to update policies. If you update a manifest file, the API server validates the new configuration and swaps it in atomically. If validation fails, it keeps the previous good configuration and logs the error. This means you can roll out policy changes across your fleet using standard configuration management tools (Ansible, Puppet, or even mounted ConfigMaps) without any API server downtime.
The initial load at startup is stricter: if any manifest is invalid, the API server won't start. This is intentional. At startup, failing fast is safer than running without your expected policies.
Try it out
To try this in Kubernetes v1.36:
- Enable the ManifestBasedAdmissionControlConfig feature gate for each kube-apiserver.
- Create a directory with your static manifest files. If you need to mount that into the Pod where the API server runs, do that too. Read-only is fine.
- Configure staticManifestsDir in your AdmissionConfiguration with the directory path.
- Start the API server with --admission-control-config-file pointing to your AdmissionConfiguration file.
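Putting those steps together, the relevant kube-apiserver flags might look like the following sketch (the file paths are illustrative):

```
kube-apiserver \
  --feature-gates=ManifestBasedAdmissionControlConfig=true \
  --admission-control-config-file=/etc/kubernetes/admission/admission-config.yaml \
  ...
```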
The full documentation is at Manifest-Based Admission Control, and you can follow KEP-5793 for ongoing progress.
We'd love to hear your feedback. Reach out on the #sig-api-machinery channel on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/).
How to get involved
If you're interested in contributing to this feature or other SIG API Machinery projects, join us on #sig-api-machinery on Kubernetes Slack. You're also welcome to attend the SIG API Machinery meetings, held every other Wednesday.