Kubernetes 1.35: In-place Pod Resizing, Native Workload Identity, and Runtime/Networking Roadmap

Kubernetes 1.35 bundles several operational changes that platform operators need to plan for. The release notes promote in-place Pod resource updates, which makes vertical resizing without eviction an explicit operational primitive. A first-party workload identity path and automated certificate rotation are emphasized. The release discussion also signals a migration away from older networking and runtime codepaths, notably IPVS-mode kube-proxy and older containerd 1.x compatibility. These items affect upgrade testing, autoscaler behavior, and node image requirements. This article explains the practical semantics, realistic tests to add to CI, and rollout guidance.

What in-place Pod resource updates mean in practice

The core change promoted in 1.35 is that controllers and operators can PATCH a Pod's container resource requests/limits and the kubelet coordinates an in-place cgroup update rather than requiring Pod replacement or eviction, provided the kubelet and node runtime support the adjustment.

Operational semantics and constraints:

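Resizes go through the Pod's resize subresource: since Kubernetes 1.33, container resources are changed with a PATCH against the resize subresource (kubectl patch --subresource resize, which needs kubectl v1.32 or later) rather than a plain PATCH of the Pod. The kube-apiserver validates the request and the kubelet applies it in place when the node can accommodate it, instead of rejecting the change or forcing eviction.
Immutable fields still behave the same: changes to scheduling-critical fields (nodeName, hostNetwork, etc.) will continue to require replacement.
The kubelet and the container runtime (through the CRI) must support applying resource changes without restarting the container. Each container's resizePolicy declares, per resource, whether a change can be applied live (NotRequired) or needs a container restart (RestartContainer). Behavior can vary between cgroup v1 and v2; validate against your node image and runtime.
Controllers and operators that assumed Pod replacement as a lifecycle event (for hooks, sidecars, or restart-based initialization) must be adapted to react to in-place updates when needed.

Example: patch an existing Pod to increase CPU and memory resources for the container named "app". The default strategic merge patch is used on purpose: it matches containers by name, whereas a JSON merge patch (--type=merge) would replace the whole containers list.

kubectl patch pod web-0 --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"500m","memory":"512Mi"},"limits":{"cpu":"1","memory":"1Gi"}}}]}}'

# Verify the Pod spec changed, then compare it with the resources the kubelet actually applied:
kubectl get pod web-0 -o yaml
kubectl get pod web-0 -o jsonpath='{.status.containerStatuses[?(@.name=="app")].resources}'
kubectl describe pod web-0
# Check kubelet logs on the node and the container runtime for cgroup update entries
journalctl -u kubelet -n 200 --no-pager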

CI/test additions to validate in-place resizing:

Workload-integrity tests asserting no eviction on resource PATCH and that application health checks continue to pass after cgroup changes.
VPA interactions: if you use Vertical Pod Autoscaler, validate the mode you run (eviction vs. in-place recommendation application) and test recommendations being applied in-place where supported.
Runtime/cgroup compatibility: run tests on your supported node images under both cgroup v1 and v2 (if applicable) and observe differences in memory/cpu behavior and OOM handling.
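
The first check in the list above (no eviction, no restart, resources actually applied) can be sketched as a pure function over two Pod snapshots. This is a minimal sketch, not a definitive test: it assumes the snapshots are the parsed output of `kubectl get pod <name> -o json` taken before and after the resize, and the function names are hypothetical. It relies on the `metadata.uid`, `status.containerStatuses[].restartCount`, and `status.containerStatuses[].resources` fields and on the `PodResizePending` / `PodResizeInProgress` Pod conditions.

```python
from typing import Any, Dict, List


def _container_status(pod: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return the status entry for the named container, or raise KeyError."""
    for status in pod.get("status", {}).get("containerStatuses", []):
        if status.get("name") == name:
            return status
    raise KeyError(f"no containerStatus for {name!r}")


def resize_problems(before: Dict[str, Any], after: Dict[str, Any],
                    container: str) -> List[str]:
    """Compare two Pod snapshots and list reasons the resize was not clean.

    Assumption: a restart counts as a problem. If the container's
    resizePolicy asks for RestartContainer, drop that check.
    """
    problems: List[str] = []
    if before["metadata"]["uid"] != after["metadata"]["uid"]:
        problems.append("pod was replaced (UID changed)")

    old = _container_status(before, container)
    new = _container_status(after, container)
    if new.get("restartCount", 0) != old.get("restartCount", 0):
        problems.append("container restarted")

    # Plain equality on the serialized quantities: equal values that are
    # formatted differently (e.g. "1" vs "1000m") would be flagged here.
    spec = next(c for c in after["spec"]["containers"] if c["name"] == container)
    if new.get("resources") != spec.get("resources"):
        problems.append("status resources do not match spec yet")

    for cond in after.get("status", {}).get("conditions", []):
        if (cond.get("type") in ("PodResizePending", "PodResizeInProgress")
                and cond.get("status") == "True"):
            problems.append(f"condition {cond['type']} is still True")
    return problems
```

In CI, a test would typically poll until `resize_problems(...)` returns an empty list or a timeout expires, then run the application's health checks.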

Native workload identity and automated certificate rotation

1.35 emphasizes a first-party workload identity path that builds on the TokenRequest API and projected ServiceAccount tokens, and it pushes for more automated rotation of short-lived credentials and component certificates. The intent is to make short-lived, bound tokens the default integration surface instead of relying on long-lived service-account secrets or cloud-provider metadata hacks.

What to adopt and test:

Use the ServiceAccount TokenRequest / projected token pattern to issue audience-bound, short-lived tokens for Pods. Configure your external token-exchange or IAM system to accept those tokens by audience and issuer.
Monitor certificate rotation workflows (kubelet certs, controller-manager certs, CSR controller). Automated rotation changes failure modes: rotation failures typically surface as trust or authentication errors rather than an obvious "expiry" alert.
Ensure your CI/tests exercise token lifetime, token refresh, and token-exchange flows so you catch token audience or issuer mismatches early.
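
To catch audience or issuer mismatches early, a CI job can inspect the claims of a projected token before attempting the exchange. The sketch below is a diagnostic aid only: it decodes the JWT payload without verifying the signature, so it must never be used to make trust decisions. The function names and the example issuer are hypothetical; `aud`, `iss`, `iat`, and `exp` are the standard JWT claims that ServiceAccount tokens carry.

```python
import base64
import json
import time
from typing import Any, Dict, List, Optional


def decode_claims(token: str) -> Dict[str, Any]:
    """Decode the JWT payload WITHOUT verifying the signature (diagnostics only)."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def claim_problems(claims: Dict[str, Any], audience: str, issuer: str,
                   max_ttl_seconds: int,
                   now: Optional[float] = None) -> List[str]:
    """List mismatches between the token's claims and what the exchange expects."""
    now = time.time() if now is None else now
    problems: List[str] = []

    aud = claims.get("aud", [])
    if isinstance(aud, str):  # "aud" may be a single string or a list
        aud = [aud]
    if audience not in aud:
        problems.append(f"audience {audience!r} not in {aud!r}")

    if claims.get("iss") != issuer:
        problems.append(f"issuer is {claims.get('iss')!r}, expected {issuer!r}")

    exp, iat = claims.get("exp"), claims.get("iat")
    if exp is None or exp <= now:
        problems.append("token is expired or has no exp claim")
    elif iat is not None and exp - iat > max_ttl_seconds:
        problems.append(f"token lifetime {exp - iat}s exceeds {max_ttl_seconds}s")
    return problems
```

A test would read the token file from inside a Pod, call `decode_claims`, and fail the build if `claim_problems` returns anything.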

Example ServiceAccount + Pod using a projected token:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ai-worker
  namespace: ml
---
apiVersion: v1
kind: Pod
metadata:
  name: ai-infer
  namespace: ml
spec:
  serviceAccountName: ai-worker
  containers:
  - name: infer
    image: example/ai-infer:stable
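    volumeMounts:
    - name: sa-token
      # Dedicated path: mounting at /var/run/secrets/kubernetes.io/serviceaccount
      # would replace the default API token mount (token, ca.crt, namespace).
      mountPath: /var/run/secrets/tokens
      readOnly: true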
  volumes:
  - name: sa-token
    projected:
      sources:
      - serviceAccountToken:
          path: token
          expirationSeconds: 600
          audience: api://external-oidc

Your external token-exchange or cloud IAM must accept the projected token and exchange it for an external identity. The practical benefit is short-lived, Pod-bound credentials that reduce blast radius and eliminate long-lived static tokens.
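
Because the kubelet rotates projected tokens before they expire (at roughly 80% of their lifetime), the application has to re-read the file rather than cache the token for the life of the process. A minimal sketch of that pattern follows; the class name, the 60-second default, and the injectable clock are illustrative assumptions, not part of any Kubernetes client library.

```python
import time
from typing import Callable, Optional


class FileTokenSource:
    """Serve a token from a file, re-reading it once the cached copy is stale."""

    def __init__(self, path: str, max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._path = path
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so tests can control time
        self._token: Optional[str] = None
        self._read_at = 0.0

    def token(self) -> str:
        """Return the cached token, refreshing from disk when it is too old."""
        now = self._clock()
        if self._token is None or now - self._read_at >= self._max_age:
            with open(self._path, encoding="utf-8") as fh:
                self._token = fh.read().strip()
            self._read_at = now
        return self._token
```

The path to pass in is the volume's mount path joined with the `path: token` entry from the manifest. Keeping `max_age_seconds` well below `expirationSeconds` means a rotated token is picked up long before the old one expires.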

Scheduler, scaling behavior, and the networking/runtime roadmap

1.35 includes scheduler improvements aimed at better placement for bursty and large AI-style workloads, along with continued work on scaling resilience. The release also marks an explicit direction away from some older networking and runtime approaches. Treat those as migration signals rather than immediate breaking changes, and schedule tests accordingly.

Operational takeaways:

Scheduler: validate your PriorityClasses, PodTopologySpread, NodeSelectors, and preemption tuning against representative AI/GPU workload shapes. Improvements help but your policy choices still determine placement outcomes.
Scaling: run combined HPA + VPA + in-place update scenarios. Test for metric drift, oscillation, and eviction thresholds under noisy-neighbor conditions.
kube-proxy IPVS: the release discussion calls out IPVS mode as a path being phased out in favor of other kube-proxy modes (iptables, or nftables where your kernel and distribution support it) or CNIs that replace kube-proxy (for example eBPF-based CNIs such as Cilium). If you run IPVS mode, plan and test a migration to iptables, nftables, or an eBPF-based CNI.
containerd 1.x: the discussion signals a deprecation trajectory for older containerd 1.x compatibility. Treat this as a near-term compatibility risk: ensure node images and OS vendor packages provide a supported container runtime and plan upgrades.
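
For the scaling item above, one simple way to quantify oscillation in a combined HPA + VPA + in-place test is to count how often a sampled series (replica counts, or CPU requests applied by resizes) reverses direction. The sketch below is only a heuristic under assumed thresholds; the function names, the tolerance, and the limit of two reversals are illustrative choices, not values taken from any autoscaler.

```python
from typing import Sequence


def direction_changes(samples: Sequence[float], tolerance: float = 0.0) -> int:
    """Count how often a series switches between rising and falling.

    Steps whose absolute size is within `tolerance` are treated as flat
    and ignored, so small metric jitter does not count as a reversal.
    """
    changes = 0
    last_sign = 0
    for prev, cur in zip(samples, samples[1:]):
        delta = cur - prev
        if abs(delta) <= tolerance:
            continue
        sign = 1 if delta > 0 else -1
        if last_sign and sign != last_sign:
            changes += 1
        last_sign = sign
    return changes


def is_oscillating(samples: Sequence[float], max_changes: int = 2,
                   tolerance: float = 0.0) -> bool:
    """Flag a series that reverses direction more than `max_changes` times."""
    return direction_changes(samples, tolerance) > max_changes
```

A load test could sample `spec.replicas` of the target Deployment at a fixed interval and fail when `is_oscillating` returns True for the window under test.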

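Example: select iptables mode explicitly if you are phasing out IPVS. When kube-proxy is started with --config (the kubeadm default), it reads the mode from its configuration file and ignores the --proxy-mode flag, so change the ConfigMap in that case:

# In the kube-proxy ConfigMap (KubeProxyConfiguration), set:
#   mode: "iptables"
# Only if kube-proxy runs without --config, set the DaemonSet container arg instead:
#   --proxy-mode=iptables
# Then restart the kube-proxy Pods so they pick up the change:
#   kubectl -n kube-system rollout restart daemonset kube-proxy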
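
Before changing anything, it helps to know which mode a cluster is actually configured for. The sketch below reads the top-level `mode` field from KubeProxyConfiguration text, for example the `config.conf` key of the `kube-proxy` ConfigMap that kubeadm creates in `kube-system`. It is a deliberately naive line scan rather than a YAML parser, the function names are hypothetical, and treating an empty mode as iptables assumes Linux nodes.

```python
def proxy_mode(config_text: str) -> str:
    """Return the configured kube-proxy mode from KubeProxyConfiguration text.

    Only an unindented `mode:` line is considered, so nested fields are
    skipped. An empty or missing value is reported as "iptables", which
    is the default on Linux (assumption: no Windows nodes).
    """
    for line in config_text.splitlines():
        if line.startswith("mode:"):
            value = line.split(":", 1)[1].split("#", 1)[0]
            value = value.strip().strip("\"'")
            return value or "iptables"
    return "iptables"


def needs_migration(config_text: str) -> bool:
    """True when the cluster is configured for IPVS mode."""
    return proxy_mode(config_text) == "ipvs"
```

The input can be captured with `kubectl -n kube-system get configmap kube-proxy -o jsonpath='{.data.config\.conf}'` on clusters that follow the kubeadm layout; other installers may store the configuration elsewhere.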

If migrating to an eBPF CNI, run a parallel upgrade path in staging to validate Service discovery, NetworkPolicy semantics, hostPort/NodePort behavior, and performance under churn.

Testing and rollout checklist for platform teams

Schedule these checklist items into your upgrade and validation pipeline before rolling 1.35 into production:

Node images & runtime: confirm host images include a CRI and container runtime that support in-place resource updates. If you remain on older containerd 1.x packages, plan migration and validate vendor upgrade paths.
Load tests: at scale, patch Pod resources and watch for application-visible regressions, kubelet errors, OOMs, and CPU throttling anomalies.
Autoscaler interplay: exercise HPA + VPA + in-place updates. Confirm VPA recommendation application and ensure metrics-based HPA signals remain stable.
Workload identity: update CI to request projected tokens and run token-exchange integration tests. Monitor CSR and token-request controllers and add alerts for rotation failures.
Networking: if you use kube-proxy IPVS, run migration tests to iptables or an eBPF CNI in staging. Validate service health under high churn, and if you rely on nftables, confirm which backend (nft or legacy) the iptables tooling on your nodes uses.
Monitoring & runbooks: add alerts for certificate or token rotation failures and treat rotation failures as actionable incidents. Add runbook steps for in-place update failures (how to surface kubelet/runtime logs, rollback, or restart strategies).
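
The node image and runtime item can be turned into an automated audit. The sketch below flags nodes that report a containerd 1.x runtime, assuming the input is the parsed output of `kubectl get nodes -o json`; it reads the `status.nodeInfo.containerRuntimeVersion` field, which has the form `containerd://1.7.27`. The function name is hypothetical.

```python
from typing import Any, Dict, List


def nodes_on_containerd_1x(node_list: Dict[str, Any]) -> List[str]:
    """Return "name (runtime)" for every node reporting containerd 1.x."""
    flagged: List[str] = []
    for node in node_list.get("items", []):
        runtime = (node.get("status", {})
                       .get("nodeInfo", {})
                       .get("containerRuntimeVersion", ""))
        # Value looks like "<runtime>://<version>", e.g. "containerd://1.7.27".
        scheme, _, version = runtime.partition("://")
        major = version.lstrip("v").split(".", 1)[0]
        if scheme == "containerd" and major == "1":
            flagged.append(f"{node['metadata']['name']} ({runtime})")
    return flagged
```

Running this in the upgrade pipeline and failing on a non-empty result gives an early signal for the containerd remediation work, independent of the in-place resize tests.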

Practical timeline guidance

Short term (0–3 months): enable and test in-place resource updates in staging. Update CI to exercise PATCH-based resizing. Audit node runtime versions and begin remediation if you are on older containerd 1.x builds.
Medium term (3–9 months): migrate networking away from IPVS-based kube-proxy where feasible. For clusters needing high performance and lower kube-proxy reliance, evaluate eBPF CNIs like Cilium.
Security & identity: move away from long-lived service-account tokens or cloud-metadata workarounds toward projected, audience-bound tokens and token-exchange flows. Update secret management and sidecar patterns accordingly.
Operational maturity: add targeted tests for certificate and token rotation; ensure runbooks treat rotation failures as priority incidents rather than simple expiry warnings.

Kubernetes 1.35 is less about a single flashy feature and more about consolidating operational primitives: a supported path for in-place vertical changes, a first-party workload identity surface with automated rotation, and an explicitly signaled roadmap away from older networking/runtime codepaths. Platform teams should bake these behaviors into CI, upgrade plans, and runbooks rather than treating them as optional toggles.