GCP

GKE 1.36: Rapid channel defaults and managed dataplane networking GA

GKE's Rapid channel now defaults to Kubernetes 1.36, and Google has made a managed dataplane option generally available for recent 1.35+ clusters: a nudge toward GKE as the control plane for AI.

June 23, 2026 · 3 min read · AI researched · AI written · AI reviewed

Google quietly flipped a foundational dial: GKE’s Rapid channel now defaults to Kubernetes 1.36, and Google announced general availability (GA) of a managed dataplane networking option for recent 1.35+ clusters. The combination is not cosmetic. It is a nudge, backed by product defaults, toward running large, AI-heavy clusters on a managed networking substrate and on the reference architectures Google demonstrated at Next.

Why this matters right now

If you operate a GKE-based platform, defaults matter. They drive new cluster creation, auto-upgrades, and what developers and CI pipelines expect when they request a cluster. With Rapid on 1.36 and Regular/Extended in the 1.35.x family, Google has effectively made 1.35/1.36 the new operational baseline. At the same time, the managed dataplane GA means Google now offers to own data-plane behaviors that teams historically tuned themselves (overlays, flow programming, and scale characteristics) through an opinionated, managed option.

Two practical consequences follow. First, your next auto-upgrade may land you on a release whose default behavior assumes managed dataplane patterns. Second, teams building inference and training platforms should treat the managed dataplane as a first-class concern rather than an optional optimization, because predictable, multi-tenant networking at ML scale is precisely the problem these features are designed to address.

Cluster Toolkit updates: more than diagnostics

The Cluster Toolkit was updated alongside these changes with a small but important set of features: TPU diagnostics for TPU machine types, clearer integration with GKE node auto-provisioning, and optional infrastructure blueprints for inference gateways and for compact Slurm placements via dynamic workload scheduling. These are scaffolding for Google's push to treat GKE as the control plane for very large AI clusters, the same narrative shown in the large-scale demos at Next.

If you care about inference latency and packing for GPU/TPU fleets, the inference gateway blueprints are the place to start. If you run HPC on GKE (for example, Slurm-based setups), the toolkit's compact placement and autoscaling primitives aim to reduce idle accelerator time and improve bin-packing.
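To make "bin-packing" concrete, the sketch below places per-pod accelerator requests onto identical nodes and counts the accelerators left stranded. It uses the textbook first-fit decreasing heuristic as a stand-in. It is not the Cluster Toolkit's or GKE's scheduler, and the node size and request values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One accelerator node; capacity is accelerators per node (hypothetical)."""
    capacity: int
    pods: list[int] = field(default_factory=list)  # accelerator requests placed here

    @property
    def free(self) -> int:
        return self.capacity - sum(self.pods)


def pack(requests: list[int], node_capacity: int, decreasing: bool = True) -> list[Node]:
    """First-fit placement of accelerator requests onto identical nodes.

    With decreasing=True this is first-fit decreasing (largest requests first);
    with decreasing=False requests are placed in arrival order.
    """
    order = sorted(requests, reverse=True) if decreasing else list(requests)
    nodes: list[Node] = []
    for req in order:
        if req > node_capacity:
            raise ValueError(f"request {req} exceeds node capacity {node_capacity}")
        for node in nodes:
            if node.free >= req:  # first node with enough room wins
                node.pods.append(req)
                break
        else:  # no existing node fits: provision a new one
            nodes.append(Node(capacity=node_capacity, pods=[req]))
    return nodes


def idle_accelerators(nodes: list[Node]) -> int:
    """Accelerators provisioned on these nodes but not requested by any pod."""
    return sum(node.free for node in nodes)


if __name__ == "__main__":
    requests = [4, 2, 8, 1, 3, 2, 4]  # hypothetical per-pod GPU/TPU-chip requests
    for decreasing in (False, True):
        nodes = pack(requests, node_capacity=8, decreasing=decreasing)
        print(f"decreasing={decreasing}: {len(nodes)} nodes, "
              f"{idle_accelerators(nodes)} idle accelerators")
```

With these sample requests, arrival-order placement needs four 8-accelerator nodes and strands 8 accelerators. First-fit decreasing fits the same 24 accelerators of demand onto three fully used nodes.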

The broader signal: GKE + Gemini + Vertex

Next and the release notes are consistent: Google is promoting a reference architecture that pairs GKE (Autopilot or Standard) with Gemini/Vertex AI services, managed service mesh/Traffic Director, inference gateways, and intent-based autoscaling driven by custom metrics or declared intent rather than CPU and memory alone. Gemini integrations and the accompanying agent tooling are positioned as an orchestration front door, with GKE as the control plane underneath. That's a reasonable architecture, and a deliberate one: managed networking, a managed control plane, and model orchestration tooling make operationalizing large generative systems easier, until they don't.
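Autoscaling on a custom signal instead of CPU and memory still reduces to simple arithmetic. The Kubernetes HorizontalPodAutoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and it skips a change when the ratio falls within a tolerance (0.1 by default). The sketch below reimplements only that rule. The "in-flight requests per replica" metric and its target are hypothetical, and a real HPA layers stabilization windows and scaling policies on top.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
    tolerance: float = 0.1,
) -> int:
    """Core HorizontalPodAutoscaler rule for an average-value metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    unless the ratio is within `tolerance` of 1.0; result is clamped to bounds.
    Ignores stabilization windows, scaling policies, and unready pods.
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical custom metric: average in-flight inference requests per
    # replica, with a target of 20 per replica.
    print(desired_replicas(4, current_metric=30, target_metric=20))  # scale out
    print(desired_replicas(4, current_metric=21, target_metric=20))  # within tolerance
    print(desired_replicas(4, current_metric=5, target_metric=20))   # scale in
```

The same function works for any per-replica signal, such as queue depth or tokens in flight. What changes with "intent" is which metric you expose and what target you set.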

The tradeoffs and what to watch for

This is the right call from an operational standpoint: for very large, multi-tenant inference fleets, you want the vendor to own network determinism problems. But it also concentrates new failure modes and integration work in platform teams. Expect these painful effects if you ignore the change:

  • Silent performance cliffs when clusters auto-upgrade onto managed dataplane behaviors. If your workflows depend on in-cluster east-west latencies or custom iptables rules, validate them on 1.35/1.36 with the managed dataplane enabled.
  • A shift away from CPU/memory autoscaling defaults. Your internal developer platform (IDP) and autoscaler policies must support intent signals or custom metrics for model load and tail latency.
  • Additional trust and attack surface where model agents and inference gateways interact with your cluster. Agent orchestration is useful, but it is also a new trust boundary.
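For the first item, a CI gate can compare tail latency measured on a candidate 1.35/1.36 cluster with the managed dataplane enabled against a baseline from your current clusters. The sketch assumes you already collect pod-to-pod round-trip samples in milliseconds by some means (the probe itself is not shown). The nearest-rank p99 and the 10% allowance are arbitrary illustrative choices, not Google guidance.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty sample list."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def latency_regressed(
    baseline_ms: list[float],
    candidate_ms: list[float],
    pct: float = 99.0,
    max_increase: float = 0.10,
) -> tuple[bool, float, float]:
    """Compare tail latency of a candidate cluster against a baseline.

    Returns (regressed, baseline_pXX, candidate_pXX). A regression is a
    candidate percentile more than `max_increase` (fractional) above baseline.
    """
    base = percentile(baseline_ms, pct)
    cand = percentile(candidate_ms, pct)
    return cand > base * (1 + max_increase), base, cand


if __name__ == "__main__":
    # Hypothetical east-west round-trip samples (ms) from a pod-to-pod probe.
    baseline = [float(x) for x in range(1, 101)]
    candidate = [x * 1.25 for x in baseline]  # 25% slower across the board
    regressed, base, cand = latency_regressed(baseline, candidate)
    print(f"p99 baseline={base:.1f}ms candidate={cand:.2f}ms regressed={regressed}")
    # In CI, exit non-zero when `regressed` is True to block the rollout.
```

Gating on a tail percentile rather than the mean is deliberate: a dataplane change that adds jitter can leave average latency flat while the p99 moves sharply.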

If you manage an internal developer platform, these are actionable priorities: bake version gating into cluster creation templates; test autoscaling policies against the new inference gateway blueprints; and add network performance testing to your CI. If you want a concise walkthrough of the toolkit changes, see our explainer on how TPU diagnostics and node auto-provisioning fit into a GKE-first AI stack.
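Version gating can be as small as a policy check that runs before a cluster template is applied. The sketch below resolves the Kubernetes minor version a request would get (an explicit pin, otherwise the channel default this article reports) and rejects anything outside a platform-approved set. The template keys, the allowed set, and the hard-coded channel table are assumptions for illustration. A real gate would refresh channel defaults from the API, for example via `gcloud container get-server-config`, rather than hard-coding them.

```python
from __future__ import annotations

import re

# Channel defaults as described in this article (June 2026). Assumption: these
# drift over time, so do not hard-code them in a real gate.
CHANNEL_DEFAULT_MINOR = {
    "RAPID": (1, 36),
    "REGULAR": (1, 35),
    "EXTENDED": (1, 35),
}

# Accepts "1.35", "1.35.2", or GKE-style "1.35.2-gke.1000".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.\d+)?(?:-gke\.\d+)?$")


def parse_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string, or raise ValueError."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def gate_cluster_request(template: dict, allowed_minors: set[tuple[int, int]]) -> list[str]:
    """Return policy violations for a cluster-creation request (empty = allowed).

    `template` is a hypothetical IDP template with optional keys
    "release_channel" and "cluster_version".
    """
    channel = template.get("release_channel")
    version = template.get("cluster_version")
    if version:
        try:
            minor = parse_minor(version)
        except ValueError as exc:
            return [str(exc)]
    elif channel in CHANNEL_DEFAULT_MINOR:
        minor = CHANNEL_DEFAULT_MINOR[channel]  # cluster would get the channel default
    else:
        return [f"no known default for channel {channel!r}; pin cluster_version"]
    if minor not in allowed_minors:
        allowed = ", ".join(f"{a}.{b}" for a, b in sorted(allowed_minors))
        return [f"Kubernetes {minor[0]}.{minor[1]} is not in the allowed baseline ({allowed})"]
    return []


if __name__ == "__main__":
    allowed = {(1, 35)}  # hypothetical: 1.36 not yet validated on this platform
    print(gate_cluster_request({"release_channel": "REGULAR"}, allowed))
    print(gate_cluster_request({"release_channel": "RAPID"}, allowed))
    print(gate_cluster_request({"release_channel": "STABLE"}, allowed))
```

Returning a list of violations rather than raising lets the same check feed a CI annotation, a pull-request comment, or an admission-style rejection message.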

Final thought

Google isn't just bumping Kubernetes versions; it's making a platform bet. Defaults, a managed dataplane, and reference blueprints push the ecosystem toward GKE as the control plane for large AI workloads paired with Gemini and Vertex. That's overdue and sensible for teams at scale, but it will bite teams that skip network and autoscaling testing. If you run AI on GKE, treat 1.35/1.36 and the managed dataplane as operational primitives, not optional features. The next platform debate won't be whether to use GKE for AI. It will be whether your IDP treated the network and intent signals as first-class citizens.
