Cluster Toolkit 1.92.0: TPU VM Diagnostics and GKE Node Auto-Provisioning

Google flipped a toggle you probably relied on: as of June 16, 2026, the per-project/region feature toggle for Gemini 3.5 Flash is gone in the Global, US, and EU multi-regions. That means Flash is now an always-on capability for those regions, with no more quick opt-outs from the console. If your data plane, runtime contracts, or cost models assumed Flash could be disabled at the project level, you need to treat Gemini 3.5 Flash as an operational baseline, not an opt-in experiment.

Two operational updates — Cluster Toolkit 1.92.0 and expanded Gemini Cloud Assist features inside BigQuery — actually change how you operate ML pipelines and schedule data jobs. There's also a small but meaningful delay to the Managed Service for Apache Spark rollout that affects upgrade windows.

Why the toggle removal matters

Removing the per-project toggle is the right call from a product simplicity and safety perspective: feature flags that live in customer consoles create brittle operational models and endless support sleuthing. The downside is practical: teams that relied on flipping Flash to reproduce old behavior or to gate outputs for compliance must now pin models or endpoints explicitly. If you need deterministic model semantics for tests or audits, pin your Vertex endpoints or lock a specific model revision; don't depend on a UI toggle that no longer exists. For a deeper take, see our earlier note, "Vertex AI: Gemini 3.5 Flash per-project toggle removed in Global, US, EU" (listed under Sources).
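One way to enforce pinning is a small check in CI that flags any pipeline config still pointing at a floating model alias. The sketch below is only illustrative: it assumes pinned model IDs end in a three-digit revision suffix such as "-001", and the model IDs and config keys shown are hypothetical, not confirmed published names.

```python
import re

# Assumption: a pinned model ID ends in a numeric revision suffix like "-001".
# Adjust the pattern to whatever revision scheme your model IDs really use.
PINNED_SUFFIX = re.compile(r"-\d{3}$")


def is_pinned(model_id: str) -> bool:
    """Return True when the model ID carries an explicit revision suffix."""
    return bool(PINNED_SUFFIX.search(model_id))


def unpinned_models(config: dict[str, str]) -> list[str]:
    """Return the config keys whose model ID is a floating alias."""
    return sorted(name for name, model_id in config.items() if not is_pinned(model_id))


if __name__ == "__main__":
    # Hypothetical pipeline config: component name -> model ID.
    pipeline_models = {
        "audit_summarizer": "gemini-3.5-flash-001",  # hypothetical pinned revision
        "triage_bot": "gemini-3.5-flash",            # floating alias, should be flagged
    }
    for name in unpinned_models(pipeline_models):
        print(f"{name}: model is not pinned to a revision")
```

Running a check like this as a pre-merge gate makes pinning an explicit contract rather than something a runbook has to remember.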

Cluster Toolkit 1.92.0: operational knobs you actually use

Cluster Toolkit 1.92.0 landed with two operational features that will change day-to-day work for platform teams running ML on GKE:

ML diagnostics support for Cloud TPU VMs and TPU-related failure modes. Diagnostics now surface TPU-specific traces and performance counters and integrate with Cloud Monitoring and Cloud Logging, so when a distributed training job stalls you get TPU-aware traces and recommendations. That’s not just a telemetry add; it can materially lower mean time to resolution for TPU-related job failures.
Node auto-provisioning for GKE clusters. GKE's Node Auto-Provisioning (NAP) integration can now create node pools tailored to accelerator-backed ML workloads — for example GPU-backed node pools and TPU VM node types where those are supported — driven by the cluster autoscaler. Practically, that reduces manual capacity planning for bursty training workloads and shortens the time you wait for infra tickets to be addressed.

Both features are overdue. We've been relying on kludgy autoscaler configs and bespoke runbooks for TPU jobs for too long; these features give teams a standard place to surface TPU issues and let the control plane manage ephemeral capacity. Caveat: don't mistake auto-provisioning for capacity governance. You still need quota guardrails, Pod priority and preemption settings, and per-namespace resource quotas to avoid noisy neighbors.
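As a rough illustration of the quota guardrail mentioned above, the sketch below builds a Kubernetes ResourceQuota manifest that caps accelerator requests in one namespace. The namespace, quota name, and limits are placeholders. It also assumes your nodes expose GPUs as `nvidia.com/gpu` and TPU chips as `google.com/tpu`, so verify the resource names against your own cluster.

```python
import json


def accelerator_quota(namespace: str, gpus: int, tpu_chips: int) -> dict:
    """Build a ResourceQuota manifest capping accelerator requests in a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # "accelerator-cap" is a placeholder name chosen for this example.
        "metadata": {"name": "accelerator-cap", "namespace": namespace},
        "spec": {
            "hard": {
                # Extended resources are limited via the "requests." prefix.
                "requests.nvidia.com/gpu": str(gpus),
                "requests.google.com/tpu": str(tpu_chips),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and limits; size these from your real quota.
    manifest = accelerator_quota("ml-training", gpus=8, tpu_chips=16)
    print(json.dumps(manifest, indent=2))
```

The JSON output can be saved to a file and applied with `kubectl apply -f`. With a cap like this in place, auto-provisioning can still add node pools, but a single namespace cannot request more accelerators than its quota allows.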

BigQuery + Gemini-powered Cloud Assist: lineage and scheduled runs

BigQuery’s Gemini-powered Cloud Assist expanded in preview with two practical features: model-assisted data lineage inference and AI-assisted scheduled queries. The lineage suggestions help you build a view of how data flows through SQL queries and transformations — useful for impact analysis and auditing — while the scheduling assist can propose and create recurring scheduled queries in the BigQuery UI or via the API instead of you wiring an external scheduler.

These are welcome: lineage is the part of data governance that actually changes developer behavior (you only refactor a query if you can see downstream consumers), and scheduling inside BigQuery reduces glue code. But expect false positives in lineage suggestions during preview — treat the results as a draft that accelerates investigation, not an authoritative source of truth.
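One way to calibrate the preview is to compare suggested lineage edges against a small set you have verified by hand. The sketch below assumes you have already exported both sets as (source, destination) table pairs. The export step is not shown, and the table names are made up for illustration.

```python
Edge = tuple[str, str]  # (upstream table, downstream table)


def triage_lineage(suggested: set[Edge], confirmed: set[Edge]) -> dict[str, list[Edge]]:
    """Split suggested lineage edges into agreed, unverified, and missed groups."""
    return {
        "agreed": sorted(suggested & confirmed),
        "needs_review": sorted(suggested - confirmed),  # possible false positives
        "missed": sorted(confirmed - suggested),        # known edges not suggested
    }


if __name__ == "__main__":
    # Hypothetical edges for a non-critical dataset.
    suggested = {("raw.orders", "stg.orders"), ("raw.users", "stg.orders")}
    confirmed = {("raw.orders", "stg.orders"), ("stg.orders", "mart.revenue")}
    for bucket, edges in triage_lineage(suggested, confirmed).items():
        print(bucket, edges)
```

Tracking the size of the "needs_review" and "missed" buckets over a few datasets gives a rough sense of how much to trust the suggestions before relying on them for impact analysis.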

Managed Spark rollout timeline tweak

Finally: Managed Service for Apache Spark revised its rollout plan for a sub-minor version and pushed the start of automated rollouts back by one week (from June 15 to June 22, 2026). It's a small shift, but it affects teams that had validation and CI windows aligned to the previous date. If you keep strict runtime baselines for reproducible ETL, move your validation gates and bake that week into your release calendar.
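If your validation gates are tracked as dates, shifting them is simple date arithmetic. The sketch below assumes every gate moves by the same offset as the rollout start. The gate names and their original dates are hypothetical.

```python
from datetime import date

OLD_ROLLOUT_START = date(2026, 6, 15)
NEW_ROLLOUT_START = date(2026, 6, 22)


def shift_gates(gates: dict[str, date]) -> dict[str, date]:
    """Move every gate by the same offset as the rollout start date."""
    delta = NEW_ROLLOUT_START - OLD_ROLLOUT_START
    return {name: day + delta for name, day in gates.items()}


if __name__ == "__main__":
    # Hypothetical gates that were aligned to the original June 15 start.
    gates = {
        "staging_validation": date(2026, 6, 10),
        "ci_baseline_freeze": date(2026, 6, 12),
    }
    for name, day in shift_gates(gates).items():
        print(f"{name}: {day.isoformat()}")
```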

What to do now

If you depend on turning Gemini 3.5 Flash off for tests or compliance: pin models/endpoints and update runbooks. Treat Flash as a baseline capability.
Upgrade or evaluate Cluster Toolkit 1.92.0 in a staging cluster to exercise TPU VM diagnostics and GKE node auto-provisioning with your workloads before enabling in production.
Try Gemini Cloud Assist lineage on non-critical datasets to calibrate noise/accuracy, and use the scheduling preview to replace ad-hoc cron glue where appropriate.

Final thought: these are incremental releases, but they point to a pattern. Google is consolidating AI capabilities into platform primitives (diagnostics, auto-provisioning, model baselines) rather than leaving them as experimental toggles. That’s good for predictable operations — but it forces platform teams to be explicit about pinning and governance. If you treat cloud AI features as feature flags instead of platform contracts, you'll be surprised. Treat them like infrastructure now.

Sources

GKE 1.36 now default for Rapid-channel new clusters

GCP Cloud Billing: pre‑June 16, 2026 accounts moved to billing-account-level CUD sharing

Vertex AI: Gemini 3.5 Flash per-project toggle removed in Global, US, EU