Vertex AI: Gemini 3.5 Flash per-project toggle removed in Global, US, EU

Google just removed a control a lot of platform teams quietly relied on. On June 16, 2026 the per-project opt-in that gated Gemini 3.5 Flash in Vertex AI was removed for Global, US, and EU multi-regions — the model and its configuration are now standardized across those regions, no more per-project toggle.

Don't read that as a minor UX tweak. For teams that used the toggle as a safety valve — slow rollouts, per-project experiments, staged security reviews — it was the simplest way to say “not here yet.” With the toggle gone, your options are: pin endpoints to a specific model version, use traffic splits, or accept whatever default configuration Vertex exposes. All three are workable, but they shift operational burden back onto platform teams.

This is the wrong call by Google. Standardizing availability reduces support and product complexity, but it offloads risk to operators who run production inference at scale. For environments running agentic workflows, real-time assistants, or tightly budgeted inference pipelines, removing the per-project opt-in increases the blast radius for model changes and pricing drift. If you haven't revisited your endpoint pinning and traffic-splitting strategy in Vertex AI, do it now — assume you need stronger CI and rollback paths for model rollout safety.

If you want practical follow-up reading, I covered the immediate endpoint and pinning implications already: Vertex AI: Gemini 3.5 Flash per-project toggle removed — pin models and endpoints.

A couple of companion platform changes this week reinforce the same theme: less surprise, more operational work.

Managed Service for Apache Spark: rollout timeline moved

Google shifted the rollout window for new sub-minor releases of Managed Service for Apache Spark by roughly one week. These are incremental updates rather than major-channel changes, but the shorter buffer matters if your CI matrix pins exact micro-versions — update your upgrade calendar and be ready for a compressed testing window.

Cluster Toolkit 1.92.0 nudges reference architectures

A recent Cluster Toolkit release updates recommended automation patterns for HPC, tightly coupled compute, and data analytics clusters on GCP. Expect defaults and examples that favor reproducible infra (IaC snippets, node-local storage patterns, and NUMA-aware layouts), plus refreshed guidance for preemptible and accelerator-backed nodes. Teams that rely on toolkit-driven bootstrap scripts should reconcile those scripts and cluster lifecycle tooling with the new defaults.

Small releases, big ops consequences

The updates themselves are incremental, but they collectively shift the locus of control. Google is consolidating feature surfaces (fewer toggles, more global defaults) while nudging tooling to bake in reproducibility. For platform engineers that means more upfront configuration work: endpoint pinning, stricter rollout pipelines, and reconciled IaC tied to specific toolkit versions.

Final take: this is GCP choosing product simplicity over operator ergonomics. That's defensible — standardization reduces cross-project support load — but it forces platform teams to be more opinionated. If your org treats model availability as a policy decision, the new default is inconvenient. Expect an uptick in teams using traffic-splitting, blue/green inference endpoints, and internal model proxies to regain the control they just lost. If you aren't planning for that shift today, you'll be firefighting an availability or cost surprise tomorrow.

Vertex AI: Gemini 3.5 Flash per-project toggle removed in Global, US, EU

Sources

GKE 1.36 now default for Rapid-channel new clusters

Cluster Toolkit 1.92.0: TPU VM Diagnostics and GKE Node Auto-Provisioning

GCP Cloud Billing: pre‑June 16, 2026 accounts moved to billing-account-level CUD sharing