GCP

Vertex AI: Gemini 3.5 Flash per‑project toggle removed — pin models and endpoints

Google removed the Gemini 3.5 Flash per-project toggle in June 2026. Teams must now control variant exposure through Vertex AI model pinning, endpoints, IAM, and quotas.

June 18, 2026·3 min read·AI researched · AI written · AI reviewed

The ability to gate the Flash variant of Gemini 3.5 at the project level is gone — and that will break a surprising number of deployment and cost controls.

Google removed the Gemini 3.5 Flash feature toggle in June 2026. Concretely: the per-project flag that let teams opt specific projects into the Flash variant is no longer available. The variant is now controlled through Vertex AI model/version and endpoint configuration, so exposure control shifts from ad-hoc flags to model lifecycle, IAM, quotas, and regional availability.

Why that matters now

Many platform teams used the per-project toggle as a cheap safety valve: enable Flash in a sandbox project, smoke-test it, then flip other projects when ready. Others used the flag as a throttle — limit Flash access to non-prod projects to contain cost or risk. Removing the toggle doesn't remove the capability to run Flash; it removes the low-friction switch many teams relied on to segment access quickly.

This will bite teams who built operational controls around feature flags rather than explicit model/version pinning and endpoint-level gating. If you treated the toggle as a safety net instead of baking rollout controls into CI/CD and IAM, expect surprises in both spend and surface area.

Operational fallout beyond Flash

The consolidated release notes for the period also show a handful of quiet but consequential changes across managed services: sub-minor runtime rollouts for Managed Service for Apache Spark and Contact Center AI, plus rolling infrastructure updates reaching regions one at a time. These release-note-driven tweaks (changed defaults, minor runtime bumps, regional availability shifts) are the kind of friction that accumulates in production.

Two linked realities:

  • Model access is now a configuration and ops problem, not a feature-flag problem. You must pin models, control endpoints, and use IAM plus quota guards. The toggle's removal accelerates that shift.
  • Small runtime and regional rollouts keep happening. If you pin a runtime without tracking sub-minor changes, you can see behavior drift when Google completes rollouts.
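One way to make pinning enforceable is a small CI check that rejects floating model references. The sketch below assumes a hypothetical naming convention in which a pinned model ID ends in a three-digit version suffix such as `-001`. The model IDs, the config shape, and the `unpinned_models` helper are illustrative assumptions, not taken from Vertex AI documentation.

```python
import re

# Assumption: a "pinned" model ID ends in an explicit three-digit version
# suffix such as "-001". Anything else (no suffix, "-latest") is floating.
PINNED_SUFFIX = re.compile(r"-\d{3}$")


def is_pinned(model_id: str) -> bool:
    """Return True when the model ID carries an explicit version suffix."""
    return bool(PINNED_SUFFIX.search(model_id))


def unpinned_models(config: dict) -> list:
    """Return the sorted service names whose model ID is not pinned.

    `config` maps a (hypothetical) service name to the model ID it calls.
    """
    return sorted(name for name, model_id in config.items()
                  if not is_pinned(model_id))
```

A pipeline could fail the build whenever `unpinned_models(...)` returns a non-empty list, which turns "pin your models" from a convention into a gate.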

Immediate checklist (what to do in the next 48–72 hours)

  1. Audit: Find any projects that relied on the Gemini 3.5 Flash toggle. Search IaC, CI pipelines, and deployment scripts for the toggle name or project-level flag patterns. If you used the flag for cost control, watch billing and model-invocation metrics for spikes.

  2. Pin and gate: Move consumers to explicit Vertex AI model versions and endpoints. Use private endpoints, VPC Service Controls (VPC‑SC), and endpoint-level IAM to restrict access. Create separate endpoints for canary, staging, and prod rather than relying on project scoping.

  3. Enforce quotas and metering: Apply GCP quotas at the project level and set budget alerts scoped to the projects and services that call the model. For rate-limiting, put invocation proxies (Cloud Run / Cloud Functions) in front of endpoints so you can control concurrency and apply consistent metering.

  4. Update CI/CD: Bake model promotion steps into your pipeline (model registry -> staging endpoint -> canary -> prod endpoint) and require approvals for endpoint updates.

  5. Watch regional availability: If you depended on a regional fallback via the toggle, verify the Flash variant and related runtimes are available where you run production workloads.
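The promotion path in step 4 can be sketched as a small state machine that a pipeline job might call. This is a minimal illustration, not a Vertex AI API: the stage names, the `ModelRelease` class, and the `promote` function are hypothetical, and a real pipeline would trigger the actual endpoint update after the check passes.

```python
from dataclasses import dataclass, field

# Hypothetical stage names mirroring the path in step 4:
# model registry -> staging endpoint -> canary -> prod endpoint.
STAGES = ("registry", "staging", "canary", "prod")


@dataclass
class ModelRelease:
    model_version: str                    # an explicitly pinned version ID
    stage: str = "registry"
    approvals: list = field(default_factory=list)  # (stage, approver) pairs


def promote(release: ModelRelease, approver: str) -> ModelRelease:
    """Advance a release by exactly one stage.

    Every move requires a named approver, and stages cannot be skipped
    because the next stage is always derived from the current one.
    """
    if not approver:
        raise PermissionError("endpoint updates require a named approver")
    idx = STAGES.index(release.stage)
    if idx == len(STAGES) - 1:
        raise ValueError(f"{release.model_version} is already in prod")
    release.stage = STAGES[idx + 1]
    release.approvals.append((release.stage, approver))
    return release
```

Because the next stage is computed rather than passed in, a caller cannot jump a release from the registry straight to prod, and the `approvals` list leaves an audit trail of who signed off on each step.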

This is the right operational nudge — overdue, honestly. Per-project feature flags are a weak control for something as high-impact as a model variant that changes latency, cost, and capabilities. Centralizing model lifecycle management under Vertex AI forces platform teams to do the engineering work they should have done months ago: explicit versioning, endpoint segregation, IAM boundaries, and billing controls.

But Google removed the flag with minimal fanfare, and the lack of notice will create headaches. Teams that haven't migrated risk unexpected access, cost overruns, and compliance gaps.

If you want context on how Google is changing model economics and runtime governance, review the earlier pricing and runtime shifts in Gemini 3.x — they're part of the same trend (see: /article/vertex-ai-gemini-3-1-agent-engine-pricing-token-costs/).

Final thought: the era of feature-flagging model variants at the project level is ending. Platform teams should treat model variants like any other runtime — explicitly versioned, permissioned, and promoted through CI. Do that now, or you'll be reacting to the cost and security aftermath.
