GCP

GCP Cloud Billing: pre‑June 16, 2026 accounts moved to billing-account-level CUD sharing

GCP moves accounts created before June 16, 2026 without active commitments to billing-account CUD sharing, altering discounts for GKE, Cloud Run and Vertex AI.

June 20, 2026·3 min read·AI researched · AI written · AI reviewed

Google Cloud just changed a billing control that will immediately reprice workloads for many teams: accounts created before June 16, 2026 that don't have active resource-based commitments are being switched to billing-account-level committed use discount (CUD) sharing. If your org has historically relied on project- or resource-scoped CUDs to keep GKE node pools, Cloud Run, or Vertex AI GPU spend predictable, expect surprises on the next invoice.

This isn't a minor flag flip. Moving CUDs to the billing-account scope effectively pools committed capacity and redistributes discounts across all projects that share the account. That improves utilization for orgs that underutilized commitments, but it also destroys isolation: a single noisy GKE cluster or a spike in Vertex AI GPU training can soak the pool and shift discounts away from other projects. For multi-team billing and chargeback models this is going to get messy, and fast.
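To make the pooling effect concrete, here is a minimal sketch comparing discount coverage under isolated commitments and under a shared pool. The proportional split is an assumption chosen for illustration, not a statement of how Cloud Billing actually attributes credits, and the project names and hourly dollar figures are hypothetical.

```python
def isolated_coverage(usage, commitments):
    """Each project can only draw on its own commitment."""
    return {p: min(u, commitments.get(p, 0.0)) for p, u in usage.items()}


def pooled_coverage(usage, commitments):
    """All commitments form one pool shared by every project.

    ASSUMPTION: when eligible usage exceeds the pool, coverage is split
    in proportion to each project's usage. Real attribution may differ.
    """
    pool = sum(commitments.values())
    total = sum(usage.values())
    if total <= pool:
        return dict(usage)  # the pool covers everything
    return {p: pool * u / total for p, u in usage.items()}


# Hypothetical hourly figures in dollars of CUD-eligible usage.
commitments = {"team-a": 100.0, "team-b": 100.0}
steady = {"team-a": 150.0, "team-b": 50.0}   # team-b underuses its commitment
spike = {"team-a": 700.0, "team-b": 100.0}   # team-a runs a large training job

if __name__ == "__main__":
    for label, usage in (("steady", steady), ("spike", spike)):
        print(label, "isolated:", isolated_coverage(usage, commitments))
        print(label, "pooled:  ", pooled_coverage(usage, commitments))
```

Under these assumptions the steady case shows the utilization win (pooling covers all 200.0 of usage, where isolation covers only 150.0), while the spike case shows the isolation loss: team-b's covered usage drops from 100.0 to 25.0 even though its own usage did not change.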

At the same time, Google rolled out cross-cutting platform changes that matter when you architect around those costs. API Gateway and Cloud Endpoints enforce quotas and can return immediate throttling responses (HTTP 429/503) when limits are hit; they do not absorb request storms. In practice, that means throttled requests to backends running on GKE, Cloud Run, or Vertex AI are likely to be rejected at the edge rather than quietly queued. If you front generative AI endpoints with API Gateway, rework retries, circuit breakers, and fallback routes now: a retry storm against Vertex AI or a Cloud Run service will not be absorbed by the gateway.
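As a rough illustration of a client that backs off instead of feeding a retry storm, the sketch below retries only on 429 and 503 and sleeps for a capped, fully jittered exponential delay. The `send` callable, its `(status, body)` return shape, and the default limits are hypothetical stand-ins for whatever HTTP client and budgets you actually use.

```python
import random
import time

RETRYABLE = {429, 503}  # the throttling responses discussed above


def call_with_backoff(send, max_attempts=5, base=0.5, cap=8.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    Delays use "full jitter": uniform between 0 and a capped exponential bound.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            bound = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, bound))
    return status, body  # still throttled: let the caller degrade gracefully
```

Randomizing the whole delay rather than adding a small jitter on top spreads retries from many clients across the window, which matters when the gateway rejects rather than queues.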

BigQuery also gained governance and automation features that help teams understand and adapt to these billing shifts: IAM Deny policies are generally available across GCP, BigQuery Data Lineage and Data Catalog integrations (Preview) can help trace jobs and datasets, and enabling the Cloud Billing export to BigQuery (line items export) gives you richer, exportable cost telemetry. The exported billing line items are the single most useful lever for teams that need to reconcile how billing-account CUD sharing affects GPU-heavy Vertex AI spend.
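One way to start reconciling is to total CUD credits per project from exported line items. The sketch below works on rows already fetched as dictionaries; the field names (`project.id`, `cost`, `credits[].type`, `credits[].amount`) and the credit type strings follow the standard billing export schema as I understand it and should be treated as assumptions to verify against your own export table. The sample values are invented.

```python
from collections import defaultdict

# ASSUMPTION: credit type strings as they appear in the billing export.
CUD_TYPES = {"COMMITTED_USAGE_DISCOUNT", "COMMITTED_USAGE_DISCOUNT_DOLLAR_BASE"}


def cud_credits_by_project(rows):
    """Sum list cost and CUD credits per project from export-shaped rows."""
    totals = defaultdict(lambda: {"cost": 0.0, "cud_credit": 0.0})
    for row in rows:
        project_id = (row.get("project") or {}).get("id") or "(no project)"
        totals[project_id]["cost"] += row.get("cost", 0.0)
        for credit in row.get("credits") or []:
            if credit.get("type") in CUD_TYPES:
                # Credits are recorded as negative amounts.
                totals[project_id]["cud_credit"] += credit.get("amount", 0.0)
    return dict(totals)


# Invented sample rows, shaped like export line items.
sample_rows = [
    {"project": {"id": "ml-training"}, "cost": 40.0,
     "credits": [{"type": "COMMITTED_USAGE_DISCOUNT", "amount": -10.0}]},
    {"project": {"id": "ml-training"}, "cost": 20.0, "credits": []},
    {"project": {"id": "web-frontend"}, "cost": 8.0,
     "credits": [{"type": "PROMOTION", "amount": -1.0}]},
]

if __name__ == "__main__":
    for project, t in cud_credits_by_project(sample_rows).items():
        print(project, t)
```

Comparing the per-project `cud_credit` totals for windows before and after the switch gives a first indication of which projects gained or lost discount coverage.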

On the AI side, Google has been standardizing model availability and defaults in Vertex AI across regions; check the Vertex AI release notes and your Cloud Console for any per-project toggles or model-selection defaults that might change how inference is routed. Also confirm Managed Service for Apache Spark availability in your region, since service rollout windows can affect pipeline upgrade planning for training jobs.

Why this will sting platform teams

This change favors consolidated utility over per-project predictability. If your finance model charges back per project, billing-account-level CUD sharing collapses the invariants you've relied on: discounts are no longer strictly attributable. It is the right call from Google if the goal is utilization efficiency, but it's an operational sledgehammer for organizations that used commitments as isolated price guarantees.

What to do in the next 72 hours

  • Inventory: List billing accounts and identify those created before June 16, 2026. Flag projects that share those accounts and map major cost drivers (GKE node pools, Cloud Run concurrency, Vertex AI GPU jobs).
  • Export and analyze: Enable Cloud Billing export to BigQuery (line items). Run a short-window query to see where discounted credits are likely to be consumed and correlate spend with workloads using BigQuery lineage and Data Catalog where available.
  • Protect critical workloads: If a project needs guaranteed discounted pricing, consider purchasing resource-scoped commitments where available or moving the workload to a separate billing account so discounts are attributable.
  • Harden frontends: Revisit API Gateway and Cloud Endpoints throttling and retry semantics: tighten client-side jittered backoff, add circuit breakers, and provide graceful degradation for Vertex AI inference so quota rejections don't cascade.
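For the frontend-hardening step, a circuit breaker with a fallback path might look like the following minimal, single-threaded sketch. The class, the `call_model` and `fallback` callables, and the thresholds are all hypothetical; a production breaker would also need thread safety and metrics.

```python
import time

THROTTLED = (429, 503)


class CircuitBreaker:
    """Opens after consecutive throttled responses, retries after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, status):
        if status in THROTTLED:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
        else:
            self.failures = 0
            self.opened_at = None


def infer_with_fallback(breaker, call_model, fallback):
    """Use the model when allowed, otherwise degrade to the fallback."""
    if not breaker.allow():
        return fallback()  # skip the backend entirely while the circuit is open
    status, body = call_model()
    breaker.record(status)
    return fallback() if status in THROTTLED else body
```

While the circuit is open, callers get the degraded response immediately and the throttled backend receives no additional traffic, which is what keeps quota rejections from cascading.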

This is not a documentation nitpick; it's a change in the fundamental accounting model for committed discounts. Teams that treat CUDs as silent insurance will get billed; teams that treat them as a shared pool will probably see improved utilization. Either way, platform engineering must own this one: run the cost simulations, update chargeback rules, and harden your gateways. Expect more platform-level moves like this: cloud vendors are optimizing for utilization, not for your internal org charts.
