GCP

BigQuery: Gemini Cloud Assist (Preview) and per-second autoscaling reservation billing

BigQuery adds Gemini Cloud Assist (Preview) and per-second autoscaling reservation billing, changing cost and ops for bursty analytics and Vertex AI pipelines.

June 13, 2026 · 3 min read · AI researched · AI written · AI reviewed

BigQuery just changed two things that will immediately alter how teams design cost-efficient analytics and AI pipelines: Gemini Cloud Assist is available in Preview for lineage and query scheduling, and autoscaling reservations now use per-second billing with no minimum billing increment. Both are small-sounding tweaks with outsized operational impact.

The per-second billing change is the one that really matters. If your pipelines are a mix of short ETL bursts, ad-hoc model training, and scheduled Vertex AI jobs, you no longer have to overcommit slot or reservation time to avoid paying for minutes you never used. Jobs that used to justify keeping a half-day reservation "because it's cheaper than paying for ten 2‑minute bursts" are now clear candidates for true ephemeral sizing.
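To make the arithmetic concrete, here is a minimal sketch that compares a stepped billing increment with per-second billing for a set of short bursts. Everything numeric in it is an assumption for illustration: the 60-second increment, the 100-slot size, and the 0.06 price per slot-hour are hypothetical placeholders, not published BigQuery rates, and the bursts are 45 seconds long (rather than the 2-minute bursts quoted above) so that the rounding effect is visible.

```python
import math
from typing import Iterable


def billed_seconds(duration_s: float, increment_s: int = 1) -> int:
    """Round a burst up to the next multiple of the billing increment.

    increment_s=1 models per-second billing; a larger value models a
    stepped increment (an assumption, not a documented BigQuery rule).
    """
    return math.ceil(duration_s / increment_s) * increment_s


def burst_cost(
    durations_s: Iterable[float],
    slots: int,
    price_per_slot_hour: float,
    increment_s: int = 1,
) -> float:
    """Total cost of a series of bursts at a given slot count and price."""
    total_s = sum(billed_seconds(d, increment_s) for d in durations_s)
    return total_s * slots * price_per_slot_hour / 3600


if __name__ == "__main__":
    bursts = [45] * 10  # ten 45-second bursts (hypothetical workload)
    stepped = burst_cost(bursts, slots=100, price_per_slot_hour=0.06, increment_s=60)
    per_second = burst_cost(bursts, slots=100, price_per_slot_hour=0.06, increment_s=1)
    print(f"stepped (60 s increment): {stepped:.2f}")
    print(f"per-second:               {per_second:.2f}")
```

Under these assumed numbers the stepped model bills 600 seconds and the per-second model bills 450, so the gap grows with the number of bursts that end partway through an increment.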

Why this changes how you build analytics pipelines

  • Cost modeling: Per-second billing collapses a lot of the tail risk in bursty workloads. Your unit economics for small jobs (seconds to minutes) go from stepped pricing to near-linear cost. That makes autoscaler-friendly architectures (ephemeral workers in GKE, short-lived Cloud Run tasks, or on-demand Vertex AI training jobs) materially cheaper.
  • Platform control planes get simpler: Internal schedulers can spin up reservations programmatically for the exact lifespan of a job without having to batch or pad for billing minimums. Expect teams to move reservation orchestration into their job schedulers or platform APIs rather than rely on static, long-lived reservations.
  • New failure modes: more frequent reservation churn means more API calls, more auditing events, and more opportunity to hit quotas. If you already have autoscaler-driven reservations, add monitoring on reservation-create/delete rates and surface those metrics in billing dashboards.
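The churn monitoring suggested in the last bullet can be sketched as a sliding-window counter over reservation create and delete calls. This is an illustrative sketch only: the `ChurnMonitor` class, the 60-second window, and the threshold of 3 calls are hypothetical, not real BigQuery quota values, and feeding it from audit logs or your scheduler is left out.

```python
import time
from collections import deque
from typing import Deque, Optional


class ChurnMonitor:
    """Counts reservation create/delete calls inside a sliding time window."""

    def __init__(self, window_s: float, max_events: int) -> None:
        self.window_s = window_s
        self.max_events = max_events  # hypothetical budget, not a real quota
        self._events: Deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        cutoff = now - self.window_s
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()

    def record(self, now: Optional[float] = None) -> bool:
        """Record one call; return True if the window budget is exceeded."""
        now = time.monotonic() if now is None else now
        self._events.append(now)
        self._evict(now)
        return len(self._events) > self.max_events

    def rate_per_minute(self, now: Optional[float] = None) -> float:
        """Current call rate, normalized to calls per minute."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self._events) * 60.0 / self.window_s


if __name__ == "__main__":
    monitor = ChurnMonitor(window_s=60, max_events=3)
    for t in (0, 10, 20, 30):
        if monitor.record(now=t):
            print(f"t={t}s: reservation churn above budget")
```

Exporting `rate_per_minute()` as a gauge next to billing metrics is one way to surface churn before it turns into a quota error.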

Gemini Cloud Assist in Preview is another relevant lever: it provides lineage analysis and query-scheduling hooks tied into BigQuery. That matters because Vertex AI model training and feature pipelines often start as BigQuery jobs. Cloud Assist makes it easier to route model-ready datasets into Vertex jobs and to schedule those handoffs with better awareness of data freshness — effectively shrinking the time-to-serve for model inputs.
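The freshness-aware handoff described above can be approximated, independent of any Cloud Assist API, with a simple gate that only releases a downstream job when every input table was modified recently enough. The function names, table names, and the freshness budget below are hypothetical; how you obtain each table's last-modified time is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_tables(
    last_modified: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the names of tables older than the freshness budget."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_modified.items() if now - ts > max_age)


def ready_for_handoff(
    last_modified: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """True when every input table is within the freshness budget."""
    return not stale_tables(last_modified, max_age, now)


if __name__ == "__main__":
    now = datetime(2026, 6, 13, 12, 0, tzinfo=timezone.utc)
    inputs = {
        "features.daily": now - timedelta(minutes=20),   # hypothetical table
        "features.hourly": now - timedelta(hours=3),     # hypothetical table
    }
    print(ready_for_handoff(inputs, timedelta(hours=1), now))
    print(stale_tables(inputs, timedelta(hours=1), now))
```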

Model and service mix: Model Garden and residency constraints

Model Garden has added more Anthropic Opus-family models alongside new Gemini variants offered in limited availability with region and residency options. This is a signal that multi-model serving is now an ops problem, not a product checkbox. You need to map model SLOs and data residency constraints into your routing layer (Vertex endpoints, custom gateways, or sidecar inference proxies). This complements the billing change: cheaper, short-lived compute plus more model options creates a strong incentive to adopt per-request or per-invocation model routing instead of pinned capacity.
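As a rough illustration of mapping SLOs and residency constraints into a routing layer, the sketch below filters candidate endpoints by allowed region and a latency ceiling, then picks the cheapest survivor. The `ModelEndpoint` type, the endpoint names, and all latency and cost figures are hypothetical; a real gateway would read them from measured telemetry and your own policy config.

```python
from dataclasses import dataclass
from typing import Collection, Optional, Sequence


@dataclass(frozen=True)
class ModelEndpoint:
    name: str                   # hypothetical endpoint label
    region: str                 # where the model is served
    p95_latency_ms: int         # observed latency, supplied by the caller
    cost_per_1k_tokens: float   # illustrative figure, not a real price


def pick_endpoint(
    endpoints: Sequence[ModelEndpoint],
    allowed_regions: Collection[str],
    max_p95_latency_ms: int,
) -> Optional[ModelEndpoint]:
    """Cheapest endpoint that satisfies residency and latency constraints."""
    candidates = [
        e for e in endpoints
        if e.region in allowed_regions and e.p95_latency_ms <= max_p95_latency_ms
    ]
    # default=None signals that no endpoint meets the policy.
    return min(candidates, key=lambda e: e.cost_per_1k_tokens, default=None)


if __name__ == "__main__":
    fleet = [
        ModelEndpoint("model-a-eu", "europe-west4", 400, 0.50),
        ModelEndpoint("model-b-eu", "europe-west4", 900, 0.20),
        ModelEndpoint("model-b-us", "us-central1", 350, 0.15),
    ]
    choice = pick_endpoint(fleet, allowed_regions={"europe-west4"}, max_p95_latency_ms=500)
    print(choice.name if choice else "no endpoint satisfies the policy")
```

Returning `None` instead of falling back to an out-of-region endpoint is a deliberate choice here: a residency violation should fail loudly rather than route silently.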

Network and security wrinkles

These updates aren't flashy but they matter: Network Connectivity Center expanded preview support for partner-managed cross-cloud connectivity to AWS, Cloud WAN received incremental improvements, and service mesh/Envoy-based stacks continue to evolve. If your GKE clusters or Cloud Run services are fronting Gemini-powered endpoints, test Envoy and service-mesh behavior before upgrading — model-serving latency and header propagation matter for A/B routing across models.

Also note that Google has announced changes to IAM APIs and deprecation timelines for some legacy third-party SIEM connectors. If your observability or SIEM flows feed GKE or Cloud Run logs into third-party tooling, map those connectors to supported paths now. This is a governance problem more than a purely technical one.

Take: this is the right call, and it's overdue. Per-second billing for autoscaling reservations removes an economic artifact that forced wasteful, long-lived reservations. Platform teams who don't rework schedulers, quotas, and observability around ephemeral reservation churn will be the ones surprised by sudden API-quota spikes or strange billing patterns.

If you run Vertex AI or multi-model inference at scale, treat this week as an operational turning point: move reservation control into your job orchestrator, add telemetry for reservation churn, and codify model SLO/residency routing in your gateway. Within a year, cost-conscious platforms will default to ephemeral capacity, not the old reserve-and-forget model — and those that cling to static reservations will be the first to pay for it.
