BigQuery fluid scaling GA: per-second billing for autoscaling reservations

BigQuery’s fluid scaling just rewires cost math for analytics: it’s GA with per-second billing and no minimum duration for autoscaling reservations, which means you can now scale up for ten seconds of heavy work and not pay for idle slot-hours afterward. If your architecture still assumes you must buy a slot cluster to avoid cold starts, this single change makes that assumption expensive and obsolete.
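To make the cost math concrete, here is a minimal sketch that compares one burst billed per second with the same burst rounded up to a full hour, the old mental model described below. The slot price is a hypothetical placeholder, not a BigQuery list price. The sketch also ignores scaling increments and any minimums your edition may apply, so treat it as back-of-the-envelope arithmetic and not a billing calculator.

```python
import math

# Hypothetical price per slot-hour; substitute your edition's actual rate.
PRICE_PER_SLOT_HOUR = 0.06

def per_second_cost(slots: int, seconds: float,
                    price_per_slot_hour: float = PRICE_PER_SLOT_HOUR) -> float:
    """Cost when slots are billed only for the seconds they are held."""
    slot_hours = slots * seconds / 3600
    return slot_hours * price_per_slot_hour

def hour_rounded_cost(slots: int, seconds: float,
                      price_per_slot_hour: float = PRICE_PER_SLOT_HOUR) -> float:
    """Cost when every burst is rounded up to whole hours."""
    hours = math.ceil(seconds / 3600)
    return slots * hours * price_per_slot_hour

# Ten seconds of heavy work on 2,000 slots.
burst = per_second_cost(2000, 10)
rounded = hour_rounded_cost(2000, 10)
print(f"per-second: ${burst:.4f}, hour-rounded: ${rounded:.2f}")
# prints: per-second: $0.3333, hour-rounded: $120.00
```

The ratio, not the absolute numbers, is the point: under hourly rounding a ten-second burst costs the same as an hour of work, while per-second billing makes it roughly 1/360 of that.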

Why this actually matters

Autoscaling reservations were already useful for shielding workloads from noisy neighbors and guaranteeing concurrency. The GA move to true per-second billing removes the economic trade-off that forced teams to round up to the nearest hour (or keep baseline reservations to amortize costs). Practically, this enables three patterns that were previously unattractive:

Short-lived, compute-heavy ELT bursts triggered by events (Pub/Sub spikes, nearline ingestion) without the need for 24/7 slots.
Event-driven analytics pipelines that can scale from near-zero to thousands of slots for backfills and then drain back to zero once the work finishes.
Cost-siloing by tenant or product where chargebacks reflect actual burst consumption rather than conservative reservation sizing.
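For the chargeback pattern, one approach is to split whatever the reservation actually billed across tenants in proportion to the slot time their jobs consumed. The sketch below assumes you have exported job metadata into plain records. BigQuery's INFORMATION_SCHEMA.JOBS views expose a total_slot_ms column and job labels, but the flattened `labels` dict, the `tenant` label key, and the `unlabeled` bucket here are hypothetical conventions. Job slot time approximates, and does not equal, what an autoscaling reservation bills, which is why the sketch allocates a billed total instead of pricing slot-ms directly.

```python
from collections import defaultdict

def allocate_bill(jobs: list[dict], billed_total: float,
                  label_key: str = "tenant") -> dict[str, float]:
    """Split a billed total across tenants in proportion to slot-ms consumed.

    Each job record is assumed to carry a `total_slot_ms` number and a flattened
    `labels` dict; the `tenant` key and `unlabeled` bucket are hypothetical.
    """
    slot_ms_by_tenant: dict[str, int] = defaultdict(int)
    for job in jobs:
        tenant = job.get("labels", {}).get(label_key, "unlabeled")
        slot_ms_by_tenant[tenant] += job.get("total_slot_ms") or 0
    total_ms = sum(slot_ms_by_tenant.values())
    if total_ms == 0:
        return {}  # nothing consumed, nothing to allocate
    return {tenant: billed_total * ms / total_ms
            for tenant, ms in slot_ms_by_tenant.items()}

jobs = [
    {"labels": {"tenant": "checkout"}, "total_slot_ms": 600_000},
    {"labels": {"tenant": "search"}, "total_slot_ms": 300_000},
    {"labels": {}, "total_slot_ms": 100_000},
]
print(allocate_bill(jobs, billed_total=50.0))
# prints: {'checkout': 30.0, 'search': 15.0, 'unlabeled': 5.0}
```

Keeping an explicit `unlabeled` bucket is deliberate: it surfaces untagged jobs in the chargeback report instead of silently spreading their cost across everyone else.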

GKE and the mesh: small version bump, meaningful implications

Anthos Service Mesh (ASM) has a recent patch release that updates sidecar behavior, closes Envoy-related security issues, and tightens multi-cluster compatibility guarantees. The takeaway isn’t the exact version number so much as the operational risk: changes to sidecar shim behavior and the Envoy API can subtly alter routing and mTLS negotiation. If you run GKE with in-cluster or multi-cluster service meshes, plan a canary upgrade that validates Envoy filter compatibility and runs your multi-cluster gateway tests.
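One way to make the canary decision explicit is a small promotion gate that compares the canary revision's error and mTLS-failure rates against the current revision over the same test window. This is a minimal sketch: the metric set, thresholds, and the way you collect the counters (Cloud Monitoring, Prometheus, or your test harness) are all assumptions, not ASM features.

```python
from dataclasses import dataclass

@dataclass
class RevisionStats:
    """Traffic counters for one mesh revision over the same window (hypothetical metric set)."""
    requests: int
    errors: int         # e.g. 5xx responses or misrouted requests
    mtls_failures: int  # failed mTLS handshakes reported by the sidecars

    def rate(self, count: int) -> float:
        return count / self.requests if self.requests else 0.0

def canary_verdict(baseline: RevisionStats, canary: RevisionStats,
                   max_error_delta: float = 0.001, min_requests: int = 1000) -> str:
    """Return 'wait', 'rollback', or 'promote' for a canary mesh revision."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge
    # Any rise in the mTLS failure rate suggests negotiation changed: block it.
    if canary.rate(canary.mtls_failures) > baseline.rate(baseline.mtls_failures):
        return "rollback"
    # Tolerate a small error-rate delta; roll back on anything larger.
    if canary.rate(canary.errors) - baseline.rate(baseline.errors) > max_error_delta:
        return "rollback"
    return "promote"

baseline = RevisionStats(requests=50_000, errors=40, mtls_failures=0)
canary = RevisionStats(requests=5_000, errors=30, mtls_failures=0)
print(canary_verdict(baseline, canary))  # prints: rollback (0.6% vs 0.08% errors)
```

The mTLS check is intentionally stricter than the error check, because a negotiation regression between clusters tends to fail whole routes rather than a sliver of requests.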

Vertex AI, Gemini and the multi-model world

Vertex AI’s Model Garden has continued to add third-party frontier models (including Anthropic’s Claude family). I’m seeing enterprises put Gemini and third-party models behind a single Vertex AI endpoint to centralize policy, observability, and cost controls. That mix-and-match approach is the sane architecture: use cheaper Gemini variants for routine reasoning and fall back to more capable (and more expensive) models for critical paths.
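The routing policy can be sketched as a function that picks a model chain by request criticality and falls through to the next model when one fails. The model IDs are hypothetical placeholders, and `call_model` stands in for whatever client you actually use (the Vertex AI SDK or a REST call). It is an assumed wrapper, not a real API.

```python
from typing import Callable

# Hypothetical model IDs, cheapest first; substitute the models you actually deploy.
ROUTINE_CHAIN = ["gemini-flash-placeholder", "gemini-pro-placeholder"]
CRITICAL_CHAIN = ["gemini-pro-placeholder", "frontier-model-placeholder"]

ModelCall = Callable[[str, str], str]

def route(prompt: str, critical: bool, call_model: ModelCall) -> tuple[str, str]:
    """Try each model in the chosen chain; return (model_id, response) from the first success.

    `call_model(model_id, prompt)` is an assumed client wrapper that raises on failure.
    """
    chain = CRITICAL_CHAIN if critical else ROUTINE_CHAIN
    failures = []
    for model_id in chain:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as exc:  # narrow this to your client's error types in practice
            failures.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(failures))

# Fake client for illustration: the cheap model times out, everything else answers.
def fake_client(model_id: str, prompt: str) -> str:
    if model_id == "gemini-flash-placeholder":
        raise TimeoutError("simulated timeout")
    return f"{model_id} answered: {prompt}"

print(route("summarize Q3 spend", critical=False, call_model=fake_client))
```

Keeping the chains as data, rather than branching inside the caller, is what lets a central team change policy (or swap a model) without touching every service.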

Relatedly, Gemini-assisted features in BigQuery are appearing in preview for tasks such as data-lineage analysis and query scheduling. Teams are increasingly carrying out CI/CD-style governance through AI-generated SQL and metadata transformations. That’s powerful, but it makes LLM output part of the infrastructure control plane: treat it like code and own the audit trail.
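Owning the audit trail starts with a provenance record that ties the exact generated statement to the model, the prompt, and a human approver. A minimal sketch follows; the field names, the hypothetical model identifier, and the rule that nothing deploys without a named reviewer are assumptions about your process, not a BigQuery or Gemini feature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class GeneratedSqlRecord:
    sql: str
    model_id: str                   # model that produced the SQL (hypothetical identifier)
    prompt: str                     # request that led to the SQL
    reviewer: Optional[str] = None  # human approver; None means not yet reviewed

    @property
    def sql_sha256(self) -> str:
        """Content hash that pins the exact statement that was approved."""
        return hashlib.sha256(self.sql.encode("utf-8")).hexdigest()

def audit_entry(record: GeneratedSqlRecord) -> str:
    """Build a JSON audit-log line for a deploy, refusing SQL nobody has reviewed."""
    if not record.reviewer:
        raise PermissionError("generated SQL needs a named reviewer before deploy")
    entry = asdict(record)
    entry["sql_sha256"] = record.sql_sha256
    entry["logged_at"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(entry, sort_keys=True)

record = GeneratedSqlRecord(
    sql="SELECT dataset_id FROM lineage_audit",  # illustrative SQL only
    model_id="gemini-placeholder",
    prompt="list datasets touched by the nightly job",
    reviewer="data-platform-oncall",
)
print(audit_entry(record))
```

Hashing the SQL matters more than storing it: if anyone edits the statement after review, the hash in the log no longer matches what runs.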

Networking gets practical: Partner Cross-Cloud Interconnect for AWS

Network Connectivity Center added public preview support for partner-backed Cross-Cloud Interconnect to AWS. This is a pragmatic path to multi-cloud observability and analytics where GCP acts as the analytics and control plane while AWS systems stay in their home region, with predictable, partner-backed connectivity replacing brittle VPNs and ad-hoc peering.

What you should change this week

Re-evaluate slot purchases and migration plans: move burst-shaped workloads, predictable ones included, to autoscaling reservations and rerun cost models assuming per-second billing.
Add telemetry around short-lived bursts: without it, spend shifts from a visible reservation line item into many small bursts that nobody tracks.
Treat Gemini-generated SQL as deployable artifacts: implement review gates, provenance logging, and test suites.
Canary the ASM upgrade for Envoy filter compatibility; multi-cluster gateways deserve an integration test.
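For the telemetry item, a minimal sketch: scan a per-second series of slots in use and report each burst's start, duration, and slot-seconds, so short spikes show up on a dashboard instead of only on the invoice. How you sample slot usage and what threshold counts as "in a burst" are assumptions left to you.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Burst:
    start: int         # second offset of the first sample in the burst
    duration_s: int
    slot_seconds: int  # sum of slots in use across the burst

def find_bursts(slots_per_second: list[int], threshold: int = 0) -> list[Burst]:
    """Group consecutive seconds with usage above `threshold` into bursts."""
    bursts: list[Burst] = []
    start: Optional[int] = None
    total = 0
    for t, slots in enumerate(slots_per_second):
        if slots > threshold:
            if start is None:
                start, total = t, 0  # a new burst begins
            total += slots
        elif start is not None:
            bursts.append(Burst(start, t - start, total))  # burst just ended
            start = None
    if start is not None:  # series ended mid-burst
        bursts.append(Burst(start, len(slots_per_second) - start, total))
    return bursts

series = [0, 0, 500, 2000, 2000, 0, 0, 100, 0]
for burst in find_bursts(series):
    print(burst)
# prints: Burst(start=2, duration_s=3, slot_seconds=4500)
#         Burst(start=7, duration_s=1, slot_seconds=100)
```

Slot-seconds per burst is the unit that maps directly onto per-second billing, which makes it a natural metric to alert on or feed into the chargeback split.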

Final take: this isn’t one big launch; it’s a set of nudges that together change architecture defaults. BigQuery’s billing change is the headline because it alters economics immediately; the rest (mesh patches, model additions, cross-cloud interconnect) is about operational continuity and multi-model maturation. Expect teams that don’t re-architect for per-second analytics to keep overpaying, and expect a new class of operational tooling to emerge that treats model outputs as first-class infrastructure artifacts.

If you want a short refresher on the Vertex AI momentum and where Gemini fits, see "Vertex AI: Gemini 2.5 Flash-Lite GA, Cloud Run GPUs GA, and GKE Inference Updates."

One prediction: within 12 months, “we pay only for bursts” will be a line item in cloud cost reviews, and teams still buying 24/7 slots will need to justify themselves to finance and SREs.
