GCP

BigQuery fluid scaling GA: per-second billing for autoscaling reservations

BigQuery fluid scaling goes GA with per-second billing and no minimum duration for autoscaling reservations, so bursty analytics workloads pay only for the slot time they actually use.

June 13, 2026·3 min read·AI researched · AI written · AI reviewed

BigQuery’s fluid scaling just rewires cost math for analytics: it’s GA with per-second billing and no minimum duration for autoscaling reservations, which means you can now scale up for ten seconds of heavy work and not pay for idle slot-hours afterward. If your architecture still assumes you must buy a slot cluster to avoid cold starts, this single change makes that assumption expensive and obsolete.

Why this actually matters

Autoscaling reservations were already useful for shielding workloads from noisy neighbors and guaranteeing concurrency. The GA move to true per-second billing removes the economic trade-off that forced teams to round up to the nearest hour (or keep baseline reservations to amortize costs). Practically, this enables three patterns that were previously unattractive:

  • Short-lived, compute-heavy ELT bursts triggered by events (pub/sub spikes, nearline ingestion) without the need for 24/7 slots.
  • Event-driven analytics pipelines that can scale from near-zero to thousands of slots for backfills and then drain back to zero as soon as the work finishes.
  • Cost-siloing by tenant or product where chargebacks reflect actual burst consumption rather than conservative reservation sizing.
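
To make the economics concrete, here is a minimal sketch of the arithmetic, not a pricing calculator. The price per slot-hour is a made-up placeholder, and the `billing_granularity_s` parameter is my own device for comparing per-second billing against a model that rounds each burst up to a full hour.

```python
import math


def burst_cost(slots: int, seconds: float, price_per_slot_hour: float,
               billing_granularity_s: int = 1) -> float:
    """Cost of one burst when usage is rounded up to the billing granularity."""
    billed_seconds = math.ceil(seconds / billing_granularity_s) * billing_granularity_s
    return slots * (billed_seconds / 3600) * price_per_slot_hour


PRICE = 0.06  # hypothetical price per slot-hour, not a published rate

# A 10-second burst on 1,000 slots, billed per second.
per_second = burst_cost(1000, 10, PRICE)

# The same burst if every burst were rounded up to a full hour.
hourly = burst_cost(1000, 10, PRICE, billing_granularity_s=3600)

print(f"per-second: ${per_second:.4f}  hour-rounded: ${hourly:.2f}")
```

Under these assumed numbers the hour-rounded burst costs 360 times more than the per-second one, which is why short bursts stop needing a baseline reservation to look reasonable on paper.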

GKE and the mesh: small version bump, meaningful implications

Anthos Service Mesh / ASM has a recent patch release that updates sidecar behavior, closes Envoy-related security issues, and tightens multi-cluster compatibility guarantees. The takeaway isn’t the exact version number so much as the operational risk: sidecar shim behavior and Envoy API changes can subtly alter routing and mTLS negotiation. If you run GKE with in-cluster or multi-cluster service meshes, plan a canary upgrade that validates Envoy filter compatibility and runs your multi-cluster gateway tests before you roll out more widely.

Vertex AI, Gemini and the multi-model world

Vertex AI’s Model Garden has continued to add third-party frontier models (including Anthropic’s Claude family). I’m seeing enterprises front Gemini models with a blend of third-party models behind a single Vertex endpoint to centralize policy, observability, and cost controls. That mix-and-match approach is the sane architecture: use cheaper Gemini variants for routine reasoning and fall back to more capable (and expensive) models for critical paths.
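
The routing idea can be sketched in a few lines. This is an illustration of the pattern only: `cheap` and `capable` are hypothetical callables standing in for whatever model clients you use, and nothing here is a Vertex AI API.

```python
from typing import Callable, Optional

# A model client returns None when it declines or its answer fails a check.
ModelFn = Callable[[str], Optional[str]]


def route(prompt: str, critical: bool, cheap: ModelFn, capable: ModelFn) -> tuple[str, str]:
    """Return (tier_used, answer). Critical requests skip the cheap tier."""
    if not critical:
        answer = cheap(prompt)
        if answer is not None:
            return "cheap", answer
    # Critical path, or the cheap tier could not answer: use the capable tier.
    answer = capable(prompt)
    if answer is None:
        raise RuntimeError("no model produced an answer")
    return "capable", answer
```

Returning the tier alongside the answer is deliberate: it gives you a field to log, so cost and quality can be broken down by which model actually served the request.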

Relatedly, Gemini-assisted features in BigQuery are showing up in preview for things like data-lineage analysis and query scheduling. Teams are increasingly expressing CI/CD-style governance as AI-generated SQL and metadata transformations. That’s powerful, but it turns LLM output into an infrastructure control plane—treat it like code and own the audit trail.
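
One way to picture "treat it like code" is a provenance record plus a gate in front of execution. The sketch below is hypothetical throughout: the field names, the read-only policy, and the approval flag are my assumptions, and a prefix check is a crude stand-in for a real SQL parser.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical policy: only read-only statements may pass the gate.
ALLOWED_PREFIXES = ("SELECT", "WITH")


def provenance_record(sql: str, model: str, prompt: str) -> dict:
    """Capture what was generated, by which model, and from which prompt."""
    return {
        "sql_sha256": hashlib.sha256(sql.encode("utf-8")).hexdigest(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "approved_by": None,  # filled in by a human reviewer
    }


def passes_gate(sql: str, record: dict) -> bool:
    """Allow execution only if the SQL matches its record, is read-only, and was approved."""
    unchanged = hashlib.sha256(sql.encode("utf-8")).hexdigest() == record["sql_sha256"]
    read_only = sql.strip().upper().startswith(ALLOWED_PREFIXES)
    return unchanged and read_only and record["approved_by"] is not None
```

Hashing the SQL means an approval applies to one exact statement; if the text is regenerated or edited after review, the gate fails and the review has to happen again.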

Networking gets practical: Partner Cross-Cloud Interconnect for AWS

Network Connectivity Center added public preview support for partner-backed Cross-Cloud Interconnect to AWS. This is a pragmatic path to multi-cloud observability and analytics where GCP acts as the analytics/control plane and AWS systems remain in their home region. It replaces brittle VPNs and ad-hoc peering with predictable, partner-backed connectivity.

What you should change this week

  • Re-evaluate slot purchases and migration plans: move bursty workloads to autoscaling reservations, keep committed slots only where usage is steady enough to justify them, and rerun cost models assuming per-second billing.
  • Add telemetry around short-lived bursts: without it, the new billing model just moves costs elsewhere unnoticed.
  • Treat Gemini-generated SQL as deployable artifacts: implement review gates, provenance logging, and test suites.
  • Canary the ASM upgrade for Envoy filter compatibility; multi-cluster gateways deserve an integration test.
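
For the telemetry item, a starting point is to roll job-level slot time up by tenant. This is a sketch under assumptions: each job is a plain dict, the `tenant` key is a hypothetical label you would attach yourself, and `total_slot_ms` is named after the field of that name in BigQuery job statistics. Consumed slot time is only a proxy for what an autoscaling reservation bills, so treat the result as a way to apportion cost, not as the bill itself.

```python
from collections import defaultdict


def slot_seconds_by_tenant(jobs: list[dict]) -> dict[str, float]:
    """Sum consumed slot-seconds per tenant; jobs without a label go to 'unlabeled'."""
    totals: dict[str, float] = defaultdict(float)
    for job in jobs:
        tenant = job.get("tenant", "unlabeled")
        totals[tenant] += job["total_slot_ms"] / 1000
    return dict(totals)


# Illustrative records, not real job data.
jobs = [
    {"tenant": "checkout", "total_slot_ms": 120_000},
    {"tenant": "search", "total_slot_ms": 45_500},
    {"tenant": "checkout", "total_slot_ms": 30_000},
    {"total_slot_ms": 5_000},
]

usage = slot_seconds_by_tenant(jobs)
print(usage)
```

The "unlabeled" bucket is worth watching on its own: if it grows, bursts are running without attribution and chargebacks will drift from reality.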

Final take: this isn’t one big launch, it’s a set of nudges that together change architecture defaults. BigQuery’s billing change is the headline because it alters economics immediately; the rest (mesh patches, model additions, cross-cloud interconnect) are operational continuity and multi-model maturation. Expect teams that don’t re-architect for per-second analytics to keep overpaying, and expect a new class of operational tooling to emerge that treats model outputs as first-class infra artifacts.

If you want a short refresher on the Vertex AI momentum and where Gemini fits, see "Vertex AI: Gemini 2.5 Flash-Lite GA, Cloud Run GPUs GA, and GKE Inference Updates".

One prediction: within 12 months, “we pay only for bursts” will be a line item in cloud cost reviews, and teams still buying 24/7 slots will need to justify themselves to finance and SREs.
