These Google Cloud updates give platform teams more granular operational control across compute, data, and AI. Key changes: GKE supports per-node-pool maintenance exclusions with a longer default no-updates window; Gemini Enterprise expands Limited Availability access to 3.1 Pro and 3 Flash; and BigQuery’s fluid scaling reaches GA with per-second billing for autoscaled capacity. Each change affects how teams plan upgrades, size capacity, and route inference traffic in production systems.
What changed in GKE: per-node-pool maintenance exclusions and a 90-day no-updates window
Google now supports excluding specific node pools from automated maintenance and upgrades, and the default duration of a no-updates exclusion on those pools has been extended to 90 days. This gives finer-grained control than cluster-wide maintenance policies alone.
Practical impacts
- Reduced blast radius: Critical node pools (stateful workloads, network appliances, latency-sensitive services) can avoid automated upgrades for a deterministic period.
- Staged rollouts: Teams can align upgrades to per-node-pool schedules for driver/CNI compatibility or canary deployments.
- Operational responsibility: Longer exclusion windows reduce upgrade pressure but increase the need to track OS and patch drift.
Recommended guardrails
- Inventory and labeling: Enforce node-pool naming and labels (e.g., kube.cloud.google.com/maintenance-exclude=true and role=stateful; treat these names as illustrative conventions and adapt them to your own labeling scheme) so automation and audits can detect excluded pools.
- IaC and pipeline checks: Make exclusions explicit in Terraform/Deployment Manager modules and add pipeline checks that require re-approval before an exclusion exceeds a policy-defined age.
- Observability: Alert on the time since last OS/kernel updates for excluded pools. Treat a 90-day exclusion as a temporary operational state, not a permanent exemption.
- Workload placement: Use taints/tolerations and affinity/anti-affinity to ensure critical pods stay on intended pools; validate DaemonSets and node-local agents for excluded pools.
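The re-approval and alerting guardrails above can be sketched as a small audit step. This is a minimal illustration, not a GKE API client: the `PoolExclusion` record, the `classify` and `audit` helpers, and the inventory entries are all hypothetical, and the 60-day and 90-day thresholds are policy values taken from the figures in this article. In practice the inventory would come from your own IaC state or cluster metadata.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy values; adjust to your organization's rules.
REAPPROVAL_AFTER_DAYS = 60
MAX_EXCLUSION_DAYS = 90


@dataclass(frozen=True)
class PoolExclusion:
    """One excluded node pool as recorded in your own inventory (assumed shape)."""
    cluster: str
    node_pool: str
    excluded_since: date


def classify(exclusion: PoolExclusion, today: date) -> str:
    """Return 'ok', 'needs_reapproval', or 'expired' based on exclusion age."""
    age_days = (today - exclusion.excluded_since).days
    if age_days >= MAX_EXCLUSION_DAYS:
        return "expired"
    if age_days >= REAPPROVAL_AFTER_DAYS:
        return "needs_reapproval"
    return "ok"


def audit(exclusions: list[PoolExclusion], today: date) -> dict[str, list[str]]:
    """Group pools by status so a pipeline step can fail on anything not 'ok'."""
    report: dict[str, list[str]] = {"ok": [], "needs_reapproval": [], "expired": []}
    for e in exclusions:
        report[classify(e, today)].append(f"{e.cluster}/{e.node_pool}")
    return report


if __name__ == "__main__":
    today = date(2026, 6, 30)
    inventory = [
        PoolExclusion("prod-a", "stateful-db", today - timedelta(days=12)),
        PoolExclusion("prod-a", "net-appliance", today - timedelta(days=61)),
        PoolExclusion("prod-b", "latency-critical", today - timedelta(days=95)),
    ]
    for status, pools in audit(inventory, today).items():
        print(status, pools)
```

A pipeline could run `audit` on every merge and block the change whenever the `needs_reapproval` or `expired` lists are non-empty.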
Compatibility note
- Autopilot vs Standard: Verify Autopilot semantics; per-node-pool controls typically apply to GKE Standard node pools and may differ for Autopilot-managed nodes.
Gemini Enterprise: 3.1 Pro and 3 Flash Limited Availability — SLO and architecture implications
Google has opened Limited Availability access to Gemini 3.1 Pro and 3 Flash under Enterprise contracts and indicates these LA models are covered by Enterprise SLOs. LA status implies phased capacity and feature rollout.
Operational impacts
- Model selection and capacity: Expect gated regional availability and throttling during LA; design your inference layer to handle partial availability.
- SLO alignment: If LA models are under Enterprise SLOs, you can build their latency and availability expectations into your own service-level targets and runbooks, but validate coverage details for your regions and use cases first.
- Cost/latency trade-offs: 3 Flash prioritizes latency and throughput; run micro-benchmarks (p50/p95/p99) and measure cost-per-inference before routing production traffic.
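The micro-benchmark step above might be summarized with something like the following. It is a sketch under stated assumptions: the helper names are hypothetical, the latency samples are made up, and the per-1,000-token prices are placeholders rather than published Gemini pricing. It uses only the standard library's `statistics.quantiles`.

```python
import statistics


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50/p95/p99 from raw latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99 (index 0 is p1).
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def cost_per_inference(
    input_tokens: int,
    output_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Token-based cost of one request; prices are caller-supplied placeholders."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


if __name__ == "__main__":
    # Made-up latency samples standing in for a real benchmark run.
    samples = [120.0, 135.0, 128.0, 410.0, 131.0, 126.0, 890.0, 133.0, 129.0, 140.0]
    print(latency_percentiles(samples))
    # Placeholder prices, not real pricing.
    print(cost_per_inference(1200, 300, price_in_per_1k=0.002, price_out_per_1k=0.004))
```

Collecting these numbers per model and per region before shifting traffic gives the routing layer concrete thresholds to work with.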
Actionable architecture changes
- Model-aware routing: Implement routing rules that prefer 3.1 Pro/3 Flash when available and fall back to lower-tier Gemini models or alternate providers when capacity is constrained. Include circuit-breaker and retry policies to avoid cascading failures.
- Telemetry: Extend traces and metrics to record model family/version and per-model latency/error rates to drive dynamic routing decisions.
- Governance: Verify Enterprise SLOs cover data governance requirements (residency, retention, PII handling) for your workloads; LA access may not include all regional or feature parity immediately.
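To make the model-aware routing item concrete, here is one possible shape for preference-ordered routing with a per-backend circuit breaker. Everything in it is illustrative: `Backend`, `CircuitBreaker`, `route`, and `BackendUnavailable` are hypothetical names, the backend callables stand in for real client calls, and the thresholds are arbitrary. It does not use any Gemini or Vertex AI SDK.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class BackendUnavailable(Exception):
    """Raised by a backend call when the model is throttled or unreachable."""


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3
    cooldown_seconds: float = 30.0
    clock: Callable[[], float] = time.monotonic
    _failures: int = 0
    _opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Closed breaker always allows; open breaker allows trials after cooldown."""
        if self._opened_at is None:
            return True
        return self.clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open) the breaker and restart the cooldown.
            self._opened_at = self.clock()


@dataclass
class Backend:
    name: str
    call: Callable[[str], str]
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def route(prompt: str, backends: list[Backend]) -> tuple[str, str]:
    """Try backends in preference order; return (backend_name, response)."""
    for backend in backends:
        if not backend.breaker.allow():
            continue  # Breaker is open: skip without calling the backend.
        try:
            response = backend.call(prompt)
        except BackendUnavailable:
            backend.breaker.record_failure()
            continue
        backend.breaker.record_success()
        return backend.name, response
    raise BackendUnavailable("no backend available")


if __name__ == "__main__":
    def throttled(_prompt: str) -> str:
        raise BackendUnavailable("LA capacity exhausted")

    backends = [
        Backend("preferred-la-model", throttled),
        Backend("fallback-model", lambda p: f"fallback answer to: {p}"),
    ]
    for _ in range(4):
        print(route("summarize this ticket", backends))
```

Once the preferred backend has failed three times, later requests skip it until the cooldown passes, which keeps retries from piling onto a throttled model.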
BigQuery Fluid Scaling GA: per-second billing and no minimum reservation duration
BigQuery fluid scaling GA makes autoscaled reservation capacity more elastic: autoscaled capacity is billed per second and reservations no longer require a minimum duration. This reduces the penalty for short spikes in concurrency and enables more granular capacity strategies.
Architectural implications
- Cost elasticity: Handle short-lived concurrency spikes with ephemeral autoscaling instead of long minimum reservations, reducing over-provisioning.
- Reservation strategy: Move to a hybrid model—baseline reservations for steady load plus fluid autoscaling for bursts.
- Chargeback and billing: Update internal chargeback systems and dashboards to account for per-second billing of autoscaled capacity.
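The cost-elasticity point can be illustrated with simple arithmetic. The sketch below compares the cost of one short burst under per-second billing with the same burst under a billing floor. All inputs are assumptions for illustration: the `burst_cost` helper is hypothetical, the slot-hour price is a placeholder, and the 60-second floor is a comparison case rather than a statement about BigQuery's earlier billing rules.

```python
import math

SECONDS_PER_HOUR = 3600


def burst_cost(
    slots: int,
    duration_s: float,
    price_per_slot_hour: float,
    min_billed_seconds: int = 1,
) -> float:
    """Cost of one autoscaled burst, billed in whole seconds with an optional floor."""
    if slots < 0 or duration_s < 0:
        raise ValueError("slots and duration_s must be non-negative")
    if slots == 0 or duration_s == 0:
        return 0.0
    # Round partial seconds up, then apply the minimum billed duration.
    billed_seconds = max(math.ceil(duration_s), min_billed_seconds)
    return slots * billed_seconds * price_per_slot_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    price = 0.06  # Placeholder price per slot-hour, not real pricing.
    per_second = burst_cost(slots=200, duration_s=20, price_per_slot_hour=price)
    with_floor = burst_cost(slots=200, duration_s=20, price_per_slot_hour=price,
                            min_billed_seconds=60)
    print(f"per-second billing: {per_second:.4f}")
    print(f"60-second floor:    {with_floor:.4f}")
```

The shorter the burst relative to any billing floor, the larger the gap, which is why chargeback dashboards need to work from per-second usage rather than rounded intervals.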
Operational guidance
- Baseline sizing: Keep a baseline reservation to protect SLA-sensitive workloads and use fluid scaling for transient demand.
- Query shaping: Use workload management controls (priorities, slot allocations) to prevent high-cost queries from crowding out critical jobs.
- Monitoring and budgets: Create alerts for sub-minute spend spikes and update BI/finance tooling to ingest higher-granularity billing signals.
Limits and compatibility
- SQL semantics unchanged: Fluid scaling affects capacity management and billing, not query semantics.
- BI integrations: Validate whether BI and monitoring tools ingest per-second slot usage; upgrade integrations that expect coarser telemetry.
Network and observability updates
A few networking and telemetry improvements are relevant to platform stability:
- Service mesh builds: New in-cluster Cloud Service Mesh builds (1.28.x asm revisions) are available; SRE teams should validate sidecar/control-plane compatibility before rolling out.
- Cloud CDN: URL-map–level cache policies (hostname/path/header/query granularity) reached GA, enabling declarative cache behavior at the load-balancer layer.
- Export APIs: Enhanced export APIs now support namespace/label filtering, CMEK for exported telemetry, and RBAC-aware extraction controls.
Operational recommendations
- Canaried upgrades: Test service mesh builds in canaries for mTLS, policy enforcement, and observability before fleet-wide upgrades.
- Declarative caching: Move application-level cache-control workarounds into URL map cache rules and manage them via GitOps.
- Secure telemetry exports: Enforce CMEK for exported logs and use RBAC filters to limit extraction privileges.
Quarterly checklist for platform teams
- Audit excluded node pools and the labels that mark them; add automated alerts when exclusions approach policy limits (e.g., 60 days) and require re-approval.
- Update inference runbooks and routing logic to treat Gemini 3.1 Pro/3 Flash as LA-capable backends with fallbacks and circuit breakers.
- Pilot BigQuery fluid scaling in a non-critical project; compare cost variance versus legacy reservations and update chargeback pipelines for per-second data.
- Standardize and validate service mesh and CDN configurations in staging before promoting to production.
Longer-term governance
- Track technical debt associated with excluded node pools and require mitigation timelines.
- Treat LA model access as staged: implement feature flags and traffic-splitting so you can roll back model routing without code changes.
- Combine BigQuery fluid scaling with query-cost controls, baseline reservations, and team quotas to avoid runaway spend.
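For the traffic-splitting item, one common approach is a deterministic, config-driven weighted split, so that rolling back means setting a weight to zero rather than shipping code. The following is a minimal sketch: `pick_model` is a hypothetical helper, and the model labels in the example are placeholders for whatever identifiers your routing layer uses, not official model IDs.

```python
import hashlib


def pick_model(request_id: str, weights: dict[str, int]) -> str:
    """Map a request deterministically onto a model according to integer weights."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive weight")
    # Hash the request ID so the same request always lands in the same bucket.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for model, weight in weights.items():
        if bucket < weight:
            return model
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")


if __name__ == "__main__":
    # Weights would normally be loaded from a feature-flag or config service.
    canary = {"la-model": 10, "stable-model": 90}
    rolled_back = {"la-model": 0, "stable-model": 100}
    for rid in ("req-1", "req-2", "req-3"):
        print(rid, pick_model(rid, canary), pick_model(rid, rolled_back))
```

Hashing on a stable key such as a request or tenant ID keeps a given caller on the same model across retries, which makes per-model telemetry easier to interpret.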
Summary
Together, these updates add operational knobs for availability, latency, and cost. To capture benefits safely, codify exclusions and cache policies in IaC, extend telemetry to be model- and pool-aware, and add automated safety checks around excluded node pools and LA model usage.