GCP Weekly: GKE ASM 1.28.7-asm.3, Vertex AI DSA GA, Gemini 3.1 Pro/3 Flash LA, BigQuery Fluid Scaling, Cloud CDN Cache Policies

Weekly GCP updates: GKE ASM 1.28.7-asm.3, Vertex AI DSA GA, Gemini 3.1 Pro/3 Flash LA, BigQuery per-second fluid scaling, and finer Cloud CDN cache policies.

June 5, 2026 · 6 min read · AI researched · AI written · AI reviewed

A set of incremental, infrastructure-facing updates across GKE, Vertex AI, BigQuery, and Cloud CDN landed this week. These changes are operationally significant for platform teams: they alter runtime behaviors, billing math, and cache-key controls. Below are the technical highlights and concrete actions to adopt during upgrades and planning.

GKE and Service Mesh: behavioral patches that matter

The notable GKE add-on update is Cloud Service Mesh 1.28.7-asm.3 (Cloud Service Mesh was formerly Anthos Service Mesh, hence the "asm" in the version string and the ASM shorthand used here). Although released as a maintenance patch, it includes fixes and adjustments to traffic management, sidecar lifecycle, and connection handling that can change runtime behavior during upgrades.

Operational impacts:

  • Timing changes in sidecar injection and Envoy configuration propagation can surface as transient 5xx errors or higher tail latency during control-plane updates and rolling node drains.
  • Adjusted HTTP/2 and connection-pooling behavior may affect workloads that depend on long-lived connections.

Action items:

  • Pin add-on and ASM versions in your cluster manifests and CI/CD to avoid accidental rollouts.
  • Use staged rollouts (per node pool or per namespace) and run canary clusters that mirror production traffic when applying patches.
  • Add health checks and synthetic traffic tests that validate request paths and connection behavior after sidecar updates.
  • Reconcile third-party CNI and CRD compatibility (Calico, Cilium, etc.) against target GKE versions rather than assuming automatic compatibility.
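
The staged-rollout and synthetic-traffic items above can be turned into an automated gate. The following is a minimal sketch, assuming you already collect probe results (status code and latency) from a baseline and a canary; the `ProbeResult` type, the function names, and both thresholds are hypothetical and should be tuned to your own SLOs.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeResult:
    status: int        # HTTP status code returned to the synthetic probe
    latency_ms: float  # end-to-end latency observed by the probe


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def canary_passes(
    baseline: Sequence[ProbeResult],
    canary: Sequence[ProbeResult],
    max_5xx_rate: float = 0.01,   # hypothetical threshold: 1% server errors
    max_p99_ratio: float = 1.25,  # hypothetical threshold: 25% p99 regression
) -> bool:
    """Return True when the canary's 5xx rate and p99 latency are acceptable."""
    if not baseline or not canary:
        raise ValueError("both sample sets must be non-empty")
    errors = sum(1 for r in canary if 500 <= r.status <= 599)
    error_rate = errors / len(canary)
    baseline_p99 = percentile([r.latency_ms for r in baseline], 99)
    canary_p99 = percentile([r.latency_ms for r in canary], 99)
    return error_rate <= max_5xx_rate and canary_p99 <= baseline_p99 * max_p99_ratio
```

A CI job could call `canary_passes` after each node pool or namespace is patched and halt the rollout on a `False` result. Comparing against a baseline rather than a fixed number keeps the gate meaningful for workloads with very different latency profiles.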

Vertex AI, Data Science Agent GA, and Gemini availability

Data Science Agent (DSA) for Colab Enterprise and BigQuery reached GA. This provides a supported integration for notebooks that run queries and experiments directly against BigQuery, improving provenance and reducing ad-hoc data extraction.

Operational implications:

  • Centralize access: enforce service-account-based access for DSA, apply least-privilege IAM, and manage BigQuery job quotas and audit logs as primary control points.
  • Capture reproducibility artifacts (SQL job specs, query params, sample datasets) when promoting notebook work to production training jobs.
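
One way to capture the reproducibility artifacts mentioned above is a small provenance record stored alongside the promoted job. This is a sketch only: the `QueryProvenance` fields and the fingerprint scheme are assumptions for illustration, not part of DSA or any BigQuery API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class QueryProvenance:
    """Hypothetical record describing one notebook query being promoted."""
    sql: str
    params: Dict[str, Any]
    sample_table: str  # assumed: fully qualified name of the sample dataset
    notebook: str      # assumed: path or ID of the originating notebook

    def normalized(self) -> Dict[str, Any]:
        record = asdict(self)
        # Collapse whitespace so cosmetic SQL edits do not change the fingerprint.
        record["sql"] = " ".join(self.sql.split())
        return record

    def fingerprint(self) -> str:
        """Stable SHA-256 over the normalized record (dict keys sorted)."""
        payload = json.dumps(self.normalized(), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def to_manifest(self) -> str:
        """JSON manifest suitable for storing next to the training job config."""
        manifest = self.normalized()
        manifest["fingerprint"] = self.fingerprint()
        return json.dumps(manifest, sort_keys=True, indent=2)
```

Storing the manifest with the training job lets a reviewer confirm later that the production query matches what was explored in the notebook.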

Gemini Enterprise customers gained Limited Availability access to Gemini 3.1 Pro and Gemini 3 Flash. For teams using Vertex AI or enterprise Gemini APIs:

  • Incorporate model SLOs and latency expectations into API contract tests and downstream SLAs.
  • Load-test both the Pro and Flash variants: Flash models may favor throughput and latency over output quality at a different cost point, so measure each against your own workload.
  • Track tokenization and generation behavior across model versions to detect changes in hallucination rate or output distribution that affect labeling and downstream processing.
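
Tracking output distribution across model versions can start with a simple categorical comparison. The sketch below assumes you reduce each model response to a label (for example a classifier output or a bucketed response length) for the same prompt set on both versions; the function names and the 0.1 threshold are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def total_variation(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Total variation distance between two empirical label distributions.

    0.0 means identical distributions; 1.0 means no overlap at all.
    """
    if not a or not b:
        raise ValueError("both label sequences must be non-empty")
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a[label] / len(a) - counts_b[label] / len(b)) for label in labels
    )


def has_drifted(
    old_labels: Sequence[Hashable],
    new_labels: Sequence[Hashable],
    threshold: float = 0.1,  # hypothetical alerting threshold
) -> bool:
    """Flag a model version whose output labels shifted more than the threshold."""
    return total_variation(old_labels, new_labels) > threshold
```

This does not measure hallucination directly; it only flags that outputs shifted enough to warrant a closer manual or evaluation-set review.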

BigQuery fluid scaling: per-second billing and ephemeral reservations

BigQuery's fluid-scaling updates (per-second billing and removal of a minimum reservation duration) change how you provision analytic capacity and optimize cost.

Why it matters:

  • Short-lived, spike-oriented reservations become cost-effective; you can provision capacity for narrow job windows without a multi-minute minimum charge.
  • Existing cost models that assumed sustained reservations should be revisited; burst profiles and concurrency patterns should guide reservation strategy.
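
The billing effect described above is easy to see with arithmetic. This is a sketch with hypothetical numbers: the price per slot-hour and the 600-second minimum are placeholders, not published BigQuery rates, so substitute the figures from your own contract.

```python
def reservation_cost(
    slots: int,
    seconds: float,
    price_per_slot_hour: float,
    min_billable_seconds: float = 0.0,
) -> float:
    """Cost of holding `slots` for `seconds`, billed per second.

    `min_billable_seconds` models a minimum charge window; 0 means pure
    per-second billing with no minimum.
    """
    billable = max(seconds, min_billable_seconds)
    return slots * price_per_slot_hour * billable / 3600


# Hypothetical inputs: 500 slots held for a 90-second job at $0.06 per slot-hour.
per_second = reservation_cost(500, 90, 0.06)
with_minimum = reservation_cost(500, 90, 0.06, min_billable_seconds=600)
```

With these placeholder inputs the 90-second window costs about $0.75 under pure per-second billing and about $5.00 under a 10-minute minimum, which is why short bursts only become economical once the minimum is removed.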

Operational guidance:

  • Automate reservation orchestration: create and tear down reservations around heavy jobs via your scheduler (Airflow, Cloud Composer, Vertex Pipelines) to avoid paying for idle capacity.
  • Instrument slot utilization by job type and duration; use metrics to decide when ephemeral reservations are cheaper than continuous capacity.
  • Update FinOps dashboards and alerts to reflect per-second granularity so forecasting and chargebacks match actual spend.
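
Deciding when ephemeral reservations beat continuous capacity can be framed as a comparison over a day's job windows. The sketch below makes simplifying assumptions that are not from the release notes: windows do not overlap, continuous capacity is sized to the peak window, and both prices are hypothetical inputs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JobWindow:
    slots: int    # slots needed while the job runs
    seconds: int  # how long the job holds those slots


def ephemeral_daily_cost(windows: Sequence[JobWindow], price_per_slot_hour: float) -> float:
    """Pay only for slot-seconds actually reserved around each job."""
    slot_seconds = sum(w.slots * w.seconds for w in windows)
    return slot_seconds * price_per_slot_hour / 3600


def continuous_daily_cost(windows: Sequence[JobWindow], price_per_slot_hour: float) -> float:
    """Hold the peak slot count for the full 24 hours."""
    peak_slots = max(w.slots for w in windows)
    return peak_slots * 24 * price_per_slot_hour


def cheaper_strategy(
    windows: Sequence[JobWindow],
    ephemeral_price: float,   # hypothetical on-demand price per slot-hour
    continuous_price: float,  # hypothetical committed price per slot-hour
) -> str:
    """Return "ephemeral" or "continuous" for one day's windows."""
    if not windows:
        raise ValueError("at least one job window is required")
    ephemeral = ephemeral_daily_cost(windows, ephemeral_price)
    continuous = continuous_daily_cost(windows, continuous_price)
    return "ephemeral" if ephemeral < continuous else "continuous"
```

Feeding this with measured slot utilization per job type, rather than estimates, gives a defensible basis for the reservation strategy and for the FinOps forecast.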

Architectural option:

  • Combine ephemeral BigQuery reservations with preemptible (Spot) upstream compute for ETL, and coordinate orchestration so that compute and BigQuery capacity spin up together.

Cloud CDN + Application Load Balancer: URL-map-level cache policies

The global external Application Load Balancer now supports Cloud CDN cache policies configured at URL-map granularity (hostname, path, headers, and query parameters). This enables fine-grained cache-key design without pushing logic into application code.

Practical effects:

  • Apply different TTLs and cache-key normalization strategies for APIs versus static assets on the same host.
  • Exclude or normalize high-cardinality query parameters at the edge to reduce cache fragmentation and egress costs.
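
To estimate how much normalization would reduce fragmentation before changing any load balancer configuration, you can model the cache key offline. This sketch is not Cloud CDN's actual key computation; the ignored parameter names and the key format are assumptions chosen for illustration.

```python
from typing import Collection
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical set of parameters that do not affect the response body.
IGNORED_PARAMS = frozenset({"utm_source", "utm_medium", "utm_campaign", "session_id"})


def cache_key(url: str, ignored: Collection[str] = IGNORED_PARAMS) -> str:
    """Model a normalized cache key: lowercase host, path, sorted kept params."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    )
    host = parts.netloc.lower()
    query = urlencode(kept)
    return f"{host}{parts.path}?{query}" if query else f"{host}{parts.path}"


def distinct_keys(urls: Collection[str], ignored: Collection[str] = IGNORED_PARAMS) -> int:
    """Number of distinct cache entries the given request URLs would create."""
    return len({cache_key(url, ignored) for url in urls})
```

Comparing `distinct_keys(urls, ignored=())` with `distinct_keys(urls)` over a sample of logged request URLs gives a rough fragmentation estimate for a candidate exclusion list.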

Operational advice:

  • Run a cache analysis to identify high-cost egress paths and high-cardinality parameters; use URL-map rules to normalize or exclude irrelevant params.
  • Use path-level TTLs for immutable assets and short TTLs for dynamic API responses.
  • Test for unintended cache collisions and validate header/query matching with synthetic traffic before rollout.
  • Ensure URL-map and Cloud Armor rule ordering is explicit in IaC so security rewrites or middleware don't inadvertently change cache keys.
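
The cache analysis in the first item can begin with a count of distinct values per query parameter across logged request URLs. This is a minimal sketch assuming you can export request URLs from your load balancer logs; the function names and the threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set
from urllib.parse import parse_qsl, urlsplit


def param_cardinality(urls: Iterable[str]) -> Dict[str, int]:
    """Count distinct values seen for each query parameter."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for url in urls:
        for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            values[key].add(value)
    return {key: len(seen) for key, seen in values.items()}


def high_cardinality_params(urls: Iterable[str], threshold: int) -> List[str]:
    """Parameters with at least `threshold` distinct values, sorted by name."""
    return sorted(
        key for key, count in param_cardinality(urls).items() if count >= threshold
    )
```

Parameters flagged here are candidates for review, not automatic exclusion: a high-cardinality parameter that changes the response (such as an item ID) must stay in the cache key.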

Practical checklist

  • Treat minor ASM patches as behavioral changes: staged rollouts, canaries, and targeted health checks are essential.
  • Harden notebook governance now that DSA is GA: use service accounts, least-privilege IAM, and capture provenance for promoted queries.
  • Bake model-version testing into CI: load-test Gemini variants for latency, throughput, and output quality.
  • Rework BigQuery capacity strategy: automate ephemeral reservations where it reduces cost and align FinOps tooling with per-second billing.
  • Adopt URL-map-level CDN cache policies to reduce tail latency and egress costs, and validate cache-key logic with tests.

Subscribe to GCP release notes and feed them into your change governance pipeline. These updates are incremental but operationally meaningful: they change upgrade behavior, billing math, and edge caching controls. Platform teams that convert these changes into automated practices, clear SLAs, and updated cost models will reduce risk and lower operating costs.
