GCP

Vertex AI Agent Engine: Sessions, Memory Bank & Code Execution billing begins 2026-01-28

Vertex AI Agent Engine will charge for Sessions, Memory Bank, and Code Execution starting 2026-01-28. Teams must rethink agent state and cost telemetry.

June 19, 2026·3 min read·AI researched · AI written · AI reviewed

Google just put a hard date on something platform teams have been pricing into spreadsheets for months: Vertex AI Agent Engine will begin charging for Sessions, Memory Bank, and Code Execution on January 28, 2026. That single line item changes the economic calculus for any production agent architecture that uses persistent sessions or server-style code execution inside Agent Engine.

The surprise isn’t that Google will meter agent features, which was inevitable; it’s the combination of moves in the same window. Google lowered the base price for Vertex AI Agent Engine runtime while starting to charge for the things that actually enable production-grade agents: session state, Memory Bank long-term context, and inline code execution. Lower runtime costs plus discrete feature metering is a textbook nudge: make the runtime cheap, and make state and execution explicit and observable.

If you run agent orchestration at scale, this changes your cost model. Teams that hoisted conversational context into Memory Bank because “it’s convenient and cheap” are about to see a new recurring cost. Sessions billing creates a direct incentive against long-lived session habits. Code Execution charges mean that using Agent Engine as a generalized serverless runner (spinning up ephemeral environments to execute customer code or business logic) now carries a direct per-feature charge.

Two quick, practical implications:

  • You will move more state out of Memory Bank and into cheaper, auditable stores (BigQuery, Cloud Storage, Redis) and rehydrate only the slices the model needs. Persisting conversation logs remains fine; keeping heavy context live in Memory Bank does not.
  • Architect your agents to prefer stateless short-lived sessions or to checkpoint state asynchronously. If you depend on in-engine Code Execution for business logic, treat that as a billable compute path and gate it with runtime quotas and tracing.
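As a rough illustration of the two points above, the sketch below shows one way to rehydrate only a recent slice of externally stored conversation state and to gate billable code-execution calls behind a simple quota. It is a minimal sketch, not Agent Engine's API: `Turn`, `rehydrate`, and `ExecutionQuota` are hypothetical names, the character budget stands in for whatever context budget you actually use, and the fixed-window counter is just one possible quota policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import time


@dataclass
class Turn:
    """One stored conversation turn (hypothetical shape)."""
    role: str
    text: str


def rehydrate(turns: List[Turn], max_chars: int) -> List[Turn]:
    """Return the most recent turns whose combined text fits in max_chars."""
    selected: List[Turn] = []
    used = 0
    for turn in reversed(turns):              # walk newest first
        if used + len(turn.text) > max_chars:
            break                             # budget exhausted, stop here
        selected.append(turn)
        used += len(turn.text)
    selected.reverse()                        # restore chronological order
    return selected


@dataclass
class ExecutionQuota:
    """Fixed-window counter that caps billable code-execution calls."""
    limit: int
    window_seconds: float
    _window_start: Optional[float] = field(default=None, init=False)
    _count: int = field(default=0, init=False)

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and count the call if the current window has room."""
        now = time.monotonic() if now is None else now
        expired = (
            self._window_start is None
            or now - self._window_start >= self.window_seconds
        )
        if expired:
            self._window_start = now          # start a new window
            self._count = 0
        if self._count >= self.limit:
            return False                      # over quota, caller should skip
        self._count += 1
        return True
```

In this sketch the caller would load turns from its own store (BigQuery, Cloud Storage, Redis), pass only the `rehydrate` result to the agent, and check `allow()` before each code-execution request, logging denials so they show up in tracing.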

Google didn’t release these changes in isolation. Vertex AI also added lower-cost preview models aimed at cheaper, faster inference for certain workloads. For teams optimizing TCO, cheaper model options plus metered state and execution make the tradeoff between latency, quality, and cost more explicit and more predictable.

On the platform side, BigQuery’s recent autoscaling and billing improvements reached GA, with finer-grained behavior that aligns better with short-lived AI bursts. Google also put a Gemini-assisted SQL analysis feature for BigQuery into preview; it will surface performance and cost recommendations, lineage, and scheduling suggestions. Combine that with the existing ability to export billing data to BigQuery (and any newer billing-export previews) and you have the observability to tie agent features to dollar figures.
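To make "tie agent features to dollar figures" concrete, here is a small sketch that groups billing rows by agent feature. It assumes rows shaped like the Cloud Billing export to BigQuery (a nested `sku.description` and a numeric `cost`), already fetched into Python dictionaries. The keyword mapping and the SKU descriptions in `sample_rows` are hypothetical; check them against the SKU names that actually appear in your own export.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Keyword -> feature mapping. This is an assumption for illustration;
# real SKU descriptions may be worded differently.
FEATURE_KEYWORDS = {
    "Sessions": "session",
    "Memory Bank": "memory bank",
    "Code Execution": "code execution",
}


def cost_by_feature(rows: Iterable[Mapping]) -> Dict[str, float]:
    """Sum row costs per agent feature, bucketing unmatched SKUs as 'Other'."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        sku = row["sku"]["description"].lower()
        feature = next(
            (name for name, kw in FEATURE_KEYWORDS.items() if kw in sku),
            "Other",
        )
        totals[feature] += row["cost"]
    return dict(totals)


# Hypothetical rows; the SKU descriptions are made up for this example.
sample_rows = [
    {"sku": {"description": "Agent Engine Sessions events"}, "cost": 1.5},
    {"sku": {"description": "Agent Engine Memory Bank storage"}, "cost": 2.25},
    {"sku": {"description": "Agent Engine Code Execution vCPU"}, "cost": 4.0},
    {"sku": {"description": "Agent Engine Sessions events"}, "cost": 0.5},
    {"sku": {"description": "Agent Engine runtime vCPU"}, "cost": 3.0},
]

feature_totals = cost_by_feature(sample_rows)
```

The same grouping could be done directly in SQL against the export table; doing it in application code is only a convenience here so the example stays self-contained.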

Frank opinion: Google made the right move by making these agent primitives billable while simultaneously improving cost tooling. The alternative was a long tail of teams building ad-hoc, unmetered stateful agents and getting surprised by runaway invoices months after launch. Charging for Memory Bank and Code Execution forces engineering teams to build observable, auditable patterns now rather than cobble them together later.

That said, this will bite teams that treated Agent Engine as a free sandbox for production workloads. If your SREs aren’t already tracking feature-level agent usage and tying it into chargeback, you will have unpleasant conversations in Q1 2026.

What to do this week: inventory where you use Sessions, Memory Bank, and Code Execution; estimate monthly costs against the new pricing profile; and evaluate the Gemini and Veo preview models as lower-cost inference options. Use the BigQuery autoscaling improvements and billing exports to build the telemetry you’ll need to enforce quotas and chargeback.
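For the estimation step, a back-of-the-envelope calculator is often enough to start. The sketch below multiplies monthly usage by a unit price per feature. The feature keys, usage units, and prices are all placeholders invented for illustration, not Google's published rates; substitute the units and prices from the current pricing page.

```python
from typing import Dict


def estimate_monthly_cost(
    usage: Dict[str, float], unit_price: Dict[str, float]
) -> Dict[str, float]:
    """Return per-feature cost plus a 'total', rounded to cents."""
    missing = usage.keys() - unit_price.keys()
    if missing:
        # Fail loudly rather than silently pricing a feature at zero.
        raise KeyError(f"no unit price for: {sorted(missing)}")
    costs = {
        feature: round(units * unit_price[feature], 2)
        for feature, units in usage.items()
    }
    costs["total"] = round(sum(costs.values()), 2)
    return costs


# Placeholder numbers only; units and prices here are assumptions.
monthly_usage = {"sessions": 1000, "memory_bank": 200, "code_execution": 50}
placeholder_prices = {"sessions": 0.5, "memory_bank": 0.25, "code_execution": 2.0}

estimate = estimate_monthly_cost(monthly_usage, placeholder_prices)
```

Running the same function against last month's usage and against a projected growth scenario gives a quick range to compare with what the billing export later reports.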

Prediction: within six months we’ll see a small ecosystem of agent-cost managers that stitch BigQuery billing exports, Gemini-assisted cost analysis, and runtime quotas into a single dashboard. Platforms that treat agent state as a first-class cost object will be the ones that scale agents in production without surprise invoices.
