Overview
Amazon's recent platform updates cluster around three operational shifts that platform teams should plan for: general availability (GA) of GPT-5.5, GPT-5.4, and Codex on Amazon Bedrock under pay-per-token pricing; a re:Invent emphasis on "Amazon EKS Capabilities" for orchestrating containerized workloads; and an ergonomics improvement, a single macOS universal installer for AWS CLI v2 (starting at 2.30.0) that covers both Apple silicon and Intel Macs.
Amazon Bedrock: GA models and what changes for platforms
Amazon lists GPT-5.5, GPT-5.4, and Codex as generally available on Bedrock. Treat these models as production-grade managed endpoints within AWS's inference ecosystem. Bedrock is expected to provide authentication, routing, and integration with AWS observability and governance services, but verify the specifics against your compliance requirements before relying on them.
Immediate platform impacts
- Model registry and telemetry: add entries for GPT-5.5/GPT-5.4/Codex to your model catalog and capture token usage as a first-class billing metric.
- Governance boundary: hosting on Bedrock delegates portions of the audit and key-management surface to AWS. Confirm which controls (CloudTrail events, KMS integration, access logging, data residency) are covered by AWS-managed governance and which you must implement.
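One way to keep the catalog entry and the governance checklist together is to record, per model, who owns each control named above. The sketch below assumes a simple in-memory catalog; the model IDs are placeholders rather than official Bedrock model identifiers, and the three-way ownership split (AWS, platform, unverified) is an illustrative convention, not an AWS concept.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Owner(Enum):
    """Who is responsible for a governance control (illustrative convention)."""
    AWS = "aws"                # confirmed as covered by AWS-managed governance
    PLATFORM = "platform"      # your team must implement it
    UNVERIFIED = "unverified"  # not yet confirmed either way


# Controls named in the text; extend as your compliance mapping grows.
CONTROLS = ("cloudtrail_events", "kms_integration", "access_logging", "data_residency")


@dataclass
class CatalogEntry:
    model_id: str                   # placeholder ID, not an official Bedrock model ID
    provider: str
    billing_metric: str = "tokens"  # token usage as a first-class billing metric
    controls: Dict[str, Owner] = field(
        default_factory=lambda: {c: Owner.UNVERIFIED for c in CONTROLS}
    )

    def open_items(self) -> List[str]:
        """Controls whose ownership has not been confirmed yet."""
        return [c for c, o in self.controls.items() if o is Owner.UNVERIFIED]

    def platform_owned(self) -> List[str]:
        """Controls your team has to implement itself."""
        return [c for c, o in self.controls.items() if o is Owner.PLATFORM]


catalog = {
    name: CatalogEntry(model_id=name, provider="bedrock")
    for name in ("gpt-5.5", "gpt-5.4", "codex")  # placeholder IDs
}
# Record findings as you verify them (example values only).
catalog["gpt-5.5"].controls["cloudtrail_events"] = Owner.AWS
catalog["gpt-5.5"].controls["data_residency"] = Owner.PLATFORM
print(catalog["gpt-5.5"].open_items())  # ['kms_integration', 'access_logging']
```

Because every entry starts with all controls marked unverified, `open_items()` doubles as the remaining compliance to-do list for that model.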
If you operate a multi-provider LLM strategy, decide routing policies up-front: route high-fidelity or code-generation requests to Bedrock models and fall back to alternative providers where needed. See companion guidance: AWS: Bedrock enhancements, Multicloud Interconnect, and Amazon S3 Files — architecture implications.
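A routing policy with fallback can be as small as an ordered provider list per request class. The sketch below is illustrative only: the request classes, provider names, and the `ProviderError` convention are hypothetical, and real clients (for example a wrapper around the Bedrock runtime API) would need to be adapted to the `prompt -> text` signature assumed here.

```python
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Raised by a client wrapper when a call fails (throttling, outage, etc.)."""


# Hypothetical client signature: prompt in, completion text out.
Client = Callable[[str], str]

# Hypothetical policy: request class -> providers in order of preference.
ROUTES: Dict[str, List[str]] = {
    "code-generation": ["bedrock-codex", "alt-provider"],
    "high-fidelity": ["bedrock-gpt-5.5", "alt-provider"],
    "default": ["alt-provider", "bedrock-gpt-5.4"],
}


def route(request_class: str, prompt: str, clients: Dict[str, Client]) -> Tuple[str, str]:
    """Try providers in policy order; return (provider_name, response)."""
    chain = ROUTES.get(request_class, ROUTES["default"])
    errors: List[str] = []
    for name in chain:
        client = clients.get(name)
        if client is None:
            errors.append(f"{name}: not configured")
            continue
        try:
            return name, client(prompt)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Usage with stand-in clients.
def throttled(prompt: str) -> str:
    raise ProviderError("throttled")


def echo(prompt: str) -> str:
    return prompt.upper()


clients = {"bedrock-codex": throttled, "alt-provider": echo}
print(route("code-generation", "hi", clients))  # ('alt-provider', 'HI')
```

Keeping the policy as data (`ROUTES`) makes it easy to review and vary per environment; only `ProviderError` triggers fallback, so programming errors still surface immediately.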
Pay-per-token pricing: operational implications
Pay-per-token pricing aligns cost with usage, but because spend scales directly with request volume and with prompt and response length, it also demands tighter controls and better visibility.
Key priorities
- Token accounting: emit token consumption per request to your billing pipeline and associate it with owner and deployment metadata.
- Quotas and rate limiting: implement quotas per team/application and throttle at the inference ingress (API gateway, intermediary proxy) to prevent runaway spend.
- Observability: extend SLOs and dashboards to include cost per successful inference, tokens per 1k requests, response-size distributions, and tail latency.
- Governance verification: confirm which Bedrock events are visible in CloudTrail, how KMS keys are used for any persisted artifacts, and whether inference inputs/outputs are covered by AWS retention and residency policies.
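Quotas at the ingress can be modeled as a token bucket per team, where the bucket holds LLM tokens rather than requests. The sketch below is a single-process illustration under several assumptions: the team names and limits are made up, each request is debited an estimate made before the call (reconciling against the actual usage reported after the response is left out), and a production gateway would keep this state in a shared store rather than in memory.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TokenBucket:
    """Per-team LLM-token budget: holds up to `capacity`, refills at `refill_per_sec`."""
    capacity: float
    refill_per_sec: float
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full
        self.updated = self.clock()

    def try_consume(self, amount: float) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_per_sec)
        self.updated = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


class QuotaGate:
    """Hypothetical ingress check: admit a request only if its team has budget."""

    def __init__(self, budgets: Dict[str, TokenBucket]):
        self.budgets = budgets

    def admit(self, team: str, estimated_tokens: int) -> bool:
        bucket = self.budgets.get(team)
        # Unknown teams are denied by default.
        return bucket is not None and bucket.try_consume(estimated_tokens)


# Usage with a fake clock so the example is deterministic.
now = [0.0]
gate = QuotaGate({"search": TokenBucket(10_000, 100, clock=lambda: now[0])})
print(gate.admit("search", 8_000))  # True: 2,000 tokens left
print(gate.admit("search", 5_000))  # False: over budget, throttle or queue
now[0] = 30.0                        # 30 s later, 3,000 tokens refilled
print(gate.admit("search", 5_000))  # True
```

Rejecting at the gateway, before the model call, is what stops runaway spend; the per-team bucket is also a natural place to emit "throttled" metrics.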
Operational playbook
- Tag inference requests with owner IDs and deployment versions.
- Stream token counts in real time to a metrics backend and run daily burn reports correlating tokens → cost → model version.
- Use a cost-aware router to prefer lower-cost models on high-volume, lower-fidelity paths.
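The daily burn report in this playbook is essentially a group-by over tagged usage records. The sketch below assumes each request is tagged at ingress with an owner ID and a model-version string, and it uses placeholder per-1k-token prices that are not actual Bedrock rates. It also computes the observability metrics listed under Key priorities: cost per successful inference, tokens per 1k requests, and p99 latency (nearest rank).

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Placeholder USD prices per 1k tokens as (input, output). NOT real Bedrock
# prices; load actual rates from your pricing source.
PRICE_PER_1K: Dict[str, Tuple[float, float]] = {
    "gpt-5.5@v1": (0.010, 0.030),
    "codex@v3": (0.002, 0.008),
}


@dataclass
class UsageRecord:
    owner: str           # owner ID tagged at ingress
    model_version: str   # model + deployment version tag
    input_tokens: int
    output_tokens: int
    success: bool
    latency_ms: float


def cost(r: UsageRecord) -> float:
    p_in, p_out = PRICE_PER_1K[r.model_version]
    return r.input_tokens / 1000 * p_in + r.output_tokens / 1000 * p_out


def burn_report(records: Iterable[UsageRecord]) -> Dict[Tuple[str, str], dict]:
    """Aggregate one day's records into tokens -> cost per (owner, model_version)."""
    groups: Dict[Tuple[str, str], List[UsageRecord]] = defaultdict(list)
    for r in records:
        groups[(r.owner, r.model_version)].append(r)

    report = {}
    for key, rs in groups.items():
        total = sum(cost(r) for r in rs)
        successes = sum(1 for r in rs if r.success)
        tokens = sum(r.input_tokens + r.output_tokens for r in rs)
        latencies = sorted(r.latency_ms for r in rs)
        report[key] = {
            "requests": len(rs),
            "tokens": tokens,
            "cost_usd": round(total, 6),
            # Total spend divided by successes, so any spend on failures shows up here.
            "cost_per_success": round(total / successes, 6) if successes else None,
            "tokens_per_1k_requests": tokens / len(rs) * 1000,
            # Nearest-rank p99: smallest sample with at least 99% of samples <= it.
            "p99_latency_ms": latencies[math.ceil(0.99 * len(latencies)) - 1],
        }
    return report


records = [
    UsageRecord("search", "gpt-5.5@v1", 1000, 500, True, 120.0),
    UsageRecord("search", "gpt-5.5@v1", 2000, 1000, False, 900.0),
    UsageRecord("ci-bot", "codex@v3", 4000, 2000, True, 300.0),
]
for (owner, model), row in burn_report(records).items():
    print(owner, model, row["tokens"], row["cost_usd"], row["cost_per_success"])
```

Grouping on the model-version tag, not just the model name, is what lets the report correlate a cost jump with a specific deployment.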
Amazon EKS Capabilities at re:Invent 2025: what to map to platform primitives
The "EKS Capabilities" theme signals continued investment in EKS as the canonical managed control plane for containerized workloads, including inference and batch jobs.
Top concerns for inference on EKS
- Workload consolidation: plan for inference sidecars, GPU scheduling, and autoscaling patterns.
- Platform APIs and add-ons: expect higher-level primitives for observability, policy, and lifecycle; validate first-party integrations for CI/CD and resource policy enforcement.
- Resource management: reinforce node-level autoscaling, mixed-instance strategies, spot fallback handling, and checkpointing for GPU workloads.
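For autoscaling inference pods on a throughput signal, the core rule of the Kubernetes Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change while the ratio stays within a tolerance (0.1 by default). The function below reproduces that arithmetic for a tokens/sec-per-pod metric so you can sanity-check capacity-test results. The target throughput and replica bounds are hypothetical, and in a cluster you would let the HPA compute this from a custom or external metric rather than run it yourself.

```python
import math


def desired_replicas(current_replicas: int,
                     avg_tokens_per_sec_per_pod: float,
                     target_tokens_per_sec_per_pod: float,
                     min_replicas: int = 1,
                     max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """HPA-style replica count for a per-pod tokens/sec metric."""
    ratio = avg_tokens_per_sec_per_pod / target_tokens_per_sec_per_pod
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: don't flap
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 1,500 tok/s against a 1,000 tok/s target -> scale out.
print(desired_replicas(4, 1500, 1000))   # 6
# Within the 10% tolerance band -> hold.
print(desired_replicas(4, 1050, 1000))   # 4
# Mostly idle -> scale in, but never below min_replicas.
print(desired_replicas(10, 50, 1000))    # 1
```

Since GPU node provisioning is slow relative to pod scheduling, the max_replicas bound should reflect the GPU quota you have actually validated, not just the desired ceiling.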
Action checklist
- Validate GPU quotas and node-group configurations across regions; run capacity tests that reflect model memory and burst patterns.
- Integrate model-serving telemetry (tokens/sec, requests/sec) into cluster-level monitoring (Prometheus/Grafana) and correlate instance counts with tokens/sec.
- Enforce cost and security policies via admission controllers or policy engines (OPA/Gatekeeper or equivalent).
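Gatekeeper policies are written in Rego, but the rule logic is easy to prototype and unit-test first. The Python sketch below checks a Pod manifest (as a dict) against a few cost and security rules: GPU pods (those with an `nvidia.com/gpu` limit, the NVIDIA device plugin's resource name) must carry owner and cost-center labels, every container needs a memory limit, and privileged containers are rejected. The label keys and the rule set are assumptions for illustration, not a recommended baseline.

```python
from typing import Any, Dict, List

GPU_RESOURCE = "nvidia.com/gpu"                 # NVIDIA device-plugin resource name
REQUIRED_GPU_LABELS = ("owner", "cost-center")  # hypothetical label keys


def _limits(container: Dict[str, Any]) -> Dict[str, Any]:
    return (container.get("resources") or {}).get("limits") or {}


def violations(pod: Dict[str, Any]) -> List[str]:
    """Return policy violations for a Pod manifest; an empty list means admit."""
    problems: List[str] = []
    labels = (pod.get("metadata") or {}).get("labels") or {}
    containers = (pod.get("spec") or {}).get("containers") or []

    # Cost policy: GPU workloads must be attributable to an owner and cost center.
    if any(GPU_RESOURCE in _limits(c) for c in containers):
        for key in REQUIRED_GPU_LABELS:
            if not labels.get(key):
                problems.append(f"GPU pod missing label '{key}'")

    for c in containers:
        name = c.get("name", "<unnamed>")
        # Security policy: no privileged containers.
        if (c.get("securityContext") or {}).get("privileged"):
            problems.append(f"container '{name}' is privileged")
        # Resource policy: every container declares a memory limit.
        if "memory" not in _limits(c):
            problems.append(f"container '{name}' has no memory limit")
    return problems


pod = {
    "metadata": {"labels": {"owner": "search"}},
    "spec": {"containers": [{
        "name": "server",
        "resources": {"limits": {"nvidia.com/gpu": 1, "memory": "40Gi"}},
    }]},
}
print(violations(pod))  # ["GPU pod missing label 'cost-center'"]
```

Once the rules settle, port them to a Gatekeeper ConstraintTemplate (or your policy engine of choice) and keep these manifests as test fixtures.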
AWS CLI v2.30.0+ macOS universal installer: developer tooling and CI/CD ergonomics
AWS CLI v2.30.0+ ships a universal macOS installer that supports both Apple silicon and Intel in one package. This reduces branching in bootstrap scripts and image builders for macOS developer images and self-hosted runners.
Practical changes
- Use a single installer URL in macOS image builders and validate signatures as part of immutable image creation.
- Continue to pin the CLI version in onboarding and CI images and run acceptance tests to detect behavioral changes across versions.
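A version-pin acceptance check can run right after the installer in an image build. The sketch below parses the first field of `aws --version` output, which has the form `aws-cli/<version> Python/<version> <platform> ...`, and fails if the installed version differs from the pin or if the pin predates 2.30.0, the first release the text cites for the universal macOS installer. The function names and the pin value are illustrative.

```python
import re
from typing import Tuple

# First release with the universal macOS installer, per the text above.
MIN_UNIVERSAL = (2, 30, 0)


def parse_cli_version(version_output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `aws --version` output."""
    m = re.match(r"aws-cli/(\d+)\.(\d+)\.(\d+)", version_output.strip())
    if not m:
        raise ValueError(f"unrecognized version output: {version_output!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def check_pin(version_output: str, pinned: str) -> None:
    """Raise if the installed CLI is not exactly the pinned, universal-capable version."""
    installed = parse_cli_version(version_output)
    expected = parse_cli_version(f"aws-cli/{pinned}")
    if expected < MIN_UNIVERSAL:
        raise RuntimeError(f"pin {pinned} predates the universal macOS installer")
    if installed != expected:
        found = ".".join(map(str, installed))
        raise RuntimeError(f"installed {found}, expected pinned {pinned}")


# In an image build, pass in the captured output of `aws --version`.
check_pin("aws-cli/2.30.0 Python/3.x.y Darwin/x.y.z exe/arm64", pinned="2.30.0")
```

Running the same check on both Apple silicon and Intel runners confirms that the single universal package produced the pinned version on each architecture.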
Platform takeaways and immediate priorities
Treat these announcements as a package of operational changes focused on billing, governance, and orchestration.
Immediate actions
- Add tokens to observability and cost stacks; implement per-owner token counters and daily burn reports.
- Centralize inference ingress behind a proxy or API gateway to enforce quotas, authentication, request normalization, and cost-aware routing.
- Decide the hosting mix: which workloads run on Bedrock (managed, faster path to production) versus self-hosted on EKS (more infrastructure control, potentially lower unit cost at scale). Document the decision criteria: latency SLA, cost per 1k tokens, data residency, and customization needs.
- Validate EKS capacity and GPU reservation strategies with production load tests; add spot-instance fallbacks and robust termination handling.
- Update developer onboarding and CI images to use the AWS CLI v2.30.0+ universal installer, with the version pinned.
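For the hosting-mix decision, the cost-per-1k-tokens criterion reduces to a break-even utilization. If a self-hosted replica costs H dollars per hour and sustains T tokens/sec at full load, running at utilization u gives an effective price of H / (T × 3600 × u) × 1000 dollars per 1k tokens; setting that equal to a managed per-1k price P gives u* = 1000 H / (3600 T P). The sketch below computes both. All inputs are hypothetical, and it deliberately ignores operations staffing, latency SLAs, data residency, and customization, which the criteria above weigh separately.

```python
def self_hosted_cost_per_1k(hourly_usd: float, tokens_per_sec: float,
                            utilization: float) -> float:
    """Effective USD per 1k tokens for a self-hosted replica.

    Idle capacity is still paid for, so lower utilization raises unit cost.
    """
    tokens_per_hour = tokens_per_sec * 3600 * utilization
    return hourly_usd / tokens_per_hour * 1000


def break_even_utilization(hourly_usd: float, tokens_per_sec: float,
                           managed_usd_per_1k: float) -> float:
    """Utilization above which self-hosting beats the managed per-token price."""
    return 1000 * hourly_usd / (3600 * tokens_per_sec * managed_usd_per_1k)


# Hypothetical inputs: $10/h GPU node, 2,000 tok/s sustained, $0.005 per 1k managed.
u_star = break_even_utilization(10.0, 2000, 0.005)
print(f"break-even utilization: {u_star:.1%}")  # 27.8%
print(f"at 50%: ${self_hosted_cost_per_1k(10.0, 2000, 0.5):.4f} per 1k tokens")  # $0.0028
```

The utilization figure is the one to validate in load tests: bursty traffic that forces you to provision for peak will sit well below average throughput.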
Longer-term items
- Build a cost-aware router that uses tokens and model cost-per-token as inputs to routing decisions.
- Extend policy-as-code to include token budgets and inference quotas tied to SSO/identity provisioning.
- Request explicit documentation from AWS on Bedrock's audit, CloudTrail visibility, and KMS protections and map those to compliance controls.
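A first cut of the cost-aware router can pick the cheapest model that meets a request's fidelity requirement and still fits the caller's remaining budget. The sketch below assumes an integer fidelity tier per model and placeholder prices; the model names, tiers, and rates are hypothetical, and the expected output length is an estimate you would derive from historical response-size distributions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str                # placeholder model identifier
    fidelity: int            # hypothetical quality tier, higher is better
    usd_per_1k_input: float  # placeholder prices, not published rates
    usd_per_1k_output: float

    def estimate(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.usd_per_1k_input
                + output_tokens * self.usd_per_1k_output) / 1000


def choose_model(options: List[ModelOption], min_fidelity: int,
                 input_tokens: int, expected_output_tokens: int,
                 remaining_budget_usd: float) -> Optional[ModelOption]:
    """Cheapest eligible model within budget, or None (reject or queue)."""
    candidates = [
        o for o in options
        if o.fidelity >= min_fidelity
        and o.estimate(input_tokens, expected_output_tokens) <= remaining_budget_usd
    ]
    if not candidates:
        return None
    # Lowest estimated cost wins; on a tie, prefer the higher-fidelity model.
    return min(candidates,
               key=lambda o: (o.estimate(input_tokens, expected_output_tokens), -o.fidelity))


CATALOG = [
    ModelOption("large", 3, 0.010, 0.030),
    ModelOption("medium", 2, 0.003, 0.012),
    ModelOption("small", 1, 0.0005, 0.0015),
]
print(choose_model(CATALOG, 1, 2000, 500, 1.00).name)  # small
print(choose_model(CATALOG, 3, 2000, 500, 1.00).name)  # large
print(choose_model(CATALOG, 3, 2000, 500, 0.01))       # None: over budget
```

Returning None rather than silently downgrading keeps the fidelity requirement a hard constraint; whether to reject, queue, or ask for more budget is left to the caller.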
Bottom line
Bedrock's GA of GPT-5.5, GPT-5.4, and Codex, the EKS Capabilities emphasis, and the macOS AWS CLI universal installer are operational levers. They lower some barriers to production, but they require platform teams to implement token-aware telemetry, centralize inference ingress, and ensure EKS clusters are provisioned and policy-controlled for large-scale LLM traffic.