Amazon Bedrock managed retrieval and agent web search: RAG without self-hosted vector stores

AWS just made one of the more consequential infrastructure bets for enterprise generative AI: a managed retrieval stack that means you no longer have to run your own vector store and ETL pipeline to build grounded RAG applications. That sounds like a plumbing improvement, but it changes where complexity, cost, and security live.

The retrieval problem AWS just swallowed

Amazon Bedrock's managed retrieval feature bundles native connectors to enterprise data sources, multi-format parsing for normalization, and a retriever component that can handle multi-step, context-rich queries. Critically, the whole thing is integrated with Bedrock's agent tooling and gateway so teams can build production RAG pipelines without owning vector stores, embedding pipelines, or bespoke ETL jobs.

This matters for three reasons.

First, it collapses a maintenance burden. Running a self-hosted vector index (Redis, FAISS, or a hand-tuned HNSW index) plus periodic re-embedding jobs is operational toil: data drift, freshness windows, hot-spotting, and backup/restore are all non-trivial. AWS taking the retrieval piece off your plate is the right call for most teams, many of which were already building brittle solutions that lacked observability.

Second, it creates a new trust boundary. Your enterprise connectors and parsed documents now cross an AWS-managed plane. That centralizes auditing and access control (good), but it also concentrates an attack surface engineers must treat like any other privileged service. Treat the Knowledge Base connectors as infrastructure: IAM roles, logging, data classification, and least-privilege network controls are not optional.
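To make the least-privilege point concrete, here is a minimal sketch of a read-only IAM policy scoped to a single knowledge base. It assumes the managed retrieval feature is governed by the existing Bedrock Knowledge Bases action `bedrock:Retrieve` and the `knowledge-base/<id>` ARN format; the helper name, account ID, and knowledge base ID are placeholders, not values from the announcement.

```python
import json


def kb_read_only_policy(region: str, account_id: str, kb_id: str) -> dict:
    """Build an IAM policy document that allows retrieval from one knowledge base only.

    Assumption: the caller needs nothing beyond bedrock:Retrieve on that resource.
    """
    kb_arn = f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "QueryOneKnowledgeBase",
                "Effect": "Allow",
                # No wildcard actions and no wildcard resources.
                "Action": ["bedrock:Retrieve"],
                "Resource": [kb_arn],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    policy = kb_read_only_policy("us-east-1", "123456789012", "EXAMPLEKBID")
    print(json.dumps(policy, indent=2))
```

Generating the document from code rather than hand-editing JSON keeps the scope reviewable in pull requests, which is the point of treating connectors as infrastructure.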

Third, it changes cost and performance trade-offs. Managed retrieval can fold embedding costs into the service bill and provide better freshness guarantees, but it also introduces latency and pricing characteristics you need to measure. If your latency target for inference plus retrieval over internal data is tight (say, a p50 of 50 ms), benchmark Bedrock's agent gateway from inside your own VPC before you commit.
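A benchmark of that kind can be as simple as the sketch below, which times repeated calls and reports percentiles. The `fake_retrieval` function is a hypothetical stand-in so the script runs without AWS credentials; in a real test you would swap in your own retrieval call (for example, a boto3 `bedrock-agent-runtime` `retrieve` request, assuming that API fronts your knowledge base) and run it from a host inside the VPC.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies_ms(call: Callable[[], object], runs: int = 50) -> List[float]:
    """Invoke `call` `runs` times and return wall-clock latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return p50, p95, and p99 for a list of latency samples (needs >= 2 samples)."""
    # quantiles(n=100) yields 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": statistics.median(samples), "p95": cuts[94], "p99": cuts[98]}


def fake_retrieval() -> None:
    """Hypothetical stand-in for a real retrieval request."""
    time.sleep(0.002)


if __name__ == "__main__":
    stats = summarize(measure_latencies_ms(fake_retrieval, runs=20))
    for name, value in stats.items():
        print(f"{name}: {value:.2f} ms")
```

Report tail percentiles alongside the median: a managed gateway can look fine at p50 and still miss a target at p95 or p99.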

Bedrock's agent-facing web search

Bedrock also includes a managed web-search integration for agents that returns cited, current results inside the secured AWS environment. That makes it much easier to build agents that need up-to-date facts without wiring up custom scraping or external search APIs. Again, it shifts control to AWS.

What this means operationally

You should treat Bedrock Knowledge Base connectors as first-class resources in your infra repo: include their provisioning, IAM, and audit log routing in your CI/CD.
Expect to rework your incident playbooks: retrieval failures (staleness, quota limits, connector authentication errors) will be operational incidents distinct from model outages.
Add SLOs for retrieval latency and freshness and wire them into release gates.
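As an illustration of wiring retrieval SLOs into a release gate, the sketch below compares observed numbers against thresholds and lists any violations. All names and threshold values here are hypothetical, and "staleness" is assumed to mean minutes since the last successful connector sync; substitute whatever your monitoring actually reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetrievalSlo:
    max_p95_latency_ms: float
    max_staleness_minutes: float


@dataclass(frozen=True)
class RetrievalObservation:
    p95_latency_ms: float
    staleness_minutes: float  # assumed: minutes since last successful sync


def gate_violations(slo: RetrievalSlo, obs: RetrievalObservation) -> List[str]:
    """Return human-readable SLO violations; an empty list means the gate passes."""
    violations = []
    if obs.p95_latency_ms > slo.max_p95_latency_ms:
        violations.append(
            f"p95 latency {obs.p95_latency_ms:.0f} ms exceeds {slo.max_p95_latency_ms:.0f} ms"
        )
    if obs.staleness_minutes > slo.max_staleness_minutes:
        violations.append(
            f"staleness {obs.staleness_minutes:.0f} min exceeds {slo.max_staleness_minutes:.0f} min"
        )
    return violations


if __name__ == "__main__":
    # Illustrative numbers only: latency is over budget, freshness is fine.
    slo = RetrievalSlo(max_p95_latency_ms=400.0, max_staleness_minutes=60.0)
    obs = RetrievalObservation(p95_latency_ms=520.0, staleness_minutes=15.0)
    problems = gate_violations(slo, obs)
    print("BLOCK RELEASE" if problems else "PASS")
    for problem in problems:
        print(" -", problem)
```

In a CI job, the caller would exit non-zero whenever the returned list is non-empty so the pipeline stops before deploy.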

G7 (Blackwell-class) on EC2: inference and graphics on familiar primitives

On the infrastructure side AWS also announced a new EC2 G7 instance family featuring Blackwell-class NVIDIA GPUs targeted at high-performance inference, 3D graphics, and analytics workloads. These instances are exposed through the usual EC2 constructs — instance families, EBS-backed storage, VPC networking — meaning teams can slot Blackwell-class accelerators into existing deployment and provisioning patterns.

Two practical implications: if you have on-prem Blackwell-optimized inference code, migration to EC2 G7 should be straightforward; and for graphics-heavy workloads the EC2 surface means fewer surprises than bespoke managed services. Pricing and comparative throughput against H100-class instances still need testing; expect AWS to position G7 for cost-effective production inference rather than raw training throughput.

ECS high-resolution metrics: finally usable autoscaling

Amazon ECS now publishes finer-grained service metrics to CloudWatch (sub-minute granularity where supported) that can feed directly into ECS Service Auto Scaling policies. This fixes a long-standing operational gap: 1-minute granularity and aggregated signals were a major reason autoscaling either lagged or oscillated. Higher-resolution metrics reduce over/under-provisioning on micro-spikes and let control-plane policies react more like edge load balancers.
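A small worked example shows why granularity matters. The sketch below averages the same synthetic per-second CPU trace over one 60-second window and over six 10-second windows. The trace, the 10-second period, and the 70% scaling target are all assumptions chosen for illustration, not published ECS values.

```python
from typing import List


def window_averages(samples: List[float], window: int) -> List[float]:
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


# Synthetic per-second CPU utilization (%) for one minute:
# a 40% baseline with a 10-second burst to 95%.
cpu = [40.0] * 20 + [95.0] * 10 + [40.0] * 30

one_minute = window_averages(cpu, 60)  # one datapoint for the whole minute
ten_second = window_averages(cpu, 10)  # six datapoints

TARGET = 70.0  # hypothetical scale-out threshold

if __name__ == "__main__":
    print(f"60 s average: {one_minute[0]:.2f}%  crosses target: {one_minute[0] > TARGET}")
    print(f"10 s peak:    {max(ten_second):.2f}%  crosses target: {max(ten_second) > TARGET}")
```

With these numbers the one-minute average comes out near 49%, well under the target, while one 10-second window sits at 95%. The coarse signal never sees the burst; the fine one does.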

This should be a no-brainer for teams that run latency-sensitive services on ECS. If you haven't revisited your scaling policies since you set them up, do it now — these metrics will change the tuning surface.

A broader pattern: AWS moving up the stack (and the trade-offs)

Alongside these launches, AWS previewed additional higher-level developer and security features: more release-readiness and autonomous testing for DevOps workflows, and expanded security integrations for threat modeling and code scanning. The trend is clear: AWS is absorbing more of the developer experience and security tooling into managed services.

That's valuable: it reduces bespoke glue and accelerates delivery. It's also a vendor-consolidation move that increases your blast radius. Platform teams must respond by shifting from ad-hoc scripts to formalized provisioning, policy as code, and tighter observability for these managed primitives.

Final thought

Managed retrieval in Bedrock is not incremental; it's a platform decision. If you let AWS own retrieval, you gain speed and lose a bit of direct control — but that loss is manageable if you treat these new primitives like critical system services: codify, monitor, and defend them. Over the next 12 months the real work won't be swapping vector stores for agent gateways; it'll be rearchitecting SLOs, IAM policies, and deploy pipelines around a world where AWS runs your retrieval and search.
