Amazon Bedrock Managed Knowledge Base, agent web search GA, SageMaker InvokeEndpointAsync inline payloads, EC2 G7 with Blackwell GPUs

AWS just handed platform teams a new attack surface and called it a feature. Bedrock Managed Knowledge Base and the now-GA agent web search mean you no longer have to stitch together vector stores, ingestion pipelines, and web crawlers to ground an agent: AWS hosts the retrieval layer, parses documents in multiple formats, and provides an agent-integrated retriever that plugs into Bedrock's agent runtime.

That is enormous for engineering velocity. It also fundamentally changes where you solve security and audit problems. The managed RAG data plane removes a lot of brittle custom engineering (no more home-grown S3 + Lambda + FAISS glue), but it centralizes access: connectors and the Bedrock agent gateway become the choke points for every enterprise knowledge surface. If you treat those as just another API key to hand out, you will regret it.

What changed, precisely

Amazon Bedrock Managed Knowledge Base: fully managed retrieval with native connectors and automated parsing for PDFs, docs, and HTML. The retrieval service integrates with Bedrock's agent runtime so agents can fetch and cite enterprise data without teams running a separate vector-store fleet.
Agent web search on Bedrock: GA. A managed web-search capability that lets agents ground answers in current web content with citations. AWS describes controls to keep connector data in customer-controlled AWS resources, reducing data movement and simplifying architectures that need fresh signals.
SageMaker async inference: InvokeEndpointAsync now supports inline payloads as an alternative to mandatory S3 staging and includes improved container image caching to reduce cold-start and scale-out latency for inference workloads.
EC2 G7 instances: GA. A new GPU instance family using NVIDIA Blackwell-generation GPUs for inference, graphics, and analytics workloads.
Amazon ECS: higher-resolution service metrics for more granular autoscaling.

Why this matters for platform teams

Managed retrieval and agent web search remove a lot of operational toil. You get fewer moving parts and faster time to production for agentic applications, which is the point. For a concise treatment of how this shifts the RAG model away from self-hosted vector stores, see our earlier writeup: Amazon Bedrock managed retrieval and agent web search: RAG without self-hosted vector stores.

But the real operational cost moves from infra to identity and audit: who can create knowledge bases, attach connectors (SharePoint, Confluence, S3, databases), or route agent traffic through the Bedrock agent gateway? Those connectors are now high-value attack surfaces because they give an agent potentially broad access to internal knowledge.

The SageMaker changes are the tidy flip side: inline payloads for InvokeEndpointAsync and improved container-image caching are practical wins. Making S3 optional for payload staging removes a set of S3 permissions and a staging round trip from the request path, and caching reduces cold starts when you scale out generative models. If you're serving LLM prompts at scale, this is materially simpler and cheaper engineering.
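
The inline-versus-S3 choice described above can be sketched as a small routing helper. This is a minimal illustration, not the SageMaker API: the function name, the `PayloadRoute` type, and the size threshold are all hypothetical, and the real inline limit should be taken from the SageMaker documentation.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only. It is NOT a documented
# SageMaker limit; look up the real inline payload limit before using this.
ASSUMED_INLINE_LIMIT_BYTES = 64 * 1024


@dataclass(frozen=True)
class PayloadRoute:
    mode: str        # "inline" or "s3"
    size_bytes: int  # size of the payload that was inspected


def choose_payload_route(
    payload: bytes, inline_limit: int = ASSUMED_INLINE_LIMIT_BYTES
) -> PayloadRoute:
    """Send small payloads inline; fall back to S3 staging for large ones."""
    size = len(payload)
    mode = "inline" if size <= inline_limit else "s3"
    return PayloadRoute(mode=mode, size_bytes=size)
```

Keeping the decision in one place means only the S3 fallback path needs staging-bucket permissions, which is where the IAM simplification would come from.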

Opinion: the move is the right call — but it's incomplete

Providing managed retrieval and web search was overdue; the alternative was teams improvising with brittle credential injection and poorly auditable connectors. AWS correctly centralized this capability. But they also consolidated responsibility. Platform teams must treat Bedrock Managed Knowledge Base and agent features as a privileged runtime: enforce least privilege on connector creation, log agent gateway calls, and integrate those logs into your SIEM and policy engines. If you don't, you'll trade operational complexity for a compliance and breach surface that's harder to investigate.
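
As one way to feed those logs into a SIEM, here is a minimal sketch that filters CloudTrail-style records for knowledge-base and connector changes. The record field names (`eventSource`, `eventName`, `eventTime`, `userIdentity.arn`) follow the standard CloudTrail record format. The watch list of event names is an assumption based on the existing Knowledge Bases for Amazon Bedrock API; the managed offering may log different names, so verify against your own trail.

```python
from typing import Iterable

BEDROCK_EVENT_SOURCE = "bedrock.amazonaws.com"

# Assumed watch list, modeled on existing Bedrock knowledge base actions.
# Adjust it after checking which event names your trail actually records.
SENSITIVE_EVENT_NAMES = frozenset(
    {"CreateKnowledgeBase", "CreateDataSource", "AssociateAgentKnowledgeBase"}
)


def select_sensitive_events(records: Iterable[dict]) -> list[dict]:
    """Return compact findings for Bedrock events on the watch list."""
    findings = []
    for record in records:
        if record.get("eventSource") != BEDROCK_EVENT_SOURCE:
            continue
        if record.get("eventName") not in SENSITIVE_EVENT_NAMES:
            continue
        findings.append(
            {
                "time": record.get("eventTime"),
                "action": record["eventName"],
                "principal": record.get("userIdentity", {}).get("arn", "unknown"),
            }
        )
    return findings
```

The output is deliberately small (time, action, principal) so it can be forwarded to whatever alerting or policy engine you already run.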

What to change tomorrow

Treat knowledge-base creation and connector onboarding as an RBAC workflow with approvals and audit hooks.
Route Bedrock agent gateway traffic through VPC endpoints, apply strict network policies, and log every request.
Use SageMaker inline payloads to simplify IAM (fewer S3 roles) and enable container-cache-aware scaling policies.
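
The first item can be approximated with an IAM guardrail. The sketch below builds a deny policy that blocks knowledge-base and data-source creation unless the caller carries an approval tag. The tag key `kb-connector-admin` is hypothetical, and the action names are an assumption taken from the existing Knowledge Bases for Amazon Bedrock IAM actions; confirm them against the service authorization reference before relying on this.

```python
import json

# Hypothetical principal tag that your approval workflow would set.
APPROVAL_TAG_KEY = "kb-connector-admin"


def build_kb_guardrail_policy(tag_key: str = APPROVAL_TAG_KEY) -> dict:
    """Deny knowledge base and data source creation for unapproved principals."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyKnowledgeBaseChangesWithoutApprovalTag",
                "Effect": "Deny",
                # Assumed action names; verify for the managed offering.
                "Action": [
                    "bedrock:CreateKnowledgeBase",
                    "bedrock:CreateDataSource",
                ],
                "Resource": "*",
                # With a negated operator, a missing tag also matches,
                # so untagged principals are denied as well.
                "Condition": {
                    "StringNotEquals": {f"aws:PrincipalTag/{tag_key}": "true"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_kb_guardrail_policy(), indent=2))
```

Attached as a service control policy or permissions boundary, a statement like this makes the approval workflow the only path to the tag, and therefore the only path to creating connectors.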

Parting thought

Managed RAG and agent web search are going to become the default developer experience for enterprise agents. That's great for velocity, but platform teams who treat this as just another API will wake up to audit gaps and data exfiltration risks they did not anticipate. The engineering bet here is clear: AWS wants to move retrieval and grounding into the cloud platform; your job is to make it safe, observable, and governed before someone asks why an agent quoted a confidential doc in a customer-facing reply.

Sources

Amazon Bedrock managed retrieval and agent web search: RAG without self-hosted vector stores

EC2 M9g/M9gd (Graviton5) GA — Cognito multi-Region & CMKs, MCP Server GA, WAF Bot Control edge metering

AWS WAF Bot Control: edge AI traffic monetization and bot billing