State of Platform Engineering Vol.4: Building AI-Native IDPs, Measuring DORA, and Platform Product Management

Introduction

Two recent sources, the platformengineering.org preview of State of Platform Engineering Vol.4 and Google Cloud's platform engineering research, converge on one operational thesis: internal developer platforms (IDPs) are evolving from tool aggregation toward product-first platforms, and that evolution now includes AI primitives as first-class capabilities. For platform teams, this means treating model endpoints, agents, and ML workflows as platform products with SLAs, telemetry, and governance built into the catalog and developer experience.

What an AI-native IDP looks like

An AI-native IDP does not bolt AI onto existing tooling; it exposes AI capabilities as self-service platform products. In practice, this takes three core architectural moves:

Catalog-first model artifacts: record model and agent ownership, validation state, approved providers, and contract metadata in the service catalog so discovery, access control, and billing are consistent with other platform products.
Opinionated scaffolds and golden paths: provide templates and Scaffolder workflows for common AI tasks (vector store provisioning, RAG pipelines, batch inference, agent scaffolds) that include security and cost guardrails.
LLMOps primitives in telemetry: collect input/output telemetry (sampled and redacted), drift signals, performance and cost metrics, and expose these for SLOs and automated workflows.
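
To make the catalog-first idea concrete, here is a minimal sketch of a model catalog record and a readiness check. It is an illustration only: the field names follow the metadata named in this article, while ValidationState, APPROVED_PROVIDERS, and catalog_problems are hypothetical names and not part of any real catalog API.

```python
from dataclasses import dataclass
from enum import Enum


class ValidationState(Enum):
    DRAFT = "draft"
    VALIDATED = "validated"
    DEPRECATED = "deprecated"


# Hypothetical allow-list; a real platform would read this from catalog metadata.
APPROVED_PROVIDERS = {"internal-serving", "vendor-a"}


@dataclass(frozen=True)
class ModelCatalogEntry:
    model_id: str
    owner: str
    provider: str
    validation_state: ValidationState
    schema_contract: str  # reference to the input/output contract for the model


def catalog_problems(entry: ModelCatalogEntry) -> list:
    """Return the reasons an entry is not ready to be offered as self-service."""
    problems = []
    if not entry.owner:
        problems.append("missing owner")
    if entry.provider not in APPROVED_PROVIDERS:
        problems.append(f"provider '{entry.provider}' is not approved")
    if entry.validation_state is not ValidationState.VALIDATED:
        problems.append(f"validation state is '{entry.validation_state.value}'")
    return problems
```

An empty list means the entry can be surfaced for discovery; anything else gives the owner a concrete item to fix before the model is offered through a golden path.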

These are product requirements as much as engineering tasks: platform product management (PMs, UX researchers) is necessary to prioritize trade-offs across infra, data, security, and developer experience.

DORA metrics as platform KPIs

Google Cloud's research reinforces that platform maturity correlates with DORA performance. The canonical four DORA metrics remain essential:

Deployment Frequency
Lead Time for Changes
Change Failure Rate
Time to Restore Service (commonly reported as mean time to recovery, MTTR)

Treat DORA as a platform concern by adding two concrete telemetry axes:

Platform-level DORA slices: measure DORA for platform consumers (teams using templates, Scaffolder workflows, and platform CI/CD), not only for product teams. Examples: time from scaffolding a service from a template to its first successful production deploy, median lead time for services created via the platform versus outside it, and template escape rate (the share of teams bypassing templates).
Feature-level SLIs for AI products: define SLIs per model/agent such as inference latency P95, inference error rate, requests-per-minute, and cost per 1k requests. Join these SLIs with change-failure and recovery metrics for model updates.
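
A platform-level DORA slice can be sketched as a partition over deployment records. The example below is a minimal sketch under assumptions: the Deployment record shape and the dora_slice helper are hypothetical, and the platform_scaffolder flag is assumed to be attached to each deployment by your CI/CD tagging.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Deployment:
    service: str
    platform_scaffolder: bool  # True when the service was created from a platform template
    lead_time_hours: float     # commit-to-production time for this change
    failed: bool               # True when the deploy caused a production failure


def dora_slice(deployments, via_platform, window_days):
    """Summarize three DORA metrics for one slice of deployments."""
    rows = [d for d in deployments if d.platform_scaffolder == via_platform]
    if not rows:
        return None  # no data for this slice in the window
    return {
        "deploys_per_day": len(rows) / window_days,
        "median_lead_time_hours": median(d.lead_time_hours for d in rows),
        "change_failure_rate": sum(d.failed for d in rows) / len(rows),
    }
```

Calling dora_slice twice over the same window, once with via_platform=True and once with False, gives the platform-versus-outside comparison described above. Recovery time is omitted here because it needs incident records rather than deployment records.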

Immediate instrumentation recommendations

Emit a standardized event whenever a Scaffolder template is used (template_id, user_id, repo, target_cluster, outcome) and ingest it into your analytics store (for example BigQuery or ClickHouse, optionally through an event pipeline such as Snowplow).
Tag CI/CD runs with a platform_scaffolder=true label so deployment frequency and lead time can be partitioned by platform usage.
Add model metadata to the catalog (model_id, provider, model_version, schema_contract, owner) and join it with telemetry for SLOs and cost allocation.
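
The template usage event could look something like the sketch below. The five fields come from the recommendation above; the event_type value, the emitted_at timestamp, the outcome vocabulary, and the serialize_event helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Assumed outcome vocabulary; adjust to whatever your Scaffolder integration reports.
ALLOWED_OUTCOMES = {"success", "failure", "abandoned"}


@dataclass
class TemplateUsageEvent:
    template_id: str
    user_id: str
    repo: str
    target_cluster: str
    outcome: str


def serialize_event(event, emitted_at=None):
    """Validate the event and render it as a JSON line for the analytics store."""
    if event.outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {event.outcome!r}")
    payload = asdict(event)
    payload["event_type"] = "platform.template_used"  # hypothetical event name
    payload["emitted_at"] = (emitted_at or datetime.now(timezone.utc)).isoformat()
    return json.dumps(payload, sort_keys=True)
```

Rejecting unknown outcomes at emit time keeps the downstream conversion and abandonment queries simple, because every row is guaranteed to carry one of a small set of values.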

Designing golden paths and platform product management for AI

Golden paths must balance opinionation and escape hatches; with AI the stakes for safety and cost increase. Implement these product/design elements:

Templates for common AI flows: RAG pipelines using approved vector stores, batch inference templates with autoscaling/spot fallbacks, and agent scaffolds with bounded tool access. Each template should include security and cost controls.
Approval workflows for new providers: wire approval state into catalog metadata and enforce automated gating (data access review, lineage checks) for provider additions.
Consumption SLAs and quotas: expose per-team quotas and autoscaling/cost-smoothing knobs to limit noisy-neighbour incidents.
User research and telemetry: treat templates and AI primitives as features — run short usability tests, measure template-to-production conversion and abandonment, and iterate based on results.

Backstage and Backstage-style IDPs are common vehicles for these elements: the catalog, Scaffolder, and plugin model let you bake in templates, approval flows, and metadata-driven governance. Platform PMs should own template conversion metrics and platform-level DORA slices.
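
As one way to compute the template conversion metrics mentioned above, the sketch below derives per-template conversion and abandonment rates. It assumes two inputs that your analytics store would need to supply: a list of (template_id, repo) scaffold records and the set of repos that have reached production. The template_conversion function name is hypothetical.

```python
from collections import defaultdict


def template_conversion(scaffold_records, production_repos):
    """Per-template share of scaffolded repos that reached production."""
    production_repos = set(production_repos)
    created = defaultdict(set)
    for template_id, repo in scaffold_records:
        created[template_id].add(repo)  # de-duplicate repeated scaffold events

    report = {}
    for template_id, repos in created.items():
        converted = len(repos & production_repos)
        rate = converted / len(repos)
        report[template_id] = {
            "scaffolded": len(repos),
            "converted": converted,
            "conversion_rate": rate,
            "abandonment_rate": 1 - rate,
        }
    return report
```

A template with high usage but low conversion is a candidate for a usability test; one with low usage overall may point to a discovery problem or to teams bypassing it.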

Operational and security implications

Security and governance

Data handling and PII: enforce data classification and automated redaction where required; ensure templates that provision training or RAG workflows include these checks.
Policy-as-code: use a policy engine to enforce runtime restrictions (tool access for agents, network egress rules), promotion gates for model versions, and data access constraints.
Auditable model lineage: require traceability from promoted model versions to training data, evaluation artifacts, and owners; register artifacts in a verified registry linked from the catalog.
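
The promotion gate and lineage requirements above can be illustrated with a small rule check. This is a plain-Python sketch, not a policy engine: in practice these rules would live in whatever policy-as-code tool you run. ModelVersion, ALLOWED_TOOLS, BLOCKED_CLASSIFICATIONS, and promotion_violations are all hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical policy data; a real setup would load these from versioned policy files.
ALLOWED_TOOLS = {"search_docs", "read_ticket"}
BLOCKED_CLASSIFICATIONS = {"restricted"}


@dataclass
class ModelVersion:
    model_id: str
    version: str
    owner: str
    training_data_ref: str    # pointer to the training data snapshot
    evaluation_ref: str       # pointer to evaluation artifacts
    data_classification: str  # classification of the data the model was trained on
    agent_tools: list = field(default_factory=list)


def promotion_violations(mv):
    """Return every rule the candidate version breaks; empty means promotable."""
    violations = []
    for attr in ("owner", "training_data_ref", "evaluation_ref"):
        if not getattr(mv, attr):
            violations.append(f"missing lineage field: {attr}")
    if mv.data_classification in BLOCKED_CLASSIFICATIONS:
        violations.append(
            f"data classification '{mv.data_classification}' is blocked from promotion"
        )
    extra_tools = sorted(set(mv.agent_tools) - ALLOWED_TOOLS)
    if extra_tools:
        violations.append(f"agent tools outside the allow-list: {', '.join(extra_tools)}")
    return violations
```

Returning all violations at once, rather than failing on the first, gives the model owner a complete checklist in a single approval round.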

Observability and LLMOps

Sampled, redacted I/O telemetry: full logging of every prompt is often impractical and risky; use sampling with contextual identifiers (user id, request id) and enforce redaction rules.
Drift detection and retraining triggers: monitor distributional changes (input embeddings, output characteristics) and create platform workflows for retraining or human review when thresholds are crossed.
Cost and performance dashboards: correlate model-level metrics with deployment frequency and MTTR to measure platform ROI and surface cost outliers to owners.
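
Sampled, redacted I/O telemetry might be sketched as follows. The approach shown, deterministic sampling by hashing the request id plus regex-based redaction, is one possible choice and not something the cited research prescribes. The two patterns are deliberately simplistic placeholders; real redaction rules need to follow your data classification policy.

```python
import hashlib
import re

# Simplistic illustrative patterns, not a complete PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\b\d{9,}\b")


def should_sample(request_id, rate):
    """Deterministically keep roughly `rate` of requests, keyed on the request id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform integer in [0, 2**64)
    return bucket < rate * 2**64


def redact(text):
    """Replace email addresses and long digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_DIGITS.sub("[NUMBER]", text)


def telemetry_record(request_id, prompt, completion, rate=0.05):
    """Return a redacted record for sampled requests, or None when not sampled."""
    if not should_sample(request_id, rate):
        return None
    return {
        "request_id": request_id,
        "prompt": redact(prompt),
        "completion": redact(completion),
    }
```

Hashing the request id, rather than drawing a random number, means every service that sees the same request makes the same sampling decision, so sampled traces stay complete end to end.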

Runtime architecture guidance

Abstract provider APIs behind a platform model service to centralize provider selection, caching, input/output filtering, quotas, and policy enforcement.
Offer both low-latency inference patterns (sidecars or managed low-latency tiers) and batch/autoscaling job clusters as templates.
Map model and agent calls to IAM principals for tracing, billing, and end-to-end policy enforcement.
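
The platform model service described above could start as a thin facade like the one sketched here. Everything in it is an assumption for illustration: providers are modeled as plain callables, quotas as per-principal request counts, and the cache as an in-memory dictionary. A production service would add filtering, eviction, and persistent accounting.

```python
class QuotaExceeded(Exception):
    """Raised when a principal has no remaining requests."""


class ModelService:
    def __init__(self, providers, routes, quotas):
        self._providers = providers     # provider name -> callable(prompt) -> str
        self._routes = routes           # model_id -> provider name
        self._remaining = dict(quotas)  # principal -> remaining request count
        self._cache = {}                # (model_id, prompt) -> completion
        self.audit_log = []             # one entry per served request

    def complete(self, principal, model_id, prompt):
        remaining = self._remaining.get(principal, 0)
        if remaining <= 0:
            raise QuotaExceeded(principal)
        self._remaining[principal] = remaining - 1  # cache hits also count against quota

        key = (model_id, prompt)
        cache_hit = key in self._cache
        if not cache_hit:
            provider = self._providers[self._routes[model_id]]
            self._cache[key] = provider(prompt)

        # Record the calling principal so usage can be traced and billed.
        self.audit_log.append(
            {"principal": principal, "model_id": model_id, "cache_hit": cache_hit}
        )
        return self._cache[key]
```

Because callers only name a model_id, the platform team can change the routes mapping to switch providers without touching consumer code, and the audit log gives a per-principal trail for cost allocation.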

A 6–12 month roadmap for Backstage-style IDPs

For teams running Backstage-style IDPs, the combined guidance implies this phased roadmap:

Treat AI capabilities as platform products: add model/agent entries to the catalog, assign owners, define SLIs, and create promotion workflows.
Make DORA a core success metric: instrument template usage and platform CI so you can show impact on deployment frequency and lead time.
Ship a small set of opinionated AI golden paths: start with a RAG template, a batch inference template, and an agent scaffold with bounded tools; iterate from telemetry.
Centralize model routing and provider abstraction: implement a platform model service/facade to manage providers, caching, and quotas.
Build LLMOps and policy primitives: add sampled I/O telemetry, drift detectors, cost dashboards, and policy-as-code gates into the platform telemetry and approval flows.
Assign platform product ownership: make PMs and UX researchers accountable for DORA slices, template conversion, and operational risk OKRs.

Start small and measurable: add model metadata to the catalog, emit template usage events from the Scaffolder, and release a single guarded AI template. These steps make AI capabilities discoverable, measurable, and governable — the core outcomes Vol.4 and the Google Cloud research identify for a mature platform.

Conclusion

AI features expand both the scope and the risk profile of IDPs. Treating models and agents as platform products, instrumenting platform-level DORA slices, and investing in LLMOps and policy primitives are practical, measurable steps to adopt AI safely and effectively within Backstage-style IDPs.
