Platform Engineering: The Next Evolution of DevOps

abemon

DevOps delivered on its promise. And created a new problem.

DevOps worked. It broke down silos between development and operations, automated deployments, and accelerated delivery cycles. But along the way, it transferred an enormous amount of operational responsibility to development teams. Most of those teams were not equipped to absorb it.

A backend developer in 2015 needed to know how to write code, tests, and maybe some SQL. A backend developer in 2025 needs to understand Kubernetes, Terraform, CI/CD pipelines, network policies, secrets management, observability, service meshes, and a dozen more tools. Cognitive load has multiplied. The result is predictable: teams spend more time wrestling with infrastructure than building product.

Gartner predicts that by 2026, 80% of large software engineering organizations will have established platform engineering teams. Not because it is trendy, but because the current model does not scale. With 5 developers, each one can configure their own infrastructure. With 50, you need standardization. With 200, you need a platform.

What platform engineering is (and is not)

Platform engineering is the discipline of designing and building self-service toolchains and workflows that enable development teams to deliver software without filing tickets to operations or becoming infrastructure experts.

The product of a platform team is an Internal Developer Platform (IDP): a set of tools, APIs, templates, and documentation that abstracts infrastructural complexity and exposes capabilities through simple interfaces. The developer says “I need a service with a PostgreSQL database and a public endpoint,” and the platform provisions it with all security, networking, observability, and compliance configurations included.

What platform engineering is not:

  • Renaming the operations team as “platform team.”
  • Building a monolithic internal tool that nobody wants to use.
  • Centralizing control to slow teams down.
  • Putting a pretty portal on top of chaotic infrastructure.

The key difference is the product mindset. A platform team treats developers as internal customers. It does discovery, measures adoption, iterates on feedback, and deprecates what does not work. If nobody uses your platform voluntarily, you do not have a platform. You have a mandate that teams circumvent with workarounds.

The five components of an IDP

After designing and operating internal platforms for companies with 40 to 300 engineers, we have identified five components that appear in every functional IDP.

1. Developer portal

The entry point. A web portal where developers discover available services, create new projects, browse documentation, and view deployment status. Backstage (originally from Spotify, now CNCF) is the de facto standard. Not because it is perfect (it has a significant learning curve), but because its plugin architecture allows integration with virtually any stack.

In practice, the portal is the component that generates the most visible value and carries the highest risk of becoming shelfware. The key is solving real problems from day one: “Where is the documentation for this service?”, “Who owns this API?”, “How do I create a new microservice?” If the portal does not answer those questions better than asking a colleague on Slack, it will fail.

2. Golden paths

A golden path is a preconfigured route for completing a common task. It is not the only possible path, but it is the recommended, tested, and supported one. The “golden path” versus “dirt road” metaphor is deliberate: you can go off-path, but you are on your own.

Concrete examples:

  • Create a microservice: a template that generates a repo with project structure, Dockerfile, CI/CD pipeline, observability configuration, and Kubernetes manifest. The developer runs a command, answers four questions (name, language, needs database, needs queue), and has a deployable service in 15 minutes.
  • Add a database: a workflow that provisions a PostgreSQL or MySQL instance with automatic backups, monitoring, and credentials rotated in the secrets vault. No tickets, no waiting.
  • Deploy to production: a pipeline that runs tests, security scanning, configuration validation, and progressive rollout (canary or blue-green). The developer merges to main and the golden path handles the rest.
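
The "create a microservice" path can be sketched as a function that turns the four answers into a list of files to generate. This is a minimal illustration, not a real scaffolder: the file layout, the supported languages, and the names `ServiceAnswers` and `plan_scaffold` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAnswers:
    """The four answers the golden path asks for."""
    name: str
    language: str
    needs_database: bool
    needs_queue: bool


def _extension(language: str) -> str:
    # Hypothetical set of supported languages.
    extensions = {"python": "py", "go": "go", "typescript": "ts"}
    try:
        return extensions[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None


def plan_scaffold(answers: ServiceAnswers) -> list[str]:
    """Return the relative paths a template run would generate."""
    if not answers.name.replace("-", "").isalnum():
        raise ValueError(f"invalid service name: {answers.name!r}")

    root = answers.name
    # Files every service gets, regardless of the answers.
    files = [
        f"{root}/README.md",
        f"{root}/Dockerfile",
        f"{root}/.ci/pipeline.yaml",
        f"{root}/deploy/kubernetes.yaml",
        f"{root}/observability/dashboard.json",
        f"{root}/src/main.{_extension(answers.language)}",
    ]
    # Optional pieces are added only when requested.
    if answers.needs_database:
        files.append(f"{root}/deploy/database.yaml")
    if answers.needs_queue:
        files.append(f"{root}/deploy/queue.yaml")
    return files


if __name__ == "__main__":
    for path in plan_scaffold(ServiceAnswers("payments", "go", True, False)):
        print(path)
```

Keeping the plan as plain data makes the template easy to test before it touches a repository, and the same plan can later back a portal form or a CLI.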

The most common mistake with golden paths is trying to cover 100% of use cases. Cover 80%. The remaining 20% are exceptions that the platform team handles manually or that require specialized paths. Trying to automate everything from the start is the surest way to deliver nothing useful.

3. Infrastructure as Code with abstractions

The third component is the layer that actually provisions resources. Terraform, Pulumi, or Crossplane underneath, but exposed through high-level abstractions that hide the complexity.

A developer should not write 200 lines of Terraform to create an S3 bucket with encryption, versioning, access policies, and lifecycle rules. They should write something like:

kind: StorageBucket
metadata:
  name: user-uploads
spec:
  access: private
  retention: 90d
  encryption: true

And the platform translates that into the 200 lines of Terraform with all best practices included. Crossplane does this natively with its Compositions. Terraform can achieve it with well-designed modules. The point is that the developer declares the intent, not the implementation.
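
To make the translation step concrete, here is a minimal sketch of how a platform might expand the declared intent into fully specified settings. The output keys, the accepted `access` values, and the choice to map `retention` to a lifecycle expiration are assumptions for illustration; they are not the schema of Terraform, Crossplane, or any cloud provider.

```python
import re


def expand_storage_bucket(manifest: dict) -> dict:
    """Expand a high-level StorageBucket manifest into detailed settings.

    The returned keys are hypothetical platform-internal names.
    """
    if manifest.get("kind") != "StorageBucket":
        raise ValueError("expected kind: StorageBucket")

    spec = manifest["spec"]
    match = re.fullmatch(r"(\d+)d", str(spec["retention"]))
    if match is None:
        raise ValueError("retention must look like '90d'")
    if spec["access"] not in {"private", "public-read"}:
        raise ValueError(f"unsupported access level: {spec['access']!r}")

    return {
        "bucket_name": manifest["metadata"]["name"],
        "block_public_access": spec["access"] == "private",
        # Platform defaults the developer never has to think about.
        "versioning_enabled": True,
        "encryption_enabled": spec.get("encryption", True),
        # Assumption: retention is implemented as object expiration.
        "lifecycle_expiration_days": int(match.group(1)),
    }


if __name__ == "__main__":
    manifest = {
        "kind": "StorageBucket",
        "metadata": {"name": "user-uploads"},
        "spec": {"access": "private", "retention": "90d", "encryption": True},
    }
    print(expand_storage_bucket(manifest))
```

The validation errors are part of the product: rejecting an unsupported value early, with a readable message, is what keeps the abstraction from feeling opaque.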

This abstraction comes at a cost: flexibility. And that is exactly what you want. Limiting options reduces the error surface, simplifies maintenance, and allows the platform team to guarantee compliance and security. If a team needs something outside the abstraction, they request it and the platform team decides whether to add it to the catalog or treat it as a justified exception.

4. Configuration orchestration

Configuration is the most underestimated aspect of platform engineering. A typical service has application configuration, secrets, per-environment variables (dev, staging, production), feature flags, and infrastructure configuration. Managing all of this coherently, auditably, and securely is a problem that grows much faster than the number of services, because every new service multiplies the environments, secrets, and flags to keep in sync.

Solutions we see working:

  • Secrets: HashiCorp Vault or AWS Secrets Manager, with automatic rotation and runtime injection. Never in repos, never in static environment variables.
  • Application configuration: versioned files in git (GitOps) processed by ArgoCD or Flux. The developer opens a PR with the configuration change, it gets reviewed, approved, and applied automatically.
  • Feature flags: a centralized system (LaunchDarkly, Unleash, Flagsmith) that allows toggling features without redeploying. This decouples deployment from release, which is one of the most transformative changes in delivery velocity.
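
Two of these rules lend themselves to a small sketch: layering per-environment overrides on a shared base, and refusing secrets that are committed as literal values. The `vault:` reference prefix and the list of secret-looking key names are conventions assumed for this example, not features of Vault, ArgoCD, or Flux.

```python
# Hypothetical hints used to decide whether a key holds a secret.
SECRET_HINTS = ("password", "token", "secret", "api_key")


def resolve_config(base: dict, overrides: dict) -> dict:
    """Merge per-environment overrides onto the base, recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def find_plaintext_secrets(config: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of secret-looking keys that are not vault references."""
    offenders = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            offenders.extend(find_plaintext_secrets(value, prefix=f"{path}."))
        elif any(hint in key.lower() for hint in SECRET_HINTS):
            # Assumed convention: secrets appear in git only as "vault:<path>".
            if not str(value).startswith("vault:"):
                offenders.append(path)
    return offenders


if __name__ == "__main__":
    base = {"db": {"host": "db.internal", "password": "vault:payments/db"}, "log_level": "info"}
    production = {"db": {"host": "db.prod.internal"}, "log_level": "warn"}
    config = resolve_config(base, production)
    print(config)
    print(find_plaintext_secrets(config))
```

A check like `find_plaintext_secrets` fits naturally as a step in the PR pipeline, so a literal secret is caught at review time rather than after it reaches the repository history.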

5. Integrated observability

The final component is “free” observability. Every service created through the platform comes with metrics, logs, and traces configured automatically. The platform team maintains an observability stack (Prometheus + Grafana + Loki + Tempo is the most common combination in mid-market environments) and the golden paths include instrumentation by default.

This eliminates one of the most persistent problems we see: teams that skip observability because they “don’t have time.” If observability is the default, there is no decision to make. The service is born observable.

Dashboards are also a platform product. A generic per-service dashboard showing the four golden signals (latency, traffic, errors, saturation) covers 80% of operational needs. Teams can customize them, but they start from a functional baseline.
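
As a rough illustration of what the baseline dashboard summarizes, the four golden signals can be computed from a window of request samples. In a real stack these values would come from the metrics backend; the `RequestSample` type, the choice of p95 for latency, and CPU utilization as the saturation measure are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSample:
    duration_ms: float
    status: int


def golden_signals(samples: list[RequestSample], window_seconds: float,
                   cpu_utilization: float) -> dict:
    """Summarize latency, traffic, errors, and saturation for one window."""
    if not samples or window_seconds <= 0:
        raise ValueError("need at least one sample and a positive window")

    durations = sorted(s.duration_ms for s in samples)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = (95 * len(durations) + 99) // 100
    server_errors = sum(1 for s in samples if s.status >= 500)

    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_rps": len(samples) / window_seconds,
        "error_rate": server_errors / len(samples),
        # Assumption: saturation is reported as CPU utilization (0.0 to 1.0).
        "saturation": cpu_utilization,
    }


if __name__ == "__main__":
    window = [RequestSample(d, 200) for d in (10, 20, 30, 40)] + [RequestSample(500, 503)]
    print(golden_signals(window, window_seconds=10, cpu_utilization=0.4))
```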

The operating model: platform team as product team

The difference between a platform team that works and one that becomes a bottleneck is the operating model. And that model is, unambiguously, the product model.

Users: the developers in the organization. Not all of them. Start with a pilot team, collect feedback, iterate, and scale.

Product metrics: voluntary adoption (percentage of teams using the platform without being told to), onboarding time (how long it takes a new developer to deploy their first service), lead time (time from commit to production), and developer satisfaction (quarterly surveys, internal NPS).
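
Two of these metrics are simple enough to compute from data most organizations already have. The sketch below assumes a hypothetical `Team` record and a list of (commit time, deploy time) pairs; how that data is collected is outside its scope.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class Team:
    name: str
    uses_platform: bool
    mandated: bool  # True if the team was told to use the platform


def voluntary_adoption(teams: list[Team]) -> float:
    """Share of all teams that use the platform without a mandate."""
    if not teams:
        raise ValueError("no teams to measure")
    voluntary = sum(1 for t in teams if t.uses_platform and not t.mandated)
    return voluntary / len(teams)


def median_lead_time_hours(deploys: list[tuple[datetime, datetime]]) -> float:
    """Median time from commit to production, in hours."""
    hours = [(deployed - committed).total_seconds() / 3600
             for committed, deployed in deploys]
    return median(hours)


if __name__ == "__main__":
    teams = [
        Team("payments", True, False),
        Team("search", True, False),
        Team("billing", True, True),
        Team("mobile", False, False),
    ]
    print(voluntary_adoption(teams))
```

The median is used for lead time rather than the mean so that a single stalled release does not dominate the number.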

Roadmap: prioritized by impact on teams, not by what is most technically interesting. If teams are asking for better secrets management and you want to build a service mesh, build the secrets management. Trust is built by solving the problems that hurt most.

Documentation: treated as part of the product. If it is not documented, it does not exist. The platform’s documentation is the most important interface, more important than the portal or the CLIs, because when a developer gets stuck at 2am, they will not look for a colleague. They will look for the docs.

A platform team for an organization of 50-100 developers typically has 3-5 people. For 200+, it can grow to 8-12. Beyond that, splitting into squads with specific responsibilities (CI/CD, infrastructure, observability, developer experience) makes sense.

Sizing the investment: when it makes sense

Not every organization needs platform engineering. And starting too early is as harmful as starting too late.

Under 20 developers: you probably do not need a formal platform. A solid set of scripts, templates, and documentation maintained by an SRE or senior DevOps engineer is sufficient. The overhead of building and maintaining an IDP is not justified.

20-50 developers: the inflection point. You start noticing that teams reinvent the wheel, that infrastructure configuration is inconsistent across services, that onboarding takes weeks. This is the moment to start with golden paths and a basic portal.

50-200 developers: platform engineering is nearly mandatory. Without a platform, delivery velocity degrades, incidents increase due to inconsistency, and top engineers get frustrated with operational complexity. The investment pays back in months, not years.

200+ developers: the platform is critical infrastructure, as important as the product the company sells.

The typical ROI we observe in mid-market organizations (50-200 developers) is a 40-60% reduction in new service onboarding time, a 30% reduction in configuration-related incidents, and a measurable increase in developer satisfaction. In absolute numbers, for a 100-developer organization with an average cost of EUR 70,000/year per engineer, recovering 10% of productive time equals EUR 700,000 annually. A 4-person platform team costs considerably less.
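
The arithmetic behind that figure is easy to reproduce and adapt to your own numbers. The sketch below uses the values from the paragraph above; the assumption that platform engineers cost the same EUR 70,000 per year as other engineers is ours, added only to make the comparison concrete.

```python
def recovered_value_eur(developers: int, cost_per_engineer_eur: int,
                        recovered_percent: int) -> float:
    """Annual value of the productive time the platform gives back."""
    return developers * cost_per_engineer_eur * recovered_percent / 100


def net_benefit_eur(developers: int, cost_per_engineer_eur: int,
                    recovered_percent: int, platform_team_size: int) -> float:
    """Recovered value minus the cost of the platform team.

    Assumption: platform engineers cost the same as other engineers.
    """
    recovered = recovered_value_eur(developers, cost_per_engineer_eur, recovered_percent)
    return recovered - platform_team_size * cost_per_engineer_eur


if __name__ == "__main__":
    # 100 developers, EUR 70,000 each, 10% of productive time recovered.
    print(recovered_value_eur(100, 70_000, 10))
    # The same scenario, net of a 4-person platform team.
    print(net_benefit_eur(100, 70_000, 10, platform_team_size=4))
```

Under these assumptions the platform team costs EUR 280,000 per year against EUR 700,000 recovered, a net benefit of EUR 420,000 before tooling and infrastructure costs.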

Mistakes we have seen (and made)

Building before understanding. The first impulse is to start building tools. The correct first step is to sit with teams and understand what hurts. We have seen platform teams spend 6 months building a sophisticated deployment system when the main problem was that nobody knew where the production secrets were stored.

The portal as a prestige project. Backstage is attractive. It has plugins, an active community, and looks great in a demo. But if your portal has 15 plugins and developers are still asking on Slack how to deploy, the portal is not solving anything. A minimal portal that works is infinitely better than a complete one that nobody uses.

Not measuring adoption. If you do not measure how many teams use your platform voluntarily, you do not know whether you are delivering value. We have seen platforms that were “successful” according to the team that built them but had real adoption of 30%. If 70% of your potential users are not using you, you do not have a successful product.

Overly opaque abstractions. Hiding all complexity is tempting, but when something breaks (and it will), developers need to be able to look under the hood. Good abstractions have escape hatches that allow access to the underlying layer when needed. Heroku’s platform was fantastic until you needed something not on the menu.

Ignoring security from the start. The platform is the perfect place to implement security by default: image scanning, network policies, secrets management, configuration compliance. If security is added after the fact, it is a friction layer. If it comes included in the golden path, it is invisible. We have helped clients reduce audit findings by 70% simply by moving security controls to the platform level.
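
A golden path can enforce these defaults with a check that runs before anything is deployed. The sketch below is illustrative only: the manifest fields (`image`, `network_policy`, `env`) and the `vault:` reference prefix are hypothetical, and a production setup would more likely express such rules in a policy engine such as OPA/Gatekeeper or Kyverno.

```python
def check_service_manifest(manifest: dict) -> list[str]:
    """Return human-readable findings; an empty list means the manifest passes."""
    findings = []

    # Image scanning needs a reproducible reference, so floating tags are rejected.
    image_name = manifest.get("image", "").rsplit("/", 1)[-1]
    if ":" not in image_name or image_name.endswith(":latest"):
        findings.append("image must be pinned to a specific tag")

    if not manifest.get("network_policy"):
        findings.append("a network policy is required")

    # Secrets must be references into the vault, never literal values.
    for name, value in manifest.get("env", {}).items():
        is_secret = any(hint in name.lower() for hint in ("password", "token", "secret"))
        if is_secret and not str(value).startswith("vault:"):
            findings.append(f"env var {name} must reference the secrets vault")

    return findings


if __name__ == "__main__":
    manifest = {"image": "payments:latest", "env": {"API_TOKEN": "abc123"}}
    for finding in check_service_manifest(manifest):
        print(finding)
```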

The stack we recommend for mid-market

For organizations of 50-200 engineers that do not want to build everything from scratch, this is the technology stack we have validated:

Component       | Tool                         | Alternative
Portal          | Backstage                    | Port
IaC             | Terraform + internal modules | Crossplane
GitOps          | ArgoCD                       | Flux
CI/CD           | GitHub Actions               | GitLab CI
Secrets         | HashiCorp Vault              | AWS Secrets Manager
Observability   | Prometheus + Grafana + Loki  | Datadog
Feature flags   | Unleash                      | LaunchDarkly
Templates       | Backstage scaffolder         | Cookiecutter + CLI
Policy          | OPA/Gatekeeper               | Kyverno
Service catalog | Backstage catalog            | Port catalog

This stack has a manageable operational cost (most are open source), an active community, and is mature enough for production. It is not the only valid option. If your organization already uses Datadog, do not switch to Prometheus for ideological purity. If you are pure AWS, Secrets Manager makes more sense than Vault. Consistency within your stack matters more than picking the “perfect” tool in each category.

How to start tomorrow

If you are considering platform engineering for your organization, this is the 90-day plan we recommend:

Weeks 1-2: discovery. Interview 5-8 developers from different teams. Ask: what frustrates you most about the development and deployment process? How much time do you spend on tasks that are not writing product code? If you could change one thing, what would it be? The answers will surprise you. They always do.

Weeks 3-4: first golden path. Pick the most-mentioned problem from the interviews and build a minimal golden path that solves it. Typically this is “create a new service” or “deploy to production.” Do not use Backstage yet. A well-documented script is sufficient.

Weeks 5-8: pilot with one team. Put the golden path in front of a real team. Observe how they use it. Collect feedback weekly. Iterate. This phase will tell you whether your solution solves the real problem or the problem you thought existed.

Weeks 9-12: incremental scale. If the pilot works, extend to 2-3 more teams. Start building the portal; this is the point at which Backstage or Port earns its place. Add the second golden path. Measure adoption. Present results to leadership with concrete numbers: time saved, incidents avoided, pilot team satisfaction.

This incremental approach is radically different from the “big 12-month platform project” that we have seen fail multiple times. Deliver value early, iterate fast, and let adoption guide priorities.

The immediate future: AI on the platform

The next leap in platform engineering is integrating AI agents into internal platforms. We are not talking about chatbots answering documentation questions (though that has value too). We are talking about agents that can execute operational tasks.

A developer says “I need to scale the payments service to 10 replicas for Black Friday” and a platform agent validates the request against policies, verifies capacity, generates the configuration change, opens a PR, and assigns the appropriate reviewer. The developer approves the PR and the change applies.

This is not science fiction. The technical components exist. What is missing is the integration: connecting language models to platform APIs, defining security guardrails, and building team trust that the agent will not scale production to zero replicas due to a misinterpretation.
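
The guardrail step is the part that can be written down today, independently of any language model. A minimal sketch, assuming a hypothetical `ScaleRequest` that the agent has already extracted from the developer's message and a per-service `ScalePolicy` owned by the platform team:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRequest:
    service: str
    replicas: int


@dataclass(frozen=True)
class ScalePolicy:
    min_replicas: int
    max_replicas: int


def validate_scale_request(request: ScaleRequest, policies: dict[str, ScalePolicy],
                           current_replicas: int, free_capacity: int) -> list[str]:
    """Return policy violations; an empty list means the agent may open a PR."""
    policy = policies.get(request.service)
    if policy is None:
        return [f"no scaling policy registered for {request.service}"]

    violations = []
    # The lower bound is what prevents an accidental scale to zero.
    if request.replicas < policy.min_replicas:
        violations.append(f"replicas below minimum of {policy.min_replicas}")
    if request.replicas > policy.max_replicas:
        violations.append(f"replicas above maximum of {policy.max_replicas}")
    # free_capacity: how many additional replicas the cluster can schedule.
    if request.replicas - current_replicas > free_capacity:
        violations.append("not enough free capacity for the requested replicas")
    return violations


if __name__ == "__main__":
    policies = {"payments": ScalePolicy(min_replicas=2, max_replicas=20)}
    print(validate_scale_request(ScaleRequest("payments", 10), policies,
                                 current_replicas=4, free_capacity=8))
    print(validate_scale_request(ScaleRequest("payments", 0), policies,
                                 current_replicas=4, free_capacity=8))
```

Because the check is deterministic code rather than model output, a misread request fails validation instead of reaching the PR, and the human reviewer remains the final gate.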

We are working with clients on prototypes of this integration, and early results are promising. Routine operations time drops by 50-70%, and configuration errors (the leading cause of incidents in most organizations) decrease dramatically because the agent applies policies automatically.

Platform engineering is, at its core, a problem of developer experience and operational efficiency. Solving it does not just accelerate software delivery. It frees engineers to do what they actually know how to do: build product.

About the author

abemon engineering

Engineering team

Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.