
Entry 006 · 06/10/2024 · 3 min read

Platform Engineering: Building the Right Abstractions

What separates a good internal developer platform from an expensive Kubernetes wrapper—and how to build the former.

Platform engineering has become one of those terms that means different things to different organizations. At its best, it describes a deliberate practice of building internal products that make developers more effective. At its worst, it's a rebranding exercise for what used to be called "the ops team."

The difference is usually visible in the abstractions the platform exposes.

The Abstraction Trap

The most common failure mode in platform engineering is building abstractions that are too thin. You create a Helm chart wrapper, add a CI template, and call it a platform. Engineers still need to understand Kubernetes concepts, still need to write Kubernetes YAML, still need to think about pod disruption budgets and resource limits.

This isn't a platform—it's a conventions guide with some automation on top. It reduces toil at the margins but doesn't fundamentally change the cognitive load on application developers.

The goal of a good platform is to make the right thing the easy thing, and to hide complexity that application developers shouldn't need to care about.

What Good Abstractions Look Like

Good platform abstractions are opinionated. They encode decisions, not options.

An application developer should be able to describe their service in terms of their service—what port it runs on, what its resource requirements are, what other services it depends on—without needing to know how that maps to Kubernetes primitives.

We built this using a CRD called Application that captures just enough information to deploy, scale, and observe a service:

apiVersion: platform.example.com/v1
kind: Application
metadata:
  name: payments-api
spec:
  image: payments-api:v2.1.4
  port: 8080
  replicas:
    min: 2
    max: 10
  resources:
    tier: standard  # translates to specific CPU/memory limits
  dependencies:
    - payments-db
    - notification-service

The platform operator translates this into a Deployment, Service, HorizontalPodAutoscaler, PodDisruptionBudget, NetworkPolicy, and ServiceMonitor. Application developers never write those resources directly.
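To make that translation concrete, here is a sketch of two of the resources the operator might generate for payments-api. The specific requests and limits behind the standard tier, and the label conventions, are illustrative rather than our actual values:

# Sketch of operator output for the Application above.
# The resource values expanded from "tier: standard" are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 2                 # starting point; the HPA owns scaling
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          image: payments-api:v2.1.4
          ports:
            - containerPort: 8080
          resources:          # expanded from tier: standard
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 2              # from spec.replicas.min
  maxReplicas: 10             # from spec.replicas.max

The NetworkPolicy is derived the same way: the dependencies list becomes egress rules, so application developers get least-privilege networking without ever writing a policy by hand.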

The Internal Developer Portal

The CRD-based approach works for teams comfortable with YAML. But many application developers aren't. They're building services, not operating Kubernetes clusters, and they shouldn't need to learn YAML configuration to deploy software.

We built an internal developer portal using Backstage that surfaces the platform abstractions through a web UI. Engineers can onboard a new service, configure deployments, and view operational metrics without touching Kubernetes directly.
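For each service, the portal is backed by a standard Backstage catalog entry. As a rough illustration (the annotation key linking the entry back to our Application resource is hypothetical):

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-api
  annotations:
    # hypothetical annotation tying this entry to the platform CRD
    platform.example.com/application: payments-api
spec:
  type: service
  lifecycle: production
  owner: payments-team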

Adoption was initially slower than expected. The breakthrough came when we integrated the portal with our on-call rotation: when an alert fires, the runbook link in PagerDuty goes directly to the service page in the portal, which has the current deployment status, recent changes, and links to logs and traces. That made the portal essential for on-call engineers, which drove regular usage across the org.

Measuring Platform Value

Platform teams often struggle to measure their impact. We track three things: deployment frequency (how often teams ship), change failure rate (what percentage of deployments cause incidents), and cognitive load (via quarterly developer surveys).
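The first two are straightforward to compute from pipeline events. As a sketch, assuming the platform emits counters like deployments_total and deployment_incidents_total (both metric names are hypothetical), Prometheus recording rules might look like:

groups:
  - name: platform-dora
    rules:
      # Deployments per day, per team (deployments_total is hypothetical)
      - record: team:deployment_frequency:rate1d
        expr: sum by (team) (increase(deployments_total[1d]))
      # Fraction of deployments causing incidents (hypothetical counters)
      - record: team:change_failure_rate:ratio30d
        expr: >
          sum by (team) (increase(deployment_incidents_total[30d]))
          /
          sum by (team) (increase(deployments_total[30d]))

Cognitive load has no counter to scrape, which is why the quarterly survey fills that gap.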

After 18 months, deployment frequency is up 3x, change failure rate is down 40%, and developer survey scores for "I can deploy without help from platform team" went from 31% to 78%.

The survey score matters most to us. A platform that engineers don't understand and can't operate independently isn't a platform—it's a dependency.
