When someone first asks to run an LLM on Kubernetes, the instinct is reasonable: it's a containerised workload, it exposes an HTTP API, it needs to scale. Deploy it like anything else: a `Deployment`, a `Service`, maybe an `HPA` on CPU. That instinct gets you surprisingly far. Until it doesn't.
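
To make that concrete, here's roughly what the instinctive first attempt looks like. It's a sketch, not a recommendation: the vLLM image, model name, and resource figures are stand-ins, but the shape is the familiar one, a stateless `Deployment` behind a `Service`, with an `HPA` keyed to CPU utilisation.

```yaml
# The "deploy it like anything else" starting point.
# Image, model, and resource figures are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-server
  template:
    metadata:
      labels:
        app: llm-server
    spec:
      containers:
        - name: server
          image: vllm/vllm-openai:latest  # pin a real tag in practice
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: llm-server
spec:
  selector:
    app: llm-server
  ports:
    - port: 80
      targetPort: 8000
---
# Scaling on CPU: the first thing that will turn out not to work,
# since GPU-bound inference barely moves the CPU needle.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-server
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```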