007 · 06/10/2024
Running LLMs on GKE: what breaks before you find the Inference Gateway

When someone first asks to run an LLM on Kubernetes, the instinct is reasonable: it's a containerised workload, it exposes an HTTP API, it needs to scale. Deploy it like anything else — a `Deployment`, a `Service`, maybe an `HPA` on CPU. That instinct gets you surprisingly far. Until it doesn't.

GKE · LLM Inference · GCP
006 · 06/10/2024
Platform Engineering: Building the Right Abstractions

What separates a good internal developer platform from an expensive Kubernetes wrapper—and how to build the former.

platform-engineering · devex · kubernetes
005 · 05/01/2024
Data Pipelines on Kubernetes: Lessons from Airflow to Argo

A migration story from Airflow on VMs to cloud-native data pipelines using Argo Workflows and what we'd do differently.

airflow · k8s · data-engineering
004 · 03/20/2024
Progressive Delivery with Argo Rollouts

Implementing canary deployments and automated rollbacks using Argo Rollouts with Prometheus analysis templates.

argocd · canary · progressive-delivery
003 · 03/05/2024
LLMOps in Production: What Nobody Tells You

Running large language model inference workloads on Kubernetes at scale—the infrastructure problems that emerge past the prototype stage.

llmops · kubernetes · ai
002 · 02/08/2024
Writing Production-Grade Kubernetes Operators

Lessons from building and running three custom operators in production, including the parts the tutorials skip.

kubernetes · operators · go
001 · 01/15/2024
GitOps at Scale: Managing 50 Clusters with Flux

How we standardized cluster configuration across a multi-cloud estate using Flux v2 and a monorepo approach.

gitops · kubernetes · flux