Containers · Cloud-Native · Intermediate

Health Probes & Graceful Lifecycle

Telling Kubernetes the truth about your service — liveness vs readiness vs startup probes, and graceful shutdown that drains in-flight requests so deploys never drop traffic.

🚦 Analogy

A health probe is the difference between a shop’s “Open” sign and a heartbeat monitor. Liveness is the heartbeat: if it flatlines, call the paramedics (restart). Readiness is the Open sign: the shop can flip it to “Closed — back in 5” while restocking, without anyone calling an ambulance. Mixing them up — calling paramedics every time the shop briefly closes — causes chaos. And graceful shutdown is locking the door but serving the customers already inside before you leave.

Three probes, three jobs

graph TD
START["Pod starts"] --> SU["startupProbe<br/>(generous: still booting?)"]
SU -->|passes| LR["liveness + readiness begin"]
LR --> LV["livenessProbe<br/>process wedged? → RESTART"]
LR --> RD["readinessProbe<br/>can serve now? → add/remove from LB"]

Liveness — “is the process wedged?” Fails → Kubernetes restarts the container. Keep it cheap and self-contained; never check external dependencies here.
Readiness — “can I serve traffic right now?” Fails → Kubernetes removes the Pod from the Service but leaves it running. Put dependency checks (DB reachable, caches warm) here.
Startup — for slow boots: gives the app time to initialize before liveness/readiness apply, so a long startup isn’t mistaken for a hang.
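To make "generous" concrete: Kubernetes acts on a probe only after failureThreshold consecutive failures, spaced periodSeconds apart, so the time budget is roughly their product. The sketch below works through that arithmetic. The probe type, the budget helper, and the numbers are illustrative assumptions, not recommendations, and the calculation ignores initialDelaySeconds and timeoutSeconds.

```go
package main

import (
	"fmt"
	"time"
)

// probe mirrors two Kubernetes probe fields. The type itself is hypothetical
// and exists only to illustrate the arithmetic.
type probe struct {
	periodSeconds    int // how often the kubelet runs the probe
	failureThreshold int // consecutive failures before Kubernetes acts
}

// budget approximates how long a probe may keep failing before Kubernetes acts.
func (p probe) budget() time.Duration {
	return time.Duration(p.periodSeconds*p.failureThreshold) * time.Second
}

func main() {
	startup := probe{periodSeconds: 10, failureThreshold: 30} // generous: slow boots allowed
	liveness := probe{periodSeconds: 10, failureThreshold: 3} // tight: react quickly once running

	fmt.Println("startup budget: ", startup.budget())  // 5m0s
	fmt.Println("liveness budget:", liveness.budget()) // 30s
}
```

With these assumed values the app gets about five minutes to boot, while a hang in the running state is caught in about thirty seconds.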

See it: liveness and readiness handlers

/healthz is cheap and returns 200 for as long as the process runs; /readyz reflects whether dependencies are ready. The example exercises both handlers in-process via httptest: readiness starts out failing (503) and flips to 200 once initialization completes:

probes.go

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

func main() {
	var ready atomic.Bool // flipped true once dependencies are up

	mux := http.NewServeMux()
	// Liveness: only reports the process is running. No dependency checks!
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
	// Readiness: 503 until ready, 200 after; controls load-balancer traffic.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			fmt.Fprintln(w, "ready")
		} else {
			w.WriteHeader(http.StatusServiceUnavailable)
			fmt.Fprintln(w, "not ready")
		}
	})

	// probe calls the mux in-process and returns the HTTP status code.
	probe := func(path string) int {
		rec := httptest.NewRecorder()
		mux.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
		return rec.Code
	}

	fmt.Println("before init:  /healthz", probe("/healthz"), " /readyz", probe("/readyz"))
	ready.Store(true) // dependencies connected, caches warmed
	fmt.Println("after init:   /healthz", probe("/healthz"), " /readyz", probe("/readyz"))
}

Liveness is 200 throughout (the process is fine); readiness is 503 until initialization completes, so Kubernetes only routes traffic once the app can actually serve it.

Graceful shutdown

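On a deploy or scale-down, Kubernetes sends SIGTERM and, in parallel, starts removing the Pod from the Service, but that removal takes time to propagate to every load balancer and proxy. The sequence below avoids dropping requests during that window. It is a fragment rather than a runnable program: it needs a real server (srv), real signals, and the ready flag from the previous example.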

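// Needs: context, errors, log, net/http, os, os/signal, syscall, time.
ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
defer stop()

go func() {
	// ListenAndServe returns http.ErrServerClosed after Shutdown; anything else is a real failure.
	if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
		log.Fatalf("listen: %v", err)
	}
}()
<-ctx.Done() // SIGTERM (or Ctrl-C) received

ready.Store(false)          // 1. fail readiness so the LB stops sending new traffic
time.Sleep(3 * time.Second) // 2. wait for the LB to notice (propagation delay)

shutCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
defer cancel()
if err := srv.Shutdown(shutCtx); err != nil { // 3. stop accepting, drain in-flight requests
	log.Printf("shutdown did not finish cleanly: %v", err)
}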

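The order matters: fail readiness first, pause, then Shutdown. That shifts new traffic to other replicas before this Pod stops accepting connections, which removes the intermittent 502s that plague naive shutdowns. Keep the pause plus the drain timeout (3 s + 25 s here) below the Pod's terminationGracePeriodSeconds (30 seconds by default), because Kubernetes sends SIGKILL once the grace period expires. See hardening services for the server-side Shutdown details.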
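The drain step can be observed without Kubernetes or signals. The sketch below starts a server on a free loopback port, begins one slow request, and calls Shutdown while that request is still in flight. It skips steps 1 and 2 of the sequence above and uses a much shorter timeout; drainDemo, the /slow path, and the 200 ms of simulated work are all assumptions made for the demo.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// drainDemo returns the status code the client saw for a request that was
// in flight when Shutdown was called, plus any error from Shutdown itself.
func drainDemo() (int, error) {
	started := make(chan struct{})

	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // signal that the request is in flight
		time.Sleep(200 * time.Millisecond) // simulated work
		fmt.Fprintln(w, "done")
	})

	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free local port
	if err != nil {
		return 0, err
	}
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	status := make(chan int, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
		if err != nil {
			status <- 0 // the request was dropped
			return
		}
		defer resp.Body.Close()
		io.Copy(io.Discard, resp.Body)
		status <- resp.StatusCode
	}()

	<-started // stand-in for "SIGTERM arrives mid-request"

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	err = srv.Shutdown(ctx) // stops accepting, waits for the slow request to finish
	return <-status, err
}

func main() {
	code, err := drainDemo()
	fmt.Println("in-flight request status:", code, "shutdown error:", err)
}
```

If the drain works as described, the client should see a 200 even though Shutdown was called before the handler finished. Calling srv.Close() instead would cut the connection immediately.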

🐹 Probes should be cheap and honest

A probe is hit every few seconds on every replica, so keep it fast and side-effect-free — no heavy queries, no writes. Make readiness honest: report not-ready while caches warm, during shutdown, or when a critical dependency is unreachable, so the load balancer routes around you. But don’t over-couple — if readiness flips off every time a non-critical downstream hiccups, you’ll yo-yo Pods out of rotation. Check only what you truly need to serve a request, and set probe periodSeconds/failureThreshold so a single blip doesn’t trigger action.
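One way to keep readiness honest without over-coupling is to pass the handler only the checks a request truly depends on and bound them with a short timeout. The sketch below assumes a hypothetical check function type and a fake "db" dependency; a real check might ping a connection pool instead.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// check is a hypothetical dependency check.
type check func(ctx context.Context) error

// readyHandler returns 200 only if every critical check passes within timeout.
// Non-critical dependencies are deliberately left out of the map.
func readyHandler(timeout time.Duration, critical map[string]check) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()
		for name, c := range critical {
			if err := c(ctx); err != nil {
				http.Error(w, "not ready: "+name, http.StatusServiceUnavailable)
				return
			}
		}
		fmt.Fprintln(w, "ready")
	}
}

// probeStatus calls a handler in-process and returns the status code.
func probeStatus(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/readyz", nil))
	return rec.Code
}

// demo probes readiness once with a healthy fake DB and once with a failing one.
func demo() (up, down int) {
	dbUp := func(ctx context.Context) error { return nil }
	dbDown := func(ctx context.Context) error { return errors.New("connection refused") }

	up = probeStatus(readyHandler(time.Second, map[string]check{"db": dbUp}))
	down = probeStatus(readyHandler(time.Second, map[string]check{"db": dbDown}))
	return up, down
}

func main() {
	up, down := demo()
	fmt.Println("db up:", up, " db down:", down) // db up: 200  db down: 503
}
```

Because a single failed check returns 503 immediately, smoothing over one-off blips is left to the probe's failureThreshold rather than to the handler.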

⚠️ Don't put dependency checks in the liveness probe

The classic outage amplifier: a liveness probe that checks the database. The DB has a 30-second blip → every replica fails liveness simultaneously → Kubernetes restarts them all at once → the restart storm and cold caches turn a minor blip into a major outage, and the restarts may even prevent recovery. Liveness answers only “is this process stuck?” (it should almost never fail). Dependency health belongs in readiness, which removes the Pod from the load balancer without the nuclear option of a restart.
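If an always-200 liveness endpoint feels too weak, one self-contained alternative is a heartbeat: the main work loop records when it last made progress, and liveness fails only when that timestamp goes stale. This is a sketch under assumptions, not the lesson's own handler. The heartbeat type, the 30-second limit, and the injected clock are all hypothetical, and the check still touches no external dependency.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// heartbeat records when the main work loop last made progress.
type heartbeat struct{ last atomic.Int64 } // Unix nanoseconds

func (h *heartbeat) beat(now time.Time) { h.last.Store(now.UnixNano()) }

// stale reports whether the loop has been silent for longer than maxAge.
func (h *heartbeat) stale(now time.Time, maxAge time.Duration) bool {
	return now.Sub(time.Unix(0, h.last.Load())) > maxAge
}

// livenessHandler fails only when the process itself stops making progress.
func livenessHandler(h *heartbeat, maxAge time.Duration, now func() time.Time) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if h.stale(now(), maxAge) {
			http.Error(w, "wedged", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	}
}

// demo probes liveness right after a beat, then again after a simulated minute of silence.
func demo() (fresh, wedged int) {
	start := time.Unix(1000, 0)
	clock := start // fake clock so the demo needs no real waiting
	hb := &heartbeat{}
	hb.beat(start)

	h := livenessHandler(hb, 30*time.Second, func() time.Time { return clock })
	probe := func() int {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest("GET", "/healthz", nil))
		return rec.Code
	}

	fresh = probe()
	clock = start.Add(time.Minute) // the loop has not beaten for a minute
	wedged = probe()
	return fresh, wedged
}

func main() {
	fresh, wedged := demo()
	fmt.Println("fresh:", fresh, "wedged:", wedged) // fresh: 200 wedged: 503
}
```

In a real service the work loop would call beat(time.Now()) on each iteration, and the handler would be given time.Now as its clock.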

Check your understanding

1. What's the difference between a liveness and a readiness probe?

Liveness failing → Kubernetes restarts the container (it's wedged). Readiness failing → Kubernetes stops routing traffic to it but leaves it running (it's alive but temporarily can't serve — warming up, or a dependency is down). Confusing them is dangerous: putting a dependency check in liveness can cause restart storms when that dependency blips.

2. What should a liveness probe check?

Only whether the process itself is wedged, nothing external. Liveness must be cheap and self-contained: if it checks the database and the DB has a brief outage, every replica fails liveness and Kubernetes restarts them all at once, turning a small blip into a full outage (a restart storm). Keep dependency checks in readiness, which removes the Pod from the LB without restarting it.

3. What is a startup probe for?

An app with a long initialization (warming caches, loading models) could fail an aggressive liveness probe during boot and be killed in a loop. A startup probe runs first with a generous timeout; only once it passes do liveness/readiness begin. This lets you keep liveness/readiness tight for the running state without punishing a slow start.

4. What does graceful shutdown accomplish on a deploy or scale-down?

On a deploy/scale-down, Kubernetes sends SIGTERM and removes the Pod from the Service. The app should catch SIGTERM, stop accepting new requests, and let active ones complete (http.Server.Shutdown), then exit before the grace period ends. Without it, in-flight requests are killed mid-flight on every deploy — intermittent 502s users can't explain.

5. Why fail readiness during shutdown BEFORE you stop the server?

There's a propagation delay between SIGTERM and the load balancer noticing the Pod is gone. If you stop accepting immediately, requests routed in that window get refused. The robust sequence: on SIGTERM, flip readiness to failing (LB stops sending new traffic), wait a moment, then Shutdown() to drain the rest. This eliminates the shutdown-race 502s.
