Containers · Cloud-Native · Intermediate

Deployment Strategies

Shipping a new version without dropping traffic — rolling, blue-green, and canary releases, the health-gated promotion that makes them safe, and how readiness probes tie in.

🚦 Analogy

Replacing a bridge while traffic flows. Rolling: close and rebuild one lane at a time — traffic always has a lane. Blue-green: build a whole second bridge alongside, then divert everyone over in one go (and keep the old one ready in case the new one wobbles). Canary: open the new bridge to a trickle of cars first, watch it hold, then let more across. All three avoid the one thing you must never do — shut the river crossing entirely while you work.

Three ways to ship without an outage

graph TD
subgraph Rolling["Rolling (default)"]
  R["replace pods in batches; both versions briefly coexist"]
end
subgraph BG["Blue-Green"]
  BGd["green deployed beside blue → switch all traffic at once → instant flip-back"]
end
subgraph Canary["Canary"]
  C["1% → 5% → 25% → 100%, gated on metrics each step"]
end

Strategy	Extra infra	Rollback	Blast radius of a bad release	Best for
Rolling	none	slow (roll back)	medium (some pods)	the sane default
Blue-green	2× during cutover	instant (flip)	all-or-nothing	fast rollback, big releases
Canary	a bit (routing)	fast (shift back)	tiny (1% of users)	risky changes, real-traffic validation
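The "instant (flip)" rollback in the blue-green row comes from the cutover being a single routing change. The sketch below models that with an atomic pointer swap. `Backend`, `Router`, `CutOver`, and `Rollback` are hypothetical names for illustration; a real setup would change a load-balancer or router target rather than an in-process pointer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Backend is a hypothetical stand-in for a whole environment (blue or green).
type Backend struct {
	Name    string
	Version string
}

// Router sends every request to whichever backend is currently live.
type Router struct {
	live     atomic.Pointer[Backend] // receives all traffic
	previous atomic.Pointer[Backend] // kept running so we can flip back
}

// CutOver moves all traffic to next and remembers the old backend.
func (r *Router) CutOver(next *Backend) {
	r.previous.Store(r.live.Swap(next))
}

// Rollback flips traffic back to the previous backend, if there is one.
func (r *Router) Rollback() bool {
	prev := r.previous.Load()
	if prev == nil {
		return false
	}
	r.previous.Store(r.live.Swap(prev))
	return true
}

// Serve reports which backend would handle a request right now.
func (r *Router) Serve() string {
	b := r.live.Load()
	return b.Name + "@" + b.Version
}

func main() {
	r := &Router{}
	r.live.Store(&Backend{Name: "blue", Version: "v1"})
	fmt.Println("before cutover:", r.Serve())

	r.CutOver(&Backend{Name: "green", Version: "v2"})
	fmt.Println("after cutover: ", r.Serve())

	r.Rollback()
	fmt.Println("after rollback:", r.Serve())
}
```

The flip is cheap only because the old environment is still running, which is where the 2× infrastructure cost in the table comes from.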

The safety mechanism: health-gated promotion

What makes any of these safe isn’t the traffic shape — it’s gating each step on health. A new instance receives traffic only after its readiness probe passes; a canary advances to the next percentage only if its error rate and latency stay within budget. See health probes & graceful lifecycle for the probe mechanics.
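To make the gating concrete, here is a minimal simulation of a rolling update in which old pods are retired only after their replacements pass a readiness check. It is a sketch, not how any orchestrator is implemented: `Pod`, `rollingUpdate`, and the `probe` callback are hypothetical, and the loop always starts new pods before removing old ones.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified pod: a version plus a readiness flag.
type Pod struct {
	Version string
	Ready   bool
}

// readyCount returns how many pods are eligible for traffic.
func readyCount(pods []Pod) int {
	n := 0
	for _, p := range pods {
		if p.Ready {
			n++
		}
	}
	return n
}

// rollingUpdate replaces pods with newVersion, batch pods at a time.
// probe stands in for a readiness probe. It returns the resulting pods,
// the lowest ready count seen between batches, and whether the rollout
// completed.
func rollingUpdate(pods []Pod, newVersion string, batch int, probe func(Pod) bool) ([]Pod, int, bool) {
	if batch < 1 {
		batch = 1
	}
	old := append([]Pod(nil), pods...) // pods still on the old version
	var fresh []Pod                    // pods already replaced
	minReady := readyCount(old)

	for len(old) > 0 {
		n := batch
		if n > len(old) {
			n = len(old)
		}
		// Start n new pods. They are not Ready yet, so they get no traffic.
		started := make([]Pod, n)
		for i := range started {
			started[i] = Pod{Version: newVersion}
			// Gate: a pod joins the rotation only once its probe passes.
			if !probe(started[i]) {
				// Halt and discard this batch. No old pod was retired for it.
				return append(fresh, old...), minReady, false
			}
			started[i].Ready = true
		}
		fresh = append(fresh, started...)
		old = old[n:] // only now retire the same number of old pods
		if r := readyCount(fresh) + readyCount(old); r < minReady {
			minReady = r
		}
	}
	return fresh, minReady, true
}

func main() {
	pods := []Pod{{"v1", true}, {"v1", true}, {"v1", true}, {"v1", true}}

	out, minReady, ok := rollingUpdate(pods, "v2", 2, func(Pod) bool { return true })
	fmt.Printf("healthy rollout: ok=%v minReady=%d pods=%v\n", ok, minReady, out)

	out, minReady, ok = rollingUpdate(pods, "v2-bad", 2, func(Pod) bool { return false })
	fmt.Printf("broken rollout:  ok=%v minReady=%d pods=%v\n", ok, minReady, out)
}
```

In both runs the ready count never drops below the original four: the healthy rollout ends with four v2 pods, and the broken one stops at the first failed probe with all four v1 pods still serving.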

See it: a canary promotion gate

The program below illustrates the decision logic; it is not a deployment tool you install. It shows how a canary controller observes the new version’s metrics and decides whether to promote or abort. This simplified gate has no hold state, although real controllers can typically also pause at the current step. Real progressive-delivery controllers (Argo Rollouts, Flagger) run this same kind of logic against live metrics, for example from Prometheus.

canary.go

package main

import "fmt"

type Metrics struct {
	ErrorRate float64 // 0..1
	P99ms     int
}

// Budgets the canary must stay within to be promoted.
const (
	maxErrRate = 0.02 // 2%
	maxP99ms   = 300
)

var steps = []int{1, 5, 25, 50, 100} // traffic % ladder

// decide aborts (next = 0) if the canary is over budget; otherwise it
// promotes to the next step on the ladder, or reports DONE at the last step.
func decide(step int, m Metrics) (next int, verdict string) {
	if m.ErrorRate > maxErrRate || m.P99ms > maxP99ms {
		return 0, "ABORT → shift traffic back to stable"
	}
	for i, s := range steps {
		if s == step && i+1 < len(steps) {
			return steps[i+1], "PROMOTE"
		}
	}
	return step, "DONE → canary is now stable"
}

func main() {
	// Healthy canary climbs the ladder.
	at := 1
	for _, m := range []Metrics{{0.005, 120}, {0.01, 180}, {0.008, 200}, {0.012, 260}} {
		next, v := decide(at, m)
		fmt.Printf("at %3d%%  err=%.1f%% p99=%dms  -> %s (next %d%%)\n",
			at, m.ErrorRate*100, m.P99ms, v, next)
		at = next
	}
	// A bad release trips the gate and rolls back.
	_, v := decide(25, Metrics{ErrorRate: 0.09, P99ms: 800})
	fmt.Println("bad release at 25%:", v)
}

The gate makes the rollback decision automatic and metric-driven. Tools like Argo Rollouts and Flagger make the same kind of decision against real Prometheus metrics.
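One thing the gate above leaves out is a hold state. A possible variant, sketched here under assumptions, holds at the current step until the canary has served enough requests for its error rate to mean anything. The `Window` type, the `gate` function, and the `minRequests` threshold are hypothetical; the 2% and 300 ms budgets are carried over from canary.go.

```go
package main

import "fmt"

// Window is a hypothetical summary of what the canary served during one
// observation interval.
type Window struct {
	Requests int
	Errors   int
	P99ms    int
}

const (
	minRequests = 500  // assumed: below this the error rate is too noisy to trust
	maxErrRate  = 0.02 // 2%, same budget as canary.go
	maxP99ms    = 300  // same budget as canary.go
)

type Verdict string

const (
	Promote Verdict = "PROMOTE"
	Hold    Verdict = "HOLD"
	Abort   Verdict = "ABORT"
)

// gate decides what to do with the canary after one window.
func gate(w Window) Verdict {
	if w.Requests < minRequests {
		return Hold // not enough traffic yet; stay at the current percentage
	}
	errRate := float64(w.Errors) / float64(w.Requests)
	if errRate > maxErrRate || w.P99ms > maxP99ms {
		return Abort
	}
	return Promote
}

func main() {
	windows := []Window{
		{Requests: 40, Errors: 1, P99ms: 150},    // 2.5% errors, but only 40 requests
		{Requests: 2000, Errors: 12, P99ms: 210}, // 0.6% errors, within budget
		{Requests: 2000, Errors: 90, P99ms: 240}, // 4.5% errors, over budget
	}
	for _, w := range windows {
		fmt.Printf("requests=%d errors=%d p99=%dms -> %s\n",
			w.Requests, w.Errors, w.P99ms, gate(w))
	}
}
```

The three windows produce HOLD, PROMOTE, and ABORT in that order. Holding on low traffic is a design choice: a single error in 40 requests would otherwise abort a release that may be perfectly healthy.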

🐹 Go services are easy to deploy this way — if they shut down gracefully

A small static Go binary in a tiny container starts in milliseconds, so rolling and canary steps are quick. The one thing your code must do is cooperate with the rollout: implement a readiness probe that only reports ready once dependencies are wired, and handle SIGTERM to drain in-flight requests before exiting (server.Shutdown(ctx)). Without graceful shutdown, every pod the rollout retires drops its current requests — turning a “zero-downtime” strategy into a steady drip of 502s on each deploy. The traffic strategy is the orchestrator’s job; being safe to stop is yours.

⚠️ Both versions run at once — so changes must be compatible

Every strategy here has the old and new versions live simultaneously, sharing one database and the same existing clients. A deploy that breaks the still-running version causes an outage during the rollout. Use the expand/contract (parallel-change) pattern for breaking changes: first deploy a version that adds the new element (a nullable column, a new field, a new endpoint) while still supporting the old one; then migrate data and clients; only then deploy a version that removes the old one. Never drop a column or API field that the currently running version still depends on. Database migrations especially must be backward-compatible across a single deploy step.
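As an illustration of the expand phase, the sketch below shows a service that accepts payloads from both old and new writers and writes both fields itself. The wire format and the field names `name` and `display_name` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// userWire is a hypothetical wire format during the "expand" phase:
// display_name has been added, and the old name field is still supported.
type userWire struct {
	Name        string `json:"name,omitempty"`         // old field, removed only in the later "contract" deploy
	DisplayName string `json:"display_name,omitempty"` // new field
}

// decodeUser accepts payloads from old and new writers alike.
func decodeUser(data []byte) (string, error) {
	var u userWire
	if err := json.Unmarshal(data, &u); err != nil {
		return "", err
	}
	if u.DisplayName != "" {
		return u.DisplayName, nil
	}
	return u.Name, nil // fall back to the old field
}

// encodeUser writes both fields so that old readers keep working.
func encodeUser(displayName string) ([]byte, error) {
	return json.Marshal(userWire{Name: displayName, DisplayName: displayName})
}

func main() {
	payloads := []string{
		`{"name":"Ada"}`,                         // written by the old version
		`{"name":"Ada","display_name":"Ada L."}`, // written by the new version
	}
	for _, p := range payloads {
		got, err := decodeUser([]byte(p))
		fmt.Println(got, err)
	}

	out, err := encodeUser("Grace")
	fmt.Println(string(out), err)
}
```

Only once every reader understands `display_name` would a later deploy stop writing `name`, which is the contract step.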

Check your understanding

1. What is a rolling deployment?

A rolling deployment (Kubernetes' default) gradually replaces old pods with new ones in batches: spin up some new pods, wait for them to pass readiness, retire some old ones, and repeat, so there's always enough healthy capacity serving traffic. It needs no extra infrastructure, but during the roll both versions run simultaneously (so changes must be backward-compatible), and a rollback is itself another rolling update back to the previous version, which takes time.

2. What characterizes a blue-green deployment?

Blue-green keeps two complete environments. The current (blue) serves production while you deploy and smoke-test the new version (green) in parallel. Then you cut traffic over to green in one switch (a load-balancer/router change). The big win is instant rollback — flip back to blue. The cost is running double the infrastructure during the cutover, and handling stateful concerns (DB migrations, in-flight sessions) across the switch.

3. What is a canary release?

A canary (named after canaries in coal mines) sends a small slice of production traffic — say 1%, then 5%, 25%, 100% — to the new version while monitoring its error rate, latency, and key metrics. If the canary stays healthy you promote it further; if it degrades you abort and shift traffic back. It limits the blast radius of a bad release to a small fraction of users and gives real-traffic signal that staging can't.

4. Why are readiness probes essential to safe rolling/canary deploys?

During a rollout, Kubernetes adds a new pod to the load-balancer rotation only after its readiness probe succeeds — so a pod that's still warming up (connecting to the DB, loading config) won't get traffic and cause errors. Combined with graceful shutdown on the old pods (drain in-flight requests), readiness probes are what make a rolling or canary deploy actually zero-downtime. Get them wrong and you drop requests on every deploy.

5. What requirement do ALL these strategies impose on your application changes?

Because rolling, blue-green, and canary all have both versions live at once (and talking to one database and existing clients), a deploy must not break the version still running. That means compatible changes: add a nullable column before writing to it, support both old and new message formats, never remove an API field clients still use until they've migrated. The 'expand/contract' (parallel-change) pattern — add the new, migrate, then remove the old across separate deploys — is how you ship breaking changes safely.
