{} The Go Reference


Race Conditions & Atomicity

What a data race is, why i++ isn't atomic, why time.Sleep never fixes it, the race detector, and the three real fixes: mutex, channel, atomic.


🏦 Analogy

Two tellers update the same account balance from a sticky note. Both read “100”, both add 50, both write “150”. Two deposits happened, but the balance only went up once — $50 vanished. Nobody enforced “read, then write, before anyone else reads.” That gap is a data race: the result depends on who happened to write last, and no rule decides who that is.

What a data race is

A data race has exactly three ingredients, and you need all three:

  1. Two or more goroutines access the same memory location — the same variable, slice element, map, or struct field.
  2. At least one access is a write. Two concurrent reads can never race.
  3. Nothing enforces an order between them — no mutex, channel, atomic, or other happens-before edge separates the accesses.

Miss any one and there is no race. Two goroutines reading the same config? Fine: no writer. One goroutine writing a value another reads after a channel send delivers it? Fine: the channel imposes an order. It is the combination that leaves the outcome unspecified.

And “unspecified” has teeth. Go’s memory model does not give racy programs the C-style “anything can happen” kind of undefined behavior, but it guarantees very little. A racing read of a word-sized location may observe a stale value or any value concurrently written to it. A racing read of a multiword value (an interface, slice, string, or map) may observe a torn mix of two writes that no goroutine ever stored as a whole, which can in turn corrupt memory. An implementation is also allowed to report the race and terminate the program. A race is not “usually fine, occasionally wrong.” It is a hole in the meaning of your program.

🧠 Mental model: a race is a missing edge, not a slow goroutine

Don’t picture a race as “two goroutines running at the same time.” Picture a directed graph of memory accesses, where an edge means “this provably happens before that.” A data race is two accesses to the same location (one a write) with no path between them in that graph. Fixing a race always means the same thing: add an edge. A mutex adds one (unlock happens-before the next lock). A channel adds one (send happens-before the matching receive). An atomic adds one. A time.Sleep adds none — which is exactly why it never works.
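To make the "add an edge" idea concrete, here is a minimal sketch of one goroutine publishing a value to another through a channel. The function name publish and the message text are illustrative assumptions. The point is only that the close of the channel is ordered before the receive that observes it, so the later read cannot race with the earlier write.

```go
package main

import "fmt"

// publish writes a value in one goroutine and reads it in another.
// The channel close supplies the happens-before edge between the two.
func publish() string {
	var msg string // shared between the two goroutines
	ready := make(chan struct{})

	go func() {
		msg = "config loaded" // write
		close(ready)          // edge: the close is ordered before the receive below
	}()

	<-ready    // returns only once the channel has been closed
	return msg // read is ordered after the write, so there is no race
}

func main() {
	fmt.Println(publish())
}
```

Remove the channel and the same read and write would have no path between them in the graph, which is exactly the definition of a data race.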

Atomicity: i++ is a lie

Statements that look indivisible usually aren’t. counter++ is really three machine operations:

1. load  counter  → register   (read)
2. add   1                      (modify)
3. store register → counter     (write)

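Between step 1 and step 3, another goroutine can run the whole sequence. Both read the same starting value, both add one, both write the same result, and one increment is silently lost. Disassembly makes the separate steps visible. The listing below is illustrative: the exact instructions depend on the architecture and compiler version, and even when the compiler folds the sequence into a single increment-in-memory instruction, that instruction is not atomic across cores without a LOCK prefix: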

$ go tool compile -S race.go   # abridged
    MOVQ    counter(SB), AX    ; load
    INCQ    AX                 ; add 1
    MOVQ    AX, counter(SB)    ; store   ← three separate steps

The same trap applies to counter += n, slice[i]++, m[k]++, x = x + 1, and reads/writes of any value wider than a machine word. On a 64-bit platform a 16-byte interface value or a 24-byte slice header can tear: you read part of an old value and part of a new one.

sequenceDiagram
participant A as Goroutine A
participant M as counter (=100)
participant B as Goroutine B
A->>M: load 100
B->>M: load 100
A->>M: store 101
B->>M: store 101
Note over M: two increments ran, but counter = 101 (one lost)
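The interleaving in the diagram can be replayed by hand in a single goroutine, which makes the lost update deterministic and easy to inspect. This is a sketch for illustration only: regA and regB are assumed stand-ins for each goroutine's private register, and lostUpdate is a hypothetical helper name.

```go
package main

import "fmt"

// lostUpdate replays the diagram's load/load/store/store order by hand.
// Everything runs in one goroutine, so the result is deterministic.
func lostUpdate(start int) int {
	counter := start

	regA := counter // A: load
	regB := counter // B: load (sees the same value as A)
	regA++          // A: add 1
	regB++          // B: add 1
	counter = regA  // A: store
	counter = regB  // B: store overwrites A's result

	return counter
}

func main() {
	fmt.Println(lostUpdate(100)) // prints 101, not 102
}
```

Two increments ran, but because both loads happened before either store, the second store simply repeats the first.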

Watch it lose updates

A thousand goroutines each increment an unguarded counter. On a multi-core machine the total often falls short of 1000, and the exact number can change between runs because the interleaving is nondeterministic. Run it a few times and compare the values:

race-counter.go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const N = 1000
	var counter int // shared, UNSYNCHRONIZED
	var wg sync.WaitGroup

	for i := 0; i < N; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // DATA RACE: unsynchronized read-modify-write
		}()
	}

	wg.Wait()
	// Output can vary per run and is often < 1000: that is the bug.
	// (It is not a panic and not a deadlock; the race just loses updates.)
	fmt.Printf("counter = %d (wanted %d)\n", counter, N)
}

The sync.WaitGroup here is correct — it gives you a happens-before edge between each wg.Done() and the wg.Wait(), so reading counter after Wait is safe. The race is purely the unguarded counter++ inside the goroutines.

Why time.Sleep never fixes it

A tempting “fix” is to space the goroutines out so they “don’t collide”:

go func() {
	defer wg.Done()
	counter++
	time.Sleep(time.Millisecond) // does NOTHING for correctness
}()

This is a mirage. A sleep changes timing, and a race is not a timing bug — it is a missing-ordering bug. The instant the machine is busier, the scheduler shifts, the CPU is faster, or you run under -race (which changes timing too), the collision returns. Worse, sleeps that “work” on your laptop fail in production at 3am under load, which is the most expensive possible place to discover a race. Synchronization is a mutex, a channel, or sync/atomic — never a sleep.
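A closely related habit is sleeping "long enough" for a background goroutine to produce its result. A minimal sketch of the channel-based alternative follows. Here compute is a hypothetical stand-in for the real work, and waitForResult is an illustrative name, not an API from any library.

```go
package main

import "fmt"

// compute stands in for some background work (hypothetical).
func compute() int { return 6 * 7 }

// waitForResult starts compute in a goroutine and waits for it with a
// channel receive instead of a time.Sleep. The receive blocks exactly as
// long as needed and orders the read of the result after its computation.
func waitForResult() int {
	done := make(chan int)
	go func() {
		done <- compute() // the send is ordered before the receive completes
	}()
	return <-done
}

func main() {
	fmt.Println(waitForResult()) // prints 42
}
```

Unlike a sleep, the receive is never too short on a busy machine and never wastefully long on an idle one.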

Fix 1 — a Mutex (deterministic, == N)

Wrap the read-modify-write in a critical section. Now exactly one goroutine touches counter at a time, the unlock→lock edge orders every increment, and the result is always N:

fix-mutex.go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const N = 1000
	var counter int
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < N; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // now the only goroutine inside the critical section
			mu.Unlock()
		}()
	}

	wg.Wait()
	fmt.Printf("counter = %d (always %d — the mutex enforces order)\n", counter, N)
}

Keep the critical section tiny — only the shared access belongs inside Lock/Unlock. See the sync package for Mutex, RWMutex, and the rest.
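One common way to keep the critical section small and hard to bypass is to put the mutex inside the type that owns the data. The sketch below assumes a hypothetical Counter type and countTo helper. It is one possible arrangement, not the only one.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter keeps its mutex next to the field it guards, so callers
// cannot touch n without going through the lock.
type Counter struct {
	mu sync.Mutex
	n  int // guarded by mu
}

// Inc holds the lock only for the increment itself.
func (c *Counter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Value takes the lock too: an unlocked read would race with Inc.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countTo runs n goroutines that each call Inc once.
func countTo(n int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countTo(1000)) // prints 1000
}
```

Because n is unexported and every method locks, the "only the shared access goes inside Lock/Unlock" rule is enforced by the type rather than by convention.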

Fix 2 — sync/atomic (lock-free for one value)

For a single counter, a full mutex is overkill. atomic.Int64 (available since Go 1.19) performs the read-modify-write as one indivisible operation (on x86-64, typically a single LOCK XADD instruction), so no lock and no critical section is needed, and the result is again always N:

fix-atomic.go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	const N = 1000
	var counter atomic.Int64 // the type carries its own atomicity
	var wg sync.WaitGroup

	for i := 0; i < N; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Add(1) // one atomic load-add-store, no lock
		}()
	}

	wg.Wait()
	fmt.Printf("counter = %d (always %d — atomic Add is indivisible)\n", counter.Load(), N)
}

Atomics are the right tool for one word-sized value (a counter, a flag, a generation number). The moment you need to keep two fields consistent with each other, you need a mutex — an atomic only protects a single location. See atomic operations.
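The "flag" case can be sketched with a compare-and-swap: many goroutines try to flip one flag, and the atomic guarantees that exactly one succeeds. The electLeader name and the winner-counting are assumptions for illustration. atomic.Bool and atomic.Int64 require Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// electLeader starts n goroutines that all try to flip the same flag from
// false to true. CompareAndSwap is a single atomic step, so exactly one
// caller sees it succeed. It returns how many goroutines won.
func electLeader(n int) int64 {
	var claimed atomic.Bool
	var winners atomic.Int64
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if claimed.CompareAndSwap(false, true) {
				winners.Add(1) // only the single successful swapper gets here
			}
		}()
	}

	wg.Wait()
	return winners.Load()
}

func main() {
	fmt.Println(electLeader(100)) // prints 1
}
```

Note that this is itself a check-then-act done correctly: the check (is it false?) and the act (set it to true) happen in one indivisible operation on one location.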

Fix 3 — a channel (one owner)

The most idiomatic Go answer is often to not share at all: give the counter to a single owner goroutine and have everyone else send it messages. As the Go proverb puts it: “Do not communicate by sharing memory; instead, share memory by communicating.”

counts := make(chan int)
go func() {              // the ONLY goroutine that touches total
	total := 0
	for c := range counts {
		total += c       // no lock needed: one owner
	}
}()
counts <- 1              // each worker just sends

The channel send/receive provides the ordering for free. This shines when the counter is part of a larger pipeline; for a hot, standalone counter the atomic is cheaper. See channels.
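The fragment above leaves out how the workers finish and how the total gets back out of the owner. One possible complete version is sketched below. The result channel, the sumViaOwner name, and the shutdown sequence are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sumViaOwner starts one owner goroutine for total and n workers that
// each send 1. Only the owner ever touches total.
func sumViaOwner(n int) int {
	counts := make(chan int)
	result := make(chan int)

	go func() { // the owner
		total := 0
		for c := range counts {
			total += c
		}
		result <- total // hand back the final value once counts is closed
	}()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counts <- 1 // each worker just sends
		}()
	}

	wg.Wait()     // counts is unbuffered, so every send has been received
	close(counts) // ends the owner's range loop
	return <-result
}

func main() {
	fmt.Println(sumViaOwner(1000)) // prints 1000
}
```

The close must come from the sending side and only after all workers are done. Closing earlier would make a late send panic.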

The check-then-act race

Not every race is a raw i++. A subtler family is check-then-act: you test a condition, then act on it, and another goroutine changes the world in the gap. A classic is “create the key if absent”:

// BROKEN even though each line is "synchronized":
if _, ok := cache[key]; !ok { // check (lock here)
	cache[key] = expensive()  // act  (lock here)
}

If you take the lock around the check and separately around the write, two goroutines can both observe !ok, both call expensive(), and both write, duplicating work or clobbering each other. With no lock at all it is worse: the runtime detects unsynchronized map writes on a best-effort basis and aborts the program (fatal error: concurrent map writes), even without -race. The fix is to make the whole check-and-act one critical section:

mu.Lock()
if _, ok := cache[key]; !ok {
	cache[key] = expensive() // check and act are now indivisible
}
mu.Unlock()

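Or use sync.Map.LoadOrStore, or sync.Once for one-time init. Note that LoadOrStore takes an already-computed value, so it prevents clobbering but not duplicated work: two goroutines can still both call expensive() before one of them wins the store. The lesson: locks fix concurrent access, but you must still wrap the right span. An invariant that spans two operations needs both inside one lock.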
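A small sketch makes the single-critical-section version checkable: it counts how often the build function actually runs when many goroutines ask for the same key. The Cache type, its calls field, and the buildCount helper are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a map-backed cache guarded by one mutex.
type Cache struct {
	mu    sync.Mutex
	items map[string]int
	calls int // how many times build ran; guarded by mu
}

// GetOrCreate keeps the check and the act inside one critical section,
// so build runs at most once per key.
func (c *Cache) GetOrCreate(key string, build func() int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.items[key]; ok {
		return v
	}
	c.calls++
	v := build()
	c.items[key] = v
	return v
}

// buildCount has n goroutines request the same key and reports how many
// times the build function ran.
func buildCount(n int) int {
	c := &Cache{items: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.GetOrCreate("answer", func() int { return 42 })
		}()
	}
	wg.Wait()
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.calls
}

func main() {
	fmt.Println(buildCount(100)) // prints 1
}
```

The trade-off in this design is that the lock is held while build runs, so a slow build for one key blocks lookups for every other key. Whether that is acceptable depends on how expensive the build is.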

The race detector

Races hide. They depend on timing, so they pass code review, pass most test runs, and surface in production. The only reliable way to catch them is the race detector, which Go ships in the toolchain:

go run -race .          # instrument and run a program
go test -race ./...     # instrument and run the whole test suite
go build -race -o app   # instrumented binary for testing/staging, not for release

It works by instrumenting every memory access at compile time and maintaining, at runtime, a happens-before graph (a vector-clock-style “shadow memory”). The instant it observes two accesses to the same address with no ordering edge and at least one write, it prints a report naming both goroutines, the conflicting addresses, and full stack traces for each.

==================
WARNING: DATA RACE
Write at 0x00c0000b4010 by goroutine 8:
  main.main.func1()
      race-counter.go:18 +0x...
Previous read at 0x00c0000b4010 by goroutine 7:
  ...
==================

Two things to internalize about it:

  • It is dynamic, not static. It finds only races on code paths that actually run during that execution. A race in an untested branch is invisible. So run your concurrent tests under -race in CI, and exercise the real concurrent paths.
  • It costs real performance. Instrumented binaries run roughly 2–20x slower and use 5–10x more memory, so you use -race in tests/CI and staging, not in your production build.

It has no false positives: if -race reports a race, it is a real race in your code. Treat every report as a bug. See the race detector for reading reports in depth.

Reference: the fixes at a glance

| Tool | Best for | Cost | Ordering it gives you |
| --- | --- | --- | --- |
| sync.Mutex | guarding a struct’s internal state; multi-field invariants | a lock/unlock per access | unlock happens-before next lock |
| sync.RWMutex | many readers, rare writers | reader-cheaper, writer-exclusive | same, with shared read locks |
| sync/atomic (atomic.Int64, …) | one word-sized value: counter, flag, pointer | cheapest; lock-free | the atomic op itself is the edge |
| channel | transferring ownership; one-owner state; pipelines | a send/receive per message | send happens-before receive |
| time.Sleep | nothing; never a fix | hides the bug | none |

When to use which

  • Guarding a struct’s own fields, or any invariant spanning two values → sync.Mutex. It is the default, it is obvious, and it scales to “keep these three fields consistent.”
  • A single counter, flag, or pointer → sync/atomic. Lock-free and fast, but it only protects one location — don’t reach for it when two values must agree.
  • Moving data between goroutines, or letting one goroutine own state → a channel. The most idiomatic answer when the data has a clear owner or flows through stages.
  • Read-heavy, write-rare shared state → sync.RWMutex. Measure first; for short critical sections a plain Mutex is often faster than the bookkeeping RWMutex adds.
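As a sketch of the "invariant spanning two values" case combined with read-heavy access, the example below guards a count and a sum that must always agree. Stats, Add, Mean, and meanOf are hypothetical names. The choice of RWMutex here is only for illustration and, as noted above, should be measured against a plain Mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// Stats holds two fields that must always agree:
// sum is the total of exactly count samples.
type Stats struct {
	mu    sync.RWMutex
	count int     // guarded by mu
	sum   float64 // guarded by mu
}

// Add updates both fields under the write lock so no reader can see one
// updated without the other.
func (s *Stats) Add(v float64) {
	s.mu.Lock()
	s.count++
	s.sum += v
	s.mu.Unlock()
}

// Mean reads both fields under the shared read lock.
func (s *Stats) Mean() float64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.count == 0 {
		return 0
	}
	return s.sum / float64(s.count)
}

// meanOf adds the value v from n goroutines and returns the mean.
func meanOf(n int, v float64) float64 {
	var s Stats
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Add(v)
		}()
	}
	wg.Wait()
	return s.Mean()
}

func main() {
	fmt.Println(meanOf(100, 2.5)) // prints 2.5
}
```

Two separate atomics for count and sum would not work here: a reader could load the new count together with the old sum and compute a mean that never existed.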

⚠️ time.Sleep is NOT synchronization

Sprinkling time.Sleep to “let the other goroutine finish” only reshuffles timing — the race is still there and will surface under real load, on a faster machine, or the moment you run under -race (which itself changes timing). A time.Sleep(time.Second) that “fixes” your test is a race waiting to ship. Synchronization means a mutex, a channel, or sync/atomic — something that adds a happens-before edge. A sleep adds none.

✅ Lock-guarded code still isn't automatically correct

Locks fix concurrent access, not logic. You can have perfectly synchronized code that is still wrong: check-then-act gaps, broken invariants, deadlocks from bad lock order. Synchronization is necessary, not sufficient — design the logic too, and read deadlock, livelock & starvation before you start nesting locks.

See also

Next: when locks themselves stop progress — deadlock, livelock & starvation.

Check your understanding


1. Which combination defines a data race?

All three parts are required: concurrent access, at least one writer, and no happens-before ordering. Two concurrent reads are fine (no writer); two writes to different memory are fine (not the same location); a synchronized write is fine (an order is enforced). Remove any one of the three and it is not a data race.

2. Why is `counter++` not safe from multiple goroutines?

`counter++` is a read-modify-write: it loads the value, adds one, and stores it back. Two goroutines can both load the same value and both store the same result, so one increment vanishes. Go does not make `++` atomic — use a mutex or sync/atomic for that.

3. Does adding `time.Sleep` fix a data race?

Sleep changes the odds, not the correctness. A race is the absence of an enforced happens-before relationship; a sleep enforces nothing. The bug just becomes rarer and harder to reproduce. Real fixes are a mutex, a channel, or sync/atomic.

4. What does the race detector actually do?

`-race` builds an instrumented binary that watches every memory access and reports a race the moment it sees two conflicting unsynchronized accesses, printing both goroutines' stacks. It is a dynamic tool: it only finds races on code paths that actually execute, so you must exercise concurrent paths under `-race` (ideally in CI). It does not prove absence of races.

5. Code does `if _, ok := m[k]; !ok { m[k] = v }` on a shared map under a mutex held only around each line separately. What is the bug?

Synchronizing each access individually still leaves a window between the check and the act. Another goroutine can write the key in that gap, so two goroutines both decide the key is absent. The whole check-then-act must be inside one critical section (hold the lock across both lines). Locks fix concurrent access, not logic.
