{} The Go Reference

Benchmarks

Measuring speed with BenchmarkXxx and b.N — reading ns/op, B/op and allocs/op, and the classic += vs strings.Builder comparison.

⏱️ Analogy

A benchmark is a stopwatch with a calibration trick. One lap is too quick to time honestly, so it runs the lap over and over — b.N times — until the total is long enough to trust, then divides back out to a per-lap figure. Add -benchmem and it also weighs the litter each lap drops on the floor: bytes and allocations the garbage collector must sweep up.

Writing a benchmark

A benchmark is a function of the form func BenchmarkXxx(b *testing.B), kept in a _test.go file, that runs the code under test b.N times. The framework picks b.N:

// concat_test.go
package strutil

import (
	"strings"
	"testing"
)

func BenchmarkConcatPlus(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := ""
		for j := 0; j < 1000; j++ {
			s += "x" // reallocates every time — O(n²)
		}
		_ = s
	}
}

func BenchmarkConcatBuilder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var sb strings.Builder
		for j := 0; j < 1000; j++ {
			sb.WriteString("x") // one growing buffer — O(n)
		}
		_ = sb.String()
	}
}

Run with -bench (a regexp; . means all) and -benchmem to include allocation stats:

$ go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: example/strutil
cpu: Intel(R) Core(TM) i7
BenchmarkConcatPlus-8        2113    563120 ns/op   530200 B/op   999 allocs/op
BenchmarkConcatBuilder-8   271106      4302 ns/op     2040 B/op    11 allocs/op
PASS
ok      example/strutil    3.214s

Read each line as: name (with -8 = GOMAXPROCS), iterations run, then the metrics:

  • ns/op — nanoseconds per operation. Builder is ~130× faster here.
  • B/op — bytes allocated per operation. += allocates a new string every step.
  • allocs/op — number of heap allocations per operation. Fewer is steadier under GC.
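The B/op and allocs/op figures for += can be sanity-checked with a little arithmetic. The sketch below counts the minimum bytes a one-byte-at-a-time += loop has to allocate. It assumes every step after the first allocates a fresh string of the new length, and that the first step ("" + "x") allocates nothing, which is consistent with the 999 allocs/op in the sample output. The helper name plusCost is made up for illustration.

```go
package main

import "fmt"

// plusCost estimates the cost of building an n-byte string with s += "x".
// Assumption: each step after the first allocates a new string whose size is
// the new length, and the first step allocates nothing.
func plusCost(n int) (allocs, bytes int) {
	for length := 2; length <= n; length++ {
		allocs++        // one new string per step
		bytes += length // the new string holds the old contents plus one byte
	}
	return allocs, bytes
}

func main() {
	allocs, bytes := plusCost(1000)
	fmt.Printf("allocs=%d bytes=%d\n", allocs, bytes) // allocs=999 bytes=500499
}
```

The 500499-byte lower bound sits just under the 530200 B/op in the sample run; the gap is plausibly allocator rounding. The shape matters more than the exact figure: the total grows with n², so doubling n roughly quadruples the bytes.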

Diagram (benchmark calibration loop): go test -bench=. -benchmem → run the body b.N times → if b.N is too low, scale it up and run again → once the run is long enough, divide total time by b.N → report ns/op · B/op · allocs/op.
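The "total time / b.N" step can be observed directly. As a minimal sketch, the program below runs a benchmark outside go test with testing.Benchmark and recomputes ns/op from the raw result. The names buildWithBuilder and measure are illustrative, not part of the lesson's package, and testing.Init is called first because the program is not started by go test.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// buildWithBuilder is the code under test: 1000 one-byte writes.
func buildWithBuilder() string {
	var sb strings.Builder
	for j := 0; j < 1000; j++ {
		sb.WriteString("x")
	}
	return sb.String()
}

// measure runs the benchmark programmatically and returns the raw result.
func measure() testing.BenchmarkResult {
	testing.Init() // registers testing flags; has no effect if already called
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			_ = buildWithBuilder()
		}
	})
}

func main() {
	r := measure()
	// ns/op is the total timed duration divided by the iteration count.
	perOp := r.T.Nanoseconds() / int64(r.N)
	fmt.Printf("N=%d total=%v ns/op=%d (NsPerOp=%d)\n", r.N, r.T, perOp, r.NsPerOp())
	fmt.Printf("B/op=%d allocs/op=%d\n", r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```

The numbers vary by machine; what should hold is that the hand-computed value matches NsPerOp, and that N is whatever count the framework settled on.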

Setup, allocs, and accuracy

If you do expensive setup before the loop, call b.ResetTimer() so it isn’t counted. b.ReportAllocs() forces allocation stats for one benchmark even without the -benchmem flag:

func BenchmarkLookup(b *testing.B) {
	data := buildBigIndex() // expensive, shouldn't be timed
	b.ReportAllocs()
	b.ResetTimer()          // start the clock here
	for i := 0; i < b.N; i++ {
		_ = data.Find("key")
	}
}
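When the setup has to be repeated on every iteration, b.StopTimer and b.StartTimer can pause the clock around it. The sketch below is one way to do that, assuming a sort whose input must be restored each time because sort.Ints mutates it. The name benchSortFresh and the scrambling formula are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
)

// benchSortFresh times sort.Ints only. Refilling the buffer is per-iteration
// setup and is excluded with StopTimer/StartTimer.
func benchSortFresh(b *testing.B) {
	src := make([]int, 100000)
	for i := range src {
		src[i] = (i * 7919) % len(src) // deterministic scrambled order
	}
	buf := make([]int, len(src))
	b.ResetTimer() // discard the one-time setup above
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		copy(buf, src) // restore the unsorted input; not timed
		b.StartTimer()
		sort.Ints(buf) // the only timed work
	}
}

func main() {
	testing.Init() // needed because this runs outside go test
	r := testing.Benchmark(benchSortFresh)
	fmt.Printf("%d iterations, %d ns/op\n", r.N, r.NsPerOp())
}
```

Pausing and resuming the timer has a cost of its own, so this pattern suits iterations that do a meaningful amount of work. For very short bodies it is usually better to restructure the benchmark so the setup can live outside the loop.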

You can run the same two approaches here in main to confirm they produce the identical string — the benchmark above just measures how much each one costs to get there:

// concat.go
package main

import (
	"fmt"
	"strings"
)

// concatPlus builds the string with += (allocates each step).
func concatPlus(n int) string {
	s := ""
	for j := 0; j < n; j++ {
		s += "x"
	}
	return s
}

// concatBuilder builds the same string with strings.Builder.
func concatBuilder(n int) string {
	var sb strings.Builder
	for j := 0; j < n; j++ {
		sb.WriteString("x")
	}
	return sb.String()
}

func main() {
	const n = 1000
	a := concatPlus(n)
	b := concatBuilder(n)
	fmt.Printf("len(plus)=%d len(builder)=%d equal=%t\n", len(a), len(b), a == b)
}

⚠️ Stop the compiler from optimizing your work away

If a benchmark’s result is never used, the compiler may delete the code entirely, and you’ll measure nothing. Assign the result to a package-level var sink (or otherwise consume it) so the work can’t be eliminated. Also: benchmark on a quiet machine, ignore the first noisy run, and compare runs with benchstat rather than eyeballing single numbers — ns/op jitters between runs.
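The sink pattern described above can be sketched as follows. The function sumTo is a hypothetical stand-in for any cheap, side-effect-free function that a compiler could otherwise discard. The benchmark is driven through testing.Benchmark only so the example runs as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level, so a store to it cannot be treated as dead code.
var sink int

// sumTo is a hypothetical pure function: it returns 1 + 2 + ... + n.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

func benchSumTo(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = sumTo(1000) // keep the result in a local during the loop
	}
	sink = r // publish it once so the work has an observable effect
}

func main() {
	testing.Init() // needed because this runs outside go test
	res := testing.Benchmark(benchSumTo)
	fmt.Printf("iterations=%d sink=%d\n", res.N, sink)
}
```

Storing to the local inside the loop and to sink once afterwards keeps the extra store out of the timed path. This is a common convention rather than a guarantee about what the compiler will or will not optimize.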

The modern loop: b.Loop (Go 1.24+)

Go 1.24 added the for b.Loop() { ... } form as the preferred benchmark loop. It times only the looped region, runs any setup before the loop exactly once, and, crucially, keeps the compiler from optimizing away the body or its inputs, which eliminates the classic “my benchmark measured nothing” bug:

func BenchmarkConcatBuilder(b *testing.B) {
	for b.Loop() { // Go 1.24+: tuned count, no dead-code elimination
		var sb strings.Builder
		for j := 0; j < 1000; j++ {
			sb.WriteString("x")
		}
		_ = sb.String()
	}
}

The for i := 0; i < b.N; i++ form still works everywhere and is what you’ll see in older code; prefer b.Loop() on Go 1.24+.

Comparing runs

A single ns/op is noisy. Take several samples and compare with benchstat:

# collect 10 samples of the baseline and the change
$ go test -bench=. -benchmem -count=10 > old.txt
#   ...make your change...
$ go test -bench=. -benchmem -count=10 > new.txt
$ benchstat old.txt new.txt   # summary ± variation per benchmark, and whether the delta is significant
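To see why one sample is not enough, it helps to summarize a handful by hand. The sketch below is not benchstat's algorithm (benchstat applies a proper statistical test); it only computes a median, a mean, and a min-to-max spread. The first sample is the Builder ns/op from the output earlier; the other four are hypothetical values invented for illustration, and summarize is a made-up helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// summarize returns the median, the mean, and the min-to-max spread as a
// percentage of the median. It returns zeros for an empty input.
func summarize(samples []float64) (median, mean, spreadPct float64) {
	n := len(samples)
	if n == 0 {
		return 0, 0, 0
	}
	s := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	if n%2 == 1 {
		median = s[n/2]
	} else {
		median = (s[n/2-1] + s[n/2]) / 2
	}
	sum := 0.0
	for _, v := range s {
		sum += v
	}
	mean = sum / float64(n)
	spreadPct = (s[n-1] - s[0]) / median * 100
	return median, mean, spreadPct
}

func main() {
	// 4302 comes from the sample output; the rest are hypothetical.
	nsPerOp := []float64{4302, 4410, 4288, 4355, 4920}
	median, mean, spread := summarize(nsPerOp)
	fmt.Printf("median=%.0f mean=%.0f spread=%.1f%%\n", median, mean, spread)
}
```

With these inputs the median is 4355 and the mean is 4455: a single slow run pulls the mean up by 100 ns, which is the kind of noise that makes a lone number misleading.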

See also

Next: the commands that tie it all together — the Go toolchain.

Check your understanding

1. Why does a benchmark loop for i := 0; i < b.N; i++ instead of a fixed count?

go test runs the benchmark function repeatedly, increasing b.N (1, then more) until the timed run lasts long enough to measure accurately (one second by default; adjust with -benchtime). It then divides total time by b.N to report ns/op, so the result is stable regardless of how fast one iteration is.

2. What do B/op and allocs/op tell you?

With -benchmem (or b.ReportAllocs), each line shows B/op (bytes allocated per op) and allocs/op (heap allocations per op). Fewer allocations usually mean less garbage-collector work and faster, steadier code, which is often a bigger win than shaving raw ns/op.

3. When do you need b.ResetTimer()?

If you build a big fixture before the loop, call b.ResetTimer() right before for i := 0; i < b.N; i++ so the timer starts fresh and only the work inside the loop is measured. There's also b.StopTimer/b.StartTimer to exclude per-iteration setup.

4. What does the Go 1.24 b.Loop() form (for b.Loop() { ... }) improve over for i := 0; i < b.N; i++?

for b.Loop() { ... } (Go 1.24+) replaces the manual b.N loop: it times only the looped region, runs setup before it once, and prevents the compiler from eliminating the loop body or its inputs — fixing the classic 'dead-code elimination ate my benchmark' trap.

5. Why compare benchmark runs with benchstat and -count instead of reading one number?

Microbenchmarks are noisy. Run go test -bench=. -count=10 to collect samples, save the old and new results, then run benchstat old.txt new.txt. It prints a summary value per benchmark, its run-to-run variation, and whether the difference is statistically significant, so you don't chase noise as if it were a real regression.
