⏱️ Analogy
A benchmark is a stopwatch with a calibration trick. One lap is too quick to time honestly, so it runs the lap over and over — b.N times — until the total is long enough to trust, then divides back out to a per-lap figure. Add -benchmem and it also weighs the litter each lap drops on the floor: bytes and allocations the garbage collector must sweep up.
Writing a benchmark
A benchmark is a function of the form func BenchmarkXxx(b *testing.B), placed in a _test.go file, that runs the code under test b.N times. The framework picks b.N:
```go
// concat_test.go
package strutil

import (
	"strings"
	"testing"
)

func BenchmarkConcatPlus(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := ""
		for j := 0; j < 1000; j++ {
			s += "x" // reallocates every time: O(n²)
		}
		_ = s
	}
}

func BenchmarkConcatBuilder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var sb strings.Builder
		for j := 0; j < 1000; j++ {
			sb.WriteString("x") // one growing buffer: O(n)
		}
		_ = sb.String()
	}
}
```
Run with -bench (a regexp; . means all) and -benchmem to include allocation stats:
```text
$ go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: example/strutil
cpu: Intel(R) Core(TM) i7
BenchmarkConcatPlus-8          2113    563120 ns/op   530200 B/op   999 allocs/op
BenchmarkConcatBuilder-8     271106      4302 ns/op     2040 B/op    11 allocs/op
PASS
ok      example/strutil 3.214s
```
Read each result line as follows: first the benchmark name (the -8 suffix is the GOMAXPROCS value), then the number of iterations in the final run (b.N), then the metrics:
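- ns/op: nanoseconds per operation. Builder is ~130× faster here.
- B/op: bytes allocated per operation. += allocates a new string on every step, which is why ConcatPlus allocates about 530 KB per op against about 2 KB for ConcatBuilder.
- allocs/op: number of heap allocations per operation. Fewer allocations mean less work for the garbage collector and steadier timings.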
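The "divide back out" step is plain arithmetic. In the sketch below, the two sample lines above are rebuilt as testing.BenchmarkResult values. This is the struct the harness fills in, and its N, T, MemAllocs and MemBytes fields hold the final iteration count, the total time and the allocation totals. The totals are reconstructed by multiplying the printed per-op figures by the iteration counts, so they are an illustration and not new measurements. The helper fromSample is hypothetical.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// fromSample is a hypothetical helper: it rebuilds a BenchmarkResult from one
// sample output line by multiplying each per-op figure back up by N.
func fromSample(n int, nsPerOp, bytesPerOp, allocsPerOp int64) testing.BenchmarkResult {
	return testing.BenchmarkResult{
		N:         n,
		T:         time.Duration(nsPerOp*int64(n)) * time.Nanosecond,
		MemAllocs: uint64(allocsPerOp * int64(n)),
		MemBytes:  uint64(bytesPerOp * int64(n)),
	}
}

func main() {
	plus := fromSample(2113, 563120, 530200, 999)
	builder := fromSample(271106, 4302, 2040, 11)

	// Each per-op method divides a total by N, the same final step go test performs.
	for _, r := range []struct {
		name string
		res  testing.BenchmarkResult
	}{{"ConcatPlus", plus}, {"ConcatBuilder", builder}} {
		fmt.Printf("%-14s %d ns/op  %d B/op  %d allocs/op\n",
			r.name, r.res.NsPerOp(), r.res.AllocedBytesPerOp(), r.res.AllocsPerOp())
	}
	fmt.Printf("speedup: %.1fx\n", float64(plus.NsPerOp())/float64(builder.NsPerOp()))
}
```

The figures come back out exactly as printed. The speedup line shows 130.9x, which is the ~130× quoted above.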
```mermaid
graph LR
    A["go test -bench=. -benchmem"] --> B["run body b.N times"]
    B --> C["b.N too low?"]
    C -->|"yes, scale up"| B
    C -->|"long enough"| D["total time / b.N"]
    D --> E["report ns/op · B/op · allocs/op"]
```
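You can watch the ramp-up from an ordinary program. testing.Benchmark runs a benchmark function outside go test and returns its final testing.BenchmarkResult. The sketch below uses a throwaway body (strings.Repeat stored into a package-level sink, an assumption made purely for illustration) and records each b.N the harness passes in. The first round always uses b.N = 1. Later rounds grow until roughly one second of work, the default benchtime, has been timed.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps each result alive so the loop body is not optimized away.
var sink string

// observeRamp runs one benchmark via testing.Benchmark and returns every
// b.N value the harness passed in, along with the final result.
func observeRamp() ([]int, testing.BenchmarkResult) {
	var seen []int
	res := testing.Benchmark(func(b *testing.B) {
		seen = append(seen, b.N) // record this round's iteration count
		for i := 0; i < b.N; i++ {
			sink = strings.Repeat("x", 64)
		}
	})
	return seen, res
}

func main() {
	seen, res := observeRamp()
	fmt.Println("b.N per round:", seen)
	fmt.Printf("final N=%d total=%v ns/op=%d\n", res.N, res.T, res.NsPerOp())
}
```

The exact sequence depends on the machine, so the program prints it instead of promising particular values. The reported N is always the last round's b.N.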
Setup, allocs, and accuracy
If you do expensive setup before the loop, call b.ResetTimer() so it isn’t counted. b.ReportAllocs() forces allocation stats for one benchmark even without the -benchmem flag:
```go
func BenchmarkLookup(b *testing.B) {
	// buildBigIndex and Find are placeholders for your own setup and code under test.
	data := buildBigIndex() // expensive, shouldn't be timed
	b.ReportAllocs()
	b.ResetTimer() // start the clock here
	for i := 0; i < b.N; i++ {
		_ = data.Find("key")
	}
}
```
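The sketch below shows what b.ResetTimer actually discards. It runs the same lookup benchmark twice through testing.Benchmark, once without the reset and once with it. buildIndex and its key format are hypothetical stand-ins for buildBigIndex and data.Find. The harness resets its counters at the start of every b.N round, so MemAllocs covers only the final round. Without the reset, that final round still includes the roughly 10,000 allocations made while building the index.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// buildIndex is a hypothetical stand-in for buildBigIndex: costly setup that
// allocates one key string per entry.
func buildIndex(n int) map[string]int {
	m := make(map[string]int, n)
	for i := 0; i < n; i++ {
		m["key"+strconv.Itoa(i)] = i
	}
	return m
}

// sinkInt consumes lookup results so the loop body is not optimized away.
var sinkInt int

// lookupBench builds the index, optionally calls b.ResetTimer, then times lookups.
func lookupBench(reset bool) func(*testing.B) {
	return func(b *testing.B) {
		idx := buildIndex(10_000)
		if reset {
			b.ResetTimer() // drop the setup's elapsed time and allocation counts
		}
		for i := 0; i < b.N; i++ {
			sinkInt = idx["key42"]
		}
	}
}

func main() {
	noReset := testing.Benchmark(lookupBench(false))
	withReset := testing.Benchmark(lookupBench(true))
	// MemAllocs covers only the final b.N round; setup counts unless reset.
	fmt.Println("without ResetTimer, allocations counted:", noReset.MemAllocs)
	fmt.Println("with ResetTimer, allocations counted:   ", withReset.MemAllocs)
}
```

The first count should be at least 10,000. The second should be close to zero, because a map lookup with a constant string key does not allocate.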
The same two approaches can run as a plain program, which confirms that they produce identical strings. The concat benchmarks earlier on this page only measure what each approach costs to get there:
```go
package main

import (
	"fmt"
	"strings"
)

// concatPlus builds the string with += (allocates each step).
func concatPlus(n int) string {
	s := ""
	for j := 0; j < n; j++ {
		s += "x"
	}
	return s
}

// concatBuilder builds the same string with strings.Builder.
func concatBuilder(n int) string {
	var sb strings.Builder
	for j := 0; j < n; j++ {
		sb.WriteString("x")
	}
	return sb.String()
}

func main() {
	const n = 1000
	a := concatPlus(n)
	b := concatBuilder(n)
	fmt.Printf("len(plus)=%d len(builder)=%d equal=%t\n", len(a), len(b), a == b)
}
```
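For a quick allocation count without the full benchmark harness, you can use testing.AllocsPerRun. It calls a function a given number of times and returns the average number of heap allocations per call. This sketch repeats concatPlus and concatBuilder from the program above so that the block compiles on its own. The exact counts depend on the Go version's slice-growth policy, so only the ordering is worth relying on.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

func concatPlus(n int) string {
	s := ""
	for j := 0; j < n; j++ {
		s += "x" // each step allocates a new, longer string
	}
	return s
}

func concatBuilder(n int) string {
	var sb strings.Builder
	for j := 0; j < n; j++ {
		sb.WriteString("x") // appends into one growing buffer
	}
	return sb.String()
}

// sink keeps results reachable so the calls are not optimized away.
var sink string

func main() {
	plus := testing.AllocsPerRun(100, func() { sink = concatPlus(1000) })
	builder := testing.AllocsPerRun(100, func() { sink = concatBuilder(1000) })
	fmt.Printf("allocs per call: plus=%.0f builder=%.0f\n", plus, builder)
}
```

This is the same allocs/op signal that -benchmem reports, without any timing.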
⚠️ Stop the compiler from optimizing your work away
If a benchmark’s result is never used, the compiler may eliminate the computation entirely, and you’ll measure nothing. Assign the result to a package-level var sink (or otherwise consume it) so the work can’t be eliminated. Also benchmark on a quiet machine, discard the first noisy run, and compare runs with benchstat rather than eyeballing single numbers, since ns/op jitters from run to run.
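Here is a minimal sketch of the sink pattern, assuming a hypothetical function under test named sumTo. Every result is stored in a package-level variable. That store is an observable side effect, so the compiler cannot remove it. The block runs the benchmark with testing.Benchmark so that it works as a standalone program. In a real _test.go file, the BenchmarkSumTo function alone would be enough for go test -bench.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable. Storing each result here means the
// compiler cannot treat the call as unused and delete it.
var sink int

// sumTo is a hypothetical function under test.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

// BenchmarkSumTo writes every result to sink instead of discarding it.
func BenchmarkSumTo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = sumTo(1000)
	}
}

func main() {
	// testing.Benchmark runs the function outside go test and returns its result.
	res := testing.Benchmark(BenchmarkSumTo)
	fmt.Printf("%d iterations, %d ns/op, sink=%d\n", res.N, res.NsPerOp(), sink)
}
```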
The modern loop: b.Loop (Go 1.24+)
Go 1.24 added the for b.Loop() { ... } form as the preferred benchmark loop. It times only the looped region and runs any setup before the loop just once. Crucially, it also stops the compiler from optimizing away the body or its inputs, which removes the classic “my benchmark measured nothing” bug:
```go
func BenchmarkConcatBuilder(b *testing.B) {
	for b.Loop() { // Go 1.24+: tuned count, no dead-code elimination
		var sb strings.Builder
		for j := 0; j < 1000; j++ {
			sb.WriteString("x")
		}
		_ = sb.String()
	}
}
```
The for i := 0; i < b.N; i++ form still works everywhere and is what you’ll see in older code; prefer b.Loop() on Go 1.24+.
Comparing runs
A single ns/op is noisy. Take several samples and compare them with benchstat (install it with go install golang.org/x/perf/cmd/benchstat@latest):
```shell
# collect 10 samples of the baseline and the change
$ go test -bench=. -benchmem -count=10 > old.txt
# ...make your change...
$ go test -bench=. -benchmem -count=10 > new.txt
$ benchstat old.txt new.txt   # per-benchmark summary with its spread, and whether the delta is significant
```
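benchstat does proper statistics, including a significance test, so use it for real decisions. Purely to show why several samples beat one, here is a rough sketch, not benchstat's algorithm. It groups the ns/op values from -count output by benchmark name and reports the median and range for each. The sample lines in main are invented for illustration and are not measurements.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// summary is a rough digest of the ns/op samples for one benchmark.
type summary struct {
	N      int     // number of samples
	Median float64 // middle ns/op value
	Min    float64 // fastest sample
	Max    float64 // slowest sample
}

// summarize scans `go test -bench` output, keeps lines shaped like
// "BenchmarkX-8  271106  4302 ns/op ...", and groups ns/op by name.
func summarize(out string) map[string]summary {
	samples := map[string][]float64{}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") || f[3] != "ns/op" {
			continue // headers, PASS/ok lines, anything else
		}
		v, err := strconv.ParseFloat(f[2], 64)
		if err != nil {
			continue
		}
		samples[f[0]] = append(samples[f[0]], v)
	}

	result := make(map[string]summary, len(samples))
	for name, vs := range samples {
		sort.Float64s(vs)
		mid := len(vs) / 2
		median := vs[mid]
		if len(vs)%2 == 0 {
			median = (vs[mid-1] + vs[mid]) / 2
		}
		result[name] = summary{N: len(vs), Median: median, Min: vs[0], Max: vs[len(vs)-1]}
	}
	return result
}

func main() {
	// Invented sample lines for illustration only; these are not measurements.
	out := `BenchmarkConcatBuilder-8   271106   4302 ns/op
BenchmarkConcatBuilder-8   280000   4250 ns/op
BenchmarkConcatBuilder-8   265000   4410 ns/op
PASS`

	res := summarize(out)
	names := make([]string, 0, len(res))
	for name := range res {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		s := res[name]
		fmt.Printf("%s: median %.0f ns/op, range %.0f..%.0f over %d samples\n",
			name, s.Median, s.Min, s.Max, s.N)
	}
}
```

Even this crude range makes it obvious when two versions overlap. That is the situation in which a single-number comparison would mislead you.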
See also
- profiling with pprof: when a benchmark is slow, pprof shows where the time goes.
- testing basics: benchmarks share the _test.go file and go test runner.
- strings & bytes: the += vs Builder result this page measures.
Next: the commands that tie it all together — the Go toolchain.
Check your understanding
1. Why does a benchmark loop for i := 0; i < b.N; i++ instead of a fixed count?
go test runs the benchmark repeatedly, increasing b.N (1, then more) until the total time is large enough to measure accurately. It then divides total time by b.N to report ns/op, so the result is stable regardless of how fast one iteration is.
2. What do B/op and allocs/op tell you?
With -benchmem (or b.ReportAllocs), each line shows B/op (bytes allocated per op) and allocs/op (heap allocations per op). Fewer allocations usually means less garbage-collector work and faster, steadier code — often the bigger win over raw ns/op.
3. When do you need b.ResetTimer()?
If you build a big fixture before the loop, call b.ResetTimer() right before for i := 0; i < b.N; i++ so the timer starts fresh and only the work inside the loop is measured. There's also b.StopTimer/b.StartTimer to exclude per-iteration setup.
4. What does the Go 1.24 b.Loop() form (for b.Loop() { ... }) improve over for i := 0; i < b.N; i++?
for b.Loop() { ... } (Go 1.24+) replaces the manual b.N loop: it times only the looped region, runs setup before it once, and prevents the compiler from eliminating the loop body or its inputs — fixing the classic 'dead-code elimination ate my benchmark' trap.
5. Why compare benchmark runs with benchstat and -count instead of reading one number?
Microbenchmarks are noisy. Run go test -bench=. -count=10 to collect samples and save the old and new results. benchstat old.txt new.txt then prints a summary value and its spread for each benchmark, plus a significance verdict, so you don't chase noise as if it were a real regression.