📖 Analogy
Reading the compiler’s assembly is like a chef tasting the dish the kitchen actually plated, not the recipe they were handed. The recipe (your Go) says “add a helper function here”; the plated dish (the assembly) might show that helper folded into the main course (inlining), a redundant safety step skipped (bounds-check elimination), or a fancy technique done in one practiced motion (an intrinsic). You don’t taste-test every plate — but when one dish is mysteriously slow, tasting it tells you what the kitchen really did, which the recipe alone can’t.
Seeing the output: -S
The compiler will show you the code it generates. Two equivalent ways:
# Disassemble one file (the listing goes to stdout).
go tool compile -S main.go
# As part of a build, for whole packages (go build relays the listing on stderr).
go build -gcflags=-S ./...
# Combine with -m to also see inline/escape decisions.
go build -gcflags='-S -m' ./...
A snippet for a trivial add function looks roughly like this (amd64, on a recent toolchain where integer arguments travel in registers):
main.add STEXT nosplit size=4 args=0x10 locals=0x0
	ADDQ BX, AX // a arrives in AX, b in BX; the sum is left in AX
	RET // AX carries the return value back to the caller
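For reference, a source file that could produce a listing like the one above might look like the following minimal sketch. The //go:noinline directive is an assumption added for illustration: it keeps add visible as its own symbol and as a CALL in main.

```go
package main

import "fmt"

// add is kept out of line so that it shows up as its own STEXT symbol
// (and as a CALL in main) when the package is compiled with -gcflags=-S.
//
//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	fmt.Println(add(2, 3))
}
```

Building this with go build -gcflags=-S . should print a block for add among the output; the exact instructions depend on the Go version and GOARCH.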
It’s Go’s abstract assembly
What you’re reading is not literal x86/ARM — it’s Go’s own portable assembly notation (inherited from Plan 9), shared across architectures and then lowered to the real instruction set. The giveaways are the pseudo-registers:
| Pseudo-register | Meaning |
|---|---|
| FP | Frame pointer: function arguments and locals by name (the first argument is a+0(FP)) |
| SP | Stack pointer: the local stack frame (locals sit at negative offsets, e.g. x-8(SP)) |
| PC | Program counter: jumps and branches |
| SB | “Static base”: global symbols and functions |
nosplit means the function skips the stack-growth check; args/locals give frame sizes. The mnemonics (MOVQ, ADDQ, CALL) resemble the platform’s but go through Go’s assembler, so don’t expect a 1:1 match with objdump.
Anchoring with runtime function info
You can’t run -S inside a playground, but you can introspect compiled functions at runtime — names, entry addresses, and source positions — which is the same metadata the disassembler and stack traces use:
package main

import (
	"fmt"
	"runtime"
)

//go:noinline
func add(a, b int) int {
	whoAmI()     // print this function's compiled metadata
	return a + b // ...the work the assembly implements
}

// whoAmI reports the name, file and line of its *caller* using the
// same PC→function metadata that stack traces, panics and pprof use.
func whoAmI() {
	pc := make([]uintptr, 1)
	n := runtime.Callers(2, pc) // skip Callers + whoAmI, capture the caller
	if n == 0 {
		return
	}
	fn := runtime.FuncForPC(pc[0])
	file, line := fn.FileLine(pc[0])
	fmt.Printf("running %-14s at %s:%d (entry 0x%x)\n",
		fn.Name(), shortFile(file), line, fn.Entry())
}

func shortFile(p string) string {
	for i := len(p) - 1; i >= 0; i-- {
		if p[i] == '/' {
			return p[i+1:]
		}
	}
	return p
}

func main() {
	fmt.Println("add(2,3) =", add(2, 3))
	fmt.Println("the entry address above is where add's assembly begins.")
}
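The example calls runtime.FuncForPC on a PC obtained from runtime.Callers, which works here because add is marked //go:noinline. In general the runtime documentation recommends runtime.CallersFrames for PCs returned by runtime.Callers, since it accounts for inlining and adjusts return addresses back to their call sites; the line reported through FuncForPC can otherwise be slightly off. The takeaway: every compiled function carries name/file/line metadata and an entry address, and that metadata is what panics, pprof, and go tool objdump use to map machine code back to your source.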
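The runtime package documentation suggests runtime.CallersFrames for interpreting PCs returned by runtime.Callers. The following is a sketch of the same introspection written that way; the helper name callerInfo is hypothetical, and filepath.Base stands in for the hand-written shortFile.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// callerInfo (a hypothetical helper) returns the function name, base file
// name and line of its caller, resolved through runtime.CallersFrames.
func callerInfo() (name, file string, line int) {
	pcs := make([]uintptr, 1)
	// skip=2 skips runtime.Callers and callerInfo itself.
	n := runtime.Callers(2, pcs)
	if n == 0 {
		return "", "", 0
	}
	frame, _ := runtime.CallersFrames(pcs[:n]).Next()
	return frame.Function, filepath.Base(frame.File), frame.Line
}

//go:noinline
func add(a, b int) int {
	name, file, line := callerInfo()
	fmt.Printf("running %s at %s:%d\n", name, file, line)
	return a + b
}

func main() {
	fmt.Println("add(2,3) =", add(2, 3))
}
```

The runtime.Frame value also exposes Entry and Func fields, so the entry address is still available when needed.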
What to look for
When you do read assembly, you’re usually checking one of a few things:
- Did it inline? A small hot function should not appear as a CALL in its caller. -gcflags='-m' says “can inline X”; -S confirms it’s gone.
- Bounds-check elimination (BCE). Loops over slices should drop the CMP/panic-branch when the compiler can prove the index is in range. -gcflags='-d=ssa/check_bce/debug=1' reports remaining checks.
- Did an intrinsic fire? math/bits.TrailingZeros, bits.OnesCount, atomics: these should be single instructions, not calls.
- Did it allocate? A CALL runtime.newobject in a “fast” path means something escaped.
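The four checks above can be tried on a small file such as the sketch below. All function and type names are hypothetical, and whether each optimization actually fires depends on the Go version and architecture, so treat the comments as what you would expect to see rather than guaranteed output.

```go
package main

import (
	"fmt"
	"math/bits"
)

// square is small enough that the compiler would normally inline it;
// -gcflags=-m should report it as inlinable.
func square(x int) int { return x * x }

// sumFirst4 touches s[3] first. Once that access has succeeded, the
// compiler can prove indices 0..3 are in range and may drop the
// remaining bounds checks.
func sumFirst4(s []int) int {
	_ = s[3] // early bounds check acts as a hint
	return s[0] + s[1] + s[2] + s[3]
}

// lowestSetBit uses math/bits.TrailingZeros64, which is expected to be
// treated as an intrinsic on common architectures.
func lowestSetBit(x uint64) int { return bits.TrailingZeros64(x) }

// point is a hypothetical type used to show an escaping value.
type point struct{ x, y int }

// newPoint returns a pointer to a local value, so the value escapes to
// the heap; in -S output this typically appears as CALL runtime.newobject.
//
//go:noinline
func newPoint(x, y int) *point { return &point{x, y} }

func main() {
	fmt.Println(square(7))
	fmt.Println(sumFirst4([]int{1, 2, 3, 4}))
	fmt.Println(lowestSetBit(8))
	fmt.Println(newPoint(1, 2).x)
}
```

Compiling it with -gcflags='-m -S' and searching the listing for CALL instructions is a quick way to answer each question in turn.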
🐹 Read assembly to confirm, not to optimize
Assembly is a verification tool, not a starting point. The workflow is always: benchmark and pprof find a genuine hot spot → you make a change (hoist an allocation, simplify a loop) → you read -S (or -m) to confirm the compiler did what you hoped (inlined, eliminated the bounds check, fired the intrinsic). Writing your own assembly (.s files) is a rare, expert move reserved for crypto and the runtime — for application code, guiding the compiler with cleaner Go almost always beats hand-rolling.
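As a rough sketch of the "measure, change, confirm" loop, the program below compares an allocating function with a version that reuses a hoisted buffer. It uses testing.AllocsPerRun as a lightweight stand-in for a real benchmark and pprof session; the function names and the sink variable are hypothetical.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results alive so the compiler cannot discard the calls.
var sink int

// sumSquaresAlloc builds a fresh scratch slice on every call. Because n
// is not a compile-time constant here, the slice is expected to be
// heap-allocated for sizes like the one used in main.
//
//go:noinline
func sumSquaresAlloc(n int) int {
	scratch := make([]int, n)
	for i := range scratch {
		scratch[i] = i * i
	}
	total := 0
	for _, v := range scratch {
		total += v
	}
	return total
}

// sumSquaresHoisted reuses a caller-owned buffer; it requires n <= cap(buf).
//
//go:noinline
func sumSquaresHoisted(buf []int, n int) int {
	buf = buf[:n]
	for i := range buf {
		buf[i] = i * i
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	const n = 1024
	buf := make([]int, n)

	before := testing.AllocsPerRun(100, func() { sink = sumSquaresAlloc(n) })
	after := testing.AllocsPerRun(100, func() { sink = sumSquaresHoisted(buf, n) })

	fmt.Printf("allocs per call before: %.0f, after: %.0f\n", before, after)
}
```

Only once a measurement like this shows a difference is it worth opening the -S or -m output to confirm that the allocation call is really gone from the hot path.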
⚠️ Assembly output is version- and arch-specific, and easy to misread
Don’t over-invest. The generated code changes between Go versions (the compiler keeps improving), so conclusions you draw today may not hold next release — never hard-code assumptions about codegen. It’s also architecture-specific: amd64 and arm64 output differ, and the abstract pseudo-registers don’t map 1:1 to what a hardware disassembler (objdump) shows. And it is the wrong tool for correctness — a logic bug won’t reveal itself in -S; use tests, the race detector, and go vet. Read assembly to answer a specific, measured performance question, then close the file.
See also
- compile & link: the pipeline that produces this assembly.
- escape analysis: spotting runtime.newobject calls that mean an allocation.
- memory layout & alignment: why field offsets show up as (FP) displacements.
- benchmarks (stdlib): the measurement that should precede any assembly reading.
Check your understanding
1. How do you see the assembly the Go compiler generates for a package?
go tool compile -S prints the generated assembly to stdout; go build -gcflags=-S produces the same listing as part of a build, relayed on stderr. The output is Go's own portable assembly notation, not raw platform assembly, though it maps closely to machine instructions.
2. Go's assembler uses a portable, abstract syntax with pseudo-registers like SP, FP, and PC. What does this mean?
Go's assembler (descended from Plan 9) uses an abstract syntax with pseudo-registers — FP (args/frame), SP (stack), PC, SB (globals) — that the toolchain lowers to each architecture. The mnemonics resemble but aren't exactly the platform's, so reading it takes a little acclimatization.
3. What does //go:noinline do and why use it when reading assembly?
By default small functions are inlined, so they vanish into their callers in the assembly. //go:noinline keeps the function separate and emits a CALL, which is handy for benchmarking it in isolation or studying its standalone codegen. It's a compiler directive comment, not a general optimization switch.
4. What are compiler intrinsics?
Certain standard-library functions are recognized by the compiler and emitted as one or a few machine instructions instead of a call. For example, on amd64 math/bits.TrailingZeros64 compiles to a bit-scan instruction (BSFQ or TZCNTQ, depending on the target level), and many sync/atomic operations become LOCK-prefixed instructions. In the assembly they appear as inline instructions, not CALLs, which is why they're so fast.
5. When is reading assembly actually worth it?
Assembly is a last-mile tool for verified hot paths: did this inline? did the bounds check get eliminated? did the intrinsic kick in? did this allocate? You go there after pprof/benchmarks identify a hot spot — not as a routine practice, and never to debug correctness.