The Go Reference


Files & Directories

Working with the filesystem in Go: permissions, reading and walking directories with WalkDir, computing directory sizes, finding duplicates by hash, and symlinks.


🗂️ Analogy

The filesystem is a giant filing cabinet, and Go hands you a clean set of tools for it: list a drawer (ReadDir), walk every drawer and folder top to bottom (WalkDir), check a label’s access rules (permissions), weigh a whole drawer (directory size), and spot identical documents filed in different places (hashing). The OS does the heavy lifting through system calls; the os and io/fs packages give you a portable handle on the drawers.

The portable filesystem API

Go’s filesystem story spans a few packages: os (open, create, read, write, stat, remove), io/fs (the abstract fs.FS and fs.DirEntry types), and path/filepath (OS-aware path joining and tree walking). For everyday reading and writing of whole files see files & the os package; this page is about working with directories and metadata.

Two calls do most of the work:

  • os.ReadDir(dir) — the immediate entries of one directory, as []fs.DirEntry.
  • filepath.WalkDir(root, fn) — a depth-first walk of the whole tree, calling fn(path, d, err) for every entry.

Walk a tree: size and duplicate detection

This runs on the playground (it builds a small tree in a temp dir first), and shows the two everyday jobs — total size and finding duplicates by content hash:

walk.go — editable & runnable
package main

import (
"crypto/sha256"
"encoding/hex"
"fmt"
"io"
"io/fs"
"os"
"path/filepath"
"sort"
)

func main() {
	// Build a sample tree in a temp directory (works on the playground FS).
	root, err := os.MkdirTemp("", "tree")
	must(err)
	defer os.RemoveAll(root)
	must(os.MkdirAll(filepath.Join(root, "sub"), 0o755))
	must(os.WriteFile(filepath.Join(root, "a.txt"), []byte("hello"), 0o644))
	must(os.WriteFile(filepath.Join(root, "b.txt"), []byte("world!!"), 0o644))
	must(os.WriteFile(filepath.Join(root, "sub", "c.txt"), []byte("hello"), 0o644)) // dup of a.txt

	// Walk the tree: sum sizes and group by content hash.
	var total int64
	byHash := map[string][]string{}
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // abort on the first error; return nil instead to skip the entry
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info() // lazy: only stat the files we keep
		if err != nil {
			return err
		}
		total += info.Size()
		sum, err := hashFile(path) // hash once and reuse the key
		if err != nil {
			return err
		}
		byHash[sum] = append(byHash[sum], filepath.Base(path))
		return nil
	})
	must(err)

	fmt.Printf("total size: %d bytes\n", total)
	for h, files := range byHash {
		if len(files) > 1 {
			sort.Strings(files)
			fmt.Printf("duplicates (%s…): %v\n", h[:8], files)
		}
	}
}

// must panics on an error; acceptable in a demo, not in library code.
func must(err error) {
	if err != nil {
		panic(err)
	}
}

// hashFile streams the file through SHA-256 and returns the hex digest.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil { // stream, so huge files don't blow up memory
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

a.txt and sub/c.txt hash identically, so they’re reported as duplicates even though the names differ. Note d.Info() is called lazily — that’s the WalkDir win over the older Walk.

Permissions and metadata

Every entry carries a mode: type bits plus Unix permission bits.

0o644 breaks down into one octal digit per class:

  • owner: rw- (6)
  • group: r-- (4)
  • other: r-- (4)
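
The same breakdown can be checked in code. This is a small sketch: the describe helper is a hypothetical name introduced here, and it relies only on fs.FileMode's Perm and String methods.

```go
package main

import (
	"fmt"
	"io/fs"
)

// describe (hypothetical helper) splits the permission bits into the
// owner, group, and other octal digits.
func describe(mode fs.FileMode) string {
	perm := mode.Perm() // keep only the 0o777 bits
	owner := uint32(perm>>6) & 0o7
	group := uint32(perm>>3) & 0o7
	other := uint32(perm) & 0o7
	return fmt.Sprintf("%v owner=%o group=%o other=%o", perm, owner, group, other)
}

func main() {
	fmt.Println(describe(0o644)) // -rw-r--r-- owner=6 group=4 other=4
	fmt.Println(describe(0o755)) // -rwxr-xr-x owner=7 group=5 other=5
}
```

Each digit is just three bits, so shifting by 6, 3, and 0 and masking with 0o7 recovers the owner, group, and other classes.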

This runs here — create a file, read its mode, flip the permission bits, and read them back:

perms.go — editable & runnable
package main

import (
"fmt"
"os"
"path/filepath"
)

func main() {
	dir, err := os.MkdirTemp("", "perms")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "script.sh")
	if err := os.WriteFile(path, []byte("#!/bin/sh\necho hi\n"), 0o644); err != nil {
		panic(err)
	}

	info, err := os.Stat(path)
	if err != nil {
		panic(err)
	}
	fmt.Printf("before: mode=%v size=%d isDir=%v\n",
		info.Mode(), info.Size(), info.Mode().IsDir())

	if err := os.Chmod(path, 0o755); err != nil { // make it executable
		panic(err)
	}

	info, err = os.Stat(path)
	if err != nil {
		panic(err)
	}
	fmt.Printf("after:  mode=%v  (owner can execute: %v)\n",
		info.Mode(), info.Mode().Perm()&0o100 != 0)
}

The mode also encodes the file type, queryable with helper methods:

info.Mode().IsDir()              // directory?
info.Mode()&os.ModeSymlink != 0  // symlink?
info.Mode().Perm()               // just the 0o777 permission bits

A symlink is a file whose contents are a path to another file. The key distinction: os.Stat follows the link and reports on the target, while os.Lstat reports on the link itself.

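// Errors elided for brevity; assumes real.txt exists in the current directory.
os.Symlink("real.txt", "link.txt")      // create link.txt -> real.txt

target, _ := os.Readlink("link.txt")    // "real.txt"
st, _ := os.Stat("link.txt")            // info about real.txt (followed)
ls, _ := os.Lstat("link.txt")           // info about the link itself
fmt.Println(target)                        // real.txt
fmt.Println(st.Mode()&os.ModeSymlink != 0) // false: Stat saw the target
fmt.Println(ls.Mode()&os.ModeSymlink != 0) // true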

When walking trees, use Lstat semantics (which WalkDir gives you via DirEntry) so you don’t follow a symlink into a cycle or outside your root.

🐹 fs.FS makes filesystem code testable

Code written against the io/fs interfaces (fs.FS, fs.WalkDir, fs.ReadDir) works over any filesystem — the real OS (os.DirFS("/some/root")), an embedded one (//go:embed), a zip, or an in-memory fstest.MapFS in your tests. So instead of hard-coding os.* calls, accept an fs.FS and your directory-walking logic becomes trivially unit-testable with a fake tree — no temp directories required.

⚠️ Paths, errors, and the WalkDir callback

Three traps. First, use path/filepath, not string concatenation: filepath.Join handles separators (/ vs \) and cleans .., while building paths with + "/" + can break on Windows. Second, handle the err argument in the WalkDir callback: a permission error on one subdirectory shouldn’t abort the whole walk unless you want it to (return nil to skip the entry, filepath.SkipDir to prune a directory, or the error itself to stop). Third, don’t ignore Close errors when writing, since a failed close can mean the data never reached disk; for reads, always defer f.Close() so a long walk doesn’t leak file descriptors.

See also

Next: writing files without corrupting them — temp files & atomic writes.

Check your understanding


1. What's the difference between os.ReadDir and filepath.WalkDir?

os.ReadDir(dir) returns only the immediate entries of one directory, sorted by filename, as []fs.DirEntry (os.DirEntry is an alias for the same type). filepath.WalkDir(root, fn) walks the entire tree depth-first, invoking fn for each path; it's the modern, DirEntry-based replacement for the slower filepath.Walk.

2. Why is filepath.WalkDir preferred over the older filepath.Walk?

Walk gives your callback a fully-populated os.FileInfo, forcing a stat on every entry. WalkDir (Go 1.16+) passes a lazy fs.DirEntry; you call Info() only when you actually need size/mtime, saving a syscall per file on large trees.

3. What does a file mode like 0o644 mean?

Unix permissions are three octal digits (owner, group, other), each a sum of read(4)+write(2)+execute(1). 0o644 = rw-r--r--; 0o755 = rwxr-xr-x (typical for executables/dirs). The 0o prefix is Go's octal literal.

4. How do you reliably detect duplicate files?

Different files can share a name, size, or mtime, so none of those is a reliable test on its own. Hashing the bytes (streaming them through sha256 so you don't load huge files into memory) gives a content fingerprint; with a cryptographic hash, equal digests mean equal content for all practical purposes. A common optimization is to group by size first, then hash only within size-groups.

5. What does os.Lstat give you that os.Stat doesn't?

os.Stat follows symlinks and reports the target. os.Lstat reports the link itself (so info.Mode()&os.ModeSymlink != 0 tells you it's a link). That distinction matters when walking trees so you don't follow links into cycles or outside the root.
