🔍 Analogy
A fixed-string search (strings.Contains) asks “is this exact word here?” A regular expression asks “is there anything shaped like this here?” — a date, an email, a price, a log line. You describe the shape once; the engine scans the text and finds (or extracts, or rewrites) every place that fits. The cost is that you must first compile the shape into a matcher — so you do it once and reuse it.
Match, compile, reuse
The quick one-shot is regexp.MatchString(pattern, s), but it compiles the pattern on every call. In real code you compile once with regexp.Compile (returns an error) or regexp.MustCompile (panics on a bad pattern, which suits a package-level variable whose pattern is fixed in the source), then reuse the returned *regexp.Regexp. Its matching methods are safe to call from multiple goroutines at once; only configuration methods such as Longest are not.
🧪 In real code, write patterns with backticks
Go’s idiomatic regex literal is a backtick raw string — regexp.MustCompile(`\d+`) — so backslashes are literal and you don’t double them. The playgrounds below use double-quoted strings with \\ purely because the page embeds them; the compiled pattern is identical. Prefer backticks in your own code.
package main

import (
	"fmt"
	"regexp"
)

// Compiled once, reusable, goroutine-safe.
var emailRe = regexp.MustCompile("\\b\\w+@\\w+\\.\\w+\\b")

func main() {
	text := "contact ada@example.com or bob@test.org for help"

	fmt.Println(emailRe.MatchString(text))       // true
	fmt.Println(emailRe.FindString(text))        // ada@example.com (first match)
	fmt.Println(emailRe.FindAllString(text, -1)) // [ada@example.com bob@test.org]
	fmt.Println(emailRe.FindAllString(text, 1))  // [ada@example.com] (limit = 1)

	// Byte offsets of the first match.
	fmt.Println(emailRe.FindStringIndex(text)) // [8 23]
}
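MustCompile fits patterns that are fixed in the source. When the pattern arrives at runtime (user input, a config file), regexp.Compile and an error check are the safer route. A minimal sketch, assuming a hypothetical helper named countMatches that is not part of the regexp package:

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches is a hypothetical helper: it compiles a pattern supplied at
// runtime and counts its non-overlapping matches in text.
func countMatches(pattern, text string) (int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		// Report the bad pattern instead of panicking.
		return 0, fmt.Errorf("bad pattern %q: %w", pattern, err)
	}
	return len(re.FindAllString(text, -1)), nil
}

func main() {
	n, err := countMatches(`\d+`, "room 12, floor 3")
	fmt.Println(n, err) // 2 <nil>

	// Unbalanced parenthesis: Compile returns an error.
	_, err = countMatches(`(\d+`, "room 12")
	fmt.Println(err != nil) // true
}
```

If the same runtime pattern is used many times, it is usually worth keeping the compiled *regexp.Regexp around rather than calling a helper like this repeatedly.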
Capturing groups
Parentheses create capture groups: sub-parts of the match you want to pull out. FindStringSubmatch returns a slice where index 0 is the whole match and 1, 2, … are the groups in order. Groups can be named with (?P<name>…), and SubexpNames returns the group names indexed by group number (index 0 and unnamed groups are the empty string):
package main

import (
	"fmt"
	"regexp"
)

var dateRe = regexp.MustCompile("(?P<year>\\d{4})-(?P<month>\\d{2})-(?P<day>\\d{2})")

func main() {
	m := dateRe.FindStringSubmatch("today is 2026-06-06 ok")
	fmt.Println(m)    // [2026-06-06 2026 06 06]
	fmt.Println(m[1]) // 2026 (first group)

	// Resolve groups by name.
	names := dateRe.SubexpNames()
	for i, val := range m {
		if names[i] != "" {
			fmt.Printf("%s = %s\n", names[i], val)
		}
	}

	// FindAllStringSubmatch returns one []string per match.
	all := dateRe.FindAllStringSubmatch("2026-01-02 and 2025-12-25", -1)
	fmt.Println(len(all), "matches") // 2 matches
}
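Looping over SubexpNames works everywhere. On Go 1.15 or later, SubexpIndex looks up a single named group directly. The sketch below shows both; isoDateRe and groupsByName are hypothetical names chosen for illustration, and a nil check guards against input that doesn't match:

```go
package main

import (
	"fmt"
	"regexp"
)

var isoDateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// groupsByName is a hypothetical helper: it returns the named groups of the
// first match as a map, or nil when the pattern does not match.
func groupsByName(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // skip index 0 and unnamed groups
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	m := isoDateRe.FindStringSubmatch("today is 2026-06-06 ok")
	if m != nil {
		// SubexpIndex returns the group number for a name (-1 if absent).
		fmt.Println(m[isoDateRe.SubexpIndex("month")]) // 06
	}

	g := groupsByName(isoDateRe, "released 2025-12-25")
	fmt.Println(g["year"], g["day"]) // 2025 25

	fmt.Println(groupsByName(isoDateRe, "no date here") == nil) // true
}
```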
Replacing
Two replacement styles. ReplaceAllString uses a template where $1 / ${name} expand to captured groups. ReplaceAllStringFunc runs your function on each match, so the replacement can be computed:
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	// Template replacement: reorder captured groups.
	swap := regexp.MustCompile("(\\w+)\\s+(\\w+)")
	fmt.Println(swap.ReplaceAllString("Ada Lovelace", "$2, $1")) // Lovelace, Ada

	// Functional replacement: mask every digit.
	digits := regexp.MustCompile("\\d")
	masked := digits.ReplaceAllStringFunc("card 4012 8888", func(string) string {
		return "*"
	})
	fmt.Println(masked) // card **** ****

	// Uppercase the first letter of each word.
	firstLetter := regexp.MustCompile("\\b\\w")
	title := firstLetter.ReplaceAllStringFunc("go is fun", strings.ToUpper)
	fmt.Println(title) // Go Is Fun
}
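The ${name} form mentioned above also matters for numbered groups: in a template, the name after $ runs as far as letters, digits, and underscores continue, so $1px is read as a group called "1px" rather than group 1 followed by "px". A small sketch, where ymdRe, numRe, and toDMY are hypothetical names:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	ymdRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
	numRe = regexp.MustCompile(`(\d+)`)
)

// toDMY rewrites every YYYY-MM-DD date in s as DD/MM/YYYY using named groups.
func toDMY(s string) string {
	return ymdRe.ReplaceAllString(s, "${day}/${month}/${year}")
}

func main() {
	fmt.Println(toDMY("due 2026-06-06")) // due 06/06/2026

	// "$1px" refers to a group named "1px", which does not exist,
	// so it expands to the empty string.
	fmt.Printf("%q\n", numRe.ReplaceAllString("width 40", "$1px")) // "width "

	// Braces end the group name explicitly.
	fmt.Printf("%q\n", numRe.ReplaceAllString("width 40", "${1}px")) // "width 40px"
}
```

Writing ${1} whenever a group reference is followed directly by a letter, digit, or underscore avoids the silent empty expansion.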
Under the hood: the RE2 engine
Go’s regexp follows the design and syntax of RE2, and that choice shapes everything. Like RE2, it guarantees matching in time linear in the length of the input, no matter how the pattern is written. Backtracking engines (PCRE, Perl, Java, JavaScript) can take exponential time on patterns like (a+)+$ when the input is a long run of a's followed by a character that makes the match fail, which is the basis of ReDoS denial-of-service attacks. Go’s engine cannot blow up that way.
graph TD
    P["pattern source"] --> C["compile → NFA/DFA"]
    C --> R["*regexp.Regexp (immutable, concurrent-safe)"]
    R --> M["scan input ONCE, left to right"]
    M --> G["linear time, no catastrophic backtracking (no ReDoS)"]
The price of that guarantee: RE2 omits features that require backtracking — there are no backreferences (\1 referring to an earlier group) and no lookahead/lookbehind ((?=…), (?<=…)). A pattern from another language that uses those won’t compile in Go, and you’ll need to restructure it (often by capturing and checking in code instead). Everything else — classes, anchors, alternation, repetition, named groups, Unicode classes like \p{L} — is supported. Flags go in the pattern: (?i) case-insensitive, (?s) dot matches newline, (?m) multi-line ^/$.
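To make "capture and check in code" concrete, here is a sketch of one way to replace the PCRE doubled-word pattern \b(\w+)\s+\1\b. The helper name firstDoubledWord is hypothetical, and it simplifies the original: any non-word characters between the two words are accepted, not only whitespace.

```go
package main

import (
	"fmt"
	"regexp"
)

var wordRe = regexp.MustCompile(`\w+`)

// firstDoubledWord is a hypothetical helper: it returns the first word that
// appears twice in a row. The comparison a backreference would do inside the
// pattern is done in Go code instead.
func firstDoubledWord(s string) (string, bool) {
	words := wordRe.FindAllString(s, -1)
	for i := 1; i < len(words); i++ {
		if words[i] == words[i-1] {
			return words[i], true
		}
	}
	return "", false
}

func main() {
	// A backreference is rejected at compile time.
	_, err := regexp.Compile(`\b(\w+)\s+\1\b`)
	fmt.Println(err != nil) // true

	// So is lookahead.
	_, err = regexp.Compile(`\w+(?=:)`)
	fmt.Println(err != nil) // true

	w, ok := firstDoubledWord("this is is a typo")
	fmt.Printf("%q %v\n", w, ok) // "is" true

	w, ok = firstDoubledWord("all good here")
	fmt.Printf("%q %v\n", w, ok) // "" false
}
```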
Syntax cheat sheet
| Pattern | Matches |
|---|---|
| `.` | any char (except newline, unless `(?s)`) |
| `\d` `\w` `\s` | digit / word char / whitespace |
| `\D` `\W` `\S` | the negations |
| `[a-z]` `[^0-9]` | char class / negated class |
| `^` `$` | start / end (of text, or line with `(?m)`) |
| `\b` | word boundary |
| `*` `+` `?` | 0+, 1+, 0-or-1 (greedy) |
| `*?` `+?` | lazy (fewest) versions |
| `{n}` `{n,}` `{n,m}` | exact / at-least / range repetition |
| `(…)` | capture group |
| `(?P<name>…)` | named capture group |
| `(?:…)` | non-capturing group |
| `a\|b` | alternation |
| `(?i)` `(?m)` `(?s)` | case-insensitive / multiline / dotall flags |
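A few rows of the table are easier to see in action. The sketch below contrasts greedy and lazy repetition and shows what each of the three flags changes; all variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedyTag = regexp.MustCompile(`<.+>`)
	lazyTag   = regexp.MustCompile(`<.+?>`)
	anyCase   = regexp.MustCompile(`(?i)error`)
	lineStart = regexp.MustCompile(`(?m)^\w+`)
	textStart = regexp.MustCompile(`^\w+`)
	dotAll    = regexp.MustCompile(`(?s)begin.*end`)
	dotPlain  = regexp.MustCompile(`begin.*end`)
)

func main() {
	html := "<b>bold</b>"
	fmt.Println(greedyTag.FindString(html)) // <b>bold</b>
	fmt.Println(lazyTag.FindString(html))   // <b>

	// (?i): case-insensitive.
	fmt.Println(anyCase.MatchString("Fatal ERROR")) // true

	// (?m): ^ matches at the start of every line, not just the text.
	lines := "alpha 1\nbeta 2"
	fmt.Println(lineStart.FindAllString(lines, -1)) // [alpha beta]
	fmt.Println(textStart.FindAllString(lines, -1)) // [alpha]

	// (?s): dot also matches newline.
	block := "begin\nmiddle\nend"
	fmt.Println(dotAll.MatchString(block))   // true
	fmt.Println(dotPlain.MatchString(block)) // false
}
```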
When to reach for regexp — and when not
- Use a regex for genuine patterns: validating/extracting emails, dates, log fields, tokenizing, find-and-replace with structure.
- Don’t use a regex for fixed substrings: strings.Contains, HasPrefix, Index, Cut are simpler and far faster.
- Don’t parse structured formats with regex: use encoding/json, encoding/xml, or a real parser for HTML/JSON/CSV. Regex on nested formats is a famous source of bugs.
- Compile once. A MustCompile at package scope beats compiling inside a function called repeatedly.
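For the fixed-substring case, the strings functions named above cover most needs without any pattern at all. A brief sketch; parseKV is a hypothetical helper, and strings.Cut requires Go 1.18 or later:

```go
package main

import (
	"fmt"
	"strings"
)

// parseKV is a hypothetical helper: it splits "key=value" at the first "="
// using strings.Cut, with no regex involved.
func parseKV(s string) (key, value string, ok bool) {
	return strings.Cut(s, "=")
}

func main() {
	url := "https://example.com/docs"
	fmt.Println(strings.Contains(url, "http"))      // true
	fmt.Println(strings.HasPrefix(url, "https://")) // true
	fmt.Println(strings.Index(url, "example"))      // 8

	k, v, ok := parseKV("mode=fast=1")
	fmt.Println(k, v, ok) // mode fast=1 true
}
```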
⚠️ Compile once; raw strings; no backreferences
Three traps. (1) Don’t call regexp.MatchString / regexp.Compile inside a loop — hoist a MustCompile to package level (the *Regexp is concurrent-safe). (2) Write patterns with backtick raw strings so backslashes are literal — `\d+` not "\\d+". (3) RE2 has no backreferences or lookaround; a pattern that relies on them won’t compile, so don’t copy PCRE/JS regexes blindly. If Compile returns an error, your pattern used an unsupported feature or has a syntax mistake.
See also
- strings & bytes — fixed-string search and the faster alternative for non-pattern work.
- fmt & io — formatting the values you extract.
- encoding/json — parse structured data with a real decoder, not a regex.
Next: turning structs into JSON and back — encoding/json.
Check your understanding
1. Why prefer regexp.MustCompile at package level over regexp.MatchString inside a hot loop?
MatchString re-compiles the pattern on every call. Compile/MustCompile pays that cost once; the returned *Regexp is safe for concurrent use, so you reuse it everywhere. MustCompile panics on a bad pattern, which is ideal for package-level variables whose patterns are fixed in the source.
2. What does FindStringSubmatch return for a pattern with capture groups?
FindStringSubmatch returns []string where [0] is the entire match and [1], [2], … are the parenthesized groups in order. It returns nil when the pattern doesn't match at all.
3. Go's regexp uses the RE2 engine. What's the key guarantee that gives you?
RE2 runs in time linear in the input length, which is why it categorically avoids the exponential blowup (ReDoS) that backtracking engines hit. The trade-off: no backreferences and no lookaround — features that require backtracking.
4. You want to check whether a string contains the literal substring "http". What should you use?
For fixed strings, strings.Contains / HasPrefix / Index are simpler and much faster — no compilation, no engine. Reach for regexp only when you need an actual pattern (classes, repetition, alternation, groups).
5. What's special about ReplaceAllStringFunc compared to ReplaceAllString?
ReplaceAllString uses a template where $1, ${name} expand to captured groups. ReplaceAllStringFunc instead runs your func(string) string on each matched substring, so the replacement is dynamic — masking, case-folding, lookups, etc.