🔎 Analogy
Walk up to a shop and you learn a lot before entering: the brand of lock, the alarm-company sticker, the “back in 5 minutes” note, the delivery entrance round the side. HTTP recon is reading those signs on a web service — the Server header is the lock brand, a stray /.git/ is the propped-open back door, a verbose error page is the note that says too much. None of it is breaking in; it’s reading what the building tells you for free.
A recon client, built right
Everything starts with an HTTP client you control: custom headers, a real timeout, and your own redirect policy. This snippet needs network access, so it won’t run offline, but it’s the foundation of every web tool:
```go
client := &http.Client{
	Timeout: 8 * time.Second, // NEVER rely on the zero-timeout default
	CheckRedirect: func(r *http.Request, via []*http.Request) error {
		return http.ErrUseLastResponse // see redirects, don't auto-follow
	},
}
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
	return err // ctx and url come from the surrounding function
}
req.Header.Set("User-Agent", "recon/1.0 (authorized testing)")
resp, err := client.Do(req)
if err != nil {
	return err
}
defer resp.Body.Close()
```
The two non-negotiables: a timeout (the default client has none — one hung host stalls a goroutine forever) and an honest User-Agent for authorized work.
Fingerprinting from the response
The headers and status code reveal the stack:
```mermaid
graph TD
    R["HTTP response"] --> H["Server: nginx/1.18"]
    R --> P["X-Powered-By: PHP/7.4"]
    R --> C["Set-Cookie: PHPSESSID"]
    R --> CDN["CF-RAY / X-Amz-Cf-Id"]
    H --> MAP["map to known CVEs"]
    P --> MAP
    C --> MAP
    CDN --> WAF["CDN / WAF in front?"]
```
See it: parse and fingerprint a response
You can analyze a raw HTTP response with http.ReadResponse, with no network needed, which is how this example runs offline. The same parsing logic works on a real resp from client.Do:
```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// A captured raw HTTP response (as if read from the wire).
const raw = "HTTP/1.1 200 OK\r\n" +
	"Server: nginx/1.18.0\r\n" +
	"X-Powered-By: PHP/7.4.3\r\n" +
	"Set-Cookie: PHPSESSID=abc; HttpOnly\r\n" +
	"Content-Type: text/html\r\n" +
	"Content-Length: 0\r\n" +
	"\r\n"

func main() {
	resp, err := http.ReadResponse(bufio.NewReader(strings.NewReader(raw)), nil)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.StatusCode)

	// Simple tech fingerprint from headers.
	var tech []string
	if s := resp.Header.Get("Server"); s != "" {
		tech = append(tech, "web server: "+s)
	}
	if p := resp.Header.Get("X-Powered-By"); p != "" {
		tech = append(tech, "language: "+p)
	}
	for _, c := range resp.Cookies() {
		if c.Name == "PHPSESSID" {
			tech = append(tech, "session: PHP")
		}
	}
	sort.Strings(tech)
	fmt.Println("fingerprint:")
	for _, t := range tech {
		fmt.Println(" -", t)
	}
}
```
That nginx/1.18.0 + PHP/7.4.3 fingerprint is the first thing an attacker maps to a CVE list — and the first thing a defender should make less informative.
Content and path discovery
The interesting paths aren’t linked: /admin, /.git/config, /backup.zip, /api/v1/docs. Path discovery requests each candidate from a wordlist (using the same worker-pool pattern as other concurrent scanners) and flags responses that don’t look like the site’s not-found page. The one subtlety is soft-404 calibration:
```go
// Calibrate: fetch a path that definitely doesn't exist and record its
// status + body length as the "not found" signature.
// (fetch, random, and report are placeholder helpers.)
notFound := fetch(base + "/zzq-" + random())

// A real hit differs from that baseline (different status OR length).
hit := fetch(base + "/" + path)
if hit.status != notFound.status || hit.length != notFound.length {
	report(path, hit.status)
}
```
Without calibration, a site that serves a friendly 200 page for missing routes makes every guess look like a hit. Always baseline first, then throttle (don’t hammer the target), and read robots.txt as a hint about interesting paths: its Disallow lines often list exactly what the owner wanted kept out of sight.
Defending your web service
🐹 Make your responses say less
You can’t stop recon, but you can starve it. Strip version detail from Server and X-Powered-By (or override them with a generic value). Return uniform errors: one generic error page instead of the framework’s default hides the tech stack, and answering forbidden and missing paths the same way leaves content discovery less to tell apart. Add security headers (see web security): HSTS, CSP, X-Content-Type-Options. And rate-limit aggressive clients so directory brute-forcing is slow and visible. Each one is small; together they turn a rich fingerprint into a shrug.
⚠️ Scraping and brute-forcing have legal and operational limits
Automated requests against a site you don’t own can breach its terms of service and, at volume, computer-misuse law — and an aggressive content-discovery run can look like an attack or knock a fragile app over. For authorized testing, throttle hard, set a clear User-Agent, and stay in scope. For scraping specifically, respect robots.txt and rate limits even when the law is ambiguous — being a good citizen keeps you out of both technical and legal trouble.
See also
- DNS enumeration — finding the hosts whose web services you then profile.
- Fuzzing for bugs — pushing past recon into finding actual flaws.
- Input validation & injection defense — the bugs recon is looking for.
- Web security — the security headers and hardening referenced above.
Next: from reading what a service says to finding where it breaks — fuzzing for bugs.
Check your understanding
1. What can response headers alone reveal about a web target?
Server, X-Powered-By, X-AspNet-Version, Set-Cookie names (PHPSESSID, JSESSIONID), and CDN headers (CF-RAY, X-Amz-Cf-Id) fingerprint the stack. An attacker maps 'nginx 1.18 + PHP 7.4' to known CVEs; a defender removes or genericizes these headers so they reveal less.
2. Why should a recon HTTP client set its own timeout instead of using http.DefaultClient as-is?
http.DefaultClient (and http.Get) have no timeout by default — a slow or malicious server can hold the connection open indefinitely. Always construct &http.Client{Timeout: ...} (and ideally a context per request) so every probe is bounded. This is true for production clients too, not just recon.
3. What is content/path discovery (directory brute-forcing)?
Many sensitive paths aren't linked from anywhere — admin panels, exposed .git directories, backup archives, API docs. A discovery tool requests each path from a wordlist and flags interesting status codes (200, 403, 401, 301). Tools like gobuster/ffuf do exactly this; the Go pattern is the same worker pool you've seen.
4. Why must a content-discovery tool calibrate against the target's 404 behavior?
Plenty of apps serve a friendly 200 page for missing routes (a 'soft 404'). A naive tool that treats 200 as 'found' reports every path as existing. The fix: request a guaranteed-nonexistent path first to learn the not-found fingerprint (status + body length), then flag responses that differ from it.
5. What's the most effective way to reduce what HTTP recon learns about your service?
Remove Server/X-Powered-By detail, return consistent generic error pages (so default error pages don’t reveal the framework and discovery tools have less to tell apart), set security headers, and rate-limit aggressive clients. Blocking one User-Agent is trivially bypassed. You can’t stop recon, but you can make it learn far less: defense in depth, not a single trick.