
HTTP Reconnaissance

Profiling web targets in Go — a custom HTTP client, fingerprinting tech from headers, content and path discovery, and the response-hardening defenses (security headers, generic errors, rate limits) that blunt it.


🔎 Analogy

Walk up to a shop and you learn a lot before entering: the brand of lock, the alarm-company sticker, the “back in 5 minutes” note, the delivery entrance round the side. HTTP recon is reading those signs on a web service — the Server header is the lock brand, a stray /.git/ is the propped-open back door, a verbose error page is the note that says too much. None of it is breaking in; it’s reading what the building tells you for free.

A recon client, built right

Everything starts with an HTTP client you control: custom headers, a real timeout, and your own redirect policy. This snippet needs the network, so it isn’t runnable in the sandbox, but it’s the foundation of every web tool:

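client := &http.Client{
	Timeout: 8 * time.Second, // NEVER use the zero-timeout default
	CheckRedirect: func(r *http.Request, via []*http.Request) error {
		return http.ErrUseLastResponse // see redirects, don't auto-follow
	},
}
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
	return err // malformed URL: ignoring this hands a nil request to client.Do
}
req.Header.Set("User-Agent", "recon/1.0 (authorized testing)")
resp, err := client.Do(req)
if err != nil {
	return err // timeout, DNS failure, refused connection, ...
}
defer resp.Body.Close() // always close the body so the connection can be reused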

The two non-negotiables: a timeout (the default client has none — one hung host stalls a goroutine forever) and an honest User-Agent for authorized work.

Fingerprinting from the response

The headers and status code reveal the stack:

graph TD
R["HTTP response"] --> H["Server: nginx/1.18"]
R --> P["X-Powered-By: PHP/7.4"]
R --> C["Set-Cookie: PHPSESSID"]
R --> CDN["CF-RAY / X-Amz-Cf-Id"]
H --> MAP["map to known CVEs"]
P --> MAP
C --> MAP
CDN --> WAF["CDN / WAF in front?"]

See it: parse and fingerprint a response

You can analyze a raw HTTP response with http.ReadResponse — no network needed — which is exactly how this runs in the sandbox. The same parsing logic works on a real resp from client.Do:

fingerprint.go

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// A captured raw HTTP response (as if read from the wire).
const raw = "HTTP/1.1 200 OK\r\n" +
	"Server: nginx/1.18.0\r\n" +
	"X-Powered-By: PHP/7.4.3\r\n" +
	"Set-Cookie: PHPSESSID=abc; HttpOnly\r\n" +
	"Content-Type: text/html\r\n" +
	"Content-Length: 0\r\n" +
	"\r\n"

func main() {
	resp, err := http.ReadResponse(bufio.NewReader(strings.NewReader(raw)), nil)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	defer resp.Body.Close()

	fmt.Println("status:", resp.StatusCode)

	// Simple tech fingerprint from headers.
	var tech []string
	if s := resp.Header.Get("Server"); s != "" {
		tech = append(tech, "web server: "+s)
	}
	if p := resp.Header.Get("X-Powered-By"); p != "" {
		tech = append(tech, "language: "+p)
	}
	for _, c := range resp.Cookies() {
		if c.Name == "PHPSESSID" {
			tech = append(tech, "session: PHP")
		}
	}
	sort.Strings(tech)
	fmt.Println("fingerprint:")
	for _, t := range tech {
		fmt.Println("  -", t)
	}
}

That nginx/1.18.0 + PHP/7.4.3 fingerprint is the first thing an attacker maps to a CVE list — and the first thing a defender should make less informative.

Content and path discovery

The interesting paths aren’t linked: /admin, /.git/config, /backup.zip, /api/v1/docs. Path discovery requests each candidate from a wordlist (fanned out over the same worker pool) and flags responses that differ from the site’s normal “not found” answer. The one subtlety is soft-404 calibration:

// Calibrate: fetch a path that definitely doesn't exist and record its
// status + body length as the "not found" fingerprint.
// (fetch, random, and report are helpers you supply; fetch returns status + length.)
notFound := fetch(base + "/zzq-" + random())
for _, path := range wordlist {
	got := fetch(base + path)
	// A real hit differs from that baseline (different status OR length).
	if got.status != notFound.status || got.length != notFound.length {
		report(path, got.status)
	}
}

Without calibration, a site that serves a friendly 200 page for missing routes makes every guess look like a hit. Always baseline first, then throttle (don’t hammer the target). Read robots.txt as well: respect it when scraping, and treat it as a hint about interesting paths, since it often lists exactly what the owner wanted hidden.

Defending your web service

🐹 Make your responses say less

You can’t stop recon, but you can starve it. Strip version detail from Server/X-Powered-By (or override it with a generic value). Return uniform errors: one generic error page across the whole stack, and the same 404 for forbidden paths as for missing ones. Path discovery then has fewer distinct responses to tell apart, and no framework-branded error template gives the stack away. Add security headers (see web security): HSTS, CSP, X-Content-Type-Options. And rate-limit aggressive clients so directory brute-forcing is slow and visible. Each one is small; together they turn a rich fingerprint into a shrug.

⚠️ Scraping and brute-forcing have legal and operational limits

Automated requests against a site you don’t own can breach its terms of service and, at volume, computer-misuse law — and an aggressive content-discovery run can look like an attack or knock a fragile app over. For authorized testing, throttle hard, set a clear User-Agent, and stay in scope. For scraping specifically, respect robots.txt and rate limits even when the law is ambiguous — being a good citizen keeps you out of both technical and legal trouble.

Check your understanding


1. What can response headers alone reveal about a web target?

Server, X-Powered-By, X-AspNet-Version, Set-Cookie names (PHPSESSID, JSESSIONID), and CDN headers (CF-RAY, X-Amz-Cf-Id) fingerprint the stack. An attacker maps 'nginx 1.18 + PHP 7.4' to known CVEs; a defender removes or genericizes these headers so they reveal less.

2. Why should a recon HTTP client set its own timeout instead of using http.DefaultClient as-is?

http.DefaultClient (and http.Get) have no timeout by default — a slow or malicious server can hold the connection open indefinitely. Always construct &http.Client{Timeout: ...} (and ideally a context per request) so every probe is bounded. This is true for production clients too, not just recon.

3. What is content/path discovery (directory brute-forcing)?

Many sensitive paths aren't linked from anywhere — admin panels, exposed .git directories, backup archives, API docs. A discovery tool requests each path from a wordlist and flags interesting status codes (200, 403, 401, 301). Tools like gobuster/ffuf do exactly this; the Go pattern is the same worker pool you've seen.

4. Why must a content-discovery tool calibrate against the target's 404 behavior?

Plenty of apps serve a friendly 200 page for missing routes (a 'soft 404'). A naive tool that treats 200 as 'found' reports every path as existing. The fix: request a guaranteed-nonexistent path first to learn the not-found fingerprint (status + body length), then flag responses that differ from it.

5. What's the most effective way to reduce what HTTP recon learns about your service?

Remove Server/X-Powered-By detail, return consistent generic error pages (so soft-404 calibration and tech fingerprinting are harder), set security headers, and rate-limit aggressive clients. Blocking one User-Agent is trivially bypassed. You can't stop recon, but you can make it learn far less — defense in depth, not a single trick.
