Metrics with Prometheus

Measuring a service in aggregate — counters, gauges and histograms, the /metrics endpoint Prometheus scrapes, the RED method, and the cardinality trap that blows up your metrics.

📈 Analogy

Logs are the diary entries of individual requests; metrics are the dashboard gauges of the whole fleet. You don’t read a million diary entries to know the car is overheating — you glance at the temperature gauge. Metrics are those gauges: cheap aggregate numbers (requests/sec, error rate, p99 latency) that you watch on a dashboard and alert on, so you know something is wrong before you dive into the logs to find out what.

Pull-based metrics

Prometheus is an open-source metrics database and monitoring system (the de-facto standard in cloud-native, usually paired with Grafana for dashboards); see prometheus.io. It is pull-based: your app maintains counters in memory and exposes them on a /metrics HTTP endpoint in a simple text format; Prometheus scrapes that endpoint on an interval and stores the time series.

graph LR
APP["Go app<br/>counters in memory<br/>+ /metrics endpoint"] -->|scrape every 15s| PROM["Prometheus<br/>(stores time series)"]
PROM --> GRAF["Grafana dashboards"]
PROM --> ALERT["Alertmanager<br/>(rate/errors/latency)"]

Three core metric types:

Counter — monotonically increasing (requests served, errors). Query rate() for per-second rates.
Gauge — goes up and down (in-flight requests, queue depth, memory).
Histogram — buckets observations (latency, sizes) so you can compute p95/p99 quantiles — the numbers that describe real user experience.
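To make the Histogram type concrete, here is a minimal stdlib-only sketch of cumulative buckets and a quantile estimate. The `histogram` type, its bucket bounds and the linear interpolation are assumptions chosen for illustration; they mimic the general approach of bucketed quantile estimation, not the exact code of any library.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// histogram keeps cumulative bucket counts, as the exposition format does.
type histogram struct {
	bounds []float64 // bucket upper bounds in seconds, ascending
	counts []uint64  // cumulative: counts[i] = observations <= bounds[i]
	total  uint64    // the implicit +Inf bucket: every observation
	sum    float64   // sum of all observed values
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe increments every bucket whose upper bound is >= v.
func (h *histogram) observe(v float64) {
	for i := sort.SearchFloat64s(h.bounds, v); i < len(h.bounds); i++ {
		h.counts[i]++
	}
	h.total++
	h.sum += v
}

// quantile estimates the q-quantile (0 < q <= 1) by linear interpolation
// inside the bucket that contains the target rank. The first bucket is
// assumed to start at 0.
func (h *histogram) quantile(q float64) float64 {
	if h.total == 0 || len(h.bounds) == 0 || q <= 0 || q > 1 {
		return math.NaN()
	}
	rank := q * float64(h.total)
	lower, below := 0.0, uint64(0)
	for i, upper := range h.bounds {
		if float64(h.counts[i]) >= rank {
			inBucket := float64(h.counts[i] - below)
			return lower + (upper-lower)*(rank-float64(below))/inBucket
		}
		lower, below = upper, h.counts[i]
	}
	// Rank falls in the +Inf bucket: report the highest finite bound.
	return h.bounds[len(h.bounds)-1]
}

func main() {
	h := newHistogram(0.1, 0.25, 0.5, 1)
	for i := 0; i < 90; i++ {
		h.observe(0.05) // fast requests
	}
	for i := 0; i < 8; i++ {
		h.observe(0.3)
	}
	for i := 0; i < 2; i++ {
		h.observe(0.8) // the slow tail
	}
	fmt.Printf("mean=%.3fs p50=%.3fs p99=%.3fs\n",
		h.sum/float64(h.total), h.quantile(0.5), h.quantile(0.99))
}
```

With these made-up observations the mean is about 0.085s while the p99 estimate is 0.75s, which is the gap between "average" and "what the slowest users see". The estimate is only as precise as the bucket boundaries allow.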

See it: the /metrics exposition format

The program below is a tiny in-process registry that counts requests by status and renders the Prometheus text format, served via httptest. The real prometheus/client_golang does far more (help text, multiple metric types, label handling), but the output has the same shape:

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
)

type counter struct {
	mu sync.Mutex
	v  map[string]int // label "status" -> count
}

func (c *counter) inc(status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.v[status]++
}

func (c *counter) render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := "# TYPE http_requests_total counter\n"
	keys := make([]string, 0, len(c.v))
	for k := range c.v {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		out += fmt.Sprintf("http_requests_total{status=%q} %d\n", k, c.v[k])
	}
	return out
}

func main() {
	reqs := &counter{v: map[string]int{}}
	// Simulate traffic.
	for i := 0; i < 100; i++ {
		if i%10 == 0 {
			reqs.inc("500")
		} else {
			reqs.inc("200")
		}
	}

	// Expose /metrics, then scrape it like Prometheus would.
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, reqs.render())
	})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/metrics", nil))
	fmt.Print(rec.Body.String())
}

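That http_requests_total{status="..."} output is the text format Prometheus scrapes. rate(http_requests_total{status="500"}[5m]) then gives the per-second rate of 500 responses over the last five minutes; dividing sum(rate(http_requests_total{status="500"}[5m])) by sum(rate(http_requests_total[5m])) gives the error ratio. In real code you’d use the client library: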

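// Not runnable here: needs github.com/prometheus/client_golang.
import (
	"net/http"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var reqs = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "http_requests_total", Help: "Total HTTP requests.",
}, []string{"method", "route", "status"}) // LOW-cardinality labels only

http.Handle("/metrics", promhttp.Handler()) // call this during startup, e.g. in main
// in your middleware: reqs.WithLabelValues(r.Method, route, status).Inc()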

Instrument with RED

For request-driven services, track RED: Rate (requests/sec), Errors (failures/sec), Duration (latency histogram) — per endpoint. Those three answer “is the service healthy?” and are what you alert on. (For resources — CPU, memory, disk — the complementary USE method tracks Utilization, Saturation, Errors.)

🐹 Metrics for aggregates, logs for detail, traces for causality

The three observability pillars are complementary, not interchangeable. Metrics answer “how many / how fast / how bad” cheaply and are what you alert on. Logs capture individual events you investigate. Traces show one request’s path across services. Don’t compute rates by parsing logs (expensive) or try to store per-request detail in metric labels (cardinality explosion). Use each for its job: alert on a metric, jump to the trace, read the logs.

⚠️ High-cardinality labels will take down your metrics

Every unique combination of label values is a separate time series. Put a user_id, request_id, email, or a raw URL with IDs in the path (/orders/12345) into a label and you create millions of series — a cardinality explosion that balloons Prometheus memory, slows queries, and can crash the server. Keep labels low-cardinality and bounded: method, the route template (/orders/:id, not the concrete id), status_code. The high-cardinality detail belongs in logs and traces, never in metric labels.
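A rough sketch of the difference this makes, assuming a hypothetical `routeTemplate` helper that collapses all-digit path segments to `:id`. In a real service the router usually knows the matched template, so this heuristic is only for illustration; the method and status counts are likewise assumed numbers.

```go
package main

import (
	"fmt"
	"strings"
)

// isNumeric reports whether s is a non-empty run of ASCII digits.
func isNumeric(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// routeTemplate replaces numeric path segments with ":id" so that
// /orders/12345 and /orders/67890 map to the same label value.
func routeTemplate(path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		if isNumeric(s) {
			segs[i] = ":id"
		}
	}
	return strings.Join(segs, "/")
}

func main() {
	raw := map[string]struct{}{}
	templated := map[string]struct{}{}
	for id := 1; id <= 10000; id++ {
		p := fmt.Sprintf("/orders/%d", id)
		raw[p] = struct{}{}
		templated[routeTemplate(p)] = struct{}{}
	}

	// Worst-case series count is the product of label cardinalities.
	const methods, statuses = 4, 5 // assumed for illustration
	fmt.Println("raw path label:", len(raw)*methods*statuses, "potential series")
	fmt.Println("route template:", len(templated)*methods*statuses, "potential series")
}
```

Ten thousand orders turn one label into ten thousand values, and the multiplication with the other labels does the rest; the template keeps the product small and fixed.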

Check your understanding

1. How does Prometheus collect metrics from a Go service?

Prometheus is pull-based: the app exposes a /metrics endpoint and Prometheus scrapes it on an interval (e.g. every 15s). The app just maintains counters in memory and renders them in the text exposition format; Prometheus handles collection, storage, and querying. The exception is short-lived jobs that exit before they can be scraped: they push their metrics to a Pushgateway, which Prometheus then scrapes.

2. Which metric type fits 'total HTTP requests served'?

A Counter only ever increases (or resets to 0 on restart); you query rate(http_requests_total[5m]) to get requests/sec. Gauges are for values that go up and down (in-flight requests, queue depth, memory). Histograms bucket observations (latency, sizes) so you can compute quantiles. Requests-served is the textbook Counter.

3. What does a Histogram let you compute that a Counter or Gauge cannot?

A Histogram counts observations into latency (or size) buckets, so Prometheus can estimate quantiles like p95/p99. An average hides the slow tail; p99 latency tells you what your worst-served users experience, and a Histogram is how you measure it.

4. What is the RED method for instrumenting a service?

RED — Rate, Errors, Duration — is the standard trio for request-driven services: how many requests, how many failed, and how long they took (as a histogram). Track those per endpoint and you can answer 'is the service healthy?' and alert on it. (USE — Utilization, Saturation, Errors — is the complementary method for resources like CPU/disk.)

5. What is the cardinality trap with metric labels?

Each unique label-value combination is its own time series. A label like user_id or request_id (millions of values) or a raw URL with IDs in the path creates millions of series — blowing up Prometheus memory and query cost (a 'cardinality explosion'). Keep labels low-cardinality and bounded (method, route TEMPLATE, status_code), and put the high-cardinality detail in logs/traces instead.
