How It Works#

This document provides information about the architecture, data flows, and internal mechanisms of Metrics Manager.

High-Level Architecture#

Metrics Manager is a unified metrics platform that collects, stores, and streams metrics from three main sources:

System Metrics — Telegraf agents collect CPU, memory, temperature, GPU, NPU data
Custom Metrics — REST API accepts JSON, InfluxDB Line Protocol, OpenTelemetry formats
Real-time Streaming — Server-Sent Events (SSE) broadcasts metrics to live dashboards

Metrics Manager High-Level Architecture

Component Breakdown#

Telegraf (System Metrics Collector)#

Telegraf is a lightweight metrics agent that runs inside the container and collects:

Inputs:

/proc/stat → CPU usage (user, system, idle) per core
/proc/meminfo → Memory (total, used, available)
/sys/class/thermal/ → CPU temperature
read_cpu_freq.sh (custom script) → CPU frequency
qmassa (GPU reader) → Intel Arc GPU metrics (via named pipe)
npu_reader.py (custom script) → Intel NPU metrics
Telegraf HTTP listener :8186 → Custom metrics from Metrics Manager

Processing:

All inputs go through a Starlark processor (temperature filtering)
Aggregated into Prometheus text format

Outputs:

:9273/metrics — Prometheus endpoint (scraped by SSE clients)
:8186/write — HTTP listener (accepts InfluxDB Line Protocol from FastAPI)

Metrics Manager (FastAPI Application)#

The FastAPI application provides:

REST API Endpoints:

POST /api/v1/metrics/simple — Single metric (easiest format)
POST /api/v1/metrics — JSON batch (multiple metrics with multiple fields)
POST /api/v1/metrics/influx — InfluxDB Line Protocol batch
POST /api/v1/metrics/otlp — OpenTelemetry (OTLP) format
GET /api/v1/metrics — List all stored metrics (JSON)
GET /api/v1/metrics/latest — Latest value per metric name
GET /api/v1/metrics/names — Metric name list
DELETE /api/v1/metrics — Clear all custom metrics
GET /health — Basic health check
GET /api/health — Detailed health with service statistics
GET /metrics — Prometheus format (custom metrics only)

SSE Stream Endpoint:

GET /metrics/stream — Server-Sent Events stream
- Polls Telegraf :9273/metrics every 500ms (configurable)
- Broadcasts all metrics as data: {...} events
- Each client polls independently (no shared queue)
- Browser support: serves HTML page with live table; SSE clients get raw stream

In-Memory Storage:

Stores custom metrics with configurable retention (default 300 seconds)
Automatic cleanup of expired metrics
Automatic eviction when memory limit reached (default 100k metrics)
Debounced persistence to Telegraf :8186/write (default 100ms debounce)

Supervisor (Process Manager)#

Manages three long-running processes inside the container:

Telegraf (priority 10) — System metrics collection
Metrics Manager / uvicorn (priority 20) — FastAPI application
qmassa (priority 30) — Intel GPU reader (writes to named pipe)

Priority determines startup order, with lower priority starting first. All are auto-restarted if they crash.

Container Startup Sequence#

When the container starts, the entrypoint.sh script initializes the environment and supervisor manages process startup:

entrypoint.sh
  │
  ├── Create directory: /app/custom-metrics
  ├── Create named pipe: /app/qmassa.fifo
  └── Start supervisord
        │
        ├── [priority 10] telegraf --config telegraf.conf
        │   └── Starts immediately
        │       ├── Reads /proc/stat, /proc/meminfo, /sys/class/thermal/
        │       ├── Listens on :9273 (Prometheus output)
        │       ├── Listens on :8186 (HTTP write endpoint)
        │       └── Waits for custom metric inputs
        │
        ├── [priority 20] uvicorn app.main:app --port 9090
        │   └── Starts after Telegraf is ready
        │       ├── Initializes MetricsStore
        │       ├── Registers routes (/api/v1/*, /metrics/stream, /health)
        │       ├── Sets up middleware (CORS, rate limiting, compression)
        │       ├── Calls lifespan startup hooks
        │       └── Waits for requests
        │
        └── [priority 30] qmassa --to-json /app/qmassa.fifo
            └── Starts last
                ├── Enumerates Intel Arc GPUs
                ├── Writes JSON metrics to named pipe
                └── Telegraf reads from pipe and publishes on :9273

Startup takes ~2–5 seconds. The service is ready when all three processes are RUNNING (check with supervisorctl status).

Custom Metrics Flow#

HTTP POST client
  │
  ├── /api/v1/metrics         (JSON batch)
  ├── /api/v1/metrics/simple  (single metric)
  ├── /api/v1/metrics/influx  (InfluxDB Line Protocol)
  └── /api/v1/metrics/otlp    (OpenTelemetry)
         │
         ▼
  Parsing (Pydantic validation)
         │
         ▼
  Conversion to Metric object
         │
         ▼
  MetricsStore.add_metric(s)
  ├── Check memory limits (max 100k metrics)
  │   └── If exceeded → eviction of oldest entries
  ├── Save in _metrics[name] with expires_at
  ├── Schedule debounced persistence (default 100ms)
  │   └── asyncio.Task → HTTP POST to Telegraf :8186/write
  │       (fire-and-forget with error callback)
  └── Returns: {"accepted": N, "message": "..."}
         │
         ▼
  Metric visible in next SSE event
  (after Telegraf persistence, debounced ~100ms)

Key Details:

All ingestion endpoints are fire-and-forget — the response is sent immediately while persistence happens in the background
No blocking: Custom metrics do not block system metrics collection
Debounced persistence: Multiple metrics pushed within 100ms are batched together in a single HTTP POST to Telegraf
Exponential backoff: Failed HTTP POSTs retry with exponential backoff (not implemented yet, logged only)

FastAPI Middleware Stack#

The Metrics Manager FastAPI application uses a layered middleware stack to handle cross-cutting concerns like request tracing, compression, rate limiting, and CORS. Middleware is applied in the order shown below (outer to inner):

FastAPI Middleware Stack

Important Details:

Correlation IDs: Every request and response includes an X-Correlation-ID header. Use this ID to track a request through logs across services
Rate Limiting: Applied to all endpoints except /health, /api/v1/stats, and SSE (/metrics/stream) — ensure critical endpoints remain accessible
GZIP Compression: Automatic for responses >1 KB; especially useful for SSE streams with many metrics
CORS: By default allows all origins (*). Restrict via CORS_ORIGINS environment variable in production

See Environment Variables for security-related settings.

System Metrics Collection#

Telegraf Metrics Collection Flow

Telegraf Inputs:

Input	Source	Interval	Fields
`cpu`	`/proc/stat`	1s	usage_user, usage_system, usage_idle (per core + total)
`mem`	`/proc/meminfo`	1s	total, used, available, used_percent
`temp`	`/sys/class/thermal/`	1s	temperature (per sensor, filtered to coretemp)
`exec`	`read_cpu_freq.sh`	10s	cpu_freq_mhz (per core)
`execd`	`qmassa_reader.py`	continuous	gpu_* (engine usage, frequency, power)
`execd`	`npu_reader.py`	1s	npu_power, npu_frequency, npu_temperature, npu_utilization, etc.

Telegraf Output:

Listens on :9273 and exposes all metrics in Prometheus text format
Metrics include a host= tag (hostname or METRICS_MANAGER_HOSTNAME override)

SSE Streaming#

SSE Client (browser or script)
  │
  └── GET http://localhost:9090/metrics/stream
        │
        ▼  each client runs its own independent event_stream()
        │  coroutine — there is no shared queue or subscriber state
        │
  sse.py: event_stream()
  │
  ├── Loop: every PROMETHEUS_POLLER_INTERVAL_MS (default 500ms)
  │   └── HTTP GET http://localhost:9273/metrics
  │       └── Parse Prometheus text format
  │           └── Convert to flat metric list
  │               └── yield: data: {"timestamp": ..., "metrics": [...]}
  │
  └── Each event contains all metrics available at that moment
     (system metrics from Telegraf + persisted custom metrics)

Event Format:

{
  "timestamp": 1777461975860,
  "metrics": [
    {
      "name": "cpu_usage_user",
      "labels": { "cpu": "cpu-total", "host": "myhost" },
      "value": 0.14,
      "timestamp": 1777463430000
    },
    {
      "name": "memory_used_mb",
      "labels": { "host": "myhost" },
      "value": 2048.5,
      "timestamp": 1777463430000
    }
  ]
}

Content Negotiation:

Browser request (Accept: text/html) → served HTML page with in-place updated metrics table
SSE request (Accept: text/event-stream) → raw event stream
No Accept header → defaults to event stream for backwards compatibility

Connection Model:

Each client connection is independent
No shared queue or subscriber state
If a client connects after a metric is published, that metric appears in the next polling cycle (~500ms)
If a client disconnects, no cleanup needed (connection closed immediately)

Metric Lifecycle#

1. Metric arrives (via API or custom script)
   │
   ▼
2. Parsed and validated (Pydantic)
   │
   ▼
3. Converted to Metric object with tags
   │
   ▼
4. Wrapped in StoredMetric(metric, expires_at=now + METRICS_RETENTION_SECONDS)
   │
   ▼
5. Added to MetricsStore._metrics[name] (list per metric name)
   │
   ▼
6. Memory check: if > MAX_METRICS_IN_MEMORY → _evict_oldest()
   │
   ▼
7. Debounced persistence (default 100ms):
   │   ├─ If >= 100ms since last push → persist immediately
   │   │   └── HTTP POST to Telegraf :8186/write (fire-and-forget)
   │   │       └── InfluxDB Line Protocol format
   │   │       └── Error callback logs failures (no retry yet)
   │   └─ Otherwise → asyncio.Task(_delayed_persist) after sleep
   │       └── Checks _pending_persist inside lock (no race)
   │       └── Makes one HTTP POST with accumulated metrics
   │
   ▼
8. Metric visible in Prometheus endpoint (:9273/metrics)
   │   └── Available to SSE clients in next polling cycle (~500ms)
   │
   ▼
9. After METRICS_RETENTION_SECONDS (default 300s):
   │   └── Metric expires and is removed on next store access
   │
   ▼
10. Cleanup: _cleanup_expired() runs on every store access
    └── Removes all metrics where expires_at < now

Key Classes and Architecture#

main.py (FastAPI application)
  │
  ├── lifespan.startup
  │   └── MetricsStore.get_instance() → singleton initialization
  │
  ├── middleware stack
  │   ├── CorrelationIdMiddleware (X-Correlation-ID headers)
  │   ├── GZipMiddleware (compress responses >1KB)
  │   ├── RateLimitMiddleware (token bucket per IP)
  │   └── CORSMiddleware (configurable origins)
  │
  ├── routes.py (REST endpoints)
  │   ├── POST /api/v1/metrics/*
  │   ├── GET /api/v1/metrics
  │   ├── GET /health
  │   └── etc.
  │
  └── sse.py (SSE endpoint)
      └── GET /metrics/stream
          └── event_stream() — polls :9273 per client

Data Formats#

InfluxDB Line Protocol (Internal Persistence Format)#

measurement[,tag1=val1,tag2=val2] field1=val1[,field2=val2] [timestamp]

Example:

cpu_usage,host=myhost,cpu=cpu0 usage=45.2 1704067200000000000
memory,host=myhost used_percent=67.5 1704067200000000000

Used for:

Telegraf input/output
Custom metrics persistence to Telegraf
/api/v1/metrics/influx endpoint input

Prometheus Text Format (Query Output)#

metric_name{label1="value1",label2="value2"} value
metric_name{label1="value1",label2="value2"} value