# How It Works

This document provides information about the architecture, data flows, and internal mechanisms of Metrics Manager.

## High-Level Architecture

Metrics Manager is a unified metrics platform that collects, stores, and streams metrics from three main sources:

1. **System Metrics** — Telegraf agents collect CPU, memory, temperature, GPU, NPU data
2. **Custom Metrics** — REST API accepts JSON, InfluxDB Line Protocol, OpenTelemetry formats
3. **Real-time Streaming** — Server-Sent Events (SSE) broadcasts metrics to live dashboards

![Metrics Manager High-Level Architecture](./_assets/metrics-mgr-high-lev-arch.drawio.svg "high-level architecture")

---

## Component Breakdown

### Telegraf (System Metrics Collector)

Telegraf is a lightweight metrics agent that runs inside the container and collects:

**Inputs:**

- `/proc/stat` → CPU usage (user, system, idle) per core
- `/proc/meminfo` → Memory (total, used, available)
- `/sys/class/thermal/` → CPU temperature
- `read_cpu_freq.sh` (custom script) → CPU frequency
- `qmassa` (GPU reader) → Intel Arc GPU metrics (via named pipe)
- `npu_reader.py` (custom script) → Intel NPU metrics
- `Telegraf HTTP listener :8186` → Custom metrics from Metrics Manager

**Processing:**

- All inputs go through a Starlark processor (temperature filtering)
- Aggregated into Prometheus text format

**Outputs:**

- `:9273/metrics` — Prometheus endpoint (scraped by SSE clients)
- `:8186/write` — HTTP listener (accepts InfluxDB Line Protocol from FastAPI)

### Metrics Manager (FastAPI Application)

The FastAPI application provides:

**REST API Endpoints:**

- `POST /api/v1/metrics/simple` — Single metric (easiest format)
- `POST /api/v1/metrics` — JSON batch (multiple metrics with multiple fields)
- `POST /api/v1/metrics/influx` — InfluxDB Line Protocol batch
- `POST /api/v1/metrics/otlp` — OpenTelemetry (OTLP) format
- `GET /api/v1/metrics` — List all stored metrics (JSON)
- `GET /api/v1/metrics/latest` — Latest value per metric name
- `GET /api/v1/metrics/names` — Metric name list
- `DELETE /api/v1/metrics` — Clear all custom metrics
- `GET /health` — Basic health check
- `GET /api/health` — Detailed health with service statistics
- `GET /metrics` — Prometheus format (custom metrics only)

**SSE Stream Endpoint:**

- `GET /metrics/stream` — Server-Sent Events stream
  - Polls Telegraf `:9273/metrics` every 500ms (configurable)
  - Broadcasts all metrics as `data: {...}` events
  - Each client polls independently (no shared queue)
  - Browser support: serves HTML page with live table; SSE clients get raw stream

**In-Memory Storage:**

- Stores custom metrics with configurable retention (default 300 seconds)
- Automatic cleanup of expired metrics
- Automatic eviction when memory limit reached (default 100k metrics)
- Debounced persistence to Telegraf `:8186/write` (default 100ms debounce)

### Supervisor (Process Manager)

Manages three long-running processes inside the container:

1. **Telegraf** (priority 10) — System metrics collection
2. **Metrics Manager / uvicorn** (priority 20) — FastAPI application
3. **qmassa** (priority 30) — Intel GPU reader (writes to named pipe)

Priority determines startup order, with lower priority starting first. All are auto-restarted if they crash.

### Container Startup Sequence

When the container starts, the `entrypoint.sh` script initializes the environment and supervisor manages process startup:

```
entrypoint.sh
  │
  ├── Create directory: /app/custom-metrics
  ├── Create named pipe: /app/qmassa.fifo
  └── Start supervisord
        │
        ├── [priority 10] telegraf --config telegraf.conf
        │   └── Starts immediately
        │       ├── Reads /proc/stat, /proc/meminfo, /sys/class/thermal/
        │       ├── Listens on :9273 (Prometheus output)
        │       ├── Listens on :8186 (HTTP write endpoint)
        │       └── Waits for custom metric inputs
        │
        ├── [priority 20] uvicorn app.main:app --port 9090
        │   └── Starts after Telegraf is ready
        │       ├── Initializes MetricsStore
        │       ├── Registers routes (/api/v1/*, /metrics/stream, /health)
        │       ├── Sets up middleware (CORS, rate limiting, compression)
        │       ├── Calls lifespan startup hooks
        │       └── Waits for requests
        │
        └── [priority 30] qmassa --to-json /app/qmassa.fifo
            └── Starts last
                ├── Enumerates Intel Arc GPUs
                ├── Writes JSON metrics to named pipe
                └── Telegraf reads from pipe and publishes on :9273
```

**Startup takes ~2–5 seconds.** The service is ready when all three processes are RUNNING (check with `supervisorctl status`).

---

## Custom Metrics Flow

```
HTTP POST client
  │
  ├── /api/v1/metrics         (JSON batch)
  ├── /api/v1/metrics/simple  (single metric)
  ├── /api/v1/metrics/influx  (InfluxDB Line Protocol)
  └── /api/v1/metrics/otlp    (OpenTelemetry)
         │
         ▼
  Parsing (Pydantic validation)
         │
         ▼
  Conversion to Metric object
         │
         ▼
  MetricsStore.add_metric(s)
  ├── Check memory limits (max 100k metrics)
  │   └── If exceeded → eviction of oldest entries
  ├── Save in _metrics[name] with expires_at
  ├── Schedule debounced persistence (default 100ms)
  │   └── asyncio.Task → HTTP POST to Telegraf :8186/write
  │       (fire-and-forget with error callback)
  └── Returns: {"accepted": N, "message": "..."}
         │
         ▼
  Metric visible in next SSE event
  (after Telegraf persistence, debounced ~100ms)
```

**Key Details:**

- All ingestion endpoints are **fire-and-forget** — the response is sent immediately while persistence happens in the background
- **No blocking**: Custom metrics do not block system metrics collection
- **Debounced persistence**: Multiple metrics pushed within 100ms are batched together in a single HTTP POST to Telegraf
- **Exponential backoff**: Failed HTTP POSTs retry with exponential backoff (not implemented yet, logged only)

---

## FastAPI Middleware Stack

The Metrics Manager FastAPI application uses a layered middleware stack to handle cross-cutting concerns like request tracing, compression, rate limiting, and CORS. Middleware is applied in the order shown below (outer to inner):

![FastAPI Middleware Stack](./_assets/metrics-mgr-fast-api-middleware.drawio.svg "fastapi middleware stack diagram")

**Important Details:**

- **Correlation IDs**: Every request and response includes an `X-Correlation-ID` header. Use this ID to track a request through logs across services
- **Rate Limiting**: Applied to all endpoints except `/health`, `/api/v1/stats`, and SSE (`/metrics/stream`) — ensure critical endpoints remain accessible
- **GZIP Compression**: Automatic for responses >1 KB; especially useful for SSE streams with many metrics
- **CORS**: By default allows all origins (`*`). Restrict via `CORS_ORIGINS` environment variable in production

See [Environment Variables](./get-started/environment-variables.md#security) for security-related settings.

---

## System Metrics Collection

![Telegraf Metrics Collection Flow](./_assets/metrics-mgr-sys-metrics-collect.drawio.svg "telegraf metrics collection flow")

**Telegraf Inputs:**

| Input   | Source                | Interval   | Fields                                                           |
| ------- | --------------------- | ---------- | ---------------------------------------------------------------- |
| `cpu`   | `/proc/stat`          | 1s         | usage_user, usage_system, usage_idle (per core + total)          |
| `mem`   | `/proc/meminfo`       | 1s         | total, used, available, used_percent                             |
| `temp`  | `/sys/class/thermal/` | 1s         | temperature (per sensor, filtered to coretemp)                   |
| `exec`  | `read_cpu_freq.sh`    | 10s        | cpu_freq_mhz (per core)                                          |
| `execd` | `qmassa_reader.py`    | continuous | gpu\_\* (engine usage, frequency, power)                         |
| `execd` | `npu_reader.py`       | 1s         | npu_power, npu_frequency, npu_temperature, npu_utilization, etc. |

**Telegraf Output:**

- Listens on `:9273` and exposes all metrics in Prometheus text format
- Metrics include a `host=` tag (hostname or `METRICS_MANAGER_HOSTNAME` override)

---

## SSE Streaming

```
SSE Client (browser or script)
  │
  └── GET http://localhost:9090/metrics/stream
        │
        ▼  each client runs its own independent event_stream()
        │  coroutine — there is no shared queue or subscriber state
        │
  sse.py: event_stream()
  │
  ├── Loop: every PROMETHEUS_POLLER_INTERVAL_MS (default 500ms)
  │   └── HTTP GET http://localhost:9273/metrics
  │       └── Parse Prometheus text format
  │           └── Convert to flat metric list
  │               └── yield: data: {"timestamp": ..., "metrics": [...]}
  │
  └── Each event contains all metrics available at that moment
     (system metrics from Telegraf + persisted custom metrics)
```

**Event Format:**

```json
{
  "timestamp": 1777461975860,
  "metrics": [
    {
      "name": "cpu_usage_user",
      "labels": { "cpu": "cpu-total", "host": "myhost" },
      "value": 0.14,
      "timestamp": 1777463430000
    },
    {
      "name": "memory_used_mb",
      "labels": { "host": "myhost" },
      "value": 2048.5,
      "timestamp": 1777463430000
    }
  ]
}
```

**Content Negotiation:**

- Browser request (`Accept: text/html`) → served HTML page with in-place updated metrics table
- SSE request (`Accept: text/event-stream`) → raw event stream
- No `Accept` header → defaults to event stream for backwards compatibility

**Connection Model:**

- Each client connection is independent
- No shared queue or subscriber state
- If a client connects after a metric is published, that metric appears in the next polling cycle (~500ms)
- If a client disconnects, no cleanup needed (connection closed immediately)

---

## Metric Lifecycle

```
1. Metric arrives (via API or custom script)
   │
   ▼
2. Parsed and validated (Pydantic)
   │
   ▼
3. Converted to Metric object with tags
   │
   ▼
4. Wrapped in StoredMetric(metric, expires_at=now + METRICS_RETENTION_SECONDS)
   │
   ▼
5. Added to MetricsStore._metrics[name] (list per metric name)
   │
   ▼
6. Memory check: if > MAX_METRICS_IN_MEMORY → _evict_oldest()
   │
   ▼
7. Debounced persistence (default 100ms):
   │   ├─ If >= 100ms since last push → persist immediately
   │   │   └── HTTP POST to Telegraf :8186/write (fire-and-forget)
   │   │       └── InfluxDB Line Protocol format
   │   │       └── Error callback logs failures (no retry yet)
   │   └─ Otherwise → asyncio.Task(_delayed_persist) after sleep
   │       └── Checks _pending_persist inside lock (no race)
   │       └── Makes one HTTP POST with accumulated metrics
   │
   ▼
8. Metric visible in Prometheus endpoint (:9273/metrics)
   │   └── Available to SSE clients in next polling cycle (~500ms)
   │
   ▼
9. After METRICS_RETENTION_SECONDS (default 300s):
   │   └── Metric expires and is removed on next store access
   │
   ▼
10. Cleanup: _cleanup_expired() runs on every store access
    └── Removes all metrics where expires_at < now
```

---

## Key Classes and Architecture

```
main.py (FastAPI application)
  │
  ├── lifespan.startup
  │   └── MetricsStore.get_instance() → singleton initialization
  │
  ├── middleware stack
  │   ├── CorrelationIdMiddleware (X-Correlation-ID headers)
  │   ├── GZipMiddleware (compress responses >1KB)
  │   ├── RateLimitMiddleware (token bucket per IP)
  │   └── CORSMiddleware (configurable origins)
  │
  ├── routes.py (REST endpoints)
  │   ├── POST /api/v1/metrics/*
  │   ├── GET /api/v1/metrics
  │   ├── GET /health
  │   └── etc.
  │
  └── sse.py (SSE endpoint)
      └── GET /metrics/stream
          └── event_stream() — polls :9273 per client
```

---

## Data Formats

### InfluxDB Line Protocol (Internal Persistence Format)

```
measurement[,tag1=val1,tag2=val2] field1=val1[,field2=val2] [timestamp]
```

**Example:**

```
cpu_usage,host=myhost,cpu=cpu0 usage=45.2 1704067200000000000
memory,host=myhost used_percent=67.5 1704067200000000000
```

Used for:

- Telegraf input/output
- Custom metrics persistence to Telegraf
- `/api/v1/metrics/influx` endpoint input

### Prometheus Text Format (Query Output)

```
metric_name{label1="value1",label2="value2"} value
metric_name{label1="value1",label2="value2"} value
```

**Example:**

```
cpu_usage_user{cpu="cpu-total",host="myhost"} 45.2
memory_used_mb{host="myhost"} 2048.5
```

Used for:

- `:9273/metrics` endpoint (scraped by SSE clients)
- `/metrics` endpoint (custom metrics only)

### JSON Batch Format (REST API Input)

```text
{
  "metrics": [
    {
      "name": "metric_name",
      "fields": {"field1": value1, "field2": value2},
      "tags": {"tag1": "val1", "tag2": "val2"},
      "timestamp": 1704067200000000000,
      "metric_type": "gauge"
    }
  ]
}
```

---

## Performance Characteristics

| Scenario                     | Performance                                                   |
| ---------------------------- | ------------------------------------------------------------- |
| Ingest 1000 metrics/sec      | ~5% CPU, <100ms p99 latency (debounced persistence)           |
| Store 100k metrics in memory | ~50 MB RAM                                                    |
| SSE broadcast to 100 clients | ~50 ms per polling cycle (one Telegraf fetch per 100 clients) |
| Custom metric appears in SSE | ~600ms worst-case (100ms debounce + 500ms polling interval)   |

---

## Error Handling

- **Invalid metric format**: Returns `422 Unprocessable Entity` with validation error
- **Memory limit exceeded**: Oldest metrics are evicted silently
- **Telegraf :8186 unreachable**: HTTP POST fails, error logged, metrics still stored locally
- **Expired metrics**: Cleaned up silently on next store access
- **Rate limit exceeded**: Returns `429 Too Many Requests`
- **Graceful shutdown**: Completes in-flight requests, closes aiohttp session, logs uptime

## Supporting Resources

- [API Reference](./api-reference.md)
- [Architecture Overview](./index.md)
- [Environment Variables](./get-started/environment-variables.md)
- [Troubleshooting](./troubleshooting.md)

## License

Copyright (C) 2025-2026 Intel Corporation

SPDX-License-Identifier: Apache-2.0