API Reference#
The backend is a FastAPI application that serves REST APIs, a Server-Sent Events (SSE) stream that relays captions and metadata received over MQTT, and WebSocket endpoints for metrics.
Interactive API docs#
When the stack is running, FastAPI provides OpenAPI/Swagger UI at:
http://localhost:4173/docs
(If you run the backend on a different host/port, adjust accordingly.)
REST Endpoints#
Models#
GET /api/vlm-models - List available VLM models discovered under ov_models/
GET /api/detection-models - List available object detection models discovered under ov_detection_models/
Pipelines#
GET /api/pipelines - List available pipeline configurations
Runs#
POST /api/runs - Start a new captioning pipeline (publishes to MQTT)
GET /api/runs - List active runs
GET /api/runs/{run_id} - Get run details (includes the mqttTopic field)
DELETE /api/runs/{run_id} - Stop a pipeline
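The run endpoints above can be exercised with a small standard-library client. The sketch below is illustrative only: it assumes the API is served from the same origin as the interactive docs (http://localhost:4173), and the helper names (build_request, start_run, and so on) are hypothetical. The request body accepted by POST /api/runs is not documented here, so the payload is passed through unchanged; check the interactive docs for the actual fields.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API shares the host/port of the interactive docs.
BASE_URL = "http://localhost:4173"


def build_request(method, path, payload=None):
    """Build (but do not send) a request for a backend endpoint."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # JSON-encode the body; the accepted fields are not documented here.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def start_run(payload):
    """POST /api/runs: start a new captioning pipeline."""
    return build_request("POST", "/api/runs", payload)


def list_runs():
    """GET /api/runs: list active runs."""
    return build_request("GET", "/api/runs")


def get_run(run_id):
    """GET /api/runs/{run_id}: fetch details for one run."""
    return build_request("GET", "/api/runs/" + urllib.parse.quote(run_id, safe=""))


def stop_run(run_id):
    """DELETE /api/runs/{run_id}: stop a pipeline."""
    return build_request("DELETE", "/api/runs/" + urllib.parse.quote(run_id, safe=""))


def send(request):
    """Send a prepared request and decode a JSON body, if there is one."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None
```

With the stack running, send(list_runs()) would return the decoded response of GET /api/runs. Keeping request construction separate from sending makes the URL and method of each call easy to inspect without a live backend.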
Run Response Schema#
{
  "runId": "string",
  "pipelineId": "string",
  "peerId": "string",
  "mqttTopic": "live-video-captioning/{runId}",
  "modelName": "string",
  "pipelineName": "string",
  "runName": "string",
  "prompt": "string",
  "maxTokens": 100,
  "rtspUrl": "string"
}
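On the client side, a response with this shape can be decoded into a typed record. The following is a minimal sketch, assuming every field in the schema is always present and that maxTokens is an integer; the RunInfo class and its snake_case attribute names are hypothetical and not part of the backend.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RunInfo:
    """Client-side view of the run response schema (hypothetical helper)."""

    run_id: str
    pipeline_id: str
    peer_id: str
    mqtt_topic: str
    model_name: str
    pipeline_name: str
    run_name: str
    prompt: str
    max_tokens: int
    rtsp_url: str

    @classmethod
    def from_json(cls, text):
        """Decode a JSON run response; raises KeyError if a field is missing."""
        raw = json.loads(text)
        return cls(
            run_id=raw["runId"],
            pipeline_id=raw["pipelineId"],
            peer_id=raw["peerId"],
            mqtt_topic=raw["mqttTopic"],
            model_name=raw["modelName"],
            pipeline_name=raw["pipelineName"],
            run_name=raw["runName"],
            prompt=raw["prompt"],
            max_tokens=raw["maxTokens"],
            rtsp_url=raw["rtspUrl"],
        )
```

Failing loudly on a missing key is a deliberate choice for a sketch; a production client might prefer defaults for fields that turn out to be optional.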
Streaming Endpoints#
Server-Sent Events (SSE)#
GET /api/runs/metadata-stream - Multiplexed SSE stream for all active runs
The SSE stream provides real-time metadata received from the MQTT broker. Each message is an envelope containing:
{
  "runId": "string",
  "data": { /* pipeline inference result */ },
  "received_at": 1705432800.123
}
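Because the stream is multiplexed, a consumer typically decodes each envelope and routes it by runId. The sketch below parses standard SSE framing (data: lines terminated by a blank line) from any iterable of text lines. It assumes each event's data field carries exactly one envelope as JSON; the function name iter_envelopes is hypothetical, as is the caption key used in the usage example, since the contents of data depend on the pipeline.

```python
import json


def iter_envelopes(lines):
    """Yield decoded envelopes from an iterable of SSE text lines."""
    data_parts = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; emit it if it had data.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are SSE comments (often keep-alives).
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # SSE strips a single leading space from the field value.
            value = value[1:]
        if field == "data":
            data_parts.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Usage with hand-written sample lines; "caption" is a hypothetical key.
sample_lines = [
    ": keep-alive",
    'data: {"runId": "r1", "data": {"caption": "a dog"}, "received_at": 1705432800.123}',
    "",
    'data: {"runId": "r2", "data": {}, "received_at": 1705432801.5}',
    "",
]
envelopes = list(iter_envelopes(sample_lines))
by_run = {envelope["runId"]: envelope["data"] for envelope in envelopes}
```

Against a live backend, the same generator could be fed the decoded lines of a streaming HTTP response for GET /api/runs/metadata-stream.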