API Reference
Base URL: http://127.0.0.1:8010 (default).
All endpoints return JSON unless noted. The transcription endpoints also set
the X-Session-ID response header; clients that want multi-upload sessions
should read it and pass it back as the session_id form field.
GET /health
Liveness probe.
Response:
{"status": "ok"}
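A liveness check against this endpoint might look like the sketch below. It relies only on the documented response body; the 2-second timeout, the HTTP 200 check, and the function names are assumptions made for illustration. The empty `ProxyHandler` mirrors the `--noproxy '*'` flag used in the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8010"  # default base URL from this reference


def is_healthy(body: bytes) -> bool:
    """Return True only for the documented {"status": "ok"} payload."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Call GET /health; treat any connection error as 'not healthy'."""
    # An empty ProxyHandler bypasses proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # URLError and socket timeouts derive from OSError
        return False
```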
GET /devices
Returns detected ALSA capture devices in hw:<card>,<device> format.
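The shape of the JSON response is not specified here, so the sketch below only handles a single identifier string in the hw:<card>,<device> format. It assumes both parts are numeric indices; `AlsaDevice` and `parse_hw_id` are hypothetical names, not part of the service.

```python
import re
from typing import NamedTuple

# Assumption: <card> and <device> are non-negative integer indices.
_HW_PATTERN = re.compile(r"^hw:(\d+),(\d+)$")


class AlsaDevice(NamedTuple):
    card: int
    device: int


def parse_hw_id(identifier: str) -> AlsaDevice:
    """Split an 'hw:<card>,<device>' string into its two indices."""
    match = _HW_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an hw:<card>,<device> identifier: {identifier!r}")
    return AlsaDevice(card=int(match.group(1)), device=int(match.group(2)))
```

For example, `parse_hw_id("hw:1,0")` yields card 1, device 0.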
POST /v1/audio/transcriptions
OpenAI-compatible transcription endpoint that returns a single response.
Form fields:
| Field | Required | Description |
|---|---|---|
| file | Yes | Audio upload. |
| | No | Accepted value is |
| session_id | No | Reuse to continue an existing session. |
| | No | Language hint passed to the ASR backend. |
| | No | Accepted but currently ignored. |
| response_format | No | One of |
| | No | Decoding temperature. |
Example:
curl --noproxy '*' \
-F file=@question_store_hours.wav \
-F response_format=verbose_json \
http://127.0.0.1:8010/v1/audio/transcriptions
If session_id is omitted, the service creates one and returns it in
X-Session-ID. Reusing that value with another upload continues the same
session and appends transcript state.
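The session round trip described above can be sketched with the Python standard library. This is an illustration under assumptions rather than a reference client: the multipart encoding is hand-rolled, the part content type is a generic placeholder, and the helper names are hypothetical. Only the `file` and `session_id` fields and the `X-Session-ID` header come from this reference.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple

BASE_URL = "http://127.0.0.1:8010"  # default base URL from this reference


def build_multipart(fields: Dict[str, str], filename: str, audio: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    chunks.append(header + audio + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, session_id: Optional[str] = None) -> Tuple[bytes, Optional[str]]:
    """Upload one file; return the raw JSON body and the X-Session-ID header."""
    fields = {"session_id": session_id} if session_id else {}
    audio_path = Path(path)
    body, content_type = build_multipart(fields, audio_path.name, audio_path.read_bytes())
    request = urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        # Header lookup on the response is case-insensitive.
        return resp.read(), resp.headers.get("X-Session-ID")
```

A first call such as `_, sid = transcribe("first.wav")` would create a session, and `transcribe("second.wav", session_id=sid)` would continue it; both file names are placeholders.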
POST /v1/audio/transcriptions/stream
Streaming transcription endpoint that emits NDJSON events.
Form fields:
| Field | Required | Description |
|---|---|---|
| file | Yes | Audio upload. |
| session_id | No | Reuse to continue an existing session. |
| | No | Language hint. |
| | No | Decoding temperature. |
Event types:
- transcription.chunk: Emitted as each audio chunk is transcribed.
- transcription.completed: Emitted once, when the upload is fully processed.
Example:
curl --noproxy '*' \
-F file=@question_store_hours.wav \
http://127.0.0.1:8010/v1/audio/transcriptions/stream
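On the client side, the NDJSON stream can be consumed one line at a time. The sketch below assumes each event is a JSON object that carries its event name under a `type` key; this reference does not state the key, so it is exposed as the `type_key` parameter. The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_events(lines: Iterable[bytes]) -> Iterator[Dict]:
    """Decode one JSON object per non-empty line (NDJSON)."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        yield json.loads(line)


def collect_until_completed(lines: Iterable[bytes], type_key: str = "type") -> List[Dict]:
    """Gather events up to and including transcription.completed."""
    events = []
    for event in iter_events(lines):
        events.append(event)
        # Assumption: the event name is stored under `type_key`.
        if event.get(type_key) == "transcription.completed":
            break
    return events
```

Because iterating an `http.client.HTTPResponse` yields lines of bytes, the response object returned by `urllib.request.urlopen` can be passed directly as `lines`.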
Sessions
A session is identified by session_id and corresponds to the directory
storage/<session_id>/. The same id can be reused across multiple uploads to
append transcript state and (when sentiment is enabled) update the
session-level sentiment summary.
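The mapping from a session id to its directory can be expressed as a small helper. This is a sketch, not the service's code: it assumes `storage` is resolved relative to the service's working directory, and the validation that rejects path separators is an added precaution that this reference does not describe.

```python
from pathlib import Path


def session_dir(session_id: str, storage_root: Path = Path("storage")) -> Path:
    """Return storage/<session_id>/ for a single-component session id."""
    # Reject empty ids and anything that is not a plain directory name,
    # such as "a/b", ".", or "..".
    if (
        not session_id
        or session_id in {".", ".."}
        or Path(session_id).name != session_id
    ):
        raise ValueError(f"invalid session id: {session_id!r}")
    return storage_root / session_id
```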
Supporting Resources
Startup and deployment guides:
Configuration of ASR and sentiment backends: