API Reference
Base URL (default): http://127.0.0.1:8012
Health Check
GET /health
Response
{"status": "ok"}
List Input Devices
GET /api/v1/devices
Returns audio input devices available on the server host. This is only relevant when calling the server-side microphone capture endpoint.
Response
{
"devices": [
{"index": 0, "name": "default", "channels": 2, "sample_rate": 44100},
{"index": 1, "name": "HDA Intel PCH: ALC256 Analog", "channels": 2, "sample_rate": 44100}
]
}
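To choose a device programmatically, a client can search the `devices` array by name. The helper below is a minimal sketch that relies only on the `index` and `name` keys shown in the response above; `find_device_index` is a hypothetical name.

```python
from typing import Optional


def find_device_index(payload: dict, name_fragment: str) -> Optional[int]:
    """Return the index of the first device whose name contains name_fragment.

    The match is case-insensitive. Returns None when nothing matches.
    """
    needle = name_fragment.lower()
    for device in payload.get("devices", []):
        if needle in device["name"].lower():
            return device["index"]
    return None
```

The returned index can then be supplied as the input device when starting a microphone session.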
List Sessions
GET /api/v1/sessions
Response
{
"sessions": [
{
"session_id": "3f1e4d2a-...",
"status": "completed",
"created_at": "2026-05-08T10:00:00.000Z",
...
}
]
}
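A common follow-up is to pick the most recent session from this list, optionally filtered by status. The sketch below assumes every entry carries `status` and `created_at` in the ISO 8601 form shown above; `latest_session` and `parse_timestamp` are illustrative names.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as 2026-05-08T10:00:00.000Z.

    datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    so it is rewritten as an explicit UTC offset first.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_session(payload: dict, status: Optional[str] = None) -> Optional[dict]:
    """Return the newest session by created_at, optionally limited to one status."""
    sessions = payload.get("sessions", [])
    if status is not None:
        sessions = [s for s in sessions if s.get("status") == status]
    if not sessions:
        return None
    return max(sessions, key=lambda s: parse_timestamp(s["created_at"]))
```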
Get Session
GET /api/v1/sessions/{session_id}
Response — session snapshot object (see Session Snapshot).
Start Microphone Session
POST /api/v1/sessions/start
Content-Type: application/json
Begins microphone capture on the server. Returns immediately; the session runs in the background.
Request Body
| Field | Type | Default | Description |
|---|---|---|---|
| | | system default | PortAudio input device index or name |
| sample_rate | | | Capture sample rate in Hz |
| chunk_seconds | | | Audio chunk length sent to audio-analyzer |
| silence_timeout_seconds | | | Silence after speech that ends the session |
| max_session_seconds | | | Hard cap on session duration |
| silence_threshold | | | RMS threshold below which audio is silence |
| language | | | Language code hint for ASR |
| temperature | | | ASR decoding temperature |
| | | env default | audio-analyzer transcription endpoint |
| | | env default | RAG query endpoint |
| | | env default | TTS speech endpoint |
| | | | Model name for TTS |
| | | env default | Voice name for TTS |
| | | | Language hint for TTS |
| | | env default | Style instructions for TTS |
Example
curl --noproxy '*' -X POST http://127.0.0.1:8012/api/v1/sessions/start \
-H 'Content-Type: application/json' \
-d '{
"language": "en",
"chunk_seconds": 4,
"silence_timeout_seconds": 1.5,
"max_session_seconds": 20,
"silence_threshold": 900
}'
Response — initial session snapshot with "status": "running".
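The same request can be issued from Python without third-party packages. This is a sketch, not an official client: `build_start_request` and `start_session` are hypothetical helpers, and the empty `ProxyHandler` is there to mirror curl's `--noproxy '*'`.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8012"  # default base URL from this reference


def build_start_request(options: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /api/v1/sessions/start with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/api/v1/sessions/start",
        data=json.dumps(options).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_session(options: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and return the initial session snapshot as a dict."""
    # An empty ProxyHandler ignores proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_start_request(options, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `start_session({"language": "en", "chunk_seconds": 4})` would return the snapshot, from which `session_id` can be read for later calls.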
Start File Session
POST /api/v1/sessions/start-file
Content-Type: multipart/form-data
Feeds an uploaded audio file through the same chunking, ASR, RAG, and TTS pipeline as a session started through /api/v1/sessions/start. Useful for testing without capture hardware.
Form Fields
Accepts the same fields as Start Microphone Session plus:
| Field | Type | Default | Description |
|---|---|---|---|
| file | binary | required | Audio file to process (WAV recommended) |
| realtime_factor | | | Playback speed multiplier for simulated real-time pacing |
Example
curl --noproxy '*' -X POST http://127.0.0.1:8012/api/v1/sessions/start-file \
-F "file=@/path/to/question.wav" \
-F "sample_rate=16000" \
-F "chunk_seconds=4" \
-F "silence_timeout_seconds=1.5" \
-F "max_session_seconds=20" \
-F "silence_threshold=900" \
-F "language=en" \
-F "temperature=0.0" \
-F "realtime_factor=10.0"
Response — initial session snapshot with "status": "running".
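Python's standard library has no multipart encoder, so a dependency-free client has to assemble the form body itself. The sketch below does that by hand. The `application/octet-stream` part type is an assumption, as are the helper names `encode_multipart` and `build_start_file_request`; the `file` field name comes from the curl example above.

```python
import uuid
import urllib.request
from pathlib import Path

BASE_URL = "http://127.0.0.1:8012"  # default base URL from this reference


def encode_multipart(fields: dict, file_name: str, file_bytes: bytes):
    """Return (body, content_type) for a multipart/form-data upload.

    Assumes field names and file_name contain no double quotes or newlines.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The audio goes in the "file" field, as in the curl example.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_start_file_request(path: str, fields: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /api/v1/sessions/start-file for the audio file at path."""
    audio = Path(path)
    body, content_type = encode_multipart(fields, audio.name, audio.read_bytes())
    return urllib.request.Request(
        f"{base_url}/api/v1/sessions/start-file",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the resulting request to `urllib.request.urlopen` performs the upload. A library such as `requests` would handle the encoding automatically if adding a dependency is acceptable.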
Stop Session
POST /api/v1/sessions/{session_id}/stop
Requests an early stop of a running session.
Example
curl --noproxy '*' -X POST http://127.0.0.1:8012/api/v1/sessions/<session_id>/stop
Response
{
"session_id": "3f1e4d2a-...",
"status": "stopping",
"stop_requested_at": "2026-05-08T10:00:05.000Z"
}
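A scripted stop might be sketched as follows. Both helper names are hypothetical. Treating any reply other than `"status": "stopping"` for the same ID as "not accepted" is an assumption, since this reference shows only the successful reply.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8012"  # default base URL from this reference


def build_stop_request(session_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /api/v1/sessions/{session_id}/stop with no request body."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    safe_id = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(
        f"{base_url}/api/v1/sessions/{safe_id}/stop", method="POST"
    )


def stop_was_accepted(payload: dict, session_id: str) -> bool:
    """Check a reply against the documented shape: same ID, status "stopping"."""
    return (
        payload.get("session_id") == session_id
        and payload.get("status") == "stopping"
    )
```

Because the documented reply reports `"stopping"` rather than a final state, a client would normally keep polling Get Session afterwards.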
Session Snapshot
The session snapshot returned by Get Session, Start Microphone Session, and Start File Session has the following structure:
| Field | Type | Description |
|---|---|---|
| session_id | | Unique session identifier |
| | | |
| | | |
| | | ISO 8601 timestamp |
| | | ISO 8601 timestamp |
| | | ISO 8601 timestamp |
| | | ISO 8601 timestamp |
| | | Why the session ended |
| | | Error message if status is |
| | | Whether speech was detected |
| | | Total seconds of audio captured |
| | | Combined transcript of all processed chunks |
| | | Same as |
| | | Streamed RAG answer text |
| tts_audio_segments | | TTS output clips; see below |
| | | TTS error strings, if any |
tts_audio_segments item
| Field | Type | Description |
|---|---|---|
| | | 1-based sentence index within the session |
| | | Sentence text that was synthesized |
| | | Absolute path to the generated WAV file on the server. Accessible to the Gradio UI when both share the same |
Polling Pattern
Start a session, then poll until status is "completed" or "failed":
# Start
SESSION=$(curl -s --noproxy '*' -X POST http://127.0.0.1:8012/api/v1/sessions/start-file \
-F "file=@question.wav" -F "realtime_factor=10.0" | python3 -c "import sys,json; print(json.load(sys.stdin)['session_id'])")
# Poll
while true; do
STATUS=$(curl -s --noproxy '*' http://127.0.0.1:8012/api/v1/sessions/$SESSION | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['status'])")
echo "Status: $STATUS"
[[ "$STATUS" == "completed" || "$STATUS" == "failed" ]] && break
sleep 1
done
# Read result
curl -s --noproxy '*' http://127.0.0.1:8012/api/v1/sessions/$SESSION | python3 -m json.tool
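The shell loop above polls forever if the session never reaches a terminal status. A Python variant with a deadline is sketched below. `wait_for_session` is a hypothetical helper, the 60-second default is arbitrary, and the terminal statuses are the two named in this section. The fetch function is passed in so the loop can be exercised without a running server.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # the two end states named above


def wait_for_session(
    fetch_snapshot: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_snapshot() until the session ends or the deadline passes.

    fetch_snapshot should return the Get Session response as a dict.
    Raises TimeoutError if no terminal status is seen within timeout seconds.
    """
    deadline = clock() + timeout
    while True:
        snapshot = fetch_snapshot()
        if snapshot.get("status") in TERMINAL_STATUSES:
            return snapshot
        if clock() >= deadline:
            raise TimeoutError(f"session still {snapshot.get('status')!r} after {timeout}s")
        sleep(interval)
```

In practice `fetch_snapshot` would wrap a GET to `/api/v1/sessions/{session_id}` and decode the JSON body. Injecting `sleep` and `clock` is a design choice that keeps the loop testable.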