API Reference

Base URL: http://127.0.0.1:8011 (default).

All endpoints return JSON unless noted. The speech endpoint returns raw audio when `response_format` is `wav`, and it also sets the `X-Session-ID` response header on those responses; clients that want to match a response to its persisted output (see Sessions) should read that header.
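As a rough sketch of reading that header from the shell: curl's `-D` option writes the response headers to a file, and a small helper can pull the value out. The helper names `extract_session_id` and `synthesize_with_headers` and the file name `headers.txt` are illustrative and not part of the service.

```shell
# Print the X-Session-ID value from a file of saved response headers.
# Header names are case-insensitive and curl keeps the trailing CR, so
# strip CRs first and compare the lowercased header name.
extract_session_id() {
  tr -d '\r' < "$1" |
    awk -F': *' 'tolower($1) == "x-session-id" { print $2; exit }'
}

# Request a wav response, saving headers to headers.txt and audio to
# speech.wav. (Illustrative helper; assumes the default base URL.)
synthesize_with_headers() {
  curl --noproxy '*' -sS \
    -D headers.txt \
    -o speech.wav \
    -X POST http://127.0.0.1:8011/v1/audio/speech \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "default",
      "input": "The kiosk is ready for your next request.",
      "response_format": "wav"
    }'
}
```

With the service running, `synthesize_with_headers && extract_session_id headers.txt` would print the session ID if the header is present, and nothing otherwise.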

`GET /health`

Liveness probe.

Response:

{ "status": "ok" }
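A startup script might poll this endpoint before sending synthesis requests. The following is a minimal sketch under that assumption; `is_ok_body` and `wait_for_health` are hypothetical helper names, and the retry count and one-second delay are arbitrary choices.

```shell
# Succeed if a response body contains "status": "ok" (whitespace-tolerant).
is_ok_body() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Poll GET /health until it reports ok or the attempts run out.
# Usage: wait_for_health [base_url] [attempts]
wait_for_health() {
  base_url=${1:-http://127.0.0.1:8011}
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    body=$(curl --noproxy '*' -sS --max-time 2 "$base_url/health" 2>/dev/null) &&
      is_ok_body "$body" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A caller could then write `wait_for_health || { echo "service not ready" >&2; exit 1; }` before the first request.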

`GET /v1/audio/voices`

Returns the configured model metadata, supported speakers, and the currently supported language list.

Example:

curl --noproxy '*' http://127.0.0.1:8011/v1/audio/voices

`POST /v1/audio/speech`

Synthesize speech from text.

JSON body fields:

Field	Required	Description
`model`	Yes	Required for OpenAI API compatibility, but it does not select a model; the configured service model is always used.
`input`	Yes	Text to synthesize.
`voice`	No	Speaker name; defaults to the configured speaker.
`language`	No	Only `English` is currently accepted.
`instructions`	No	Optional speaking style guidance (where supported by the model).
`response_format`	No	`wav` (raw `audio/wav`) or `json` (metadata + base64-encoded WAV).

Example — SpeechT5 (set models.tts.name to microsoft/speecht5_tts in config.yaml):

status=$(
  curl --noproxy '*' -sS \
    -o speech.wav \
    -w '%{http_code}' \
    -X POST http://127.0.0.1:8011/v1/audio/speech \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "default",
      "input": "The kiosk is ready for your next request.",
      "response_format": "wav"
    }'
)
if [ "$status" = "200" ]; then
  echo "Success: saved audio to speech.wav"
else
  # On failure, speech.wav holds the error body returned by the server (if any), not audio.
  echo "Failure: HTTP $status"
  cat speech.wav
  rm -f speech.wav
fi

Note: SpeechT5 accepts only the configured voice and language. Passing other values, or any instructions, returns an OpenAI-style error.

Example — Qwen TTS (set models.tts.name to a Qwen model in config.yaml):

status=$(
  curl --noproxy '*' -sS \
    -o speech.wav \
    -w '%{http_code}' \
    -X POST http://127.0.0.1:8011/v1/audio/speech \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "default",
      "input": "The kiosk is ready for your next request.",
      "voice": "Ryan",
      "language": "English",
      "instructions": "Speak clearly and warmly.",
      "response_format": "wav"
    }'
)
if [ "$status" = "200" ]; then
  echo "Success: saved audio to speech.wav"
else
  # On failure, speech.wav holds the error body returned by the server (if any), not audio.
  echo "Failure: HTTP $status"
  cat speech.wav
  rm -f speech.wav
fi

Sessions

When pipeline.persist_outputs is enabled, each wav response is associated with a session_id returned in the X-Session-ID header. The corresponding WAV and metadata are written under storage/<session_id>/.
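Given a session ID taken from the `X-Session-ID` header, the persisted files can be located on the machine running the service. The sketch below assumes the `storage` directory is reachable at a path you supply (it defaults to `./storage`); `list_session_files` is a hypothetical helper, and since the file names inside the directory are not specified here, it only lists whatever is present.

```shell
# List the files persisted for one session.
# Usage: list_session_files <session_id> [storage_root]
list_session_files() {
  session_id=$1
  storage_root=${2:-storage}   # assumed location of the storage directory
  if [ -z "$session_id" ]; then
    echo "usage: list_session_files <session_id> [storage_root]" >&2
    return 2
  fi
  dir="$storage_root/$session_id"
  if [ ! -d "$dir" ]; then
    echo "No persisted output at $dir" >&2
    return 1
  fi
  ls -1 "$dir"
}
```

A missing directory most likely means `pipeline.persist_outputs` is disabled or the session ID was copied incorrectly.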

Supporting Resources

Startup and deployment guides:
Configuration of ASR and sentiment backends:
- Configuration Guide
