API Reference

Base URL: http://127.0.0.1:8011 (default).

All endpoints return JSON unless noted. For wav responses, the speech endpoint also sets the X-Session-ID response header; clients that need to correlate a request with its persisted output should read it.

GET /health

Liveness probe.

Response:

{ "status": "ok" }
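A startup script may want to wait for the service before sending requests. The sketch below polls GET /health until the body reports "status": "ok". The helper names is_health_ok and wait_for_health, the one-second interval, and the default of 30 attempts are illustrative assumptions, not part of the service.

```shell
# Hypothetical helpers; names, retry count, and timing are assumptions.

# Succeeds when a response body looks like { "status": "ok" }.
is_health_ok() {
  case "$1" in
    *'"status"'*'"ok"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls GET /health once per second, up to $2 attempts (default 30).
wait_for_health() {
  base_url="${1:-http://127.0.0.1:8011}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A refused connection leaves body empty, which counts as not ready.
    body=$(curl --noproxy '*' -sS --max-time 2 "$base_url/health" 2>/dev/null) || body=""
    if is_health_ok "$body"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_health http://127.0.0.1:8011 10 && echo "service is up"` would give the service roughly ten seconds to come up before giving up.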

GET /v1/audio/voices

Returns the configured model metadata, supported speakers, and the currently supported language list.

Example:

curl --noproxy '*' http://127.0.0.1:8011/v1/audio/voices

POST /v1/audio/speech

Synthesize speech from text.

JSON body fields:

| Field | Required | Description |
| --- | --- | --- |
| model | Yes | Required for OpenAI API compatibility; the configured service model is always used. |
| input | Yes | Text to synthesize. |
| voice | No | Speaker name; defaults to the configured speaker. |
| language | No | Only English is currently accepted. |
| instructions | No | Optional speaking style guidance (where supported by the model). |
| response_format | No | wav (raw audio/wav) or json (metadata + base64-encoded WAV). |

Example — SpeechT5 (set models.tts.name to microsoft/speecht5_tts in config.yaml):

status=$(
  curl --noproxy '*' -sS \
    -o speech.wav \
    -w '%{http_code}' \
    -X POST http://127.0.0.1:8011/v1/audio/speech \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "default",
      "input": "The kiosk is ready for your next request.",
      "response_format": "wav"
    }'
)
if [ "$status" = "200" ]; then
  echo "Success: saved audio to speech.wav"
else
  echo "Failure: HTTP $status"
  cat speech.wav
  rm -f speech.wav
fi

Note: SpeechT5 accepts only the configured voice and language. Passing other values, or any instructions, returns an OpenAI-style error.

Example — Qwen TTS (set models.tts.name to a Qwen model in config.yaml):

status=$(
  curl --noproxy '*' -sS \
    -o speech.wav \
    -w '%{http_code}' \
    -X POST http://127.0.0.1:8011/v1/audio/speech \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "default",
      "input": "The kiosk is ready for your next request.",
      "voice": "Ryan",
      "language": "English",
      "instructions": "Speak clearly and warmly.",
      "response_format": "wav"
    }'
)
if [ "$status" = "200" ]; then
  echo "Success: saved audio to speech.wav"
else
  echo "Failure: HTTP $status"
  cat speech.wav
  rm -f speech.wav
fi
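The HTTP status is the primary success signal in the examples above. As an extra guard, a script can confirm that the saved file really starts with a WAV signature rather than a JSON error body. This is a minimal sketch that assumes the wav response is a standard RIFF/WAVE container; the function name is_wav_file is hypothetical.

```shell
# Hypothetical helper: succeeds when the file starts with a RIFF/WAVE header.
# A WAV file begins with the bytes "RIFF", a 4-byte chunk size, then "WAVE".
is_wav_file() {
  magic=$(dd if="$1" bs=1 count=4 2>/dev/null)
  form=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)
  [ "$magic" = "RIFF" ] && [ "$form" = "WAVE" ]
}
```

For example, `is_wav_file speech.wav && echo "speech.wav looks like WAV audio"`. The check only inspects the first 12 bytes, so it does not validate the audio data itself.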

Sessions

When pipeline.persist_outputs is enabled, each wav response is associated with a session_id returned in the X-Session-ID header. The corresponding WAV and metadata are written under storage/<session_id>/.
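From the command line, curl can write response headers to a file with -D, which makes the session ID easy to recover. The sketch below extracts it from such a dump. The function name session_id_from_headers and the file name headers.txt are illustrative, and the storage path is assumed to be relative to the service's working directory.

```shell
# Hypothetical helper: print the X-Session-ID value from a `curl -D` header dump.
session_id_from_headers() {
  # Header names are case-insensitive, and HTTP header lines end in CRLF,
  # so compare in lowercase and strip the trailing carriage return.
  awk -F': *' 'tolower($1) == "x-session-id" { sub(/\r$/, "", $2); print $2; exit }' "$1"
}

# Usage sketch (needs a running service with pipeline.persist_outputs enabled):
#   curl --noproxy '*' -sS -D headers.txt -o speech.wav \
#     -X POST http://127.0.0.1:8011/v1/audio/speech \
#     -H 'Content-Type: application/json' \
#     -d '{"model": "default", "input": "Hello.", "response_format": "wav"}'
#   sid=$(session_id_from_headers headers.txt)
#   ls "storage/$sid/"
```

The function prints nothing when the header is absent, for example when persistence is disabled or the request failed, so callers should check for an empty result before building a path from it.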

Supporting Resources