API Reference
Base URL: http://127.0.0.1:8011 (default).
All endpoints return JSON unless noted. The speech endpoint also sets the
X-Session-ID response header on wav responses; clients that want to
correlate a response with its persisted output should read it.
GET /health
Liveness probe.
Response:
{ "status": "ok" }
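A liveness check along these lines could gate a deployment script or a container health probe. This is a minimal sketch, assuming the default base URL and a body shaped like the response shown above; health_body_ok, check_health, and the BASE_URL variable are hypothetical names, not part of the service.

```shell
# Hypothetical helpers; the function names and BASE_URL are illustrative only.
BASE_URL="${BASE_URL:-http://127.0.0.1:8011}"

# Succeeds (exit 0) when the given body contains "status": "ok".
health_body_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Fetches /health and succeeds only if the server answers with an ok body.
# --fail makes curl return non-zero on HTTP errors; --max-time bounds the wait.
check_health() {
  body=$(curl --noproxy '*' -sS --fail --max-time 5 "$BASE_URL/health") || return 1
  health_body_ok "$body"
}
```

Running check_health && echo ready would then print ready only when the probe succeeds.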
GET /v1/audio/voices
Returns the configured model metadata, supported speakers, and the currently supported language list.
Example:
curl --noproxy '*' http://127.0.0.1:8011/v1/audio/voices
POST /v1/audio/speech
Synthesize speech from text.
JSON body fields:
| Field | Required | Description |
|---|---|---|
| model | Yes | Required for OpenAI API compatibility; the configured service model is always used. |
| input | Yes | Text to synthesize. |
| voice | No | Speaker name; defaults to the configured speaker. |
| language | No | Only |
| instructions | No | Optional speaking style guidance (where supported by the model). |
| response_format | No | |
Example — SpeechT5 (set models.tts.name to microsoft/speecht5_tts in
config.yaml):
status=$(
  curl --noproxy '*' -sS \
    -o speech.wav \
    -w '%{http_code}' \
    -X POST http://127.0.0.1:8011/v1/audio/speech \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "default",
      "input": "The kiosk is ready for your next request.",
      "response_format": "wav"
    }'
)
# On failure speech.wav holds the error body, so print it and remove the file.
if [ "$status" = "200" ]; then
  echo "Success: saved audio to speech.wav"
else
  echo "Failure: HTTP $status"
  cat speech.wav
  rm -f speech.wav
fi
Note: SpeechT5 accepts only the configured voice and language. Passing
other values, or any instructions, returns an OpenAI-style error.
Example — Qwen TTS (set models.tts.name to a Qwen model in config.yaml):
status=$(
  curl --noproxy '*' -sS \
    -o speech.wav \
    -w '%{http_code}' \
    -X POST http://127.0.0.1:8011/v1/audio/speech \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "default",
      "input": "The kiosk is ready for your next request.",
      "voice": "Ryan",
      "language": "English",
      "instructions": "Speak clearly and warmly.",
      "response_format": "wav"
    }'
)
# On failure speech.wav holds the error body, so print it and remove the file.
if [ "$status" = "200" ]; then
  echo "Success: saved audio to speech.wav"
else
  echo "Failure: HTTP $status"
  cat speech.wav
  rm -f speech.wav
fi
Sessions
When pipeline.persist_outputs is enabled, each wav response is
associated with a session_id returned in the X-Session-ID header. The
corresponding WAV and metadata are written under
storage/<session_id>/.
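To connect a response to its persisted files, a client could dump the response headers with curl's -D option and read X-Session-ID from them. The following is a sketch under stated assumptions: session_id_from_headers, synthesize_with_session, the BASE_URL variable, and the headers.txt file name are all hypothetical, and the printed path is relative to wherever the service keeps its storage/ directory.

```shell
# Hypothetical helper: print the X-Session-ID value from a curl header dump.
# Header names are case-insensitive, and dumped header lines end with CRLF.
session_id_from_headers() {
  awk '{
    line = $0
    sub(/\r$/, "", line)                 # drop the trailing carriage return
    if (tolower(line) ~ /^x-session-id:/) {
      sub(/^[^:]*:[ \t]*/, "", line)     # strip the header name and separator
      print line
      exit
    }
  }' "$1"
}

# Hypothetical wrapper: synthesize to the file named by $1 and print the
# storage directory for the session, if the server reported one.
synthesize_with_session() {
  status=$(
    curl --noproxy '*' -sS \
      -D headers.txt \
      -o "$1" \
      -w '%{http_code}' \
      -X POST "${BASE_URL:-http://127.0.0.1:8011}/v1/audio/speech" \
      -H 'Content-Type: application/json' \
      -d '{"model": "default", "input": "The kiosk is ready for your next request.", "response_format": "wav"}'
  ) || return 1
  [ "$status" = "200" ] || return 1
  session_id=$(session_id_from_headers headers.txt)
  # The header may be absent, for example when outputs are not persisted.
  if [ -n "$session_id" ]; then
    echo "storage/$session_id/"
  else
    return 1
  fi
}
```

Calling synthesize_with_session speech.wav would save the audio and, when a session ID is present, print the matching storage/<session_id>/ path.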
Supporting Resources
Startup and deployment guides:
Configuration of ASR and sentiment backends: