# API Reference Base URL: `http://127.0.0.1:8011` (default). All endpoints return JSON unless noted. The speech endpoint also sets the `X-Session-ID` response header on `wav` responses; clients that want to correlate uploads with persisted storage should read it. ## `GET /health` Liveness probe. Response: ```json { "status": "ok" } ``` ## `GET /v1/audio/voices` Returns the configured model metadata, supported speakers, and the currently supported language list. Example: ```bash curl --noproxy '*' http://127.0.0.1:8011/v1/audio/voices ``` ## `POST /v1/audio/speech` Synthesize speech from text. JSON body fields: | Field | Required | Description | | ----------------- | -------- | ----------------------------------------------------------------------------------- | | `model` | Yes | Required for OpenAI API compatibility; the configured service model is always used. | | `input` | Yes | Text to synthesize. | | `voice` | No | Speaker name; defaults to the configured speaker. | | `language` | No | Only `English` is currently accepted. | | `instructions` | No | Optional speaking style guidance (where supported by the model). | | `response_format` | No | `wav` (raw `audio/wav`) or `json` (metadata + base64-encoded WAV). | Example — SpeechT5 (set `models.tts.name` to `microsoft/speecht5_tts` in `config.yaml`): ```bash status=$( curl --noproxy '*' -sS \ -o speech.wav \ -w '%{http_code}' \ -X POST http://127.0.0.1:8011/v1/audio/speech \ -H 'Content-Type: application/json' \ -d '{ "model": "default", "input": "The kiosk is ready for your next request.", "response_format": "wav" }' ) if [ "$status" = "200" ]; then echo "Success: saved audio to speech.wav"; else echo "Failure: HTTP $status"; cat speech.wav; rm -f speech.wav; fi ``` > **Note:** SpeechT5 accepts only the configured `voice` and `language`. Passing > other values, or any `instructions`, returns an OpenAI-style error. Example — Qwen TTS (set `models.tts.name` to a Qwen model in `config.yaml`): ```bash status=$( curl --noproxy '*' -sS \ -o speech.wav \ -w '%{http_code}' \ -X POST http://127.0.0.1:8011/v1/audio/speech \ -H 'Content-Type: application/json' \ -d '{ "model": "default", "input": "The kiosk is ready for your next request.", "voice": "Ryan", "language": "English", "instructions": "Speak clearly and warmly.", "response_format": "wav" }' ) if [ "$status" = "200" ]; then echo "Success: saved audio to speech.wav"; else echo "Failure: HTTP $status"; cat speech.wav; rm -f speech.wav; fi ``` ## Sessions When `pipeline.persist_outputs` is enabled, each `wav` response is associated with a `session_id` returned in the `X-Session-ID` header. The corresponding WAV and metadata are written under `storage//`. ## Supporting Resources - Startup and deployment guides: - [Get Started](./get-started.md) - [Run with Docker](./get-started/run-container.md) - [Run on the Host](./get-started/run-standalone.md) - Configuration of ASR and sentiment backends: - [Configuration Guide](./get-started/configuration.md)