Configuration#

kiosk-core and kiosk-ui are configured through environment variables (see Environment Variables).

The three model-hosting services (audio-analyzer, text-to-speech, rag-service) are configured through YAML files that the kiosk pins and mounts into the containers. The most common changes are the model and the inference device.

Model Selection#

Each model-hosting service reads the model identifier from the same pinned config file used for device selection:

Service	File	Model fields
`audio-analyzer`	`configs/audio-analyzer/config.yaml`	`models.asr.name` (e.g. `whisper-tiny`, `whisper-base`); `sentiment.model` (optional)
`text-to-speech`	`configs/text-to-speech/config.yaml`	`models.tts.name` (e.g. `microsoft/speecht5_tts`, Qwen-TTS variant); `model_variant`
`rag-service`	`rag-service/config.yaml`	`models.llm.hf_id`, `models.embedding.hf_id`, `retrieval.reranker.hf_id`; per-model `weight_format` (`int4`, `int8`, `fp16`)

Use Hugging Face IDs where the field name is hf_id. Models are downloaded and exported on first start into the per-service models/ directory; subsequent starts reuse the cache.

Supported / validated models#

The kiosk ships with the following defaults. These are the models the stack has been validated with — they are the recommended starting point. The Devices column lists the supported inference devices for each:

Service	Field	Default (validated)	Other examples	Devices
`audio-analyzer` ASR	`models.asr.name`	`whisper-base`	`whisper-tiny`, `whisper-small`, `whisper-medium`, `whisper-large`	`CPU`, `GPU` (`GPU` requires `provider: openvino`)
`audio-analyzer` sentiment	`sentiment.model`	`speechbrain/emotion-recognition-wav2vec2-IEMOCAP`	other SpeechBrain emotion-recognition models	`CPU`, `GPU` (disabled by default)
`text-to-speech`	`models.tts.name`	`microsoft/speecht5_tts` (SpeechT5)	`Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice` (Qwen-TTS)	`CPU`, `GPU` (`int4` on iGPU produces noise; use `fp16` or `int8` on GPU)
`rag-service` LLM	`models.llm.hf_id`	`Qwen/Qwen3-4B-Instruct-2507`	other OpenVINO-exportable instruct LLMs	`CPU`, `GPU` (`GPU` recommended for acceptable latency)
`rag-service` embedding	`models.embedding.hf_id`	`BAAI/bge-large-en-v1.5`	`BAAI/bge-base-en-v1.5`, `BAAI/bge-small-en-v1.5`	`CPU`, `GPU` (`CPU` is usually fast enough)
`rag-service` reranker	`retrieval.reranker.hf_id`	`BAAI/bge-reranker-base`	`BAAI/bge-reranker-large`	`CPU`, `GPU` (optional)

[!IMPORTANT] Changing models is at your own discretion. The defaults above are the only combinations validated with this stack. Configuring models, variants, devices, or precisions other than the defaults may negatively affect the functionality, accuracy, latency, or stability of the application. You are responsible for ensuring the configuration you choose is correct and works for your use case — make changes only if you understand the implications.

In particular:

Some models do not function properly at aggressive quantization. If a model produces garbled, empty, or low-quality output at int4, switch that model’s weight_format/dtype to int8 or fp16.

A model must be exportable to OpenVINO IR for the OpenVINO backend; not every Hugging Face model is supported.

Larger models increase first-run download/export time, memory use, and per-request latency, and may not fit on the selected device.

After any change, restart the affected service and verify it loads and responds correctly before relying on it.

Inference Device#

Each model-hosting service reads its device from a pinned config file:

Service	File	Fields
`audio-analyzer`	`configs/audio-analyzer/config.yaml`	`models.asr.device`, `sentiment.device`
`text-to-speech`	`configs/text-to-speech/config.yaml`	`models.tts.device`
`rag-service`	`rag-service/config.yaml`	`models.llm.device`, `models.embedding.device`, `retrieval.reranker.device`

The supported devices for each model are listed in the Supported / validated models table above.

Use uppercase device names (CPU, GPU). rag-service expects them as quoted strings; audio-analyzer and text-to-speech unquoted.

After editing, restart the affected service and confirm OpenVINO picked the device:

docker compose up -d --build --force-recreate <service-name>
docker compose logs <service-name> | grep -i -E "device|compiling|GPU|CPU"

OpenVINO prints a Compiling model on <DEVICE> line on first load.

GPU execution is delegated to the OpenVINO backend used by each service. Whether a given model actually runs on GPU and how it performs depends on the OpenVINO version and operator coverage for that model.

Environment Variables#

kiosk-core has no config file. All settings are controlled through environment variables.

kiosk-core API (`main:app`)#

Variable	Default	Description
`KIOSK_CORE_ANALYZER_URL`	`http://127.0.0.1:8010/v1/audio/transcriptions`	audio-analyzer transcription endpoint
`KIOSK_CORE_RAG_URL`	`http://127.0.0.1:8020/api/v1/query`	RAG query endpoint
`KIOSK_CORE_TTS_URL`	`http://127.0.0.1:8011/v1/audio/speech`	TTS speech synthesis endpoint
`KIOSK_CORE_TTS_MODEL`	`qwen-tts`	Model name sent to the TTS service
`KIOSK_CORE_TTS_VOICE`	(unset)	Voice name sent to the TTS service
`KIOSK_CORE_TTS_LANGUAGE`	`English`	Language sent to the TTS service
`KIOSK_CORE_TTS_INSTRUCTIONS`	(unset)	Optional style instructions for TTS
`KIOSK_CORE_SAMPLE_RATE`	`16000`	Default audio sample rate in Hz
`KIOSK_CORE_CHUNK_SECONDS`	`4.0`	Length of each audio chunk sent to audio-analyzer
`KIOSK_CORE_SILENCE_TIMEOUT_SECONDS`	`1.5`	Silence duration after speech that ends a session
`KIOSK_CORE_MAX_SESSION_SECONDS`	`20.0`	Hard cap on session duration
`KIOSK_CORE_SILENCE_THRESHOLD`	`900`	RMS threshold below which audio is treated as silence
`KIOSK_CORE_BLOCK_DURATION_SECONDS`	`0.1`	PortAudio capture block size
`KIOSK_CORE_PREROLL_SECONDS`	`0.3`	Audio buffered before speech starts
`KIOSK_CORE_HTTP_TIMEOUT_SECONDS`	`120.0`	HTTP client timeout for downstream calls

Gradio UI (`gradio_app.py`)#

Variable	Default	Description
`KIOSK_CORE_UI_BASE_URL`	`http://127.0.0.1:8012`	Base URL of the kiosk-core API
`KIOSK_CORE_UI_ANALYZER_URL`	`http://127.0.0.1:8010/v1/audio/transcriptions`	Passed to start-file sessions as `analyzer_url`
`KIOSK_CORE_UI_RAG_URL`	`http://127.0.0.1:8020/api/v1/query`	Passed to start-file sessions as `rag_url`
`KIOSK_CORE_UI_TTS_URL`	`http://127.0.0.1:8011/v1/audio/speech`	Passed to start-file sessions as `tts_url`
`KIOSK_CORE_UI_TIMEOUT_SECONDS`	`120.0`	HTTP client timeout in the UI
`KIOSK_CORE_UI_POLL_INTERVAL_SECONDS`	`0.35`	How often the UI polls for session state updates

Compose Defaults#

When running with the top-level docker-compose.yml, the defaults are wired to the internal Compose network:

KIOSK_CORE_ANALYZER_URL=http://audio-analyzer:8010/v1/audio/transcriptions
KIOSK_CORE_RAG_URL=http://rag-service:8020/api/v1/query
KIOSK_CORE_TTS_URL=http://text-to-speech:8011/v1/audio/speech
KIOSK_CORE_UI_BASE_URL=http://kiosk-core:8012

Most deployments should leave these values unchanged. Override them only when kiosk-core or kiosk-ui must call services outside the local Compose stack.

Session Parameters#

Session parameters (chunk duration, silence threshold, etc.) can also be provided per-request in the POST body for /api/v1/sessions/start and /api/v1/sessions/start-file. Per-request values take precedence over the environment variable defaults.