Configuration#
kiosk-core and kiosk-ui are configured through environment variables
(see Environment Variables).
The three model-hosting services (audio-analyzer, text-to-speech,
rag-service) are configured through YAML files that the kiosk pins
and mounts into the containers. The most common changes are the
model and the inference device.
Model Selection#
Each model-hosting service reads the model identifier from the same pinned config file used for device selection:
| Service | File | Model fields |
|---|---|---|
|  |  |  |
|  |  |  |
|  |  |  |
Use Hugging Face IDs where the field name is hf_id. Models are
downloaded and exported on first start into the per-service models/
directory; subsequent starts reuse the cache.
Supported / validated models#
The kiosk ships with the following defaults. These are the models the stack has been validated with, and they are the recommended starting point. The Devices column lists the supported inference devices for each:
| Service | Field | Default (validated) | Other examples | Devices |
|---|---|---|---|---|
|  |  |  |  |  |
|  |  |  | other SpeechBrain emotion-recognition models |  |
|  |  |  |  |  |
|  |  |  | other OpenVINO-exportable instruct LLMs |  |
|  |  |  |  |  |
|  |  |  |  |  |
[!IMPORTANT] Changing models is at your own discretion. The defaults above are the only combinations validated with this stack. Configuring models, variants, devices, or precisions other than the defaults may negatively affect the functionality, accuracy, latency, or stability of the application. You are responsible for ensuring that the configuration you choose is correct and works for your use case. Make changes only if you understand the implications.
In particular:
- Some models do not function properly at aggressive quantization. If a model produces garbled, empty, or low-quality output at int4, switch that model’s weight_format/dtype to int8 or fp16.
- A model must be exportable to OpenVINO IR for the OpenVINO backend; not every Hugging Face model is supported.
- Larger models increase first-run download/export time, memory use, and per-request latency, and may not fit on the selected device.
- After any change, restart the affected service and verify it loads and responds correctly before relying on it.
Inference Device#
Each model-hosting service reads its device from a pinned config file:
| Service | File | Fields |
|---|---|---|
|  |  |  |
|  |  |  |
|  |  |  |
The supported devices for each model are listed in the Supported / validated models table above.
Use uppercase device names (CPU, GPU). rag-service expects
them as quoted strings; audio-analyzer and text-to-speech expect them unquoted.
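The quoting rule above is easy to get wrong when several files are edited by hand. As a minimal sketch, the helper below formats a device name the way each service is described as expecting it. The function name render_device_value is hypothetical and not part of the kiosk code base, and only the scalar value is produced, because the YAML key names depend on each service's config file.

```python
# Hypothetical helper for illustration; not part of the kiosk code base.
# Which services expect a quoted value is taken from the text above.
QUOTED_SERVICES = {"rag-service"}
UNQUOTED_SERVICES = {"audio-analyzer", "text-to-speech"}


def render_device_value(service: str, device: str) -> str:
    """Return the device name as it should be written in the service's YAML file."""
    if service not in QUOTED_SERVICES | UNQUOTED_SERVICES:
        raise ValueError(f"unknown model-hosting service: {service!r}")
    name = device.strip().upper()  # device names are written in uppercase
    if not name:
        raise ValueError("device name must not be empty")
    # rag-service gets a quoted string; the other two get a bare value.
    return f'"{name}"' if service in QUOTED_SERVICES else name


print(render_device_value("rag-service", "gpu"))     # "GPU"
print(render_device_value("audio-analyzer", "cpu"))  # CPU
```

The helper deliberately does not reject device names other than CPU and GPU, since which devices a model supports is determined by the table referenced above rather than by this sketch.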
After editing, restart the affected service and confirm OpenVINO picked the device:
docker compose up -d --build --force-recreate <service-name>
docker compose logs <service-name> | grep -i -E "device|compiling|GPU|CPU"
OpenVINO prints a Compiling model on <DEVICE> line on first load.
GPU execution is delegated to the OpenVINO backend used by each service. Whether a given model actually runs on GPU and how it performs depends on the OpenVINO version and operator coverage for that model.
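The log check described above can also be scripted. The sketch below assumes the log line has the form "Compiling model on <DEVICE>" as stated in this section; the function names are hypothetical, and the sample text is synthetic rather than captured from a real service.

```python
import re

# Assumption: the relevant log line reads "Compiling model on <DEVICE>".
COMPILE_LINE = re.compile(r"Compiling model on\s+([A-Za-z]+(?:\.\d+)?)", re.IGNORECASE)


def compiled_devices(log_text: str) -> list[str]:
    """Return every device named in a 'Compiling model on <DEVICE>' line, in log order."""
    return [match.group(1).upper() for match in COMPILE_LINE.finditer(log_text)]


def device_confirmed(log_text: str, expected: str) -> bool:
    """True when at least one compile line exists and all of them name the expected device."""
    found = compiled_devices(log_text)
    return bool(found) and all(device == expected.upper() for device in found)


# Synthetic sample text for illustration only.
sample = "some unrelated line\nCompiling model on GPU\n"
print(compiled_devices(sample))         # ['GPU']
print(device_confirmed(sample, "CPU"))  # False
```

To use it against a running stack, pass in the text captured from docker compose logs <service-name>. An empty result means no compile line was found, which may simply indicate that the model has not been loaded yet.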
Environment Variables#
kiosk-core has no config file. All settings are controlled through environment variables.
kiosk-core API (main:app)#
| Variable | Default | Description |
|---|---|---|
|  |  | audio-analyzer transcription endpoint |
|  |  | RAG query endpoint |
|  |  | TTS speech synthesis endpoint |
|  |  | Model name sent to the TTS service |
|  | (unset) | Voice name sent to the TTS service |
|  |  | Language sent to the TTS service |
|  | (unset) | Optional style instructions for TTS |
|  |  | Default audio sample rate in Hz |
|  |  | Length of each audio chunk sent to audio-analyzer |
|  |  | Silence duration after speech that ends a session |
|  |  | Hard cap on session duration |
|  |  | RMS threshold below which audio is treated as silence |
|  |  | PortAudio capture block size |
|  |  | Audio buffered before speech starts |
|  |  | HTTP client timeout for downstream calls |
Gradio UI (gradio_app.py)#
| Variable | Default | Description |
|---|---|---|
|  |  | Base URL of the kiosk-core API |
|  |  | Passed to start-file sessions as |
|  |  | Passed to start-file sessions as |
|  |  | Passed to start-file sessions as |
|  |  | HTTP client timeout in the UI |
|  |  | How often the UI polls for session state updates |
Compose Defaults#
When running with the top-level docker-compose.yml, the defaults are wired to the internal Compose network:
KIOSK_CORE_ANALYZER_URL=http://audio-analyzer:8010/v1/audio/transcriptions
KIOSK_CORE_RAG_URL=http://rag-service:8020/api/v1/query
KIOSK_CORE_TTS_URL=http://text-to-speech:8011/v1/audio/speech
KIOSK_CORE_UI_BASE_URL=http://kiosk-core:8012
Most deployments should leave these values unchanged. Override them only when kiosk-core or kiosk-ui must call services outside the local Compose stack.
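Before overriding any of these values, it can help to see which endpoints a given environment actually changes. The sketch below is a hypothetical pre-flight helper, not kiosk-core code: it uses the Compose values listed above as the baseline and reports which variables a supplied environment overrides. Treating an empty string as unset is an assumption of this sketch.

```python
import os

# Baseline values copied from the Compose defaults listed above.
COMPOSE_DEFAULTS = {
    "KIOSK_CORE_ANALYZER_URL": "http://audio-analyzer:8010/v1/audio/transcriptions",
    "KIOSK_CORE_RAG_URL": "http://rag-service:8020/api/v1/query",
    "KIOSK_CORE_TTS_URL": "http://text-to-speech:8011/v1/audio/speech",
    "KIOSK_CORE_UI_BASE_URL": "http://kiosk-core:8012",
}


def effective_urls(environ=None) -> dict:
    """Return each URL, preferring the environment and falling back to the Compose default."""
    env = os.environ if environ is None else environ
    # Assumption: an empty string counts as "not set".
    return {name: env.get(name) or default for name, default in COMPOSE_DEFAULTS.items()}


def overridden(environ=None) -> list:
    """Return the sorted names of variables whose effective value differs from the default."""
    urls = effective_urls(environ)
    return sorted(name for name, value in urls.items() if value != COMPOSE_DEFAULTS[name])


# Example with a made-up external RAG host (rag.example.internal is a placeholder).
example_env = {"KIOSK_CORE_RAG_URL": "http://rag.example.internal:8020/api/v1/query"}
print(overridden(example_env))  # ['KIOSK_CORE_RAG_URL']
```

Calling overridden() with no argument inspects the current process environment, so an empty list there is consistent with the advice to leave these values unchanged.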
Session Parameters#
Session parameters (chunk duration, silence threshold, etc.) can also be provided per-request in the POST body for /api/v1/sessions/start and /api/v1/sessions/start-file. Per-request values take precedence over the environment variable defaults.
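The precedence rule can be pictured as a dictionary merge. In the sketch below, the parameter names (chunk_seconds, silence_seconds) and their values are placeholders invented for illustration, not the real field names. Ignoring unknown keys and treating a null value in the body as "not provided" are also assumptions, so check the actual API behaviour before relying on either.

```python
def resolve_session_params(env_defaults: dict, request_body: dict) -> dict:
    """Merge per-request values over the environment-derived defaults."""
    resolved = dict(env_defaults)
    for key, value in request_body.items():
        # Assumption: unknown keys are ignored and None means "use the default".
        if key in env_defaults and value is not None:
            resolved[key] = value
    return resolved


# Placeholder names and values, for illustration only.
env_defaults = {"chunk_seconds": 5.0, "silence_seconds": 2.0}
request_body = {"silence_seconds": 0.5, "unrelated": True}

print(resolve_session_params(env_defaults, request_body))
# {'chunk_seconds': 5.0, 'silence_seconds': 0.5}
```

The request overrides only the parameter it names; everything else keeps the value derived from the environment.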