Configuration#

Load Order#

The service loads configuration in this order:

config.yaml
Environment variables with the TEXT_TO_SPEECH__... prefix

The same config.yaml is used for both Docker and standalone runs. In Docker, config.yaml is bind-mounted into the container, so edits on the host take effect on docker compose restart.

Config File#

config.yaml: single source of truth for both standalone and container runs.

Environment Variables#

TEXT_TO_SPEECH_CONFIG_PATH: alternate base config file (advanced)
TEXT_TO_SPEECH_SERVER_HOST: host used by python main.py
TEXT_TO_SPEECH_SERVER_PORT: port used by python main.py

Targeted config overrides use the TEXT_TO_SPEECH__... prefix.

Example:

TEXT_TO_SPEECH__MODELS__TTS__DEVICE=GPU python main.py

Key Sections#

models.tts: model name, runtime, device, dtype, variant, speaker, English language default, cache settings
audio: output format and sample width
pipeline.persist_outputs: whether synthesized audio and metadata are written to storage

Common Values#

models.tts.runtime: openvino or pytorch
models.tts.device: CPU or GPU depending on model/runtime support
models.tts.dtype: int8, int4, fp16, fp32
models.tts.model_variant: custom_voice or voice_design for Qwen variants
models.tts.default_language: keep this at English; other languages are not currently supported by the service API
audio.output_format: typically wav

Linux iGPU / OpenVINO GPU#

To use the Intel iGPU on Linux:

Install the required Intel/OpenVINO host GPU runtime (e.g. intel-opencl-icd, level-zero) on the host machine.
Set models.tts.device: GPU for OpenVINO TTS.

This GPU path was validated on the Linux host setup. The container path uses an Intel OpenVINO runtime base image plus /dev/dri passthrough, but it still depends on the host having working Intel GPU support.