Configuration#
Load Order#
The service loads configuration in this order:
1. config.yaml
2. Environment variables with the TEXT_TO_SPEECH__... prefix
The same config.yaml is used for both Docker and standalone runs. In Docker, config.yaml is bind-mounted into the container, so edits on the host take effect on docker compose restart.
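As a rough illustration of the bind mount described above, a Compose fragment might look like the following. The service name tts and the container path /app/config.yaml are assumptions made for this sketch; the project's actual compose file may use different names.

```yaml
# Minimal sketch, not the project's actual compose file.
services:
  tts:                                   # hypothetical service name
    volumes:
      # Bind-mount the host config.yaml into the container.
      # The container-side path is an assumption.
      - ./config.yaml:/app/config.yaml
```

With a mount like this, the container reads the host file directly, so an edit on the host only needs docker compose restart to be picked up, not an image rebuild.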
Config File#
config.yaml: single source of truth for both standalone and container runs.
Environment Variables#
- TEXT_TO_SPEECH_CONFIG_PATH: alternate base config file (advanced)
- TEXT_TO_SPEECH_SERVER_HOST: host used by python main.py
- TEXT_TO_SPEECH_SERVER_PORT: port used by python main.py
Targeted config overrides use the TEXT_TO_SPEECH__... prefix.
Example:
TEXT_TO_SPEECH__MODELS__TTS__DEVICE=GPU python main.py
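Reading the example above against the dotted key models.tts.device, the override appears to correspond to the config.yaml fragment below. This assumes that each double underscore after the prefix separates one nesting level and that the upper-case segments map to the lower-case keys; neither rule is stated explicitly here, so treat the mapping as a sketch.

```yaml
# Presumed config.yaml equivalent of
# TEXT_TO_SPEECH__MODELS__TTS__DEVICE=GPU
models:
  tts:
    device: GPU
```

Since environment variables are loaded after config.yaml, the variable would take precedence over a different device value in the file for that run.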
Key Sections#
- models.tts: model name, runtime, device, dtype, variant, speaker, English language default, cache settings
- audio: output format and sample width
- pipeline.persist_outputs: whether synthesized audio and metadata are written to storage
Common Values#
- models.tts.runtime: openvino or pytorch
- models.tts.device: CPU, GPU, or NPU depending on model/runtime support
- models.tts.dtype: int8, int4, fp16, fp32
- models.tts.model_variant: custom_voice or voice_design for Qwen variants
- models.tts.default_language: keep this at English; other languages are not currently supported by the service API
- audio.output_format: typically wav
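Put together, the values above might appear in config.yaml roughly as follows. This sketch assumes the dotted names map to nested YAML mappings and that pipeline.persist_outputs is a boolean. Keys whose exact names are not given here (model name, speaker, cache settings, sample width) are left out rather than guessed, and the particular combination of values is illustrative only.

```yaml
# Illustrative sketch only; check the shipped config.yaml for the full key set.
models:
  tts:
    runtime: openvino          # or pytorch
    device: CPU                # CPU, GPU, or NPU, depending on model/runtime support
    dtype: int8                # int8, int4, fp16, or fp32
    model_variant: custom_voice  # or voice_design (Qwen variants)
    default_language: English  # other languages are not currently supported
audio:
  output_format: wav
pipeline:
  persist_outputs: false       # assumed boolean; true would write audio and metadata to storage
```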
Linux iGPU / OpenVINO GPU#
To use the Intel iGPU on Linux:
1. Install the required Intel/OpenVINO host GPU runtime (e.g. intel-opencl-icd, level-zero) on the host machine.
2. Set models.tts.device: GPU for OpenVINO TTS.
This GPU path was validated on the Linux host setup. The container path
uses an Intel OpenVINO runtime base image plus /dev/dri passthrough, but
it still depends on the host having working Intel GPU support.
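For the container path, the /dev/dri passthrough mentioned above is typically expressed in Compose with a devices entry. The fragment below is a sketch with a hypothetical service name (tts); the project's own compose file may already contain an equivalent entry, possibly with extra settings.

```yaml
# Minimal sketch, not the project's actual compose file.
services:
  tts:                        # hypothetical service name
    devices:
      # Expose the host's DRI device nodes (Intel GPU) to the container.
      - /dev/dri:/dev/dri
```

This only exposes the device nodes. The host still needs the Intel GPU runtime installed and working, as noted above.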