Release Notes: Text To Speech
This page tracks releases of the Text To Speech microservice. The most recent release is listed first; older entries are preserved for history.
v1.0.0
Initial release of the Text To Speech microservice: an OpenAI-API-compatible speech synthesis service with multi-runtime support and selectable models, built for edge deployment on Intel hardware.
New
- OpenAI-compatible speech endpoint (`POST /v1/audio/speech`) returning either raw `audio/wav` or a JSON envelope with metadata and a base64-encoded WAV payload.
- Voice and model metadata endpoint (`GET /v1/audio/voices`) for client discovery of available speakers.
- Multi-runtime TTS backends: `openvino` (Intel-optimized) and `pytorch`.
- Supported models: SpeechT5 (`microsoft/speecht5_tts`) and Qwen3-TTS (`Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice`) with `custom_voice` and `voice_design` variants.
- Configurable device (`CPU`, `GPU`, `NPU`) and precision (`int8`, `int4`, `fp16`, `fp32`) where supported by the runtime/model.
- Optional persistence of synthesized output to `storage/<session_id>/` with `X-Session-ID` returned in the response headers.
- Health endpoint (`GET /health`) for readiness probes.
- Models are warm-loaded once per process and reused across requests to keep per-request synthesis latency low.
- OpenVINO acceleration on Intel CPUs, integrated/discrete GPUs, and NPUs.
- Single `config.yaml` shared by standalone and container runs, with env overrides via `TEXT_TO_SPEECH__...`.
- Docker Compose deployment exposing the API on port `8011`; standalone Python mode binds `127.0.0.1:8011` on the host.
- Container runs as a non-root user (UID 1000).
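
As a rough illustration of the speech endpoint listed above, the following Python sketch builds a request and handles both response forms (raw `audio/wav` or a JSON envelope). It is a minimal sketch under stated assumptions, not the service's documented client: the `input` and `voice` request fields follow the OpenAI speech API, and the envelope key `audio` is hypothetical, since these notes do not name the JSON fields. Check the service's API reference for the real schema.

```python
import base64
import json
import urllib.request

# Standalone default address taken from the release notes.
BASE_URL = "http://127.0.0.1:8011"


def build_speech_request(text, voice=None, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/audio/speech request."""
    # Assumption: field names mirror the OpenAI speech API ("input", "voice").
    payload = {"input": text}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_speech_response(content_type, body, audio_field="audio"):
    """Return WAV bytes from either a raw WAV body or a JSON envelope."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        # Assumption: the envelope stores the base64 WAV under `audio_field`;
        # "audio" is a hypothetical key name, not taken from the notes.
        envelope = json.loads(body)
        return base64.b64decode(envelope[audio_field])
    # Otherwise treat the body as the raw audio/wav payload.
    return body


def synthesize(text, voice=None, base_url=BASE_URL):
    """Send the request; return (wav_bytes, session_id or None)."""
    request = build_speech_request(text, voice, base_url)
    with urllib.request.urlopen(request) as response:
        # X-Session-ID is only expected when output persistence is enabled.
        session_id = response.headers.get("X-Session-ID")
        content_type = response.headers.get("Content-Type", "")
        wav = decode_speech_response(content_type, response.read())
    return wav, session_id
```

With a running service, `wav, session_id = synthesize("Hello from the edge")` would return the audio bytes, which can then be written to a `.wav` file. Building the request and decoding the response are kept separate so each can be checked without a live server.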
Known issues
- English-only synthesis. Requests with any other language are rejected with HTTP `400`.
- The `model` request field is accepted for OpenAI API compatibility but is ignored; the service always uses the model defined in `config.yaml`.
- For SpeechT5, the `voice` and `language` fields are accepted but ignored; the model uses a single fixed speaker embedding.
- Compatibility with the Video Search and Summarization sample application will be added in a subsequent release.
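
Clients may want to flag request fields affected by the known issues above before sending anything. The helper below is a hypothetical client-side sketch, not part of the service. It assumes English is written as `en` or an `en-XX` style tag, and that the caller knows which model `config.yaml` selects; the `configured_model` labels (`"speecht5"`, `"qwen3-tts"`) are illustrative names only. The notes do not say whether the language check or the SpeechT5 "ignored" behavior takes precedence, so the sketch simply reports both.

```python
def review_speech_payload(payload, configured_model="speecht5"):
    """Return notes about request fields affected by the v1.0.0 known issues."""
    notes = []
    language = payload.get("language")
    # Assumption: English is expressed as "en" or an "en-XX" style tag.
    if language is not None and not str(language).lower().startswith("en"):
        notes.append(f"language={language!r}: non-English, expect HTTP 400")
    if "model" in payload:
        notes.append("model: accepted but ignored; config.yaml decides the model")
    # "speecht5" is a hypothetical label for the SpeechT5 configuration.
    if configured_model == "speecht5":
        for field in ("voice", "language"):
            if field in payload:
                notes.append(f"{field}: ignored by SpeechT5 (fixed speaker embedding)")
    return notes
```

For example, `review_speech_payload({"input": "Hallo", "language": "de", "model": "tts-1"}, configured_model="qwen3-tts")` returns two notes: one for the non-English language and one for the ignored `model` field.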