# Release Notes: Text To Speech

This page tracks releases of the Text To Speech microservice. The most
recent release is listed first; older entries are preserved for history.

## v1.0.0

Initial release of the Text To Speech microservice: an
OpenAI-API-compatible speech synthesis service with multi-runtime support
and selectable models, built for edge deployment on Intel hardware.

**New**

- OpenAI-compatible speech endpoint (`POST /v1/audio/speech`) returning
  either raw `audio/wav` or a JSON envelope with metadata and a
  base64-encoded WAV payload.
- Voice and model metadata endpoint (`GET /v1/audio/voices`) for client
  discovery of available speakers.
- Multi-runtime TTS backends: `openvino` (Intel-optimized) and `pytorch`.
- Supported models: SpeechT5 (`microsoft/speecht5_tts`) and Qwen3-TTS
  (`Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice`) with `custom_voice` and
  `voice_design` variants.
- Configurable device (`CPU`, `GPU`, `NPU`) and precision (`int8`, `int4`,
  `fp16`, `fp32`) where supported by the runtime/model.
- Optional persistence of synthesized output to `storage/<session_id>/`
  with `X-Session-ID` returned in the response headers.
- Health endpoint (`GET /health`) for readiness probes.
- Models are warm-loaded once per process and reused across requests to
  keep per-request synthesis latency low.
- OpenVINO acceleration on Intel CPUs, integrated/discrete GPUs, and NPUs.
- Single `config.yaml` shared by standalone and container runs, with env
  overrides via `TEXT_TO_SPEECH__...`.
- Docker Compose deployment exposing the API on port `8011`; standalone
  Python mode binds `127.0.0.1:8011` on the host.
- Container runs as a non-root user (UID 1000).

**Known issues**

- English-only synthesis. Requests with any other language are rejected
  with HTTP `400`.
- The `model` request field is accepted for OpenAI API compatibility but
  is ignored; the service always uses the model defined in `config.yaml`.
- For SpeechT5, the `voice` and `language` fields are accepted but
  ignored; the model uses a single fixed speaker embedding.
- Compatibility with the Video Search and Summarization sample
  application will be added in a subsequent release.