Release Notes: Audio Analyzer
This page tracks releases of the Audio Analyzer microservice. The most recent release is listed first; older entries are preserved for history.
v1.4.0
First release of the Audio Analyzer as a self-contained, OpenAI API-compatible speech-to-text microservice with optional voice sentiment analysis, built for edge deployment on Intel hardware.
New
- OpenAI-compatible transcription API (`POST /v1/audio/transcriptions`) and a streaming NDJSON variant (`/stream`).
- Multi-backend ASR: `openai` (PyTorch Whisper) and `openvino` (Intel-optimized); `whispercpp` planned for a follow-up release.
- Full Whisper model family supported (`tiny` → `large`).
- Optional voice sentiment analysis with session-level aggregation (`openvino` or `pytorch` provider).
- FFmpeg-based preprocessing: chunking, silence detection, optional RNNoise denoising.
- Session continuation via `session_id` (returned in `X-Session-ID`).
- Health (`/health`) and ALSA device listing (`/devices`) endpoints.
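
For readers new to NDJSON, the streaming variant can be consumed one line at a time, since each line is a complete JSON object. The sketch below shows only that parsing step. The `text` field and the sample payload are hypothetical; the actual event schema of the stream is not described in these notes.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty NDJSON line."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line:
            continue  # skip blank lines between events
        yield json.loads(line)


# Hypothetical stream payload (assumption): real events may carry other fields.
sample_stream = [
    b'{"text": "hello"}\n',
    b"\n",
    b'{"text": "world"}\n',
]

events = list(iter_ndjson(sample_stream))
transcript = " ".join(event["text"] for event in events)
print(transcript)
```

Because the helper accepts any iterable of byte lines, it should also work on an HTTP response object that yields lines as they arrive, so segments can be handled before the full transcription finishes.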
Improved
- OpenVINO CPU/GPU acceleration on Intel hardware; models warm-loaded once per process.
- Layered config (`config.yaml`, env overrides via `AUDIO_ANALYZER__...`) and Docker Compose deployment on port `8010`.
- Container now runs as a non-root user (UID 1000).
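
One way to picture the layered configuration is as a merge of prefixed environment variables over values loaded from `config.yaml`. The sketch below is not the service's implementation. It assumes that a double underscore separates nesting levels, and the `asr`, `backend`, and `model` keys are hypothetical names chosen for illustration.

```python
import copy
from typing import Any, Mapping

PREFIX = "AUDIO_ANALYZER__"  # prefix as given in the release notes


def apply_env_overrides(base: Mapping[str, Any], env: Mapping[str, str]) -> dict:
    """Return a copy of `base` with PREFIX-ed variables merged in as nested keys."""
    merged = copy.deepcopy(dict(base))
    for name, value in env.items():
        if not name.startswith(PREFIX):
            continue  # unrelated environment variable
        # Assumption: "__" separates nesting levels and keys are lowercase.
        path = [part.lower() for part in name[len(PREFIX):].split("__") if part]
        if not path:
            continue
        node = merged
        for key in path[:-1]:
            child = node.get(key)
            if not isinstance(child, dict):
                child = {}
                node[key] = child
            node = child
        node[path[-1]] = value  # values stay strings; no type coercion here
    return merged


# Hypothetical file-level config and environment (key names are illustrative).
file_config = {"asr": {"backend": "openai", "model": "tiny"}}
environment = {"AUDIO_ANALYZER__ASR__BACKEND": "openvino", "PATH": "/usr/bin"}

effective = apply_env_overrides(file_config, environment)
print(effective)
```

In this sketch the environment layer wins over the file layer, and the original mapping is left unmodified. In a real deployment the second argument would typically be `os.environ`.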
Known issues
- `whispercpp` backend is wired into configuration but not yet enabled at runtime.
- The `prompt` form field is accepted for API compatibility but currently ignored.
- Compatibility with the Video Search and Summarization sample application will be added in a subsequent release.
v1.3.1
- Released as part of `release-2026.0.0`.
- Supported features were based on the requirements of the Video Search and Summarization sample application. Refer to that sample’s release notes for details on this microservice at that version.