Release Notes: Audio Analyzer#

This page tracks releases of the Audio Analyzer microservice. The most recent release is listed first; older entries are preserved for history.

v1.4.0#

First release of the Audio Analyzer as a self-contained, OpenAI API-compatible speech-to-text microservice with optional voice sentiment analysis, built for edge deployment on Intel hardware.

New

  • OpenAI-compatible transcription API (POST /v1/audio/transcriptions) and a streaming NDJSON variant (/stream).

  • Multi-backend ASR: openai (PyTorch Whisper) and openvino (Intel-optimized); whispercpp planned for a follow-up release.

  • Full Whisper model family supported (tinylarge).

  • Optional voice sentiment analysis with session-level aggregation (openvino or pytorch provider).

  • FFmpeg-based preprocessing: chunking, silence detection, optional RNNoise denoising.

  • Session continuation via session_id (returned in X-Session-ID).

  • Health (/health) and ALSA device listing (/devices) endpoints.

Improved

  • OpenVINO CPU/GPU acceleration on Intel hardware; models warm-loaded once per process.

  • Layered config (config.yaml, env overrides via AUDIO_ANALYZER__...) and Docker Compose deployment on port 8010.

  • Container now runs as a non-root user (UID 1000).

Known issues

  • whispercpp backend is wired into configuration but not yet enabled at runtime.

  • The prompt form field is accepted for API compatibility but currently ignored.

  • Compatibility with the Video Search and Summarization sample application will be added in a subsequent release.

v1.3.1#

  • Released as part of release-2026.0.0.

  • Supported features based on the requirements of the Video Search and Summarization sample application. Refer to that sample’s release notes for details on this microservice at that version.