# Release Notes: Multimodal Embedding Serving This microservice supports features based on the requirements of Video Search and Summarization sample application, which uses this microservice. Refer to Video Search and Summarization [release notes](https://docs.openedgeplatform.intel.com/2026.1/edge-ai-libraries/video-search-and-summarization/release-notes.html) for release details of this microservice. ## Version 2026.1.0-rc1 **14 May 2026** **New** - **Batched sampled-frame video pipeline**: Video embedding requests (URL, base64, local file, RTSP) now process frames through a streaming batched-frame extraction path instead of extracting all frames upfront. Frame selection priority: `frame_indexes` > `extraction_fps` > `num_frames` > `frame_interval`. Set `num_frames: 0` to process all frames in a video. - **PyAV-based video decoder with shared memory transport**: Replaced `decord` with a custom `PyAV`-based decoder (`decoder.py`) featuring a shared memory pool (`SharedMemoryPool`) for zero-copy frame metadata transport between pipeline stages. Supports file, URL, bytes, and RTSP sources with keyframe and uniform sampling strategies. - **RTSP multi-stream ingestion**: Multiple parallel decoder instances for concurrent RTSP and file/bytes streams in a single request. - **Async OpenVINO inference with static shape compilation**: All model handlers now use `AsyncInferQueue`-based batched OV inference. GPU/iGPU models are compiled to a static batch shape at load time for higher hardware utilization; dynamic-size inputs are handled via padding or splitting. - **Parallel image pre-processing**: New `ParallelImagePreprocessor` applies thread-pool-based preprocessing in parallel while preserving batch order, decoupling preprocessing latency from inference. - **Inference metrics reporting**: All model handlers expose an optional `metrics_out=True` mode on `encode_image()` that returns timing and throughput metrics for both OV and native PyTorch execution paths. - **Configurable embedding pipeline via environment variables**: The following variables are now exposed and seeded by `setup.sh`. **Improvements** - **PyTorch fallback for all model handlers**: CLIP, SigLIP, MobileCLIP, BLIP2-Transformers, and CN-CLIP handlers transparently fall back to native PyTorch inference when OpenVINO is not configured, controlled via `EMBEDDING_USE_OV`. - Significant runtime memory reduction (up to 8–10×) and improved end-to-end throughput through the shared memory pipeline and async batched inference. - GPU deployment defaults are now automatically applied when `EMBEDDING_DEVICE=GPU`: `OV_PERFORMANCE_MODE=THROUGHPUT`, `INFER_BATCH_SIZE=64`, `VIDEO_FRAME_BATCH_SIZE=256`. - OpenVINO dependency bumped to `2026.1.0`; Intel GPU driver updated to `26.09.37435`. - Detailed logger format enriched with filename, function name, and line number for easier debugging. - `get-started.md` updated with full environment variable reference, preset configuration examples (GPU, high-throughput, memory-constrained, debug), and a performance tuning guide. **Breaking Changes** - `encode_image()` no longer accepts a pre-processed `torch.Tensor` as input; pass `PIL.Image` or `List[PIL.Image]` instead. - `decord` has been removed as a dependency; replace any direct `decord` usage with `PyAV` (`av` package). **Validated configuration** - Intel® Xeon® 5 + Intel® Arc™ B580 GPU, Intel® Core™ Ultra Processors (Series 2 and 3) - Vanilla Kubernetes Cluster ## Version 1.3.2 **20 March 2026** **New** - Support for Intel® Core™ Ultra Processors (Series 3) - Provided support for data and time based search queries **Validated configuration** - Intel® Xeon® 5 + Intel® Arc™ B580 GPU, Intel® Core™ Ultra Processors (Series 2 and 3) - Vanilla Kubernetes Cluster ## Previous releases - [Release notes 2025](./release-notes/release-notes-2025.md) :::{toctree} :hidden: Release Notes 2025 <./release-notes/release-notes-2025.md> :::