Release Notes: Live Video Captioning RAG
Version 2026.1.0
June 17, 2026
The Live Video Captioning RAG sample application combines caption ingestion, vector search, and LLM-based response generation into a Retrieval-Augmented Generation (RAG) workflow. It consumes the text captions that the Live Video Captioning application generates from RTSP video streams and uses them as context for AI-powered chatbot responses about the video frames. The application provides the following key features:
RAG-based Video Analysis: Generates embeddings from video captions and stores them in a vector database.
OpenVINO LLM Integration: Deploys LLM models efficiently using OpenVINO for response generation.
Interactive Chatbot Interface: A web-based dashboard for querying video content.
Docker Compose Deployment: Simplified deployment with containerized services.
REST API: Endpoints for embedding ingestion (/api/embeddings) and chat queries (/api/chat).
Multi-device Support: CPU and GPU device options for embedding and LLM inference.
Streaming Responses: Real-time chat responses with the retrieved frame references.
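As an illustration of the REST API feature listed above, the following minimal sketch prepares requests for the two endpoints. Only the paths /api/embeddings and /api/chat come from these notes. The base URL, the use of POST with a JSON body, and every payload field name (caption, frame_id, question) are assumptions made for the example and may not match the actual request schema; check the API documentation of your deployment before relying on them.

```python
import json
import urllib.request

# Assumption: hypothetical address; replace with the host and port of your deployment.
BASE_URL = "http://localhost:8000"


def build_json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given API path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # Assumption: both endpoints accept POST with a JSON body.
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Assumption: the field names below are placeholders, not the documented schema.
embedding_request = build_json_post(
    "/api/embeddings",
    {"caption": "A person enters the room.", "frame_id": "frame-0001"},
)
chat_request = build_json_post(
    "/api/chat",
    {"question": "Who entered the room?"},
)
```

Calling send(embedding_request) followed by send(chat_request) would perform the actual calls against a running stack. Building the requests separately from sending them keeps the payloads easy to inspect before any network traffic occurs.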
New
The initial release with core RAG capabilities.
Support for embedding and LLM models.
Streaming response rendering.
Inline frame preview with the caption context.
Deployment of the stack with Docker Compose.
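The retrieval step behind the core RAG capabilities listed above can be sketched conceptually as follows. This is a toy illustration, not the application's implementation: the three-dimensional vectors are hand-written stand-ins for real embeddings, the in-memory list stands in for the vector database, and the names cosine_similarity, top_k, and build_prompt are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, items):
    """Join the retrieved captions into a context block for the LLM."""
    context = "\n".join("- " + item["caption"] for item in items)
    return "Context:\n" + context + "\n\nQuestion: " + question


# Toy data: hand-written vectors, not output from a real embedding model.
store = [
    {"caption": "A person enters the room.", "embedding": [0.9, 0.1, 0.0]},
    {"caption": "A car parks outside.", "embedding": [0.0, 0.2, 0.9]},
    {"caption": "Two people talk near the door.", "embedding": [0.8, 0.3, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded user question

results = top_k(query_vec, store, k=2)
prompt = build_prompt("Who entered the room?", results)
```

In a real deployment, the embedding model produces the vectors and the vector database performs the similarity search. The sketch only shows how retrieved captions become context for the LLM prompt.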
Known Issues
Limited Standalone Functionality: The sample application works with the Live Video Captioning sample application. Running the sample application standalone provides limited context until embeddings are manually added.
Workaround: Use the provided demo script (sample/demo_call_embedding.py) to test the standalone functionality.
Platform Support: The sample application is not validated on either the Standalone or Developer Node versions of Edge Microvisor Toolkit.
For detailed instructions, see Get Started.