Release Notes: Live Video Captioning RAG

Version 1.0.0

April 1, 2026

The Live Video Captioning RAG sample application combines caption ingestion, vector search, and LLM-based response generation into a Retrieval-Augmented Generation workflow. The sample application processes text captions generated from RTSP video streams by the Live Video Captioning application to deliver AI-powered chatbot responses grounded in caption context from video frames.

Key Features

  • RAG-based Video Analysis: Generates embeddings from video captions and stores them in a vector database

  • OpenVINO LLM Integration: Deploys LLM models efficiently using OpenVINO for response generation

  • Interactive Chatbot Interface: Web-based dashboard for querying video content

  • Docker Compose Deployment: Simplified deployment with containerized services

  • REST API: Endpoints for embedding ingestion (/api/embeddings) and chat queries (/api/chat)

  • Multi-device Support: CPU and GPU device options for embedding and LLM inference

  • Streaming Responses: Real-time chat responses with retrieved frame references
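
The REST API above can be exercised with a short client script. The following is a minimal sketch, not the project's official client: the base URL (`http://localhost:8080`) and the `{"query": ...}` request schema are assumptions, so adjust them to match your deployment and the sample's actual API documentation.

```python
"""Hypothetical client sketch for the sample's /api/chat endpoint.

Assumed (not confirmed by these release notes): the service listens on
localhost:8080 and /api/chat accepts a JSON body {"query": "..."} and
streams back plain-text response chunks.
"""
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed default; change for your setup


def build_chat_request(query: str) -> urllib.request.Request:
    # Build a JSON POST request for the assumed /api/chat endpoint.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_chat(query: str) -> None:
    # Send the query and print streamed response chunks as they arrive.
    with urllib.request.urlopen(build_chat_request(query)) as resp:
        for raw_line in resp:
            print(raw_line.decode("utf-8", errors="replace"), end="")


# Example (requires a running service):
# stream_chat("What vehicles appeared in the last minute of video?")
```

Separating request construction (`build_chat_request`) from I/O (`stream_chat`) keeps the payload logic easy to adapt once you confirm the real request schema.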

New

  • Initial release with core RAG capabilities

  • Support for embedding and LLM models

  • Streaming response rendering

  • Inline frame preview with caption context

  • Docker Compose deployment for the full stack

Known Issues

  • Limited Standalone Functionality: The sample application is designed to run alongside the Live Video Captioning sample application. Running it standalone provides limited context until embeddings are added manually. Workaround: Use the provided demo script (sample/demo_call_embedding.py) to test standalone functionality.

  • Platform Support: Intel does not validate the sample application on the EMT-S and EMT-D variants of the Edge Microvisor Toolkit.
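
For the standalone workaround above, the demo script posts embeddings data to `/api/embeddings` by hand. The sketch below illustrates that idea only; the endpoint's real request schema is not stated in these release notes, so the field names (`caption`, `frame_id`, `timestamp`) and port are illustrative assumptions — check `sample/demo_call_embedding.py` for the actual payload.

```python
"""Illustrative standalone ingestion sketch for /api/embeddings.

The payload fields and the localhost:8080 base URL are assumptions;
see sample/demo_call_embedding.py for the real schema.
"""
import json
import urllib.request

EMBEDDINGS_URL = "http://localhost:8080/api/embeddings"  # assumed


def build_embedding_payload(caption: str, frame_id: str, timestamp: float) -> dict:
    # Assemble a caption record for ingestion; field names are illustrative.
    return {"caption": caption, "frame_id": frame_id, "timestamp": timestamp}


def post_embedding(payload: dict) -> int:
    # POST one caption record and return the HTTP status code.
    req = urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example (requires a running service):
# post_embedding(build_embedding_payload("A red truck enters the lot", "frame_0042", 12.5))
```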

For detailed instructions, see Get Started.