# Live Video Captioning RAG

The Live Video Captioning RAG sample application uses the Retrieval-Augmented Generation (RAG) technique to turn live video captions into a knowledge base. The sample application ingests captions from the Live Video Captioning sample application, generates semantic embeddings for them, and uses large language models (LLMs) optimized with the OpenVINO™ toolkit to deliver AI-powered chatbot responses grounded in the video context. The result is a set of searchable caption embeddings that users can query in natural language to interact with the video content.
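The retrieval flow described above can be illustrated with a minimal, self-contained sketch, under several assumptions. `CaptionStore`, `CaptionRecord`, `embed`, `build_prompt`, and the `timestamp_s` field are hypothetical names used only for illustration. Token counts stand in for the configurable embedding model, and a Python list stands in for the vector database. The sketch stops at building a grounded prompt and does not call an OpenVINO™-optimized LLM.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy lexical 'embedding' (token counts); a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CaptionRecord:
    timestamp_s: float  # hypothetical: position of the captioned frame in the stream
    text: str
    vector: Counter = field(init=False)

    def __post_init__(self) -> None:
        # Embed once at ingestion time, as a RAG pipeline does before storage.
        self.vector = embed(self.text)


class CaptionStore:
    """Hypothetical in-memory store standing in for the vector database."""

    def __init__(self) -> None:
        self.records: list[CaptionRecord] = []

    def ingest(self, timestamp_s: float, text: str) -> None:
        self.records.append(CaptionRecord(timestamp_s, text))

    def search(self, query: str, k: int = 2) -> list[CaptionRecord]:
        # Rank every caption by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r.vector), reverse=True)
        return ranked[:k]


def build_prompt(question: str, hits: list[CaptionRecord]) -> str:
    """Ground the LLM by placing only the retrieved captions in its prompt."""
    context = "\n".join(f"[{h.timestamp_s:.1f}s] {h.text}" for h in hits)
    return (
        "Answer using only the video captions below.\n\n"
        f"Captions:\n{context}\n\nQuestion: {question}"
    )


store = CaptionStore()
store.ingest(12.0, "A red truck is parked at the loading gate")
store.ingest(47.5, "A person walks a dog across the parking lot")
store.ingest(90.0, "The loading gate is closed and the area is empty")

question = "When was the red truck at the gate?"
hits = store.search(question, k=2)
print(build_prompt(question, hits))
```

Replacing the lexical `embed` function with a dense embedding model lets the search match paraphrases such as "parked" and "parking", which this stand-in treats as unrelated tokens.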

## Key Features

- **RAG-based Video Context**: Converts caption text from video frames into embeddings and stores them in a vector database for semantic search and retrieval.
- **OpenVINO™ Toolkit LLM Integration**: Deploys large language models efficiently on Intel® hardware for context-aware response generation.
- **Interactive Chat Interface**: Provides a web-based dashboard for querying video content, with streaming responses and an inline preview of retrieved frames and captions.
- **Multi-Model Support**: Supports configurable embedding models and LLMs, with flexible model switching to suit different use cases and performance requirements.
- **Multi-Device Support**: Runs embedding generation and LLM inference on CPU or GPU, optimized for Intel® platforms.
- **REST API Endpoints**: Provides programmatic access to embedding ingestion (`/api/embeddings`) and chat queries (`/api/chat`) for integration with external systems.
- **Streaming Responses**: Streams chat responses in real time, with full caption context and visual frame references that help users understand each answer.
- **Docker Compose Deployment**: Provides a containerized stack for simplified setup and deployment across different environments.

## Use Cases

- **Video Content Search and Discovery**: Build searchable knowledge bases from surveillance, educational, or archival videos to quickly find relevant scenes (or frames) and information using natural language queries.
- **Real-time Video Analytics with Q&A**: Monitor live video feeds, ask questions about the video content, and receive answers grounded in the actual video captions and context.
- **Accessibility and Content Understanding**: Generate and query video captions to make video content more accessible, so users can understand it without watching the full stream.
- **Intelligent Security and Safety**: Deploy RAG-backed chatbots for security monitoring workflows to answer questions about events, activities, and anomalies detected in surveillance video streams.

:::{toctree}
:hidden:

./get-started.md
./how-it-works.md
./api-reference.md
./known-issues.md
Release Notes <./release-notes.md>
:::