RAG Model Download

This guide covers the optional Live Video Captioning RAG setup. These steps are not required for the base Live Video Captioning application.

What RAG needs

RAG uses:

  • the base VLM model for Live Video Captioning in ov_models/,

  • an LLM model cache in llm_models/,

  • embedding service settings configured by scripts/setup_embeddings.sh.
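The three inputs above can be checked before moving on. The following is a minimal sketch, assuming it is run from the live-video-captioning directory; the function name check_rag_prereqs is hypothetical and not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the project): reports which RAG
# inputs are missing under a given root directory.
check_rag_prereqs() {
  local root="${1:-.}" dir missing=0
  # Model directories named in the list above.
  for dir in ov_models llm_models; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing directory: $dir/"
      missing=1
    fi
  done
  # Embedding settings script named in the list above.
  if [ ! -f "$root/scripts/setup_embeddings.sh" ]; then
    echo "missing file: scripts/setup_embeddings.sh"
    missing=1
  fi
  return "$missing"
}

# Usage: check_rag_prereqs . && echo "RAG inputs present"
```

Before the LLM download step, llm_models/ is expected to be reported as missing.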

Download the LLM model

Run the download script from the live-video-captioning directory:

./model_download_scripts/download_models.sh \
  --model Qwen/Qwen2.5-3B-Instruct \
  --type llm \
  --device CPU \
  --weight-format int8

The model is prepared under llm_models/.
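To confirm the download produced output, a quick check such as the following can help. This is only a sketch: the layout inside llm_models/ is not described here, so the check assumes nothing beyond the directory containing at least one file, and the function name model_cache_ready is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds only if the directory exists and holds
# at least one regular file somewhere beneath it.
model_cache_ready() {
  local dir="${1:-llm_models}"
  [ -d "$dir" ] && [ -n "$(find "$dir" -type f | head -n 1)" ]
}

# Usage: model_cache_ready llm_models || echo "llm_models/ is empty or missing"
```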

For gated Hugging Face models, export an access token before running the download script:

export HUGGINGFACEHUB_API_TOKEN=<your-huggingface-token>
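When downloading a gated model, it can be useful to stop early if the token is missing. The guard below is a minimal sketch; require_hf_token is a hypothetical name, and the download script may perform its own checks.

```shell
#!/usr/bin/env bash
# Hypothetical guard: returns non-zero when the token variable is unset
# or empty, so a gated download is not attempted without it.
require_hf_token() {
  if [ -z "${HUGGINGFACEHUB_API_TOKEN:-}" ]; then
    echo "HUGGINGFACEHUB_API_TOKEN is not set" >&2
    return 1
  fi
}

# Usage: require_hf_token && ./model_download_scripts/download_models.sh ...
```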

Review embedding defaults

The default embedding and LLM settings are defined in:

scripts/setup_embeddings.sh

Update these values only if you want different models or devices:

EMBEDDING_MODEL_NAME=QwenText/qwen3-embedding-0.6b
EMBEDDING_DEVICE=CPU
LLM_DEVICE=CPU
LLM_MODEL_ID=Qwen/Qwen2.5-3B-Instruct
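To review the current values without opening the file, the four settings can be listed with grep. This sketch assumes they appear in the script as plain NAME=value assignments; show_rag_settings is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the lines that assign the four settings
# listed above. The default path is the script named in this guide.
show_rag_settings() {
  local file="${1:-scripts/setup_embeddings.sh}"
  grep -E '(EMBEDDING_MODEL_NAME|EMBEDDING_DEVICE|LLM_DEVICE|LLM_MODEL_ID)=' "$file"
}

# Usage: show_rag_settings
```

If LLM_MODEL_ID is changed, the model passed to the download script with --model presumably needs to match it.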

Enable RAG services

After downloading the LLM model, follow "Configure Embedding Creation with RAG" to enable the compose profile and start the RAG services.