RAG Model Download#
This guide covers the optional Live Video Captioning RAG setup. These steps are not required for the base Live Video Captioning application.
What RAG needs#
RAG uses:
the base VLM model for Live Video Captioning in
ov_models/,an LLM model cache in
llm_models/,embedding service settings configured by
scripts/setup_embeddings.sh.
Download the LLM model#
From the live-video-captioning directory:
./model_download_scripts/download_models.sh \
--model Qwen/Qwen2.5-3B-Instruct \
--type llm \
--device CPU \
--weight-format int8
The model is prepared under llm_models/.
For gated Hugging Face models, set a token first:
export HUGGINGFACEHUB_API_TOKEN=<your-huggingface-token>
Review Embedding Defaults#
The default embeddings and LLM settings are in:
scripts/setup_embeddings.sh
Update these values only if you want different models or devices:
EMBEDDING_MODEL_NAME=QwenText/qwen3-embedding-0.6b
EMBEDDING_DEVICE=CPU
LLM_DEVICE=CPU
LLM_MODEL_ID=Qwen/Qwen2.5-3B-Instruct
Enable RAG services#
After downloading the LLM model, follow Configure Embedding Creation with RAG to enable the compose profile and start the RAG services.