Model Preparation
To run this sample application, a Vision-Language Model (VLM) is required. If you wish to enable the detection pipeline, you will also need a YOLO vision model. Model preparation is handled by the model-download microservice from the open-edge-platform/edge-ai-libraries repository. Follow the steps below to download and convert the required models:
Clone the repository:
Open a new terminal and clone the edge-ai-libraries repository.
# Clone the latest on the mainline
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries
# Alternatively, clone a specific release branch
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries -b <release-tag>
Navigate to the directory:
cd edge-ai-libraries/microservices/model-download
Configure the environment variables:
export REGISTRY="intel/"
export TAG=latest
export HUGGINGFACEHUB_API_TOKEN=<your-huggingface-token>
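Before launching the service, it can help to confirm that the three variables above are actually set in the current shell. The following is a minimal sketch; `check_required_vars` is a hypothetical helper written for illustration and is not part of the model-download microservice.

```shell
# Hypothetical helper (not shipped with the microservice): report any
# variable named in the arguments that is unset or empty.
check_required_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_required_vars REGISTRY TAG HUGGINGFACEHUB_API_TOKEN; then
    echo "environment looks complete"
fi
```

The function returns a non-zero status when any variable is missing, so it can also guard the launch command in a script.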
Launch the service with required plugins:
export MODEL_PATH=<path-to-directory-for-models-to-be-stored>
# Example paths:
# - ~/edge-ai-suites/metro-ai-suite/live-video-analysis/live-video-captioning (for live-video-captioning and with rag)
# - ~/edge-ai-suites/metro-ai-suite/live-video-analysis/live-video-captioning-rag (for live-video-captioning only deployment)
# Run the script to launch the service
source scripts/run_service.sh --plugins openvino,ultralytics --model-path $MODEL_PATH
Download and convert the models:
Navigate to live-video-analysis/live-video-captioning and use the provided script to download and convert the required models:
cd edge-ai-suites/metro-ai-suite/live-video-analysis/live-video-captioning
# Export MODEL_PATH with the same directory that was exported in the previous step.
export MODEL_PATH=<path-to-directory-for-models-to-be-stored>
# Parameters:
# model_name: specify the model identifier from Hugging Face
# model_type: choose from vlm, vision, or llm
# model_quantization: select int4, int8, or fp16
./model_download_scripts/download_models.sh --model <model_name> --type <model_type> --weight-format <model_quantization>
Examples:
For a VLM model (required for live-video-captioning):
./model_download_scripts/download_models.sh --model OpenGVLab/InternVL2-1B --type vlm --weight-format int8
For a YOLO vision model (for live-video-captioning with object-detection pipeline):
./model_download_scripts/download_models.sh --model yolov8s --type vision
For an LLM model (for live-video-captioning with RAG):
./model_download_scripts/download_models.sh --model microsoft/Phi-3.5-mini-instruct --type llm --device <CPU/GPU> --weight-format int8
For more detailed information about the scripts:
./model_download_scripts/download_models.sh -h
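The examples above differ mainly in whether `--weight-format` is passed (the YOLO example omits it). As a rough sketch of that pattern, the hypothetical helper below prints the command line for a given model without running it, which can be handy for reviewing a batch of downloads first. `print_download_cmd` is an illustrative name, and only the `--model`, `--type`, and `--weight-format` flags shown in the examples are used.

```shell
# Hypothetical dry-run helper: print (do not execute) the download command.
# Arguments: <model_name> <model_type> [model_quantization]
print_download_cmd() {
    model_name="$1"
    model_type="$2"
    weight_format="${3:-}"
    cmd="./model_download_scripts/download_models.sh --model $model_name --type $model_type"
    # Append --weight-format only when a quantization was given.
    if [ -n "$weight_format" ]; then
        cmd="$cmd --weight-format $weight_format"
    fi
    echo "$cmd"
}

print_download_cmd OpenGVLab/InternVL2-1B vlm int8
print_download_cmd yolov8s vision
```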
The script will download and convert the models to OpenVINO IR format and store them in the respective directories:
VLM models → ov_models/
Vision detection models → ov_detection_models/
LLM models → llm_models/
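To check which of these directories exist after a download, a small sketch such as the one below may be useful. It assumes the directories are created directly under `$MODEL_PATH`, which the text above does not state explicitly; adjust the base path if your layout differs. `report_model_dirs` is a hypothetical helper, not part of the provided scripts.

```shell
# Hypothetical helper: report which output directories exist under a base path.
# Assumption: the directories live directly under the base path (e.g. $MODEL_PATH).
report_model_dirs() {
    base="$1"
    for dir in ov_models ov_detection_models llm_models; do
        if [ -d "$base/$dir" ]; then
            echo "$dir: present"
        else
            echo "$dir: absent"
        fi
    done
}

# Falls back to the current directory when MODEL_PATH is not set.
report_model_dirs "${MODEL_PATH:-.}"
```

Only the VLM directory is needed for a basic deployment; the detection and LLM directories appear only if those optional models were downloaded.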
Stop the Model Download service:
The Model Download service handles the downloading and conversion of models needed for the Live Video Captioning and Live Video Captioning RAG sample applications. It runs independently of those applications, so you can stop or terminate the service once the required models have been prepared.