Get Started#
The Model Download is a microservice that downloads models from multiple hubs: Hugging Face, Ollama, Geti™ software, and Ultralytics. It supports converting Hugging Face models to OpenVINO™ model server format, supports uploading custom model ZIP artifacts, and exposes a RESTful API for managing model downloads, uploads, and conversions.
Note: Model Download replaces Model Registry, which will be deprecated soon. See Migrate from Model Registry to Model Download for the migration guidelines.
Features#
- Downloads models from Hugging Face, Ollama, Geti software, and Ultralytics model hubs
- Converts Hugging Face models to OpenVINO model server format
- Supports multiple model precisions (INT4, INT8, FP16, and FP32)
- Supports various device targets (CPU, GPU, and NPU). The OpenVINO plugin supports NPU model conversion in INT4 precision only.
- Supports health AI suite models (AI-ECG, rPPG, and 3D pose) through the HLS plugin
- Supports parallel downloads
- Supports configurable model caching
- Supports custom model upload through `POST /models/upload`
- Exposes a REST API with OpenAPI documentation
Prerequisites#
- (Optional) Hugging Face API token, required for gated Hugging Face models or conversion
- Sufficient disk space for model storage
Quick Start with Setup Script#
1. Clone the repository#
```bash
# Clone the latest on the mainline
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries

# Alternatively, clone a specific release branch
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries -b <release-tag>
```
3. Configure the environment variables#
```bash
export REGISTRY="intel/"
export TAG=latest
export HUGGINGFACEHUB_API_TOKEN=<your-huggingface-token>
```
To use the Geti™ plugin, set these variables:
```bash
export GETI_WORKSPACE_ID=<YOUR_GETI_WORKSPACE_ID>
export GETI_HOST=<GETI_HOST_ADDRESS>
export GETI_TOKEN=<GETI_ACCESS_TOKEN>
export GETI_SERVER_API_VERSION=v1
export GETI_SERVER_SSL_VERIFY=False  # Default is False
```
Note: For Geti™ software setup instructions, see the documentation here.
4. Launch the service and enable the plugins#
```bash
source scripts/run_service.sh up --plugins all --model-path <host path>
```
Note: For public models, no token is needed. Set the Hugging Face token via the `HUGGINGFACEHUB_API_TOKEN` environment variable to download gated models and for conversion to OpenVINO IR format.
Note: Ensure the host path does not require privileged access for directory creation. Intel recommends using `$PWD/host_path` or a similar location within your work directory.
The run_service.sh script is a Docker Compose wrapper that builds and manages the model download service container with configurable plugins, model paths, and deployment options.
Options available with the script:
__Actions__:
```text
up Start the services (default)
down Stop the services
```
__Options__:
| Option | Description |
|--------------------------|--------------------------------------------------------------------------------------------------|
| `--build` | Builds the Docker image before running |
| `--rebuild`              | Ignores any cached images and rebuilds them from scratch using the Dockerfile definitions        |
| `--model-path <path>`    | Sets the custom model path (default: `$HOME/models/`)                                            |
| `--plugins <list>`       | Comma-separated list of plugins to enable (e.g., `huggingface,ollama,openvino,ultralytics,hls,geti`) or `all` to enable all available plugins |
| `--help` | Shows this help message |
Usage:

```text
source scripts/run_service.sh [options] [action]
```
Examples:

Start the service with default settings:

```bash
source scripts/run_service.sh up
```

Stop the service:

```bash
source scripts/run_service.sh down
```

Enable specific plugins:

```bash
source scripts/run_service.sh up --plugins huggingface
```

Enable multiple plugins:

```bash
source scripts/run_service.sh up --plugins huggingface,ollama,ultralytics,geti
```

Use a custom model storage path:

```bash
source scripts/run_service.sh up --model-path /data/my-models
```

Production deployment with all plugins:

```bash
source scripts/run_service.sh up --plugins all --model-path tmp/models
```

Display usage information:

```bash
source scripts/run_service.sh --help
```
5. Access the service#
The service will be available at `http://<host-ip>:8200/api/v1/docs`, where you can view the Swagger documentation for the available APIs.
Verification#
Ensure that the application is running by checking the Docker container status:

```bash
docker ps
```

Access the application dashboard and verify that it is functioning as expected.
Sample usage with cURL commands#
Download a Hugging Face model:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=hf_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "microsoft/Phi-3.5-mini-instruct",
        "hub": "huggingface",
        "type": "llm"
      }
    ],
    "parallel_downloads": false
  }'
```
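The same download request can be issued from Python using only the standard library. This is a minimal sketch; the `build_download_request` and `post_download` helpers are illustrative, not part of the service API, and the host/port are assumed from the example above:

```python
import json
import urllib.request

def build_download_request(name: str, hub: str, model_type: str,
                           parallel: bool = False) -> dict:
    """Assemble the JSON body expected by POST /models/download."""
    return {
        "models": [{"name": name, "hub": hub, "type": model_type}],
        "parallel_downloads": parallel,
    }

def post_download(base_url: str, download_path: str, body: dict) -> dict:
    """Send the download request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/models/download?download_path={download_path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running service):
# body = build_download_request("microsoft/Phi-3.5-mini-instruct",
#                               "huggingface", "llm")
# post_download("http://localhost:8200/api/v1", "hf_model", body)
```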
Download an Ollama model:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=ollama_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "tinyllama",
        "hub": "ollama",
        "type": "llm"
      }
    ],
    "parallel_downloads": false
  }'
```
Download a YOLO vision model from Ultralytics:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=yolo_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "yolov8s",
        "hub": "ultralytics",
        "type": "vision"
      }
    ],
    "parallel_downloads": true
  }'
```
Note: YOLO vision models from the Ultralytics model hub are downloaded and converted to the OpenVINO IR format with FP32 and FP16 precision by default.
Download an Ultralytics model with INT8 quantization:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=yolo_int8" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "yolov8n",
        "hub": "ultralytics",
        "type": "vision",
        "config": {
          "quantize": "coco128"
        }
      }
    ],
    "parallel_downloads": false
  }'
```
Note: INT8 behavior for Ultralytics requests:

- Set `config.quantize` to request INT8 export.
- INT8 requests support only a single model name per request. Requests that combine `quantize` with comma-separated model names, `all`, or `yolo_all` are rejected.
- If INT8 is requested but no INT8 artifact is produced, the request fails and partial artifacts are cleaned up.
- Due to a limitation in the DL Streamer public model download script, requesting INT8 also downloads any other supported precision artifacts for the model (for example, FP32 and FP16), if present.
- Currently available quantization datasets are `coco`, `coco8`, and `coco128`.
- NOTE: `coco` is a very large dataset (over 20 GB, containing more than 100,000 images), and quantization on it can take a very long time. For development purposes, use the much lighter `coco128` or `coco8` instead.
Download a Hugging Face model and convert it to OpenVINO IR format:
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=ovms_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "BAAI/bge-reranker-base",
        "hub": "openvino",
        "type": "rerank",
        "is_ovms": true,
        "config": {
          "precision": "fp32",
          "device": "CPU",
          "cache_size": 10
        }
      }
    ],
    "parallel_downloads": false
  }'
```
Example: Optimum CLI-aligned nested config
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=ovms_model" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "Alibaba-NLP/gte-large-en-v1.5",
        "hub": "openvino",
        "type": "embeddings",
        "is_ovms": true,
        "config": {
          "precision": "int8",
          "device": "CPU",
          "cache_size": 2,
          "extra_quantization_params": "--library sentence_transformers"
        }
      }
    ],
    "parallel_downloads": false
  }'
```
NOTES:

- Need additional OpenVINO export knobs? Review the parameter matrix in the OpenVINO Model Server export guide and pass the corresponding fields through `config`.
- Visual-language models automatically set `pipeline_type` to `VLM` for type `VLM`.
- Unknown parameters keep their original spelling (underscores included) and are forwarded as `--<param_name>`, so options such as `reasoning_parser`, `tool_parser`, and others are supported.
- Boolean flags are emitted only when they evaluate to true; leave them unset or false to skip the corresponding CLI switch.
- Hugging Face authentication is still required for OVMS exports; provide `HUGGINGFACEHUB_API_TOKEN` (or pass the token via the API) before using these parameters.
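The forwarding rules for unknown parameters and boolean flags can be illustrated with a small sketch. `config_to_cli_args` is a hypothetical helper that demonstrates the mapping described above, not the service's actual implementation:

```python
def config_to_cli_args(config: dict) -> list:
    """Illustrate how extra config fields map to CLI-style flags.

    Unknown keys are forwarded verbatim as --<param_name> (underscores
    kept); boolean flags become bare switches emitted only when true.
    """
    args = []
    for key, value in config.items():
        if isinstance(value, bool):
            if value:  # booleans are emitted only when they evaluate to true
                args.append(f"--{key}")
        else:
            args.extend([f"--{key}", str(value)])
    return args
```

Under these rules, a config such as `{"reasoning_parser": "hermes", "trust_remote_code": True, "use_cache": False}` would yield `--reasoning_parser hermes --trust_remote_code`, with the false flag skipped entirely.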
Download models from Geti software, which are optimized with the OpenVINO toolkit's optimization tool:
```bash
curl -X POST 'http://<host-ip>:8200/api/v1/models/download?download_path=geti_folder' \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "yolox-tiny",
        "hub": "geti",
        "revision": "1",
        "config": {
          "precision": "fp32"
        }
      }
    ],
    "parallel_downloads": true
  }'
```
Note: The default precision is FP16.
Download fixed HLS models (3D pose, rPPG, AI-ECG):
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/download?download_path=hls_assets" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "name": "human-pose-estimation-3d-0001",
        "hub": "hls",
        "type": "3d-pose"
      }
    ],
    "parallel_downloads": false
  }'
```
Note: Valid HLS types are `3d-pose`, `rppg`, and `ai-ecg`. The service downloads model artifacts only; demo videos must be fetched separately if needed.
Query parameter:

- `download_path` (string): Specifies a local filesystem path for saving the downloaded model. If not provided, the model is saved to the default location.
Response: Sample Response (when a download request is started):
```json
{
  "message": "Started processing 1 model(s)",
  "job_ids": ["5f0d4eba-c79c-4d02-97a6-43c3d0168ca0"],
  "status": "processing"
}
```
Each model-download request returns a `job_id`. To check the status of a download:

```bash
curl -X GET "http://<host-ip>:8200/api/v1/jobs/<job_id>"
```
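Because downloads run asynchronously, a client typically polls the jobs endpoint until the job finishes. The sketch below uses only the standard library; the helper names, the default host, and the `"completed"`/`"failed"` terminal states are assumptions based on the sample responses shown here:

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8200/api/v1"  # replace with your <host-ip>

def is_terminal(status: str) -> bool:
    """True once a job can no longer make progress."""
    return status in ("completed", "failed")

def job_status(job_id: str) -> dict:
    """Fetch the current status document for a job."""
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}") as resp:
        return json.load(resp)

def wait_for_job(job_id: str, poll_seconds: float = 5.0,
                 timeout_seconds: float = 1800.0) -> dict:
    """Poll the jobs endpoint until a terminal state or the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        doc = job_status(job_id)
        if is_terminal(doc.get("status", "")):
            return doc
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish in {timeout_seconds}s")
```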
Sample Response (when the job is completed):
```json
{
  "id": "5f0d4eba-c79c-4d02-97a6-43c3d0168ca0",
  "operation_type": "download",
  "model_name": "yolov8s",
  "hub": "ultralytics",
  "output_dir": "/opt/models/ultra_folder",
  "status": "completed",
  "start_time": "2025-10-27T08:24:23.510870",
  "plugin_name": "ultralytics",
  "model_type": "vision",
  "plugin": "ultralytics",
  "completion_time": "2025-10-27T08:30:14.443898",
  "result": {
    "model_name": "yolov8s",
    "source": "ultralytics",
    "download_path": "model/download/path",
    "return_code": 0
  }
}
```
Upload a custom model ZIP:
Use this endpoint when a user (or another client app) needs to upload a local model directly to Model Download.
The ZIP must contain at least one `.xml` and one `.bin` file.
```bash
curl -X POST "http://<host-ip>:8200/api/v1/models/upload" \
  -F "file=@/path/to/my_model.zip" \
  -F "model_name=my_custom_model" \
  -F "provider=geti" \
  -F "framework=openvino" \
  -F "precision=FP16"
```
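An upload-ready archive can be assembled with a short script. `make_model_zip` is an illustrative helper (not part of the service) that only enforces the `.xml`/`.bin` rule stated above:

```python
import zipfile
from pathlib import Path

def make_model_zip(xml_path: str, bin_path: str, out_zip: str) -> str:
    """Bundle an OpenVINO IR pair (.xml + .bin) into an upload-ready ZIP."""
    xml, weights = Path(xml_path), Path(bin_path)
    if xml.suffix != ".xml" or weights.suffix != ".bin":
        raise ValueError("archive must contain at least one .xml and one .bin file")
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store both files at the archive root so the pair stays together.
        zf.write(xml, arcname=xml.name)
        zf.write(weights, arcname=weights.name)
    return out_zip

# Example: make_model_zip("my_model.xml", "my_model.bin", "my_model.zip")
```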
Upload storage path format:

```text
/opt/models/custom_uploaded_models/{provider}/{framework}/{model_name}/[{precision}/]
```
On successful upload, the model is registered as a completed operation and is visible in:
```bash
curl -X GET "http://<host-ip>:8200/api/v1/models/results"
```
Sample Response (when the upload is completed):
```json
{
  "status": "success",
  "message": "Model 'my_custom_model' uploaded successfully.",
  "job_id": "a1b2c3d4-1234-5678-9abc-def012345678",
  "model_name": "my_custom_model",
  "model_path": "/opt/models/custom_uploaded_models/geti/openvino/my_custom_model/FP16"
}
```
For details, see the API reference.
Configuration#
You can configure the service through environment variables and Docker volumes:
Environment Variables:
- `HF_HUB_ENABLE_HF_TRANSFER`: Enables Hugging Face transfer (default: `1`)
- `HUGGINGFACEHUB_API_TOKEN`: Hugging Face token (required only for gated models or conversion)
- `MAX_UPLOAD_SIZE_MB`: Maximum allowed upload ZIP size in MB (default: `500`)
- `UPLOAD_CHUNK_SIZE_KB`: Chunk size for streaming file uploads in KB (default: `8`). Larger values improve throughput; smaller values reduce memory usage for concurrent uploads.
Volumes:
- `~/models:/app/models`: Persists downloaded models
Troubleshooting#
If you encounter any issues during the build or run process, check the Docker logs for errors:
```bash
docker logs <container-id>
```
Run Unit Tests#
To validate changes locally before deploying:
Set up a virtual environment:

```bash
pip install uv
uv venv
source .venv/bin/activate
```

Install all optional dependencies:

```bash
uv sync --all-extras
```

Execute the unit tests:

```bash
uv run pytest tests/unit -v
```
Use `pytest tests/ --cov=src --cov-report=term` if you also need coverage metrics. See `docs/user-guide/running-tests.md` for advanced filtering options and troubleshooting tips.
Best Practices#
- Use parallel downloads with caution because they can consume significant resources.
- Configure cache sizes based on available memory.
- Select model precision according to your performance requirements.
- Use appropriate model types and configurations for OpenVINO model server conversion.
- For Ultralytics INT8 exports, submit one model per request and provide `config.quantize` only when INT8 is intended.
Run in Kubernetes Cluster#
See Deploy with Helm Chart for details. Address the prerequisites listed on this page before deploying with the Helm chart.
Learn More#
For alternative ways to set up the sample application, see: