Model Download#

The Model Download microservice is a centralized model management system. It downloads AI and machine learning models from various model hubs, stores them, and optionally converts them to other formats, giving applications a consistent and simple way to obtain models.

Architecture#

The following figure shows the high-level architecture of Model Download, which includes its core components and their interactions with external systems:

Figure: High-level architecture of Model Download

Components#

The following are the core components of the plugin-based microservice architecture:

Core Components#

  1. FastAPI Service Layer

    • Description: The FastAPI Service Layer is the primary entry point for client interactions. It exposes a RESTful API for downloading, converting, and managing models.

    • Functions:

      • Provides RESTful API endpoints for service operations.

      • Validates incoming requests, handles serialization, and routes requests to the appropriate components.

      • Generates and serves OpenAPI documentation (Swagger UI) for clear, interactive API specifications.

  2. Model Manager

    • Description: The Model Manager is the central orchestration component that directs model download and conversion processes. It coordinates actions between the API layer and the plugin system.

    • Functions:

      • Orchestrates end-to-end model download and conversion workflows.

      • Manages model storage, which includes organizing file paths and handling caching.

      • Interfaces with the Plugin Registry to delegate tasks to the appropriate plugins.

  3. Plugin Registry

    • Description: The Plugin Registry discovers, registers, and manages available plugins. It allows the service’s capabilities to be extended without modifying the core application logic.

    • Functions:

      • Dynamically discovers and registers plugins at startup.

      • Manages the lifecycle of each plugin.

      • Provides a consistent abstraction layer that decouples the Model Manager from concrete plugin implementations.
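
The relationship between the Model Manager and the Plugin Registry can be illustrated with a minimal sketch. All names here (`ModelPlugin`, `PluginRegistry`, `ModelManager`, `FakeHubPlugin`, and their methods) are hypothetical and chosen for illustration; the service's actual classes and signatures may differ.

```python
from abc import ABC, abstractmethod


class ModelPlugin(ABC):
    """Hypothetical plugin interface; the real service's interface may differ."""

    name: str  # registry key, for example "huggingface"

    @abstractmethod
    def download(self, model_id: str, target_dir: str) -> str:
        """Fetch a model and return the path where it was stored."""


class PluginRegistry:
    """Maps plugin names to instances so callers never touch concrete classes."""

    def __init__(self) -> None:
        self._plugins: dict[str, ModelPlugin] = {}

    def register(self, plugin: ModelPlugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"plugin already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def get(self, name: str) -> ModelPlugin:
        try:
            return self._plugins[name]
        except KeyError:
            raise LookupError(f"no plugin registered for hub: {name}") from None

    def names(self) -> list[str]:
        return sorted(self._plugins)


class ModelManager:
    """Orchestrates a download by delegating to the plugin that matches the hub."""

    def __init__(self, registry: PluginRegistry, models_dir: str) -> None:
        self._registry = registry
        self._models_dir = models_dir

    def download(self, hub: str, model_id: str) -> str:
        plugin = self._registry.get(hub)
        return plugin.download(model_id, f"{self._models_dir}/{hub}")


class FakeHubPlugin(ModelPlugin):
    """Stand-in plugin that performs no network access."""

    name = "fake"

    def download(self, model_id: str, target_dir: str) -> str:
        return f"{target_dir}/{model_id}"


registry = PluginRegistry()
registry.register(FakeHubPlugin())
manager = ModelManager(registry, "/tmp/models")
print(manager.download("fake", "demo-model"))  # /tmp/models/fake/demo-model
```

Because the manager only asks the registry for a plugin by name, supporting a new hub in this sketch means registering one more `ModelPlugin` subclass; the manager itself stays unchanged.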

Plugin System#

The Plugin System extends the service’s functionality by handling interactions with different model sources and conversion tasks.

Model Hub Plugins:

  • HuggingFace Hub Plugin: Downloads models from the Hugging Face hub, including handling authentication for private or gated models.

  • Ollama Hub Plugin: Interfaces with the Ollama tool to pull and manage models from the Ollama model library.

  • Ultralytics Hub Plugin: Downloads computer vision models, such as YOLO, from the Ultralytics framework.

  • Geti™ Plugin: Downloads models optimized through the Geti™ platform.

Conversion Plugins:

  • OpenVINO™ Model Conversion Plugin: Converts downloaded models (for example, models from the Hugging Face model hub) into the OpenVINO™ Intermediate Representation (IR) format for optimized inference on Intel® hardware.

Storage#

  • Downloaded Models Storage: This component represents the physical storage location for downloaded and converted models. It is a configurable filesystem path that acts as a centralized repository and cache.

    • Functions:

      • Provides a persistent location for storing model files.

      • Enables caching to avoid redundant downloads of the same model.

      • Organizes models in a structured directory format for easy access.
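
As a rough illustration of how a structured directory layout can double as a cache, the sketch below builds a per-model path and checks whether it already holds files. The layout (`<root>/<hub>/<model>/<precision>`), the `--` separator, and the helper names `model_dir` and `is_cached` are assumptions made for this example, not the service's documented behavior.

```python
import tempfile
from pathlib import Path


def model_dir(root: Path, hub: str, model_id: str, precision: str = "") -> Path:
    """Build a per-model directory (assumed layout, not the service's actual one)."""
    # Replace "/" so IDs such as "org/model" map to a single directory name.
    safe_id = model_id.replace("/", "--")
    path = root / hub / safe_id
    return path / precision if precision else path


def is_cached(path: Path) -> bool:
    """Treat a model as cached when its directory exists and is not empty."""
    return path.is_dir() and any(path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = model_dir(root, "huggingface", "org/model", "FP16")
    first_check = is_cached(target)  # nothing stored yet
    target.mkdir(parents=True)
    (target / "model.bin").write_bytes(b"\x00")  # placeholder, not a real model
    second_check = is_cached(target)
    relative = target.relative_to(root).as_posix()

print(first_check, second_check, relative)
# False True huggingface/org--model/FP16
```

A download routine could call `is_cached` before contacting a hub and skip the transfer when it returns `True`.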

Key Features#

  • Multi-Hub Support: Download models from multiple sources (Hugging Face model hub, Ollama model library, Ultralytics library, OpenVINO Model Hub, and Geti platform)

  • Format Conversion: Convert models to the OpenVINO IR format for optimized inference

  • Parallel Downloads: Optional concurrent model downloads

  • Precision Control: Support for various model precisions (INT8, FP16, and FP32)

  • Device Targeting: Optimization for different compute devices (CPU, GPU, and NPU)

  • Caching: Configurable model caching for improved performance
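
The optional parallel download feature can be pictured with a small thread-pool sketch. The function names, the `parallel` flag, and the `max_workers` default are hypothetical, and `fake_download` stands in for a real hub call.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(model_id: str) -> str:
    """Placeholder for a real download; returns a made-up storage path."""
    return f"/models/{model_id}"


def download_all(
    model_ids: list[str], parallel: bool = False, max_workers: int = 4
) -> dict[str, str]:
    """Download models one by one, or concurrently when parallel=True."""
    if not parallel:
        return {m: fake_download(m) for m in model_ids}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return dict(zip(model_ids, pool.map(fake_download, model_ids)))


results = download_all(["model-a", "model-b"], parallel=True)
print(results)  # {'model-a': '/models/model-a', 'model-b': '/models/model-b'}
```

Threads suit this kind of work because downloads are I/O-bound; both code paths return the same mapping, so callers do not need to care which mode was used.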

Integration#

The service can be integrated into applications through:

  • REST API calls

  • Docker container deployment

  • Docker Compose orchestration
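
For the REST API route, a client call might look like the sketch below, which only builds the request and does not send it. The base URL, the `/models/download` path, and the payload field names are placeholders invented for this example; the real endpoints and schema should be taken from the service's OpenAPI documentation.

```python
import json
from urllib.request import Request

# Hypothetical values: the base URL, path, and field names are illustrative
# only and are not taken from the service's actual API.
BASE_URL = "http://localhost:8000"


def build_download_request(
    hub: str, model_id: str, convert_to_openvino: bool = False
) -> Request:
    """Build (but do not send) a JSON POST request for a model download."""
    payload = {
        "hub": hub,
        "model_id": model_id,
        "convert_to_openvino": convert_to_openvino,
    }
    return Request(
        url=f"{BASE_URL}/models/download",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_download_request("huggingface", "org/model")
print(req.get_method(), req.full_url)
# POST http://localhost:8000/models/download
```

Once the service is running, passing the request to `urllib.request.urlopen` would send it.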

Use Cases#

This microservice is ideal for:

  • Edge AI applications requiring model downloads

  • Development and testing environments

  • Sample applications demonstrating AI capabilities

  • Automated model deployment pipelines

Limitations#

This service does not replace full model registry solutions and has the following limitations:

  • Only basic model versioning

  • Limited model metadata management

  • No built-in model serving capabilities

Learn More#