# Get Started

This guide provides step-by-step instructions to quickly deploy and test the **Multimodal Embedding Serving microservice**.

## Prerequisites

Before you begin, confirm the following:

- **System Requirements**: Your system meets the [minimum requirements](./get-started/system-requirements.md).
- **Docker Installed**: Install Docker if needed. See [Get Docker](https://docs.docker.com/get-docker/).

This guide assumes basic familiarity with Docker commands and terminal usage.

## Environment Variables Reference

### Model Configuration

- **EMBEDDING_MODEL_NAME** - The model to use (e.g., "CLIP/clip-vit-b-16"). Refer to the [Supported Models](./supported-models.md) list for additional choices.
- **EMBEDDING_DEVICE** - Device for inference (CPU/GPU, default: CPU)
- **EMBEDDING_USE_OV** - Enable OpenVINO optimization (true/false, default: false)
- **EMBEDDING_OV_MODELS_DIR** - Directory for OpenVINO models (default: ./ov-models)

### Model Handler Performance

- **INFER_BATCH_SIZE** - Batch size for inference (default: 64). The model is compiled to accept a fixed batch size; inputs are padded or split to fit it.
- **PREPROCESS_WORKERS** - Number of parallel preprocessing workers (default: min(16, cpu_count * 2)). More workers generally help, with diminishing returns once the count exceeds the number of CPU cores.

### Video Frame Extraction

These variables control the performance and memory usage of the video frame extraction pipeline.

#### Extraction Performance

- **VIDEO_FRAME_BATCH_SIZE** - Batch size for video frame extraction (default: 64)
- **VIDEO_FRAME_DECODER_WORKERS** - Number of workers for video frame decoding (default: 8)
- **VIDEO_FRAME_QUEUE_SIZE** - Queue size for frame extraction pipeline (default: 32)

#### Shared Memory Configuration

- **VIDEO_FRAME_SHM_POOL_BLOCK_SIZE** - Shared memory block size in bytes (default: 1920*1080*3 = 6,220,800 bytes for 1080p RGB)
- **VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER** - Multiplier for total shared memory blocks (default: 2)
  - Total blocks = VIDEO_FRAME_BATCH_SIZE × VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER

#### Logging

- **VIDEO_FRAME_LOG_LEVEL** - Logging level for video frame extraction (DEBUG/INFO/WARNING/ERROR/CRITICAL, default: INFO)

## Set Environment Values

### Basic Setup

Set the required environment variables before launching the service.

```bash
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
```

Refer to the [Supported Models](./supported-models.md) list for additional choices.

> **_NOTE:_** You can change the model, OpenVINO conversion, device, or tokenization parameters by editing `setup.sh`.

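
Before overriding the shared memory variables from the reference above, it can help to estimate how much memory the pool would occupy. The following is a minimal sketch, not part of the service: `estimate_shm_bytes` is a hypothetical helper, and it assumes the pool allocates every block at the configured block size, which the reference implies (total blocks = batch size × multiplier) but does not state outright.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the service): estimate the shared
# memory pool size in bytes from the documented variables.
# Unset variables fall back to the defaults listed in the reference.
estimate_shm_bytes() {
  local batch="${VIDEO_FRAME_BATCH_SIZE:-64}"
  local multiplier="${VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER:-2}"
  local block_size="${VIDEO_FRAME_SHM_POOL_BLOCK_SIZE:-$((1920 * 1080 * 3))}"
  # Assumption: total bytes = total blocks * block size.
  echo $(( batch * multiplier * block_size ))
}

bytes=$(estimate_shm_bytes)
echo "Estimated pool size: ${bytes} bytes (~$(( bytes / 1024 / 1024 )) MiB)"
```

With the documented defaults (batch size 64, multiplier 2, 1080p blocks) this evaluates to 796,262,400 bytes, roughly 759 MiB.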
### Configure the Registry

```bash
export REGISTRY_URL=intel
export TAG=2026.1.0-rc1
```

### Configuration Examples

**Basic CPU setup (default)**:

```bash
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
```

**GPU acceleration with OpenVINO**:

```bash
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
export EMBEDDING_DEVICE=GPU
export EMBEDDING_USE_OV=true
```

**High Performance Video Processing**:

```bash
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
export VIDEO_FRAME_BATCH_SIZE=256
export VIDEO_FRAME_DECODER_WORKERS=8
export VIDEO_FRAME_SHM_POOL_BLOCK_SIZE=$((1920 * 1080 * 3))  # 6MB for 1080p
export VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER=2
export INFER_BATCH_SIZE=64
export PREPROCESS_WORKERS=16
```

**Memory-Constrained Environment**:

```bash
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-32
export VIDEO_FRAME_BATCH_SIZE=64
export VIDEO_FRAME_DECODER_WORKERS=4
export VIDEO_FRAME_SHM_POOL_BLOCK_SIZE=$((1280 * 720 * 3))  # 2.8MB for 720p
export VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER=2
```

**With OpenVINO Optimization on GPU**:

```bash
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-16
export EMBEDDING_USE_OV=true
export EMBEDDING_DEVICE=GPU
export INFER_BATCH_SIZE=64
export PREPROCESS_WORKERS=16
```

**Debug Mode with Detailed Logging**:

```bash
export VIDEO_FRAME_LOG_LEVEL=DEBUG
export EMBEDDING_DEVICE=CPU
```

### Performance Tuning Guide

#### For Video Processing Bottlenecks

1. Increase `VIDEO_FRAME_BATCH_SIZE` (trades memory for throughput)
2. Increase `VIDEO_FRAME_DECODER_WORKERS` (limited by CPU cores)
3. Increase `VIDEO_FRAME_QUEUE_SIZE` if frames are being dropped

#### For Memory Constraints

1. Decrease `VIDEO_FRAME_BATCH_SIZE`
2. Decrease `VIDEO_FRAME_SHM_POOL_BLOCKS_MULTIPLIER`
3. Reduce `VIDEO_FRAME_SHM_POOL_BLOCK_SIZE` if processing lower resolutions

#### For Inference Performance

1. Increase `INFER_BATCH_SIZE` and `PREPROCESS_WORKERS`
2. Enable OpenVINO: `EMBEDDING_USE_OV=true`
3. Use GPU if available: `EMBEDDING_DEVICE=GPU`

### Set the Environment Variables

Run the following command to set up the environment with default values. Rerun it whenever the environment variables change; for example, running on GPU requires additional environment variables to be set.

```bash
source setup.sh
```

## Quick Start with Docker

You can [build the Docker image](./get-started/build-from-source.md#steps-to-build) or pull a prebuilt image from the configured registry and tag. For a prebuilt image, the `setup.sh` script configures the variables needed to pull the right version of the image.

## Running the Server with CPU

```bash
docker compose -f docker/compose.yaml up -d
```

Verify the deployment with the command below. A `healthy` status should be printed to the console.

```bash
curl --location --request GET 'http://localhost:9777/health'
```

## Running the Server with GPU

### 1. Configure GPU Device

```bash
# Automatic GPU selection
export EMBEDDING_DEVICE=GPU

# Specific GPU index (if applicable)
export EMBEDDING_DEVICE=GPU.0
```

### 2. Run Setup Script

```bash
source setup.sh
```

> **Note**: When `EMBEDDING_DEVICE=GPU` is set, `setup.sh` applies GPU-friendly defaults, including setting `EMBEDDING_USE_OV=true`.

### 3. Start the Service

```bash
docker compose -f docker/compose.yaml up -d
```

### 4. Verify GPU Configuration

```bash
# Check service health
curl --location --request GET 'http://localhost:9777/health'

# Inspect active model capabilities
curl --location --request GET 'http://localhost:9777/model/capabilities'
```

## Stop the Multimodal Embedding microservice

```bash
docker compose -f docker/compose.yaml down
```

## Sample CURL Commands

The following samples mirror the accompanying Postman collection. All requests target `http://localhost:9777`.

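
When scripting against these endpoints, it may be convenient to wrap the repeated parts of the requests. The following is a minimal sketch, not part of the service: `BASE_URL`, `MODEL`, `text_payload`, `embed_text`, and `encode_file_base64` are hypothetical names chosen for illustration, and the payload shape is copied from the text embedding sample in this guide.

```bash
#!/usr/bin/env bash
# Assumption: these variable and function names are illustrative only.
BASE_URL="${BASE_URL:-http://localhost:9777}"
MODEL="${EMBEDDING_MODEL_NAME:-CLIP/clip-vit-b-32}"

# Build the JSON body for a single-text request.
# Note: the text is inserted verbatim; quotes or backslashes in it are not escaped.
text_payload() {
  printf '{"input": {"type": "text", "text": "%s"}, "model": "%s", "encoding_format": "float"}' "$1" "$MODEL"
}

# POST the payload to the /embeddings endpoint and print the raw response.
embed_text() {
  curl --silent --location "${BASE_URL}/embeddings" \
    --header 'Content-Type: application/json' \
    --data "$(text_payload "$1")"
}

# Base64-encode a file on a single line, for the image_base64 and
# video_base64 fields that the samples leave empty.
encode_file_base64() {
  base64 < "$1" | tr -d '\n'
}

# Usage (requires the service to be running):
#   embed_text "Sample input text1"
#   encode_file_base64 ./image.jpg
```

Stripping newlines with `tr` avoids depending on a line-wrap flag, which differs between `base64` implementations.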
### Text Embedding

```bash
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "input": {
        "type": "text",
        "text": "Sample input text1"
    },
    "model": "CLIP/clip-vit-b-32",
    "encoding_format": "float"
}'
```

### Document Embedding (multiple texts)

```bash
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "input": {
        "type": "text",
        "text": ["Sample input text1", "Sample input text2"]
    },
    "model": "CLIP/clip-vit-b-32",
    "encoding_format": "float"
}'
```

### Image URL Embedding

```bash
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "input": {
        "type": "image_url",
        "image_url": "https://i.ytimg.com/vi/H_8J2YfMpY0/sddefault.jpg"
    },
    "model": "CLIP/clip-vit-b-32",
    "encoding_format": "float"
}'
```

### Image Base64 Embedding

```bash
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "CLIP/clip-vit-b-32",
    "encoding_format": "float",
    "input": {
        "type": "image_base64",
        "image_base64": ""
    }
}'
```

### Video Frames Embedding

```bash
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "CLIP/clip-vit-b-32",
    "encoding_format": "float",
    "input": {
        "type": "video_frames",
        "video_frames": [
            {
                "type": "image_url",
                "image_url": "https://i.ytimg.com/vi/H_8J2YfMpY0/sddefault.jpg"
            },
            {
                "type": "image_base64",
                "image_base64": ""
            }
        ]
    }
}'
```

### Video URL Embedding (with segment config)

```bash
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "CLIP/clip-vit-b-32",
    "encoding_format": "float",
    "input": {
        "type": "video_url",
        "video_url": "https://sample-videos.com/video321/mp4/720/big_buck_bunny_720p_10mb.mp4",
        "segment_config": {
            "startOffsetSec": 0,
            "clip_duration": -1,
            "num_frames": 64,
            "frame_indexes": [1, 10, 20]
        }
    }
}'
```

### Video Base64 Embedding

Set `num_frames: 0` to process all the frames.

```bash
curl --location 'http://localhost:9777/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "CLIP/clip-vit-b-32",
    "encoding_format": "float",
    "input": {
        "type": "video_base64",
        "segment_config": {
            "startOffsetSec": 0,
            "clip_duration": -1,
            "num_frames": 64
        },
        "video_base64": "