# SDK Usage Guide

This guide shows you how to use the Multimodal Embedding Serving microservice as a Python SDK for embedding text, images, and videos in your applications. The SDK provides a convenient wrapper around the REST API for seamless integration.

> **Model Selection**: The examples in this guide use placeholder model names (`"your-chosen-model"`). Replace these with a specific model from [Supported Models](./supported-models.md) based on your requirements.

## Installation

### Option 1: Install from Wheel (Recommended for Production)

Build and install the microservice as a wheel package for clean, production-ready integration.

> **Comprehensive Guide**: See the [Wheel-Based Installation Guide](./wheel-installation.md) for detailed instructions on building, installing, distributing, and troubleshooting wheel installations.

**Quick Install:**

```bash
# 1. Build the wheel
cd multimodal-embedding-serving
poetry build

# 2. Install in your project
pip install dist/multimodal_embedding_serving-0.1.1-py3-none-any.whl

# OR add to pyproject.toml (recommended)
# [tool.poetry.dependencies]
# multimodal-embedding-serving = {path = "wheels/multimodal_embedding_serving-0.1.1-py3-none-any.whl"}
```

### Option 2: Install from Source (Development)

```bash
git clone https://github.com/intel/edge-ai-libraries
cd edge-ai-libraries/microservices/multimodal-embedding-serving
pip install -e .
```

### Option 3: Using Poetry for Development

```bash
cd multimodal-embedding-serving
poetry install
poetry shell
```

## Quick Start

### 1. Basic SDK Usage

```python
# Import from the installed package
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

# Create and load a model (replace with your chosen model from supported-models.md)
model_handler = get_model_handler("your-chosen-model")
model_handler.load_model()

# Create the application wrapper
embedding_model = EmbeddingModel(model_handler)

# Test the model
print("Model loaded successfully!")
print(f"Embedding dimension: {embedding_model.get_embedding_length()}")
```

### 2. Text Embeddings

```python
# Single text embedding
text = "A beautiful sunset over the mountains"
embedding = embedding_model.embed_query(text)
print(f"Text embedding length: {len(embedding)}")

# Multiple text embeddings
texts = [
    "A red car driving down the road",
    "A blue ocean with white waves",
    "A green forest in spring"
]
embeddings = embedding_model.embed_documents(texts)
print(f"Batch embeddings shape: {len(embeddings)}x{len(embeddings[0])}")
```

> **Text-only models**: Qwen text embeddings expose only the text encoder. Use the `/model/capabilities` endpoint or `embedding_model.get_supported_modalities()` to confirm modality support before invoking the image/video helpers.

#### Qwen text embeddings with OpenVINO INT8

```python
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

handler = get_model_handler(
    "QwenText/qwen3-embedding-0.6b",
    device="GPU",  # or CPU / AUTO
    use_openvino=True,
    ov_models_dir="./ov-models"
)
handler.load_model()

embedding_model = EmbeddingModel(handler)
print(embedding_model.get_supported_modalities())  # ['text']

query = "How does photosynthesis work?"
embedding = embedding_model.embed_query(query)
print(len(embedding))
```

### 3. Image Embeddings

> Image helpers require a model with image modality support (e.g., CLIP, MobileCLIP, SigLIP, BLIP-2). They are not available when a text-only model such as Qwen is active.
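For example, a minimal guard before calling the image helpers — a sketch assuming `get_supported_modalities()` returns modality strings such as `['text', 'image']`, consistent with the Qwen example above:

```python
async def safe_image_embedding(url: str):
    # Hypothetical guard: skip image helpers when the active model is text-only.
    # Assumes get_supported_modalities() returns strings such as ['text', 'image'].
    if "image" not in embedding_model.get_supported_modalities():
        raise RuntimeError("Active model is text-only; load a CLIP-family model for images.")
    return await embedding_model.get_image_embedding_from_url(url)
```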
#### From URL

```python
import asyncio

async def process_image_url():
    image_url = "https://example.com/image.jpg"
    embedding = await embedding_model.get_image_embedding_from_url(image_url)
    print(f"Image embedding length: {len(embedding)}")

# Run async function
asyncio.run(process_image_url())
```

#### From Base64

```python
import base64
import io

from PIL import Image

# Convert image to base64
image = Image.new('RGB', (224, 224), color='red')
buffer = io.BytesIO()
image.save(buffer, format='JPEG')
image_base64 = base64.b64encode(buffer.getvalue()).decode()

# Get embedding
embedding = embedding_model.get_image_embedding_from_base64(image_base64)
print(f"Image embedding length: {len(embedding)}")
```

### 4. Video Embeddings

> Video helpers rely on image encoders under the hood; ensure the active model advertises video support via `embedding_model.supports_video()`.

#### From URL

```python
async def process_video_url():
    video_url = "https://example.com/video.mp4"

    # Basic video processing
    frame_embeddings = await embedding_model.get_video_embedding_from_url(video_url)
    print(f"Video frame embeddings: {len(frame_embeddings)} frames")

    # With custom segment configuration
    segment_config = {
        "startOffsetSec": 10,
        "clip_duration": 30,
        "num_frames": 16
    }
    frame_embeddings = await embedding_model.get_video_embedding_from_url(
        video_url, segment_config
    )
    print(f"Custom video embeddings: {len(frame_embeddings)} frames")

asyncio.run(process_video_url())
```

#### From Local File

```python
async def process_local_video():
    video_path = "/path/to/your/video.mp4"

    # Advanced frame sampling options
    segment_config = {
        "fps": 2.0,           # Extract 2 frames per second
        "startOffsetSec": 0,
        "clip_duration": -1   # Process entire video
    }

    frame_embeddings = await embedding_model.get_video_embedding_from_file(
        video_path, segment_config
    )
    print(f"Local video embeddings: {len(frame_embeddings)} frames")

asyncio.run(process_local_video())
```

#### Using Specific Frame Indices

```python
async def process_specific_frames():
    segment_config = {
        "frame_indexes": [0, 15, 30, 45, 60],  # Extract these exact frames
        "startOffsetSec": 5,
        "clip_duration": 20
    }
    frame_embeddings = await embedding_model.get_video_embedding_from_file(
        "video.mp4", segment_config
    )

asyncio.run(process_specific_frames())
```

## Advanced Configuration

### 1. Using Different Models

```python
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

# Standard CLIP
clip_handler = get_model_handler("your-chosen-model")
clip_model = EmbeddingModel(clip_handler)

# Chinese CLIP for multilingual support
cn_clip_handler = get_model_handler("CN-CLIP/cn-clip-vit-b-16")
cn_clip_model = EmbeddingModel(cn_clip_handler)

# Mobile-optimized CLIP
mobile_handler = get_model_handler("MobileCLIP/mobileclip_b")
mobile_model = EmbeddingModel(mobile_handler)

# BLIP-2 for advanced multimodal understanding
blip2_handler = get_model_handler("Blip2/blip2_transformers")
blip2_model = EmbeddingModel(blip2_handler)
```

### 2. OpenVINO Optimization

```python
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

# Enable OpenVINO for faster inference
model_handler = get_model_handler(
    model_id="your-chosen-model",
    device="CPU",
    use_openvino=True,
    ov_models_dir="./ov-models"
)
model_handler.load_model()

embedding_model = EmbeddingModel(model_handler)
```

### 3. GPU Acceleration (if available)

```python
from multimodal_embedding_serving import get_model_handler

# Use GPU for inference
model_handler = get_model_handler(
    model_id="your-chosen-model",
    device="GPU"
)
```
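If you are unsure which accelerator is present at runtime, the Qwen example above also lists `AUTO` as an accepted `device` string; a minimal sketch, assuming the handler forwards it to OpenVINO's AUTO device plugin (which selects the best available device and falls back to CPU):

```python
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

# Let OpenVINO pick the best available device at load time.
# Assumption: "AUTO" is passed through to the OpenVINO AUTO plugin;
# behavior with non-OpenVINO backends may differ.
model_handler = get_model_handler(
    model_id="your-chosen-model",
    device="AUTO",
    use_openvino=True,
    ov_models_dir="./ov-models"
)
model_handler.load_model()
embedding_model = EmbeddingModel(model_handler)
```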
## Practical Examples

### 1. Image-Text Similarity

```python
import asyncio

from sklearn.metrics.pairwise import cosine_similarity

async def compare_text_and_image():
    # Get embeddings
    text_embedding = embedding_model.embed_query("A red sports car")
    image_embedding = await embedding_model.get_image_embedding_from_url(
        "https://example.com/red_car.jpg"
    )

    # Calculate similarity
    similarity = cosine_similarity([text_embedding], [image_embedding])[0][0]
    print(f"Similarity: {similarity:.3f}")

asyncio.run(compare_text_and_image())
```

### 2. Video Content Search

```python
async def search_video_content():
    # Process video to get frame embeddings
    video_embeddings = await embedding_model.get_video_embedding_from_file(
        "movie.mp4",
        {"fps": 0.5, "clip_duration": -1}  # 1 frame every 2 seconds
    )

    # Search query
    query = "person walking in a park"
    query_embedding = embedding_model.embed_query(query)

    # Find most similar frames
    similarities = []
    for i, frame_emb in enumerate(video_embeddings):
        sim = cosine_similarity([query_embedding], [frame_emb])[0][0]
        similarities.append((i, sim))

    # Get top 5 matches
    top_matches = sorted(similarities, key=lambda x: x[1], reverse=True)[:5]
    for frame_idx, similarity in top_matches:
        timestamp = frame_idx * 2  # Since we sampled at 0.5 fps
        print(f"Frame {frame_idx} (t={timestamp}s): {similarity:.3f}")

asyncio.run(search_video_content())
```

### 3. Multilingual Text Processing

```python
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

# Use CN-CLIP for Chinese text
cn_clip_handler = get_model_handler("CN-CLIP/cn-clip-vit-b-16")
cn_clip_handler.load_model()
cn_model = EmbeddingModel(cn_clip_handler)

# Process Chinese and English text
texts = [
    "一只可爱的小猫",        # Chinese: "A cute little cat"
    "A beautiful landscape",
    "红色的汽车",            # Chinese: "A red car"
    "Blue ocean waves"
]

embeddings = cn_model.embed_documents(texts)
print(f"Multilingual embeddings: {len(embeddings)} texts processed")
```

### 4. Batch Processing for Efficiency

```python
import asyncio

async def batch_process_images():
    image_urls = [
        "https://example.com/image1.jpg",
        "https://example.com/image2.jpg",
        "https://example.com/image3.jpg"
    ]

    # Process images concurrently
    tasks = [
        embedding_model.get_image_embedding_from_url(url)
        for url in image_urls
    ]
    embeddings = await asyncio.gather(*tasks)

    print(f"Processed {len(embeddings)} images")
    return embeddings

asyncio.run(batch_process_images())
```

## Error Handling

```python
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

try:
    # Load model with error handling
    model_handler = get_model_handler("your-chosen-model")
    model_handler.load_model()
    embedding_model = EmbeddingModel(model_handler)

    # Check if model is healthy
    if embedding_model.check_health():
        print("Model is ready!")
    else:
        print("Model health check failed")
except Exception as e:
    print(f"Failed to load model: {e}")

try:
    # Process with error handling
    embedding = embedding_model.embed_query("test text")
    print("Text processed successfully")
except Exception as e:
    print(f"Processing failed: {e}")
```
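Transient failures (for example, a remote image URL timing out) can often be retried. A minimal retry sketch using only the standard library — the backoff policy is illustrative, not part of the SDK:

```python
import asyncio

async def embed_image_with_retry(url: str, attempts: int = 3, delay: float = 1.0):
    # Hypothetical helper: retry get_image_embedding_from_url with
    # simple exponential backoff. Tune attempts/delay for your workload.
    for attempt in range(1, attempts + 1):
        try:
            return await embedding_model.get_image_embedding_from_url(url)
        except Exception as e:
            if attempt == attempts:
                raise  # Out of retries; surface the last error
            print(f"Attempt {attempt} failed ({e}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
            delay *= 2
```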
## Configuration Options

### Model Selection

See [Supported Models](./supported-models.md) for all available models and their specifications.

```python
from multimodal_embedding_serving import get_model_handler

# Example: Using different models
clip_handler = get_model_handler("your-chosen-model")
cn_clip_handler = get_model_handler("CN-CLIP/cn-clip-vit-b-16")
mobile_handler = get_model_handler("MobileCLIP/mobileclip_b")
```

### OpenVINO Optimization

```python
from multimodal_embedding_serving import get_model_handler

# Enable OpenVINO for Intel hardware acceleration
model_handler = get_model_handler(
    "your-chosen-model",
    use_openvino=True
)
```

### Batch Processing

```python
# Process multiple texts in one call for better throughput
embeddings = embedding_model.embed_documents(text_batch)
```

## Integration Examples

### 1. Flask Web Application

```python
from flask import Flask, request, jsonify
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

app = Flask(__name__)

# Initialize model globally
model_handler = get_model_handler("your-chosen-model")
model_handler.load_model()
embedding_model = EmbeddingModel(model_handler)

@app.route('/embed', methods=['POST'])
def embed_text():
    data = request.json
    text = data.get('text', '')

    try:
        embedding = embedding_model.embed_query(text)
        return jsonify({'embedding': embedding})
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

### 2. FastAPI Integration

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from multimodal_embedding_serving import get_model_handler, EmbeddingModel

app = FastAPI()

class TextRequest(BaseModel):
    text: str

@app.on_event("startup")
async def startup_event():
    global embedding_model
    model_handler = get_model_handler("your-chosen-model")
    model_handler.load_model()
    embedding_model = EmbeddingModel(model_handler)

@app.post("/embed")
async def embed_text(request: TextRequest):
    try:
        embedding = embedding_model.embed_query(request.text)
        return {"embedding": embedding}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```

## Troubleshooting

### Common Issues

1. **Model Loading Errors**

   ```python
   # Check available models
   from multimodal_embedding_serving import list_available_models
   print(list_available_models())
   ```

2. **Memory Issues**

   ```python
   # Use smaller models for limited memory
   from multimodal_embedding_serving import get_model_handler
   model_handler = get_model_handler("MobileCLIP/mobileclip_s0")
   ```

3. **OpenVINO Issues**

   ```python
   # Disable OpenVINO if you run into problems
   from multimodal_embedding_serving import get_model_handler
   model_handler = get_model_handler(
       "your-chosen-model",
       use_openvino=False
   )
   ```

### Getting Help

- Check the [API Reference](./api-reference.md) for detailed endpoint documentation
- See [Supported Models](./supported-models.md) for model selection guidance
- Review system requirements in [System Requirements](./get-started/system-requirements.md)