# Deploying VSS with vLLM on Kubernetes Using Helm ## Overview This guide covers deploying the Video Search and Summarization (VSS) application on Kubernetes using **vLLM** as the LLM inference backend. vLLM provides an OpenAI-compatible API for efficient CPU-based inference on Intel Xeon systems - no GPU required. This is one of several supported deployment configurations. For an overview of all configurations (including OVMS, VLM Microservice, and GPU-based deployment), see [Deploy with Helm](./deploy-with-helm.md). For a conceptual overview of how VSS works, see [How It Works](./how-it-works.md). --- ## Prerequisites ### Hardware Requirements For best performance, **Intel® Xeon® 6 Processors** are recommended. | Component | Specification | | --- | --- | | CPU Cores | For optimal performance: 16 cores for vLLM, additional cores for ingestion, embedding, vectordb, and other microservices | | RAM Memory | Minimum 256GB total system memory | | Disk Space | Minimum 500GB (SSD recommended for optimal performance) | | Storage | Dynamic storage provisioning capability (NFS or local storage) | ### Software Requirements | Tool | Version | Installation Guide | | --- | --- | --- | | Kubernetes | v1.24 or later | [Kubernetes docs](https://kubernetes.io/docs/setup/) | | kubectl | Latest | [kubectl docs](https://kubernetes.io/docs/tasks/tools/install-kubectl/) | | Helm | v3.0 or later | [Helm docs](https://helm.sh/docs/intro/install/) | Your cluster must support **dynamic provisioning of Persistent Volumes**. Confirm a default storage class is configured: ```bash kubectl get storageclass ``` See the [Kubernetes Dynamic Provisioning Guide](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/) if none is available. --- ## Step 1: Acquire the Helm Chart ```bash # Clone the repository (main branch) git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries # Navigate to the chart directory cd edge-ai-libraries/sample-applications/video-search-and-summarization/chart ``` --- ## Step 2: Configure Required Values Open `user_values_override.yaml` in your editor: ```bash nano user_values_override.yaml ``` ### Required Parameters | Key | Description | Example Value | | --- | --- | --- | | `global.sharedPvcName` | Name of the shared PVC for all components | `vss-shared-pvc` | | `global.huggingfaceToken` | Hugging Face API token for model access | `hf_xxxxxxxxxxxxxxxxxxxx` | | `global.vlmName` | Vision Language Model used for video analysis | `Qwen/Qwen2.5-VL-3B-Instruct` | | `global.env.POSTGRES_USER` | PostgreSQL username | `vsadmin` | | `global.env.POSTGRES_PASSWORD` | PostgreSQL password | `` | | `global.env.MINIO_ROOT_USER` | MinIO username (min 3 chars) | `minioadmin` | | `global.env.MINIO_ROOT_PASSWORD` | MinIO password (min 8 chars) | `` | | `global.env.RABBITMQ_DEFAULT_USER` | RabbitMQ username | `guest` | | `global.env.RABBITMQ_DEFAULT_PASS` | RabbitMQ password | `` | | `global.env.EMBEDDING_MODEL_NAME` | Multimodal embedding model | `CLIP/clip-vit-b-32` (search) or `QwenText/qwen3-embedding-0.6b` (unified) | For the full parameter catalog across all deployment modes, see [Deploy with Helm](./deploy-with-helm.md). ### Optional Parameters | Key | Description | Example Value | | --- | --- | --- | | `global.keepPvc` | Retain PVC on `helm uninstall` to avoid re-downloading models | `true` | | `global.proxy.http_proxy` | HTTP proxy (if required by your environment) | `http://proxy-example.com:000` | | `global.proxy.https_proxy` | HTTPS proxy (if required by your environment) | `http://proxy-example.com:000` | ### vLLM-Specific Parameters The `xeon_vllm_values.yaml` override file (included in the chart) pre-configures vLLM with sensible defaults for Intel Xeon. You can override individual values as needed: | Key | Description | Default | Notes | | --- | --- | --- | --- | | `vllm.resources.requests.cpu` | CPU request for the vLLM pod | `16` | Increase for higher throughput | | `vllm.resources.requests.memory` | Memory request for the vLLM pod | `128Gi` | Increase for larger models | | `vllm.pvc.size` | Model cache PVC size | `80Gi` | Increase for larger model footprints | | `vllm.modelCachePath` | Model cache mount path in the pod | `/cache/vllm` | Uses shared PVC | > **Model selection**: `vllm.enabled: true` (set by `xeon_vllm_values.yaml`) automatically disables the VLM Inference Microservice (`vlminference.enabled: false`). vLLM uses the model specified in `global.vlmName`; ensure it is compatible with vLLM and available on Hugging Face. ## Step 3: Build Helm Dependencies From the chart directory, run: ```bash helm dependency update ``` Verify all dependencies are resolved: ```bash helm dependency list ``` --- ## Step 4: Create a Namespace ```bash export NAMESPACE=vss-deployment kubectl create namespace ${NAMESPACE} ``` > All subsequent commands assume the `NAMESPACE` variable is set in your shell session. --- ## Step 5: Deploy with vLLM Choose the deployment mode that fits your use case. In both cases, `xeon_vllm_values.yaml` enables vLLM and configures resource allocations for Intel Xeon CPUs. > **Switching modes**: Always uninstall the current release before switching to a different mode: > ```bash > helm uninstall vss -n ${NAMESPACE} > ``` ### Option A: Video Summarization Only Deploys the summarization pipeline with vLLM for text generation. ```bash helm install vss . \ -f summary_override.yaml \ -f xeon_vllm_values.yaml \ -f user_values_override.yaml \ -n ${NAMESPACE} ``` ### Option B: Unified Video Search and Summarization Deploys both the search and summarization pipelines with vLLM. Before installing, ensure `global.env.TEXT_EMBEDDING_MODEL_NAME` is set and `global.embedding.preferTextModel: true` in `user_values_override.yaml` (built into `unified_summary_search.yaml`). ```bash helm install vss . \ -f unified_summary_search.yaml \ -f xeon_vllm_values.yaml \ -f user_values_override.yaml \ -n ${NAMESPACE} ``` > **Requirement:** The chart will raise an error if `global.env.TEXT_EMBEDDING_MODEL_NAME` is omitted while unified mode is enabled. Review the supported model list in [supported-models](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/multimodal-embedding-serving/docs/user-guide/supported-models.md) before choosing model IDs. **Understanding the override files:** | File | Purpose | | --- | --- | | `summary_override.yaml` | Enables the summarization pipeline | | `unified_summary_search.yaml` | Enables combined search and summarization; sets `preferTextModel: true` | | `xeon_vllm_values.yaml` | Enables vLLM, disables VLM Microservice, sets Xeon-optimized resource allocations | | `user_values_override.yaml` | Your credentials, model selections, and environment-specific overrides | --- ## Step 6: Verify the Deployment Monitor pod startup progress: ```bash kubectl get pods -n ${NAMESPACE} -w ``` After a successful deployment, all pods should reach **Running / 1/1 Ready** state: > **First-time startup**: All pods can take **up to 20–30 minutes** to reach Running state because models (vLLM, embedding, object detection — up to ~50 GB total) are downloaded from Hugging Face and cached. Set `global.keepPvc: true` to skip model re-downloads on reinstallation. --- ## Step 7: Access the Application Once all pods are running, retrieve the URL: ```bash NGINX_HOST=$(kubectl get pods -l app=vss-nginx -n ${NAMESPACE} -o jsonpath='{.items[0].status.hostIP}') NGINX_PORT=$(kubectl get service vss-nginx -n ${NAMESPACE} -o jsonpath='{.spec.ports[0].nodePort}') echo "http://${NGINX_HOST}:${NGINX_PORT}" ``` Open the URL in your browser to access the VSS dashboard. --- ## Managing the Deployment ### Upgrading After editing `user_values_override.yaml`, apply changes with: ```bash helm upgrade vss . \ -f summary_override.yaml \ -f xeon_vllm_values.yaml \ -f user_values_override.yaml \ -n ${NAMESPACE} ``` Replace `summary_override.yaml` with `unified_summary_search.yaml` for the unified mode. --- ## Troubleshooting For troubleshooting guidance, see [Deploy with Helm — Troubleshooting](./deploy-with-helm.md#troubleshooting). --- ## Uninstallation ```bash helm uninstall vss -n ${NAMESPACE} # Optional: delete the namespace kubectl delete namespace ${NAMESPACE} ``` > By default, PVCs are deleted with the Helm release. If you set `global.keepPvc: true`, PVCs are retained and reusable in future deployments to avoid re-downloading models.