# Deploying VSS with vLLM on Kubernetes Using Helm

## Overview

This guide covers deploying the Video Search and Summarization (VSS) application on Kubernetes using vLLM as the LLM inference backend. vLLM provides an OpenAI-compatible API for efficient CPU-based inference on Intel Xeon systems; no GPU is required.

This is one of several supported deployment configurations. For an overview of all configurations (including OVMS, VLM Microservice, and GPU-based deployment), see Deploy with Helm. For a conceptual overview of how VSS works, see How It Works.

## Prerequisites
### Hardware Requirements

For best performance, Intel® Xeon® 6 Processors are recommended.

| Component | Specification |
|---|---|
| CPU cores | For optimal performance: 16 cores for vLLM, plus additional cores for ingestion, embedding, vectordb, and other microservices |
| RAM | Minimum 256 GB total system memory |
| Disk space | Minimum 500 GB (SSD recommended for optimal performance) |
| Storage | Dynamic storage provisioning capability (NFS or local storage) |
### Software Requirements

| Tool | Version | Installation Guide |
|---|---|---|
| Kubernetes | v1.24 or later | |
| kubectl | Latest | |
| Helm | v3.0 or later | |
Your cluster must support dynamic provisioning of Persistent Volumes. Confirm a default storage class is configured:
```shell
kubectl get storageclass
```
See the Kubernetes Dynamic Provisioning Guide if none is available.
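kubectl marks the default class with `(default)` next to its name, so the check can be made scriptable with a small filter over that output. The sketch below uses an illustrative sample (the class name and provisioner are examples, not from your cluster):

```shell
# Succeeds when its stdin (the `kubectl get storageclass` output) contains a class
# marked "(default)".
has_default_sc() { grep -q '(default)'; }

# Illustrative sample; in practice pipe live output:
#   kubectl get storageclass | has_default_sc && echo "default storage class present"
sample='NAME                   PROVISIONER             RECLAIMPOLICY
local-path (default)   rancher.io/local-path   Delete'
printf '%s\n' "$sample" | has_default_sc && echo "default storage class present"
```

If the filter fails, configure a default storage class before proceeding.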
## Step 1: Acquire the Helm Chart

```shell
# Clone the repository (main branch)
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries

# Navigate to the chart directory
cd edge-ai-libraries/sample-applications/video-search-and-summarization/chart
```
## Step 2: Configure Required Values

Open `user_values_override.yaml` in your editor:

```shell
nano user_values_override.yaml
```
### Required Parameters

| Key | Description | Example Value |
|---|---|---|
| | Name of the shared PVC for all components | |
| | Hugging Face API token for model access | |
| | Vision Language Model used for video analysis | |
| | PostgreSQL username | |
| | PostgreSQL password | |
| | MinIO username (min. 3 characters) | |
| | MinIO password (min. 8 characters) | |
| | RabbitMQ username | |
| | RabbitMQ password | |
| | Multimodal embedding model | |
For the full parameter catalog across all deployment modes, see Deploy with Helm.
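As a rough sketch, the values this guide references elsewhere (model selection, embedding settings, PVC retention) sit under `global` in `user_values_override.yaml`. The placeholder values below are illustrative assumptions, and the credential and PVC-name keys are omitted because their exact paths come from the chart's `values.yaml`:

```yaml
# Illustrative fragment only -- consult the chart's values.yaml for the complete
# set of required keys (credentials, PVC name, and proxies are omitted here).
global:
  vlmName: <vision-language model ID>                 # model used for video analysis
  env:
    TEXT_EMBEDDING_MODEL_NAME: <text embedding model> # required for unified mode
  embedding:
    preferTextModel: true                             # set by unified_summary_search.yaml
  keepPvc: true                                       # optional: retain PVCs across reinstalls
```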
### Optional Parameters

| Key | Description | Example Value |
|---|---|---|
| `global.keepPvc` | Retain PVCs on `helm uninstall` | |
| | HTTP proxy (if required by your environment) | |
| | HTTPS proxy (if required by your environment) | |
### vLLM-Specific Parameters

The `xeon_vllm_values.yaml` override file (included in the chart) pre-configures vLLM with sensible defaults for Intel Xeon. You can override individual values as needed:

| Key | Description | Default | Notes |
|---|---|---|---|
| | CPU request for the vLLM pod | | Increase for higher throughput |
| | Memory request for the vLLM pod | | Increase for larger models |
| | Model cache PVC size | | Increase for larger model footprints |
| | Model cache mount path in the pod | | Uses the shared PVC |
Model selection: `vllm.enabled: true` (set by `xeon_vllm_values.yaml`) automatically disables the VLM Inference Microservice (`vlminference.enabled: false`). vLLM uses the model specified in `global.vlmName`; ensure it is compatible with vLLM and available on Hugging Face.
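The toggle relationship can be pictured as the following fragment, which reflects what `xeon_vllm_values.yaml` effectively sets per this guide (the real file in the chart is authoritative and also contains the Xeon resource allocations):

```yaml
# Illustrative toggles -- mirrors the note above, not the full override file.
vllm:
  enabled: true        # use vLLM as the LLM inference backend
vlminference:
  enabled: false       # VLM Inference Microservice is disabled when vLLM is enabled
```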
## Step 3: Build Helm Dependencies

From the chart directory, run:

```shell
helm dependency update
```

Verify all dependencies are resolved:

```shell
helm dependency list
```
## Step 4: Create a Namespace

```shell
export NAMESPACE=vss-deployment
kubectl create namespace ${NAMESPACE}
```

All subsequent commands assume the `NAMESPACE` variable is set in your shell session.
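Because every later command interpolates `${NAMESPACE}`, a guard at the top of a new shell session avoids running `kubectl` or `helm` with an empty `-n` flag. A minimal sketch:

```shell
# Re-export in new sessions, then fail fast with a clear message if NAMESPACE is
# unset or empty, before any later `-n ${NAMESPACE}` flag can expand to nothing.
export NAMESPACE=vss-deployment
: "${NAMESPACE:?export NAMESPACE before running the deployment commands}"
echo "Using namespace: ${NAMESPACE}"   # prints: Using namespace: vss-deployment
```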
## Step 5: Deploy with vLLM

Choose the deployment mode that fits your use case. In both cases, `xeon_vllm_values.yaml` enables vLLM and configures resource allocations for Intel Xeon CPUs.

Switching modes: Always uninstall the current release before switching to a different mode:

```shell
helm uninstall vss -n ${NAMESPACE}
```
### Option A: Video Summarization Only

Deploys the summarization pipeline with vLLM for text generation.

```shell
helm install vss . \
  -f summary_override.yaml \
  -f xeon_vllm_values.yaml \
  -f user_values_override.yaml \
  -n ${NAMESPACE}
```
### Option B: Unified Video Search and Summarization

Deploys both the search and summarization pipelines with vLLM. Before installing, ensure `global.env.TEXT_EMBEDDING_MODEL_NAME` is set in `user_values_override.yaml`; `global.embedding.preferTextModel: true` is already built into `unified_summary_search.yaml`.

```shell
helm install vss . \
  -f unified_summary_search.yaml \
  -f xeon_vllm_values.yaml \
  -f user_values_override.yaml \
  -n ${NAMESPACE}
```
Requirement: The chart will raise an error if `global.env.TEXT_EMBEDDING_MODEL_NAME` is omitted while unified mode is enabled. Review the supported model list in supported-models before choosing model IDs.
Understanding the override files:
| File | Purpose |
|---|---|
| `summary_override.yaml` | Enables the summarization pipeline |
| `unified_summary_search.yaml` | Enables combined search and summarization; sets `global.embedding.preferTextModel: true` |
| `xeon_vllm_values.yaml` | Enables vLLM, disables the VLM Microservice, sets Xeon-optimized resource allocations |
| `user_values_override.yaml` | Your credentials, model selections, and environment-specific overrides |
## Step 6: Verify the Deployment

Monitor pod startup progress:

```shell
kubectl get pods -n ${NAMESPACE} -w
```

After a successful deployment, all pods should reach the Running status with all containers ready (e.g. 1/1).

First-time startup: All pods can take 20–30 minutes to reach the Running state because models (vLLM, embedding, object detection; up to ~50 GB total) are downloaded from Hugging Face and cached. Set `global.keepPvc: true` to skip model re-downloads on reinstallation.
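For a scriptable alternative to watching interactively, a small text filter over `kubectl get pods` output can count pods that are not yet fully ready. This is a sketch; the sample output below is illustrative, and in practice you pipe the live command output into the function:

```shell
# Count pods that are not "Running" with all containers ready (READY like 1/1),
# reading standard `kubectl get pods` output (NAME READY STATUS ...) on stdin.
not_ready_count() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") n++ } END { print n + 0 }'
}

# Illustrative sample; in practice: kubectl get pods -n "${NAMESPACE}" | not_ready_count
sample='NAME        READY   STATUS    RESTARTS   AGE
vss-nginx   1/1     Running   0          5m
vss-vllm    0/1     Init:0/1  0          5m'
printf '%s\n' "$sample" | not_ready_count   # prints 1
```

A result of 0 means every pod is Running and fully ready.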
## Step 7: Access the Application

Once all pods are running, retrieve the URL:

```shell
NGINX_HOST=$(kubectl get pods -l app=vss-nginx -n ${NAMESPACE} -o jsonpath='{.items[0].status.hostIP}')
NGINX_PORT=$(kubectl get service vss-nginx -n ${NAMESPACE} -o jsonpath='{.spec.ports[0].nodePort}')
echo "http://${NGINX_HOST}:${NGINX_PORT}"
```

Open the URL in your browser to access the VSS dashboard.
## Managing the Deployment

### Upgrading

After editing `user_values_override.yaml`, apply changes with:

```shell
helm upgrade vss . \
  -f summary_override.yaml \
  -f xeon_vllm_values.yaml \
  -f user_values_override.yaml \
  -n ${NAMESPACE}
```

Replace `summary_override.yaml` with `unified_summary_search.yaml` for the unified mode.
### Troubleshooting

For troubleshooting guidance, see Deploy with Helm — Troubleshooting.
### Uninstallation

```shell
helm uninstall vss -n ${NAMESPACE}

# Optional: delete the namespace
kubectl delete namespace ${NAMESPACE}
```

By default, PVCs are deleted with the Helm release. If you set `global.keepPvc: true`, PVCs are retained and can be reused in future deployments to avoid re-downloading models.