How to Deploy with Helm Chart#
This guide shows how to deploy the Live Video Captioning application on Kubernetes with the Helm chart included in this repository.
Prerequisites#
Before you begin, ensure that you have the following:
- A Kubernetes cluster with `kubectl` configured for access.
- Helm installed on your system. See the Installation Guide.
- A cluster that supports dynamic provisioning of Persistent Volumes (PVs). See the Kubernetes documentation on dynamic volume provisioning for details.
- A worker node reachable by your browser client. Prefer a GPU-capable worker node when available, because the chart pins the media and inference workloads to the selected node and DL Streamer benefits most from GPU access.
- A writable host path for collector signal files on the target node. By default the chart uses `/tmp/lvc/collector-signals`.
- An RTSP source reachable from the Kubernetes node that runs `dlstreamer-pipeline-server`.
- The model-download chart set up; it is responsible for all the models used by this Live Video Captioning chart. If you use gated Hugging Face models, a Hugging Face token is required.
Prepare/Deploy model-download chart#
The model-download service from Open Edge Platform - Edge AI Libraries is used for model management in Live Video Captioning.
1. Install the model-download chart
   Refer to this guide section to download and install the chart.
2. Configure the values.yaml file
   Edit the `values.yaml` located in the chart and configure the following:
| Parameter | Description | Required Values |
|---|---|---|
| `proxy` | Set the proxy value based on your system environment | `<your_system_proxy>` |
| `HUGGINGFACEHUB_API_TOKEN` | Hugging Face token to download gated models | `<your_huggingface_token>` |
| `ENABLE_PLUGINS` | Comma-separated list of plugins to enable | `"openvino,ultralytics"` |
| `gpu.enabled` | Deploys the model-download service pod on a GPU | `true` |
| `gpu.key` | Label assigned to the GPU node on the Kubernetes cluster by the device plugin. Identify it by running `kubectl describe node` | `gpu.intel.com/i915` or `gpu.intel.com/xe` |
| `affinity.enabled` | Set to `true` to deploy on a dedicated node | `true` |
| `affinity.value` | Your dedicated node name. Identify it by running `kubectl get node` | `<your_node_name>` |
Note: This chart can run on CPU‑only nodes; however, a GPU‑enabled node is strongly recommended to host the models and deliver optimal performance.
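As a sketch, the settings above might map onto the model-download chart's values.yaml as follows. This is an illustrative fragment only: the exact key layout may differ between releases, and every value shown (proxy, token, GPU key, node name) is a placeholder to replace with your own, verified against the chart's shipped values.yaml.

```yaml
# Illustrative values.yaml fragment for the model-download chart.
# All values are placeholders; verify key names against the shipped values.yaml.
proxy: "<your_system_proxy>"
HUGGINGFACEHUB_API_TOKEN: "<your_huggingface_token>"
ENABLE_PLUGINS: "openvino,ultralytics"
gpu:
  enabled: true                 # deploy the model-download pod on a GPU
  key: "gpu.intel.com/i915"     # GPU resource key reported by kubectl describe node
affinity:
  enabled: true                 # pin the pod to a dedicated node
  value: "<your_node_name>"     # node name from kubectl get node
```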
3. Deploy the chart
   Deploy the chart using the command below:
   helm install model-download . -n <your-namespace>
   Note: model-download creates and manages a shared PVC that is used by live-video-captioning. Do not delete or uninstall this Helm chart while the live-video-captioning chart is running.
4. Verify the deployment
   Check the status of the deployed resources to ensure they are running correctly:
   kubectl get pods -n <your-namespace>
   kubectl get services -n <your-namespace>
Prepare/Deploy live-video-captioning chart#
To set up the live-video-captioning application, you must obtain the charts and install them with optimal values and configurations. The following sections provide step-by-step instructions for this process.
Acquire the Helm chart#
There are two options to obtain the charts in your workspace:
Option 1: Get the charts from Docker Hub#
Step 1: Pull the Chart#
Use the following command to pull the prebuilt chart from Docker Hub:
helm pull oci://registry-1.docker.io/intel/live-video-captioning --version <version-no>
Refer to the release notes for details on the latest version number to use for the sample application.
Step 2: Extract the .tgz File#
After pulling the chart, extract the .tgz file:
tar -xvf live-video-captioning-<version-no>.tgz
This will create a directory named live-video-captioning containing the chart files. Navigate to the extracted directory to access the charts.
cd live-video-captioning
Option 2: Install from Source#
Step 1: Clone the repository#
Clone the repository containing the charts files:
# Clone the latest on mainline
git clone https://github.com/open-edge-platform/edge-ai-suites.git edge-ai-suites
# Alternatively, clone a specific release branch
git clone https://github.com/open-edge-platform/edge-ai-suites.git edge-ai-suites -b <release-tag>
Select the target node#
The chart pins the workloads that need to stay together to the target node selected in the chart values:
- `model-download`
- `dlstreamer-pipeline-server`
- `video-caption-service`
- `mediamtx`
- `coturn`
- `collector`
- `live-video-captioning-rag` (if RAG is enabled)
These workloads are kept on the same worker because they rely on node-local access patterns:
- `dlstreamer-pipeline-server`, `video-caption-service`, `live-video-captioning-rag`, and `model-download` share the model PVCs created by `model-download`.
- `dlstreamer-pipeline-server` and `collector` need direct access to node hardware and host resources.
- `mediamtx` and `coturn` expose browser-facing WebRTC and TURN endpoints that must match the selected node's reachable IP.
Other supporting services such as mqtt-broker, live-metrics-service, multimodal-embedding (when RAG is enabled), and vdms-vectordb (when RAG is enabled) do not require pinning to the same worker node.
For best performance, choose a worker node with a GPU. The chart can run with CPU-only inference, but a GPU-capable node is the preferred deployment target for DL Streamer and real-time media processing.
In the values-override.yaml, specify the Kubernetes node name by setting global.nodeName. This references the built-in kubernetes.io/hostname label, so no node labeling permissions are required.
Example:
global:
nodeName: worker4
Get the IP of the selected node#
Use the same node that you selected for the pinned media workloads. First list the nodes and labels:
kubectl get nodes --show-labels
Then inspect the selected node:
kubectl get node <node-name> -o wide
Set global.hostIP to the node address that is reachable by the browser:
- In clusters without worker-node external IPs, use `INTERNAL-IP`.
- Use `EXTERNAL-IP` only if the node actually has one and your browser reaches the application through it.
- Use `INTERNAL-IP` when your browser is on the same LAN or VPN and can reach the node directly.
To print the value directly:
kubectl get node <node-name> -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}'
If no external address is present, use:
kubectl get node <node-name> -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
Set that value in global.hostIP.
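The selection rule above (prefer `EXTERNAL-IP`, otherwise fall back to `INTERNAL-IP`) can be sketched as a small shell helper. `pick_host_ip` and the sample addresses are hypothetical, not part of the chart; in practice the two arguments would come from the two `kubectl ... jsonpath` queries shown above.

```shell
#!/bin/sh
# pick_host_ip: prefer the ExternalIP when the node has one, otherwise
# fall back to the InternalIP. Hypothetical helper for illustration only.
pick_host_ip() {
  external_ip="$1"
  internal_ip="$2"
  if [ -n "$external_ip" ]; then
    echo "$external_ip"
  else
    echo "$internal_ip"
  fi
}

# Sample placeholder addresses:
pick_host_ip "" "10.0.0.12"              # prints 10.0.0.12 (no external IP)
pick_host_ip "203.0.113.7" "10.0.0.12"   # prints 203.0.113.7
```

Whatever address the helper prints is the value to place in `global.hostIP`.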
If the worker node does not have any browser-reachable IP, direct NodePort access will not work. This capability will be added to the chart in a future update.
Known Limitations#
Single-node deployment with host port binding#
This chart is designed to run on a single worker node. Several workloads bind directly to host ports on that node so that the browser and RTSP clients can reach them without a LoadBalancer or Ingress.
Because of these host port bindings:
- `replicaCount` must remain `1` for all workloads that use host ports. Increasing it will fail at scheduling time because two pods cannot bind the same host port on the same node.
- Multi-node or high-availability deployments are not supported. The chart intentionally pins all workloads to a single node via `global.nodeName`.
- Port conflicts with other applications on the same node are possible. Ensure the ports listed above are not already in use on the target worker node before deploying.
Configure Required Values#
Prior to deployment, edit the sample override file at charts/values-override.yaml, focusing on the key configuration parameters below:
| Key | Description | Example |
|---|---|---|
| `global.hostIP` | Browser-reachable IP of the selected node that runs the pinned media workloads. In many on-prem clusters this is the node `INTERNAL-IP` | |
| `global.nodeName` | Kubernetes node name used to pin the media, TURN, and host-coupled workloads to one worker node. Prefer a GPU-capable node when available | |
| | List of VLM models from Hugging Face to export to OpenVINO format (at least one VLM is required) | |
| | Hugging Face Hub token to download gated models | `<your_huggingfacehub_token>` |
| | Enables detection filtering in the pipeline | |
| | List of detection model names to download (only required when detection filtering is enabled) | |
| `defaultRtspUrl` | Default RTSP URL shown in the dashboard | |
| | Switches captioning to binary alert-style responses | |
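A minimal sketch of the override file, limited to the two keys this guide names explicitly. Both values below are placeholders for your own node details.

```yaml
# Minimal values-override.yaml sketch; only global.hostIP and global.nodeName
# are named explicitly in this guide. Both values are placeholders.
global:
  hostIP: "10.0.0.12"    # browser-reachable IP of the selected worker node
  nodeName: "worker4"    # kubernetes.io/hostname of the node hosting the pinned workloads
```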
Proxy Configuration#
If your cluster runs behind a proxy, set the proxy fields under global:
global:
httpProxy: "http://<your-proxy-host>:<port>"
httpsProxy: "http://<your-proxy-host>:<port>"
noProxy: "<your-rtsp-camera-host-or-ip>"
Important: the host portion of every RTSP URL must be included in `noProxy` when the deployment runs behind a proxy.
For example:
- If your stream URL is `rtsp://camera.example.com:8554/live`, add `camera.example.com` to `noProxy`.
- If your stream URL is `rtsp://192.168.1.50:554/stream1`, add `192.168.1.50` to `noProxy`.

If the RTSP host is not listed in `noProxy`, the application may try to reach the stream through the proxy and fail to connect.
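Deriving the `noProxy` entry amounts to stripping the scheme, path, and port from the stream URL. A POSIX-shell sketch of that rule (`rtsp_host` is a hypothetical helper, not part of the chart, and it does not handle bracketed IPv6 hosts):

```shell
#!/bin/sh
# rtsp_host: extract the host portion of an RTSP URL, i.e. the value to add
# to noProxy. Hypothetical helper; does not handle bracketed IPv6 hosts.
rtsp_host() {
  host="${1#rtsp://}"   # drop the scheme
  host="${host%%/*}"    # drop the path (e.g. /live, /stream1)
  host="${host%%:*}"    # drop the port (e.g. :8554, :554)
  echo "$host"
}

rtsp_host "rtsp://camera.example.com:8554/live"   # prints camera.example.com
rtsp_host "rtsp://192.168.1.50:554/stream1"       # prints 192.168.1.50
```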
Optional: Enable RAG with Live-Video-Captioning#
Live‑Video‑Captioning includes an optional RAG (Retrieval‑Augmented Generation) capability. You can leave this disabled for a standard captioning deployment, or enable it to add retrieval-backed chatbot features. When enabled, generated caption text is converted into embeddings and stored in a vector store along with the associated frame data and metadata. A RAG‑based chatbot service is included, allowing users to submit queries and receive LLM‑generated responses using context retrieved from the vector store.
If you want to enable this optional feature, edit the override file at charts/values-override.yaml and configure the following additional parameters:
| Key | Description | Example |
|---|---|---|
| `global.enableRAG` | Set to `true` to enable the RAG feature | |
| `global.llmModel.modelId` | Configure the choice of LLM used in RAG | |
| `global.llmModel.weightFormat` | Model quantization | |
| `global.embeddingModel.modelId` | Configure the choice of embedding model for embedding creation | |
Note: To deploy the llmModel or embeddingModel on a GPU, set `global.llmModel.useGPU.enabled` or `global.embeddingModel.useGPU.enabled` to `true`.

For `global.llmModel.useGPU.key`, set the value to the GPU resource key label set on the configured `nodeName`, as the LLM models share the same PVC on that node.

For `global.embeddingModel.useGPU.key`, you may specify any available GPU resource key label if multiple GPU-enabled nodes are present. The embedding model does not share the PVC and is managed independently by the embedding service.

A GPU resource key refers to the label assigned to a GPU-enabled node by the Kubernetes device plugin. Kubernetes uses this label to identify and schedule workloads onto nodes with specific GPU resources. You can identify the available GPU resource keys by running `kubectl describe node <node-name>`. Example values include `gpu.intel.com/i915` and `gpu.intel.com/xe`.
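Putting the RAG keys above together, a hedged values-override.yaml fragment might look like the following. The key paths are the ones named in this guide; the model IDs and the `int4` weight format are placeholders, not verified defaults.

```yaml
# Illustrative RAG fragment for values-override.yaml. Key paths follow this
# guide; the model IDs and weight format are placeholders, not defaults.
global:
  enableRAG: true
  llmModel:
    modelId: "<your_llm_model_id>"
    weightFormat: "int4"              # placeholder quantization; check supported values
    useGPU:
      enabled: true
      key: "gpu.intel.com/i915"       # must match the GPU key on the configured nodeName
  embeddingModel:
    modelId: "<your_embedding_model_id>"
    useGPU:
      enabled: true
      key: "gpu.intel.com/xe"         # any available GPU resource key
```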
Build Chart Dependencies#
Run the following command from the chart directory:
helm dependency update
This refreshes the chart dependencies from subcharts/ and updates Chart.lock.
Install the Chart#
From charts/, install the application with the override file:
helm install lvc . \
-f values-override.yaml \
-n "$my_namespace" \
--timeout 60m
You can also install from the repository root:
helm install lvc ./charts \
-f ./charts/values-override.yaml \
-n "$my_namespace" \
--timeout 60m
Verify the Deployment#
Before accessing the application, confirm the following:
- The `models-pvc` created by the model-download chart is `Bound`. You can check via the `kubectl get pvc` command.
- All pods are in the `Running` state.
- All containers report `Ready`. Check via the `kubectl get pods` command.
The initial deployment may take several minutes, as the chart performs multiple model downloads and conversion steps before the application pods are started.
Access the Application#
By default the chart exposes these NodePort services:
- Dashboard UI: `http://<global.hostIP>:4173`
If you changed the service ports in your override values, use those instead.
To start captioning after deployment:
1. Open the dashboard URL in your browser.
2. Enter an RTSP stream URL, unless you preconfigured `defaultRtspUrl`.
3. Select the model you downloaded into the models PVC.
4. Adjust the prompt and generation parameters if needed.
5. Start the stream.
6. To submit a query via the RAG chatbot, click the chat icon button at the top right of the dashboard. The button is only visible when RAG is enabled.
Upgrade the Release#
If you modify the chart or subcharts, refresh dependencies first:
helm dependency update
Then upgrade the release:
helm upgrade lvc . \
-f values-override.yaml \
-n "$my_namespace"
Uninstall the Release#
helm uninstall lvc -n "$my_namespace"
Troubleshooting#
- If pods remain `Pending`, check that `global.nodeName` matches the correct node name and that the selected node has the required hardware access.
- If the dashboard opens but video does not start, confirm that `global.hostIP` is reachable from the browser. If your worker nodes do not have external IPs, this usually means using the node `INTERNAL-IP` over a reachable LAN or VPN. Also confirm that the RTSP source is reachable from the Kubernetes node.
- If WebRTC negotiation fails, verify that `global.hostIP` points to the same node that runs `mediamtx` and `coturn`, and that the required ports are allowed by your network policy or firewall.
- If detection is enabled but the pipeline cannot start, ensure the detection models PVC contains the required OpenVINO detection model artifacts.
- If the collector does not report metrics, confirm that the host path in `collector.collectorSignalsHostPath` exists on the selected node and that the pod is scheduled there.
- If the `live-video-captioning` and `video-caption-service` pods are stuck in the `Init` or `Pending` state, check whether the models downloaded successfully:
  # Get the pods
  kubectl get pods -n <your_namespace>
  # View the logs of the init container that performs model download and conversion
  kubectl logs -f <video-caption-service pod or live-video-captioning pod> -n <your_namespace> -c download-models
If the PVC created during a Helm chart deployment is not removed or auto-deleted due to a deployment failure or being stuck, delete it manually:
# List the PVCs present in the given namespace
kubectl get pvc -n <namespace>
# Delete the required PVC from the namespace
kubectl delete pvc <pvc-name> -n <namespace>
Note: Delete the shared PVC only after confirming no other workload or application depends on it. In such cases, uninstall the dependent application first, then clean up model-download resources, and finally delete the shared PVC if required.