# How to Deploy with Helm Chart

This guide shows how to deploy the Live Video Captioning application on Kubernetes with the Helm chart included in this repository.

## Prerequisites

Before you begin, ensure that you have the following:

- A Kubernetes cluster with `kubectl` configured for access.
- Helm installed on your system. See the [Installation Guide](https://helm.sh/docs/intro/install/).
- A cluster that supports **dynamic provisioning of Persistent Volumes (PV)**. See the [Kubernetes documentation on dynamic volume provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/) for details.
- A worker node reachable by your browser client. Prefer a GPU-capable worker node when available, because the chart pins the media and inference workloads to the selected node and DL Streamer benefits most from GPU access.
- A writable host path for collector signal files on the target node. By default the chart uses `/tmp/lvc/collector-signals`.
- An RTSP source reachable from the Kubernetes node that runs `dlstreamer-pipeline-server`.
- The [model-download chart](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/model-download/docs/user-guide/get-started/deploy-with-helm-chart.md) set up. It is responsible for all the models used by this Live Video Captioning chart. If you use gated Hugging Face models, a Hugging Face token is required.

## Prepare/Deploy model-download chart

The [model-download service](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/microservices/model-download) from [Open Edge Platform - Edge AI Libraries](https://github.com/open-edge-platform/edge-ai-libraries) manages the models used by Live Video Captioning.

1. Install the model-download chart
   Refer to this [guide section](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/model-download/docs/user-guide/get-started/deploy-with-helm-chart.md#install-helm-chart-from-docker-hub-or-from-source) to download and install the chart.

2. Configure the values.yaml file
   Edit the [`values.yaml`](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/model-download/chart/values.yaml) located in the chart. Configure the following:

   | Parameter | Description | Required Values |
   | --- | --- | --- |
   | `proxy` | Set the proxy values based on your system environment | |
   | `HUGGINGFACEHUB_API_TOKEN` | Hugging Face token to download gated models | |
   | `ENABLE_PLUGINS` | Comma-separated list of plugins to enable | `"openvino,ultralytics"` |
   | `gpu.enabled` | Deploy the model-download service pod on a GPU | `true` |
   | `gpu.key` | Label assigned to the GPU node in the Kubernetes cluster by the device plugin. Identify it by running `kubectl describe node` | `gpu.intel.com/i915` or `gpu.intel.com/xe` |
   | `affinity.enabled` | Set to `true` to deploy on a dedicated node | `true` |
   | `affinity.value` | Your dedicated node name. Identify it by running `kubectl get node` | |

   > **Note**: This chart can run on CPU-only nodes; however, a GPU-enabled node is strongly recommended to host the models and deliver optimal performance.

3. Deploy the chart
   Deploy the chart using the command below:

   ```bash
   helm install model-download . -n <namespace>
   ```

   > **Note**: `model-download` creates and manages a shared PVC that is used by live-video-captioning. Do not delete or uninstall this Helm chart while the live-video-captioning chart is running.

4. Verify the deployment
   Check the status of the deployed resources to ensure they are running correctly.

   ```bash
   kubectl get pods -n <namespace>
   kubectl get services -n <namespace>
   ```

## Prepare/Deploy live-video-captioning chart

To set up the live-video-captioning application, you must obtain the charts and install them with appropriate values and configuration. The following sections provide step-by-step instructions.

### Acquire the helm chart

There are two options to obtain the charts in your workspace:

#### Option 1: Get the charts from Docker Hub

##### Step 1: Pull the Chart

Use the following command to pull the [prebuilt chart](https://hub.docker.com/r/intel/live-video-captioning/tags) from Docker Hub:

```bash
helm pull oci://registry-1.docker.io/intel/live-video-captioning --version <version>
```

Refer to the release notes for details on the latest version number to use for the sample application.

##### Step 2: Extract the `.tgz` File

After pulling the chart, extract the `.tgz` file:

```bash
tar -xvf live-video-captioning-<version>.tgz
```

This creates a directory named `live-video-captioning` containing the chart files. Navigate to the extracted directory to access the charts.
```bash
cd live-video-captioning
```

#### Option 2: Install from Source

##### Step 1: Clone the repository

Clone the repository containing the chart files:

```bash
# Clone the latest on mainline
git clone https://github.com/open-edge-platform/edge-ai-suites.git edge-ai-suites

# Alternatively, clone a specific release branch
git clone https://github.com/open-edge-platform/edge-ai-suites.git edge-ai-suites -b <release-branch>
```

##### Step 2: Navigate to the chart directory

```bash
cd edge-ai-suites/metro-ai-suite/live-video-analysis/live-video-captioning/charts
```

### Select the target node

The chart pins the workloads that need to stay together to the target node selected in the chart values:

- `model-download`
- `dlstreamer-pipeline-server`
- `video-caption-service`
- `mediamtx`
- `coturn`
- `collector`
- `live-video-captioning-rag` (if RAG is enabled)

These workloads are kept on the same worker because they rely on node-local access patterns:

- `dlstreamer-pipeline-server`, `video-caption-service`, `live-video-captioning-rag`, and `model-download` share the model PVCs created by `model-download`.
- `dlstreamer-pipeline-server` and `collector` need direct access to node hardware and host resources.
- `mediamtx` and `coturn` expose browser-facing WebRTC and TURN endpoints that must match the selected node's reachable IP.

Other supporting services such as `mqtt-broker`, `live-metrics-service`, `multimodal-embedding` (when RAG is enabled), and `vdms-vectordb` (when RAG is enabled) do not require pinning to the same worker node.

For best performance, choose a worker node with a GPU. The chart can run with CPU-only inference, but a GPU-capable node is the preferred deployment target for DL Streamer and real-time media processing.

In the [values-override.yaml](../../charts/values-override.yaml), specify the Kubernetes node name by setting `global.nodeName`.
This references the built-in `kubernetes.io/hostname` label, so no node labeling permissions are required. Example:

```yaml
global:
  nodeName: worker4
```

#### Get the IP of the selected node

Use the same node that you selected for the pinned media workloads. First, list the nodes and their labels:

```bash
kubectl get nodes --show-labels
```

Then inspect the selected node:

```bash
kubectl get node <node-name> -o wide
```

Set `global.hostIP` to the node address that is reachable by the browser:

- In clusters without worker-node external IPs, use `INTERNAL-IP`.
- Use `EXTERNAL-IP` only if the node actually has one and your browser reaches the application through it.
- Use `INTERNAL-IP` when your browser is on the same LAN or VPN and can reach the node directly.

To print the value directly:

```bash
kubectl get node <node-name> -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}'
```

If no external address is present, use:

```bash
kubectl get node <node-name> -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
```

Set that value in `global.hostIP`. If the worker node does not have any browser-reachable IP, direct NodePort access will not work. This capability will be added to the chart in a future update.

## Known Limitations

### Single-node deployment with host port binding

This chart is designed to run on a **single worker node**. Several workloads bind directly to host ports on that node so that the browser and RTSP clients can reach them without a LoadBalancer or Ingress. Because of these host port bindings:

- **`replicaCount` must remain `1`** for all workloads that use host ports. Increasing it will fail at scheduling time because two pods cannot bind the same host port on the same node.
- **Multi-node or high-availability deployments are not supported.** The chart intentionally pins all workloads to a single node via `global.nodeName`.
- **Port conflicts with other applications on the same node** are possible.
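As a quick pre-flight check, you can probe the host ports on the target worker node before installing. The sketch below is a bash-only helper (it uses bash's `/dev/tcp` pseudo-device); the port list is illustrative — 4173 is the dashboard NodePort mentioned in this guide, while the other values are placeholders you should replace with the host ports from your own values files. Run it on the target node itself.

```shell
# Probe a set of host ports on this node. A successful TCP connect to
# 127.0.0.1:<port> means something is already listening there.
# NOTE: only port 4173 (dashboard) comes from this guide; the rest are
# placeholders — substitute the host ports configured in your values.
for port in 4173 8554 3478; do
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    echo "port ${port}: already in use"
  else
    echo "port ${port}: free"
  fi
done
```

Any port reported as "already in use" must be freed (or remapped in your override values) before deploying.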
Ensure the host ports used by these workloads are not already in use on the target worker node before deploying.

### Configure Required Values

Prior to deployment, edit the sample override file at `charts/values-override.yaml`, focusing on the key configuration parameters below:

| Key | Description | Example |
| --- | --- | --- |
| `global.hostIP` | Browser-reachable IP of the selected node that runs the pinned media workloads. In many on-prem clusters this is the node `INTERNAL-IP`. Retrieve it with `kubectl get node <node-name> -o wide` | `192.168.1.20` |
| `global.nodeName` | Kubernetes node name used to pin the media, TURN, and host-coupled workloads to one worker node. Prefer a GPU-capable node when available | `worker4` |
| `global.models` | List of VLM models from Hugging Face to export to OpenVINO format (at least one VLM is required) | `OpenGVLab/InternVL2-1B` |
| `global.huggingface.apiToken` | Hugging Face Hub token to download gated models | |
| `video-caption-service.env.enableDetectionPipeline` | Enables detection filtering in the pipeline. When set to `"true"`, also configure `global.detectionModels` so the chart downloads the required detection models automatically | `"true"` or `"false"` |
| `global.detectionModels` | List of detection model names to download (only required when `video-caption-service.env.enableDetectionPipeline` is enabled) | `["yolov8s"]` |
| `video-caption-service.env.defaultRtspUrl` | Default RTSP URL shown in the dashboard | `rtsp://camera.example/live` |
| `video-caption-service.env.alertMode` | Switches captioning to binary alert-style responses | `"true"` or `"false"` |

#### Proxy Configuration

If your cluster runs behind a proxy, set the proxy fields under `global`:

```yaml
global:
  httpProxy: "http://<proxy-host>:<proxy-port>"
  httpsProxy: "http://<proxy-host>:<proxy-port>"
  noProxy: "<no-proxy-list>"
```

> **Important**: the host portion of every RTSP URL must be included in `noProxy` when the deployment runs behind a proxy.
> For example:
>
> - If your stream URL is `rtsp://camera.example.com:8554/live`, add `camera.example.com` to `noProxy`.
> - If your stream URL is `rtsp://192.168.1.50:554/stream1`, add `192.168.1.50` to `noProxy`.
>
> If the RTSP host is not listed in `noProxy`, the application may try to reach the stream through the proxy and fail to connect.

#### Optional: Enable RAG with Live-Video-Captioning

Live-Video-Captioning includes an optional RAG (Retrieval-Augmented Generation) capability. You can leave this disabled for a standard captioning deployment, or enable it to add retrieval-backed chatbot features. When enabled, generated caption text is converted into embeddings and stored in a vector store along with the associated frame data and metadata. A RAG-based chatbot service is included, allowing users to submit queries and receive LLM-generated responses using context retrieved from the vector store.

To enable this optional feature, edit the override file at `charts/values-override.yaml` and configure the following additional parameters:

| Key | Description | Example |
| --- | --- | --- |
| `global.enableRAG` | Set to `true` to deploy the RAG subchart | `true` or `false` |
| `global.llmModel.modelId` | LLM used by the RAG service | `"microsoft/Phi-3.5-mini-instruct"` |
| `global.llmModel.weightFormat` | Model quantization | `"int4"`, `"int8"`, or `"fp16"` |
| `global.embeddingModel.modelId` | Embedding model used for embedding creation | `"QwenText/qwen3-embedding-0.6b"` |

> **Note**: To deploy the llmModel or embeddingModel on a GPU, set `global.llmModel.useGPU.enabled` or `global.embeddingModel.useGPU.enabled` to `true`.
> For `global.llmModel.useGPU.key`, set the value to the GPU resource key label of the configured `nodeName`, as the LLM models share the same PVC on that node.
> For `global.embeddingModel.useGPU.key`, you may specify any available GPU resource key label if multiple GPU-enabled nodes are present.
> The embedding model does not share the PVC and is managed independently by the embedding service.
>
> A GPU resource key refers to the label assigned to a GPU-enabled node by the Kubernetes device plugin. Kubernetes uses this label to identify and schedule workloads onto nodes with specific GPU resources. You can identify the available GPU resource keys by running `kubectl describe node <node-name>`. Example values include `gpu.intel.com/i915` and `gpu.intel.com/xe`.

### Build Chart Dependencies

Run the following command from the chart directory:

```bash
helm dependency update
```

This refreshes the chart dependencies from `subcharts/` and updates `Chart.lock`.

### Install the Chart

From `charts/`, install the application with the override file:

```bash
helm install lvc . \
  -f values-override.yaml \
  -n "$my_namespace" \
  --timeout 60m
```

You can also install from the repository root:

```bash
helm install lvc ./charts \
  -f ./charts/values-override.yaml \
  -n "$my_namespace"
```

## Verify the Deployment

Before accessing the application, confirm the following:

- The `models-pvc` created by the model-download chart has status `Bound`. You can check via the `kubectl get pvc` command.
- All pods are in the `Running` state.
- All containers report `Ready`. Check via the `kubectl get pods` command.

> The initial deployment may take several minutes, as the chart performs multiple model downloads and conversion steps before the application pods are started.

## Access the Application

By default the chart exposes these NodePort services:

- Dashboard UI: `http://<hostIP>:4173`

If you changed the service ports in your override values, use those instead.

To start captioning after deployment:

1. Open the dashboard URL in your browser.
2. Enter an RTSP stream URL, unless you preconfigured `defaultRtspUrl`.
3. Select the model you downloaded into the models PVC.
4. Adjust the prompt and generation parameters if needed.
5. Start the stream.
6. To submit a query via the RAG chatbot:
   Click the `chat icon` button at the top right of the dashboard. The button is only visible when RAG is enabled.

## Upgrade the Release

If you modify the chart or subcharts, refresh the dependencies first:

```bash
helm dependency update
```

Then upgrade the release:

```bash
helm upgrade lvc . \
  -f values-override.yaml \
  -n "$my_namespace"
```

## Uninstall the Release

```bash
helm uninstall lvc -n "$my_namespace"
```

## Troubleshooting

- If pods remain `Pending`, check that `global.nodeName` matches the correct node name and that the selected node has the required hardware access.
- If the dashboard opens but video does not start, confirm that `global.hostIP` is reachable from the browser. If your worker nodes do not have external IPs, this usually means using the node `INTERNAL-IP` over a reachable LAN or VPN. Also confirm that the RTSP source is reachable from the Kubernetes node.
- If WebRTC negotiation fails, verify that `global.hostIP` points to the same node that runs `mediamtx` and `coturn`, and that the required ports are allowed by your network policy or firewall.
- If detection is enabled but the pipeline cannot start, ensure the detection models PVC contains the required OpenVINO detection model artifacts.
- If the collector does not report metrics, confirm that the host path in `collector.collectorSignalsHostPath` exists on the selected node and that the pod is scheduled there.
- If the `live-video-captioning` and `video-caption-service` pods are stuck in the `Init` or `Pending` state, check whether the models downloaded successfully:
  ```bash
  # Get the pods
  kubectl get pods -n <namespace>

  # View the logs of the initContainer that performs model download and conversion
  kubectl logs -f <pod-name> -n <namespace> -c download-models
  ```

- If the PVC created during a Helm chart deployment is not removed or auto-deleted due to a deployment failure or being stuck, delete it manually:

  ```bash
  # List the PVCs present in the given namespace
  kubectl get pvc -n <namespace>

  # Delete the required PVC from the namespace
  kubectl delete pvc <pvc-name> -n <namespace>
  ```

  > **Note**: Delete the shared PVC only after confirming no other workload or application depends on it. In such cases, uninstall the dependent application first, then clean up model-download resources, and finally delete the shared PVC if required.

## Related Links

- [Get Started](../get-started.md)
- [System Requirements](../get-started/system-requirements.md)
- [How it Works](../how-it-works.md)
- [Object Detection Pipeline](../how-to-guides/configure-object-detection-pipeline.md)
- [Build from Source](../get-started/build-from-source.md)
- [Embedding Creation with RAG](../how-to-guides/configure-embedding-creation-with-rag.md)
- [Model Download Service](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/model-download/docs/user-guide/get-started/deploy-with-helm-chart.md)