How to Deploy with Helm#

This guide provides step-by-step instructions for deploying the Smart Traffic Intersection Agent application using Helm.

Prerequisites#

Before you begin, ensure that you have the following prerequisites:

  • Kubernetes cluster set up and running.

  • The cluster must support dynamic provisioning of Persistent Volumes (PV). Refer to the Kubernetes Dynamic Provisioning Guide for more details.

  • kubectl installed on your system and configured with access to the Kubernetes cluster: Installation Guide.

  • Helm installed on your system: Installation Guide.

  • A running Smart Intersection deployment (provides MQTT broker, camera pipelines, and scene analytics). See Step 4 below.

  • The SceneScape CA certificate file (scenescape-ca.pem) for TLS connections to the MQTT broker (created during the Smart Intersection installation).

  • (Optional) A Hugging Face API token if the VLM model requires authentication.

  • Storage Requirement: The VLM model cache PVC requests 20 GiB by default. Ensure the cluster has sufficient storage available.

  • (Optional — GPU inference) To run VLM inference on an Intel GPU:

    • An Intel integrated, Arc, or Data Center GPU must be available on at least one worker node.

    • The Intel GPU device plugin for Kubernetes must be installed so that GPU resources (e.g., gpu.intel.com/i915 or gpu.intel.com/xe) are advertised to the scheduler. Verify by running:

      kubectl describe node <gpu-node> | grep gpu.intel.com
      
    • The /dev/dri/renderD* device must be accessible inside containers. The Helm chart automatically adds the correct supplementalGroups entry for the render group.
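The render-device access requirement can be checked directly on the GPU worker node. A minimal sketch (run on the node itself, not through kubectl); it assumes GNU coreutils `stat` and that the distro names the group `render`:

```shell
# Show the owning group and GID of the DRI render nodes; the GID should
# appear in the chart's vlmServing.gpu.renderGroupIds list (defaults: 44, 109, 992).
stat -c '%g %G %n' /dev/dri/renderD* 2>/dev/null

# Look up the render group's GID directly (may be absent on some distros).
getent group render
```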

Steps to Deploy with Helm#

The following steps walk through deploying the Smart Traffic Intersection Agent application using Helm. You can install from source code or pull the chart from a registry.

Steps 1 to 3 differ depending on whether you pull the chart from a registry or install from source.

Option 1: Install from a Registry#

Step 1: Pull the Chart#

Use the following command to pull the Helm chart:

helm pull oci://registry-1.docker.io/intel/smart-traffic-intersection-agent --version 1.0.0-rc2-helm

Step 2: Extract the .tgz File#

After pulling the chart, extract the .tgz file:

tar -xvf smart-traffic-intersection-agent-1.0.0-rc2-helm.tgz

Navigate to the extracted directory:

cd smart-traffic-intersection-agent

Step 3: Configure the values.yaml File#

Edit the values.yaml file to set the necessary environment variables. Refer to the values reference table below.


Option 2: Install from Source#

Step 1: Clone the Repository#

Clone the repository containing the Helm chart:

# Clone the release branch
git clone https://github.com/open-edge-platform/edge-ai-suites.git -b release-2026.0.0

Step 2: Change to the Chart Directory#

Navigate to the chart directory:

cd edge-ai-suites/metro-ai-suite/smart-traffic-intersection-agent/chart

Step 3: Configure the values.yaml File#

Edit the values.yaml file located in the chart directory to set the necessary environment variables. Refer to the values reference table below.


Common Steps After Configuration#

Step 4: Deploy Smart Intersection#

The Smart Traffic Intersection Agent depends on a running Smart Intersection deployment, which includes SceneScape. That deployment provides the MQTT broker, camera pipelines, and scene analytics that the Traffic Agent consumes.

Follow the Smart Intersection Helm Deployment Guide to deploy it. Once all Smart Intersection pods are running and the MQTT broker is reachable, proceed to the next step.
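Before moving on, you can sanity-check that the broker is in place. A minimal sketch; the namespace below is an assumption, and the service name mirrors the trafficAgent.mqtt.host default from the values reference, so adjust both to your Smart Intersection install:

```shell
# Assumed names -- adjust NAMESPACE to wherever Smart Intersection is deployed.
NAMESPACE="${NAMESPACE:-smart-intersection}"
BROKER_SVC="${BROKER_SVC:-smart-intersection-broker}"

# All Smart Intersection pods should report Running.
kubectl get pods -n "$NAMESPACE"

# The broker service should exist and list the MQTT port (1883).
kubectl get svc "$BROKER_SVC" -n "$NAMESPACE" -o wide
```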

Step 5: Configure GPU Support (Optional)#

By default, the chart deploys VLM inference on an Intel GPU. To change or verify the GPU configuration, edit the following values in values.yaml:

| Value | Description | Default |
|---|---|---|
| vlmServing.gpu.enabled | Enable Intel GPU for VLM inference. When true, VLM_DEVICE is automatically set to GPU and workers are forced to 1. | true |
| vlmServing.gpu.resourceName | Kubernetes GPU resource name exposed by the Intel device plugin. Use gpu.intel.com/i915 for integrated/Arc GPUs, gpu.intel.com/xe for Data Center GPU Flex/Max. | gpu.intel.com/i915 |
| vlmServing.gpu.resourceLimit | Number of GPU devices to request | 1 |
| vlmServing.gpu.renderGroupIds | List of render group GIDs for /dev/dri access. Defaults cover all common distros. | [44, 109, 992] |
| vlmServing.nodeSelector | Pin VLM pod to nodes with GPUs (e.g., intel.feature.node.kubernetes.io/gpu: "true") | {} |

Identify your cluster’s GPU resource key by running:

kubectl describe node <gpu-node> | grep gpu.intel.com

To deploy on CPU instead, disable the GPU at install time:

helm install stia . -n <your-namespace> --create-namespace \
  --set vlmServing.gpu.enabled=false

Note: The OV_CONFIG environment variable is automatically set based on the device. When GPU is enabled, CPU-only options like INFERENCE_NUM_THREADS are excluded to avoid runtime errors.
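The resource-key check can also drive the install decision. A sketch, not part of the chart: it sums the i915 capacity advertised across all nodes (swap in gpu.intel.com/xe for Flex/Max) and falls back to a CPU-mode install when none is found:

```shell
# Sum the gpu.intel.com/i915 capacity advertised by all nodes.
# Bracket notation handles the dots and slash in the resource name.
GPU_COUNT=$(kubectl get nodes \
  -o jsonpath="{.items[*].status.capacity['gpu\.intel\.com/i915']}" \
  | tr ' ' '\n' | awk '{s+=$1} END {print s+0}')

if [ "$GPU_COUNT" -gt 0 ]; then
  helm install stia . -n <your-namespace> --create-namespace
else
  echo "No Intel GPU advertised; installing in CPU mode"
  helm install stia . -n <your-namespace> --create-namespace \
    --set vlmServing.gpu.enabled=false
fi
```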

Step 6: Deploy the Helm Chart#

Deploy the Smart Traffic Intersection Agent Helm chart:

helm install stia . -n <your-namespace> --create-namespace

Note: The VLM OpenVINO Serving pod will download and convert the model on first startup. This may take several minutes depending on network speed and model size. To avoid re-downloading the model on every install cycle, set vlmServing.persistence.keepOnUninstall to true (the default). This tells Helm to retain the model cache PVC on uninstall.

Step 7: Verify the Deployment#

Check the status of the deployed resources to ensure everything is running correctly:

kubectl get pods -n <your-namespace>
kubectl get services -n <your-namespace>

You should see two pods:

| Pod | Description |
|---|---|
| stia-traffic-agent-* | The traffic intersection agent (backend + Gradio UI) |
| stia-vlm-openvino-serving-* | The VLM inference server |

Wait until both pods show Running and READY 1/1:

kubectl wait --for=condition=ready pod -l app.kubernetes.io/instance=stia -n <your-namespace> --timeout=600s

Step 8: Access the Application#

Using NodePort (default)#

The chart deploys services as NodePort by default. Retrieve the allocated ports and a node IP:

# Get the NodePort values
kubectl get svc stia-traffic-agent -n <your-namespace>

# Get the node IP
kubectl get nodes -o wide
# Use the INTERNAL-IP of any node

Then open your browser at:

http://<node-ip>:<backend-node-port>   # Backend API (default NodePort: 30881)
http://<node-ip>:<ui-node-port>         # Gradio UI   (default NodePort: 30860)

Note: If you are behind a corporate proxy, make sure the node IPs are included in your no_proxy / browser proxy exceptions.
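The two commands above can be combined to print ready-to-open URLs. A sketch under assumptions: the Service port named "ui" is a guess about the chart's Service spec, so the script falls back to the default NodePorts from the values reference if the lookup comes back empty:

```shell
# Derive the URLs instead of reading them by eye.
NODE_IP=$(kubectl get nodes \
  -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
UI_PORT=$(kubectl get svc stia-traffic-agent -n <your-namespace> \
  -o jsonpath='{.spec.ports[?(@.name=="ui")].nodePort}')

echo "Gradio UI:   http://${NODE_IP}:${UI_PORT:-30860}"
echo "Backend API: http://${NODE_IP}:30881/docs"
```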

Using Port-Forward (ClusterIP)#

If you changed the service type to ClusterIP in values.yaml:

# Traffic Agent Backend API
kubectl port-forward svc/stia-traffic-agent 8081:8081 -n <your-namespace> &

# Traffic Agent Gradio UI
kubectl port-forward svc/stia-traffic-agent 7860:7860 -n <your-namespace> &

Then open your browser at:

  • Backend API: http://127.0.0.1:8081/docs

  • Gradio UI: http://127.0.0.1:7860

Step 9: Uninstall the Helm Chart#

To uninstall the deployed Helm chart:

helm uninstall stia -n <your-namespace>

Note: When vlmServing.persistence.keepOnUninstall is true (the default), the VLM model cache PVC is retained after uninstall to avoid re-downloading the model. This is recommended during development and testing. To fully clean up all PVCs:

kubectl get pvc -n <your-namespace>
kubectl delete pvc <pvc-name> -n <your-namespace>

To have Helm delete the PVC automatically on uninstall, set vlmServing.persistence.keepOnUninstall=false before deploying.


values.yaml Reference#

Global Settings#

| Key | Description | Default |
|---|---|---|
| global.proxy.httpProxy | HTTP proxy URL | "" |
| global.proxy.httpsProxy | HTTPS proxy URL | "" |
| global.proxy.noProxy | Comma-separated no-proxy list | "" |

Traffic Agent Settings#

| Key | Description | Default |
|---|---|---|
| trafficAgent.image.repository | Traffic agent container image repository | intel/smart-traffic-intersection-agent |
| trafficAgent.image.tag | Image tag | 1.0.0-rc2 |
| trafficAgent.service.type | Kubernetes service type (NodePort or ClusterIP) | NodePort |
| trafficAgent.service.backendPort | Backend API port | 8081 |
| trafficAgent.service.backendNodePort | NodePort for backend API (only used when type is NodePort) | 30881 |
| trafficAgent.service.uiPort | Gradio UI port | 7860 |
| trafficAgent.service.uiNodePort | NodePort for Gradio UI (only used when type is NodePort) | 30860 |
| trafficAgent.intersection.name | Unique intersection identifier | intersection_1 |
| trafficAgent.intersection.latitude | Intersection latitude | 37.51358 |
| trafficAgent.intersection.longitude | Intersection longitude | -122.25591 |
| trafficAgent.env.logLevel | Application log level | INFO |
| trafficAgent.env.refreshInterval | Dashboard refresh interval (seconds) | 15 |
| trafficAgent.env.weatherMock | Use mock weather data (true/false) | false |
| trafficAgent.env.vlmTimeoutSeconds | Timeout for VLM inference requests (seconds) | 600 |
| trafficAgent.mqtt.host | MQTT broker hostname (SceneScape K8s service name) | smart-intersection-broker |
| trafficAgent.mqtt.port | MQTT broker port | 1883 |
| trafficAgent.traffic.highDensityThreshold | Object count for high-density classification | 10 |
| trafficAgent.traffic.moderateDensityThreshold | Object count for moderate-density classification | "" |
| trafficAgent.traffic.bufferDuration | Traffic analysis buffer window | "" |
| trafficAgent.persistence.enabled | Enable persistent storage for agent data | true |
| trafficAgent.persistence.size | PVC size for agent data | 1Gi |
| trafficAgent.persistence.storageClass | Storage class (empty = cluster default) | "" |

VLM OpenVINO Serving Settings#

Key

Description

Default

vlmServing.image.repository

VLM serving container image repository

intel/vlm-openvino-serving

vlmServing.image.tag

Image tag

1.3.2

vlmServing.service.type

Kubernetes service type (NodePort or ClusterIP)

NodePort

vlmServing.service.port

VLM HTTP API port

8000

vlmServing.service.nodePort

NodePort for VLM API (only used when type is NodePort)

30800

vlmServing.env.modelName

Hugging Face model identifier

microsoft/Phi-3.5-vision-instruct

vlmServing.env.compressionWeightFormat

Model weight format (int4, int8, fp16)

int4

vlmServing.env.device

OpenVINO inference device when GPU is disabled (CPU or GPU). Ignored when vlmServing.gpu.enabled=true (auto-set to GPU).

CPU

vlmServing.env.maxCompletionTokens

Max tokens per completion

1500

vlmServing.env.workers

Number of serving workers. Forced to 1 when GPU is enabled.

1

vlmServing.env.logLevel

VLM serving log level

info

vlmServing.env.openvinoLogLevel

OpenVINO runtime log level

1

vlmServing.env.accessLogFile

Access log file path (/dev/null to suppress)

/dev/null

vlmServing.env.seed

Random seed for reproducible inference

42

vlmServing.env.ovConfigCpu

OpenVINO config JSON for CPU mode (supports INFERENCE_NUM_THREADS)

{"PERFORMANCE_HINT": "LATENCY", "INFERENCE_NUM_THREADS": 32}

vlmServing.env.ovConfigGpu

OpenVINO config JSON for GPU mode (includes GPU model cache)

{"PERFORMANCE_HINT": "LATENCY", "CACHE_DIR": "/app/ov-model/gpu-cache"}

vlmServing.huggingfaceToken

Hugging Face API token (stored as a Secret)

""

vlmServing.gpu.enabled

Enable Intel GPU for VLM inference. Auto-sets VLM_DEVICE=GPU and WORKERS=1.

true

vlmServing.gpu.resourceName

Kubernetes GPU resource name exposed by the Intel device plugin (gpu.intel.com/i915 or gpu.intel.com/xe)

gpu.intel.com/i915

vlmServing.gpu.resourceLimit

Number of GPU devices to request

1

vlmServing.gpu.renderGroupIds

List of GIDs for the render group added to supplementalGroups for /dev/dri access. All common distro values are included by default (44, 109, 992).

[44, 109, 992]

vlmServing.nodeSelector

Pin VLM pod to GPU nodes (e.g., intel.feature.node.kubernetes.io/gpu: "true")

{}

vlmServing.persistence.enabled

Enable persistent storage for model cache

true

vlmServing.persistence.size

PVC size for model cache

20Gi

vlmServing.persistence.storageClass

Storage class (empty = cluster default)

""

vlmServing.persistence.keepOnUninstall

Retain PVC on helm uninstall to avoid re-downloading the model

true

TLS / Secrets Settings#

| Key | Description | Default |
|---|---|---|
| tls.caCert | PEM-encoded CA certificate for the MQTT broker (base64-encoded in the Secret) | "" |
| tls.caCertSecretName | Name of an existing Secret containing the CA cert (overrides tls.caCert) | smart-intersection-broker-rootcert |
| tls.caCertKey | Key name inside the external secret (required when caCertSecretName is set) | root-cert |


Example: Minimal Deployment#

# values-override.yaml
global:
  proxy:
    httpProxy: "http://proxy.example.com:8080"
    httpsProxy: "http://proxy.example.com:8080"
    noProxy: "localhost,127.0.0.1,10.0.0.0/8,.example.com"

trafficAgent:
  intersection:
    name: "intersection_main_st"
    latitude: "37.7749"
    longitude: "-122.4194"
  mqtt:
    host: "smart-intersection-broker"

tls:
  caCert: |
    -----BEGIN CERTIFICATE-----
    MIIDxTCCA...
    -----END CERTIFICATE-----
Install with the override file:

helm install stia . -n traffic -f values-override.yaml --create-namespace

Example: GPU Deployment#

To deploy VLM inference on an Intel GPU (the default), ensure vlmServing.gpu.enabled is true and the GPU resource name matches your cluster:

# values-gpu-override.yaml
vlmServing:
  gpu:
    enabled: true
    # Use "gpu.intel.com/i915" for integrated / Arc A-series
    # Use "gpu.intel.com/xe" for Data Center GPU Flex / Max
    resourceName: "gpu.intel.com/i915"
    resourceLimit: 1
    # All common render group GIDs included by default — works across distros
    renderGroupIds:
      - 44
      - 109
      - 992
  # Optional: pin to GPU nodes
  nodeSelector:
    intel.feature.node.kubernetes.io/gpu: "true"
  persistence:
    keepOnUninstall: true
Install with both override files:

helm install stia . -n traffic -f values-override.yaml -f values-gpu-override.yaml --create-namespace

Example: CPU-Only Deployment#

To run VLM inference on CPU:

helm install stia . -n traffic -f values-override.yaml \
  --set vlmServing.gpu.enabled=false \
  --create-namespace

Verification#

  • Ensure that all pods are running and the services are accessible.

  • Access the Gradio UI and verify that it is showing the traffic intersection dashboard.

  • Check the backend API at /docs for the interactive Swagger documentation.

  • Verify that the traffic agent is receiving MQTT messages from SceneScape by checking the logs:

    kubectl logs -l app=stia-traffic-agent -n <your-namespace> -f
    

Troubleshooting#

  • If you encounter any issues during the deployment process, check the Kubernetes logs for errors:

    kubectl logs <pod-name> -n <your-namespace>
    
  • VLM pod stuck in CrashLoopBackOff: The model download may have failed. Check logs and verify proxy settings (global.proxy.httpProxy / global.proxy.httpsProxy) and huggingfaceToken if the model requires authentication.

  • VLM model download stuck or not progressing: Verify that proxy environment variables are correctly set inside the pod. A common cause is a mismatch between values.yaml key names and the template references (e.g., http_proxy vs httpProxy). Check with:

    kubectl exec <vlm-pod-name> -n <your-namespace> -- env | grep -i proxy
    
  • Option not found: INFERENCE_NUM_THREADS error on GPU: This occurs when the OV_CONFIG contains CPU-only options while running on GPU. Ensure vlmServing.env.ovConfigGpu does not include INFERENCE_NUM_THREADS. The chart automatically selects the correct config (ovConfigCpu or ovConfigGpu) based on vlmServing.gpu.enabled.

  • GPU not detected / VLM pod Pending: Verify the Intel GPU device plugin is installed and the GPU resource is available:

    kubectl describe node <gpu-node> | grep gpu.intel.com
    

    If no GPU resource is listed, install the Intel GPU device plugin for Kubernetes. Also verify that vlmServing.gpu.resourceName matches the resource key reported by the device plugin (gpu.intel.com/i915 for integrated/Arc, gpu.intel.com/xe for Data Center GPUs).

  • GPU permission denied (/dev/dri access): The chart includes all common render group GIDs (44, 109, 992) by default. If your distro uses a different GID, find it with getent group render on the node and override:

    helm install stia . --set-json 'vlmServing.gpu.renderGroupIds=[<your-gid>]'
    
  • Traffic agent cannot connect to MQTT broker: Verify that the SceneScape deployment is reachable from the cluster, the trafficAgent.mqtt.host value is correct, and the CA certificate is provided via tls.caCert or tls.caCertSecretName.

  • PVC not cleaned up after uninstall: When vlmServing.persistence.keepOnUninstall is true (the default), the model cache PVC is intentionally retained. To reclaim storage, delete it manually:

    # List the PVCs present in the given namespace
    kubectl get pvc -n <your-namespace>
    
    # Delete the required PVC from the namespace
    kubectl delete pvc <pvc-name> -n <your-namespace>