How to deploy with Helm#
This guide provides step-by-step instructions for deploying the ChatQ&A Sample Application using Helm.
Prerequisites#
Before you begin, ensure that you have the following prerequisites:
- Kubernetes cluster set up and running. 
- The cluster must support dynamic provisioning of Persistent Volumes (PV). Refer to the Kubernetes Dynamic Provisioning Guide for more details. 
- Install - kubectlon your system. Refer to Installation Guide. Ensure access to the Kubernetes cluster.
- Helm installed on your system: Installation Guide. 
Steps to deploy with Helm#
Following steps should be followed to deploy ChatQ&A using Helm. You can install from source code or pull the chart from Docker hub.
Steps 1 to 3 varies depending on if the user prefers to build or pull the Helm details.
Option 1: Install from Docker Hub#
Step 1: Pull the Specific Chart#
Use the following command to pull the Helm chart from Docker Hub:
helm pull oci://registry-1.docker.io/intel/chat-question-and-answer --version <version-no>
🔍 Refer to the Docker Hub tags page for details on the latest version number to use for the sample application.
Step 2: Extract the .tgz File#
After pulling the chart, extract the .tgz file:
tar -xvf chat-question-and-answer-<version-no>.tgz
This will create a directory named chat-question-and-answer containing the chart files. Navigate to the extracted directory.
cd chat-question-and-answer
Step 3: Configure the values.yaml File#
Choose the appropriate values*.yaml file based on the model server you want to use:
- For OVMS: Use - values_ovms.yaml.
- For vLLM: Use - values_vllm.yaml.
- For TGI: Use - values_tgi.yaml.
Edit only the values.yaml file to set the necessary environment variables. Ensure you set the huggingface.apiToken and proxy settings as required.
| Key | Description | Example Value | 
|---|---|---|
| 
 | Your Hugging Face API token | 
 | 
| 
 | Give User name for PG Vector DB | 
 | 
| 
 | Give pwd for PG Vector DB | 
 | 
| 
 | A Minio server user name | 
 | 
| 
 | A password to connect to minio server | 
 | 
| 
 | OTLP endpoint | |
| 
 | OTLP endpoint for trace | |
| 
 | Flag to enable TEI embedding model server | 
 | 
| 
 | Flag to enable OVMS embedding model server | 
 | 
| 
 | Sets the static port (in the 30000–32767 range) | |
| 
 | Set true to persists the storage. Default is false | false | 
| 
 | To Enable OVMS Embedding on GPU | false | 
| 
 | For model server deployed on GPU | false | 
| 
 | Label assigned to the GPU node on kubernetes cluster by the device plugin example- gpu.intel.com/i915, gpu.intel.com/xe. Identify by running kubectl describe node  | 
 | 
| 
 | Default is GPU, If the system has an integrated GPU, its id is always 0 (GPU.0). The GPU is an alias for GPU.0. If a system has multiple GPUs (for example, an integrated and a discrete Intel GPU) It is done by specifying GPU.1,GPU.0 | GPU | 
| 
 | Name of the ChatQnA application | 
 | 
| 
 | image repository url | 
 | 
| 
 | latest image tag | 
 | 
| 
 | connection endpoint to model server | |
| 
 | index name for pgVector | 
 | 
| 
 | Number of top K results to fetch | 
 | 
| 
 | embedding model name | 
 | 
| 
 | pgvector connection string | 
 | 
| 
 | model to be used with tgi/vllm/ovms | 
 | 
| 
 | model to be used with tei | 
 | 
NOTE: GPU is only enabled for openvino model server (OVMS)
Option 2: Install from Source#
Step 1: Clone the Repository#
Clone the repository containing the Helm chart:
git clone <repository-url>
Step 2: Change to the Chart Directory#
Navigate to the chart directory:
cd <repository-url>/sample-applications/chat-question-and-answer/chart
Step 3: Configure the values.yaml File#
Edit the values.yaml file located in the chart directory to set the necessary environment variables. Refer to the table in Option 1, Step 3 for the list of keys and example values.
Step 4: Build Helm Dependencies#
Navigate to the chart directory and build the Helm dependencies using the following command:
helm dependency build
Common Steps after configuration#
Step 5: Deploy the Helm Chart#
Deploy the OVMS Helm chart:
helm install chatqna . -f values_ovms.yaml -n <your-namespace>
Note: When deploying OVMS, the OVMS service is observed to take more time than other model serving due to model conversion time.
Deploy the vLLM Helm chart:
helm install chatqna . -f values_vllm.yaml -n <your-namespace>
Deploy the TGI Helm chart:
helm install chatqna . -f values_tgi.yaml -n <your-namespace>
Step 6: Verify the Deployment#
Check the status of the deployed resources to ensure everything is running correctly
kubectl get pods -n <your-namespace>
kubectl get services -n <your-namespace>
Step 7: Access the Application#
Open the UI in a browser at http://<node-ip>:<ui-node-port>
Step 8: Update Helm Dependencies#
If any changes are made to the subcharts, update the Helm dependencies using the following command:
helm dependency update
Step 9: Uninstall Helm chart#
To uninstall helm charts deployed, use the following command:
helm uninstall <name> -n <your-namespace>
Verification#
- Ensure that all pods are running and the services are accessible. 
- Access the application dashboard and verify that it is functioning as expected. 
Troubleshooting#
- If you encounter any issues during the deployment process, check the Kubernetes logs for errors: - kubectl logs <pod-name> 
- If the PVC created during a Helm chart deployment is not removed or auto-deleted due to a deployment failure or being stuck, it must be deleted manually using the following commands: - # List the PVCs present in the given namespace kubectl get pvc -n <namespace> # Delete the required PVC from the namespace kubectl delete pvc <pvc-name> -n <namespace>