# Using NVIDIA® GPU with OVMS in Scenescape

## Pre-requisite

Follow the instructions for enabling NVIDIA GPU support in this blog post: [Deploying AI workloads with OpenVINO Model Server across CPUs and GPUs](https://blog.openvino.ai/blog-posts/deploy-ai-workloads-with-openvino-tm-model-server-across-cpus-and-gpus)

## Setup Docker Build Environment

Pull the NVIDIA CUDA runtime base image:

```
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu20.04
```

or, for Ubuntu 22.04:

```
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04
```

Follow the instructions in the blog for installing the NVIDIA Container Toolkit. Generally, the steps are:

- download the NVIDIA keyring
- install the experimental packages
- apt update

```
sudo apt-get install -y nvidia-container-toolkit
```

## Fetch and Build OVMS

Fetch all of the model server sources from GitHub:

```
mkdir ovms_nvidia
cd ovms_nvidia
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
```

Build the model server Docker container:

```
NVIDIA=1 OV_USE_BINARY=0 OV_SOURCE_BRANCH=master OV_CONTRIB_BRANCH=master make docker_build
```

_Note: The build and test process will take anywhere from 20 - 45 minutes to complete._

Results displayed at the end of the build/test:

```
 => => writing image sha256:6664132b5bf15b0afe53e4acfc3829d712810500ad5a64e5a3511c599fd65b9b    0.0s
 => => naming to docker.io/openvino/model_server-gpu:latest-cuda
```

View the built containers via the `docker images` command:

```
tom@adlgraphics:~/develop/ovms_nvidia/model_server$ docker images
REPOSITORY                  TAG                          IMAGE ID       CREATED          SIZE
openvino/model_server-gpu   latest-cuda                  6664132b5bf1   18 minutes ago   5.3GB
openvino/model_server       latest-gpu-cuda              6664132b5bf1   18 minutes ago   5.3GB
nvidia/cuda                 11.8.0-runtime-ubuntu20.04   87fde1234010   6 months ago     2.66GB
nvidia/cuda                 11.8.0-runtime-ubuntu22.04   d8fb74ecc8b2   6 months ago     2.65GB
hello-world                 latest                       d2c94e258dcb   13 months ago    13.3kB
```

## Run NVIDIA Enabled OVMS Container

Follow the directions in the OVMS documentation for setting up the model directory structure. The directory structure should look similar to:

```
workspace/
    person-detection-retail-0013/
        1/
            person-detection-retail-0013.bin
            person-detection-retail-0013.xml
```

Set the model directory environment variable:

```
MODEL_DIR=/home/tom/develop/openvino
echo $MODEL_DIR
```

Run the model server Docker container:

```
docker run -p 30001:30001 -p 30002:30002 -it --gpus all \
  -v ${MODEL_DIR}/workspace:/workspace openvino/model_server:latest-cuda \
  --model_path /workspace/person-detection-retail-0013 \
  --model_name person-detection-retail-0013 --port 30001 \
  --rest_port 30002 --target_device NVIDIA
```

When the OVMS server is running, the output should be similar to:

```
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:1321] Number of OpenVINO streams: 1
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:757] Plugin config for device: NVIDIA
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:761] OVMS set plugin settings key: PERFORMANCE_HINT; value: LATENCY;
[2024-05-31 11:36:28.235][1][serving][info][modelinstance.cpp:824] Loaded model person-detection-retail-0013; version: 1; batch size: 1; No of InferRequests: 1
[2024-05-31 11:36:28.235][1][serving][info][modelversionstatus.cpp:109] STATUS CHANGE: Version 1 of model person-detection-retail-0013 status change. New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:88] Updating default version for model: person-detection-retail-0013, from: 0
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:98] Updated default version for model: person-detection-retail-0013, to: 1
[2024-05-31 11:36:28.235][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-05-31 11:36:28.235][268][modelmanager][info][modelmanager.cpp:1086] Started cleaner thread
[2024-05-31 11:36:28.235][267][modelmanager][info][modelmanager.cpp:1067] Started model manager thread
```
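Optionally, model readiness can also be confirmed over the REST interface exposed on the `rest_port`. The sketch below assumes the port mapping and model name from the `docker run` command above:

```
# Query model status via the OVMS REST API (REST port 30002 as mapped above)
curl http://localhost:30002/v1/models/person-detection-retail-0013

# A loaded model should report something like:
# {"model_version_status": [{"version": "1", "state": "AVAILABLE", "status": {"error_code": "OK", "error_message": "OK"}}]}
```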
## Testing OVMS Using Benchmark Client

Build the model server benchmark client following the directions in the "Deploy AI Workloads with OpenVINO™ Model Server across CPUs and GPUs" blog.

When the Docker build completes, `benchmark_client` is listed by the `docker images` command:

```
tom@adlgraphics:~/develop/ovms_nvidia/model_server/demos/benchmark/python$ docker images
REPOSITORY                  TAG                          IMAGE ID       CREATED          SIZE
benchmark_client            latest                       0aeba9dc0462   32 seconds ago   2.36GB
```

Run the benchmark client:

```
docker run --network host benchmark_client -a localhost -r 30002 -m person-detection-retail-0013 -p 30001 -n 8 --report_warmup --print_all
```

The output of the benchmark client shows latencies and frame rates:

```
XI worker: window_first_latency: 0.044996039000125165
XI worker: window_pass_max_latency: 0.044996039000125165
XI worker: window_fail_max_latency: 0.0
XI worker: window_brutto_batch_rate: 31.29846114349164
XI worker: window_brutto_frame_rate: 31.29846114349164
XI worker: window_netto_batch_rate: 26.631751020653166
XI worker: window_netto_frame_rate: 26.631751020653166
XI worker: window_frame_passrate: 1.0
XI worker: window_batch_passrate: 1.0
XI worker: window_mean_latency: 0.03754916450009205
XI worker: window_mean_latency2: 0.0014308553988021302
XI worker: window_stdev_latency: 0.004573362455257325
XI worker: window_cv_latency: 0.12179665023561613
XI worker: window_pass_mean_latency: 0.03754916450009205
XI worker: window_pass_mean_latency2: 0.0014308553988021302
XI worker: window_pass_stdev_latency: 0.004573362455257325
XI worker: window_pass_cv_latency: 0.12179665023561613
```

## Scenescape docker-compose.yml file configuration

The OVMS service section of docker-compose.yml should look similar to the configuration below. Refer to the Docker documentation on the `devices` reservation to select a specific GPU if multiple GPUs are installed.

```
ovms:
  image: openvino/model_server:latest-cuda
  user: "${UID}:${GID}"
  networks:
    scenescape:
  command: --config_path /models/ovms-config.json --port 30001 --rest_port 30002 --cache_dir /models/ovms/cache
  volumes:
    - ./model_installer/models:/models
  restart: always
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```
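Note that in this setup OVMS loads its models from `/models/ovms-config.json` rather than taking `--model_path`/`--target_device` on the command line, so the NVIDIA device has to be selected per model inside that configuration file. A minimal sketch of one such entry is shown below; the model name and `base_path` are illustrative and should match the models actually installed under `./model_installer/models`:

```
{
  "model_config_list": [
    {
      "config": {
        "name": "person-detection-retail-0013",
        "base_path": "/models/person-detection-retail-0013",
        "target_device": "NVIDIA"
      }
    }
  ]
}
```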
For completeness, an example retail-video service section is shown below. There are four changes compared to the default service definition: `depends_on`, `--camerachain`, `--ovmshost`, and the `volumes` entry pointing to the directory containing the models.

```
retail-video:
  image: scenescape:
  init: true
  networks:
    scenescape:
  depends_on:
    - broker
    - ntpserv
    - ovms
  command:
    - "percebro"
    - "--camera=sample_data/apriltag-cam1.mp4"
    - "--cameraid=camera1"
    - "--intrinsics={\"fov\":70}"
    - "--camera=sample_data/apriltag-cam2.mp4"
    - "--cameraid=camera2"
    - "--intrinsics={\"fov\":70}"
    - "--camerachain=retail=ovms"
    - "--ovmshost=ovms:30001"
    - "--ntp=ntpserv"
    - "--auth=/run/secrets/percebro.auth"
    - "broker.scenescape.intel.com"
  privileged: true
  volumes:
    - ./model_installer/models:/opt/intel/openvino/deployment_tools/intel_models
    - ./model_installer/models/ovms-config.json:/opt/ml/ovms-config.json
    - ./model_installer/models:/models
  secrets:
    - certs
    - percebro.auth
  restart: always
```

## Verifying that NVIDIA hardware is being utilized

Use `nvtop` to view GPU utilization while Scenescape is running:

```
sudo apt install nvtop
```

## Troubleshooting Tips

### Build Issues

- When building OVMS with NVIDIA support, using `OV_SOURCE_BRANCH=master` may be more reliable than specific version branches like 2024.0 or 2024.1.
- If build failures occur with specific versions, try the master branch as demonstrated in the build command above.

### NVIDIA Driver Issues

If you encounter "device not found" errors when running `nvidia-smi` despite having drivers installed:

1. Consider switching from the NVIDIA proprietary drivers to the open source version.
2. Add the following configuration to `/etc/modprobe.d/nvidia.conf` to enable support for your GPU:

   ```
   options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
   ```

3. Reboot your system after making these changes.
4. Verify GPU detection with the `nvidia-smi` command.

### Verifying GPU Utilization

Monitor GPU usage during inference to confirm NVIDIA acceleration is working properly:

```
nvtop
```

or

```
nvidia-smi -l 1
```

to refresh GPU statistics every second.

```
tom@adlgraphics:~$ nvidia-smi
Fri May 31 07:25:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050        Off | 00000000:01:00.0 Off |                  N/A |
| 34%   34C    P2             20W /  70W  |    185MiB /  6144MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1611      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      1859      C   ...libexec/gnome-remote-desktop-daemon      157MiB |
|    0   N/A  N/A      2310      G   ...libexec/gnome-remote-desktop-daemon        0MiB |
+---------------------------------------------------------------------------------------+
tom@adlgraphics:~$
```
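For longer runs it can be handy to log utilization to a file instead of watching the live view. A small sketch using the `nvidia-smi` query mode, sampling once per second, is shown below (the output file name is an arbitrary choice):

```
# Append a timestamped GPU/memory utilization sample to a CSV file every second
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used \
           --format=csv -l 1 >> gpu_utilization.csv
```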