# Using NVIDIA GPU with OVMS in Scenescape

## Prerequisite

Follow the instructions for enabling NVIDIA GPU support in this blog post: [Deploying AI workloads with OpenVINO Model Server across CPUs and GPUs](https://blog.openvino.ai/blog-posts/deploy-ai-workloads-with-openvino-tm-model-server-across-cpus-and-gpus)

## Setup Docker Build Environment

Pull the CUDA runtime image for Ubuntu 20.04:

```
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu20.04
```

or for Ubuntu 22.04:

```
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04
```

Follow the instructions in the blog for installing the NVIDIA Container Toolkit. Generally, the steps are:

- download the NVIDIA keyring
- enable the experimental packages repository
- run `apt update`
- install the toolkit:

```
sudo apt-get install -y nvidia-container-toolkit
```

## Fetch and Build OVMS

Fetch the model server sources from GitHub:

```
mkdir ovms_nvidia
cd ovms_nvidia
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
```

Build the model server Docker container:

```
NVIDIA=1 OV_USE_BINARY=0 OV_SOURCE_BRANCH=master OV_CONTRIB_BRANCH=master make docker_build
```

*Note: The build and test process will take anywhere from 20 to 45 minutes to complete.*

Results displayed at the end of the build/test:

```
 => => writing image sha256:6664132b5bf15b0afe53e4acfc3829d712810500ad5a64e5a3511c599fd65b9b   0.0s
 => => naming to docker.io/openvino/model_server-gpu:latest-cuda
```

View the built images with the `docker images` command:

```
tom@adlgraphics:~/develop/ovms_nvidia/model_server$ docker images
REPOSITORY                  TAG                          IMAGE ID       CREATED          SIZE
openvino/model_server-gpu   latest-cuda                  6664132b5bf1   18 minutes ago   5.3GB
openvino/model_server       latest-gpu-cuda              6664132b5bf1   18 minutes ago   5.3GB
nvidia/cuda                 11.8.0-runtime-ubuntu20.04   87fde1234010   6 months ago     2.66GB
nvidia/cuda                 11.8.0-runtime-ubuntu22.04   d8fb74ecc8b2   6 months ago     2.65GB
hello-world                 latest                       d2c94e258dcb   13 months ago    13.3kB
```

## Run NVIDIA-Enabled OVMS Container

Follow the directions in the OVMS documentation for setting up the model repository directory structure. The directory structure should look similar to:

```
workspace/
  person-detection-retail-0013/
    1/
      person-detection-retail-0013.bin
      person-detection-retail-0013.xml
```

Set the model directory environment variable:

```
MODEL_DIR=/home/tom/develop/openvino
echo $MODEL_DIR
```

Run the model server Docker container:

```
docker run -p 30001:30001 -p 30002:30002 -it --gpus all \
  -v ${MODEL_DIR}/workspace:/workspace openvino/model_server:latest-cuda \
  --model_path /workspace/person-detection-retail-0013 \
  --model_name person-detection-retail-0013 --port 30001 \
  --rest_port 30002 --target_device NVIDIA
```

When the OVMS server is running, the output should be similar to:

```
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:1321] Number of OpenVINO streams: 1
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:757] Plugin config for device: NVIDIA
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:761] OVMS set plugin settings key: PERFORMANCE_HINT; value: LATENCY;
[2024-05-31 11:36:28.235][1][serving][info][modelinstance.cpp:824] Loaded model person-detection-retail-0013; version: 1; batch size: 1; No of InferRequests: 1
[2024-05-31 11:36:28.235][1][serving][info][modelversionstatus.cpp:109] STATUS CHANGE: Version 1 of model person-detection-retail-0013 status change. New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:88] Updating default version for model: person-detection-retail-0013, from: 0
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:98] Updated default version for model: person-detection-retail-0013, to: 1
[2024-05-31 11:36:28.235][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-05-31 11:36:28.235][268][modelmanager][info][modelmanager.cpp:1086] Started cleaner thread
[2024-05-31 11:36:28.235][267][modelmanager][info][modelmanager.cpp:1067] Started model manager thread
```
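Before moving on to the benchmark client, a quick way to confirm the model is being served is to query the model status endpoint on the REST port. This is a minimal sketch assuming the container is running locally with the port mappings shown above; the endpoint is OVMS's standard TensorFlow-Serving-compatible status API.

```
# Query the model status over the mapped REST port (30002)
curl http://localhost:30002/v1/models/person-detection-retail-0013
```

A healthy response reports version 1 in the `AVAILABLE` state, matching the STATUS CHANGE line in the server log above.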
## Testing OVMS Using Benchmark Client

Build the model server benchmark client following the directions in the "Deploy AI Workloads with OpenVINO™ Model Server across CPUs and GPUs" blog. When the Docker build completes, `benchmark_client` appears in the `docker images` output:

```
tom@adlgraphics:~/develop/ovms_nvidia/model_server/demos/benchmark/python$ docker images
REPOSITORY                  TAG                          IMAGE ID       CREATED          SIZE
benchmark_client            latest                       0aeba9dc0462   32 seconds ago   2.36GB
```

Run the benchmark client:

```
docker run --network host benchmark_client -a localhost -r 30002 \
  -m person-detection-retail-0013 -p 30001 -n 8 --report_warmup --print_all
```

The output of the benchmark client shows latencies and frame rates:

```
XI worker: window_first_latency: 0.044996039000125165
XI worker: window_pass_max_latency: 0.044996039000125165
XI worker: window_fail_max_latency: 0.0
XI worker: window_brutto_batch_rate: 31.29846114349164
XI worker: window_brutto_frame_rate: 31.29846114349164
XI worker: window_netto_batch_rate: 26.631751020653166
XI worker: window_netto_frame_rate: 26.631751020653166
XI worker: window_frame_passrate: 1.0
XI worker: window_batch_passrate: 1.0
XI worker: window_mean_latency: 0.03754916450009205
XI worker: window_mean_latency2: 0.0014308553988021302
XI worker: window_stdev_latency: 0.004573362455257325
XI worker: window_cv_latency: 0.12179665023561613
XI worker: window_pass_mean_latency: 0.03754916450009205
XI worker: window_pass_mean_latency2: 0.0014308553988021302
XI worker: window_pass_stdev_latency: 0.004573362455257325
XI worker: window_pass_cv_latency: 0.12179665023561613
```

## Scenescape docker-compose.yml file configuration

The OVMS section of docker-compose.yml should look similar to the configuration below. If multiple GPUs are installed, refer to the Docker documentation on the `devices` section for selecting a specific GPU.

```
ovms:
  image: openvino/model_server:latest-cuda
  user: "${UID}:${GID}"
  networks:
    scenescape:
  command: --config_path /models/ovms-config.json --port 30001 --rest_port 30002 --cache_dir /models/ovms/cache
  volumes:
    - ./models:/models
  restart: always
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```
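The `command` above points OVMS at `/models/ovms-config.json`, which is not shown elsewhere in this guide. Below is a minimal sketch of what that file might contain, assuming the person-detection model repository built earlier is copied under `./models`; the model name and `base_path` are illustrative and must match your layout.

```
{
  "model_config_list": [
    {
      "config": {
        "name": "person-detection-retail-0013",
        "base_path": "/models/person-detection-retail-0013",
        "target_device": "NVIDIA"
      }
    }
  ]
}
```

Note that `target_device` must be `NVIDIA` here, mirroring the `--target_device NVIDIA` flag used in the standalone `docker run` example.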
For completeness, an example retail video section is shown below. There are four changes to the scene section: `depends_on`, `camerachain`, `ovmshost`, and the `volumes` section pointing to the directory containing the models.

```
retail-video:
  image: scenescape
  init: true
  networks:
    scenescape:
  depends_on:
    - broker
    - ntpserv
    - ovms
  command:
    - "percebro"
    - "--camera=sample_data/apriltag-cam1.mp4"
    - "--cameraid=camera1"
    - "--intrinsics={\"fov\":70}"
    - "--camera=sample_data/apriltag-cam2.mp4"
    - "--cameraid=camera2"
    - "--intrinsics={\"fov\":70}"
    - "--camerachain=retail=ovms"
    - "--ovmshost=ovms:30001"
    - "--ntp=ntpserv"
    - "--auth=/run/secrets/percebro.auth"
    - "broker.scenescape.intel.com"
  privileged: true
  volumes:
    - ./models:/opt/intel/openvino/deployment_tools/intel_models
    - ./models/ovms-config.json:/opt/ml/ovms-config.json
    - ./models:/models
  secrets:
    - certs
    - percebro.auth
  restart: always
```

## Verifying that NVIDIA hardware is being utilized

Use nvtop to view GPU utilization while Scenescape is running:

```
sudo apt install nvtop
```

After installing, launch it with `nvtop` while inference is active and confirm the OVMS process shows nonzero GPU utilization.

## Troubleshooting Tips

Several `OV_SOURCE_BRANCH` values were tried (2024.0 and 2024.1); only `master` was able to build, the others did not.

The command `nvidia-smi` kept returning "device not found", even though all of the NVIDIA drivers were installed on Ubuntu 22.04. The solution that worked was twofold: uninstall the closed proprietary NVIDIA drivers in favor of the open version, and add the line below to the NVIDIA configuration file in `/etc/modprobe.d/`. This option allows the open kernel modules to drive GPUs they do not officially support, such as this GeForce card.

```
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
```

```
tom@adlgraphics:~$ nvidia-smi
Fri May 31 07:25:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050        Off | 00000000:01:00.0 Off |                  N/A |
| 34%   34C    P2              20W /  70W |    185MiB /  6144MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1611      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      1859      C   ...libexec/gnome-remote-desktop-daemon      157MiB |
|    0   N/A  N/A      2310      G   ...libexec/gnome-remote-desktop-daemon        0MiB |
+---------------------------------------------------------------------------------------+
tom@adlgraphics:~$
```
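Since `NVreg_OpenRmEnableUnsupportedGpus=1` only has an effect with the open kernel modules, it can help to verify which driver variant is actually installed and loaded. The checks below are an assumption on my part rather than from the blog, but both commands are standard:

```
# The license field distinguishes the driver variants: the open kernel
# modules report "Dual MIT/GPL", the proprietary ones report "NVIDIA".
modinfo nvidia | grep -i license

# On recent drivers the banner line of this file also states whether
# the loaded module is the Open Kernel Module variant.
cat /proc/driver/nvidia/version
```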