Using NVIDIA GPU with OVMS in Scenescape#
Pre-requisite#
Follow the instructions for enabling NVIDIA GPU support in this blog post:
Deploying AI workloads with OpenVINO Model Server across CPUs and GPUs
Setup Docker Build Environment#
Pull the CUDA runtime Docker image for Ubuntu 20.04:
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu20.04
or for Ubuntu 22.04:
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04
Follow the instructions in the blog for installing the NVIDIA Container Toolkit. Generally, the steps are:
download NVIDIA keyring
install experimental packages
apt update
sudo apt-get install -y nvidia-container-toolkit
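Once the toolkit is installed, a common way to confirm that containers can reach the GPU is to run nvidia-smi inside one of the CUDA images pulled earlier. The sketch below wraps that check in a shell function. The name `check_gpu_container` is hypothetical, and the sketch assumes Docker has already been configured for the NVIDIA runtime as described in the blog.

```shell
# Run nvidia-smi inside a CUDA container to confirm GPU passthrough works.
# check_gpu_container is a hypothetical helper name, not part of any toolkit.
check_gpu_container() {
    # Default to the Ubuntu 22.04 CUDA runtime image pulled earlier.
    image="${1:-docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04}"
    docker run --rm --gpus all "$image" nvidia-smi
}
```

Calling `check_gpu_container` with no arguments uses the Ubuntu 22.04 image; pass the 20.04 image name as the first argument to test that one instead. If the GPU is visible to the container, the usual nvidia-smi table is printed.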
Fetch and Build OVMS#
Fetch the model server sources from GitHub.
mkdir ovms_nvidia
cd ovms_nvidia
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
Build the model server Docker image.
NVIDIA=1 OV_USE_BINARY=0 OV_SOURCE_BRANCH=master OV_CONTRIB_BRANCH=master make docker_build
Note: The build and test process takes roughly 20 to 45 minutes to complete.
The end of the build/test output looks like this:
=> => writing image sha256:6664132b5bf15b0afe53e4acfc3829d712810500ad5a64e5a3511c599fd65b9b 0.0s
=> => naming to docker.io/openvino/model_server-gpu:latest-cuda
List the built images with the “docker images” command.
tom@adlgraphics:~/develop/ovms_nvidia/model_server$ docker images
REPOSITORY                  TAG                          IMAGE ID       CREATED          SIZE
openvino/model_server-gpu   latest-cuda                  6664132b5bf1   18 minutes ago   5.3GB
openvino/model_server       latest-gpu-cuda              6664132b5bf1   18 minutes ago   5.3GB
nvidia/cuda                 11.8.0-runtime-ubuntu20.04   87fde1234010   6 months ago     2.66GB
nvidia/cuda                 11.8.0-runtime-ubuntu22.04   d8fb74ecc8b2   6 months ago     2.65GB
hello-world                 latest                       d2c94e258dcb   13 months ago    13.3kB
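Because the build produces more than one repository and tag combination, it can help to confirm which image references exist locally before running the server. The following is a minimal sketch: `first_available_image` is a hypothetical helper that prints the first candidate that `docker image inspect` can find.

```shell
# Print the first image reference from the arguments that exists locally.
# first_available_image is a hypothetical helper for illustration only.
first_available_image() {
    for ref in "$@"; do
        # docker image inspect exits non-zero when the image is missing.
        if docker image inspect "$ref" >/dev/null 2>&1; then
            printf '%s\n' "$ref"
            return 0
        fi
    done
    # None of the candidates exist locally.
    return 1
}
```

For example, `first_available_image openvino/model_server:latest-cuda openvino/model_server:latest-gpu-cuda openvino/model_server-gpu:latest-cuda` prints whichever of those references is present first, and returns a non-zero status if none are.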
Run NVIDIA Enabled OVMS Container#
Follow the directions in the OVMS documentation for setting up the model directory structure. The directory structure should look similar to:
workspace/
  person-detection-retail-0013/
    1/
      person-detection-retail-0013.bin
      person-detection-retail-0013.xml
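The layout above can be checked before starting the server. The sketch below uses a hypothetical helper, `check_model_repo`, and assumes that the version directory is `1` and that the model files share the model's name, as in the listing.

```shell
# Verify that <workspace>/<model>/1/ holds the model's .xml and .bin files.
# check_model_repo is a hypothetical helper; version directory "1" is assumed.
check_model_repo() {
    workspace="$1"
    model="$2"
    version_dir="$workspace/$model/1"
    status=0
    for ext in xml bin; do
        if [ ! -f "$version_dir/$model.$ext" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $version_dir/$model.$ext" >&2
            status=1
        fi
    done
    return "$status"
}
```

A call such as `check_model_repo /home/tom/develop/openvino/workspace person-detection-retail-0013` returns success only when both files are in place.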
Set the model directory environment variable:
MODEL_DIR=/home/tom/develop/openvino
echo $MODEL_DIR
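Run the model server Docker container. If openvino/model_server:latest-cuda is not present locally, substitute a repository and tag from your own “docker images” output.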
docker run -p 30001:30001 -p 30002:30002 -it --gpus all \
-v ${MODEL_DIR}/workspace:/workspace openvino/model_server:latest-cuda \
--model_path /workspace/person-detection-retail-0013 \
--model_name person-detection-retail-0013 --port 30001 \
--rest_port 30002 --target_device NVIDIA
When the OVMS server is running, output should be similar to:
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:1321] Number of OpenVINO streams: 1
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:757] Plugin config for device: NVIDIA
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:761] OVMS set plugin settings key: PERFORMANCE_HINT; value: LATENCY;
[2024-05-31 11:36:28.235][1][serving][info][modelinstance.cpp:824] Loaded model person-detection-retail-0013; version: 1; batch size: 1; No of InferRequests: 1
[2024-05-31 11:36:28.235][1][serving][info][modelversionstatus.cpp:109] STATUS CHANGE: Version 1 of model person-detection-retail-0013 status change. New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:88] Updating default version for model: person-detection-retail-0013, from: 0
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:98] Updated default version for model: person-detection-retail-0013, to: 1
[2024-05-31 11:36:28.235][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-05-31 11:36:28.235][268][modelmanager][info][modelmanager.cpp:1086] Started cleaner thread
[2024-05-31 11:36:28.235][267][modelmanager][info][modelmanager.cpp:1067] Started model manager thread
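Besides reading the log, the model state can be queried over the REST port. The sketch below assumes the TensorFlow Serving-style model status endpoint that OVMS exposes (`/v1/models/<name>`) and the REST port 30002 used in the run command. `query_model_status` and `model_is_available` are hypothetical helper names.

```shell
# Fetch the model status JSON from the OVMS REST port (30002 by default).
# Arguments: model name, optional host, optional REST port.
query_model_status() {
    curl -s "http://${2:-localhost}:${3:-30002}/v1/models/$1"
}

# Succeed if the status JSON on stdin reports a version in state AVAILABLE.
# A plain grep keeps the sketch dependency-free; it is not a JSON parser.
model_is_available() {
    grep -q '"state"[[:space:]]*:[[:space:]]*"AVAILABLE"'
}
```

Usage: `query_model_status person-detection-retail-0013 | model_is_available && echo ready`.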
Testing OVMS Using Benchmark Client#
Build the model server benchmark client following the directions in the “Deploy AI Workloads with OpenVINO™ Model Server across CPUs and GPUs” blog post.
When the Docker build completes, benchmark_client appears in the output of the “docker images” command:
tom@adlgraphics:~/develop/ovms_nvidia/model_server/demos/benchmark/python$ docker images
REPOSITORY         TAG      IMAGE ID       CREATED          SIZE
benchmark_client   latest   0aeba9dc0462   32 seconds ago   2.36GB
Run the benchmark client.
docker run --network host benchmark_client -a localhost -r 30002 -m person-detection-retail-0013 -p 30001 -n 8 --report_warmup --print_all
The benchmark client output reports latencies and frame rates.
XI worker: window_first_latency: 0.044996039000125165
XI worker: window_pass_max_latency: 0.044996039000125165
XI worker: window_fail_max_latency: 0.0
XI worker: window_brutto_batch_rate: 31.29846114349164
XI worker: window_brutto_frame_rate: 31.29846114349164
XI worker: window_netto_batch_rate: 26.631751020653166
XI worker: window_netto_frame_rate: 26.631751020653166
XI worker: window_frame_passrate: 1.0
XI worker: window_batch_passrate: 1.0
XI worker: window_mean_latency: 0.03754916450009205
XI worker: window_mean_latency2: 0.0014308553988021302
XI worker: window_stdev_latency: 0.004573362455257325
XI worker: window_cv_latency: 0.12179665023561613
XI worker: window_pass_mean_latency: 0.03754916450009205
XI worker: window_pass_mean_latency2: 0.0014308553988021302
XI worker: window_pass_stdev_latency: 0.004573362455257325
XI worker: window_pass_cv_latency: 0.12179665023561613
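The reported latencies appear to be in seconds: 1 / window_mean_latency (1 / 0.0375) is about 26.6, which matches window_netto_frame_rate. The sketch below pulls those two numbers out of the client output and converts the latency to milliseconds. `summarize_benchmark` is a hypothetical helper, and it assumes the `name: value` line format shown above.

```shell
# Read benchmark_client output on stdin and print a two-line summary.
# summarize_benchmark is a hypothetical helper for illustration only.
summarize_benchmark() {
    awk '
        # The metric name is the second-to-last field, the value is the last.
        NF >= 2 && $(NF-1) == "window_mean_latency:"     { latency = $NF }
        NF >= 2 && $(NF-1) == "window_netto_frame_rate:" { fps = $NF }
        END {
            # Latency is assumed to be in seconds; convert to milliseconds.
            printf "mean latency: %.2f ms\n", latency * 1000
            printf "netto frame rate: %.2f fps\n", fps
        }
    '
}
```

Piping the benchmark client's output into `summarize_benchmark` gives, for the run shown above, a mean latency of 37.55 ms and a netto frame rate of 26.63 fps.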
Scenescape docker_compose.yml file configuration#
The OVMS configuration section of docker_compose.yml should look similar to the configuration below. If multiple GPUs are installed, refer to the Docker documentation for the options under the “devices” section that select a specific GPU.
ovms:
  image: openvino/model_server:latest-cuda
  user: "${UID}:${GID}"
  networks:
    scenescape:
  command: --config_path /models/ovms-config.json --port 30001 --rest_port 30002 --cache_dir /models/ovms/cache
  volumes:
    - ./models:/models
  restart: always
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
For completeness, an example retail-video section is shown below. There are four changes to this section: depends_on, camerachain, ovmshost, and the “volumes” entries pointing to the directory containing the models.
retail-video:
  image: scenescape:<version>
  init: true
  networks:
    scenescape:
  depends_on:
    - broker
    - ntpserv
    - ovms
  command:
    - "percebro"
    - "--camera=sample_data/apriltag-cam1.mp4"
    - "--cameraid=camera1"
    - "--intrinsics={\"fov\":70}"
    - "--camera=sample_data/apriltag-cam2.mp4"
    - "--cameraid=camera2"
    - "--intrinsics={\"fov\":70}"
    - "--camerachain=retail=ovms"
    - "--ovmshost=ovms:30001"
    - "--ntp=ntpserv"
    - "--auth=/run/secrets/percebro.auth"
    - "broker.scenescape.intel.com"
  privileged: true
  volumes:
    - ./models:/opt/intel/openvino/deployment_tools/intel_models
    - ./models/ovms-config.json:/opt/ml/ovms-config.json
    - ./models:/models
  secrets:
    - certs
    - percebro.auth
  restart: always
Verifying that NVIDIA hardware is being utilized#
Use nvtop to view GPU utilization while Scenescape is running.
sudo apt install nvtop
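Where an interactive viewer is not convenient, for example in a script, nvidia-smi can report similar figures in CSV form. The following is a minimal sketch using its `--query-gpu` option; `gpu_utilization` and `gpu_is_busy` are hypothetical helper names.

```shell
# Print "index, utilization %, memory used MiB" for each GPU, one per line.
gpu_utilization() {
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
               --format=csv,noheader,nounits
}

# Succeed if any GPU line on stdin (CSV from gpu_utilization) shows
# a utilization above 0 %.
gpu_is_busy() {
    awk -F', *' '$2 + 0 > 0 { busy = 1 } END { exit (busy ? 0 : 1) }'
}
```

Usage: `gpu_utilization | gpu_is_busy && echo "GPU in use"` while Scenescape is processing video. A single sample can read 0 % between inferences, so it may be worth sampling a few times.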
Troubleshooting Tips#
Several OV_SOURCE_BRANCH values were tried, including 2024.0 and 2024.1; only “master” built successfully.
The “nvidia-smi” command kept returning “device not found” even though all of the NVIDIA drivers were installed on Ubuntu 22.04. The fix that worked was to uninstall the closed proprietary NVIDIA drivers, use the open version instead, and add the line below to the NVIDIA configuration file in /etc/modprobe.d/.
/etc/modprobe.d/ configuration file:
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
tom@adlgraphics:~$ nvidia-smi
Fri May 31 07:25:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050        Off | 00000000:01:00.0 Off |                  N/A |
| 34%   34C    P2              20W /  70W |    185MiB /  6144MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1611      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      1859      C   ...libexec/gnome-remote-desktop-daemon      157MiB |
|    0   N/A  N/A      2310      G   ...libexec/gnome-remote-desktop-daemon        0MiB |
+---------------------------------------------------------------------------------------+
tom@adlgraphics:~$
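To confirm that the module option described above is actually present, the configuration directory can be searched for it. The following is a minimal sketch: `has_open_rm_option` is a hypothetical helper, and its directory argument defaults to /etc/modprobe.d.

```shell
# Succeed if any .conf file in the directory sets the option to 1.
# has_open_rm_option is a hypothetical helper for illustration only.
has_open_rm_option() {
    dir="${1:-/etc/modprobe.d}"
    # -q: no output, only the exit status; -s: stay quiet about missing files.
    grep -qs '^[[:space:]]*options[[:space:]]\{1,\}nvidia[[:space:]].*NVreg_OpenRmEnableUnsupportedGpus=1' \
        "$dir"/*.conf
}
```

Usage: `has_open_rm_option || echo "option not set"`. Module options are read when the nvidia kernel module loads, so a reboot or module reload is needed after editing the file.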