Using NVIDIA® GPU with OVMS in Scenescape#

Prerequisites#

Follow the instructions for enabling NVIDIA GPU support in this blog post:

Deploying AI workloads with OpenVINO Model Server across CPUs and GPUs

Setup Docker Build Environment#

Pull the NVIDIA CUDA runtime image, for example the Ubuntu 20.04-based image:

docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu20.04

or the Ubuntu 22.04-based image for newer hosts:

docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04

Follow the instructions in the blog for installing the NVIDIA Container Toolkit. Generally, the steps are:

  • download NVIDIA keyring

  • install experimental packages

  • apt update

sudo apt-get install -y nvidia-container-toolkit
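
Taken together, and including the step that registers the NVIDIA runtime with Docker, the sequence looks roughly like the sketch below. It is based on NVIDIA's published instructions at the time of writing; the repository URL and keyring path may change, so check the current NVIDIA Container Toolkit documentation before copying it verbatim.

# Add the NVIDIA Container Toolkit keyring and apt repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit and register the NVIDIA runtime with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker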

Fetch and Build OVMS#

Fetch the model server sources from GitHub.

mkdir ovms_nvidia
cd ovms_nvidia
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server

Build the model server Docker image.

NVIDIA=1 OV_USE_BINARY=0 OV_SOURCE_BRANCH=master OV_CONTRIB_BRANCH=master make docker_build

Note: The build and test process takes roughly 20 to 45 minutes to complete.

Results displayed at the end of build/test:

=> => writing image sha256:6664132b5bf15b0afe53e4acfc3829d712810500ad5a64e5a3511c599fd65b9b                                                 0.0s
=> => naming to docker.io/openvino/model_server-gpu:latest-cuda

View the built images via the “docker images” command.

tom@adlgraphics:~/develop/ovms_nvidia/model_server$ docker images
REPOSITORY                    TAG                          IMAGE ID       CREATED          SIZE
openvino/model_server-gpu     latest-cuda                  6664132b5bf1   18 minutes ago   5.3GB
openvino/model_server         latest-gpu-cuda              6664132b5bf1   18 minutes ago   5.3GB
nvidia/cuda                   11.8.0-runtime-ubuntu20.04   87fde1234010   6 months ago     2.66GB
nvidia/cuda                   11.8.0-runtime-ubuntu22.04   d8fb74ecc8b2   6 months ago     2.65GB
hello-world                   latest                       d2c94e258dcb   13 months ago    13.3kB
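
Before running OVMS on the GPU, it is worth confirming that Docker itself can reach the device. A quick check, assuming the CUDA runtime image pulled earlier, is to run nvidia-smi inside a throwaway container:

# Should print the same GPU table that nvidia-smi prints on the host
docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi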

Run NVIDIA Enabled OVMS Container#

Follow the directions in the OVMS documentation for setting up the model directory structure, which should look similar to:

workspace/
    person-detection-retail-0013/
        1/
            person-detection-retail-0013.bin
            person-detection-retail-0013.xml
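
If the IR files are not already available locally, one way to populate this layout is with omz_downloader from the openvino-dev Python package; this is just one option (an assumption about your workflow), and any method that produces the .xml/.bin pair works:

# Hypothetical example: download the FP32 IR and copy it into the layout shown above
pip install openvino-dev
mkdir -p workspace/person-detection-retail-0013/1
omz_downloader --name person-detection-retail-0013 --precisions FP32 -o /tmp/omz
cp /tmp/omz/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.{xml,bin} \
   workspace/person-detection-retail-0013/1/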

Set the model directory environment variable:

MODEL_DIR=/home/tom/develop/openvino
echo $MODEL_DIR

Run the model server docker container.

docker run -p 30001:30001 -p 30002:30002 -it --gpus all \
-v ${MODEL_DIR}/workspace:/workspace openvino/model_server-gpu:latest-cuda \
--model_path /workspace/person-detection-retail-0013 \
--model_name person-detection-retail-0013 --port 30001 \
--rest_port 30002 --target_device NVIDIA

When the OVMS server is running, output should be similar to:

[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:1321] Number of OpenVINO streams: 1
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:757] Plugin config for device: NVIDIA
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:761] OVMS set plugin settings key: PERFORMANCE_HINT; value: LATENCY;
[2024-05-31 11:36:28.235][1][serving][info][modelinstance.cpp:824] Loaded model person-detection-retail-0013; version: 1; batch size: 1; No of InferRequests: 1
[2024-05-31 11:36:28.235][1][serving][info][modelversionstatus.cpp:109] STATUS CHANGE: Version 1 of model person-detection-retail-0013 status change. New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:88] Updating default version for model: person-detection-retail-0013, from: 0
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:98] Updated default version for model: person-detection-retail-0013, to: 1
[2024-05-31 11:36:28.235][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-05-31 11:36:28.235][268][modelmanager][info][modelmanager.cpp:1086] Started cleaner thread
[2024-05-31 11:36:28.235][267][modelmanager][info][modelmanager.cpp:1067] Started model manager thread
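
With the server up, the model status can also be checked over the REST port. OVMS exposes the TensorFlow Serving-compatible model status endpoint, so a simple request against the rest_port configured above should report version 1 as AVAILABLE:

# Query model status via the REST API (rest_port 30002 as configured above)
curl http://localhost:30002/v1/models/person-detection-retail-0013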

Testing OVMS Using Benchmark Client#

Build the model server benchmark client following the directions in the “Deploy AI Workloads with OpenVINO™ Model Server across CPUs and GPUs” blog post.

When the Docker build completes, benchmark_client appears in the “docker images” output:

tom@adlgraphics:~/develop/ovms_nvidia/model_server/demos/benchmark/python$ docker images
REPOSITORY                    TAG                          IMAGE ID       CREATED          SIZE
benchmark_client              latest                       0aeba9dc0462   32 seconds ago   2.36GB

Run the benchmark client.

docker run --network host benchmark_client -a localhost -r 30002 -m person-detection-retail-0013 -p 30001 -n 8 --report_warmup --print_all

The benchmark client output shows latencies and frame rates:

XI worker: window_first_latency: 0.044996039000125165
XI worker: window_pass_max_latency: 0.044996039000125165
XI worker: window_fail_max_latency: 0.0
XI worker: window_brutto_batch_rate: 31.29846114349164
XI worker: window_brutto_frame_rate: 31.29846114349164
XI worker: window_netto_batch_rate: 26.631751020653166
XI worker: window_netto_frame_rate: 26.631751020653166
XI worker: window_frame_passrate: 1.0
XI worker: window_batch_passrate: 1.0
XI worker: window_mean_latency: 0.03754916450009205
XI worker: window_mean_latency2: 0.0014308553988021302
XI worker: window_stdev_latency: 0.004573362455257325
XI worker: window_cv_latency: 0.12179665023561613
XI worker: window_pass_mean_latency: 0.03754916450009205
XI worker: window_pass_mean_latency2: 0.0014308553988021302
XI worker: window_pass_stdev_latency: 0.004573362455257325
XI worker: window_pass_cv_latency: 0.12179665023561613

Scenescape docker-compose.yml file configuration#

The OVMS section of docker-compose.yml should look similar to the configuration below. If multiple GPUs are installed, refer to the Docker Compose documentation on the “devices” reservation to select a specific GPU (see the sketch after the configuration).

ovms:
     image: openvino/model_server-gpu:latest-cuda
     user: "${UID}:${GID}"
     networks:
       scenescape:
     command: --config_path /models/ovms-config.json --port 30001 --rest_port 30002 --cache_dir /models/ovms/cache
     volumes:
      - ./model_installer/models:/models
     restart: always
     deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
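
As an illustration of selecting a specific GPU, Docker Compose also accepts device_ids in place of count under the device reservation. The index "0" below is an assumption and should match the GPU index reported by nvidia-smi:

          devices:
            - driver: nvidia
              device_ids: ["0"]   # pin the ovms service to GPU 0 instead of reserving any one GPU
              capabilities: [gpu]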

For completeness, an example retail-video section is shown below. There are four changes relative to the default configuration: the depends_on, camerachain, and ovmshost entries, and the “volumes” section pointing to the directory containing the models.

retail-video:
    image: scenescape:<version>
    init: true
    networks:
      scenescape:
    depends_on:
     - broker
     - ntpserv
     - ovms
    command:
     - "percebro"
     - "--camera=sample_data/apriltag-cam1.mp4"
     - "--cameraid=camera1"
     - "--intrinsics={\"fov\":70}"
     - "--camera=sample_data/apriltag-cam2.mp4"
     - "--cameraid=camera2"
     - "--intrinsics={\"fov\":70}"
     - "--camerachain=retail=ovms"
     - "--ovmshost=ovms:30001"
     - "--ntp=ntpserv"
     - "--auth=/run/secrets/percebro.auth"
     - "broker.scenescape.intel.com"
    privileged: true
    volumes:
     - ./model_installer/models:/opt/intel/openvino/deployment_tools/intel_models
     - ./model_installer/models/ovms-config.json:/opt/ml/ovms-config.json
     - ./model_installer/models:/models
    secrets:
     - certs
     - percebro.auth
    restart: always

Verifying that NVIDIA hardware is being utilized#

Use nvtop to view GPU utilization while Scenescape is running.

sudo apt install nvtop

Troubleshooting Tips#

Build Issues#

  • When building OVMS with NVIDIA support, using OV_SOURCE_BRANCH=master may be more reliable than specific version branches like 2024.0 or 2024.1

  • If build failures occur with specific versions, try the master branch as demonstrated in the build command above

NVIDIA Driver Issues#

If you encounter “device not found” errors when running nvidia-smi despite having drivers installed:

  1. Consider switching from the NVIDIA proprietary drivers to the open source kernel modules (see the sketch after this list)

  2. Add the following configuration to /etc/modprobe.d/nvidia.conf to enable support for your GPU:

options nvidia NVreg_OpenRmEnableUnsupportedGpus=1

  3. Reboot your system after making these changes

  4. Verify GPU detection with the nvidia-smi command
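
A sketch of the driver switch is shown below. The exact package name depends on the driver versions offered by your distribution, so nvidia-driver-535-open is only an assumption; use ubuntu-drivers to see what is available:

# List available driver packages and install the open-kernel-module variant (version is an assumption)
ubuntu-drivers list
sudo apt-get install -y nvidia-driver-535-open

# Allow the open driver on GPUs it does not officially support, then reboot and re-check with nvidia-smi
echo "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" | sudo tee -a /etc/modprobe.d/nvidia.conf
sudo reboot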

Verifying GPU Utilization#

Monitor GPU usage during inference to confirm NVIDIA acceleration is working properly:

nvtop

or, to refresh GPU statistics every second:

nvidia-smi -l 1

tom@adlgraphics:~$ nvidia-smi
Fri May 31 07:25:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050        Off | 00000000:01:00.0 Off |                  N/A |
| 34%   34C    P2              20W /  70W |    185MiB /  6144MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1611      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      1859      C   ...libexec/gnome-remote-desktop-daemon      157MiB |
|    0   N/A  N/A      2310      G   ...libexec/gnome-remote-desktop-daemon        0MiB |
+---------------------------------------------------------------------------------------+
tom@adlgraphics:~$