Using NVIDIA GPU with OVMS in Scenescape#

Pre-requisite#

Follow the instructions for enabling NVIDIA GPU support in this blog post:

Deploying AI workloads with OpenVINO Model Server across CPUs and GPUs

Setup Docker Build Environment#

Pull the NVIDIA CUDA runtime Docker image (Ubuntu 20.04 base):

docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu20.04

or for Ubuntu 22.04:

docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04

Follow the instructions in the blog for installing the NVIDIA Container Toolkit. Generally, the steps are as follows (a command sketch is shown after the list):

  • download NVIDIA keyring

  • install experimental packages

  • apt update

sudo apt-get install -y nvidia-container-toolkit
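
A sketch of those steps, based on NVIDIA's public Container Toolkit instructions (repository URLs, keyring paths, and the experimental-repository step may change over time, so prefer the blog or NVIDIA's documentation if they differ):

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# optional: enable the experimental packages mentioned in the blog
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# configure Docker to use the NVIDIA runtime, then restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker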

Fetch and Build OVMS#

Fetch all of the model server sources from GitHub.

mkdir ovms_nvidia
cd ovms_nvidia
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server

Build the model server Docker image.

NVIDIA=1 OV_USE_BINARY=0 OV_SOURCE_BRANCH=master OV_CONTRIB_BRANCH=master make docker_build

Note: The build and test process can take anywhere from 20 to 45 minutes to complete.

Results displayed at the end of build/test:

=> => writing image sha256:6664132b5bf15b0afe53e4acfc3829d712810500ad5a64e5a3511c599fd65b9b                                                 0.0s
=> => naming to docker.io/openvino/model_server-gpu:latest-cuda

View the built images with the “docker images” command.

tom@adlgraphics:~/develop/ovms_nvidia/model_server$ docker images
REPOSITORY                    TAG                          IMAGE ID       CREATED          SIZE
openvino/model_server-gpu     latest-cuda                  6664132b5bf1   18 minutes ago   5.3GB
openvino/model_server         latest-gpu-cuda              6664132b5bf1   18 minutes ago   5.3GB
nvidia/cuda                   11.8.0-runtime-ubuntu20.04   87fde1234010   6 months ago     2.66GB
nvidia/cuda                   11.8.0-runtime-ubuntu22.04   d8fb74ecc8b2   6 months ago     2.65GB
hello-world                   latest                       d2c94e258dcb   13 months ago    13.3kB
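
Before running the NVIDIA-enabled OVMS container, it is worth confirming that Docker containers can see the GPU at all. A quick check, using the CUDA runtime image pulled earlier, is:

docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi

If the nvidia-smi table is not printed, revisit the NVIDIA Container Toolkit setup before continuing.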

Run NVIDIA Enabled OVMS Container#

Follow the directions in the OVMS documentation for setting up the model repository directory structure. The directory structure should look similar to the layout below; a sketch for downloading the model files follows the layout.

workspace/
    person-detection-retail-0013/
        1/
            person-detection-retail-0013.bin
            person-detection-retail-0013.xml
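
One way to populate this layout is with the Open Model Zoo downloader. This is a sketch only: it assumes the openvino-dev Python package (which provides omz_downloader) is available, and the temporary paths are illustrative.

# run from the parent directory that will contain workspace/ (MODEL_DIR in the next step)
pip install openvino-dev
omz_downloader --name person-detection-retail-0013 --precisions FP32 -o /tmp/omz
mkdir -p workspace/person-detection-retail-0013/1
cp /tmp/omz/intel/person-detection-retail-0013/FP32/person-detection-retail-0013.* \
   workspace/person-detection-retail-0013/1/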

Set the model directory environment variable:

MODEL_DIR=/home/tom/develop/openvino
echo $MODEL_DIR

Run the model server docker container.

docker run -p 30001:30001 -p 30002:30002 -it --gpus all \
-v ${MODEL_DIR}/workspace:/workspace openvino/model_server:latest-cuda \
--model_path /workspace/person-detection-retail-0013 \
--model_name person-detection-retail-0013 --port 30001 \
--rest_port 30002 --target_device NVIDIA

When the OVMS server is running, output should be similar to:

[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:1321] Number of OpenVINO streams: 1
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:757] Plugin config for device: NVIDIA
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:761] OVMS set plugin settings key: PERFORMANCE_HINT; value: LATENCY;
[2024-05-31 11:36:28.235][1][serving][info][modelinstance.cpp:824] Loaded model person-detection-retail-0013; version: 1; batch size: 1; No of InferRequests: 1
[2024-05-31 11:36:28.235][1][serving][info][modelversionstatus.cpp:109] STATUS CHANGE: Version 1 of model person-detection-retail-0013 status change. New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:88] Updating default version for model: person-detection-retail-0013, from: 0
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:98] Updated default version for model: person-detection-retail-0013, to: 1
[2024-05-31 11:36:28.235][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-05-31 11:36:28.235][268][modelmanager][info][modelmanager.cpp:1086] Started cleaner thread
[2024-05-31 11:36:28.235][267][modelmanager][info][modelmanager.cpp:1067] Started model manager thread
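
Once the log reports the model as AVAILABLE, the REST endpoint can be used as a quick sanity check (30002 matches the --rest_port passed above; OVMS exposes a TensorFlow Serving style model status API):

curl http://localhost:30002/v1/models/person-detection-retail-0013

The response should show version 1 of the model in the AVAILABLE state.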

Testing OVMS Using Benchmark Client#

Build the model server benchmark client by following the directions in the “Deploy AI Workloads with OpenVINO™ Model Server across CPUs and GPUs” blog.
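
As a rough sketch, assuming the demo's location in the model_server tree has not changed, the build looks like:

cd model_server/demos/benchmark/python
docker build . -t benchmark_client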

When the docker build completes, the benchmark_client image appears in the “docker images” output:

tom@adlgraphics:~/develop/ovms_nvidia/model_server/demos/benchmark/python$ docker images
REPOSITORY                    TAG                          IMAGE ID       CREATED          SIZE
benchmark_client              latest                       0aeba9dc0462   32 seconds ago   2.36GB

Run the benchmark client.

docker run --network host benchmark_client -a localhost -r 30002 -m person-detection-retail-0013 -p 30001 -n 8 --report_warmup --print_all

The output of the benchmark client shows latencies and frame rates.

XI worker: window_first_latency: 0.044996039000125165
XI worker: window_pass_max_latency: 0.044996039000125165
XI worker: window_fail_max_latency: 0.0
XI worker: window_brutto_batch_rate: 31.29846114349164
XI worker: window_brutto_frame_rate: 31.29846114349164
XI worker: window_netto_batch_rate: 26.631751020653166
XI worker: window_netto_frame_rate: 26.631751020653166
XI worker: window_frame_passrate: 1.0
XI worker: window_batch_passrate: 1.0
XI worker: window_mean_latency: 0.03754916450009205
XI worker: window_mean_latency2: 0.0014308553988021302
XI worker: window_stdev_latency: 0.004573362455257325
XI worker: window_cv_latency: 0.12179665023561613
XI worker: window_pass_mean_latency: 0.03754916450009205
XI worker: window_pass_mean_latency2: 0.0014308553988021302
XI worker: window_pass_stdev_latency: 0.004573362455257325
XI worker: window_pass_cv_latency: 0.12179665023561613

Scenescape docker_compose.yml file configuration#

The OVMS configuration section of docker_compose.yml should look similar to the configuration below. To use a specific GPU when multiple GPUs are installed, refer to the Docker documentation on device reservations under “devices”; an example follows the configuration.

ovms:
    image: openvino/model_server:latest-cuda
    user: "${UID}:${GID}"
    networks:
      scenescape:
    command: --config_path /models/ovms-config.json --port 30001 --rest_port 30002 --cache_dir /models/ovms/cache
    volumes:
     - ./models:/models
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
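
For example, to pin the service to one specific GPU instead of reserving a count, the device reservation can list device IDs (the ID "0" below is illustrative; use nvidia-smi to find the IDs on your system):

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]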

For completeness, an example retail-video service section is shown below. There are four changes relative to the default service definition: depends_on, camerachain, ovmshost, and the “volumes” section pointing to the directory containing the models.

retail-video:
    image: scenescape:<version>
    init: true
    networks:
      scenescape:
    depends_on:
     - broker
     - ntpserv
     - ovms
    command:
     - "percebro"
     - "--camera=sample_data/apriltag-cam1.mp4"
     - "--cameraid=camera1"
     - "--intrinsics={\"fov\":70}"
     - "--camera=sample_data/apriltag-cam2.mp4"
     - "--cameraid=camera2"
     - "--intrinsics={\"fov\":70}"
     - "--camerachain=retail=ovms"
     - "--ovmshost=ovms:30001"
     - "--ntp=ntpserv"
     - "--auth=/run/secrets/percebro.auth"
     - "broker.scenescape.intel.com"
    privileged: true
    volumes:
     - ./models:/opt/intel/openvino/deployment_tools/intel_models
     - ./models/ovms-config.json:/opt/ml/ovms-config.json
     - ./models:/models
    secrets:
     - certs
     - percebro.auth
    restart: always

Verifying that NVIDIA hardware is being utilized#

Use nvtop to view GPU utilization while Scenescape is running.

sudo apt install nvtop
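
Then launch nvtop while Scenescape is running, or alternatively poll nvidia-smi, and confirm that GPU utilization and memory usage rise while inference is active:

nvtop
# or, without nvtop:
watch -n 1 nvidia-smi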

Troubleshooting Tips#

Several OV_SOURCE_BRANCH values were tried (2024.0 and 2024.1); only the “master” branch built successfully.

The command “nvidia-smi” kept returning “device not found”, even though all of the NVIDIA drivers were installed on Ubuntu 22.04. The solution that worked was adding the line below to the NVIDIA configuration file in /etc/modprobe.d/, along with uninstalling the closed-source proprietary NVIDIA drivers and using the open kernel module version instead.

/etc/modprobe.d/ configuration file:

    options nvidia NVreg_OpenRmEnableUnsupportedGpus=1

After applying the change (a reboot may be required), nvidia-smi detects the GPU:

tom@adlgraphics:~$ nvidia-smi
Fri May 31 07:25:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050        Off | 00000000:01:00.0 Off |                  N/A |
| 34%   34C    P2              20W /  70W |    185MiB /  6144MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1611      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      1859      C   ...libexec/gnome-remote-desktop-daemon      157MiB |
|    0   N/A  N/A      2310      G   ...libexec/gnome-remote-desktop-daemon        0MiB |
+---------------------------------------------------------------------------------------+
tom@adlgraphics:~$