# Using NVIDIA GPU with OVMS in Scenescape

## Prerequisite

Follow the instructions for enabling NVIDIA GPU support in this blog post: [Deploying AI workloads with OpenVINO Model Server across CPUs and GPUs](https://blog.openvino.ai/blog-posts/deploy-ai-workloads-with-openvino-tm-model-server-across-cpus-and-gpus)

## Setup Docker Build Environment

Pull the CUDA runtime image for Ubuntu 20.04:

```
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu20.04
```

or for Ubuntu 22.04:

```
docker pull docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04
```

Follow the instructions in the blog for installing the NVIDIA Container Toolkit. Generally, the steps are:

- download the NVIDIA keyring
- enable the experimental packages repository
- run `apt update`
- install the toolkit:

```
sudo apt-get install -y nvidia-container-toolkit
```

## Fetch and Build OVMS

Fetch the model server sources from GitHub:

```
mkdir ovms_nvidia
cd ovms_nvidia
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
```

Build the model server Docker container:

```
NVIDIA=1 OV_USE_BINARY=0 OV_SOURCE_BRANCH=master OV_CONTRIB_BRANCH=master make docker_build
```

*Note: The build and test process will take anywhere from 20 to 45 minutes to complete.*

Results displayed at the end of the build/test:

```
 => => writing image sha256:6664132b5bf15b0afe53e4acfc3829d712810500ad5a64e5a3511c599fd65b9b   0.0s
 => => naming to docker.io/openvino/model_server-gpu:latest-cuda
```

View the built images with the `docker images` command:

```
tom@adlgraphics:~/develop/ovms_nvidia/model_server$ docker images
REPOSITORY                  TAG                          IMAGE ID       CREATED          SIZE
openvino/model_server-gpu   latest-cuda                  6664132b5bf1   18 minutes ago   5.3GB
openvino/model_server       latest-gpu-cuda              6664132b5bf1   18 minutes ago   5.3GB
nvidia/cuda                 11.8.0-runtime-ubuntu20.04   87fde1234010   6 months ago     2.66GB
nvidia/cuda                 11.8.0-runtime-ubuntu22.04   d8fb74ecc8b2   6 months ago     2.65GB
hello-world                 latest                       d2c94e258dcb   13 months ago    13.3kB
```

## Run NVIDIA-Enabled OVMS Container

Follow the directions in the OVMS documentation for setting up the model repository directory structure. The directory structure should look similar to:

```
workspace/
  person-detection-retail-0013/
    1/
      person-detection-retail-0013.bin
      person-detection-retail-0013.xml
```

Set the model directory environment variable:

```
MODEL_DIR=/home/tom/develop/openvino
echo $MODEL_DIR
```

Run the model server Docker container:

```
docker run -p 30001:30001 -p 30002:30002 -it --gpus all \
  -v ${MODEL_DIR}/workspace:/workspace openvino/model_server:latest-cuda \
  --model_path /workspace/person-detection-retail-0013 \
  --model_name person-detection-retail-0013 --port 30001 \
  --rest_port 30002 --target_device NVIDIA
```

When the OVMS server is running, the output should be similar to:

```
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:1321] Number of OpenVINO streams: 1
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:757] Plugin config for device: NVIDIA
[2024-05-31 11:36:28.233][1][modelmanager][info][modelinstance.cpp:761] OVMS set plugin settings key: PERFORMANCE_HINT; value: LATENCY;
[2024-05-31 11:36:28.235][1][serving][info][modelinstance.cpp:824] Loaded model person-detection-retail-0013; version: 1; batch size: 1; No of InferRequests: 1
[2024-05-31 11:36:28.235][1][serving][info][modelversionstatus.cpp:109] STATUS CHANGE: Version 1 of model person-detection-retail-0013 status change. New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:88] Updating default version for model: person-detection-retail-0013, from: 0
[2024-05-31 11:36:28.235][1][serving][info][model.cpp:98] Updated default version for model: person-detection-retail-0013, to: 1
[2024-05-31 11:36:28.235][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-05-31 11:36:28.235][268][modelmanager][info][modelmanager.cpp:1086] Started cleaner thread
[2024-05-31 11:36:28.235][267][modelmanager][info][modelmanager.cpp:1067] Started model manager thread
```
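Before moving on to the benchmark client, a quick way to confirm the model is being served is to query the model status endpoint on the REST port. This is a minimal sketch assuming the container is running locally with the port mappings shown above; the endpoint is OVMS's standard TensorFlow-Serving-compatible status API.

```
# Query the model status over the mapped REST port (30002)
curl http://localhost:30002/v1/models/person-detection-retail-0013
```

A healthy response reports version 1 in the `AVAILABLE` state, matching the STATUS CHANGE line in the server log above.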
## Testing OVMS Using Benchmark Client

Build the model server benchmark client following the directions in the "Deploy AI Workloads with OpenVINO™ Model Server across CPUs and GPUs" blog. When the Docker build completes, `benchmark_client` appears in the `docker images` output:

```
tom@adlgraphics:~/develop/ovms_nvidia/model_server/demos/benchmark/python$ docker images
REPOSITORY                  TAG                          IMAGE ID       CREATED          SIZE
benchmark_client            latest                       0aeba9dc0462   32 seconds ago   2.36GB
```

Run the benchmark client:

```
docker run --network host benchmark_client -a localhost -r 30002 \
  -m person-detection-retail-0013 -p 30001 -n 8 --report_warmup --print_all
```

The output of the benchmark client shows latencies and frame rates:

```
XI worker: window_first_latency: 0.044996039000125165
XI worker: window_pass_max_latency: 0.044996039000125165
XI worker: window_fail_max_latency: 0.0
XI worker: window_brutto_batch_rate: 31.29846114349164
XI worker: window_brutto_frame_rate: 31.29846114349164
XI worker: window_netto_batch_rate: 26.631751020653166
XI worker: window_netto_frame_rate: 26.631751020653166
XI worker: window_frame_passrate: 1.0
XI worker: window_batch_passrate: 1.0
XI worker: window_mean_latency: 0.03754916450009205
XI worker: window_mean_latency2: 0.0014308553988021302
XI worker: window_stdev_latency: 0.004573362455257325
XI worker: window_cv_latency: 0.12179665023561613
XI worker: window_pass_mean_latency: 0.03754916450009205
XI worker: window_pass_mean_latency2: 0.0014308553988021302
XI worker: window_pass_stdev_latency: 0.004573362455257325
XI worker: window_pass_cv_latency: 0.12179665023561613
```

## Scenescape docker-compose.yml file configuration

The OVMS section of docker-compose.yml should look similar to the configuration below. If multiple GPUs are installed, refer to the Docker documentation on the `devices` section for selecting a specific GPU.

```
ovms:
  image: openvino/model_server:latest-cuda
  user: "${UID}:${GID}"
  networks:
    scenescape:
  command: --config_path /models/ovms-config.json --port 30001 --rest_port 30002 --cache_dir /models/ovms/cache
  volumes:
    - ./models:/models
  restart: always
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```
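The `command` above points OVMS at `/models/ovms-config.json`, which is not shown elsewhere in this guide. Below is a minimal sketch of what that file might contain, assuming the person-detection model repository built earlier is copied under `./models`; the model name and `base_path` are illustrative and must match your layout.

```
{
  "model_config_list": [
    {
      "config": {
        "name": "person-detection-retail-0013",
        "base_path": "/models/person-detection-retail-0013",
        "target_device": "NVIDIA"
      }
    }
  ]
}
```

Note that `target_device` must be `NVIDIA` here, mirroring the `--target_device NVIDIA` flag used in the standalone `docker run` example.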
For completeness, an example retail video section is shown below. There are four changes to the scene section: `depends_on`, `camerachain`, `ovmshost`, and the `volumes` section pointing to the directory containing the models.

```
retail-video:
  image: scenescape
  init: true
  networks:
    scenescape:
  depends_on:
    - broker
    - ntpserv
    - ovms
  command:
    - "percebro"
    - "--camera=sample_data/apriltag-cam1.mp4"
    - "--cameraid=camera1"
    - "--intrinsics={\"fov\":70}"
    - "--camera=sample_data/apriltag-cam2.mp4"
    - "--cameraid=camera2"
    - "--intrinsics={\"fov\":70}"
    - "--camerachain=retail=ovms"
    - "--ovmshost=ovms:30001"
    - "--ntp=ntpserv"
    - "--auth=/run/secrets/percebro.auth"
    - "broker.scenescape.intel.com"
  privileged: true
  volumes:
    - ./models:/opt/intel/openvino/deployment_tools/intel_models
    - ./models/ovms-config.json:/opt/ml/ovms-config.json
    - ./models:/models
  secrets:
    - certs
    - percebro.auth
  restart: always
```

## Verifying that NVIDIA hardware is being utilized

Use nvtop to view GPU utilization while Scenescape is running:

```
sudo apt install nvtop
```

After installing, launch it with `nvtop` while inference is active and confirm the OVMS process shows nonzero GPU utilization.

## Troubleshooting Tips

Several `OV_SOURCE_BRANCH` values were tried (2024.0 and 2024.1); only `master` was able to build, the others did not.

The command `nvidia-smi` kept returning "device not found", even though all of the NVIDIA drivers were installed on Ubuntu 22.04. The solution that worked was twofold: uninstall the closed proprietary NVIDIA drivers in favor of the open version, and add the line below to the NVIDIA configuration file in `/etc/modprobe.d/`. This option allows the open kernel modules to drive GPUs they do not officially support, such as this GeForce card.

```
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
```

```
tom@adlgraphics:~$ nvidia-smi
Fri May 31 07:25:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050        Off | 00000000:01:00.0 Off |                  N/A |
| 34%   34C    P2              20W /  70W |    185MiB /  6144MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1611      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      1859      C   ...libexec/gnome-remote-desktop-daemon      157MiB |
|    0   N/A  N/A      2310      G   ...libexec/gnome-remote-desktop-daemon        0MiB |
+---------------------------------------------------------------------------------------+
tom@adlgraphics:~$
```
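Since `NVreg_OpenRmEnableUnsupportedGpus=1` only has an effect with the open kernel modules, it can help to verify which driver variant is actually installed and loaded. The checks below are an assumption on my part rather than from the blog, but both commands are standard:

```
# The license field distinguishes the driver variants: the open kernel
# modules report "Dual MIT/GPL", the proprietary ones report "NVIDIA".
modinfo nvidia | grep -i license

# On recent drivers the banner line of this file also states whether
# the loaded module is the Open Kernel Module variant.
cat /proc/driver/nvidia/version
```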