How to use GPU for decode and inference#

To benefit from hardware acceleration devices, pipelines can be constructed so that the different stages, such as decoding and inference, make use of the available devices.

Pre-requisites#

Ensure you have a GPU#

To determine which graphics processor you have, please follow this document.
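
As a quick sanity check on a Linux host, you can also list the graphics hardware and the DRI render nodes. This is only a minimal sketch; the exact device names (for example renderD128) vary by system.

    # List graphics adapters visible on the PCI bus
    lspci | grep -iE 'vga|display'
    # List DRI device nodes; a GPU typically exposes a renderD* node
    ls -l /dev/dri/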

Provide GPU access to the container#

For containerized applications such as the DLStreamer Pipeline Server, the container user first needs access to the GPU device(s). This can be done by making the following changes to the docker compose file.

services:
  dlstreamer-pipeline-server:
    group_add:
      # render group ID for ubuntu 22.04 host OS
      - "110"
      # render group ID for ubuntu 24.04 host OS
      - "992"
    devices:
      # instead of mapping all of /dev as below, you can list specific devices here
      - "/dev:/dev"

The changes above add the container user to the render group and provide access to the GPU devices.
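
The render group ID differs across distributions and releases, so you may want to look it up on your host instead of relying on the values above. A minimal sketch, assuming the render node is /dev/dri/renderD128:

    # Print the render group entry in name:x:GID:members form
    getent group render
    # Alternatively, read the owning group ID of a render node
    stat -c '%g' /dev/dri/renderD128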

Hardware specific encoder/decoders#

Unlike the container changes above, the following requires modifying the media pipeline itself.

GStreamer provides a variety of hardware-specific encoder and decoder elements, such as the Intel VA-API elements, that you can benefit from by adding them to your media pipeline. Examples of such elements are vah264dec, vah264enc and vajpegdec.

Additionally, you can enforce zero-copy of buffers by adding the GStreamer caps (capabilities) video/x-raw(memory:VAMemory) to the pipeline for Intel GPUs (integrated and discrete).
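
As an illustration, a standalone pipeline that decodes an H.264 file on the GPU and keeps the frames in VA memory could look like the sketch below; the input path is a placeholder and element availability depends on your GStreamer and driver setup.

    gst-launch-1.0 filesrc location=/path/to/input.mp4 ! qtdemux ! h264parse ! \
      vah264dec ! "video/x-raw(memory:VAMemory)" ! fakesink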

Read DLStreamer docs for more details.

GPU specific element properties#

DLStreamer inference elements also provide properties such as pre-process-backend=va-surface-sharing and device=GPU to pre-process and run inference on the GPU. Read DLStreamer docs for more details.
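
Building on the decode sketch above, an inference stage on the GPU could be added as follows; the model path is a placeholder and the exact set of properties depends on your model and DLStreamer version.

    gst-launch-1.0 filesrc location=/path/to/input.mp4 ! qtdemux ! h264parse ! \
      vah264dec ! "video/x-raw(memory:VAMemory)" ! \
      gvadetect model=/path/to/model.xml device=GPU pre-process-backend=va-surface-sharing ! \
      fakesink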

Tutorial on how to use GPU specific pipelines#

Note - DLStreamer Pipeline Server already provides a default [WORKDIR]/edge-ai-libraries/microservices/dlstreamer-pipeline-server/docker/docker-compose.yml file that includes the necessary GPU access for the container.

  • A sample config has been provided for this demonstration at [WORKDIR]/edge-ai-libraries/microservices/dlstreamer-pipeline-server/configs/sample_gpu_decode_and_inference/config.json. We need to volume mount this sample config file into the dlstreamer-pipeline-server service defined in the [WORKDIR]/edge-ai-libraries/microservices/dlstreamer-pipeline-server/docker/docker-compose.yml file. Refer to the snippet below:

        volumes:
        # Mount the sample config over the config file that the DL Streamer Pipeline Server container loads
        - "../configs/sample_gpu_decode_and_inference/config.json:/home/pipeline-server/config.json"
    
  • In the pipeline string in the above config file, GPU specific elements and properties have been added for decoding and inferencing on the GPU backend (an illustrative sketch of such a pipeline string is shown after this list). Once the service has been restarted as described below, start the pipeline with the following curl request:

    curl localhost:8080/pipelines/user_defined_pipelines/pallet_defect_detection -X POST -H 'Content-Type: application/json' -d '{
        "source": {
            "uri": "file:///home/pipeline-server/resources/videos/warehouse.avi",
            "type": "uri"
        },
        "destination": {
            "metadata": {
                "type": "file",
                "path": "/tmp/results.jsonl",
                "format": "json-lines"
            }
        },
        "parameters": {
            "detection-properties": {
                "model": "/home/pipeline-server/resources/models/geti/pallet_defect_detection/deployment/Detection/model/model.xml"
            }
        }
    }'
    
  • Restart the DL Streamer Pipeline Server so that it loads the mounted config:

        cd [WORKDIR]/edge-ai-libraries/microservices/dlstreamer-pipeline-server/docker/
        docker compose down
        docker compose up
    
  • We should see the metadata results in the /tmp/results.jsonl file.

  • To perform decode and inference on CPU, please see this document. For more combinations of devices for decode and inference, please see this document.
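
For reference, the pipeline string inside such a config could look roughly like the sketch below. This is only an illustration based on the elements and properties discussed above; the actual pipeline in sample_gpu_decode_and_inference/config.json (element names, source/sink wiring and additional properties) may differ.

    {
        "pipeline": "... ! h264parse ! vah264dec ! video/x-raw(memory:VAMemory) ! gvadetect name=detection device=GPU pre-process-backend=va-surface-sharing ! ..."
    }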