# ① Memory Interop and C++ abstract interfaces

Deep Learning Streamer provides an independent sub-component for zero-copy buffer sharing and memory interop between various frameworks and memory handles on CPU and GPU:

- CPU memory `void*`
- FFmpeg `AVFrame`
- GStreamer `GstBuffer` and `GstMemory`
- Level-Zero `USM pointers`
- OpenCL `cl_mem`
- OpenCV `cv::Mat`
- OpenCV `cv::UMat`
- OpenVINO™ `ov::Tensor` and `ov::RemoteTensor`
- SYCL `USM pointers`
- VA-API `VASurfaceID`

The memory interop sub-component is available via APT installation `sudo apt install intel-dlstreamer-cpp` and on [github](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/libraries/dl-streamer/include/dlstreamer).

> **Note:** This sub-component is implemented as a C++ header-only library. Python
> bindings for this library are coming in future releases.

## Why memory interop library?

Each media and compute framework with accelerator support (GPU, VPU) defines its own interfaces for device and context creation, memory allocation, and task submission. Most frameworks also expose export/import interfaces to convert memory objects to/from other memory handles:

- High-level media frameworks (FFmpeg, GStreamer) support conversion to/from low-level media handles (VA-API and DirectX surfaces)
- Low-level media interfaces (VA-API, DirectX) support conversion to/from OS-specific general-purpose GPU memory handles such as DMA buffers on Linux and NT handles on Windows
- OpenCL 3.0 recently introduced an extension for DMA buffer and NT handle import and export
- Intel® oneAPI Level Zero supports conversion between USM device pointers (accessible on GPU only) and DMA buffers / NT handles

Together these interfaces allow zero-copy memory sharing between media operations submitted via media frameworks and SYCL/OpenCL compute kernels submitted into a SYCL/OpenCL queue, assuming the media and compute queues are created on the same physical GPU device. Despite multiple stages of memory handle conversion (FFmpeg/GStreamer, VA-API/DirectX, DMA/NT, Level-Zero, SYCL), all converted memory handles refer to the same physical memory block. Thus writing data into one memory handle makes the data available in all other memory handles, assuming proper synchronization between write and read operations.

Below are references to some low-level interfaces used by the Deep Learning Streamer memory interop sub-component for zero-copy buffer sharing between media frameworks and OpenCL/SYCL:

1. (Linux) [VA-API to DMA-BUF](http://intel.github.io/libva/group__api__core.html#ga404be4f513f3a15b9a831ff561b1b179)
2. [DMA-BUF or NT-Handle to Level-Zero](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#external-memory-import-and-export)
3. [OpenCL extension cl_khr_external_memory](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#cl_khr_external_memory)

## Memory interop in a few lines - using Deep Learning Streamer

Deep Learning Streamer hides the complexity of dealing with low-level interfaces and greatly simplifies memory interop by defining the abstract interfaces [Tensor](./api_ref/class_dlstreamer_Tensor.md) and [MemoryMapper](./api_ref/class_dlstreamer_MemoryMapper.md), and by providing header-only implementations of the `Tensor` interface for various frameworks and `MemoryMapper` implementations for all technically feasible zero-copy mappings on CPU and GPU, as well as mappings between CPU and GPU:

![memory_interop](../_images/memory-interop.svg)

All memory mappers are implemented under the unified interface [MemoryMapper](./api_ref/class_dlstreamer_MemoryMapper.md) with [TensorPtr](./api_ref/class_dlstreamer_TensorPtr.md) or [FramePtr](./api_ref/class_dlstreamer_FramePtr.md) as the input parameter, but each mapper from framework `AAA` to framework `BBB` internally casts the input pointer to the specific class `AAA` Tensor / `AAA` Frame and creates the output as the specific class `BBB` Tensor / `BBB` Frame. See the table below for each supported framework/library:

| Framework / Library | Native memory object | Class implementing [Tensor](./api_ref/class_dlstreamer_Tensor) | Class implementing [Frame](./api_ref/class_dlstreamer_Frame) |
| --- | --- | --- | --- |
| CPU (no framework) | void\* | [CPUTensor](./api_ref/class_dlstreamer_CPUTensor) | [BaseFrame](./api_ref/class_dlstreamer_BaseFrame) |
| FFmpeg | AVFrame | | [FFmpegFrame](./api_ref/class_dlstreamer_FFmpegFrame) |
| GStreamer | GstMemory, GstBuffer | [GSTTensor](./api_ref/class_dlstreamer_GSTTensor) | [GSTFrame](./api_ref/class_dlstreamer_GSTFrame) |
| Level-zero | void\* | [USMTensor](./api_ref/class_dlstreamer_USMTensor) | [BaseFrame](./api_ref/class_dlstreamer_BaseFrame) |
| OpenCL | cl_mem | [OpenCLTensor](./api_ref/class_dlstreamer_OpenCLTensor) | [BaseFrame](./api_ref/class_dlstreamer_BaseFrame) |
| OpenCV | cv::Mat | [OpenCVTensor](./api_ref/class_dlstreamer_OpenCVTensor) | [BaseFrame](./api_ref/class_dlstreamer_BaseFrame) |
| OpenCV | cv::UMat | [OpenCVUMatTensor](./api_ref/class_dlstreamer_OpenCVUMatTensor) | [BaseFrame](./api_ref/class_dlstreamer_BaseFrame) |
| OpenVINO™ | ov::Tensor | [OpenVINOTensor](./api_ref/class_dlstreamer_OpenVINOTensor) | [OpenVINOFrame](./api_ref/class_dlstreamer_OpenVINOFrame) |
| SYCL | void\* | [SYCLUSMTensor](./api_ref/class_dlstreamer_SYCLUSMTensor) | [BaseFrame](./api_ref/class_dlstreamer_BaseFrame) |

An application can create `Tensor` and `Frame` objects by either passing a pre-allocated native memory object to the C++ constructor (wrap an already allocated object) or passing allocation parameters to the C++ constructor (allocate new memory).

Many examples of how to allocate memory and how to create and use memory mappers can be found by searching for the word `mapper` in the [samples](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/libraries/dl-streamer/samples) and [src](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/libraries/dl-streamer/src) folders of the github source code, for example the FFmpeg+DPC++ sample [rgb_to_grayscale](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/libraries/dl-streamer/samples/ffmpeg_dpcpp/rgb_to_grayscale) and almost every C++ element.
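To make the wrap-and-map pattern above concrete, here is a minimal, hedged sketch of the two steps: wrap an already-allocated CPU buffer in a `CPUTensor`, then hand it to a `MemoryMapper`. The header paths, constructor arguments and the `AccessMode` enum name are assumptions inferred from the class names in the table and the headers listed in the Files structure section below, not verbatim API; the `mapper` samples linked above show the exact usage.

```cpp
// Hedged sketch, not verbatim API: wrap a pre-allocated buffer and map it.
#include <cstddef>
#include <memory>

#include "dlstreamer/tensor.h"          // abstract Tensor / TensorPtr
#include "dlstreamer/tensor_info.h"     // TensorInfo, DataType (assumed contents)
#include "dlstreamer/memory_mapper.h"   // MemoryMapper, MemoryMapperPtr
#include "dlstreamer/cpu/tensor.h"      // CPUTensor (header location assumed)

using namespace dlstreamer;

// Wrap an existing HxWx3 U8 image buffer: zero-copy, the caller keeps
// ownership of 'data' (CPUTensor constructor signature assumed)
TensorPtr wrap_existing_buffer(void *data, size_t height, size_t width) {
    TensorInfo info({height, width, 3}, DataType::UInt8);
    return std::make_shared<CPUTensor>(info, data);
}

// Whatever pair of frameworks the mapper connects, the call pattern is the
// same: the returned tensor refers to the same physical memory as 'src'
// (AccessMode enum name assumed)
TensorPtr map_to_other_framework(const MemoryMapperPtr &mapper, const TensorPtr &src) {
    return mapper->map(src, AccessMode::Read);
}
```

The same pattern applies to any row of the table: wrap the native object in the corresponding `Tensor`/`Frame` class and call `map()` on a mapper connecting the two contexts.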
There is a special mapper [MemoryMapperChain](./api_ref/class_dlstreamer_MemoryMapperChain) implementing the unified interface [MemoryMapper](./api_ref/class_dlstreamer_MemoryMapper) as an arbitrary chain of multiple mappers. For example, FFmpeg to DPC++/USM is a chain of the following mappers:

![ffmpeg-to-usm-memory-mappers-chain](../_images/c++-interfaces-and-classes.svg)

and GStreamer to OpenCV UMat is a chain of the following mappers:

![gst-to-usm-memory-mappers-chain](../_images/gst-to-usm-memory-mappers-chain.svg)

## Abstract interfaces for C++ elements

Additionally, this Deep Learning Streamer sub-component defines the abstract interfaces [Source](./api_ref/class_dlstreamer_Source), [Transform](./api_ref/class_dlstreamer_Transform) and [Sink](./api_ref/class_dlstreamer_Sink) used as base interfaces for all C++ and GStreamer elements. These interfaces take unified pointers to [Tensor](./api_ref/class_dlstreamer_Tensor) and [Frame](./api_ref/class_dlstreamer_Frame) objects as input and output parameters in the functions `read`, `process` and `write`, and make it easy to build a chain of multiple operations. See the next page, [C++ elements](cpp_elements).

## How to use in CMake build system

If an application uses the Deep Learning Streamer memory interop library and is based on the cmake build system, add `pkg_check_modules` and `include_directories` statements like below:

``` none
pkg_check_modules(DLSTREAMER dl-streamer REQUIRED)
include_directories(${DLSTREAMER_INCLUDE_DIRS})
```

For each framework involved in memory interop, add the corresponding `include_directories` and `link_libraries` statements as required/documented by the framework. For example, if using memory interop with OpenVINO™, the cmake file should contain lines like below:

``` none
find_package(OpenVINO COMPONENTS runtime)
include_directories(${OpenVINO_INCLUDE_DIRS})
link_libraries(openvino::runtime)
```

## Files structure

Abstract interfaces are defined in the following header files and installed by `sudo apt install intel-dlstreamer-cpp` under the folder `/opt/intel/dlstreamer/include/dlstreamer`:

``` none
include/dlstreamer
├── audio_info.h
├── context.h
├── dictionary.h
├── element.h
├── frame.h
├── frame_info.h
├── image_info.h
├── image_metadata.h
├── memory_mapper_factory.h
├── memory_mapper.h
├── memory_type.h
├── metadata.h
├── sink.h
├── source.h
├── tensor.h
├── tensor_info.h
├── transform.h
└── utils.h
```

The following header files implement the [Tensor](./api_ref/class_dlstreamer_Tensor) interface for memory objects in various frameworks and the [MemoryMapper](./api_ref/class_dlstreamer_MemoryMapper) interface for memory mapping between frameworks.
These header files are installed under the corresponding subfolders of `/opt/intel/dlstreamer/include/dlstreamer` by the same package `intel-dlstreamer-cpp`:

``` none
include/dlstreamer
├── ffmpeg
│   ├── mappers
│   │   └── ffmpeg_to_vaapi.h
│   ├── context.h
│   ├── frame.h
│   └── utils.h
├── gst
│   ├── allocator.h
│   ├── context.h
│   ├── dictionary.h
│   ├── frame_batch.h
│   ├── frame.h
│   ├── mappers
│   │   ├── any_to_gst.h
│   │   ├── gst_to_cpu.h
│   │   ├── gst_to_dma.h
│   │   ├── gst_to_opencl.h
│   │   └── gst_to_vaapi.h
│   ├── metadata
│   │   ├── gva_audio_event_meta.h
│   │   ├── gva_json_meta.h
│   │   └── gva_tensor_meta.h
│   ├── metadata.h
│   ├── plugin.h
│   ├── tensor.h
│   └── utils.h
├── level_zero
│   ├── context.h
│   ├── mappers
│   │   ├── dma_to_usm.h
│   │   └── usm_to_dma.h
│   └── usm_tensor.h
├── opencl
│   ├── context.h
│   ├── mappers
│   │   ├── dma_to_opencl.h
│   │   ├── opencl_to_cpu.h
│   │   └── opencl_to_dma.h
│   ├── tensor.h
│   ├── tensor_ref_counted.h
│   └── utils.h
├── opencv
│   ├── context.h
│   ├── mappers
│   │   └── cpu_to_opencv.h
│   ├── tensor.h
│   └── utils.h
├── opencv_umat
│   ├── context.h
│   ├── mappers
│   │   └── opencl_to_opencv_umat.h
│   ├── tensor.h
│   └── utils.h
├── openvino
│   ├── context.h
│   ├── frame.h
│   ├── mappers
│   │   ├── cpu_to_openvino.h
│   │   ├── opencl_to_openvino.h
│   │   ├── openvino_to_cpu.h
│   │   └── vaapi_to_openvino.h
│   ├── tensor.h
│   └── utils.h
├── sycl
│   ├── context.h
│   ├── mappers
│   │   └── sycl_usm_to_cpu.h
│   └── sycl_usm_tensor.h
└── vaapi
    ├── context.h
    ├── frame_alloc.h
    ├── frame.h
    ├── mappers
    │   ├── dma_to_vaapi.h
    │   └── vaapi_to_dma.h
    ├── tensor.h
    └── utils.h
```
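As a final illustration of how these per-framework headers fit together, the hedged sketch below composes a GStreamer → VA-API → DMA → USM (Level-Zero) mapping out of individual mappers, in the spirit of the mapper chains pictured earlier. The mapper class names are inferred from the header file names above, and the constructor arguments and `MemoryMapperChain` signature are assumptions rather than verbatim API; the headers themselves and the `mapper` samples show the exact declarations.

```cpp
// Hedged sketch: compose per-framework mappers into one chain.
// Class names inferred from header file names; constructor arguments and the
// MemoryMapperChain signature are assumptions - check the headers for details.
#include <memory>
#include <vector>

#include "dlstreamer/gst/mappers/gst_to_vaapi.h"        // GstMemory   -> VASurfaceID
#include "dlstreamer/vaapi/mappers/vaapi_to_dma.h"      // VASurfaceID -> DMA-BUF fd
#include "dlstreamer/level_zero/mappers/dma_to_usm.h"   // DMA-BUF fd  -> USM pointer
#include "dlstreamer/memory_mapper_factory.h"           // assumed to provide MemoryMapperChain

using namespace dlstreamer;

// Build a single mapper that maps GStreamer buffers all the way to Level-Zero
// USM pointers, given pre-created contexts for each stage
MemoryMapperPtr build_gst_to_usm_chain(ContextPtr gst_ctx, ContextPtr vaapi_ctx,
                                       ContextPtr dma_ctx, ContextPtr usm_ctx) {
    std::vector<MemoryMapperPtr> stages = {
        std::make_shared<MemoryMapperGSTToVAAPI>(gst_ctx, vaapi_ctx),
        std::make_shared<MemoryMapperVAAPIToDMA>(vaapi_ctx, dma_ctx),
        std::make_shared<MemoryMapperDMAToUSM>(dma_ctx, usm_ctx),
    };
    // MemoryMapperChain exposes the same MemoryMapper interface as a single mapper
    return std::make_shared<MemoryMapperChain>(stages);
}
```

In real code the helpers in `memory_mapper_factory.h` are likely the more convenient route; the explicit version above is only meant to relate the mapping stages to the folder layout shown here.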