Model Tutorials#

The OpenVINO™ toolkit supports most TensorFlow and PyTorch models. The following table lists deep-learning models commonly used in the Embodied Intelligence solutions, and information on how to run them on Intel® platforms:

Algorithm

Description

Link

YOLOv8

CNN-based object detection

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov8-optimization

YOLOv12

CNN-based object detection

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov12-optimization

MobileNetV2

CNN-based object detection

https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/mobilenet-v2-1.0-224

SAM

Transformer-based segmentation

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/segment-anything

SAM2

Extend SAM to video segmentation and object tracking with cross attention to memory

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/sam2-image-segmentation

FastSAM

Lightweight substitute to SAM

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/fast-segment-anything

MobileSAM

Lightweight substitute to SAM (Same model architecture as SAM. See OpenVINO SAM tutorials for model export and application

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/segment-anything

U-NET

CNN-based segmentation and diffusion model

https://community.intel.com/t5/Blogs/Products-and-Solutions/Healthcare/Optimizing-Brain-Tumor-Segmentation-BTS-U-Net-model-using-Intel/post/1399037?wapkw=U-Net

DETR

Transformer-based object detection

https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/detr-resnet50

GroundingDino

Transformer-based object detection

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/grounded-segment-anything

CLIP

Transformer-based image classification

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/clip-zero-shot-image-classification

Qwen2.5VL

Multimodal large language model

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/qwen2.5-vl

Whisper

Automatic speech recognition

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/whisper-asr-genai

FunASR

Automatic speech recognition

See the FunASR Setup funasr-setup in LLM Robotics sample pipeline

Attention: When following these tutorials for model conversion, ensure that the OpenVINO toolkit version used for model conversion is the same as the runtime version used for inference. Otherwise, unexpected errors may occur, especially if the model is converted using a newer version and the runtime is an older version. See details in the Troubleshooting section.

Please also find information for the models of imitation learning, grasp generation, simultaneous localization and mapping (SLAM) and bird’s-eye view (BEV):

Note: Before using these models, read the AI Content Disclaimer.