Model Tutorials
Intel OpenVINO supports most TensorFlow and PyTorch models. The table below lists some deep learning models commonly used in Embodied Intelligence solutions, with information on how to run them on Intel platforms:
| Algorithm | Description | Link |
|-----------|-------------|------|
| YOLOv8 | CNN based object detection | |
| YOLOv12 | CNN based object detection | |
| MobileNetV2 | CNN based image classification | |
| SAM | Transformer based segmentation | |
| SAM2 | Extends SAM to video segmentation and object tracking with cross attention to memory | |
| FastSAM | Lightweight substitute for SAM | |
| MobileSAM | Lightweight substitute for SAM (same model architecture as SAM; refer to the OpenVINO SAM tutorials for model export and application) | |
| U-Net | CNN based segmentation and diffusion model | |
| DETR | Transformer based object detection | |
| GroundingDino | Transformer based object detection | |
| CLIP | Transformer based image classification | |
| Qwen2.5VL | Multimodal large language model | |
| Whisper | Automatic speech recognition | |
| FunASR | Automatic speech recognition | Refer to the FunASR Setup in the LLM Robotics sample pipeline |
Attention
When following these tutorials for model conversion, ensure that the OpenVINO version used for model conversion is the same as the runtime version used for inference. Otherwise, unexpected errors may occur, especially if the model was converted with a newer version than the runtime. See the Troubleshooting section for more details.
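The version match above can be checked programmatically before loading a converted model. The sketch below is a minimal, hypothetical helper (not part of the OpenVINO API) that compares the year.minor portion of two OpenVINO version strings; it assumes the release numbering used by recent OpenVINO packages, e.g. "2024.3.0-16041-…".

```python
def versions_match(convert_version: str, runtime_version: str) -> bool:
    """Return True when the year.minor release of both version strings agree.

    Hypothetical helper: OpenVINO version strings look like
    "2024.3.0-16041-1e3b88e4e3f-releases/2024/3"; only "2024.3" is compared.
    """
    def year_minor(version: str) -> tuple:
        # Drop the build suffix after "-", then keep the first two dot fields.
        return tuple(version.split("-")[0].split(".")[:2])

    return year_minor(convert_version) == year_minor(runtime_version)


# At inference time, the runtime version can be obtained from the installed
# package (recent releases expose openvino.get_version()), e.g.:
#   import openvino as ov
#   assert versions_match(recorded_conversion_version, ov.get_version())
```

Recording the conversion-time version alongside the exported IR files (for example, in a sidecar text file) makes this check cheap to run at deployment.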
Tutorials are also available for models covering imitation learning, grasp generation, simultaneous localization and mapping (SLAM), and bird's-eye view (BEV) perception:
Note
Before using these models, please ensure that you have read the AI Content Disclaimer.
- Action Chunking with Transformers - ACT
- Visual Servoing - CNS
- Diffusion Policy
- Improved 3D Diffusion Policy (iDP3)
- Feature Extraction Model: SuperPoint
- Feature Tracking Model: LightGlue
- Bird’s Eye View Perception: Fast-BEV
- Monocular Depth Estimation: Depth Anything V2
- Robotics Diffusion Transformer (RDT-1B)