Model Tutorials

The OpenVINO™ toolkit supports most TensorFlow and PyTorch models. The following table lists deep-learning models commonly used in Embodied Intelligence solutions, with links to information on how to run them on Intel® platforms:

Algorithm	Description	Link
YOLOv8	CNN-based object detection	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov8-optimization
YOLOv12	CNN-based object detection	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov12-optimization
MobileNetV2	CNN-based image classification	https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/mobilenet-v2-1.0-224
SAM	Transformer-based segmentation	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/segment-anything
SAM2	Extends SAM to video segmentation and object tracking with cross-attention to memory	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/sam2-image-segmentation
FastSAM	Lightweight substitute for SAM	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/fast-segment-anything
MobileSAM	Lightweight substitute for SAM (same model architecture as SAM; see the OpenVINO SAM tutorials for model export and application)	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/segment-anything
U-Net	CNN-based architecture for segmentation, also used in diffusion models	https://community.intel.com/t5/Blogs/Products-and-Solutions/Healthcare/Optimizing-Brain-Tumor-Segmentation-BTS-U-Net-model-using-Intel/post/1399037?wapkw=U-Net
DETR	Transformer-based object detection	https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/detr-resnet50
GroundingDino	Transformer-based object detection	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/grounded-segment-anything
CLIP	Transformer-based image classification	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/clip-zero-shot-image-classification
Qwen2.5VL	Multimodal large language model	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/qwen2.5-vl
Whisper	Automatic speech recognition	https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/whisper-asr-genai
FunASR	Automatic speech recognition	See the FunASR Setup section in the LLM Robotics sample pipeline

Attention: When following these tutorials for model conversion, make sure that the OpenVINO toolkit version used to convert the model matches the runtime version used for inference. A mismatch can cause unexpected errors, especially when the model is converted with a newer version than the runtime. See the Troubleshooting section for details.
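
The version check described above can be sketched in Python. This is a minimal illustration, not part of the original tutorials: the helper names `release_of` and `compare_versions` and the `CONVERSION_VERSION` value are hypothetical, and the sketch assumes that OpenVINO version strings begin with a `year.major.minor` release, optionally followed by build information. The runtime version is read with `openvino.get_version()`.

```python
import re


def release_of(version: str) -> tuple:
    """Return the leading numeric release, e.g. (2024, 4, 0), from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def compare_versions(conversion_version: str, runtime_version: str) -> str:
    """Compare the release used for conversion with the release used for inference."""
    conv = release_of(conversion_version)
    run = release_of(runtime_version)
    if conv == run:
        return "match"
    # A model converted with a newer release than the runtime is the riskier case.
    return "converted-with-newer" if conv > run else "converted-with-older"


if __name__ == "__main__":
    # Hypothetical value: the version string noted when the model was converted.
    CONVERSION_VERSION = "2024.4.0"
    try:
        import openvino as ov  # only needed to query the installed runtime
    except ImportError:
        raise SystemExit("OpenVINO is not installed in this environment.")
    runtime_version = ov.get_version()
    status = compare_versions(CONVERSION_VERSION, runtime_version)
    print(f"conversion={CONVERSION_VERSION} runtime={runtime_version} -> {status}")
```

Recording the toolkit version alongside each converted model makes this comparison possible later. Only the leading release numbers are compared, so build suffixes are ignored.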

Information is also available for models used in imitation learning, grasp generation, simultaneous localization and mapping (SLAM), and bird’s-eye view (BEV) perception.

Note: Before using these models, read the AI Content Disclaimer.
