OpenVINO™ Optimization of Robotics VLA Model Pi0.5#
This example shows how to optimize the Vision-Language-Action (VLA) model Pi0.5 with Intel® OpenVINO™, compress its weights to INT8, and benchmark it using the OpenVINO benchmark_app tool.
What Steps VLA Models Perform#
Vision: Process and understand visual (image/video) information
Language: Receive a language task (e.g., pick up the dishes)
Action: Combine the visual input (e.g., a picture of the room) and the natural language task (e.g., clean the room) into an action for the robot to take
About Pi0.5#
Pi0.5 is a state-of-the-art VLA model that supports long-horizon tasks and open-world generalization. At runtime, given a high-level prompt (e.g., clean the room), Pi0.5 predicts a relevant semantic subtask describing the behavior to perform next based on the semantics of the room layout (e.g., rearrange the pillow). Based on this subtask, the model then generates a low-level robot action chunk.
Overview#
This tutorial covers:
Converting the Pi0.5 model from PyTorch to ONNX
Exporting the ONNX model to OpenVINO intermediate representation
Compressing model weights to INT8 using NNCF
Benchmarking the model using the OpenVINO benchmark tool
Validating the optimized model outputs
Source Code#
The source code for this sample can be found here: VLA-Pi0.5-OpenVINO
Environment and Model Setup#
Create a Python 3.10 virtual environment and activate it with the following commands:

    sudo apt install python3-venv
    python3 -m venv pi05_env
    source pi05_env/bin/activate
Install LeRobot from source with the following commands:

    git clone https://github.com/huggingface/lerobot.git
    cd lerobot
    pip install -e ".[pi]"
Install additional dependencies including OpenVINO and NNCF:
    pip install onnx==1.20.0 openvino==2025.4.0 nncf==2.19.0
Within the LeRobot Pi0.5 source code, there are some operations that are not supported by ONNX export and must be modified to ensure successful model conversion.
Navigate to the modeling_pi05.py file found at lerobot/src/lerobot/policies/pi05/modeling_pi05.py and find the sample_noise() method as shown below. This method samples from a normal distribution, which will cause the ONNX conversion to fail.

    def sample_noise(self, shape, device):
        return torch.normal(
            mean=0.0,
            std=1.0,
            size=shape,
            dtype=torch.float32,
            device=device,
        )
Now, modify this method so that instead of sampling noise, it initializes the noise vector as zeros, as shown below:

    def sample_noise(self, shape, device):
        return torch.zeros(shape, dtype=torch.float32, device=device)
Additionally, within the modeling_pi05.py file, locate the sample_time() method. This samples from a beta distribution, which will also cause ONNX export to fail.

    def sample_time(self, bsize, device):
        time_beta = sample_beta(
            self.config.time_sampling_beta_alpha,
            self.config.time_sampling_beta_beta,
            bsize,
            device,
        )
        time = time_beta * self.config.time_sampling_scale + self.config.time_sampling_offset
        return time.to(dtype=torch.float32, device=device)
Now, modify this sample_time() method to match the version shown below. This sets the tensor to the mean of the beta distribution and will allow for successful model conversion:

    def sample_time(self, bsize, device):
        time = torch.full((bsize,), 1.5 / (1.5 + 1.0), device=device, dtype=torch.float32)  # Beta mean
        time = time * 0.999 + 0.001
        return time.to(dtype=torch.float32, device=device)
Model Conversion and OpenVINO™ Optimization#
Clone the edge-ai-suites repository and then run the convert_pytorch_onnx.py script. This will download the Hugging Face Pi0.5 model from here and convert it to ONNX using the torch.onnx.export method.

    cd ..
    git clone https://github.com/open-edge-platform/edge-ai-suites
    cd edge-ai-suites/robotics-ai-suite/pipelines/vla-pi0.5-openvino
    python convert_pytorch_onnx.py
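For reference, the snippet below is a minimal sketch of what such a torch.onnx.export call can look like. The DummyPi05 placeholder module, input dtypes, and opset version are illustrative assumptions only; the input names and shapes mirror the ones used with benchmark_app later in this tutorial, while the real model loading and export logic lives in convert_pytorch_onnx.py.

    import os
    import torch
    import torch.nn as nn

    class DummyPi05(nn.Module):
        """Placeholder standing in for the real Pi0.5 policy loaded by convert_pytorch_onnx.py."""

        def forward(self, images, img_masks, lang_tokens, lang_masks, state, actions):
            # Touch every input so all of them appear in the exported graph; the real
            # model denoises the action chunk conditioned on images, language, and state.
            scale = 1.0 + 0.0 * (
                images.sum()
                + img_masks.float().sum()
                + lang_tokens.float().sum()
                + lang_masks.float().sum()
                + state.sum()
            )
            return actions * scale

    # Dummy inputs matching the shapes passed to benchmark_app later in this guide
    # (the dtypes are assumptions made for this sketch).
    dummy_inputs = (
        torch.zeros(1, 1, 3, 224, 224),          # images
        torch.ones(1, 1, dtype=torch.bool),      # img_masks
        torch.zeros(1, 200, dtype=torch.int64),  # lang_tokens
        torch.ones(1, 200, dtype=torch.bool),    # lang_masks
        torch.zeros(1, 32),                      # state
        torch.zeros(1, 50, 32),                  # actions
    )

    os.makedirs("pi05_onnx", exist_ok=True)
    torch.onnx.export(
        DummyPi05(),
        dummy_inputs,
        "pi05_onnx/pi05.onnx",
        input_names=["images", "img_masks", "lang_tokens", "lang_masks", "state", "actions"],
        output_names=["actions_out"],
        opset_version=17,
    )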
Next, run the onnx_to_ov_ir.py script. This will generate the OpenVINO IR form of the model.

    python onnx_to_ov_ir.py

The snippet below shows how, in this script, the ONNX representation of the model is converted to OpenVINO using the openvino.convert_model method:

    ov_model = ov.convert_model("pi05_onnx/pi05.onnx")
Optionally, to compress the model to FP16, modify the openvino.save_model call in onnx_to_ov_ir.py by setting compress_to_fp16=True:

    ov.save_model(ov_model, output_model=f"{output_dir}/model.xml", compress_to_fp16=True)
Run the nncf_int8_compression.py file to compress the OpenVINO Pi0.5 model weights to INT8. The snippet below shows how the uncompressed OpenVINO model is compressed to INT8 using the Neural Network Compression Framework (NNCF):

    import openvino as ov
    from nncf import CompressWeightsMode, compress_weights

    core = ov.Core()

    # model_xml_path points to the IR generated by onnx_to_ov_ir.py
    compression_mode = CompressWeightsMode.INT8_ASYM
    uncompressed_model = core.read_model(model=model_xml_path)
    compressed_model = compress_weights(
        model=uncompressed_model,
        mode=compression_mode,
        all_layers=True,
    )
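Continuing from the snippet above, the compressed model can then be written back to OpenVINO IR, for example as follows. The output path is an assumption chosen to match the directory used by the benchmark command in the next section; the actual nncf_int8_compression.py script may name it differently.

    import os

    # Serialize the INT8-compressed model back to IR (model.xml + model.bin).
    # The directory name is assumed here to match the path used by benchmark_app below.
    os.makedirs("pi05_lerobot_ov_ir_INT8", exist_ok=True)
    ov.save_model(compressed_model, "pi05_lerobot_ov_ir_INT8/model.xml", compress_to_fp16=False)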
Benchmarking#
To benchmark the model using the OpenVINO benchmark_app tool on CPU:

1. Convert the Pi0.5 model to OpenVINO as described in section Model Conversion and OpenVINO™ Optimization.

2. Run the following command to use the OpenVINO command-line benchmark_app with the compressed Pi0.5 model:

    benchmark_app -m pi05_lerobot_ov_ir_INT8/model.xml -hint latency -shape "images[1,1,3,224,224],img_masks[1,1],lang_tokens[1,200],lang_masks[1,200],state[1,32],actions[1,50,32]" -d CPU
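In this command, -hint latency configures benchmark_app for latency-oriented execution (typically a single inference stream), -shape fixes the model's dynamic inputs to the static shapes expected by Pi0.5, and -d selects the target device; for example, on systems with an Intel GPU and the OpenVINO GPU plugin installed, the same command can be run with -d GPU.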
Validation (Optional)#
To validate the model outputs, ensuring that predictions are the same before and after OpenVINO optimization:
1. Ensure you have run the convert_pytorch_onnx.py script (see Model Conversion and OpenVINO™ Optimization). This will generate a random input tensor, pass it through the original Hugging Face Pi0.5 model, and save both the model input and output in the validation folder.

2. Run the validation/lerobot_ov_inferencing.py file on the randomly generated input tensor from step 1. This will generate the OpenVINO model output for that tensor.

    cd validation
    python lerobot_ov_inferencing.py
3. Run validation/is_same_tensor.py and modify it to compare the original PyTorch Pi0.5 output and the OpenVINO-optimized output tensors. The MSE should be < 1e-3, showing that the optimized model yields the same predictions as the original PyTorch model. A rough sketch of such a comparison follows these steps.

    python is_same_tensor.py
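As an illustration of what that comparison can look like, the sketch below loads a saved PyTorch output and the corresponding OpenVINO output and checks their mean squared error. The .npy file names are purely illustrative; use whatever paths convert_pytorch_onnx.py and lerobot_ov_inferencing.py actually write.

    import numpy as np

    # Hypothetical file names; substitute the tensors actually written by the two scripts.
    pytorch_output = np.load("pi05_pytorch_output.npy")
    openvino_output = np.load("pi05_openvino_output.npy")

    # Mean squared error between the original and the optimized model predictions.
    mse = np.mean((pytorch_output.astype(np.float32) - openvino_output.astype(np.float32)) ** 2)
    print(f"MSE between PyTorch and OpenVINO outputs: {mse:.6e}")
    assert mse < 1e-3, "Optimized model predictions diverge from the PyTorch reference"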