OpenVINO™ Optimization of Robotics VLA Model Pi0.5#
This example shows how to optimize the Vision-Language-Action (VLA) model Pi0.5 with Intel® OpenVINO™, compress its weights to INT8, and benchmark it using the OpenVINO benchmark_app tool.
What Steps VLA Models Perform#
Vision: Process and understand visual (image/video) information
Language: Receive a language task (e.g., pick up the dishes)
Action: Combine the visual input (e.g., a picture of the room) and the natural language task (e.g., clean the room) into an action for the robot to take
About Pi0.5#
Pi0.5 is a state-of-the-art VLA model that supports long-horizon tasks and open-world generalization. At runtime, given a high-level prompt (e.g., clean the room), Pi0.5 predicts a relevant semantic subtask describing the behavior to perform next based on the semantics of the room layout (e.g., rearrange the pillow). Based on this subtask, the model then generates a low-level robot action chunk.
Overview#
This tutorial covers:
Converting the Pi0.5 model from PyTorch to ONNX
Exporting the ONNX model to OpenVINO intermediate representation
Compressing model weights to INT8 using NNCF
Benchmarking the model using the OpenVINO benchmark tool
Validating the optimized model outputs
Source Code#
The source code for this sample can be found here: VLA-Pi0.5-OpenVINO
Environment and Model Setup#
Create a Python 3.10 virtual environment and activate it with the following commands:

    sudo apt install python3-venv
    python3 -m venv pi05_env
    source pi05_env/bin/activate
Install LeRobot from source with the following commands:

    git clone https://github.com/huggingface/lerobot.git
    cd lerobot
    pip install -e ".[pi]"
Install additional dependencies including OpenVINO and NNCF:
    pip install onnx==1.20.0 openvino==2025.4.0 nncf==2.19.0
Within the LeRobot Pi0.5 source code, there are some operations that are not supported by ONNX export and must be modified to ensure successful model conversion.
Navigate to the modeling_pi05.py file found at lerobot/src/lerobot/policies/pi05/modeling_pi05.py and find the sample_noise() method as shown below. This method samples from a normal distribution, which will cause the ONNX conversion to fail.

    def sample_noise(self, shape, device):
        return torch.normal(
            mean=0.0,
            std=1.0,
            size=shape,
            dtype=torch.float32,
            device=device,
        )
Now, modify this method so that instead of sampling noise, it initializes the noise vector as zeros, as shown below:

    def sample_noise(self, shape, device):
        return torch.zeros(shape, dtype=torch.float32, device=device)
Additionally, within the modeling_pi05.py file, locate the sample_time() method. This samples from a beta distribution, which will also cause ONNX export to fail.

    def sample_time(self, bsize, device):
        time_beta = sample_beta(
            self.config.time_sampling_beta_alpha,
            self.config.time_sampling_beta_beta,
            bsize,
            device,
        )
        time = time_beta * self.config.time_sampling_scale + self.config.time_sampling_offset
        return time.to(dtype=torch.float32, device=device)
Now, modify this sample_time() method to match the version shown below. This sets the tensor to the mean of the beta distribution and will allow for successful model conversion:

    def sample_time(self, bsize, device):
        time = torch.full((bsize,), 1.5 / (1.5 + 1.0), device=device, dtype=torch.float32)  # Beta mean
        time = time * 0.999 + 0.001
        return time.to(dtype=torch.float32, device=device)
Model Conversion and OpenVINO™ Optimization#
Clone the edge-ai-suites repository and then run the convert_pytorch_onnx.py script. This will download the Hugging Face Pi0.5 model from here and convert it to ONNX using the torch.onnx.export method.

    cd ..
    git clone https://github.com/open-edge-platform/edge-ai-suites
    cd edge-ai-suites/robotics-ai-suite/pipelines/vla-pi0.5-openvino
    python convert_pytorch_onnx.py
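For reference, the snippet below is a minimal sketch of what such a torch.onnx.export call can look like. The DummyPi05 placeholder module, input dtypes, and opset version are illustrative assumptions only; the input names and shapes mirror the ones used with benchmark_app later in this tutorial, while the real model loading and export logic lives in convert_pytorch_onnx.py.

    import os
    import torch
    import torch.nn as nn

    class DummyPi05(nn.Module):
        """Placeholder standing in for the real Pi0.5 policy loaded by convert_pytorch_onnx.py."""

        def forward(self, images, img_masks, lang_tokens, lang_masks, state, actions):
            # Touch every input so all of them appear in the exported graph; the real
            # model denoises the action chunk conditioned on images, language, and state.
            scale = 1.0 + 0.0 * (
                images.sum()
                + img_masks.float().sum()
                + lang_tokens.float().sum()
                + lang_masks.float().sum()
                + state.sum()
            )
            return actions * scale

    # Dummy inputs matching the shapes passed to benchmark_app later in this guide
    # (the dtypes are assumptions made for this sketch).
    dummy_inputs = (
        torch.zeros(1, 1, 3, 224, 224),          # images
        torch.ones(1, 1, dtype=torch.bool),      # img_masks
        torch.zeros(1, 200, dtype=torch.int64),  # lang_tokens
        torch.ones(1, 200, dtype=torch.bool),    # lang_masks
        torch.zeros(1, 32),                      # state
        torch.zeros(1, 50, 32),                  # actions
    )

    os.makedirs("pi05_onnx", exist_ok=True)
    torch.onnx.export(
        DummyPi05(),
        dummy_inputs,
        "pi05_onnx/pi05.onnx",
        input_names=["images", "img_masks", "lang_tokens", "lang_masks", "state", "actions"],
        output_names=["actions_out"],
        opset_version=17,
    )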
Next, run the onnx_to_ov_ir.py script. This will generate the OpenVINO IR form of the model.

    python onnx_to_ov_ir.py

The snippet below shows how, in this script, the ONNX representation of the model is converted to OpenVINO using the openvino.convert_model method:

    ov_model = ov.convert_model("pi05_onnx/pi05.onnx")
Optionally, to compress the model to FP16, modify the openvino.save_model call in onnx_to_ov_ir.py by setting compress_to_fp16=True:

    ov.save_model(ov_model, output_model=f"{output_dir}/model.xml", compress_to_fp16=True)
Run the nncf_int8_compression.py file to compress the OpenVINO Pi0.5 model weights to INT8. The snippet below shows how the uncompressed OpenVINO model is compressed to INT8 using the Neural Network Compression Framework (NNCF):

    import openvino as ov
    from nncf import CompressWeightsMode, compress_weights

    core = ov.Core()

    # model_xml_path points to the IR generated by onnx_to_ov_ir.py
    compression_mode = CompressWeightsMode.INT8_ASYM
    uncompressed_model = core.read_model(model=model_xml_path)
    compressed_model = compress_weights(
        model=uncompressed_model,
        mode=compression_mode,
        all_layers=True,
    )
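Continuing from the snippet above, the compressed model can then be written back to OpenVINO IR, for example as follows. The output path is an assumption chosen to match the directory used by the benchmark command in the next section; the actual nncf_int8_compression.py script may name it differently.

    import os

    # Serialize the INT8-compressed model back to IR (model.xml + model.bin).
    # The directory name is assumed here to match the path used by benchmark_app below.
    os.makedirs("pi05_lerobot_ov_ir_INT8", exist_ok=True)
    ov.save_model(compressed_model, "pi05_lerobot_ov_ir_INT8/model.xml", compress_to_fp16=False)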
Benchmarking#
To benchmark the model using the OpenVINO benchmark_app tool on CPU:

1. Convert the Pi0.5 model to OpenVINO as described in section Model Conversion and OpenVINO™ Optimization.

2. Run the following command to use the OpenVINO command-line benchmark_app with the compressed Pi0.5 model:

    benchmark_app -m pi05_lerobot_ov_ir_INT8/model.xml -hint latency -shape "images[1,1,3,224,224],img_masks[1,1],lang_tokens[1,200],lang_masks[1,200],state[1,32],actions[1,50,32]" -d CPU
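In this command, -hint latency configures benchmark_app for latency-oriented execution (typically a single inference stream), -shape fixes the model's dynamic inputs to the static shapes expected by Pi0.5, and -d selects the target device; for example, on systems with an Intel GPU and the OpenVINO GPU plugin installed, the same command can be run with -d GPU.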
Validation (Optional)#
To validate the model outputs, ensuring that predictions are the same before and after OpenVINO optimization:
1. Ensure you have run the convert_pytorch_onnx.py script (see Model Conversion and OpenVINO™ Optimization). This will generate a random input tensor, pass it through the original Hugging Face Pi0.5 model, and save both the model input and output in the validation folder.

2. Run the validation/lerobot_ov_inferencing.py file on the randomly generated input tensor from step 1. This will generate the OpenVINO model output for that tensor.

    cd validation
    python lerobot_ov_inferencing.py
3. Run validation/is_same_tensor.py and modify it to compare the original PyTorch Pi0.5 output and the OpenVINO-optimized output tensors. The MSE should be < 1e-3, showing that the optimized model yields the same predictions as the original PyTorch model. A rough sketch of such a comparison follows these steps.

    python is_same_tensor.py
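As an illustration of what that comparison can look like, the sketch below loads a saved PyTorch output and the corresponding OpenVINO output and checks their mean squared error. The .npy file names are purely illustrative; use whatever paths convert_pytorch_onnx.py and lerobot_ov_inferencing.py actually write.

    import numpy as np

    # Hypothetical file names; substitute the tensors actually written by the two scripts.
    pytorch_output = np.load("pi05_pytorch_output.npy")
    openvino_output = np.load("pi05_openvino_output.npy")

    # Mean squared error between the original and the optimized model predictions.
    mse = np.mean((pytorch_output.astype(np.float32) - openvino_output.astype(np.float32)) ** 2)
    print(f"MSE between PyTorch and OpenVINO outputs: {mse:.6e}")
    assert mse < 1e-3, "Optimized model predictions diverge from the PyTorch reference"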