# LLM Robotics Demo
This is a code-generation pipeline for robotics that combines a chatbot with AI models: a large language model (Phi-4) and computer vision models (SAM, CLIP). The pipeline takes your voice or text commands as prompts and generates the corresponding actions for the robotics agent.
This tutorial shows you how to set up a real-time system to control a JAKA robot arm, with movement commands generated using an LLM. The following figure shows the demo architecture:

## Prerequisites
Ensure you have completed the setup steps in [Get Started](../get_started.md) and have the following:
| Specification | Recommendation |
| ------------- | ----------------------------------- |
| Processor | Intel® Core™ Ultra 7 Processor 265H |
| Storage | 256 GB |
| Memory | LPDDR5, 6400 MHz, 16 GB x 2 |
## Set up JAKA Robot Arm
This section shows how to set up a simulation of the JAKA robot-arm ROS2 application.
### Install PLCopen Library
1. Install the dependencies:
```bash
sudo apt install libeigen3-dev python3-pip python3-venv cmake
sudo python3 -m pip install pymodbus==v3.6.9
```
2. Install the PLCopen library:
```bash
sudo apt install libshmringbuf libshmringbuf-dev plcopen-ruckig plcopen-ruckig-dev plcopen-motion plcopen-motion-dev plcopen-servo plcopen-servo-dev plcopen-databus plcopen-databus-dev
```
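To confirm that the pinned `pymodbus` version from step 1 is the one Python actually sees, a check along the following lines may help. It is a minimal sketch that uses only the standard library; `parse_pin` and `pin_matches` are hypothetical helper names, not part of the demo.

```python
from importlib import metadata


def parse_pin(spec: str) -> tuple[str, str]:
    """Split a 'name==version' requirement into (name, version)."""
    name, sep, version = spec.partition("==")
    if not sep or not name.strip() or not version.strip():
        raise ValueError(f"not a pinned requirement: {spec!r}")
    # Version pins may carry an optional leading 'v' (for example 'v3.6.9'); drop it.
    return name.strip(), version.strip().removeprefix("v")


def pin_matches(spec: str) -> bool:
    """Return True when the installed distribution has exactly the pinned version."""
    name, wanted = parse_pin(spec)
    try:
        return metadata.version(name) == wanted
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    spec = "pymodbus==3.6.9"
    print(f"{spec}: {'ok' if pin_matches(spec) else 'missing or different version'}")
```

Run it with the same interpreter that the `pip install` command used (the system `python3` here), because each Python environment has its own set of installed packages.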
### Install ROS2 Iron Distribution
1. Install the dependencies:
```bash
sudo apt update && sudo apt install -y locales curl gnupg2 lsb-release
```
2. Set up the Intel® oneAPI APT repository:
```bash
sudo -E wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
```
3. Set up the public ROS2 Iron APT repository:
```bash
sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(source /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/null
sudo bash -c 'echo -e "Package: *\nPin: origin eci.intel.com\nPin-Priority: -1" > /etc/apt/preferences.d/isar'
sudo apt update
```
4. Install ROS2 Iron packages:
```bash
sudo apt install -y python3-colcon-common-extensions python3-argcomplete python3-pykdl
sudo apt install -y ros-iron-desktop 'ros-iron-moveit*' ros-iron-osqp-vendor ros-iron-ament-cmake-google-benchmark librange-v3-dev ros-iron-ros-testing
sudo bash -c 'echo -e "Package: *\nPin: origin eci.intel.com\nPin-Priority: 1000" > /etc/apt/preferences.d/isar'
```
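Steps 3 and 4 write the same APT preferences file twice, first with priority `-1` and then with priority `1000`. According to `apt_preferences(5)`, a negative priority prevents packages from that origin from being installed, and a priority of 1000 or more installs them even if that means a downgrade. The sketch below renders the stanza so the two states are easier to compare; the helper names are hypothetical, and the reading that the pin is lowered only while the ROS2 packages are installed is an assumption drawn from the order of the commands.

```python
def render_apt_pin(origin: str, priority: int, package: str = "*") -> str:
    """Render one apt_preferences(5) stanza that pins packages from `origin`."""
    return (
        f"Package: {package}\n"
        f"Pin: origin {origin}\n"
        f"Pin-Priority: {priority}\n"
    )


def describe_priority(priority: int) -> str:
    """Summarize how APT treats a pin priority (two extremes only)."""
    if priority < 0:
        return "never installed"
    if priority >= 1000:
        return "installed even if that means a downgrade"
    return "ranked against other candidates by priority"


if __name__ == "__main__":
    for priority in (-1, 1000):
        print(render_apt_pin("eci.intel.com", priority), end="")
        print(f"# packages from this origin are {describe_priority(priority)}\n")
```

The rendered text matches what the `echo -e` commands write to `/etc/apt/preferences.d/isar`.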
### Install JAKA Robot Arm Application
1. Download the JAKA robot arm source code:
```bash
cd ~/Downloads/
sudo apt source ros-humble-pykdl-utils ros-humble-jaka-bringup ros-humble-jaka-description ros-humble-jaka-hardware ros-humble-jaka-moveit-config ros-humble-jaka-moveit-py ros-humble-jaka-servo ros-humble-run-jaka-moveit ros-humble-run-jaka-plc
```
2. Create a workspace for the robot arm source code:
```bash
mkdir -p ~/ws_jaka/src
cp -r ~/Downloads/ros-humble-jaka-bringup-3.2.0/robot_arm/ ~/ws_jaka/src
```
3. Build the JAKA robot arm source code:
```bash
cd ~/ws_jaka/ && source /opt/ros/iron/setup.bash
touch src/robot_arm/jaka/jaka_servo/COLCON_IGNORE
colcon build
```
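After the build, a quick look at the workspace layout can catch a skipped step. The following sketch checks for the copied sources, the `COLCON_IGNORE` marker, and the `install/setup.bash` file that `colcon build` generates. `workspace_report` is a hypothetical helper, and the presence of `install/setup.bash` is only a rough proxy: it does not prove that every package built successfully.

```python
from pathlib import Path


def workspace_report(ws: Path) -> dict[str, bool]:
    """Report which artifacts of the workspace steps exist under `ws`."""
    robot_arm = ws / "src" / "robot_arm"
    return {
        "source_copied": robot_arm.is_dir(),
        "jaka_servo_ignored": (robot_arm / "jaka" / "jaka_servo" / "COLCON_IGNORE").is_file(),
        "install_tree_present": (ws / "install" / "setup.bash").is_file(),
    }


if __name__ == "__main__":
    # ~/ws_jaka is the workspace path used in the steps above.
    for check, ok in workspace_report(Path.home() / "ws_jaka").items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```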
## Set up the Fundamental End-to-End Speech Recognition (FunASR) Toolkit
This section shows how to set up the FunASR toolkit server.
### Install Dependencies
```bash
sudo apt-get install cmake libopenblas-dev libssl-dev portaudio19-dev ffmpeg git python3-pip -y
```
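Some of these packages ship command-line tools, so their presence can be confirmed by looking them up on `PATH`. This is a minimal sketch with a hypothetical helper name; it cannot verify the development libraries (`libopenblas-dev`, `libssl-dev`, `portaudio19-dev`), which install headers and libraries rather than commands.

```python
import shutil


def missing_commands(names: list[str]) -> list[str]:
    """Return the commands from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(["cmake", "ffmpeg", "git", "pip3"])
    print("all found" if not missing else f"missing: {', '.join(missing)}")
```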
### Add OpenVINO Toolkit Speech Model to FunASR Toolkit
1. Install the FunASR environment:
```bash
sudo apt install funasr llm-robotics
cd /opt/funasr/
sudo bash install_funasr.sh
```
2. Install the `asr-openvino` model script:
```bash
sudo chown -R $USER /opt/funasr/
sudo chown -R $USER /opt/llm-robotics/
mkdir -p /opt/funasr/FunASR/funasr/models/intel/
cp -r /opt/llm-robotics/asr-openvino-demo/models/* /opt/funasr/FunASR/funasr/models/intel/
```
3. Create a Python virtual environment for running the FunASR toolkit:
```bash
cd /opt/funasr/
python3 -m venv venv-asr
source venv-asr/bin/activate
pip install modelscope==1.17.1 onnx==1.16.2 humanfriendly==10.0 pyaudio websocket==0.2.1 websockets==12.0 translate==3.6.1 kaldi_native_fbank==1.20.0 onnxruntime==1.18.1 torchaudio==2.4.0 openvino==2024.3.0
```
4. Build the `asr-openvino` model:
```bash
cd /opt/funasr/FunASR/
pip install -e ./
python ov_convert_FunASR.py
cp -r ~/.cache/modelscope/hub/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch /opt/llm-robotics/asr-openvino-demo/
```
5. Convert the ONNX models to OpenVINO IR using `ovc`, which compresses the weights to FP16 by default:
```bash
cd /opt/llm-robotics/asr-openvino-demo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/
ovc model.onnx --output_model=model_bb_fp16
ovc model_eb.onnx --output_model=model_eb_fp16
```
6. Modify the `configuration.json` file of the speech model, changing `model_name_in_hub.ms` and `file_path_metas.init_param` as shown:
```json
{
  "framework": "pytorch",
  "task": "auto-speech-recognition",
  "model": {"type": "funasr"},
  "pipeline": {"type": "funasr-pipeline"},
  "model_name_in_hub": {
    "ms": "",
    "hf": ""
  },
  "file_path_metas": {
    "init_param": "model_bb_fp16.xml",
    "config": "config.yaml",
    "tokenizer_conf": {"token_list": "tokens.json", "seg_dict_file": "seg_dict"},
    "frontend_conf": {"cmvn_file": "am.mvn"}
  }
}
```
7. Reinstall the `funasr` package of the FunASR toolkit:
```bash
cd /opt/funasr/FunASR/
pip uninstall funasr
pip install -e ./
```
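A typo in the `configuration.json` edit from step 6 is easy to miss. The sketch below checks only the two values that step changes, `model_name_in_hub.ms` and `file_path_metas.init_param`. `config_problems` is a hypothetical helper, and the file location is passed on the command line rather than assumed.

```python
import json
import sys
from pathlib import Path


def config_problems(config: dict) -> list[str]:
    """List deviations from the two values that step 6 sets."""
    problems = []
    if config.get("model_name_in_hub", {}).get("ms") != "":
        problems.append("model_name_in_hub.ms should be an empty string")
    if config.get("file_path_metas", {}).get("init_param") != "model_bb_fp16.xml":
        problems.append("file_path_metas.init_param should be model_bb_fp16.xml")
    return problems


if __name__ == "__main__":
    # Usage: python check_config.py /path/to/configuration.json
    if len(sys.argv) == 2:
        config = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
        for line in config_problems(config) or ["configuration looks as expected"]:
            print(line)
```

Because the check loads the file with `json.loads`, it also fails early if the file is not valid JSON, for example when the `#` comment line from the instructions was pasted into it.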
## Set up LLM and Vision Models
This section shows how to set up a virtual Python environment to run the LLM demo.
### Set up a Virtual Environment for the Application
1. Install the `pip` packages for LLM:
```bash
cd /opt/llm-robotics/LLM/
python3 -m venv venv-llm
source venv-llm/bin/activate
python -m pip install --upgrade pip
pip install -r requirement.txt
```
2. Set the environment variable:
```bash
# If you cannot reach Hugging Face from within the PRC, use a mirror endpoint:
export HF_ENDPOINT="https://hf-mirror.com"
# To run transformers offline, also set: export TRANSFORMERS_OFFLINE=1
```
```
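The same variables can also be set from Python. The sketch below assumes that the Hugging Face libraries read `HF_ENDPOINT` and `TRANSFORMERS_OFFLINE` when they are imported, so the call has to happen before those imports. `configure_hf_env` is a hypothetical helper, not part of the demo.

```python
import os
from typing import MutableMapping, Optional


def configure_hf_env(
    mirror: Optional[str] = None,
    offline: bool = False,
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Set the Hugging Face variables from step 2 in `env`."""
    if mirror:
        env["HF_ENDPOINT"] = mirror
    if offline:
        env["TRANSFORMERS_OFFLINE"] = "1"


if __name__ == "__main__":
    # Call this before importing transformers or huggingface_hub.
    configure_hf_env(mirror="https://hf-mirror.com")
    print(os.environ["HF_ENDPOINT"])
```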
### Set up the SAM Model
See the following OpenVINO documentation to export and save the `SAM` model:
- [Segment Anything](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/segment-anything)
Change the model loading paths to the location of your exported models. The default paths are:
```python
# /opt/llm-robotics/LLM/utils/mobilesam_helper.py:L88-L89
ov_sam_encoder_path = f"/home/intel/ov_models/sam_image_encoder.xml"
ov_sam_predictor_path = f"/home/intel/ov_models/sam_mask_predictor.xml"
```
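An OpenVINO IR model consists of an `.xml` file and a `.bin` weights file with the same base name, so both must exist at the configured location. The following sketch reports whichever file of a pair is missing; `missing_ir_files` is a hypothetical helper, and the paths in the usage section are the defaults quoted above.

```python
from pathlib import Path


def missing_ir_files(xml_path: str) -> list[str]:
    """Return the files of an OpenVINO IR pair (.xml plus .bin) that do not exist."""
    xml = Path(xml_path)
    pair = [xml, xml.with_suffix(".bin")]
    return [str(p) for p in pair if not p.is_file()]


if __name__ == "__main__":
    for path in (
        "/home/intel/ov_models/sam_image_encoder.xml",
        "/home/intel/ov_models/sam_mask_predictor.xml",
    ):
        missing = missing_ir_files(path)
        print(f"{path}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```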
### Set up the CLIP Model
See the following OpenVINO documentation to export and save the `CLIP (ViT-B)` model:
- [CLIP](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/clip-zero-shot-image-classification)
Change the model loading path to the location of your exported model. The default path is:
```python
# /opt/llm-robotics/LLM/utils/mobilesam_helper.py:L87
clip_model_path = f"/home/intel/ov_models/clip-vit-base-patch16.xml"
```
### Set up the `Phi-4-mini-instruct-int8-ov` Model
Download the `Phi-4-mini-instruct-int8-ov` model:
```bash
sudo apt install git-lfs
mkdir -p ~/ov_models && cd ~/ov_models
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/OpenVINO/Phi-4-mini-instruct-int8-ov
# git lfs pull must run inside the cloned repository
cd Phi-4-mini-instruct-int8-ov
git lfs pull
```
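Because the clone uses `GIT_LFS_SKIP_SMUDGE=1`, large files are checked out as small text pointers until `git lfs pull` has fetched the real content. A Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`, which makes leftovers easy to detect. The sketch below scans a directory for such pointers; the helper names are hypothetical, and the repository path in the usage section follows the commands above.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True when `path` is still a Git LFS pointer instead of the real file."""
    with path.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_pointers(repo: Path) -> list[Path]:
    """List files under `repo` (skipping the .git directory) that are still pointers."""
    return sorted(
        p
        for p in repo.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(repo).parts
        and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    repo = Path.home() / "ov_models" / "Phi-4-mini-instruct-int8-ov"
    if repo.is_dir():
        leftovers = unresolved_pointers(repo)
        print("all files fetched" if not leftovers else f"still pointers: {leftovers}")
```

If any pointers remain, running `git lfs pull` from inside the cloned repository should replace them with the actual model files.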
Change the model loading path to the location of the downloaded model. The default path is:
```python
# /opt/llm-robotics/LLM/llm_bridge.py:L27
self.model_path = "/home/intel/ov_models/Phi-4-mini-instruct-int8-ov"
```
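The download step places the model under `~/ov_models`, while the default in `llm_bridge.py` is `/home/intel/ov_models/...`; the two coincide only when the login user is `intel`. One way to avoid a hard-coded user name is to derive the path from the home directory, as in the sketch below. The `model_path` helper and the `OV_MODELS_DIR` variable are hypothetical and are not read by the demo; the returned string would still have to be assigned to `self.model_path` by hand.

```python
import os
from pathlib import Path
from typing import Optional


def model_path(name: str, base: Optional[str] = None) -> str:
    """Build a model path under `base`, $OV_MODELS_DIR, or ~/ov_models (in that order)."""
    root = base or os.environ.get("OV_MODELS_DIR") or str(Path.home() / "ov_models")
    return str(Path(root).expanduser() / name)


if __name__ == "__main__":
    print(model_path("Phi-4-mini-instruct-int8-ov"))
```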
## Run the Pipeline
This section shows how to launch the LLM robotics demo.
### Prepare the System
Connect the following devices to the Intel® Core™ Ultra processor IPC:
| Item | Explanation | Link |
| ------- | ----------------------------------------------- | ---------------------------------------------------------------- |
| Camera | RealSense™ Depth Camera D435 | |
| USB Mic | Audio input device for FunASR, 16 kHz sampling rate | UGREEN CM564 |
### Launch the LLM Robotics Demo
The LLM Robotics demo consists of three components: a real-time component, a non-real-time ROS2 component, and a non-real-time LLM component.
> **Important:**
> Ensure a stable network connection before running the demo. The FunASR and LLM applications require an active network connection.
1. Launch the OpenVINO FunASR server:
```bash
source /opt/funasr/venv-asr/bin/activate
python3 /opt/funasr/FunASR/runtime/python/websocket/funasr_wss_server.py --port 10095 --certfile "" --keyfile "" --asr_model /opt/llm-robotics/asr-openvino-demo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/
```
2. Launch the real-time application:
```bash
# Pin the real-time application to CPU core 3
sudo taskset -c 3 /opt/plcopen/plc_rt_pos_rtmotion
```
If the real-time application launches successfully, the terminal will show the following:
```shell
Axis 0 initialized.
Axis 1 initialized.
Axis 2 initialized.
Axis 3 initialized.
Axis 4 initialized.
Axis 5 initialized.
Function blocks initialized.
```
3. Launch the JAKA robot arm ROS2 node:
> **Important:**
> Execute the following commands as a privileged user (`root`). Open a root terminal:
```bash
sudo -i
```
```bash
source /opt/ros/iron/setup.bash
source /install/setup.bash
ros2 launch jaka_moveit_py jaka_motion_planning.launch.py
```
If the ROS2 node launches successfully, the RVIZ2 tool will display the following:

4. Launch the LLM application:
```bash
cd /opt/llm-robotics/LLM/
source venv-llm/bin/activate
python main.py
```
If the LLM application launches successfully, the demo UI will display the following:

In the "Apps" tab:
- Camera Stream and Depth Stream: displays the real-time color and depth streams from the camera.
- App status: indicates the status and outcome of code generation.
- Inference Result: presents the results from the SAM and CLIP models.
- Text prompt: you can enter prompts by typing, or by speaking through the microphone. Press the "Submit" button to start the inference process.
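Spoken prompts depend on the FunASR server that was launched with `--port 10095`. A quick way to confirm that the server is accepting TCP connections is sketched below. It checks only that the port is open, not that speech recognition works, and the host `127.0.0.1` is an assumption that the server runs on the same machine as the UI. The helper names are hypothetical.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, deadline_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll host:port until it accepts a connection or `deadline_s` has passed."""
    end = time.monotonic() + deadline_s
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    ready = wait_for_port("127.0.0.1", 10095, deadline_s=3.0)
    print("FunASR port is open" if ready else "FunASR port is not reachable")
```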
The following demo picture shows the result for the prompt "Please pick up the black computer mouse and place it in the target position":
