GPU and NPU Device Plugins Usage Guide

Overview

Intel GPU and NPU device plugins expose hardware accelerators to Kubernetes pods without requiring privileged containers. The plugins register devices with the kubelet through the Kubernetes device plugin framework, so pods can request accelerators through standard resource limits: gpu.intel.com/xe for GPU and npu.intel.com/accel for NPU.

The Edge Node Infrastructure Blueprint pre-installs these plugins during first boot. This guide covers verification, pod scheduling, and common usage patterns.

| Resource Name | Plugin | Hardware | Use Case |
|---|---|---|---|
| gpu.intel.com/xe | Intel GPU device plugin | Intel Xe-based integrated/discrete GPU (including SR-IOV VFs) | Media processing, inference, rendering |
| npu.intel.com/accel | Intel NPU device plugin | Intel NPU 2000/3000/4000 series | Low-power AI inference |


Prerequisites

  • Edge Node Infrastructure Blueprint image deployed with Kubernetes (K3s) host type

  • Intel GPU and/or NPU hardware present

  • K3s running with device plugins installed


Step 1: Verify Plugin Installation

After first boot, confirm that the device plugin pods are running:

# Check device plugin pods
kubectl get pods -n intel-device-plugins

Check all pods across namespaces:

sudo kubectl get pods -A

On a healthy node, the output shows the Intel device plugin pods and the Node Feature Discovery (NFD) pods in the Running state:

intel-device-plugins     intel-gpu-plugin-xxxxx                  1/1   Running
intel-device-plugins     intel-npu-plugin-xxxxx                  1/1   Running
node-feature-discovery   nfd-master-xxxxx                        1/1   Running
node-feature-discovery   nfd-worker-xxxxx                        1/1   Running
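
To make this check scriptable, the sketch below filters `kubectl get pods -A --no-headers` output and prints any pod in the two namespaces shown above that is not Running. The function name `not_running` is hypothetical, and the field positions assume the default column order of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS).

```shell
# Hypothetical helper: reads `kubectl get pods -A --no-headers` on stdin and
# prints namespace/name for device plugin or NFD pods whose STATUS is not Running.
# Assumes the default column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
not_running() {
  awk '($1 == "intel-device-plugins" || $1 == "node-feature-discovery") &&
       $4 != "Running" { print $1 "/" $2 }'
}

# Usage (on the node):
#   sudo kubectl get pods -A --no-headers | not_running
```

Empty output means every pod in both namespaces reports Running.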

Step 2: Verify Allocatable Resources

Confirm that the node advertises GPU and NPU resources:

kubectl describe node | grep -A 20 "Allocatable:"

Expected output (example with SR-IOV VFs enabled):

Allocatable:
  cpu:                       xx
  ephemeral-storage:         xxx
  gpu.intel.com/monitoring:  x
  gpu.intel.com/xe:          x
  memory:                    xxx
  npu.intel.com/accel:       x
  pods:                      xxx

With SR-IOV enabled, the gpu.intel.com/xe count is the number of SR-IOV Virtual Functions (VFs) available for allocation.
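
Because `kubectl describe node` lists the same resource names under both Capacity and Allocatable, a plain grep for the resource name can match the wrong section. The sketch below reads only the Allocatable section; the helper name `allocatable_count` is hypothetical, and it assumes the indented layout of the example output above on a single-node cluster.

```shell
# Hypothetical helper: reads `kubectl describe node` on stdin and prints the
# value of one resource from the Allocatable section only (0 if it is absent).
allocatable_count() {
  awk -v res="$1:" '
    /^Allocatable:/           { in_section = 1; next }   # section starts
    in_section && /^[^ ]/     { in_section = 0 }         # next unindented heading ends it
    in_section && $1 == res   { print $2; found = 1 }
    END                       { if (!found) print 0 }
  '
}

# Usage (on the node):
#   kubectl describe node | allocatable_count gpu.intel.com/xe
#   kubectl describe node | allocatable_count npu.intel.com/accel
```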

You can also verify the node labels applied by NFD:

kubectl get nodes --show-labels | tr ',' '\n' | grep intel

Step 3: Run a Pod with GPU Access

Create a pod that requests one Intel GPU device:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: gpu-check
      image: ubuntu:24.04
      command: ["sh", "-c", "ls -la /dev/dri/ && echo 'GPU accessible'"]
      resources:
        limits:
          gpu.intel.com/xe: "1"

Apply and check output:

kubectl apply -f gpu-test.yaml
kubectl wait pod/gpu-test --for=jsonpath='{.status.phase}'=Succeeded --timeout=60s
kubectl logs gpu-test

Expected output (abbreviated; device names and timestamps vary):

crw-rw---- 1 root render 226, 129 Jun 17 10:00 renderD129
GPU accessible

Step 4: Run a Pod with NPU Access

Create a pod that requests the Intel NPU device:

apiVersion: v1
kind: Pod
metadata:
  name: npu-test
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: npu-check
      image: ubuntu:24.04
      command: ["sh", "-c", "ls -la /dev/accel/ && echo 'NPU accessible'"]
      resources:
        limits:
          npu.intel.com/accel: "1"

Apply and check output:

kubectl apply -f npu-test.yaml
kubectl wait pod/npu-test --for=jsonpath='{.status.phase}'=Succeeded --timeout=60s
kubectl logs npu-test

Expected output (abbreviated; device names and timestamps vary):

crw-rw---- 1 root render 261, 0 Jun 17 10:00 accel0
NPU accessible

Step 5: Run a Pod with Both GPU and NPU

Request both accelerators in a single pod for combined inference workloads:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-npu-test
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: accel-check
      image: ubuntu:24.04
      command: ["sh", "-c", "ls /dev/dri/ && ls /dev/accel/ && echo 'Both accelerators accessible'"]
      resources:
        limits:
          gpu.intel.com/xe: "1"
          npu.intel.com/accel: "1"
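
This pod can be applied and checked the same way as in Steps 3 and 4. A minimal sketch, assuming the manifest is saved as gpu-npu-test.yaml (a file name this guide does not specify); the wrapper `run_check_pod` is a hypothetical convenience that only chains the three kubectl commands already used in the earlier steps.

```shell
# Hypothetical wrapper: apply a manifest, wait for the pod to finish, print its logs.
# $1 = manifest file, $2 = pod name. Stops at the first command that fails.
run_check_pod() {
  manifest=$1
  pod=$2
  kubectl apply -f "$manifest" &&
    kubectl wait "pod/$pod" --for=jsonpath='{.status.phase}'=Succeeded --timeout=60s &&
    kubectl logs "$pod"
}

# Usage (file name is an assumption):
#   run_check_pod gpu-npu-test.yaml gpu-npu-test
```

The logs should list the contents of /dev/dri/ and /dev/accel/ followed by the line "Both accelerators accessible".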

Using GPU/NPU in Deployments

For production workloads, use a Deployment to manage GPU or NPU pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      containers:
        - name: inference
          image: ubuntu:24.04
          command: ["sh", "-c", "echo 'GPU device available:' && ls /dev/dri/ && sleep infinity"]
          resources:
            limits:
              gpu.intel.com/xe: "1"

The scheduler places these pods only on nodes that still have unallocated GPU or NPU resources. For example, on a node with 7 SR-IOV VFs, up to 7 pods can each receive one GPU device.
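
The capacity arithmetic can be sketched as integer division of the free devices by the per-pod request. The helper `schedulable_pods` is hypothetical and deliberately simplified: it assumes a single node and ignores CPU, memory, and any other scheduling constraints.

```shell
# Hypothetical helper: how many more pods fit on the node, given
# $1 = allocatable device count, $2 = devices already allocated,
# $3 = devices requested per pod. Shell arithmetic rounds down.
schedulable_pods() {
  allocatable=$1
  in_use=$2
  per_pod=$3
  echo $(( (allocatable - in_use) / per_pod ))
}

schedulable_pods 7 0 1   # prints 7: seven VFs, none in use
schedulable_pods 7 2 1   # prints 5: two VFs held by the two replicas above
```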


Troubleshooting

No GPU/NPU Resources on the Node

  1. Check that the device plugin pods are running:

    kubectl get pods -n intel-device-plugins
    kubectl logs -n intel-device-plugins -l app=intel-gpu-plugin
    
  2. Verify the hardware is detected by the host:

    ls /dev/dri/          # GPU devices
    ls /dev/accel/        # NPU devices
    
  3. Verify NFD labels are applied:

    kubectl get nodes --show-labels | tr ',' '\n' | grep 'gpu.intel.com'
    kubectl get nodes --show-labels | tr ',' '\n' | grep 'npu.intel.com'
    

Pod Stuck in Pending State

If a pod requesting a GPU or NPU is stuck in Pending, inspect its scheduling events:

kubectl describe pod <pod-name>

Common causes:

  • Insufficient resources: All GPU/NPU devices are already allocated to other pods

  • No matching node: The node does not have the requested hardware

  • Plugin not running: The device plugin pod crashed or was not installed
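
The first cause typically appears in the Events section of `kubectl describe pod` as a FailedScheduling message that names the resource in short supply. The sketch below extracts those names; the helper `insufficient_resources` is hypothetical, and the "Insufficient <resource>" wording is an assumption about the scheduler's message format, which may differ between Kubernetes versions.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and prints
# each resource the scheduler reported as insufficient, one per line.
# Assumes event text of the form "... 1 Insufficient gpu.intel.com/xe. ..."
insufficient_resources() {
  grep -oE 'Insufficient [a-z0-9./-]+' |
    sed -e 's/^Insufficient //' -e 's/\.$//' |   # drop the keyword and a trailing period
    sort -u
}

# Usage:
#   kubectl describe pod <pod-name> | insufficient_resources
```

Empty output suggests one of the other causes, such as a missing plugin or absent hardware.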

GPU Plugin Shows 0 Devices

If SR-IOV VFs are expected but not detected, check the configured VF count and the render nodes on the host:

cat /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs
ls /dev/dri/renderD*

If no VFs were created, the SR-IOV service may not have run. Check its status and logs:

sudo systemctl status intel-sriov-vf.service
sudo journalctl -u intel-sriov-vf.service --no-pager
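
To see at a glance whether VFs are enabled, the following sketch compares sriov_numvfs with sriov_totalvfs, the standard PCI sysfs attribute that reports the maximum VF count. The helper `vf_status` is hypothetical; the PCI address in the usage line is the one used in this guide and may differ on other hardware.

```shell
# Hypothetical helper: report enabled vs. maximum VFs for one PCI device.
# $1 = sysfs directory of the physical function.
vf_status() {
  dev_dir=$1
  if [ ! -r "$dev_dir/sriov_numvfs" ]; then
    echo "no SR-IOV attributes found at $dev_dir" >&2
    return 1
  fi
  num=$(cat "$dev_dir/sriov_numvfs")
  # sriov_totalvfs should sit next to sriov_numvfs; fall back to "?" if unreadable
  total=$(cat "$dev_dir/sriov_totalvfs" 2>/dev/null || echo "?")
  echo "$num of $total VFs enabled"
}

# Usage (on the host):
#   vf_status /sys/bus/pci/devices/0000:00:02.0
```

A result of "0 of N VFs enabled" with N greater than zero indicates that the hardware supports SR-IOV but no VFs have been created yet.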

NPU Plugin Shows 0 Devices

Verify the NPU driver is loaded:

ls /sys/bus/pci/drivers/intel_vpu/
lsmod | grep intel_vpu

If the driver is not loaded, load it manually:

sudo modprobe intel_vpu

On kernel 6.17 and later, also verify that the NPU firmware files are present and check the kernel log for driver messages:

ls /lib/firmware/intel/vpu/
dmesg | grep -i vpu
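
The driver check can also be scripted. A minimal sketch, assuming the module name intel_vpu used above; the helper `module_loaded` is hypothetical and reads /proc/modules, the same information that lsmod prints.

```shell
# Hypothetical helper: succeed if a kernel module is currently loaded.
# $1 = module name, $2 = modules file (defaults to /proc/modules).
module_loaded() {
  module=$1
  modules_file=${2:-/proc/modules}
  # Each line of /proc/modules starts with the module name followed by a space
  grep -q "^$module " "$modules_file" 2>/dev/null
}

# Usage (on the host):
#   module_loaded intel_vpu || sudo modprobe intel_vpu
```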

References