Troubleshooting SRIOV Extension Failed State#

There is a known issue where the SRIOV extension may enter a failure state due to permission settings in recent images. Despite this, the UI may show the deployment as “Running.” We recommend applying a workaround until we resolve the root cause in a future release.

Check for SRIOV Failed State#

Verify whether SRIOV extension pod is in a Running or Failed state.

Run the following commands to verify Failed state:

$ kubectl get pods -n sriov-network-operator
NAME                                                            READY   STATUS                       RESTARTS   AGE
b-224edb04-cf6e-57c7-9b01-5ba13098a6cd-sriov-85bbd564d7-spb6q   1/1     Running                      0          84s
network-resources-injector-dd7lw                                0/1     CreateContainerConfigError   0          83s
operator-webhook-6dqww                                          0/1     CreateContainerConfigError   0          83s
sriov-network-config-daemon-jk46r                               1/1     Running                      0          83s

$ kubectl describe pod operator-webhook-6dqww -n sriov-network-operator
  Warning  Failed          21s (x12 over 2m17s)  kubelet            Error: container has runAsNonRoot and image will run as root (pod: "operator-webhook-6dqww_sriov-network-operator(79096e45-1e29-4ef8-bda8-b08176a6911c)", container: webhook-server)

Workaround to Resolve the Issue#

If the SRIOV extension fails to start due to permission settings, administrators are advised to implement this temporary workaround. The SRIOV extension should restart and reach a Running state after this workaround.

Delete the operator webhook config and disable the injector by running the following commands:

$ kubectl delete validatingwebhookconfigurations sriov-operator-webhook-config
validatingwebhookconfiguration.admissionregistration.k8s.io "sriov-operator-webhook-config" deleted

$ kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n sriov-network-operator default  --type=merge -p '{"spec":{"enableInjector":false,"enableOperatorWebhook":false}}'
sriovoperatorconfig.sriovnetwork.openshift.io/default patched

Note

Verify that all SRIOV pods are running after the workaround is applied. There should be two pods running in the sriov-network-operator namespace. This workaround needs to be applied for every new SRIOV deployment.