-
Bug
-
Resolution: Done-Errata
-
Normal
-
None
-
None
-
8
-
False
-
False
-
?
-
?
-
test-operator-container-1.0.0-46
-
?
-
?
-
None
-
-
-
Moderate
I am working to add tobiko execution to the RHOSO18 BGP_DT01 job with the following MR:
https://gitlab.cee.redhat.com/ci-framework/ci-framework-jobs/-/merge_requests/663
The BGP job forces the test to run from a specific OCP node:
cifmw_test_operator_node_selector:
  kubernetes.io/hostname: worker-3
It adds networkAttachments to the test pods:
cifmw_test_operator_tobiko_network_attachments:
  - bgpnet-worker-3
And the tobiko tests are executed with a workflow:
cifmw_test_operator_tobiko_workflow:
  - stepName: sanity-before-faults
    testenv: sanity
  - stepName: create-resources
    testenv: scenario
  ...
I have observed that, when the workflow runs, a pod sometimes fails to start and remains in ContainerCreating status because of the following error:
Warning FailedCreatePodSandBox 36m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_tobiko-tests-neutron-faults-workflow-step-2-f6hjz_openstack_20af9c89-ff53-4990-a280-bc153d5940e8_0(2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855): error adding pod openstack_tobiko-tests-neutron-faults-workflow-step-2-f6hjz to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855" Netns:"/var/run/netns/185b4dba-d166-4928-8370-566e5bb5130a" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openstack;K8S_POD_NAME=tobiko-tests-neutron-faults-workflow-step-2-f6hjz;K8S_POD_INFRA_CONTAINER_ID=2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855;K8S_POD_UID=20af9c89-ff53-4990-a280-bc153d5940e8" Path:"" ERRORED: error configuring pod [openstack/tobiko-tests-neutron-faults-workflow-step-2-f6hjz] networking: [openstack/tobiko-tests-neutron-faults-workflow-step-2-f6hjz/20af9c89-ff53-4990-a280-bc153d5940e8:bgpnet-worker-3]: error adding container to network "bgpnet-worker-3": failed to find host device: Link not found ': StdinData: {"binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}
When this happens, the next workflow step (N+1) is executed anyway: the next tobiko pod is created and starts running and, once that pod finishes, the pod for step N starts running.
With tobiko, the execution order matters. The test-operator manager running the workflow should wait and retry until the pod corresponding to the workflow step N starts running.
The following link contains the output of `oc describe pod <tobiko-pod>` for the workflow pods; the pods for steps 2 and 4 hit the issue:
https://file.emea.redhat.com/eolivare/test-operator-workflow-netattach/
DoD:
- While pod N is stuck in the ContainerCreating state, pod N+1 is not executed. Pod N+1 may be executed only after pod N has reached status Completed or Error.
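The ordering rule in the DoD can be sketched as follows. This is only an illustration of the expected behaviour, not the test-operator implementation: `may_start_next_step`, `run_workflow`, and the `start_pod`, `get_phase` and `wait` callbacks are hypothetical names. It assumes that a pod shown as ContainerCreating has `.status.phase` Pending, and that the Completed and Error statuses correspond to the phases Succeeded and Failed.

```python
from typing import Callable, List, Sequence

# Pod phases (.status.phase) after which the next step may start.
# Assumption: `oc get pods` shows these as "Completed" and "Error".
TERMINAL_PHASES = {"Succeeded", "Failed"}


def may_start_next_step(previous_phase: str) -> bool:
    """Return True only if the pod of step N has finished.

    A pod stuck in ContainerCreating is still in phase "Pending",
    so it blocks step N+1.
    """
    return previous_phase in TERMINAL_PHASES


def run_workflow(
    steps: Sequence[str],
    start_pod: Callable[[str], None],
    get_phase: Callable[[str], str],
    wait: Callable[[], None] = lambda: None,
    max_polls: int = 100,
) -> List[str]:
    """Start the steps strictly in order (hypothetical helper).

    After starting the pod of a step, poll its phase and retry until it
    is terminal; only then move on to the next step.
    """
    started: List[str] = []
    for step in steps:
        start_pod(step)
        started.append(step)
        for _ in range(max_polls):
            if may_start_next_step(get_phase(step)):
                break
            wait()  # e.g. sleep before the next poll
        else:
            # The pod never reached a terminal phase within max_polls.
            raise TimeoutError(f"step {step!r} did not finish")
    return started
```

With this gating, a step whose pod hits FailedCreatePodSandBox keeps the workflow waiting instead of letting step N+1 run first; the poll limit is there only so the sketch cannot loop forever.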
PRs:
- relates to
-
OSPRH-9243 Issue when copying logs from a workflow step
- Refinement
- links to
-
RHBA-2024:138623 Control plane Operators for RHOSO 18.0