Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: rhos-18.0.1
Affects Version/s: None
Component/s: test-operator
Labels: None
Story Points: 8
Epic Link: Test-operator 2024Q3 targets
Blocked: False
Ready: False
Docs Approval: ?
Fixed in Build: test-operator-container-1.0.0-46
Regression: None
Errata Link: https://errata.engineering.redhat.com/advisory/138623
Severity: Moderate

I am working on adding tobiko execution to the RHOSO18 BGP_DT01 job with the following MR:
https://gitlab.cee.redhat.com/ci-framework/ci-framework-jobs/-/merge_requests/663

The BGP job forces the tests to run on a specific OCP node:

cifmw_test_operator_node_selector:
  kubernetes.io/hostname: worker-3

It adds networkAttachments to the test pods:

cifmw_test_operator_tobiko_network_attachments:
  - bgpnet-worker-3

The tobiko tests are executed as a workflow:

cifmw_test_operator_tobiko_workflow:
  - stepName: sanity-before-faults
    testenv: sanity
  - stepName: create-resources
    testenv: scenario
...

I have observed that, when the workflow runs, a pod sometimes fails to start and remains in the ContainerCreating state because of the following error:

  Warning  FailedCreatePodSandBox  36m   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_tobiko-tests-neutron-faults-workflow-step-2-f6hjz_openstack_20af9c89-ff53-4990-a280-bc153d5940e8_0(2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855): error adding pod openstack_tobiko-tests-neutron-faults-workflow-step-2-f6hjz to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855" Netns:"/var/run/netns/185b4dba-d166-4928-8370-566e5bb5130a" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openstack;K8S_POD_NAME=tobiko-tests-neutron-faults-workflow-step-2-f6hjz;K8S_POD_INFRA_CONTAINER_ID=2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855;K8S_POD_UID=20af9c89-ff53-4990-a280-bc153d5940e8" Path:"" ERRORED: error configuring pod [openstack/tobiko-tests-neutron-faults-workflow-step-2-f6hjz] networking: [openstack/tobiko-tests-neutron-faults-workflow-step-2-f6hjz/20af9c89-ff53-4990-a280-bc153d5940e8:bgpnet-worker-3]: error adding container to network "bgpnet-worker-3": failed to find host device: Link not found
': StdinData: {"binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}

When this happens, the next workflow step (N+1) is executed (the next tobiko pod is created and starts running), and only after that pod finishes does the pod for step N start running.

With tobiko, the execution order matters. The test-operator manager running the workflow should wait and retry until the pod corresponding to the workflow step N starts running.

The following link contains the output of `oc describe pod <tobiko-pod>` for each pod; it shows that the pods for steps 2 and 4 hit the issue:
https://file.emea.redhat.com/eolivare/test-operator-workflow-netattach/

DoD (Definition of Done):

When pod N is stuck in the ContainerCreating state, pod N+1 is not executed. Pod N+1 may be executed only after pod N has reached the Completed or Error status.
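The DoD rule can be expressed as a small gating check. The Go sketch below is a hypothetical illustration, not test-operator's actual implementation (see the linked PR for the real fix). `PodPhase`, `isTerminal` and `canStartStep` are names introduced here for illustration. The phase strings mirror the values of Kubernetes' `corev1.PodPhase`. A pod stuck in ContainerCreating reports phase `Pending`. Pods listed as Completed or Error by `oc get pods` have typically reached phase `Succeeded` or `Failed`.

```go
package main

import "fmt"

// PodPhase mirrors the string values of Kubernetes' corev1.PodPhase.
// It is defined locally (assumption) so the sketch has no dependencies.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"   // includes pods stuck in ContainerCreating
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded" // typically listed as "Completed" by oc get pods
	PodFailed    PodPhase = "Failed"    // typically listed as "Error" by oc get pods
	PodUnknown   PodPhase = "Unknown"
)

// isTerminal reports whether a pod has finished, successfully or not.
func isTerminal(p PodPhase) bool {
	return p == PodSucceeded || p == PodFailed
}

// canStartStep (hypothetical helper) reports whether the pod for the workflow
// step with 0-based index `step` may be created. phases[i] is the current phase
// of the pod already created for step i. Step 0 may always start; any later
// step may start only once the pod of the previous step is terminal.
func canStartStep(phases []PodPhase, step int) bool {
	if step < 0 {
		return false
	}
	if step == 0 {
		return true
	}
	if step > len(phases) {
		// The pod for the previous step has not been created yet.
		return false
	}
	return isTerminal(phases[step-1])
}

func main() {
	// The pod for step index 1 is stuck in ContainerCreating (phase Pending),
	// so the pod for step index 2 must not be created yet.
	phases := []PodPhase{PodSucceeded, PodPending}
	fmt.Println(canStartStep(phases, 2)) // false: requeue and check again later

	// Once that pod eventually runs and ends with an error, step 2 may start.
	phases[1] = PodFailed
	fmt.Println(canStartStep(phases, 2)) // true
}
```

In a controller-runtime based reconciler, a `false` result would typically mean returning a `ctrl.Result` with `RequeueAfter` set instead of creating the next pod. The workflow then keeps retrying until step N finishes, as requested above.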

PRs:

https://github.com/openstack-k8s-operators/test-operator/pull/154

Relates to: OSPRH-9243 Issue when copying logs from a workflow step (Closed)
Links to: RHBA-2024:138623 Control plane Operators for RHOSO 18.0

Assignee: Lukáš Piwowarski
Reporter: Eduardo Olivares Toledo
Votes: 0
Watchers: 4
Created: 2024/07/22 11:34 AM
Updated: 2024/12/27 8:01 AM
Resolved: 2024/09/20 7:54 AM