Red Hat OpenStack Services on OpenShift / OSPRH-8837

Wrong workflow execution order when networkAttachments are used

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • rhos-18.0.1
    • None
    • test-operator
    • None
    • Moderate

      I am working to add tobiko execution to the RHOSO18 BGP_DT01 job with the following MR:
      https://gitlab.cee.redhat.com/ci-framework/ci-framework-jobs/-/merge_requests/663

      The BGP job forces the test to run from a specific OCP node:

      cifmw_test_operator_node_selector:
        kubernetes.io/hostname: worker-3

      It adds networkAttachments to the test pods:

      cifmw_test_operator_tobiko_network_attachments:
        - bgpnet-worker-3

      And the tobiko tests are executed with a workflow:

      cifmw_test_operator_tobiko_workflow:
        - stepName: sanity-before-faults
          testenv: sanity
        - stepName: create-resources
          testenv: scenario
      ...

      I have observed that, when the workflow runs, some pods occasionally fail to start and remain in the ContainerCreating status because of the following error:

        Warning  FailedCreatePodSandBox  36m   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_tobiko-tests-neutron-faults-workflow-step-2-f6hjz_openstack_20af9c89-ff53-4990-a280-bc153d5940e8_0(2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855): error adding pod openstack_tobiko-tests-neutron-faults-workflow-step-2-f6hjz to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855" Netns:"/var/run/netns/185b4dba-d166-4928-8370-566e5bb5130a" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openstack;K8S_POD_NAME=tobiko-tests-neutron-faults-workflow-step-2-f6hjz;K8S_POD_INFRA_CONTAINER_ID=2975018038d53eaa86ec98e3abc1432c4bd3583d8d70797bf6fe077877bda855;K8S_POD_UID=20af9c89-ff53-4990-a280-bc153d5940e8" Path:"" ERRORED: error configuring pod [openstack/tobiko-tests-neutron-faults-workflow-step-2-f6hjz] networking: [openstack/tobiko-tests-neutron-faults-workflow-step-2-f6hjz/20af9c89-ff53-4990-a280-bc153d5940e8:bgpnet-worker-3]: error adding container to network "bgpnet-worker-3": failed to find host device: Link not found
      ': StdinData: {"binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}
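
      The key part of the event above is the trailing Multus message: the host device for "bgpnet-worker-3" was not found. As a minimal sketch for spotting this failure in collected event messages, the helper below extracts the affected network name. The function name and the regular expression are assumptions written for this report, not part of test-operator or Multus.

```python
import re
from typing import Optional

# Pattern for the Multus failure quoted in this report. The capture group is
# the network name that the CNI plugin reported as failing.
LINK_NOT_FOUND = re.compile(
    r'error adding container to network "([^"]+)": '
    r'failed to find host device: Link not found'
)


def missing_link_network(event_message: str) -> Optional[str]:
    """Return the network whose host device was not found, or None."""
    match = LINK_NOT_FOUND.search(event_message)
    return match.group(1) if match else None


# Shortened copy of the event message from this report.
sample = (
    'networking: [openstack/tobiko-tests-neutron-faults-workflow-step-2-f6hjz/'
    '20af9c89-ff53-4990-a280-bc153d5940e8:bgpnet-worker-3]: error adding '
    'container to network "bgpnet-worker-3": failed to find host device: '
    'Link not found'
)
print(missing_link_network(sample))
```

      A helper like this only classifies the symptom; it does not explain why the link was missing on the node.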

      When this happens, the next workflow step (N+1) is executed: the next tobiko pod is created and starts running. Only when that pod finishes does the pod for step N start running.

       

      With tobiko, the execution order matters. The test-operator manager running the workflow should wait and retry until the pod corresponding to the workflow step N starts running.
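
      The expected wait-and-retry behaviour can be sketched as a polling loop. This is an illustration only, not the test-operator implementation: `wait_for_step`, `get_phase` and the timing parameters are hypothetical names, and in a real controller `get_phase` would query the Kubernetes API. It relies on the fact that a pod shown as ContainerCreating still has the phase Pending, while Completed and Error correspond to the phases Succeeded and Failed.

```python
import time
from typing import Callable

# Pod phases after which a workflow step is considered finished.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def wait_for_step(get_phase: Callable[[], str],
                  poll_seconds: float = 5.0,
                  timeout_seconds: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the pod reaches a terminal phase and return that phase."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod still in phase {phase!r}")
        # Pending (e.g. ContainerCreating) or Running: keep waiting.
        sleep(poll_seconds)


# Simulated pod: stuck in Pending twice, then runs and completes.
phases = iter(["Pending", "Pending", "Running", "Succeeded"])
print(wait_for_step(lambda: next(phases), sleep=lambda _: None))
```

      Only after `wait_for_step` returns for step N would the caller create the pod for step N+1.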

       

      The following link includes the output of the `oc describe pod <tobiko-pod>` commands; the pods corresponding to steps 2 and 4 hit the issue:
      https://file.emea.redhat.com/eolivare/test-operator-workflow-netattach/

       

      DoD:

      • While pod N is stuck in the ContainerCreating state, pod N+1 is not executed. Pod N+1 may be executed only after pod N has reached the Completed or Error status.
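
      One way to check the DoD against a finished run is to compare pod start and finish times per step. The sketch below assumes those times have already been collected; `StepRun`, `order_violations` and the sample timestamps are hypothetical and only illustrate the ordering rule.

```python
from typing import List, NamedTuple


class StepRun(NamedTuple):
    step: int        # workflow step index
    started: float   # time the pod started running
    finished: float  # time the pod reached Completed or Error


def order_violations(runs: List[StepRun]) -> List[int]:
    """Return the steps that started before the previous step finished."""
    ordered = sorted(runs, key=lambda r: r.step)
    return [cur.step for prev, cur in zip(ordered, ordered[1:])
            if cur.started < prev.finished]


# Made-up timeline resembling the report: step 2 was stuck, so step 3 ran first.
observed = [StepRun(1, 0, 10), StepRun(2, 25, 35), StepRun(3, 12, 24)]
print(order_violations(observed))
```

      An empty result means every step started only after its predecessor had finished.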

      PRs:

              lpiwowar Lukáš Piwowarski
              eolivare Eduardo Olivares Toledo
              rhos-tempest