Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-41817

"pods should successfully create sandboxes by adding pod to network" are failing on multiple platforms

XMLWordPrintable

    • None
    • Approved
    • False
    • Hide

      None

      Show
      None
    • N/A
    • Release Note Not Required
    • Done

      This is a clone of issue OCPBUGS-41270. The following is the description of the original issue:

      Component Readiness has found a potential regression in the following test:

      [sig-network] pods should successfully create sandboxes by adding pod to network

      Probability of significant regression: 96.41%

      Sample (being evaluated) Release: 4.17
      Start Time: 2024-08-27T00:00:00Z
      End Time: 2024-09-03T23:59:59Z
      Success Rate: 88.37%
      Successes: 26
      Failures: 5
      Flakes: 12

      Base (historical) Release: 4.16
      Start Time: 2024-05-31T00:00:00Z
      End Time: 2024-06-27T23:59:59Z
      Success Rate: 98.46%
      Successes: 43
      Failures: 1
      Flakes: 21

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?Architecture=amd64&Architecture=amd64&FeatureSet=default&FeatureSet=default&Installer=ipi&Installer=ipi&Network=ovn&Network=ovn&NetworkAccess=default&Platform=metal&Platform=metal&Scheduler=default&SecurityMode=default&Suite=unknown&Suite=unknown&Topology=ha&Topology=ha&Upgrade=minor&Upgrade=minor&baseEndTime=2024-06-27%2023%3A59%3A59&baseRelease=4.16&baseStartTime=2024-05-31%2000%3A00%3A00&capability=Other&columnGroupBy=Platform%2CArchitecture%2CNetwork&component=Networking%20%2F%20cluster-network-operator&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&environment=amd64%20default%20ipi%20ovn%20metal%20unknown%20ha%20minor&ignoreDisruption=true&ignoreMissing=false&includeVariant=Architecture%3Aamd64&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Owner%3Aeng&includeVariant=Platform%3Ametal&includeVariant=Topology%3Aha&minFail=3&pity=5&sampleEndTime=2024-09-03%2023%3A59%3A59&sampleRelease=4.17&sampleStartTime=2024-08-27%2000%3A00%3A00&testId=openshift-tests-upgrade%3A65e48733eb0b6115134b2b8c6a365f16&testName=%5Bsig-network%5D%20pods%20should%20successfully%20create%20sandboxes%20by%20adding%20pod%20to%20network

       

      Here is an example run

      We see the following signature for the failure:

       

      namespace/openshift-etcd node/master-0 pod/revision-pruner-11-master-0 hmsg/b90fda805a - 111.86 seconds after deletion - firstTimestamp/2024-09-02T13:14:37Z interesting/true lastTimestamp/2024-09-02T13:14:37Z reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_revision-pruner-11-master-0_openshift-etcd_08346d8f-7d22-4d70-ab40-538a67e21e3c_0(d4b61f9ff9f2ddfd3b64352203e8a3eafc2c3bd7c3d31a0a573bc29e4ac6da57): error adding pod openshift-etcd_revision-pruner-11-master-0 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"d4b61f9ff9f2ddfd3b64352203e8a3eafc2c3bd7c3d31a0a573bc29e4ac6da57" Netns:"/var/run/netns/97dc5eb9-19da-462f-8b2e-c301cfd7f3cf" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-etcd;K8S_POD_NAME=revision-pruner-11-master-0;K8S_POD_INFRA_CONTAINER_ID=d4b61f9ff9f2ddfd3b64352203e8a3eafc2c3bd7c3d31a0a573bc29e4ac6da57;K8S_POD_UID=08346d8f-7d22-4d70-ab40-538a67e21e3c" Path:"" ERRORED: error configuring pod [openshift-etcd/revision-pruner-11-master-0] networking: Multus: [openshift-etcd/revision-pruner-11-master-0/08346d8f-7d22-4d70-ab40-538a67e21e3c]: error waiting for pod: pod "revision-pruner-11-master-0" not found  

       

      The same signature has been reported for both azure and x390x as well.

       

      It is worth mentioning that sdn to ovn transition adds some complication to our analysis. From the component readiness above, you will see most of the failures are for job: periodic-ci-openshift-release-master-nightly-X.X-upgrade-from-stable-X.X-e2e-metal-ipi-ovn-upgrade. This is a new job for 4.17 and therefore miss base stats in 4.16.

       

      So we ask for:

      1. An analysis of the root cause and impact of this issue
      2. Team can compare relevant 4.16 sdn jobs to see if this is really a regression.
      3. Given the current passing rate of 88%, what priority we should give to this?
      4. Since this is affecting component readiness, and management depends on a green dashboard for release decision, we need to figure out what is the best approach for handling this issue.  

       

            dosmith Douglas Smith
            openshift-crt-jira-prow OpenShift Prow Bot
            Anurag Saxena Anurag Saxena
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: