  1. OpenShift Bugs
  2. OCPBUGS-20479

Ignore pod sandbox creation failures due to networking when the node is NetworkUnavailable=true


    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.15.0
    • 4.15.0
    • Test Framework
    • Low
    • No
    • False

      The test:

      [sig-network] pods should successfully create sandboxes by adding pod to network

      failed a couple of payloads today, with 1-2 failures in batches of 10 aggregated jobs. I looked at the most recent errors, and they often seem to be the same:

      1 failures to create the sandbox
      
      ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-24-217.us-west-1.compute.internal - 475.52 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_c712fc61-5a1e-4cec-b6fa-18c8f2e91c0a_0(46df8384ffeb433fc0e4864262aa52f2ede570265c43bf8b0900f184b27b10f1): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF
      

      This http://dummy/cni URL looked interesting and seemed worthy of a bug.
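
      When triaging many of these, it can help to split the failure line into fields. The following is a minimal sketch: the pattern is inferred only from the single line quoted above, the function name parse_sandbox_failure is hypothetical, and the sample message is abbreviated for readability. Other failure lines may not match this shape.

```python
import re

# Pattern inferred from the one failure line quoted in this bug;
# it is an assumption, not the test framework's own format definition.
FAILURE_RE = re.compile(
    r"ns/(?P<namespace>\S+) pod/(?P<pod>\S+) node/(?P<node>\S+) - "
    r"(?P<seconds>\d+(?:\.\d+)?) seconds after deletion - "
    r"reason/(?P<reason>\S+) (?P<message>.*)",
    re.DOTALL,
)


def parse_sandbox_failure(line):
    """Return the fields of a sandbox failure line, or None if it does not match."""
    match = FAILURE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["seconds"] = float(fields["seconds"])
    return fields


# Sample taken from the report, with the middle of the message shortened.
sample = (
    "ns/openshift-monitoring pod/prometheus-k8s-1 "
    "node/ip-10-0-24-217.us-west-1.compute.internal - "
    "475.52 seconds after deletion - "
    "reason/FailedCreatePodSandBox Failed to create pod sandbox: "
    'CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF'
)

parsed = parse_sandbox_failure(sample)
```

      With the fields separated, failures can be grouped by node or by message text.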

      The failure is rare overall but occurs quite frequently day to day. search.ci shows many hits over the last two days in both 4.14 and 4.15, and seemingly on both ovn and sdn:

      https://search.ci.openshift.org/?search=Post+%22http%3A%2F%2Fdummy%2Fcni%22%3A+EOF&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
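
      To repeat the search for a different error string or time window, the query can be rebuilt programmatically. This is a sketch that assumes only the query parameters visible in the link above; the helper name search_ci_url is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_CI = "https://search.ci.openshift.org/"


def search_ci_url(search, max_age="48h"):
    """Build a search.ci query URL using the parameters seen in the link above."""
    params = {
        "search": search,
        "maxAge": max_age,
        "context": "1",
        "type": "bug+issue+junit",
        "name": "",
        "excludeName": "",
        "maxMatches": "5",
        "maxBytes": "20971520",
        "groupBy": "job",
    }
    # urlencode applies quote_plus, so spaces become "+" and "+" becomes "%2B".
    return SEARCH_CI + "?" + urlencode(params)


url = search_ci_url('Post "http://dummy/cni": EOF')
```

      Called with the error string from this bug, the helper reproduces the link above; passing a different max_age widens or narrows the window.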

      Some of these will show as flakes as the test gets retried at times and then passes.

      Additionally, in 4.14 we are seeing similar failures reporting:

      No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
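
      Both messages point at the network not being ready rather than at the pod itself. A minimal sketch of how a failure message could be matched against these two known strings follows; the marker list contains only the two messages quoted in this bug, and the names NETWORK_NOT_READY_MARKERS and network_failure_marker are hypothetical, not part of the test framework.

```python
# Only the two messages quoted in this bug; a real list would likely need more.
NETWORK_NOT_READY_MARKERS = (
    'failed to send CNI request: Post "http://dummy/cni": EOF',
    "No CNI configuration file in /etc/kubernetes/cni/net.d/",
)


def network_failure_marker(message):
    """Return the first known marker found in the message, or None."""
    for marker in NETWORK_NOT_READY_MARKERS:
        if marker in message:
            return marker
    return None


shim_error = (
    'plugin type="multus-shim" name="multus-cni-network" failed (add): '
    'CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF'
)
no_config_error = (
    "No CNI configuration file in /etc/kubernetes/cni/net.d/. "
    "Has your network provider started?"
)
```

      Substring matching is deliberately simple here; it keeps the list easy to audit when a new message is added.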

      4.14.0-0.nightly-2023-10-12-015817 shows pod sandbox errors for azure and aws; both show a drop from the 10th, which comes after our force accept.

      4.14.0-0.nightly-2023-10-11-141212 had a host of failures, but this is what killed aws sdn.

      4.14.0-0.nightly-2023-10-11-200059 hit this on aws sdn as well, and it also shows in azure.
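
      The change requested in the title could be sketched as follows: a sandbox failure is ignored only if it happened on a node during a window in which that node reported NetworkUnavailable=true. This is an illustration under assumptions, not the test framework's implementation; the Interval and SandboxFailure types, the split_failures function, and the use of plain seconds for timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical record: a window in which a node had NetworkUnavailable=true."""
    node: str
    start: float  # seconds since the start of the run
    end: float


@dataclass(frozen=True)
class SandboxFailure:
    """Hypothetical record: one FailedCreatePodSandBox event."""
    node: str
    at: float  # seconds since the start of the run
    message: str


def split_failures(failures, unavailable):
    """Split failures into (ignored, counted) based on NetworkUnavailable windows."""
    ignored, counted = [], []
    for failure in failures:
        covered = any(
            window.node == failure.node and window.start <= failure.at <= window.end
            for window in unavailable
        )
        (ignored if covered else counted).append(failure)
    return ignored, counted


windows = [Interval(node="node-a", start=100.0, end=200.0)]
events = [
    SandboxFailure("node-a", 150.0, "failed to send CNI request"),  # inside window
    SandboxFailure("node-a", 250.0, "failed to send CNI request"),  # after window
    SandboxFailure("node-b", 150.0, "failed to send CNI request"),  # other node
]
ignored, counted = split_failures(events, windows)
```

      Only the first event is ignored; failures outside the window or on another node still count against the test.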

            rhn-engineering-dgoodwin Devan Goodwin
            rhn-engineering-dgoodwin Devan Goodwin
            Weibin Liang Weibin Liang