- Bug
- Resolution: Unresolved
- Normal
- 4.15.0
- Low
- No
- False
The test:
[sig-network] pods should successfully create sandboxes by adding pod to network
It failed a couple of payloads today, with 1-2 failures in batches of 10 aggregated jobs. I looked at the most recent errors, and they often seem to be the same:
1 failures to create the sandbox ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-24-217.us-west-1.compute.internal - 475.52 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_c712fc61-5a1e-4cec-b6fa-18c8f2e91c0a_0(46df8384ffeb433fc0e4864262aa52f2ede570265c43bf8b0900f184b27b10f1): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF
This http://dummy/cni URL looked interesting and seemed worthy of a bug.
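A dummy hostname like this is usually a sign that the HTTP request is tunneled over a local unix socket rather than to a real network endpoint, so the host part is never resolved. Below is a minimal sketch of that pattern in Go; the socket path, payload, and function name are illustrative assumptions, not the actual multus-shim values. An EOF on the Post would mean the daemon side closed the connection before sending a response.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net"
	"net/http"
)

// postCNIRequest is a hypothetical helper showing how a CNI shim could forward
// a request to a local daemon over a unix socket. The "http://dummy/cni" URL is
// only a placeholder: DialContext ignores the address derived from the URL and
// dials the socket path instead, so the hostname never has to resolve.
func postCNIRequest(socketPath string, payload []byte) (*http.Response, error) {
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
	}
	// An EOF here indicates the server end of the socket closed the connection
	// before replying (for example, the daemon crashed or was restarting).
	return client.Post("http://dummy/cni", "application/json", bytes.NewReader(payload))
}

func main() {
	// Socket path and body are made up for illustration.
	resp, err := postCNIRequest("/run/example/cni.sock", []byte(`{"command":"ADD"}`))
	if err != nil {
		fmt.Println("CNI request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```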
The problem is a rare failure overall, but it is happening quite frequently day to day. search.ci indicates lots of hits over the last two days in both 4.14 and 4.15, and seemingly on both ovn and sdn:
Some of these show up as flakes, since the test is sometimes retried and then passes.
Additionally, in 4.14 we are seeing similar failures reporting:
No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
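This message generally means the runtime looked in that directory and found no CNI network configuration yet. A rough sketch of such a check is below; the file patterns and behavior are assumptions for illustration, not the actual kubelet/CRI-O code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hasCNIConfig is a hypothetical check that mirrors the idea behind the error:
// scan the CNI config directory for a network configuration file and report
// when nothing is there, which typically means the network provider has not
// written its config yet.
func hasCNIConfig(dir string) (bool, error) {
	for _, pattern := range []string{"*.conf", "*.conflist", "*.json"} {
		matches, err := filepath.Glob(filepath.Join(dir, pattern))
		if err != nil {
			return false, err
		}
		if len(matches) > 0 {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := hasCNIConfig("/etc/kubernetes/cni/net.d")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !ok {
		fmt.Println("No CNI configuration file found; the network provider may not have started yet")
	}
}
```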
4.14.0-0.nightly-2023-10-12-015817 shows pod sandbox errors for both Azure and AWS; both show a drop from the 10th, which comes after our force accept.
4.14.0-0.nightly-2023-10-11-141212 had a host of failures, but this is what killed aws sdn.
4.14.0-0.nightly-2023-10-11-200059 hit aws sdn as well and also shows up in Azure.