-
Bug
-
Resolution: Not a Bug
-
Normal
-
None
-
4.12
-
Moderate
-
No
-
3
-
SDN Sprint 240, SDN Sprint 241
-
2
-
Rejected
-
False
-
-
Customer Escalated
-
Description of problem:
Issue: ovn-kubernetes-master pods are in a crash-loop back-off state and are forcing new leader elections every few seconds to minutes. The cluster is degraded and operators are impacted. Kube-apiserver is reporting a requeuing error, indicating it is likely being flooded. openshift-multus has a scheduler pod down that cannot bind an IP, with error: "error adding container to network ovn-kubernetes cni request failed with status 400" --> https://access.redhat.com/solutions/6646561 [restarting pods/databases does not mitigate the issue]
Version-Release number of selected component (if applicable):
4.12.21
How reproducible:
customer-specific
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
The cluster upgrade should not stall or crash-loop the ovnkube-master pods.
Additional info:
The next update (internal) will include logs, sample data, and the troubleshooting steps taken so far. Impact is high for the customer: production is down.