-
Bug
-
Resolution: Won't Do
-
Undefined
-
None
-
4.11.z, 4.10.z
-
None
-
Important
-
No
-
SDN Sprint 235
-
1
-
Proposed
-
False
-
Description of problem:
Upgrading a 250-node ovn-k cluster after running 2000 iterations of cluster-density fails: the ovnkube-master pods crashloop indefinitely on sbdb connection failures.
Version-Release number of selected component (if applicable):
Upgrade from 4.10.57 to 4.11.38
How reproducible:
Always
Steps to Reproduce:
1. Create a 250-node cluster on AWS (4xlarge masters, 12xlarge infra, xlarge nodes)
2. Run kube-burner cluster-density at 2000 iterations
3. Upgrade from 4.10.57 to 4.11.38
Actual results:
The upgrade gets stuck with the networking co not progressing. 2 out of 3 ovnkube-master pods are stuck crashlooping with this in their logs:
I0420 15:05:20.276025 1 client.go:325] "msg"="trying to connect" "database"="OVN_Southbound" "endpoint"="ssl:10.0.189.177:9642"
I0420 15:05:40.266832 1 client.go:325] "msg"="trying to connect" "database"="OVN_Southbound" "endpoint"="ssl:10.0.221.135:9642"
F0420 15:05:40.266875 1 ovnkube.go:133] error when trying to initialize libovsdb SB client: unable to connect to any endpoints: failed to connect to ssl:10.0.136.140:9642: endpoint is not leader. failed to connect to ssl:10.0.189.177:9642: failed to open connection: dial tcp 10.0.189.177:9642: i/o timeout. failed to connect to ssl:10.0.221.135:9642: failed to open connection: dial tcp 10.0.221.135:9642: i/o timeout
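For triage, the fatal log message above can be split into one failure reason per sbdb endpoint. This makes it easier to see that one endpoint answered but is not the leader, while the other two timed out at the TCP level. The following is a minimal sketch, not part of ovn-kubernetes: the function name `classify_endpoints` and the regular expression are hypothetical, and the message layout is assumed to match only the single log line quoted in this report.

```python
import re

# Fatal message copied from the report, without the klog prefix.
FATAL = (
    "error when trying to initialize libovsdb SB client: "
    "unable to connect to any endpoints: "
    "failed to connect to ssl:10.0.136.140:9642: endpoint is not leader. "
    "failed to connect to ssl:10.0.189.177:9642: failed to open connection: "
    "dial tcp 10.0.189.177:9642: i/o timeout. "
    "failed to connect to ssl:10.0.221.135:9642: failed to open connection: "
    "dial tcp 10.0.221.135:9642: i/o timeout"
)

# Assumption: each per-endpoint clause starts with "failed to connect to"
# and runs until the next such clause or the end of the message.
CLAUSE = re.compile(
    r"failed to connect to (ssl:[\d.]+:\d+): (.*?)(?=\. failed to connect to |$)"
)


def classify_endpoints(message):
    """Map each endpoint in the message to a coarse failure category."""
    result = {}
    for endpoint, reason in CLAUSE.findall(message):
        if "not leader" in reason:
            result[endpoint] = "not leader"
        elif "i/o timeout" in reason:
            result[endpoint] = "timeout"
        else:
            result[endpoint] = "other"
    return result


if __name__ == "__main__":
    for endpoint, category in classify_endpoints(FATAL).items():
        print(endpoint, category)
```

Applied to the message in this report, the sketch separates the reachable non-leader endpoint (10.0.136.140) from the two endpoints that could not be reached at all.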
Expected results:
Successful upgrade
Additional info:
must-gather was flaky because of the networking issues, but here is what was gathered: http://file.rdu.redhat.com/~mifiedle/ocpbugs-10734.tgz