OpenShift Bugs / OCPBUGS-12706

OVN-K 250-node upgrade from 4.10.z to 4.11.z fails with ovnkube-master pods crash-looping


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Undefined
    • None
    • 4.11.z, 4.10.z
    • None
    • Important
    • No
    • SDN Sprint 235
    • 1
    • Proposed
    • False

      Description of problem:

      Upgrading a 250-node OVN-Kubernetes cluster after running cluster-density with 2000 iterations fails: ovnkube-master pods crash-loop indefinitely because they cannot connect to the southbound database (SBDB).

      Version-Release number of selected component (if applicable):

      Upgrade from 4.10.57 to 4.11.38

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a 250-node cluster on AWS (4xlarge masters, 12xlarge infra nodes, xlarge worker nodes).
      2. Run the kube-burner cluster-density workload with 2000 iterations.
      3. Upgrade from 4.10.57 to 4.11.38.

      Actual results:

      The upgrade gets stuck with the network cluster operator not progressing. Two of the three ovnkube-master pods are stuck crash-looping with the following in their logs:

      I0420 15:05:20.276025       1 client.go:325]  "msg"="trying to connect" "database"="OVN_Southbound" "endpoint"="ssl:10.0.189.177:9642"
      I0420 15:05:40.266832       1 client.go:325]  "msg"="trying to connect" "database"="OVN_Southbound" "endpoint"="ssl:10.0.221.135:9642"
      F0420 15:05:40.266875       1 ovnkube.go:133] error when trying to initialize libovsdb SB client: unable to connect to any endpoints: failed to connect to ssl:10.0.136.140:9642: endpoint is not leader. failed to connect to ssl:10.0.189.177:9642: failed to open connection: dial tcp 10.0.189.177:9642: i/o timeout. failed to connect to ssl:10.0.221.135:9642: failed to open connection: dial tcp 10.0.221.135:9642: i/o timeout

      Expected results:

      The upgrade completes successfully.

      Additional info:

      must-gather was flaky because of the networking issues, but here is what was gathered: http://file.rdu.redhat.com/~mifiedle/ocpbugs-10734.tgz

              sseethar Surya Seetharaman
              mifiedle@redhat.com Mike Fiedler
              Anurag Saxena