OpenShift Bugs / OCPBUGS-47760

Konflux cluster etcd is unhealthy

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Undefined
    • 4.15
    • Etcd
    • Incidents & Support
    • False

      Description of problem:

      A ROSA cluster running Konflux is unhealthy and inaccessible to SRE. We were able to SSH directly into the control-plane nodes to troubleshoot. The etcd pods repeatedly start up, form a quorum, and then die without a clear cause. As a result, the cluster is unresponsive.
          

      Version-Release number of selected component (if applicable):

      4.15.36
          

      How reproducible:

      Currently very reproducible on the affected cluster. It is not clear how to recreate this on a separate cluster.
          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      The cluster is unresponsive; etcd cannot hold quorum after initially forming it.
          

      Expected results:

      etcd holds quorum after initially forming it.
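
For reference, "holding quorum" means that a majority of etcd members stay healthy at the same time. The sketch below shows only that arithmetic. It is not taken from this cluster: the member names and health flags are hypothetical, and in practice the flags would have to come from a real health source such as the output of `etcdctl endpoint health --cluster`.

```python
def quorum_size(member_count: int) -> int:
    """Return the smallest majority for a cluster with member_count voting members."""
    if member_count < 1:
        raise ValueError("member_count must be at least 1")
    return member_count // 2 + 1


def has_quorum(member_health: dict[str, bool]) -> bool:
    """Return True if the healthy members form a majority of all members."""
    healthy = sum(1 for ok in member_health.values() if ok)
    return healthy >= quorum_size(len(member_health))


if __name__ == "__main__":
    # Hypothetical member names and health flags, for illustration only.
    snapshot = {"master-0": True, "master-1": False, "master-2": True}
    print(quorum_size(len(snapshot)), has_quorum(snapshot))
```

With three members, one failure is tolerated and a second loses quorum, which matches the symptom described here if members keep dying shortly after startup.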
          

      Additional info:

      The current theory is that excessive querying from customer workloads may be contributing, but we are still working to prove or disprove this. The main workload is Tekton, which is known to be extremely resource intensive, and the cluster's control plane has been scaled up repeatedly to accommodate it.
          

              Dean West (dwest@redhat.com)
              Trevor Nierman (tnierman.openshift)
              Ge Liu