OpenShift Etcd / ETCD-348

Investigate CPMS Failure Domain on Etcd cluster


    • Type: Story
    • Resolution: Obsolete
    • Priority: Normal

      While testing https://issues.redhat.com/browse/ETCD-328, I identified a scenario in which:

      1. the etcd cluster is down to `1` member
      2. there is no guarantee that this member will be the `leader`

       

      The test scenario scaled down only one master machine (and therefore one etcd member) at a time. However, while CPMS is creating a replacement for the scaled-down master machine, and under the influence of failure domains as described in https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/control-plane-machine-set.md, the etcd cluster can end up with only one member, with no guarantee that this member will be the leader.

      Such a scenario could result in a corrupted etcd cluster state that does not reflect the current system state. IMO this scenario should be prohibited.
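
      To make the "should be prohibited" idea concrete, here is a minimal sketch of a pre-removal guard based on the standard Raft majority rule (quorum = n/2 + 1). The function names `quorum` and `safeToRemove` and the `floor` parameter are hypothetical and are not taken from CPMS or the cluster-etcd-operator; the sketch also pessimistically assumes that the member being removed is a healthy one.

```go
package main

import "fmt"

// quorum returns the number of voting members that must be reachable
// for a Raft-based cluster of the given size to elect a leader and
// commit writes (a strict majority).
func quorum(members int) int {
	return members/2 + 1
}

// safeToRemove is a hypothetical guard (not an existing CPMS or etcd API).
// It reports whether one member may be removed from a cluster that
// currently has `members` voting members, `healthy` of which are healthy,
// without dropping below `floor` members or losing quorum.
// Assumption: the removed member is healthy (worst case).
func safeToRemove(members, healthy, floor int) bool {
	remaining := members - 1
	if remaining < floor {
		// Refuse to shrink below the configured minimum size,
		// e.g. never go down to a single member.
		return false
	}
	return healthy-1 >= quorum(remaining)
}

func main() {
	// Three healthy members, minimum size two: removing one is allowed.
	fmt.Println(safeToRemove(3, 3, 2))
	// Two members left, minimum size two: removal would leave one member.
	fmt.Println(safeToRemove(2, 2, 2))
	// Three members but one already unhealthy: removal would lose quorum.
	fmt.Println(safeToRemove(3, 2, 2))
}
```

      With a floor of two, the guard rejects any removal that would leave a single member, which is the situation described above.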

       

      See https://github.com/openshift/origin/pull/27496#issuecomment-1293749080 for logs 

              melbeher@redhat.com Mustafa Elbehery