  1. OpenShift Bugs
  2. OCPBUGS-23897

Agent Based Installation on BareMetal on version 4.13.23 failed due to degraded etcd operator


    • Critical
    • No
    • False

      Description of problem:

      Installing a 4.13.23 cluster on bare metal using the agent-based method fails with a degraded etcd operator. All cluster nodes are available and the other cluster operators are available, but etcd is degraded. The customer is using `FIPS=True`. The customer installed a 4.13.22 cluster with the same configuration and that attempt succeeded. The issue has been observed on just one of the master nodes: the node is available and running, but its etcd member is not added to the cluster.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Install a BareMetal ABI cluster for version 4.13.23 with `fips=true`

      Steps to Reproduce:

          1. Install an ABI cluster on BareMetal at version 4.13.23
          2. Configure FIPS to true
          3. The installation fails due to a degraded etcd operator.

      Actual results:

      # etcd operator - 
      
      message: 'EtcdEndpointsDegraded: EtcdEndpointsController can''t evaluate whether
            quorum is safe: etcd cluster has quorum of 2 which is not fault tolerant: [{Member:ID:9786165888550508979
            name:"<master-node-1>" peerURLs:"https://x.x.x.x:2380" clientURLs:"https://x.x.x.x:2379"  Healthy:true
            Took:442.124µs Error:<nil>} {Member:ID:12078976424769477516 name:"<master-node-3>"
            peerURLs:"https://x.x.x.x:2380" clientURLs:"https://x.x.x.x:2379"  Healthy:true
            Took:939.119µs Error:<nil>}]'
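
As a rough illustration of why the operator calls a quorum of 2 "not fault tolerant", the sketch below works through the majority arithmetic that etcd relies on. The function names are hypothetical and exist only for this example. With only two of the three expected members joined, both must stay healthy to keep a majority, so no failure can be tolerated.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority still remains."""
    return members - quorum_size(members)


# Compare the degraded 2-member cluster with the expected 3-member cluster.
for n in (2, 3):
    print(f"members={n} quorum={quorum_size(n)} tolerated_failures={fault_tolerance(n)}")
```

Once the third member joins, the quorum size stays at 2 but one failure becomes tolerable, which is the state the installation expects to reach.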
      
      # Etcd pods for problematic master - 
      
      etcd-c1-<master-node-2>          0/4     Init:CrashLoopBackOff   106        8h    x.x.x.x   <master-node-2>   <none>           <none>
      etcd-guard-<master-node-2>    0/1     Running                 0          8h    x.x.x.x     <master-node-2>   <none>           <none>
      installer-5-<master-node-2>   0/1     Completed               0          8h    x.x.x.x     <master-node-2>   <none>           <none>
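
A listing like the one above can be triaged with a small script. This is a minimal sketch, assuming the columns are NAME, READY, STATUS, RESTARTS in that order (as in the pasted output); the helper name `unhealthy_pods` is hypothetical and not part of any OpenShift tooling.

```python
POD_LISTING = """\
etcd-c1-<master-node-2>          0/4     Init:CrashLoopBackOff   106        8h    x.x.x.x   <master-node-2>   <none>           <none>
etcd-guard-<master-node-2>    0/1     Running                 0          8h    x.x.x.x     <master-node-2>   <none>           <none>
installer-5-<master-node-2>   0/1     Completed               0          8h    x.x.x.x     <master-node-2>   <none>           <none>
"""


def unhealthy_pods(listing: str) -> list[tuple[str, str, int]]:
    """Return (name, status, restarts) for pods that are not fully ready.

    Pods in the Completed state are skipped, since a finished installer pod
    is expected to show 0/1 ready containers.
    """
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, ready, status, restarts = fields[:4]
        ready_count, total = (int(x) for x in ready.split("/"))
        if status != "Completed" and ready_count < total:
            flagged.append((name, status, int(restarts)))
    return flagged


for name, status, restarts in unhealthy_pods(POD_LISTING):
    print(f"{name}: {status} (restarts: {restarts})")
```

For the listing in this report, the script flags the etcd pod stuck in `Init:CrashLoopBackOff` and the guard pod that is running but not ready, and ignores the completed installer pod.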
      
      The Google Drive link below contains additional logs from the failed installation attempt.
      
      Link -
      
      https://drive.google.com/file/d/199rfdzDXgEw4ePfj7Ozzq6lYAmbkGPOb/view?usp=drive_link

      Expected results:

      The cluster should be installed successfully

      Additional info:

        If this is a bug, the customer would like a fix in a 4.13.z release if possible. The customer would also like to know of any workaround for the issue.

              dwest@redhat.com Dean West
              rhn-support-adikulka Aditya Kulkarni
              Manoj Hans Manoj Hans