Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 4.13.z
Component/s: Etcd
Labels:
Severity: Critical
Regression: No
Blocked: False
Blocked Reason: None
Target Version: 4.13.z
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
PX Priority Data:

Description of problem:

Installing a 4.13.23 cluster on bare metal with the agent-based method fails with a degraded etcd operator. All cluster nodes are available and the other cluster operators are available, but etcd is degraded. The customer is using `FIPS=True`. The customer installed a 4.13.22 cluster with the same configuration and the installation succeeded. The issue has been observed on just one of the master nodes: the node is available and running, but its etcd member is not added.

Version-Release number of selected component (if applicable):

How reproducible:

Install a bare-metal ABI cluster on version 4.13.23 with `fips=true`

Steps to Reproduce:

    1. Install an ABI cluster on bare metal using version 4.13.23
    2. Configure FIPS to true
    3. The installation fails due to a degraded etcd operator.

Actual results:

# etcd operator message:

message: 'EtcdEndpointsDegraded: EtcdEndpointsController can''t evaluate whether
      quorum is safe: etcd cluster has quorum of 2 which is not fault tolerant: [{Member:ID:9786165888550508979
      name:"<master-node-1>" peerURLs:"https://x.x.x.x:2380" clientURLs:"https://x.x.x.x:2379"  Healthy:true
      Took:442.124µs Error:<nil>} {Member:ID:12078976424769477516 name:"<master-node-3>"
      peerURLs:"https://x.x.x.x:2380" clientURLs:"https://x.x.x.x:2379"  Healthy:true
      Took:939.119µs Error:<nil>}]'
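
The message above lists only two etcd members, and a two-member cluster cannot lose a member without losing quorum. The sketch below illustrates that arithmetic using the standard majority rule (quorum = floor(n/2) + 1). It is not the operator's actual code, and the function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Majority quorum for a cluster with the given number of voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while the remaining ones still form a quorum."""
    return members - quorum_size(members)


# Compare the two-member cluster from the message with the expected three-member one.
for n in (2, 3):
    print(f"members={n} quorum={quorum_size(n)} tolerated_failures={fault_tolerance(n)}")
```

With two members the quorum is 2 and no failure is tolerated, which matches the "quorum of 2 which is not fault tolerant" wording. With the third member added, one failure would be tolerated.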

# etcd pods on the problematic master:

etcd-c1-<master-node-2>       0/4     Init:CrashLoopBackOff   106        8h    x.x.x.x   <master-node-2>   <none>           <none>
etcd-guard-<master-node-2>    0/1     Running                 0          8h    x.x.x.x   <master-node-2>   <none>           <none>
installer-5-<master-node-2>   0/1     Completed               0          8h    x.x.x.x   <master-node-2>   <none>           <none>
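
When triaging a longer pod listing of this shape, it can help to filter out the pods that need attention. The following is a minimal sketch, assuming the whitespace-separated column order shown above (name, ready count, status). The function name `unhealthy_pods` is hypothetical, and the sample input is copied from this ticket.

```python
def unhealthy_pods(listing: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are neither fully ready nor Completed."""
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and anything without a READY column such as "0/4".
        if len(fields) < 3 or "/" not in fields[1]:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        if status == "Completed":
            continue  # finished installer pods are expected to show 0/1
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            flagged.append((name, ready, status))
    return flagged


SAMPLE = """\
etcd-c1-<master-node-2>       0/4  Init:CrashLoopBackOff  106  8h  x.x.x.x  <master-node-2>  <none>  <none>
etcd-guard-<master-node-2>    0/1  Running                0    8h  x.x.x.x  <master-node-2>  <none>  <none>
installer-5-<master-node-2>   0/1  Completed              0    8h  x.x.x.x  <master-node-2>  <none>  <none>
"""

for pod in unhealthy_pods(SAMPLE):
    print(pod)
```

For the listing in this ticket, the sketch flags the etcd pod stuck in `Init:CrashLoopBackOff` and the guard pod that is running but not ready, and it ignores the completed installer pod.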

Additional logs from the failed installation attempt are available on Google Drive:

https://drive.google.com/file/d/199rfdzDXgEw4ePfj7Ozzq6lYAmbkGPOb/view?usp=drive_link

Expected results:

The cluster should be installed successfully

Additional info:

  If this is a bug, the customer would like a fix in a 4.13.z release if possible. The customer would also like to know of any possible workaround for the issue.

is duplicated by

OCPBUGS-23044 [4.13] CEO prevents member deletion during revision rollout

Closed

Assignee: Dean West

Reporter: Aditya Kulkarni

QA Contact: Manoj Hans

Votes: 0

Watchers: 5

Created: 2023/11/23 3:36 PM

Updated: 2024/01/02 10:12 AM

Resolved: 2024/01/02 10:11 AM
