[OCPBUGS-6935] [4.12] Degraded etcd on assisted-installer installation- bootstrap etcd is not removed properly - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: 4.12.z
Affects Version/s: 4.12
Component/s: Etcd
Labels:
- ServiceDeliveryImpact

Regression:
None
Story Points:
1
Sprint:
ETCD Sprint 231, ETCD Sprint 232
sprint_count:
2
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Target Version:

4.12.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue ~~OCPBUGS-5988~~. The following is the description of the original issue:
—
Description of problem:

The etcd operator is in a degraded state because one of the masters cannot connect.
The master that fails to connect was previously the bootstrap node and was pivoted to a master as part of the assisted-installer installation.

Etcd log:
2023-01-17T23:09:26.523562312Z 28dcf1b0a44481b0, started, test-infra-cluster-04bf4418-master-1, https://192.168.127.11:2380, https://192.168.127.11:2379, false
2023-01-17T23:09:26.523562312Z 30600b5b86e23c8e, started, etcd-bootstrap, https://192.168.127.12:2380, https://192.168.127.12:2379, false
2023-01-17T23:09:26.523562312Z 73f00626fee34a87, started, test-infra-cluster-04bf4418-master-0, https://192.168.127.10:2380, https://192.168.127.10:2379, false
2023-01-17T23:09:26.541214220Z #### attempt 0
2023-01-17T23:09:26.547811132Z       member={name="test-infra-cluster-04bf4418-master-1", peerURLs=[https://192.168.127.11:2380}, clientURLs=[https://192.168.127.11:2379]
2023-01-17T23:09:26.547811132Z       member={name="etcd-bootstrap", peerURLs=[https://192.168.127.12:2380}, clientURLs=[https://192.168.127.12:2379]
2023-01-17T23:09:26.547811132Z       member={name="test-infra-cluster-04bf4418-master-0", peerURLs=[https://192.168.127.10:2380}, clientURLs=[https://192.168.127.10:2379]
2023-01-17T23:09:26.547811132Z       target={name="etcd-bootstrap", peerURLs=[https://192.168.127.12:2380}, clientURLs=[https://192.168.127.12:2379]
2023-01-17T23:09:26.547846508Z member "https://192.168.127.12:2380" dataDir has been destroyed and must be removed from the cluster
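
For triage, the member-list lines in the log above can be scanned for a leftover bootstrap member. The following is a minimal sketch, assuming the lines follow the `ID, status, name, peer URL, client URL, is-learner` layout seen in this log, optionally prefixed by a timestamp. The helper name `parse_member_line` is hypothetical and not part of any etcd tooling.

```python
# Sketch only: parses the comma-separated member-list lines quoted in this report.
# parse_member_line is a hypothetical helper, not an etcd or operator API.

LOG = """\
2023-01-17T23:09:26.523562312Z 28dcf1b0a44481b0, started, test-infra-cluster-04bf4418-master-1, https://192.168.127.11:2380, https://192.168.127.11:2379, false
2023-01-17T23:09:26.523562312Z 30600b5b86e23c8e, started, etcd-bootstrap, https://192.168.127.12:2380, https://192.168.127.12:2379, false
2023-01-17T23:09:26.523562312Z 73f00626fee34a87, started, test-infra-cluster-04bf4418-master-0, https://192.168.127.10:2380, https://192.168.127.10:2379, false
2023-01-17T23:09:26.541214220Z #### attempt 0
"""


def parse_member_line(line):
    """Return a dict for a member-list line, or None for any other log line."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 6:
        return None
    head = fields[0].split()
    if not head:
        return None
    return {
        "id": head[-1],  # drop the leading timestamp, if present
        "status": fields[1],
        "name": fields[2],
        "peer_url": fields[3],
        "client_url": fields[4],
        "is_learner": fields[5] == "true",
    }


members = [m for m in map(parse_member_line, LOG.splitlines()) if m is not None]

# A member still named "etcd-bootstrap" after the pivot is the stale entry.
stale = [m for m in members if m["name"] == "etcd-bootstrap"]

for m in stale:
    print(f"stale bootstrap member {m['id']} at {m['peer_url']}")
```

Here the stale entry shares its peer URL with the node that was pivoted to a master, which is consistent with the "dataDir has been destroyed" message.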

There are a couple of problems that we see:
1. For an unknown reason, the etcd operator's BootstrapTeardownController fails to start because it cannot see the "openshift-etcd" namespace, although according to the logs the namespace exists.
2023-01-17T21:39:43.323928903Z E0117 21:39:43.323917       1 base_controller.go:272] BootstrapTeardownController reconciliation failed: failed to get bootstrap scaling strategy: failed to get openshift-etcd names

2. The DelayStrategy code was changed by https://github.com/openshift/cluster-etcd-operator/pull/964/files and now requires 3 healthy members before the bootstrap member is removed. This can cause issues because etcd and cluster-bootstrap (bootkube) are not synchronized: nothing actually prevents the bootstrap node from stopping its etcd, which in turn blocks removal of the bootstrap etcd member (at least as I understand the flow).
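
The suspected race in item 2 can be illustrated with a toy model. This is a sketch of the flow as the reporter understands it, not the operator's actual code: the `Member` type, the `may_remove_bootstrap` function, and the choice to count every member (including bootstrap) toward the threshold are all assumptions made for illustration.

```python
# Toy model of the suspected race; names and counting rule are assumptions,
# not taken from cluster-etcd-operator.
from dataclasses import dataclass

REQUIRED_HEALTHY = 3  # threshold as described in this report


@dataclass(frozen=True)
class Member:
    name: str
    healthy: bool


def may_remove_bootstrap(members, required=REQUIRED_HEALTHY):
    """Allow removal only when at least `required` members are healthy."""
    healthy = sum(1 for m in members if m.healthy)
    return healthy >= required


# While bootstrap etcd is still running, the threshold is met.
before_pivot = [
    Member("master-0", True),
    Member("master-1", True),
    Member("etcd-bootstrap", True),
]

# Bootstrap stops its etcd and pivots before the operator removed the member.
# The stale member is unhealthy, and the pivoted node cannot join in its place.
after_pivot = [
    Member("master-0", True),
    Member("master-1", True),
    Member("etcd-bootstrap", False),
]

print(may_remove_bootstrap(before_pivot))
print(may_remove_bootstrap(after_pivot))
```

Under these assumptions, once bootstrap stops early the healthy count can no longer reach the threshold, so the stale member is never removed and the third master never becomes healthy.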

Version-Release number of selected component (if applicable):

How reproducible:

It is a race, as far as I understand, but it reproduces quite regularly in our CI when installing 4.12 nightlies.

Steps to Reproduce:

1.
2.
3.

Actual results:

Etcd is degraded because etcd on the third master to join cannot start.

Expected results:

Etcd is healthy

Additional info:

clones

OCPBUGS-5988 Degraded etcd on assisted-installer installation- bootstrap etcd is not removed properly

Closed

is blocked by

OCPBUGS-5988 Degraded etcd on assisted-installer installation- bootstrap etcd is not removed properly

Closed

is cloned by

OCPBUGS-10477 [4.12.6] Degraded etcd on assisted-installer installation- bootstrap etcd is not removed properly

Closed

links to

openshift/cluster-etcd-operator#999: [release-4.12] OCPBUGS-6935: add dedicated success status for bootstrap removal

Assignee: Thomas Jungblut

Reporter: OpenShift Prow Bot

QA Contact: Sandeep Kundu

Votes: 0

Watchers: 9

Created: 2023/02/01 4:48 PM

Updated: 2024/02/15 3:24 PM

Resolved: 2023/02/28 11:59 AM
