OpenShift Bugs / OCPBUGS-37665

Agent installation fails. The rendezvous host detects only 1 ready master node, even though 2 masters (and 3 workers) are ready.


      Description of problem:

        This happens intermittently in the baremetal CI.
      
      When the cluster is being installed and the other two masters and the three workers become Ready, the rendezvous host is expected to eventually reboot, install itself, and join the cluster as the third master to complete the installation.
      
      However, one of the two masters is apparently not detected as ready, even though `oc get nodes` shows both masters as joined and Ready.
      
      This leaves the rendezvous host stuck: it never reboots to finish the installation by joining the cluster.
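
      As a cross-check, the sketch below prints what the API server itself reports for the control-plane nodes (a minimal sketch, assuming the kubeconfig generated by the installer is used; the jsonpath only lists each master with its Ready condition):

      oc get nodes -l node-role.kubernetes.io/master \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
      oc get co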

      Version-Release number of selected component (if applicable):

          4.16

      How reproducible:

          Sometimes. Most of the time it works fine; when it fails, it always fails in the same way.

      Steps to Reproduce:

          1. Install an agent-based cluster. The configuration I've hit this with is (a reproduction sketch follows below):
            - DHCP
            - IPv4 only
            - BMO enabled (platform: baremetal)
            - 3 masters + 3 workers
      
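      For reference, a minimal reproduction sketch (assuming install-config.yaml and agent-config.yaml describing the DHCP, IPv4-only, platform: baremetal, 3 masters + 3 workers topology are already present; the ./assets directory name is illustrative):

      openshift-install agent create image --dir ./assets --log-level debug
      # Boot all six hosts (including the rendezvous host) from the generated agent ISO,
      # then wait for the installation to finish:
      openshift-install agent wait-for bootstrap-complete --dir ./assets --log-level debug
      openshift-install agent wait-for install-complete --dir ./assets --log-level debug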

      Actual results:

          The installation fails: the rendezvous host never reboots to join the cluster, and the install eventually times out.

      Expected results:

          The rendezvous host reboots, joins the cluster, and the installation succeeds.
      
      
      

      Additional info:

      According to `oc get nodes` and `oc get co`, everything looked good, except for the operators that were degraded because of the missing third master.
      
      Example job (from a PR, but no changes were made to the agent installation automation): https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/54483/rehearse-54483-periodic-ci-openshift-hypershift-release-4.16-periodics-mce-e2e-agent-connected-ovn-ipv4-metal3-conformance/1817908001202245632
      
      Also, in the journal, right after the timeout of the cluster installation:
      "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-bootstrap-member-master-00_openshift-etcd(77a8963015ba959007010dc81931348d)\"" pod="openshift-etcd/etcd-bootstrap-member-master-00" podUID="77a8963015ba959007010dc81931348d"Jul 29 14:48:21 master-00 start-cluster-installation.sh[7044]: Cluster status: errorJul 29 14:48:26 master-00 start-cluster-installation.sh[7044]: Cluster status: error  Also see attachments.
      
      
      The etcd-bootstrap-member-master-00 pod is in CrashLoopBackOff. Log excerpt (first line truncated):
      
      aft.go:77","msg":"8f4fcab0df4f7c44 switched to configuration voters=(7372168020071371606 10326695331593419844 18105834420489811888)"}
      {"level":"info","ts":"2024-07-29T14:49:41.373886Z","caller":"membership/cluster.go:537","msg":"promote member","cluster-id":"cf7ed821fb17c7fa","local-member-id":"8f4fcab0df4f7c44"}
      {"level":"warn","ts":"2024-07-29T14:49:41.374503Z","caller":"etcdserver/server.go:1149","msg":"server error","error":"the member has been permanently removed from the cluster"}
      {"level":"warn","ts":"2024-07-29T14:49:41.374538Z","caller":"etcdserver/server.go:1150","msg":"data-dir used by this member must be removed"} 
