Loading...

XML

Word

Printable

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.16
Component/s: Bare Metal Hardware Provisioning / ironic
Labels:
- triaged

Severity:
Important
Regression:
None
Story Points:
3
Sprint:
Metal Platform 258, Metal Platform 259, Metal Platform 263
sprint_count:
3
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

Hide

None

Show
None
RH Private Keywords:
Target Version:

4.18.0
Target Backport Versions:

4.17.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

    When a cluster that is been deployed with Gitops/ZTP approach is being trying to deleted, some of the worker nodes stuck in "deprovisioning" or "deleting" state. When I check logs of metal3-baremetal-operator, I see the following message:

{"level":"info","ts":1722497241.5526164,"logger":"provisioner.ironic","msg":"deleting host","host":"fi-911~storage-2.fi-911.tre.nsn-rdnet.net","ID":"accb0c10-8a95-4208-9e2d-c38d57ec0667","lastError":"Failed to prepare node accb0c10-8a95-4208-9e2d-c38d57ec0667 for cleaning: Validation of image href https://assisted-image-service-multicluster-engine.apps.fi-915.tre.nsn-rdnet.net/byapikey/eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiIwMzY1ZjlkNy0yNjFiLTRmMDUtOWMwYy0xMzNhZjMxZmU1ZDYifQ.4vZ9nNKf9BEv_xxRb_fqEgykwfa4O9Fs-k1kjyBcRjOF3c-qcbDaGMcEkSoOecMhDIFjoOkwTxSoVAKkpUrc0Q/4.16.0/x86_64/minimal.iso failed, reason: Got HTTP code 401 instead of 200 in response to HEAD request.","current":"clean failed","target":"available","deploy step":{"args":{},"interface":"deploy","priority":80,"step":"start_assisted_install"}}


This is also effecting installation of cluster next time. When other nodes are booted up with discovery iso, these old nodes which were failing, this time stuck in "inspecting" state in ACM GUI. The nodes that are failing have "automatedCleaningMode: metadata" in their BMH resource.

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

    Try to delete spoke cluster using Gitops approach and then some of the nodes who has "automatedCleaningMode: metadata" stuck in "deprovisioning state. Also, when you try to deploy the same cluster, these old node who were failing this time stuck in "inspecting" state.

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

    During cluster deletion, some BMH resource stuck in "deprovisioning" or "deleting" state

Expected results:

    BMH resources should have been removed successfully.

Additional info:

is related to

OCPBUGS-42945 BareMetalHost CR fails to delete on cluster cleanup

OCPBUGS-44026 BaremetalHosts stuck in 'provisioning' state despite deploy_step success

Assignee:: Riccardo Pittau

Reporter:: Sarp Koksal

QA Contact:: Jad Haj Yahya

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Created:: 2024/08/01 11:36 AM

Updated:: 2024/11/21 2:34 PM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates