OpenShift Bugs · OCPBUGS-62223

Worker node addition failed in OCI – CoreOS installer error (/dev/sda device busy)

      Description of problem:

      During worker node addition in OCI, the installation failed after 3 attempts. The error indicates that the CoreOS installer (coreos-installer) was unable to gain exclusive access to the target disk /dev/sda, as it was marked busy.
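The EBUSY failure means the kernel would not release the partition table because something still claims the disk. A minimal diagnostic sketch for narrowing that down from the host before retrying the install (the helper name and the /dev/sda target are illustrative, not part of the installer):

```shell
# check_disk_busy: list likely reasons a block device is held busy.
# Illustrative helper only; run from the host before retrying the install.
check_disk_busy() {
  local disk="$1"
  local name
  name=$(basename "$disk")
  # Mounted filesystems on the device or any of its partitions
  grep "^$disk" /proc/mounts || echo "no mounts on $disk"
  # Kernel holders (device-mapper, md, multipath) also keep the partition table busy
  if [ -d "/sys/class/block/$name/holders" ] && [ -n "$(ls -A "/sys/class/block/$name/holders" 2>/dev/null)" ]; then
    echo "holders of $name:"
    ls "/sys/class/block/$name/holders"
  else
    echo "no kernel holders for $name"
  fi
}

check_disk_busy /dev/sda
```

On OCI, iSCSI-attached volumes and device-mapper/multipath holders are common culprits; if a holder shows up here, releasing it before the coreos-installer retry may avoid the EBUSY.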

      Version-Release number of selected component (if applicable):

      4.19

      How reproducible:

      Always  

      Steps to Reproduce:

      1. Create the node ISO.
      2. Boot the created ISO in the OCI environment.
      3. Monitor the progress using the oc adm node-image monitor command and wait until the installer reaches the disk-writing stage.
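The steps above can be sketched end to end. This is a dry-run helper that only prints the commands; the MAC address and IP are placeholders, and the flag names follow the oc adm node-image usage shown elsewhere in this report:

```shell
# Print the reproduction flow as commands (dry run; nothing is executed
# against a cluster, since the MAC/IP values below are placeholders).
repro_commands() {
  cat <<'EOF'
# 1. Create the node ISO
oc adm node-image create --mac-address=<FAKE MAC ADDRESS>
# 2. Boot the generated ISO on the OCI instance (done from the OCI console)
# 3. Monitor progress until the disk-writing stage
oc adm node-image monitor --ip-addresses=10.0.17.156
EOF
}

repro_commands
```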

      Actual results:

      2025-09-23T19:11:22Z [node-image monitor] Node 10.0.17.156: Host 02-00-17-03-3c-0e: updated status from known to installing (Installation is in progress)
      2025-09-23T19:12:22Z [node-image monitor] Node 10.0.17.156: Host: 02-00-17-03-3c-0e, reached installation stage Failed: failed after 3 attempts, last error: failed executing /usr/bin/nsenter [--target 1 --cgroup --mount --ipc --pid -- coreos-installer install --insecure -i /opt/install-dir/worker-c9cd9447-3e67-434d-b720-aa44056ea61d.ign /dev/sda], Error exit status 1, LastOutput "Error: checking for exclusive access to /dev/sda
      2025-09-23T19:12:22Z [node-image monitor] time=2025-09-23T19:12:22Z level=info
      2025-09-23T19:12:22Z [node-image monitor] Caused by:
      2025-09-23T19:12:22Z [node-image monitor]     0: couldn't reread partition table: device is in use
      2025-09-23T19:12:22Z [node-image monitor]     1: EBUSY: Device or resource busy"
      2025-09-23T19:13:17Z [node-image monitor] Node 10.0.17.156: Uploaded logs for host 02-00-17-03-3c-0e cluster 4bb341d9-b582-436d-9920-4264912683af

      Expected results:

      The worker node should be added successfully.

      Additional info:

      Also tried adding the worker node with an extra block volume attached. However, after the image was written to the block volume and the node rebooted, it booted from the boot volume (the ISO) again instead of from the block volume, even though the expectation was that the node would boot from the block volume.
      
      2025-09-23T19:41:25Z [node-image monitor] Node 10.0.30.62: Host 02-00-17-01-cc-24: updated status from known to installing (Installation is in progress)
      2025-09-23T19:42:20Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Writing image to disk: 15%
      2025-09-23T19:42:25Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Writing image to disk: 27%
      2025-09-23T19:42:30Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Writing image to disk: 57%
      2025-09-23T19:42:35Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Writing image to disk: 71%
      2025-09-23T19:42:40Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Writing image to disk: 83%
      2025-09-23T19:42:45Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Writing image to disk: 95%
      2025-09-23T19:42:50Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Writing image to disk: 100%
      2025-09-23T19:43:00Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Waiting for control plane
      2025-09-23T19:43:05Z [node-image monitor] Node 10.0.30.62: Host: 02-00-17-01-cc-24, reached installation stage Rebooting
      2025-09-23T19:45:30Z [node-image monitor] Node 10.0.30.62: Error fetching status from assisted-service for node 10.0.30.62: Unable to retrieve cluster metadata from Agent Rest API: [GET /v2/clusters/{cluster_id}][404] v2GetClusterNotFound  &{Code:0xc000793510 Href:0xc000793540 ID:0xc00176610c Kind:0xc000793570 Reason:0xc0007935a0}
      2025-09-23T19:45:35Z [node-image monitor] Node 10.0.30.62: Error fetching status from assisted-service for node 10.0.30.62: Unable to retrieve cluster metadata from Agent Rest API: [GET /v2/clusters/{cluster_id}][

      Workaround:
      The following steps can be used as a workaround for adding worker nodes on OCP versions 4.18 and above, which use the Agent-Based Installer (ABI) mechanism for node addition.

      1. Prepare the node image:
        • Use the command: oc adm node-image create --mac-address=<FAKE MAC ADDRESS> --root-device-hint='deviceName:/dev/sdb'
      2. Create the node:
        • Proceed with creating the worker node, either via the Terraform module or manually.
      3. Attach the block volume (critical):
        • If using Terraform: attach the block volume to the created instance manually.
        • If creating the instance manually: the block volume can be attached during instance creation.
      4. When the node reboots:
        • Access the Cloud Shell: connect to the node’s cloud shell.
        • Modify the boot order:
          • Press 'e' in the GRUB menu to edit the kernel arguments.
          • Press 'Ctrl + C' to enter the GRUB command-line interface (CLI).
          • Type exit and press Enter to exit the GRUB CLI.
          • Type exit and press Enter again in the shell to exit.
          • Navigate to Boot Maintenance Manager > Boot Options > Change Boot Order.
          • Select BlockVolume2 and move it to the top of the list.
          • Commit the changes and exit. You may need to repeat these steps if the GRUB menu reappears.
      5. CSR approval (critical):
        • Monitor for pending CSRs: check the monitoring logs for the messages "First CSR Pending approval" and "Second CSR Pending approval".
        • Retrieve the CSRs: use the command: oc get csr
        • Approve each CSR: use the command: oc adm certificate approve <NAME> (replace <NAME> with the CSR name).
      6. Confirmation of node joining:
        • After completing all previous steps, you will see log messages similar to:
          2025-09-25T11:18:03Z [node-image monitor] Node 10.0.30.224: Node joined cluster
          2025-09-25T11:18:03Z [node-image monitor] Node 10.0.30.224: Node is Ready
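The CSR-approval step of the workaround can be scripted rather than approved by hand. A minimal sketch, assuming the standard oc get csr table where the condition ("Pending") is the last column; the helper name is made up for illustration:

```shell
# approve_pending_csrs: approve every CSR currently in Pending state.
# Sketch only; assumes the condition is the last column of `oc get csr` output.
approve_pending_csrs() {
  oc get csr --no-headers | awk '$NF == "Pending" { print $1 }' | \
  while read -r name; do
    oc adm certificate approve "$name"
  done
}

if command -v oc >/dev/null 2>&1; then
  approve_pending_csrs
else
  echo "oc not found; nothing approved"
fi
```

Run it twice: once for the first (client) CSR, and again for the second CSR that appears after the first is approved, matching the "First/Second CSR Pending approval" messages in the monitor log.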

              afasano@redhat.com Andrea Fasano
              rhn-support-mhans Manoj Hans
              Votes: 0
              Watchers: 7