- Bug
- Resolution: Duplicate
- Major
- None
- 4.13.z
- Important
- No
- Rejected
- False
Description of problem:
After updating to OpenShift Container Platform 4.13.4, scaling OpenShift Container Platform 4 - Node(s) fails because the newly provisioned OpenShift Container Platform 4 - Node gets stuck with the error below.

Jul 05 11:47:16 new-node-0 clever_pare[2118]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] Skipping interface ens5
Jul 05 11:47:16 new-node-0 clever_pare[2118]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] No changes.
Jul 05 11:47:16 new-node-0 podman[2106]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] Skipping interface ens5
Jul 05 11:47:16 new-node-0 podman[2106]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] No changes.
Jul 05 11:47:16 new-node-0 podman[2106]: std::io::Error: No such file or directory (os error 2)
Jul 05 11:47:16 new-node-0 clever_pare[2118]: std::io::Error: No such file or directory (os error 2)
Jul 05 11:47:16 new-node-0 clever_pare[2118]: W0705 11:47:16.013513 1 firstboot_complete_machineconfig.go:63] error: failed to persist network interfaces: failed to run nmstatectl: exit status 1
Jul 05 11:47:16 new-node-0 podman[2106]: W0705 11:47:16.013513 1 firstboot_complete_machineconfig.go:63] error: failed to persist network interfaces: failed to run nmstatectl: exit status 1
Jul 05 11:47:16 new-node-0 podman[2106]: I0705 11:47:16.013525 1 firstboot_complete_machineconfig.go:64] Sleeping 1 minute for retry
Jul 05 11:47:16 new-node-0 clever_pare[2118]: I0705 11:47:16.013525 1 firstboot_complete_machineconfig.go:64] Sleeping 1 minute for retry

This appears to be the same problem that was tracked and fixed in https://issues.redhat.com/browse/OCPBUGS-14298 (the fix was part of OpenShift Container Platform 4.13.4). So while the update to OpenShift Container Platform 4.13.4 completed successfully, newly scaled OpenShift Container Platform 4 - Node(s) are now failing because of that issue.
- When /etc/systemd/network is created manually on the affected OpenShift Container Platform 4 - Node, the Node eventually joins the OpenShift Container Platform 4 - Cluster and reports Ready state.
- When the AMI in the MachineSet is updated to the AMI for OpenShift Container Platform 4.13.4, scaling new OpenShift Container Platform 4 - Node(s) works without issue. But this change in the MachineSet should not be required, as it would be a massive effort for every OpenShift Container Platform 4 - Cluster updating to OpenShift Container Platform 4.13.4 and beyond.
- Also, the OpenShift Container Platform 4 - Node is running the Red Hat Enterprise Linux - CoreOS version specified in the AMI of the MachineSet, which is OpenShift Container Platform 4.11. So it hits the problem while still on that version, not after the OpenShift Container Platform 4.13.4 update was applied.
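The manual workaround described above (creating /etc/systemd/network on the affected Node) could be scripted roughly as follows. This is only a sketch: the function name `ensure_directory` is hypothetical, it assumes Python is available wherever it is run and that it runs as root on the Node, and it does nothing beyond creating the directory named in this report.

```python
from pathlib import Path

# Directory named in this report; creating it by hand let the stuck Node proceed.
DEFAULT_DIR = Path("/etc/systemd/network")


def ensure_directory(path: Path = DEFAULT_DIR) -> bool:
    """Create `path` if it is missing.

    Returns True if the directory was created, False if it already existed.
    """
    if path.is_dir():
        return False
    # parents=True also creates missing parent directories.
    path.mkdir(parents=True, exist_ok=True)
    return True
```

Calling `ensure_directory()` with no argument targets /etc/systemd/network. The next retry of the firstboot step (the journal shows a one-minute retry loop) would then be expected to get past the "No such file or directory" error, as observed in the manual test.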
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.13.4
How reproducible:
Unknown
Steps to Reproduce:
1. OpenShift Container Platform 4 - Cluster updated from OpenShift Container Platform 4.11 to 4.13.4 on AWS
2. Scaling additional Machine via MachineSet
Actual results:
OpenShift Container Platform 4 - Node is stuck in Provisioned state, failing to ever turn ready because of the below error found in the system journal.

Jul 05 11:47:16 new-node-0 clever_pare[2118]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] Skipping interface ens5
Jul 05 11:47:16 new-node-0 clever_pare[2118]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] No changes.
Jul 05 11:47:16 new-node-0 podman[2106]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] Skipping interface ens5
Jul 05 11:47:16 new-node-0 podman[2106]: [2023-07-05T11:47:16Z INFO nmstatectl::persist_nic] No changes.
Jul 05 11:47:16 new-node-0 podman[2106]: std::io::Error: No such file or directory (os error 2)
Jul 05 11:47:16 new-node-0 clever_pare[2118]: std::io::Error: No such file or directory (os error 2)
Jul 05 11:47:16 new-node-0 clever_pare[2118]: W0705 11:47:16.013513 1 firstboot_complete_machineconfig.go:63] error: failed to persist network interfaces: failed to run nmstatectl: exit status 1
Jul 05 11:47:16 new-node-0 podman[2106]: W0705 11:47:16.013513 1 firstboot_complete_machineconfig.go:63] error: failed to persist network interfaces: failed to run nmstatectl: exit status 1
Jul 05 11:47:16 new-node-0 podman[2106]: I0705 11:47:16.013525 1 firstboot_complete_machineconfig.go:64] Sleeping 1 minute for retry
Jul 05 11:47:16 new-node-0 clever_pare[2118]: I0705 11:47:16.013525 1 firstboot_complete_machineconfig.go:64] Sleeping 1 minute for retry
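To check whether other stuck Nodes show the same signature, the journal text can be scanned for the failure message quoted above. The sketch below is illustrative only: the function name `find_persist_nic_failures` is hypothetical, and it assumes the journal has already been collected as plain text by some other means.

```python
import re
from typing import List

# Failure message copied from the journal excerpt in this report.
FAILURE_RE = re.compile(
    r"failed to persist network interfaces: "
    r"failed to run nmstatectl: exit status (\d+)"
)


def find_persist_nic_failures(journal_text: str) -> List[str]:
    """Return the journal lines that contain the nmstatectl persist failure."""
    return [
        line for line in journal_text.splitlines() if FAILURE_RE.search(line)
    ]
```

An empty result means the signature was not found in the supplied text. It does not by itself rule out a different firstboot failure.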
Expected results:
The problem is the same as the one tracked in https://issues.redhat.com/browse/OCPBUGS-14298 and should therefore already be resolved. It is not clear why newly created OpenShift Container Platform 4 - Node(s) still hit the issue. While updating the MachineSet with the OpenShift Container Platform 4.13.4 AMI does resolve it, this approach is not considered feasible for a fleet of multiple OpenShift Container Platform 4 - Clusters.
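For reference, the MachineSet workaround mentioned above amounts to changing the AMI ID in the AWS provider spec. The sketch below builds a merge patch for that field and the matching `oc patch` command line. The helper names, the MachineSet name, and the AMI ID used in the usage note are hypothetical placeholders; the correct 4.13.4 AMI for a given region has to be looked up separately.

```python
import json
import shlex


def build_ami_patch(ami_id: str) -> dict:
    """Merge patch that sets the AMI ID in an AWS MachineSet provider spec."""
    return {
        "spec": {
            "template": {
                "spec": {"providerSpec": {"value": {"ami": {"id": ami_id}}}}
            }
        }
    }


def oc_patch_command(
    machineset: str, ami_id: str, namespace: str = "openshift-machine-api"
) -> str:
    """Return a shell-quoted `oc patch` command; nothing is executed here."""
    args = [
        "oc", "patch", "machinesets.machine.openshift.io", machineset,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(build_ami_patch(ami_id)),
    ]
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, `oc_patch_command("worker-a", "ami-0123")` (both values are placeholders) returns the command as a string for review. Having to repeat this per MachineSet and per Cluster is the effort that this report argues should not be necessary.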
Additional info:
- is related to: OCPBUGS-14298 Upgrade to OCP 4.13.0 stuck due to machine-config error 'failed to run nmstatectl: exit status 1' (Closed)