Issue Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.14.z
Description of problem:
After updating a NodePool to reference a new configMap, the NodePool is now stuck in a ReconcileError loop.
Version-Release number of selected component (if applicable):
4.14.z (the machine-config-daemon on the node reports v4.14.0-202407100410.p0.gea6671e.assembly.stream.el8-dirty)
How reproducible:
This is the first time I've run into this issue. I've updated the NodePool with configMaps in the past, but this is the first time I've seen it degrade the node.
Steps to Reproduce:
1. Create a configMap with the MachineConfig info (a hypothetical sketch of the resources follows this list).
2. Update the NodePool to reference the newly created configMap.
3. Observe the NodePool status.
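For reference, a minimal sketch of the resources involved, assuming the usual HyperShift pattern of a ConfigMap on the management cluster carrying the MachineConfig under the "config" key; all names and the "clusters" namespace here are hypothetical placeholders, not the actual resources used:
{code:yaml}
# Hypothetical example; real names/namespaces differ.
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-machineconfig    # referenced by the NodePool below
  namespace: clusters           # namespace holding the HostedCluster resources
data:
  config: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-custom
    spec:
      config:
        ignition:
          version: 3.2.0
---
# NodePool change: add the ConfigMap reference to spec.config
spec:
  config:
    - name: custom-machineconfig
{code}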
Actual results:
The NodePool reports the following event:
{code:none}
Warning  ReconcileError  53m (x16 over 35d)  nodepool-controller  failed to get config: failed to check if we manage haproxy ignition config: failed to look up image metadata for quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:17dad53230cc8f95e1b993cc34a77351fb77bd903387661f0df41fff72746daa: failed to obtain image configuration for quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:17dad53230cc8f95e1b993cc34a77351fb77bd903387661f0df41fff72746daa: cannot retrieve image configuration for manifest sha256:bc428181ba542fba9d93aaf3ff362cf64c0dc7f45bc566d79d7b084ab04614d7 in manifest list sha256:17dad53230cc8f95e1b993cc34a77351fb77bd903387661f0df41fff72746daa: received unexpected HTTP status: 502 Bad Gateway
{code}
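The 502 Bad Gateway suggests the registry (or a proxy in front of it) transiently failed the image metadata lookup, rather than anything being wrong with the NodePool spec itself. The same lookup the nodepool-controller performs can be repeated out of band; a sketch, assuming oc access from the management cluster:
{code:none}
# Repeats the manifest-list resolution that failed with a 502; success here
# indicates the registry path has recovered.
oc image info --filter-by-os=linux/amd64 \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:17dad53230cc8f95e1b993cc34a77351fb77bd903387661f0df41fff72746daa
{code}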
The hosted-cluster-config-operator logs the following:
{"level":"info","ts":"2025-03-26T15:37:26Z","msg":"discovered mco image","controller":"inplaceupgrader","object":{"name":"nodepool-prod-stc-infosec-dc-rdu2-1","namespace":"prod-stc-infosec-dc-rdu2-prod-stc-infosec-dc-rdu2"},"namespace":"prod-stc-infosec-dc-rdu2-prod-stc-infosec-dc-rdu2","name":"nodepool-prod-stc-infosec-dc-rdu2-1","reconcileID":"6e6aaa5c-46eb-4174-8925-9832200e731b","image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e93f4635bf0ffa1796b375bab78ac2644c6a07d001ce8441a6294dfc6fe3488"} {"level":"info","ts":"2025-03-26T15:37:26Z","msg":"Reconciled MachineSet","controller":"inplaceupgrader","object":{"name":"nodepool-prod-stc-infosec-dc-rdu2-1","namespace":"prod-stc-infosec-dc-rdu2-prod-stc-infosec-dc-rdu2"},"namespace":"prod-stc-infosec-dc-rdu2-prod-stc-infosec-dc-rdu2","name":"nodepool-prod-stc-infosec-dc-rdu2-1","reconcileID":"6e6aaa5c-46eb-4174-8925-9832200e731b","result":"unchanged"} {"level":"error","ts":"2025-03-26T15:37:26Z","msg":"Reconciler error","controller":"inplaceupgrader","object":{"name":"nodepool-prod-stc-infosec-dc-rdu2-1","namespace":"prod-stc-infosec-dc-rdu2-prod-stc-infosec-dc-rdu2"},"namespace":"prod-stc-infosec-dc-rdu2-prod-stc-infosec-dc-rdu2","name":"nodepool-prod-stc-infosec-dc-rdu2-1","reconcileID":"6e6aaa5c-46eb-4174-8925-9832200e731b","error":"degraded node found, cannot progress in-place upgrade. Degraded reason: disk validation failed: expected target osImageURL \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a255bb9116f3362a9800d607bcd6184c5124bd50a41b4b8c1707cdc663154e8\", have \"\" (\"b2b90997a32bae13c7d91be179737d37ac81607cc45f749fb6ce75dbc9bcd62f\")","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/...
The machine-config-daemon is reporting the following:
{code:none}
> oc logs -f machine-config-daemon-hosted-worker-024.nodes.prod.psi.rdu2.redhat.com
I0326 15:38:06.332528 5640 start.go:61] Version: v4.14.0-202407100410.p0.gea6671e.assembly.stream.el8-dirty (ea6671e27ec6683aa986b7072b022f0393a0e15a)
I0326 15:38:06.332568 5640 update.go:1995] Running: mount --rbind /run/secrets /rootfs/run/secrets
I0326 15:38:06.338393 5640 update.go:1995] Running: mount --rbind /usr/bin /rootfs/run/machine-config-daemon-bin
I0326 15:38:06.340814 5640 daemon.go:457] container is rhel8, target is rhel9
I0326 15:38:06.500197 5640 daemon.go:525] Invoking re-exec /run/bin/machine-config-daemon
I0326 15:38:06.534097 5640 start.go:61] Version: v4.14.0-202407100410.p0.gea6671e.assembly.stream.el8-dirty (ea6671e27ec6683aa986b7072b022f0393a0e15a)
I0326 15:38:06.534299 5640 update.go:1995] Running: systemctl daemon-reload
I0326 15:38:06.845491 5640 rpm-ostree.go:88] Enabled workaround for bug 2111817
E0326 15:38:06.845593 5640 rpm-ostree.go:284] Merged secret file could not be validated; defaulting to cluster pull secret <nil>
I0326 15:38:06.845615 5640 rpm-ostree.go:262] Linking ostree authfile to /var/lib/kubelet/config.json
I0326 15:38:07.089993 5640 daemon.go:270] Booted osImageURL: (414.92.202305090606-0) b2b90997a32bae13c7d91be179737d37ac81607cc45f749fb6ce75dbc9bcd62f
I0326 15:38:07.091273 5640 start.go:124] overriding kubernetes api to https://api.prod-stc-infosec-dc-rdu2.ocp-hub.prod.psi.redhat.com:6443
W0326 15:38:07.112659 5640 controller_context.go:111] unable to get owner reference (falling back to namespace): unable to setup event recorder as "POD_NAME" env variable is not set and there are no pods
I0326 15:38:07.112873 5640 simple_featuregate_reader.go:171] Starting feature-gate-detector
I0326 15:38:07.124098 5640 writer.go:88] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
I0326 15:38:07.124210 5640 daemon.go:777] Starting MachineConfigDaemon - Hypershift
I0326 15:38:08.171605 5640 daemon.go:890] Successfully read current/desired Config
I0326 15:38:08.208520 5640 daemon.go:899] No diff detected. Assuming a previous update was completed. Checking on-disk state.
E0326 15:38:09.315855 5640 writer.go:226] Marking Degraded due to: disk validation failed: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a255bb9116f3362a9800d607bcd6184c5124bd50a41b4b8c1707cdc663154e8", have "" ("b2b90997a32bae13c7d91be179737d37ac81607cc45f749fb6ce75dbc9bcd62f")
I0326 15:38:11.225232 5640 daemon.go:890] Successfully read current/desired Config
I0326 15:38:11.260954 5640 daemon.go:899] No diff detected. Assuming a previous update was completed. Checking on-disk state.
E0326 15:38:12.713085 5640 writer.go:226] Marking Degraded due to: disk validation failed: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a255bb9116f3362a9800d607bcd6184c5124bd50a41b4b8c1707cdc663154e8", have "" ("b2b90997a32bae13c7d91be179737d37ac81607cc45f749fb6ce75dbc9bcd62f")
I0326 15:38:28.768536 5640 daemon.go:890] Successfully read current/desired Config
I0326 15:38:28.803236 5640 daemon.go:899] No diff detected. Assuming a previous update was completed. Checking on-disk state.
E0326 15:38:29.555588 5640 writer.go:226] Marking Degraded due to: disk validation failed: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a255bb9116f3362a9800d607bcd6184c5124bd50a41b4b8c1707cdc663154e8", have "" ("b2b90997a32bae13c7d91be179737d37ac81607cc45f749fb6ce75dbc9bcd62f")
I0326 15:39:29.609959 5640 daemon.go:890] Successfully read current/desired Config
I0326 15:39:29.646590 5640 daemon.go:899] No diff detected. Assuming a previous update was completed. Checking on-disk state.
E0326 15:39:30.894835 5640 writer.go:226] Marking Degraded due to: disk validation failed: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a255bb9116f3362a9800d607bcd6184c5124bd50a41b4b8c1707cdc663154e8", have "" ("b2b90997a32bae13c7d91be179737d37ac81607cc45f749fb6ce75dbc9bcd62f")
{code}
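The have "" in the validation error corresponds to the booted deployment carrying no container-image origin at all: the node is still booted into the original ostree commit (b2b909...) rather than the rebased target image. One way to confirm this on the node; a sketch, assuming a recent enough rpm-ostree (for the container-image-reference field) and jq available:
{code:none}
# Shows, per deployment: whether it is the booted one, its ostree checksum, and
# its container image origin (null/empty for a plain ostree-commit deployment).
rpm-ostree status --json | jq '.deployments[] | {booted: .booted, checksum: .checksum, image: .["container-image-reference"]}'
{code}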
Expected results:
I expect the NodePool not to be degraded, and the updated MachineConfig to roll out to all the nodes.
Additional info:
I went through https://docs.google.com/document/d/1fgP6Kv1D-75e1Ot0Kg-W2qPyxWDp2_CALltlBLuseec/edit?tab=t.0#heading=h.kcqrbb1p1rpj and applied the force file, which rebooted the node, to no avail.
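For context, the force file in that document is the machine-config-daemon's standard escape hatch; creating it makes the MCD skip the on-disk validation once and re-attempt the pending update, which is what triggered the reboot mentioned above:
{code:none}
# Run on the degraded node (this is the step already performed, shown for reference).
touch /run/machine-config-daemon-force
{code}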
The following was captured after applying the force file and the node auto-rebooted:
{code:none}
sh-5.1# rpm-ostree status
State: idle
Warning: failed to finalize previous deployment
         error: failed to run semodule: Child process exited with code 127
         check `journalctl -b -1 -u ostree-finalize-staged.service`
Deployments:
* b2b90997a32bae13c7d91be179737d37ac81607cc45f749fb6ce75dbc9bcd62f
    Version: 414.92.202305090606-0 (2023-05-09T06:09:26Z)
{code}
{code:none}
sh-5.1# journalctl -b -1 -u ostree-finalize-staged.service
Mar 26 15:32:14 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Finished OSTree Finalize Staged Deployment.
Mar 26 15:33:02 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Stopping OSTree Finalize Staged Deployment...
Mar 26 15:33:02 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com ostree[899516]: Finalizing staged deployment
Mar 26 15:33:03 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com ostree[899516]: Copying /etc changes: 19 modified, 0 removed, 578 added
Mar 26 15:33:03 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com ostree[899516]: Copying /etc changes: 19 modified, 0 removed, 578 added
Mar 26 15:33:03 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com ostree[899524]: semodule: error while loading shared libraries: /lib64/libsemanage.so.2: invalid ELF header
Mar 26 15:33:03 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com ostree[899516]: error: failed to run semodule: Child process exited with code 127
Mar 26 15:33:03 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited, status=1/FAILURE
Mar 26 15:33:03 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.
Mar 26 15:33:03 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Stopped OSTree Finalize Staged Deployment.
{code}
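The semodule failure appears to be the actual blocker: with ostree-finalize-staged.service failing, the staged deployment is never committed, the node boots back into the old commit, and the MCD's disk validation keeps failing with the empty osImageURL above. Whether /lib64/libsemanage.so.2 is genuinely corrupt can be checked on the node; a sketch, assuming file and rpm are available there:
{code:none}
# "invalid ELF header" usually means the file on disk is truncated or overwritten.
file /lib64/libsemanage.so.2
# Verify the on-disk file against the owning RPM (size/digest mismatches show here).
rpm -Vf /lib64/libsemanage.so.2
{code}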
{code:none}
sh-5.1# journalctl -b -1 -u rpm-ostreed.service
Mar 26 13:51:20 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Starting rpm-ostree System Management Daemon...
Mar 26 13:51:20 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[836319]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 13:51:20 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[836319]: In idle state; will auto-exit in 64 seconds
Mar 26 13:51:20 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Started rpm-ostree System Management Daemon.
Mar 26 13:51:20 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[836319]: client(id:cli dbus:1.2846 unit:crio-38580971eeeb66b069f7936bb7f9738476da228cd5a3c41928c065745f774134.scope uid:0) added; new total=1
Mar 26 13:51:20 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[836319]: client(id:cli dbus:1.2846 unit:crio-38580971eeeb66b069f7936bb7f9738476da228cd5a3c41928c065745f774134.scope uid:0) vanished; remaining=0
Mar 26 13:51:20 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[836319]: In idle state; will auto-exit in 64 seconds
Mar 26 13:52:25 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: rpm-ostreed.service: Deactivated successfully.
Mar 26 15:10:32 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Starting rpm-ostree System Management Daemon...
Mar 26 15:10:32 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[884229]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 15:10:32 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[884229]: In idle state; will auto-exit in 63 seconds
Mar 26 15:10:32 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Started rpm-ostree System Management Daemon.
Mar 26 15:10:32 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[884229]: client(id:cli dbus:1.3004 unit:crio-39b8158b8fae369b8e6ec77a002294da372e365e88f4a808d9f831e2748f4efc.scope uid:0) added; new total=1
Mar 26 15:10:32 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[884229]: client(id:cli dbus:1.3004 unit:crio-39b8158b8fae369b8e6ec77a002294da372e365e88f4a808d9f831e2748f4efc.scope uid:0) vanished; remaining=0
Mar 26 15:10:32 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[884229]: In idle state; will auto-exit in 64 seconds
Mar 26 15:11:37 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: rpm-ostreed.service: Deactivated successfully.
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Starting rpm-ostree System Management Daemon...
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Reading config file '/etc/rpm-ostreed.conf'
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: In idle state; will auto-exit in 61 seconds
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Started rpm-ostree System Management Daemon.
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: client(id:machine-config-operator dbus:1.3048 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0) added; new total=1
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Loaded sysroot
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Locked sysroot
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Initiated txn Cleanup for client(id:machine-config-operator dbus:1.3048 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0>
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Process [pid: 898548 uid: 0 unit: crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope] connected to transaction progress
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Txn Cleanup on /org/projectatomic/rpmostree1/rhcos successful
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Unlocked sysroot
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Process [pid: 898548 uid: 0 unit: crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope] disconnected from transaction progress
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: client(id:machine-config-operator dbus:1.3048 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0) vanished; remaining=0
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: In idle state; will auto-exit in 61 seconds
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: client(id:cli dbus:1.3049 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0) added; new total=1
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: client(id:cli dbus:1.3049 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0) vanished; remaining=0
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: In idle state; will auto-exit in 62 seconds
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: client(id:machine-config-operator dbus:1.3050 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0) added; new total=1
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Loaded sysroot
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Locked sysroot
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Initiated txn Rebase for client(id:machine-config-operator dbus:1.3050 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0)>
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Process [pid: 898569 uid: 0 unit: crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope] connected to transaction progress
Mar 26 15:32:10 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Fetching ostree-unverified-image:docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a255bb9116f3362a9800d607bcd6184c5124bd50a41b4b8c1707cdc663154>
Mar 26 15:32:14 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Created new deployment /ostree/deploy/rhcos/deploy/27551d8ffb2ffbc3af714f5e8cdada699c02b7ae6b3f7fe3a5a16024a43aab2e.4
Mar 26 15:32:14 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: sanitycheck(/usr/bin/true) successful
Mar 26 15:32:14 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Pruned container image layers: 0
Mar 26 15:32:15 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Txn Rebase on /org/projectatomic/rpmostree1/rhcos successful
Mar 26 15:32:15 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Unlocked sysroot
Mar 26 15:32:15 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: Process [pid: 898569 uid: 0 unit: crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope] disconnected from transaction progress
Mar 26 15:32:16 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: client(id:machine-config-operator dbus:1.3050 unit:crio-e4a835e92e855aabb0198cd4db56c70f7269398921c1a764d808ffe44834a665.scope uid:0) vanished; remaining=0
Mar 26 15:32:16 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com rpm-ostree[898552]: In idle state; will auto-exit in 61 seconds
Mar 26 15:32:16 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Stopping rpm-ostree System Management Daemon...
Mar 26 15:32:16 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: rpm-ostreed.service: Deactivated successfully.
Mar 26 15:32:16 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: Stopped rpm-ostree System Management Daemon.
Mar 26 15:32:16 hosted-worker-024.nodes.prod.psi.rdu2.redhat.com systemd[1]: rpm-ostreed.service: Consumed 3.116s CPU time.
{code}
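Note that the journal shows the Rebase transaction itself succeeding at 15:32:15; only the finalize on shutdown fails. A possible recovery path, offered as an assumption rather than a verified fix: discard the staged deployment that failed to finalize so the MCD can re-stage the rebase from a clean state.
{code:none}
# Assumption, not a verified fix. "-p" removes only the pending (staged)
# deployment; the MCD should then re-initiate the Rebase transaction.
rpm-ostree cleanup -p
{code}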