Bug
Resolution: Unresolved
Minor
None
4.12.z
Low
No
False
Description of problem:
Node was created today with the worker label. It was then labeled as a loadbalancer to match the MCP selector. The MCP saw the selector change and moved to Updating, but the machine-config-daemon pod isn't responding. We tried deleting the pod, and it still didn't pick up that it needed to get a new config. Manually editing the desired config appears to work around the issue, but that shouldn't be necessary.
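For reference, the manual edit mentioned above amounts to overwriting the node's desiredConfig annotation with the pool's rendered config, something like the following (a sketch only; the annotation is normally managed by the machine-config-controller, and the rendered config name is taken from the outputs below):

$ oc annotate node worker-048.kub3.sttlwazu.vzwops.com --overwrite \
    machineconfiguration.openshift.io/desiredConfig=rendered-loadbalancer-1486d925cac5a9366d6345552af26c89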
Node created today:

[dasmall@supportshell-1 03803880]$ oc get nodes worker-048.kub3.sttlwazu.vzwops.com -o yaml | yq .metadata.creationTimestamp
'2024-04-30T17:17:56Z'

Node has worker and loadbalancer roles:

[dasmall@supportshell-1 03803880]$ oc get node worker-048.kub3.sttlwazu.vzwops.com
NAME                                  STATUS   ROLES                 AGE   VERSION
worker-048.kub3.sttlwazu.vzwops.com   Ready    loadbalancer,worker   1h    v1.25.14+a52e8df

MCP shows a loadbalancer needing an update and 0 nodes in the worker pool:

[dasmall@supportshell-1 03803880]$ oc get mcp
NAME           CONFIG                                                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
loadbalancer   rendered-loadbalancer-1486d925cac5a9366d6345552af26c89   False     True       False      4              3                   3                     0                      87d
master         rendered-master-47f6fa5afe8ce8f156d80a104f8bacae         True      False      False      3              3                   3                     0                      87d
worker         rendered-worker-a6be9fb3f667b76a611ce51811434cf9         True      False      False      0              0                   0                     0                      87d
workerperf     rendered-workerperf-477d3621fe19f1f980d1557a02276b4e     True      False      False      38             38                  38                    0                      87d

Status shows the mcp updating:

[dasmall@supportshell-1 03803880]$ oc get mcp loadbalancer -o yaml | yq .status.conditions[4]
lastTransitionTime: '2024-04-30T17:33:21Z'
message: All nodes are updating to rendered-loadbalancer-1486d925cac5a9366d6345552af26c89
reason: ''
status: 'True'
type: Updating

Node still appears happy with the worker MC:

[dasmall@supportshell-1 03803880]$ oc get node worker-048.kub3.sttlwazu.vzwops.com -o yaml | grep rendered-
    machineconfiguration.openshift.io/currentConfig: rendered-worker-a6be9fb3f667b76a611ce51811434cf9
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-a6be9fb3f667b76a611ce51811434cf9
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-a6be9fb3f667b76a611ce51811434cf9
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-a6be9fb3f667b76a611ce51811434cf9

machine-config-daemon pod appears idle:

[dasmall@supportshell-1 03803880]$ oc logs -n openshift-machine-config-operator machine-config-daemon-wx2b8 -c machine-config-daemon
2024-04-30T17:48:29.868191425Z I0430 17:48:29.868156 19112 start.go:112] Version: v4.12.0-202311220908.p0.gef25c81.assembly.stream-dirty (ef25c81205a65d5361cfc464e16fd5d47c0c6f17)
2024-04-30T17:48:29.871340319Z I0430 17:48:29.871328 19112 start.go:125] Calling chroot("/rootfs")
2024-04-30T17:48:29.871602466Z I0430 17:48:29.871593 19112 update.go:2110] Running: systemctl daemon-reload
2024-04-30T17:48:30.066554346Z I0430 17:48:30.066006 19112 rpm-ostree.go:85] Enabled workaround for bug 2111817
2024-04-30T17:48:30.297743470Z I0430 17:48:30.297706 19112 daemon.go:241] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858 (412.86.202311271639-0) 828584d351fcb58e4d799cebf271094d5d9b5c1a515d491ee5607b1dcf6ebf6b
2024-04-30T17:48:30.324852197Z I0430 17:48:30.324543 19112 start.go:101] Copied self to /run/bin/machine-config-daemon on host
2024-04-30T17:48:30.325677959Z I0430 17:48:30.325666 19112 start.go:188] overriding kubernetes api to https://api-int.kub3.sttlwazu.vzwops.com:6443
2024-04-30T17:48:30.326381479Z I0430 17:48:30.326368 19112 metrics.go:106] Registering Prometheus metrics
2024-04-30T17:48:30.326447815Z I0430 17:48:30.326440 19112 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
2024-04-30T17:48:30.327835814Z I0430 17:48:30.327811 19112 writer.go:93] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
2024-04-30T17:48:30.327932144Z I0430 17:48:30.327923 19112 update.go:2125] Starting to manage node: worker-048.kub3.sttlwazu.vzwops.com
2024-04-30T17:48:30.332123862Z I0430 17:48:30.332097 19112 rpm-ostree.go:394] Running captured: rpm-ostree status
2024-04-30T17:48:30.332928272Z I0430 17:48:30.332909 19112 daemon.go:1049] Detected a new login session: New session 1 of user core.
2024-04-30T17:48:30.332935796Z I0430 17:48:30.332926 19112 daemon.go:1050] Login access is discouraged! Applying annotation: machineconfiguration.openshift.io/ssh
2024-04-30T17:48:30.368619942Z I0430 17:48:30.368598 19112 daemon.go:1298] State: idle
2024-04-30T17:48:30.368619942Z Deployments:
2024-04-30T17:48:30.368619942Z * ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
2024-04-30T17:48:30.368619942Z                   Digest: sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
2024-04-30T17:48:30.368619942Z                  Version: 412.86.202311271639-0 (2024-04-30T17:05:27Z)
2024-04-30T17:48:30.368619942Z          LayeredPackages: kernel-devel kernel-headers
2024-04-30T17:48:30.368619942Z
2024-04-30T17:48:30.368619942Z   ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
2024-04-30T17:48:30.368619942Z                   Digest: sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
2024-04-30T17:48:30.368619942Z                  Version: 412.86.202311271639-0 (2024-04-30T17:05:27Z)
2024-04-30T17:48:30.368619942Z          LayeredPackages: kernel-devel kernel-headers
2024-04-30T17:48:30.368907860Z I0430 17:48:30.368884 19112 coreos.go:54] CoreOS aleph version: mtime=2023-08-08 11:20:41.285 +0000 UTC build=412.86.202308081039-0 imgid=rhcos-412.86.202308081039-0-metal.x86_64.raw
2024-04-30T17:48:30.368932886Z I0430 17:48:30.368926 19112 coreos.go:71] Ignition provisioning: time=2024-04-30T17:03:44Z
2024-04-30T17:48:30.368938120Z I0430 17:48:30.368931 19112 rpm-ostree.go:394] Running captured: journalctl --list-boots
2024-04-30T17:48:30.372893750Z I0430 17:48:30.372884 19112 daemon.go:1307] journalctl --list-boots:
2024-04-30T17:48:30.372893750Z -2 847e119666d9498da2ae1bd89aa4c4d0 Tue 2024-04-30 17:03:13 UTC—Tue 2024-04-30 17:06:32 UTC
2024-04-30T17:48:30.372893750Z -1 9617b204b8b8412fb31438787f56a62f Tue 2024-04-30 17:09:06 UTC—Tue 2024-04-30 17:36:39 UTC
2024-04-30T17:48:30.372893750Z  0 3cbf6edcacde408b8979692c16e3d01b Tue 2024-04-30 17:39:20 UTC—Tue 2024-04-30 17:48:30 UTC
2024-04-30T17:48:30.372912686Z I0430 17:48:30.372891 19112 rpm-ostree.go:394] Running captured: systemctl list-units --state=failed --no-legend
2024-04-30T17:48:30.378069332Z I0430 17:48:30.378059 19112 daemon.go:1322] systemd service state: OK
2024-04-30T17:48:30.378069332Z I0430 17:48:30.378066 19112 daemon.go:987] Starting MachineConfigDaemon
2024-04-30T17:48:30.378121340Z I0430 17:48:30.378106 19112 daemon.go:994] Enabling Kubelet Healthz Monitor
2024-04-30T17:48:31.486786667Z I0430 17:48:31.486747 19112 daemon.go:457] Node worker-048.kub3.sttlwazu.vzwops.com is not labeled node-role.kubernetes.io/master
2024-04-30T17:48:31.491674986Z I0430 17:48:31.491594 19112 daemon.go:1243] Current+desired config: rendered-worker-a6be9fb3f667b76a611ce51811434cf9
2024-04-30T17:48:31.491674986Z I0430 17:48:31.491603 19112 daemon.go:1253] state: Done
2024-04-30T17:48:31.495704843Z I0430 17:48:31.495617 19112 daemon.go:617] Detected a login session before the daemon took over on first boot
2024-04-30T17:48:31.495704843Z I0430 17:48:31.495624 19112 daemon.go:618] Applying annotation: machineconfiguration.openshift.io/ssh
2024-04-30T17:48:31.503165515Z I0430 17:48:31.503052 19112 update.go:2110] Running: rpm-ostree cleanup -r
2024-04-30T17:48:32.232728843Z Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
2024-04-30T17:48:35.755815139Z Freed: 92.3 MB (pkgcache branches: 0)
2024-04-30T17:48:35.764568364Z I0430 17:48:35.764548 19112 daemon.go:1563] Validating against current config rendered-worker-a6be9fb3f667b76a611ce51811434cf9
2024-04-30T17:48:36.120148982Z I0430 17:48:36.120119 19112 rpm-ostree.go:394] Running captured: rpm-ostree kargs
2024-04-30T17:48:36.179660790Z I0430 17:48:36.179631 19112 update.go:2125] Validated on-disk state
2024-04-30T17:48:36.182434142Z I0430 17:48:36.182406 19112 daemon.go:1646] In desired config rendered-worker-a6be9fb3f667b76a611ce51811434cf9
2024-04-30T17:48:36.196911084Z I0430 17:48:36.196879 19112 config_drift_monitor.go:246] Config Drift Monitor started
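Worth noting: the desiredConfig annotation is written by the node controller inside the machine-config-controller; the machine-config-daemon only acts once that annotation changes. With the daemon idle above, the controller logs are the next place to look, e.g. (a sketch; container name assumed):

$ oc logs -n openshift-machine-config-operator deployment/machine-config-controller \
    -c machine-config-controller | grep -i worker-048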
Version-Release number of selected component (if applicable):
4.12.45
How reproducible:
The customer can reproduce this in multiple clusters; see the sketch below.
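A minimal reproduction sketch, assuming the loadbalancer pool selects nodes via the node-role label as in this cluster (names taken from this case): label a freshly provisioned worker node and watch the pool.

$ oc label node worker-048.kub3.sttlwazu.vzwops.com node-role.kubernetes.io/loadbalancer=
$ oc get mcp loadbalancer -w

The pool moves to Updating, but the node's config annotations never change.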
Actual results:
The node stays on the rendered-worker config; its desiredConfig annotation is never updated to the loadbalancer rendered config.
Expected results:
The machineconfigpool moving to Updating should prompt the machine-config-controller to change the node's desired config, which the machine-config-daemon pod should then apply to the node.
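A quick way to confirm whether that handoff ever happens is to read the annotation directly (a sketch; the dotted annotation key needs jsonpath escaping):

$ oc get node worker-048.kub3.sttlwazu.vzwops.com \
    -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'

On a healthy cluster this should flip to the rendered-loadbalancer config shortly after the pool reports Updating; here it stays at the rendered-worker config.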
Additional info:
Here is the latest must-gather where this issue is occurring: https://attachments.access.redhat.com/hydra/rest/cases/03803880/attachments/3fd0cf52-a770-4525-aecd-3a437ea70c9b?usePresignedUrl=true