OpenShift Bugs / OCPBUGS-33144

machine-config-daemon pod not picking up label and MCP change to push out new rendered config



      Description of problem:

A node was created today with the worker label. It was then labeled as a loadbalancer to match the MCP's node selector. The MCP saw the selector match and moved to Updating, but the machine-config-daemon pod is not responding. We tried deleting the pod, and it still did not pick up that it needed a new config. Manually editing the node's desired config appears to work around the issue, but that should not be necessary.
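For reference, the manual workaround looks roughly like this (node and pool names are the ones from this cluster; the annotation key and the MCP `spec.configuration.name` field are standard MCO fields, but treat the commands as a sketch rather than a supported procedure):

```shell
NODE=worker-048.kub3.sttlwazu.vzwops.com

# The rendered config the loadbalancer pool currently wants.
RENDERED=$(oc get mcp loadbalancer -o jsonpath='{.spec.configuration.name}')

# Workaround: write the desiredConfig annotation by hand, which is what
# the machine-config-controller's node controller should be doing itself.
oc annotate node "$NODE" --overwrite \
  machineconfiguration.openshift.io/desiredConfig="$RENDERED"
```

After this, the machine-config-daemon on the node notices the current/desired mismatch and applies the new rendered config.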

      Node created today:
      
      [dasmall@supportshell-1 03803880]$ oc get nodes worker-048.kub3.sttlwazu.vzwops.com -o yaml | yq .metadata.creationTimestamp
      '2024-04-30T17:17:56Z'
      
      Node has worker and loadbalancer roles:
      
      [dasmall@supportshell-1 03803880]$ oc get node worker-048.kub3.sttlwazu.vzwops.com
      NAME                                  STATUS   ROLES                 AGE   VERSION
      worker-048.kub3.sttlwazu.vzwops.com   Ready    loadbalancer,worker   1h    v1.25.14+a52e8df
      
      
      MCP shows a loadbalancer needing Update and 0 nodes in worker pool:
      
      [dasmall@supportshell-1 03803880]$ oc get mcp
      NAME           CONFIG                                                   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      loadbalancer   rendered-loadbalancer-1486d925cac5a9366d6345552af26c89   False     True       False      4              3                   3                     0                      87d
      master         rendered-master-47f6fa5afe8ce8f156d80a104f8bacae         True      False      False      3              3                   3                     0                      87d
      worker         rendered-worker-a6be9fb3f667b76a611ce51811434cf9         True      False      False      0              0                   0                     0                      87d
      workerperf     rendered-workerperf-477d3621fe19f1f980d1557a02276b4e     True      False      False      38             38                  38                    0                      87d
      
      
      Status shows mcp updating:
      
      [dasmall@supportshell-1 03803880]$ oc get mcp loadbalancer -o yaml | yq .status.conditions[4]
      lastTransitionTime: '2024-04-30T17:33:21Z'
      message: All nodes are updating to rendered-loadbalancer-1486d925cac5a9366d6345552af26c89
      reason: ''
      status: 'True'
      type: Updating
      
      
      Node still appears happy with worker MC:
      
      [dasmall@supportshell-1 03803880]$ oc get node worker-048.kub3.sttlwazu.vzwops.com -o yaml | grep rendered-
          machineconfiguration.openshift.io/currentConfig: rendered-worker-a6be9fb3f667b76a611ce51811434cf9
          machineconfiguration.openshift.io/desiredConfig: rendered-worker-a6be9fb3f667b76a611ce51811434cf9
          machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-a6be9fb3f667b76a611ce51811434cf9
          machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-a6be9fb3f667b76a611ce51811434cf9
      
      
      machine-config-daemon pod appears idle:
      
      [dasmall@supportshell-1 03803880]$ oc logs -n openshift-machine-config-operator machine-config-daemon-wx2b8 -c machine-config-daemon
      2024-04-30T17:48:29.868191425Z I0430 17:48:29.868156   19112 start.go:112] Version: v4.12.0-202311220908.p0.gef25c81.assembly.stream-dirty (ef25c81205a65d5361cfc464e16fd5d47c0c6f17)
      2024-04-30T17:48:29.871340319Z I0430 17:48:29.871328   19112 start.go:125] Calling chroot("/rootfs")
      2024-04-30T17:48:29.871602466Z I0430 17:48:29.871593   19112 update.go:2110] Running: systemctl daemon-reload
      2024-04-30T17:48:30.066554346Z I0430 17:48:30.066006   19112 rpm-ostree.go:85] Enabled workaround for bug 2111817
      2024-04-30T17:48:30.297743470Z I0430 17:48:30.297706   19112 daemon.go:241] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858 (412.86.202311271639-0) 828584d351fcb58e4d799cebf271094d5d9b5c1a515d491ee5607b1dcf6ebf6b
      2024-04-30T17:48:30.324852197Z I0430 17:48:30.324543   19112 start.go:101] Copied self to /run/bin/machine-config-daemon on host
      2024-04-30T17:48:30.325677959Z I0430 17:48:30.325666   19112 start.go:188] overriding kubernetes api to https://api-int.kub3.sttlwazu.vzwops.com:6443
      2024-04-30T17:48:30.326381479Z I0430 17:48:30.326368   19112 metrics.go:106] Registering Prometheus metrics
      2024-04-30T17:48:30.326447815Z I0430 17:48:30.326440   19112 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
      2024-04-30T17:48:30.327835814Z I0430 17:48:30.327811   19112 writer.go:93] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
      2024-04-30T17:48:30.327932144Z I0430 17:48:30.327923   19112 update.go:2125] Starting to manage node: worker-048.kub3.sttlwazu.vzwops.com
      2024-04-30T17:48:30.332123862Z I0430 17:48:30.332097   19112 rpm-ostree.go:394] Running captured: rpm-ostree status
      2024-04-30T17:48:30.332928272Z I0430 17:48:30.332909   19112 daemon.go:1049] Detected a new login session: New session 1 of user core.
      2024-04-30T17:48:30.332935796Z I0430 17:48:30.332926   19112 daemon.go:1050] Login access is discouraged! Applying annotation: machineconfiguration.openshift.io/ssh
      2024-04-30T17:48:30.368619942Z I0430 17:48:30.368598   19112 daemon.go:1298] State: idle
      2024-04-30T17:48:30.368619942Z Deployments:
      2024-04-30T17:48:30.368619942Z * ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
      2024-04-30T17:48:30.368619942Z                    Digest: sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
      2024-04-30T17:48:30.368619942Z                   Version: 412.86.202311271639-0 (2024-04-30T17:05:27Z)
      2024-04-30T17:48:30.368619942Z           LayeredPackages: kernel-devel kernel-headers
      2024-04-30T17:48:30.368619942Z
      2024-04-30T17:48:30.368619942Z   ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
      2024-04-30T17:48:30.368619942Z                    Digest: sha256:20b4937e8d107af19d8e39329e1767471b78ba6abd07b5a3e328dafd7b146858
      2024-04-30T17:48:30.368619942Z                   Version: 412.86.202311271639-0 (2024-04-30T17:05:27Z)
      2024-04-30T17:48:30.368619942Z           LayeredPackages: kernel-devel kernel-headers
      2024-04-30T17:48:30.368907860Z I0430 17:48:30.368884   19112 coreos.go:54] CoreOS aleph version: mtime=2023-08-08 11:20:41.285 +0000 UTC build=412.86.202308081039-0 imgid=rhcos-412.86.202308081039-0-metal.x86_64.raw
      2024-04-30T17:48:30.368932886Z I0430 17:48:30.368926   19112 coreos.go:71] Ignition provisioning: time=2024-04-30T17:03:44Z
      2024-04-30T17:48:30.368938120Z I0430 17:48:30.368931   19112 rpm-ostree.go:394] Running captured: journalctl --list-boots
      2024-04-30T17:48:30.372893750Z I0430 17:48:30.372884   19112 daemon.go:1307] journalctl --list-boots:
      2024-04-30T17:48:30.372893750Z -2 847e119666d9498da2ae1bd89aa4c4d0 Tue 2024-04-30 17:03:13 UTC—Tue 2024-04-30 17:06:32 UTC
      2024-04-30T17:48:30.372893750Z -1 9617b204b8b8412fb31438787f56a62f Tue 2024-04-30 17:09:06 UTC—Tue 2024-04-30 17:36:39 UTC
      2024-04-30T17:48:30.372893750Z  0 3cbf6edcacde408b8979692c16e3d01b Tue 2024-04-30 17:39:20 UTC—Tue 2024-04-30 17:48:30 UTC
      2024-04-30T17:48:30.372912686Z I0430 17:48:30.372891   19112 rpm-ostree.go:394] Running captured: systemctl list-units --state=failed --no-legend
      2024-04-30T17:48:30.378069332Z I0430 17:48:30.378059   19112 daemon.go:1322] systemd service state: OK
      2024-04-30T17:48:30.378069332Z I0430 17:48:30.378066   19112 daemon.go:987] Starting MachineConfigDaemon
      2024-04-30T17:48:30.378121340Z I0430 17:48:30.378106   19112 daemon.go:994] Enabling Kubelet Healthz Monitor
      2024-04-30T17:48:31.486786667Z I0430 17:48:31.486747   19112 daemon.go:457] Node worker-048.kub3.sttlwazu.vzwops.com is not labeled node-role.kubernetes.io/master
      2024-04-30T17:48:31.491674986Z I0430 17:48:31.491594   19112 daemon.go:1243] Current+desired config: rendered-worker-a6be9fb3f667b76a611ce51811434cf9
      2024-04-30T17:48:31.491674986Z I0430 17:48:31.491603   19112 daemon.go:1253] state: Done
      2024-04-30T17:48:31.495704843Z I0430 17:48:31.495617   19112 daemon.go:617] Detected a login session before the daemon took over on first boot
      2024-04-30T17:48:31.495704843Z I0430 17:48:31.495624   19112 daemon.go:618] Applying annotation: machineconfiguration.openshift.io/ssh
      2024-04-30T17:48:31.503165515Z I0430 17:48:31.503052   19112 update.go:2110] Running: rpm-ostree cleanup -r
      2024-04-30T17:48:32.232728843Z Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
      2024-04-30T17:48:35.755815139Z Freed: 92.3 MB (pkgcache branches: 0)
      2024-04-30T17:48:35.764568364Z I0430 17:48:35.764548   19112 daemon.go:1563] Validating against current config rendered-worker-a6be9fb3f667b76a611ce51811434cf9
      2024-04-30T17:48:36.120148982Z I0430 17:48:36.120119   19112 rpm-ostree.go:394] Running captured: rpm-ostree kargs
      2024-04-30T17:48:36.179660790Z I0430 17:48:36.179631   19112 update.go:2125] Validated on-disk state
      2024-04-30T17:48:36.182434142Z I0430 17:48:36.182406   19112 daemon.go:1646] In desired config rendered-worker-a6be9fb3f667b76a611ce51811434cf9
      2024-04-30T17:48:36.196911084Z I0430 17:48:36.196879   19112 config_drift_monitor.go:246] Config Drift Monitor started

      Version-Release number of selected component (if applicable):

          4.12.45

      How reproducible:

The reporter can reproduce this in multiple clusters.

      Actual results:

The node stays on the rendered-worker config.

      Expected results:

The MachineConfigPool moving to Updating should prompt a change to the node's desiredConfig annotation, which the machine-config-daemon pod then applies, updating the node to the new rendered config.
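A quick way to confirm whether that hand-off happened is to compare what the pool wants against what was actually written to the node (a sketch using the annotation key shown in the node output above; the dot-escaping is standard kubectl jsonpath syntax):

```shell
NODE=worker-048.kub3.sttlwazu.vzwops.com
POOL=loadbalancer

# What the pool's spec says nodes should converge to.
WANT=$(oc get mcp "$POOL" -o jsonpath='{.spec.configuration.name}')

# What the node controller actually wrote onto the node.
GOT=$(oc get node "$NODE" \
  -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}')

echo "pool wants: $WANT"
echo "node has:   $GOT"
# In this bug, the node annotation stays at the rendered-worker config
# even though the pool wants the rendered-loadbalancer config.
```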

      Additional info:

Here is the latest must-gather where this issue is occurring:
      https://attachments.access.redhat.com/hydra/rest/cases/03803880/attachments/3fd0cf52-a770-4525-aecd-3a437ea70c9b?usePresignedUrl=true

        Team MCO
        Daniel Small (rhn-support-dasmall)
        Sergio Regidor de la Rosa