Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-56648

layered image update loops on failure to find systemd unit files, MCP degraded

XMLWordPrintable

    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • 5
    • Important
    • None
    • None
    • None
    • Rejected
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem:

          During application of MC there is an error that zincati service does not exist. MCD loops indefinitely until node becomes degraded.

      Version-Release number of selected component (if applicable):

          4.19rc2

      How reproducible:

          I was successful deploying an image last week on this same cluster.
      After removing the MCP and node, then starting from scratch I'm now stuck after trying twice.

      Steps to Reproduce:

        Full details: https://github.com/dlbewley/demo-autofs/blob/update/layering/layering/readme.md 

          1. Create MCP called worker-automount
          2. Create machineconfigs for worker-automount MCP which enable autofs and other changes
          3. Create a MachineOSConfig v1 assoctiated with worker-automount with rpm autofs added via containerfile 
          4. Observe successful dnf insall of autofs rpm and successful push to internal registry
          5. Relabel node role to align to worker-automount MCP
          6. Watch MCD logs for node
          

      Actual results:

       Log file https://github.com/dlbewley/demo-autofs/blob/update/layering/layering/hub-v57jl-worker-0-dn4tm.mcd.log 

       I0525 22:50:52.933823    3452 update.go:2817] Removing SIGTERM protection
      E0525 22:50:52.934141    3452 writer.go:226] Marking Degraded due to: "error enabling units: Failed to enable unit: Unit file autofs.service does not exist.\n"
      I0525 22:51:36.812067    3452 certificate_writer.go:294] Certificate was synced from controllerconfig resourceVersion 7070796
      W0525 22:51:52.947425    3452 daemon.go:2738] Unable to check manifest for matching hash: error parsing image name "docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526": invalid image name "docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526", unknown transport "docker"
      I0525 22:51:52.948194    3452 image_manager_helper.go:92] Running captured: rpm-ostree kargs
      I0525 22:51:53.082506    3452 daemon.go:957] Preflight config drift check successful (took 135.197893ms)
      I0525 22:51:53.084032    3452 daemon.go:2580] Performing layered OS update
      I0525 22:51:53.147079    3452 update.go:2808] Adding SIGTERM protection
      I0525 22:51:53.147219    3452 upgrade_monitor.go:348] MCN Featuregate is not enabled. Please enable the TechPreviewNoUpgrade featureset to use MachineConfigNodes
      I0525 22:51:53.209434    3452 update.go:897] Checking Reconcilable for config rendered-worker-72d38a6c7ad0b42b1106ee4cf27b5718 to rendered-worker-automount-fc8896a7760f9ceccc935fe2940dc752
      I0525 22:51:53.354750    3452 update.go:2786] "drain is already completed on this node"
      I0525 22:51:53.360000    3452 update.go:939] Old MachineConfig rendered-worker-72d38a6c7ad0b42b1106ee4cf27b5718 / Image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526 -> New MachineConfig rendered-worker-automount-fc8896a7760f9ceccc935fe2940dc752 / Image image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e
      I0525 22:51:53.360436    3452 update.go:2741] Running: rpm-ostree cleanup -p
      Deployments unchanged.
      I0525 22:51:53.464823    3452 update.go:2693] Updating OS to layered image "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e"
      I0525 22:51:53.464996    3452 image_manager_helper.go:92] Running captured: rpm-ostree --version
      I0525 22:51:53.510759    3452 image_manager_helper.go:194] Linking rpm-ostree authfile to /etc/mco/internal-registry-pull-secret.json
      I0525 22:51:53.510950    3452 rpm-ostree.go:183] Executing rebase to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e
      I0525 22:51:53.511074    3452 update.go:2741] Running: rpm-ostree rebase --experimental ostree-unverified-registry:image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e
      Pulling manifest: ostree-unverified-registry:image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e
      Importing: ostree-unverified-registry:image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e (digest: sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e)
      ostree chunk layers already present: 51
      custom layers already present: 2
      custom layers needed: 2 (18.0?MB)
      [0/2] Fetching layer 1f51303d225da7fe9a8 (1.4 MB)...done
      [1/2] Fetching layer 5d877c30355f1399875 (16.6 MB)...done
      Staging deployment...done
      Added:
        autofs-1:5.1.7-60.el9.x86_64
        libsss_autofs-2.9.6-4.el9_6.2.x86_64
        openldap-clients-2.6.8-4.el9.x86_64
      Changes queued for next boot. Run "systemctl reboot" to start a reboot
      I0525 22:52:35.921258    3452 update.go:1922] Updating files
      I0525 22:52:35.921384    3452 file_writers.go:234] Writing file "/usr/local/bin/nm-clean-initrd-state.sh"
      I0525 22:52:35.935435    3452 file_writers.go:234] Writing file "/etc/NetworkManager/conf.d/01-ipv6.conf"
      I0525 22:52:35.947770    3452 file_writers.go:234] Writing file "/etc/NetworkManager/conf.d/20-keyfiles.conf"
      I0525 22:52:35.958697    3452 file_writers.go:234] Writing file "/etc/NetworkManager/conf.d/99-vsphere.conf"
      I0525 22:52:35.970589    3452 file_writers.go:234] Writing file "/etc/NetworkManager/dispatcher.d/30-resolv-prepender"
      I0525 22:52:35.984255    3452 file_writers.go:234] Writing file "/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt"
      I0525 22:52:35.996412    3452 file_writers.go:234] Writing file "/etc/kubernetes/apiserver-url.env"
      I0525 22:52:36.011610    3452 file_writers.go:234] Writing file "/etc/audit/rules.d/mco-audit-quiet-containers.rules"
      I0525 22:52:36.026693    3452 file_writers.go:234] Writing file "/etc/keepalived/monitor.conf"
      I0525 22:52:36.043301    3452 file_writers.go:234] Writing file "/etc/NetworkManager/dispatcher.d/99-esp-offload"
      I0525 22:52:36.059620    3452 file_writers.go:234] Writing file "/etc/tmpfiles.d/cleanup-cni.conf"
      I0525 22:52:36.078900    3452 file_writers.go:234] Writing file "/usr/local/bin/configure-ip-forwarding.sh"
      I0525 22:52:36.097108    3452 file_writers.go:234] Writing file "/usr/local/bin/configure-ovs.sh"
      I0525 22:52:36.122417    3452 file_writers.go:234] Writing file "/etc/containers/storage.conf"
      I0525 22:52:36.137962    3452 file_writers.go:234] Writing file "/etc/kubernetes/static-pod-resources/coredns/Corefile.tmpl"
      I0525 22:52:36.151096    3452 file_writers.go:234] Writing file "/etc/kubernetes/manifests/coredns.yaml"
      I0525 22:52:36.168778    3452 file_writers.go:234] Writing file "/etc/docker/certs.d/.create"
      I0525 22:52:36.187392    3452 file_writers.go:234] Writing file "/etc/mco/proxy.env"
      I0525 22:52:36.201891    3452 file_writers.go:234] Writing file "/etc/systemd/system.conf.d/10-default-env-godebug.conf"
      I0525 22:52:36.215477    3452 file_writers.go:234] Writing file "/etc/NetworkManager/dispatcher.d/99-gcp-disable-idpf-tx-checksum-off"
      I0525 22:52:36.239038    3452 file_writers.go:234] Writing file "/etc/modules-load.d/iptables.conf"
      I0525 22:52:36.262594    3452 file_writers.go:234] Writing file "/etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl"
      I0525 22:52:36.280954    3452 file_writers.go:234] Writing file "/etc/kubernetes/static-pod-resources/keepalived/scripts/chk_default_ingress.sh.tmpl"
      I0525 22:52:36.296847    3452 file_writers.go:234] Writing file "/etc/kubernetes/manifests/keepalived.yaml"
      I0525 22:52:36.316419    3452 file_writers.go:234] Writing file "/etc/node-sizing-enabled.env"
      I0525 22:52:36.333887    3452 file_writers.go:234] Writing file "/usr/local/sbin/dynamic-system-reserved-calc.sh"
      I0525 22:52:36.355358    3452 file_writers.go:234] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf"
      I0525 22:52:36.371932    3452 file_writers.go:234] Writing file "/etc/systemd/system/kubelet.service.d/20-logging.conf"
      I0525 22:52:36.389240    3452 file_writers.go:234] Writing file "/etc/NetworkManager/conf.d/sdn.conf"
      I0525 22:52:36.404821    3452 file_writers.go:234] Writing file "/usr/local/bin/nmstate-configuration.sh"
      I0525 22:52:36.424396    3452 file_writers.go:234] Writing file "/etc/nmstate/nmstate.conf"
      I0525 22:52:36.440458    3452 file_writers.go:234] Writing file "/etc/NetworkManager/dispatcher.d/pre-up.d/10-ofport-request.sh"
      I0525 22:52:36.459975    3452 file_writers.go:234] Writing file "/etc/machine-config-daemon/generate_podman_policy_args.sh"
      I0525 22:52:36.482374    3452 file_writers.go:234] Writing file "/var/lib/kubelet/config.json"
      I0525 22:52:36.503976    3452 file_writers.go:234] Writing file "/usr/local/bin/resolv-prepender.sh"
      I0525 22:52:36.521954    3452 file_writers.go:234] Writing file "/etc/sysctl.d/arp.conf"
      I0525 22:52:36.540075    3452 file_writers.go:234] Writing file "/etc/sysctl.d/gc-thresh.conf"
      I0525 22:52:36.556888    3452 file_writers.go:234] Writing file "/etc/sysctl.d/inotify.conf"
      I0525 22:52:36.571402    3452 file_writers.go:234] Writing file "/etc/sysctl.d/enable-userfaultfd.conf"
      I0525 22:52:36.585089    3452 file_writers.go:234] Writing file "/etc/sysctl.d/vm-max-map.conf"
      I0525 22:52:36.600085    3452 file_writers.go:234] Writing file "/usr/local/bin/mco-hostname"
      I0525 22:52:36.613003    3452 file_writers.go:234] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy"
      I0525 22:52:36.633442    3452 file_writers.go:234] Writing file "/etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl"
      I0525 22:52:36.649742    3452 file_writers.go:234] Writing file "/usr/local/bin/vsphere-hostname.sh"
      I0525 22:52:36.663650    3452 file_writers.go:234] Writing file "/usr/local/bin/wait-for-br-ex-up.sh"
      I0525 22:52:36.680205    3452 file_writers.go:234] Writing file "/usr/local/bin/ipsec-connect-wait.sh"
      I0525 22:52:36.697165    3452 file_writers.go:234] Writing file "/usr/local/bin/wait-for-primary-ip.sh"
      I0525 22:52:36.713343    3452 file_writers.go:234] Writing file "/etc/containers/registries.conf"
      I0525 22:52:36.728603    3452 file_writers.go:234] Writing file "/etc/crio/crio.conf.d/00-default"
      I0525 22:52:36.745224    3452 file_writers.go:234] Writing file "/etc/machine-config-daemon/policy-for-old-podman.json"
      I0525 22:52:36.759200    3452 file_writers.go:234] Writing file "/etc/containers/policy.json"
      I0525 22:52:36.773321    3452 file_writers.go:234] Writing file "/etc/kubernetes/cloud.conf"
      I0525 22:52:36.786876    3452 file_writers.go:234] Writing file "/etc/kubernetes/crio-metrics-proxy.cfg"
      I0525 22:52:36.801379    3452 file_writers.go:234] Writing file "/etc/kubernetes/manifests/criometricsproxy.yaml"
      I0525 22:52:36.816360    3452 file_writers.go:234] Writing file "/etc/kubernetes/kubelet.conf"
      I0525 22:52:36.836196    3452 file_writers.go:234] Writing file "/usr/local/bin/kubenswrapper"
      I0525 22:52:36.849827    3452 file_writers.go:234] Writing file "/etc/auto.master.d/auto.home"
      I0525 22:52:36.862862    3452 file_writers.go:234] Writing file "/etc/sssd/sssd.conf"
      I0525 22:52:36.881425    3452 file_writers.go:234] Writing file "/etc/sssd/conf.d/homedir.conf"
      I0525 22:52:36.901171    3452 file_writers.go:294] Writing systemd unit "NetworkManager-clean-initrd-state.service"
      I0525 22:52:36.918744    3452 file_writers.go:208] Writing systemd unit dropin "01-kubens.conf"
      I0525 22:52:36.919927    3452 file_writers.go:194] Dropin for 10-mco-default-env.conf has no content, skipping write
      I0525 22:52:36.919964    3452 file_writers.go:208] Writing systemd unit dropin "10-mco-profile-unix-socket.conf"
      I0525 22:52:36.920557    3452 file_writers.go:208] Writing systemd unit dropin "05-mco-ordering.conf"
      I0525 22:52:36.921126    3452 file_writers.go:208] Writing systemd unit dropin "10-mco-default-madv.conf"
      I0525 22:52:37.972782    3452 update.go:2240] Preset systemd unit "crio.service"
      I0525 22:52:37.972841    3452 file_writers.go:208] Writing systemd unit dropin "mco-disabled.conf"
      I0525 22:52:37.993868    3452 update.go:2277] Could not reset unit preset for docker.socket, skipping. (Error msg: error running preset on unit: Failed to preset unit: Unit file docker.socket does not exist.
      )
      I0525 22:52:37.993910    3452 file_writers.go:294] Writing systemd unit "firstboot-osupdate.target"
      I0525 22:52:38.012076    3452 file_writers.go:208] Writing systemd unit dropin "01-after-configure-ovs.conf"
      I0525 22:52:38.029503    3452 update.go:2277] Could not reset unit preset for ipsec.service, skipping. (Error msg: error running preset on unit: Failed to preset unit: Unit file ipsec.service does not exist.
      )
      I0525 22:52:38.029597    3452 file_writers.go:294] Writing systemd unit "kubelet-auto-node-size.service"
      I0525 22:52:38.064568    3452 file_writers.go:294] Writing systemd unit "kubelet-dependencies.target"
      I0525 22:52:38.989506    3452 update.go:2240] Preset systemd unit "kubelet-dependencies.target"
      I0525 22:52:38.989551    3452 file_writers.go:208] Writing systemd unit dropin "01-kubens.conf"
      I0525 22:52:38.990460    3452 file_writers.go:194] Dropin for 10-mco-default-env.conf has no content, skipping write
      I0525 22:52:38.990496    3452 file_writers.go:208] Writing systemd unit dropin "10-mco-on-prem-wait-resolv.conf"
      I0525 22:52:38.991208    3452 file_writers.go:208] Writing systemd unit dropin "10-mco-default-madv.conf"
      I0525 22:52:38.991797    3452 file_writers.go:294] Writing systemd unit "kubelet.service"
      I0525 22:52:39.012854    3452 file_writers.go:294] Writing systemd unit "kubens.service"
      I0525 22:52:39.031722    3452 file_writers.go:294] Writing systemd unit "machine-config-daemon-firstboot.service"
      I0525 22:52:39.049094    3452 file_writers.go:294] Writing systemd unit "machine-config-daemon-pull.service"
      I0525 22:52:39.068441    3452 file_writers.go:294] Writing systemd unit "nmstate-configuration.service"
      I0525 22:52:39.089008    3452 file_writers.go:294] Writing systemd unit "node-valid-hostname.service"
      I0525 22:52:39.108525    3452 file_writers.go:294] Writing systemd unit "nodeip-configuration-vsphere-upi.service"
      I0525 22:52:39.125796    3452 file_writers.go:294] Writing systemd unit "nodeip-configuration.service"
      I0525 22:52:39.143031    3452 file_writers.go:294] Writing systemd unit "on-prem-resolv-prepender.path"
      I0525 22:52:39.160687    3452 file_writers.go:294] Writing systemd unit "on-prem-resolv-prepender.service"
      I0525 22:52:39.179779    3452 file_writers.go:294] Writing systemd unit "ovs-configuration.service"
      I0525 22:52:39.197423    3452 file_writers.go:208] Writing systemd unit dropin "10-ovs-vswitchd-restart.conf"
      I0525 22:52:40.268807    3452 update.go:2240] Preset systemd unit "ovs-vswitchd.service"
      I0525 22:52:40.268865    3452 file_writers.go:208] Writing systemd unit dropin "10-ovsdb-restart.conf"
      I0525 22:52:40.269785    3452 file_writers.go:194] Dropin for 10-mco-default-env.conf has no content, skipping write
      I0525 22:52:41.161007    3452 update.go:2240] Preset systemd unit "rpm-ostreed.service"
      I0525 22:52:41.161058    3452 file_writers.go:294] Writing systemd unit "vsphere-hostname.service"
      I0525 22:52:41.194624    3452 file_writers.go:294] Writing systemd unit "wait-for-br-ex-up.service"
      I0525 22:52:41.216878    3452 file_writers.go:294] Writing systemd unit "wait-for-ipsec-connect.service"
      I0525 22:52:41.237211    3452 file_writers.go:294] Writing systemd unit "wait-for-primary-ip.service"
      I0525 22:52:41.256307    3452 file_writers.go:208] Writing systemd unit dropin "mco-disabled.conf"
      I0525 22:52:41.273497    3452 update.go:2277] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to preset unit: Unit file zincati.service does not exist.
      )
      I0525 22:52:41.273656    3452 file_writers.go:294] Writing systemd unit "kubelet-cleanup.service"
      I0525 22:52:41.295576    3452 file_writers.go:294] Writing systemd unit "setsebool-nfs-home.service"
      I0525 22:52:41.347079    3452 update.go:2873] Already in desired image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526
      I0525 22:52:41.347117    3452 update.go:2741] Running: rpm-ostree cleanup -p
      Pruned images: 1 (layers: 2)
      Freed: 59.0?MB (pkgcache branches: 0)
      I0525 22:52:43.946933    3452 update.go:2817] Removing SIGTERM protection
      E0525 22:52:43.947036    3452 writer.go:226] Marking Degraded due to: "error enabling units: Failed to enable unit: Unit file autofs.service does not exist.\n"
      W0525 22:53:13.073395    3452 daemon.go:2738] Unable to check manifest for matching hash: error parsing image name "docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526": invalid image name "docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526", unknown transport "docker"
      I0525 22:53:13.073449    3452 image_manager_helper.go:92] Running captured: rpm-ostree kargs   

      Expected results:

      MCP updates node to custom image.    

      Additional info:

      It seems as though machineconfigs are being applied before the image is in place on the node?
      
      
      [root@hub-v57jl-worker-0-dn4tm ~]# rpm-ostree status
      State: busy
      Transaction: rebase --experimental 'ostree-unverified-registry:image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:a18f86073faf0659cfdf8753a2dc0697126515c264b2ce7da2a3b3b6f9931f7e'
        Initiator: client(id:machine-config-operator dbus:1.5349 unit:crio-8ce30d46daaa44c80724020560851c8077761590c768e7fab07a38897a1ac296.scope uid:0)
      Deployments:
      ● ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526
                         Digest: sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526
                        Version: 9.6.20250514-0 (2025-05-14T23:44:17Z)
      
      # some time later...
      
      [root@hub-v57jl-worker-0-dn4tm ~]# rpm-ostree status
      State: idle
      Deployments:
      ● ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526
                         Digest: sha256:bb63b13cb9cd0b8c4398f17498f004aff2e7ad770f28c84dc532069ae3a76526
                        Version: 9.6.20250514-0 (2025-05-14T23:44:17Z)    

              zzlotnik@redhat.com Zack Zlotnik
              dbewley@redhat.com Dale Bewley
              None
              None
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa
              None
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Created:
                Updated: