  1. OpenShift Bugs
  2. OCPBUGS-58023

machine-config-daemon is killed while updating the systemd units on the node, leaving the systemd units disabled after reboot.


    • Quality / Stability / Reliability
    • False
    • 3
    • Important
    • In Progress
    • Release Note Not Required

      Description

      While updating the node, more specifically while updating the systemd units, the machine-config-daemon receives a SIGTERM, removes its SIGTERM protection, and terminates.
      After the node has rebooted, the systemd units are still disabled, which leaves the node in an inconsistent state.
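One possible reading of this failure, offered as an assumption rather than a confirmed root cause: the SIGTERM also reaches the child command that disables the unit, so the disable step is interrupted and the unit is never re-enabled. The Degraded message in the daemon log for this issue contains the text "signal: terminated", which is the error string Go's os/exec package reports when a child process is killed by SIGTERM. The minimal sketch below reproduces that string on Linux. The `sleep` child and the `terminateChild` helper are hypothetical stand-ins for illustration and are not taken from the machine-config-daemon source.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// terminateChild starts a long-running child process (a hypothetical stand-in
// for the command that disables a systemd unit), sends it SIGTERM, and
// returns the error reported by Wait.
func terminateChild() error {
	cmd := exec.Command("sleep", "30")
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("starting child: %w", err)
	}
	// Deliver SIGTERM directly to the child, simulating a termination
	// signal that arrives while the command is still running.
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return fmt.Errorf("signalling child: %w", err)
	}
	// Wait returns an *exec.ExitError describing how the child ended.
	return cmd.Wait()
}

func main() {
	fmt.Println(terminateChild())
}
```

If this reading is correct, protecting only the parent process from SIGTERM would not be enough, because the child command can still be terminated halfway through the disable and re-enable sequence.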

      Environment

      RHOCP 4.17.33, Single node cluster + 3 workers
      Customer is using a custom image (osImageURL).

      Logs

      Based on the information shared by the customer so far, the issue occurs randomly on different clusters while the machine-config-daemon is updating the node.

      machine-config-daemon log
      2025-06-18T20:53:30.623044384+00:00 stderr F I0618 20:53:30.622990  160840 file_writers.go:294] Writing systemd unit "restart-host.timer"
      2025-06-18T20:53:30.745661116+00:00 stderr F I0618 20:53:30.745619  160840 file_writers.go:307] Disabling systemd unit restart-host.timer before re-writing it
      2025-06-18T20:53:34.230978037+00:00 stderr F I0618 20:53:34.230670  160840 file_writers.go:294] Writing systemd unit "tpm-lockout.service"
      2025-06-18T20:53:34.298968775+00:00 stderr F I0618 20:53:34.298886  160840 file_writers.go:307] Disabling systemd unit tpm-lockout.service before re-writing it
      2025-06-18T20:53:37.538554080+00:00 stderr F I0618 20:53:37.538502  160840 file_writers.go:294] Writing systemd unit "enable-usbguard.service"
      2025-06-18T20:53:37.776100348+00:00 stderr F I0618 20:53:37.776046  160840 file_writers.go:307] Disabling systemd unit enable-usbguard.service before re-writing it
      2025-06-18T20:53:38.370326227+00:00 stderr F I0618 20:53:38.370262  160840 daemon.go:1323] Got SIGTERM, but actively updating
      2025-06-18T20:53:38.414833961+00:00 stderr F I0618 20:53:38.414769  160840 update.go:2689] Removing SIGTERM protection
      2025-06-18T20:53:38.414833961+00:00 stderr F E0618 20:53:38.414821  160840 writer.go:226] Marking Degraded due to: "daemon could not write systemd unit: disabling enable-usbguard.service failed: signal: terminated (output: )"
      2025-06-18T20:53:39.640953621+00:00 stderr F W0618 20:53:39.640907  160840 daemon.go:1398] Got an error from auxiliary tools: kubelet health check has failed 1 times: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused
      

              rh-ee-rsaini Rishabh Saini
              rhn-support-vlours Vincent Lours
              None
              None
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa
              None