OpenShift Bugs: OCPBUGS-43765

MachineConfigs should not have Restart=on-failure for oneshot systemd units

    • Important
    • None
    • MCO Sprint 261, MCO Sprint 262, MCO Sprint 264, MCO Sprint 265
    • 4
    • False
    • None
    • * Previously, {op-system-base-full} CoreOS templates that were shipped by the Machine Config Operator (MCO) caused node scaling to fail on {rh-openstack-first}. This issue occurred because of a problem with `systemd` combined with the presence of a legacy boot image from older versions of {product-title}. With this release, a patch fixes the issue with `systemd` and removes the legacy boot image, so that node scaling can continue as expected. (link:https://issues.redhat.com/browse/OCPBUGS-43765[*OCPBUGS-43765*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-42577. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-42324. The following is the description of the original issue:

      Description of problem:

      This is a spinoff of https://issues.redhat.com/browse/OCPBUGS-38012. For additional context please see that bug.
      
      The TL;DR is that `Restart=on-failure` for oneshot units is only supported in systemd v244 and later. Boot images for 4.12 and earlier therefore do not support this setting on first boot, and upgraded clusters can no longer scale nodes if any MachineConfig references such a service.
      
      Currently the only affected unit is https://github.com/openshift/machine-config-operator/blob/master/templates/common/openstack/units/afterburn-hostname.service.yaml#L16-L24, which is not covered by https://issues.redhat.com/browse/OCPBUGS-38012.
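To make the failure condition concrete, here is a minimal Python sketch of a check for the pattern described above. It looks for a `Type=oneshot` unit whose `[Service]` section sets `Restart=` to anything other than `no`, and evaluates that unit against a given systemd version. The v244 threshold comes from this report. The parser, the function names (`unit_settings`, `breaks_on_old_systemd`), and `EXAMPLE_UNIT` are hypothetical and deliberately simplified: the parser ignores drop-ins, line continuations, and systemd's default `Type=` rules. The example unit is not the actual content of `afterburn-hostname.service`.

```python
# Hypothetical checker: flags oneshot units that set Restart= on systemd < 244.
# Simplified parser; not how systemd itself parses unit files.

# Threshold taken from this bug report.
ONESHOT_RESTART_MIN_SYSTEMD = 244

# Hypothetical unit, for illustration only.
EXAMPLE_UNIT = """\
[Unit]
Description=Hypothetical oneshot unit

[Service]
Type=oneshot
ExecStart=/usr/bin/true
Restart=on-failure
"""


def unit_settings(unit_text: str) -> dict:
    """Collect Key=Value pairs from the [Service] section; later assignments win."""
    settings = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def breaks_on_old_systemd(unit_text: str, systemd_version: int) -> bool:
    """True if the unit is oneshot, sets a Restart= policy, and systemd is too old."""
    svc = unit_settings(unit_text)
    is_oneshot = svc.get("Type") == "oneshot"
    # An empty "Restart=" resets the setting to its default ("no").
    restarts = svc.get("Restart", "no") not in ("", "no")
    return is_oneshot and restarts and systemd_version < ONESHOT_RESTART_MIN_SYSTEMD


if __name__ == "__main__":
    print(breaks_on_old_systemd(EXAMPLE_UNIT, 239))  # older systemd, chosen as an example
    print(breaks_on_old_systemd(EXAMPLE_UNIT, 244))
```

Letting later assignments overwrite earlier ones roughly matches how systemd treats repeated single-value settings. A real audit would also need to consider drop-in files.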

      Version-Release number of selected component (if applicable):

      4.16 right now

      How reproducible:

      Uncertain, but https://issues.redhat.com/browse/OCPBUGS-38012 reproduces 100% of the time.

      Steps to Reproduce:

          1. Install an older OpenStack cluster.
          2. Upgrade to 4.16.
          3. Attempt to scale up a node.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          


            Errata Tool added a comment:

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.16.32 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2025:0650

            Sergio Regidor de la Rosa added a comment:

            Verified using IPI on OSP version 4.16.0-0.nightly-2025-01-12-214227

            1. Scale up a node using a 4.12 image.
            We can see in the journal:

            Tue 2025-01-14 10:20:17 UTC localhost.localdomain machine-config-daemon-firstboot.service[2207]: I0114 10:20:17.497663    2230 daemon.go:320] Booted osImageURL:  (412.86.202402272018-0) d46dacdf4ad6aaf1d2fc9fa501f20526d2644f370aa3dd09f4e0286d46fa7b0

            The node is created without problems. We don't see any message complaining about RestartMode:

            sh-5.1# journalctl -o with-unit | grep RestartMode

            2. Scale up a node using a 4.16 image.

            The node was created without problems.

            We can move the status to VERIFIED.
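The first verification step is only meaningful if the new node actually booted the old 4.12 boot image. The sketch below extracts that release from the journal line in the verification comment. It assumes, and this report does not state it, that RHCOS build strings follow the pattern `<OCP major><OCP minor>.<RHEL major><RHEL minor>.<timestamp>-<n>`. The regex and the function name `boot_image_release` are hypothetical.

```python
# Hypothetical helper: read the boot image release from a "Booted osImageURL" journal line.
# ASSUMPTION: build strings look like 412.86.202402272018-0 (OCP 4.12, RHEL 8.6).
import re
from typing import Optional, Tuple

# Journal line copied from the verification comment.
LOG_LINE = (
    "machine-config-daemon-firstboot.service[2207]: I0114 10:20:17.497663    2230 "
    "daemon.go:320] Booted osImageURL:  (412.86.202402272018-0) "
    "d46dacdf4ad6aaf1d2fc9fa501f20526d2644f370aa3dd09f4e0286d46fa7b0"
)

# Groups: OCP major, OCP minor, RHEL major, RHEL minor, timestamp, build counter.
BUILD_RE = re.compile(r"\((\d)(\d+)\.(\d)(\d+)\.(\d+)-(\d+)\)")


def boot_image_release(line: str) -> Optional[Tuple[str, str]]:
    """Return (ocp_release, rhel_release) parsed from the build string, or None."""
    m = BUILD_RE.search(line)
    if m is None:
        return None
    ocp = f"{m.group(1)}.{m.group(2)}"
    rhel = f"{m.group(3)}.{m.group(4)}"
    return ocp, rhel


if __name__ == "__main__":
    print(boot_image_release(LOG_LINE))
```

If the assumed naming scheme holds, the log line above confirms the node booted a 4.12-era image. That is the image whose systemd predates v244 according to the problem description.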

              Yu Qi Zhang (jerzhang@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Sergio Regidor de la Rosa
              Votes: 0
              Watchers: 4
