  1. Satellite
  2. SAT-37179

Tracer Resolve API Request Can Reboot dbus-broker After firewalld Causing firewalld service To Be In Failed State


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • katello-tracer
    • False
    • sat-endeavour

      Description of problem:

      When the tracer API endpoint, api/hosts/[ID]/traces/resolve, is used to restart services on a RHEL9 content host, the dbus-broker service can be restarted after the firewalld service, which leaves firewalld in an inactive state.

      The issue was noticed and reproduced on RHEL9, but other versions of RHEL are likely affected as well.

      As a side note, we can reproduce this issue easily on a system by restarting dbus-broker right after firewalld:

      systemctl restart firewalld; systemctl restart dbus-broker; sleep 10; systemctl status firewalld
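
When repeating the one-liner above several times, a small helper that prints a single pass/fail line can be easier to read than the full status output. This is only a sketch: the function name unit_survived and the SYSTEMCTL override are hypothetical and not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: report whether a unit is still active after a restart sequence.
# SYSTEMCTL can be overridden for dry runs; it defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

unit_survived() {
    # `systemctl is-active --quiet UNIT` exits 0 only when the unit is active.
    if "$SYSTEMCTL" is-active --quiet "$1"; then
        echo "$1: active"
    else
        echo "$1: NOT active"
        return 1
    fi
}
```

For example: systemctl restart firewalld; systemctl restart dbus-broker; sleep 10; unit_survived firewalld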
      

      API Request to List Services Needing a Restart:

      curl -u admin:redhat -k -H "Content-type: application/json" -X GET https://satellite.example.com/api/hosts/10/traces | python -m json.tool | grep "\"id\":\|restart_command" | grep "systemctl" -B1
      Updating Subscription Management repositories.
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100  3229  100  3229    0     0   1078      0  0:00:02  0:00:02 --:--:--  1078
                  "id": 140,
                  "restart_command": "systemctl restart auditd",
                  "id": 136,
                  "restart_command": "systemctl restart chronyd",
                  "id": 139,
                  "restart_command": "systemctl restart dbus-broker",
                  "id": 134,
                  "restart_command": "systemctl restart firewalld",
                  "id": 138,
                  "restart_command": "systemctl restart irqbalance",
                  "id": 144,
                  "restart_command": "systemctl restart NetworkManager",
                  "id": 142,
                  "restart_command": "systemctl restart qemu-guest-agent",
                  "id": 145,
                  "restart_command": "systemctl restart rsyslog",
                  "id": 135,
                  "restart_command": "systemctl restart sshd",
                  "id": 133,
                  "restart_command": "systemctl restart systemd-logind",
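
The filtered output above alternates "id" and "restart_command" lines, so picking the right trace ID for a given service by eye is error-prone. The following is a minimal sketch that pairs them up; the function name pair_traces is hypothetical, and it assumes the input has exactly the shape shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads alternating `"id": N,` and
# `"restart_command": "systemctl restart UNIT",` lines on stdin
# and prints one "ID UNIT" pair per line.
pair_traces() {
    awk '
        /"id":/ {
            id = $0
            gsub(/[^0-9]/, "", id)      # keep only the digits of the ID
            next
        }
        /"restart_command":/ {
            unit = $NF                  # last field is the unit name plus quote and comma
            gsub(/[",]/, "", unit)
            print id, unit
        }
    '
}
```

Appending | pair_traces to the pipeline above would print lines such as "139 dbus-broker" and "134 firewalld".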
      

      API Request to Restart Services:

      curl -u admin:redhat -k -H "Content-type: application/json" -X PUT -d '{"trace_ids":["136","139","134","138","144","142","145","135","133"]}' https://satellite.example.com/api/hosts/10/traces/resolve
      

      Firewalld Inactive State:

      systemctl status firewalld
      ○ firewalld.service - firewalld - dynamic firewall daemon
           Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; preset: enabled)
           Active: inactive (dead) since Wed 2025-08-20 09:52:55 MDT; 2min 18s ago
         Duration: 138ms
             Docs: man:firewalld(1)
          Process: 32391 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=killed, signal=TERM)
         Main PID: 32391 (code=killed, signal=TERM)
              CPU: 229ms
      

      Dynflow Console Showing Order of Services Being Restarted:

      script: |-
        RETVAL=0
        systemctl restart systemd-logind
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart firewalld
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart sshd
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart chronyd
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart irqbalance
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart dbus-broker
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart qemu-guest-agent
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart NetworkManager
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
        systemctl restart rsyslog
        if [ $? -ne 0 ]; then
          RETVAL=1
        fi
      

      How reproducible:
      The issue does not reproduce consistently, which suggests that the traces/resolve API call restarts the services in a random order. I had to take a live snapshot of my content host and repeat the call 2-3 times to reproduce it.
       

      Is this issue a regression from an earlier version:
      No, reproduced on 6.16 and 6.17.
       

      Steps to Reproduce:

      1. The issue is not easy to reproduce. I could not reproduce it during typical upgrades between minor releases (I tested going from 9.0 => 9.1 .. 9.6), even when skipping releases.

      2. I had to update some packages to one particular version, and then update them again to a later version, to mimic a real-life scenario where you update in the middle of the same minor release.

      3. The Satellite also needs a content view whose packages are restricted by different date filters, so that the host updates to the specific package versions. Otherwise, dnf dependency resolution could update the packages to the wrong version, and firewalld and dbus-broker would not both be reported by tracer as needing a restart.

      4. The exact steps to reproduce are in the additional info section, as they require a lot of text.

      Actual behavior:
      With the traces/resolve API call, the dbus-broker service can be restarted after the firewalld service, which leaves the firewalld service in an inactive state.

      Expected behavior:
      dbus-broker should be restarted first, so that its restart does not interfere with other service restarts.
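
The expected ordering can be illustrated with a small filter over the list of restart commands. This is a sketch of the idea only, not how Katello builds the remote execution script; the function name dbus_first is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads restart commands (or unit names), one per line,
# and prints any dbus-broker line first, then the rest in their original order.
dbus_first() {
    input=$(cat)
    # Match "dbus-broker" or "dbus-broker.service" as the last word on the line.
    printf '%s\n' "$input" | grep -E  '(^| )dbus-broker(\.service)?$' || true
    printf '%s\n' "$input" | grep -Ev '(^| )dbus-broker(\.service)?$' || true
}
```

Fed the commands from the Dynflow script above, it would emit "systemctl restart dbus-broker" first, followed by systemd-logind, firewalld, sshd, and the others in the order they were given.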

      Business Impact / Additional info:

      To reproduce on RHEL9 I started with a fresh 9.0 host.

      My Satellite had synced content for RHEL9 AppStream and BaseOS on June 6th:

      Repo-id            : rhel-9-for-x86_64-appstream-rpms
      Repo-name          : Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
      Repo-revision      : 1749195033
      Repo-updated       : Fri 06 Jun 2025 01:30:33 AM MDT
      
      Repo-id            : rhel-9-for-x86_64-baseos-rpms
      Repo-name          : Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
      Repo-revision      : 1749194564
      Repo-updated       : Fri 06 Jun 2025 01:22:44 AM MDT
      

      I then ran dnf upgrade on my RHEL 9.0 content host for these packages:

      dnf upgrade glibc-2.34-168.el9_6.20.x86_64 glibc-common-2.34-168.el9_6.20.x86_64 glibc-gconv-extra-2.34-168.el9_6.20.x86_64 glibc-langpack-en-2.34-168.el9_6.20.x86_64 kernel-tools-5.14.0-570.24.1.el9_6.x86_64 kernel-tools-libs-5.14.0-570.24.1.el9_6.x86_64 python-unversioned-command-3.9.21-2.el9_6.1.noarch python3-3.9.21-2.el9_6.1.x86_64 python3-libs-3.9.21-2.el9_6.1.x86_64 sudo-1.9.5p2-10.el9_6.1.x86_64 NetworkManager-1:1.52.0-4.el9_6.x86_64 NetworkManager-libnm-1:1.52.0-4.el9_6.x86_64 NetworkManager-team-1:1.52.0-4.el9_6.x86_64 NetworkManager-tui-1:1.52.0-4.el9_6.x86_64 bind-libs-32:9.16.23-29.el9_6.x86_64 bind-license-32:9.16.23-29.el9_6.noarch bind-utils-32:9.16.23-29.el9_6.x86_64 device-mapper-multipath-0.8.7-35.el9_6.1.x86_64 device-mapper-multipath-libs-0.8.7-35.el9_6.1.x86_64 elfutils-debuginfod-client-0.192-6.el9_6.x86_64 elfutils-default-yama-scope-0.192-6.el9_6.noarch elfutils-libelf-0.192-6.el9_6.x86_64 elfutils-libs-0.192-6.el9_6.x86_64 iwl100-firmware-39.31.5.1-151.1.el9_6.noarch iwl1000-firmware-1:39.31.5.1-151.1.el9_6.noarch iwl105-firmware-18.168.6.1-151.1.el9_6.noarch iwl135-firmware-18.168.6.1-151.1.el9_6.noarch iwl2000-firmware-18.168.6.1-151.1.el9_6.noarch iwl2030-firmware-18.168.6.1-151.1.el9_6.noarch iwl3160-firmware-1:25.30.13.0-151.1.el9_6.noarch iwl5000-firmware-8.83.5.1_1-151.1.el9_6.noarch iwl5150-firmware-8.24.2.2-151.1.el9_6.noarch iwl6000g2a-firmware-18.168.6.1-151.1.el9_6.noarch iwl6050-firmware-41.28.5.1-151.1.el9_6.noarch iwl7260-firmware-1:25.30.13.0-151.1.el9_6.noarch kpartx-0.8.7-35.el9_6.1.x86_64 libdb-5.3.28-57.el9_6.x86_64 libdnf-plugin-subscription-manager-1.29.45.1-1.el9_6.x86_64 linux-firmware-20250513-151.1.el9_6.noarch linux-firmware-whence-20250513-151.1.el9_6.noarch microcode_ctl-4:20250211-1.20250512.1.el9_6.noarch python3-cloud-what-1.29.45.1-1.el9_6.x86_64 python3-subscription-manager-rhsm-1.29.45.1-1.el9_6.x86_64 rhc-1:0.2.7-1.el9_6.x86_64 subscription-manager-1.29.45.1-1.el9_6.x86_64 
systemd-252-51.el9_6.1.x86_64 systemd-libs-252-51.el9_6.1.x86_64 systemd-pam-252-51.el9_6.1.x86_64 systemd-rpm-macros-252-51.el9_6.1.noarch systemd-udev-252-51.el9_6.1.x86_64 NetworkManager-tui-1:1.52.0-3.el9_6.x86_64 microcode_ctl-4:20250211-1.el9_6.noarch linux-firmware-20250415-146.5.el9_5.noarch iwl7260-firmware-1:25.30.13.0-151.el9_6.noarch iwl6050-firmware-41.28.5.1-151.el9_6.noarch iwl6000g2a-firmware-18.168.6.1-151.el9_6.noarch iwl5150-firmware-8.24.2.2-151.el9_6.noarch iwl5000-firmware-8.83.5.1_1-151.el9_6.noarch iwl3160-firmware-1:25.30.13.0-151.el9_6.noarch iwl2030-firmware-18.168.6.1-151.el9_6.noarch iwl2000-firmware-18.168.6.1-151.el9_6.noarch iwl135-firmware-18.168.6.1-151.el9_6.noarch iwl105-firmware-18.168.6.1-151.el9_6.noarch iwl1000-firmware-1:39.31.5.1-151.el9_6.noarch iwl100-firmware-39.31.5.1-151.el9_6.noarch device-mapper-multipath-0.8.7-35.el9.x86_64 elfutils-debuginfod-client-0.192-5.el9.x86_64 elfutils-libs-0.192-5.el9.x86_64 bind-utils-32:9.16.23-28.el9_6.x86_64 device-mapper-multipath-libs-0.8.7-35.el9.x86_64 elfutils-default-yama-scope-0.192-5.el9.noarch bind-libs-32:9.16.23-28.el9_6.x86_64 rhc-1:0.2.6-3.el9_6.x86_64 subscription-manager-1.29.45-1.el9.x86_64 python3-subscription-manager-rhsm-1.29.45-1.el9.x86_64 NetworkManager-team-1:1.52.0-3.el9_6.x86_64 python3-cloud-what-1.29.45-1.el9.x86_64 bind-license-32:9.16.23-28.el9_6.noarch linux-firmware-whence-20250415-146.5.el9_5.noarch NetworkManager-1:1.52.0-3.el9_6.x86_64 systemd-udev-252-51.el9.x86_64 systemd-252-51.el9.x86_64 NetworkManager-libnm-1:1.52.0-3.el9_6.x86_64 systemd-rpm-macros-252-51.el9.noarch systemd-libs-252-51.el9.x86_64 systemd-pam-252-51.el9.x86_64 libdnf-plugin-subscription-manager-1.29.45-1.el9.x86_64 elfutils-libelf-0.192-5.el9.x86_64 kpartx-0.8.7-35.el9.x86_64 libdb-5.3.28-55.el9.x86_64 sudo-1.9.5p2-10.el9_3.x86_64 kernel-tools-5.14.0-570.23.1.el9_6.x86_64 python3-3.9.21-2.el9.x86_64 python3-libs-3.9.21-2.el9.x86_64 kernel-tools-libs-5.14.0-570.23.1.el9_6.x86_64 
python-unversioned-command-3.9.21-2.el9.noarch glibc-2.34-168.el9_6.19.x86_64 glibc-langpack-en-2.34-168.el9_6.19.x86_64 glibc-gconv-extra-2.34-168.el9_6.19.x86_64 glibc-common-2.34-168.el9_6.19.x86_64
      

      Then I synced the RHEL9 AppStream and BaseOS repositories, so the repository metadata showed this date:

      Repo-id            : rhel-9-for-x86_64-appstream-rpms
      Repo-name          : Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
      Repo-revision      : 1755195796
      Repo-updated       : Thu 14 Aug 2025 12:23:16 PM MDT
      
      Repo-id            : rhel-9-for-x86_64-baseos-rpms
      Repo-name          : Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
      Repo-revision      : 1755195567
      Repo-updated       : Thu 14 Aug 2025 12:19:25 PM MDT
      

      Next, I ran dnf update with these packages (I ran update rather than upgrade this time; I am not sure whether it performs a full update or updates only the specific packages I listed):

      dnf update linux-firmware-whence-20250513-151.1.el9_6.noarch systemd-libs-252-51.el9_6.1.x86_64 NetworkManager-libnm-1:1.52.0-4.el9_6.x86_64 elfutils-libelf-0.192-6.el9_6.x86_64 python3-cloud-what-1.29.45.1-1.el9_6.x86_64 python3-subscription-manager-rhsm-1.29.45.1-1.el9_6.x86_64 device-mapper-multipath-libs-0.8.7-35.el9_6.1.x86_64 bind-license-32:9.16.23-29.el9_6.noarch bind-libs-32:9.16.23-29.el9_6.x86_64 systemd-rpm-macros-252-51.el9_6.1.noarch systemd-pam-252-51.el9_6.1.x86_64 systemd-252-51.el9_6.1.x86_64 systemd-udev-252-51.el9_6.1.x86_64 NetworkManager-1:1.52.0-4.el9_6.x86_64 elfutils-default-yama-scope-0.192-6.el9_6.noarch elfutils-libs-0.192-6.el9_6.x86_64 kpartx-0.8.7-35.el9_6.1.x86_64 libdnf-plugin-subscription-manager-1.29.45.1-1.el9_6.x86_64 subscription-manager-1.29.45.1-1.el9_6.x86_64 rhc-1:0.2.7-1.el9_6.x86_64 device-mapper-multipath-0.8.7-35.el9_6.1.x86_64 elfutils-debuginfod-client-0.192-6.el9_6.x86_64 NetworkManager-tui-1:1.52.0-4.el9_6.x86_64 NetworkManager-team-1:1.52.0-4.el9_6.x86_64 microcode_ctl-4:20250211-1.20250512.1.el9_6.noarch bind-utils-32:9.16.23-29.el9_6.x86_64 iwl100-firmware-39.31.5.1-151.1.el9_6.noarch iwl1000-firmware-1:39.31.5.1-151.1.el9_6.noarch iwl105-firmware-18.168.6.1-151.1.el9_6.noarch iwl135-firmware-18.168.6.1-151.1.el9_6.noarch iwl2000-firmware-18.168.6.1-151.1.el9_6.noarch iwl2030-firmware-18.168.6.1-151.1.el9_6.noarch iwl3160-firmware-1:25.30.13.0-151.1.el9_6.noarch iwl5000-firmware-8.83.5.1_1-151.1.el9_6.noarch iwl5150-firmware-8.24.2.2-151.1.el9_6.noarch iwl6000g2a-firmware-18.168.6.1-151.1.el9_6.noarch iwl6050-firmware-41.28.5.1-151.1.el9_6.noarch iwl7260-firmware-1:25.30.13.0-151.1.el9_6.noarch linux-firmware-20250513-151.1.el9_6.noarch libdb-5.3.28-57.el9_6.x86_64 glibc-common-2.34-168.el9_6.20.x86_64 glibc-gconv-extra-2.34-168.el9_6.20.x86_64 glibc-langpack-en-2.34-168.el9_6.20.x86_64 glibc-2.34-168.el9_6.20.x86_64 python-unversioned-command-3.9.21-2.el9_6.1.noarch python3-3.9.21-2.el9_6.1.x86_64 
python3-libs-3.9.21-2.el9_6.1.x86_64 kernel-tools-libs-5.14.0-570.24.1.el9_6.x86_64 kernel-tools-5.14.0-570.24.1.el9_6.x86_64 sudo-1.9.5p2-10.el9_6.1.x86_64
      

      Then I ran the traces api endpoint to list services and ids to be restarted:

      curl -u admin:redhat -k -H "Content-type: application/json" -X GET https://satellite.example.com/api/hosts/10/traces | python -m json.tool | grep "\"id\":\|restart_command" | grep "systemctl" -B1
      

      And finally, I restarted the services with the traces/resolve api call:

      curl -u admin:redhat -k -H "Content-type: application/json" -X PUT -d '{"trace_ids":["136","139","134","138","144","142","145","135","133"]}' https://satellite.example.com/api/hosts/10/traces/resolve
      

      Note that the services are not always restarted in the same order. Before running the last API call, I took a live snapshot of my content host so I could keep testing until the issue reproduced. It took 2-3 attempts.
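
A possible interim workaround, which is an assumption and was not tested in this report, is to resolve the dbus-broker trace in its own request and to send the remaining trace IDs only after that job has finished. The sketch below builds the request body in the same format as the call above; the function name trace_payload is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: builds {"trace_ids":["136","139"]} from the IDs given as arguments.
trace_payload() {
    ids=""
    for id in "$@"; do
        # Append a comma before every ID except the first one.
        ids="${ids:+$ids,}\"$id\""
    done
    printf '{"trace_ids":[%s]}\n' "$ids"
}

# Example usage (not executed here), with the IDs from this report:
#   curl -u admin:redhat -k -H "Content-type: application/json" -X PUT \
#     -d "$(trace_payload 139)" https://satellite.example.com/api/hosts/10/traces/resolve
#   # ...wait for the first job to finish, then:
#   curl -u admin:redhat -k -H "Content-type: application/json" -X PUT \
#     -d "$(trace_payload 136 134 138 144 142 145 135 133)" https://satellite.example.com/api/hosts/10/traces/resolve
```

Whether two separate requests are enough to guarantee the order on the host depends on how the resulting jobs are scheduled, so this should be verified before relying on it.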

              Assignee: Unassigned
              Reporter: Michael Yoder (rhn-support-myoder)