  1. RHEL
  2. RHEL-46560

Tuned reports an ERROR when the affinity of defunct process cannot be adjusted

    • tuned-2.24.0-0.1.rc1.el9
    • No
    • High
    • sst_cs_infra_services
    • ssg_core_services
    • 23
    • 3
    • False

      What were you trying to do that didn't work?

      Applying a cpu-partitioning tuned profile sometimes reports the following error:

      tuned.plugins.plugin_scheduler: Failed to set affinity of PID 24360 to '[0, 1, 32, 33]': [Errno 22] Invalid argument

      This can be correlated with a log message from crio for the same PID:

      Jun 28 12:49:57 master1.winterfell-mno-2.lab.neat.nsn-rdnet.net crio[4103]: time="2024-06-28 12:49:57.837486513Z" level=warning msg="Found defunct process with PID 24360 (haagent)"

      The failure is real and should be logged. However, the ERROR severity causes trouble for system health monitoring services, because ERROR is typically treated as an unrecoverable error worth investigating.

      The impact on the customer side is high: their cluster monitoring is always red due to this issue, even though the workload is not affected.

      Please provide the package NVR for which the bug is seen:

      tuned as shipped in NTO for OCP 4.14.31

      How reproducible:

      Always after reboot on the customer cluster.

      Expected results

      Tuned should be smart enough to check whether the process (or interrupt) is actually present and running after getting this failure. The operating system is dynamic and such race conditions can happen (for example, a process disappearing quickly). Tuned should ignore transient failures that it can recognize as such.
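
To illustrate the requested behavior, here is a minimal sketch, not tuned's actual code. It retries nothing and changes no policy; it only re-checks the process state in /proc after a failed affinity call and picks the log severity accordingly. The function names (`parse_state`, `read_state`, `failure_level`, `set_affinity`) are hypothetical, and logging the transient case at WARNING rather than some lower level is an assumption.

```python
import logging
import os

log = logging.getLogger("affinity_sketch")  # hypothetical logger name


def parse_state(stat_text):
    """Extract the one-letter process state from the text of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so the state is the first field after the last ')'.
    fields = stat_text[stat_text.rfind(")") + 1:].split()
    return fields[0] if fields else None


def read_state(pid, proc_root="/proc"):
    """Return the process state letter, or None if the process is gone."""
    try:
        with open(os.path.join(proc_root, str(pid), "stat")) as f:
            return parse_state(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def failure_level(state):
    """Choose a log level for a failed affinity change (assumed policy)."""
    # None: the process vanished; 'Z': zombie (defunct); 'X': dead.
    # These are treated as transient and kept below ERROR.
    if state is None or state in ("Z", "X"):
        return logging.WARNING
    return logging.ERROR


def set_affinity(pid, cpus, setter=None, state_reader=read_state):
    """Try to set CPU affinity; on failure, log with a state-aware severity."""
    if setter is None:
        setter = os.sched_setaffinity  # available on Linux only
    try:
        setter(pid, cpus)
        return True
    except OSError as e:
        state = state_reader(pid)
        log.log(
            failure_level(state),
            "Failed to set affinity of PID %d to '%s': %s (process state: %s)",
            pid, list(cpus), e, state,
        )
        return False
```

With this shape, a call such as `set_affinity(24360, [0, 1, 32, 33])` against a defunct process would still be logged, but below ERROR, so health monitoring that alerts on ERROR would stay quiet. A failure on a live process would keep the ERROR severity.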

            pzacik@redhat.com Pavol Zacik
            msivak@redhat.com Martin Sivak
            Jaroslav Skarvada Jaroslav Skarvada
            Robin Hack Robin Hack