OpenShift Bugs: OCPBUGS-4218

High-performance IRQ balancing support causes /etc/sysconfig/irqbalance to slowly grow without bound


    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.12.0, 4.11.z, 4.10.z, 4.9.z, 4.8.z
    • Node / CRI-O
    • None
    • Moderate
    • None
    • False

      Description of problem:

      The high-performance hooks can update the irqbalance configuration file to adjust the current banned CPU list, depending on pod annotations. This ensures that guaranteed pods which declare that they don't want IRQ processing on their assigned CPUs actually get that behavior.
      When the update is done in place, which is likely most of the time, a stray '\n' is added on every update. These single characters add up, causing unbounded growth of the configuration file.
      In practical terms, this is likely not too bad, because it would take on the order of thousands of updates before the file size becomes concerning.

      This bug has likely been present since the high-performance hooks were first added.
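
The growth pattern described above can be illustrated with a minimal Go sketch. This is not the CRI-O code: the function names are hypothetical, and the `IRQBALANCE_BANNED_CPUS` key and the quoting of its value are assumptions about the file layout. It shows one way an in-place line replacement can leave an extra '\n' behind on every rewrite, next to a variant that keeps the file size stable.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed key name for the banned CPU mask; not taken from the bug report.
const bannedKey = "IRQBALANCE_BANNED_CPUS"

// updateBuggy (hypothetical) replaces the banned-CPUs line but appends its own
// '\n' to the new line. strings.Join then adds the separator again, so every
// call leaves one more empty line in the content.
func updateBuggy(content, mask string) string {
	lines := strings.Split(content, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, bannedKey+"=") {
			lines[i] = bannedKey + "=\"" + mask + "\"\n"
		}
	}
	return strings.Join(lines, "\n")
}

// updateFixed (hypothetical) replaces the line without a trailing '\n' and
// lets strings.Join restore the separators, so repeated calls are stable.
func updateFixed(content, mask string) string {
	lines := strings.Split(content, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, bannedKey+"=") {
			lines[i] = bannedKey + "=\"" + mask + "\""
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	start := "IRQBALANCE_ONESHOT=0\n" + bannedKey + "=\"0\"\n"
	buggy, fixed := start, start
	for i := 1; i <= 3; i++ {
		buggy = updateBuggy(buggy, "f")
		fixed = updateFixed(fixed, "f")
		fmt.Printf("update %d: buggy=%d bytes, fixed=%d bytes\n", i, len(buggy), len(fixed))
	}
}
```

In this sketch the buggy variant grows by exactly one byte per update when the mask length is unchanged, which matches the slow growth reported here; the actual fix is the one in the linked pull request.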

      Version-Release number of selected component (if applicable):

      $ crio --version
      crio version 1.24.1-11.rhaos4.11.gitb0d2ef3.el8
      Version:          1.24.1-11.rhaos4.11.gitb0d2ef3.el8
      GoVersion:        go1.18.1
      Compiler:         gc
      Platform:         linux/amd64
      Linkmode:         dynamic
      BuildTags:        exclude_graphdriver_devicemapper, containers_image_ostree_stub, seccomp, selinux
      SeccompEnabled:   true
      AppArmorEnabled:  false
      $ oc version
      Client Version: 4.10.9
      Server Version: 4.11.0-0.nightly-2022-07-19-104004
      Kubernetes Version: v1.24.0+9546431

      How reproducible:

      Always

      Steps to Reproduce:

      1. Set up a cluster running OCP 4.11 or later (for practical purposes; the bug has likely been present since IRQ handling was introduced).
      2. Apply a performance profile like this minimal example[1] to deploy the runtime class that enables the feature.
      3. Create and delete pods like the example[2].
       

      Actual results:

      After each pod creation, a '\n' is added to /etc/sysconfig/irqbalance (check with oc debug node/...).
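
To make the growth easier to observe than by eyeballing the file, a small helper along these lines could count the empty lines in the file after each pod creation. It is a sketch only: `countEmptyLines` is a hypothetical name, and the default path is simply the file named in this report.

```go
package main

import (
	"fmt"
	"os"
)

// countEmptyLines returns the number of lines that contain nothing but their
// terminating '\n'. A '\n' at the very start of the content, or one directly
// following another '\n', ends an empty line.
func countEmptyLines(content string) int {
	n := 0
	for i := 0; i < len(content); i++ {
		if content[i] == '\n' && (i == 0 || content[i-1] == '\n') {
			n++
		}
	}
	return n
}

func main() {
	path := "/etc/sysconfig/irqbalance"
	if len(os.Args) > 1 {
		path = os.Args[1] // allow checking a copy of the file instead
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read file:", err)
		return
	}
	fmt.Printf("%s: %d bytes, %d empty lines\n", path, len(data), countEmptyLines(string(data)))
}
```

Running it before and after a pod creation should show the empty-line count increasing on an affected node, assuming the stray characters end up as empty lines as in the description.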

      Expected results:

      The file size can legitimately change (and grow) depending on the actual CPU ban list, but rewrites should not add an unbounded number of '\n' characters to the file.

      Additional info:

      This has already been fixed in 4.12. I am opening this bug to allow backports to earlier releases.
      
      See: https://github.com/cri-o/cri-o/issues/6086
      Patch: https://github.com/cri-o/cri-o/pull/6087
      
      

       

            pehunt@redhat.com Peter Hunt
            msivak@redhat.com Martin Sivak
            Sunil Choudhary
            Francesco Romani