OpenShift Bugs / OCPBUGS-59403

4.21: CPU Pinned not visible in IRQ after pod restart (follow up)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.21
    • premerge
    • Node / CRI-O
    • Quality / Stability / Reliability
    • Status: In Progress
    • Release Note Not Required

      A race condition fix was initially merged upstream: https://github.com/cri-o/cri-o/pull/9228

      However, three more issues remain outstanding:

      i) The prior patch addressing race conditions in this code section was
      incomplete: it used two different locks, one for the irqbalance
      configuration and one for the IRQ SMP affinity files. This still allowed
      a race condition with respect to the irqbalance configuration.
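      A minimal sketch of the single-lock approach, in Go. It assumes the irqbalance configuration and the SMP affinity mask can be modeled as two in-memory bitmasks; `irqState`, `setIsolated`, and `isolateConcurrently` are hypothetical names, not CRI-O APIs, and real code would write the corresponding files while holding the lock. Because one mutex guards both updates, no concurrent caller can interleave between the irqbalance side and the affinity side.

```go
package main

import (
	"fmt"
	"sync"
)

// irqState is a hypothetical in-memory stand-in for the irqbalance
// configuration (CPUs excluded from balancing) and the IRQ SMP affinity
// mask (CPUs allowed to service IRQs). Real code would write files.
type irqState struct {
	mu          sync.Mutex // a single lock guards BOTH fields below
	allCPUs     uint64     // bitmask of online CPUs
	bannedCPUs  uint64     // irqbalance side
	smpAffinity uint64     // SMP affinity side
}

// setIsolated adds cpus to (or removes them from) the banned set and
// recomputes the affinity mask inside the same critical section, so the
// two pieces of state are always updated together.
func (s *irqState) setIsolated(cpus uint64, isolated bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if isolated {
		s.bannedCPUs |= cpus
	} else {
		s.bannedCPUs &^= cpus
	}
	s.smpAffinity = s.allCPUs &^ s.bannedCPUs
}

// consistent reports whether the affinity mask matches the banned set.
func (s *irqState) consistent() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.smpAffinity == s.allCPUs&^s.bannedCPUs
}

// isolateConcurrently simulates several containers starting at the same
// time on an 8-CPU node, each isolating one CPU.
func isolateConcurrently(cpus []int) *irqState {
	s := &irqState{allCPUs: 0xFF, smpAffinity: 0xFF}
	var wg sync.WaitGroup
	for _, cpu := range cpus {
		wg.Add(1)
		go func(cpu int) {
			defer wg.Done()
			s.setIsolated(1<<cpu, true)
		}(cpu)
	}
	wg.Wait()
	return s
}

func main() {
	s := isolateConcurrently([]int{2, 3, 4, 5})
	fmt.Printf("banned=%#x affinity=%#x consistent=%v\n",
		s.bannedCPUs, s.smpAffinity, s.consistent())
	// prints: banned=0x3c affinity=0xc3 consistent=true
}
```

      With two separate mutexes, one goroutine could update the banned set, a second could update both sides, and the first could then write an affinity mask computed from a stale view; holding one lock across both writes rules that interleaving out.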

      ii) systemctl restart irqbalance:

      +       // If the irqbalance service is enabled, restart it and return.
      +       // systemd's StartLimitBurst might cause issues here when container restarts occur in very
      +       // quick succession and the parameter must be reconfigured for this to work correctly.
      +       // See:
      +       // https://github.com/cri-o/cri-o/pull/8834/commits/b96928dcbb7956e0ebde42238e88955831411216
      

      However, PR 8834 is more of a workaround. In addition, when a systemd service is restarted several times in very quick succession, the service effectively ignores subsequent restart requests while it is still restarting. This might be an issue and needs to be investigated (e.g. `for i in {1..3}; do systemctl restart irqbalance & done` yields only a single restart).
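      One possible way to avoid losing a restart, whatever systemd does with back-to-back requests, is to serialize restarts in the caller and guarantee exactly one trailing restart. The Go sketch below is hypothetical: `restartCoalescer`, `Request`, and `simulate` are illustrative names, the restart callback stands in for invoking systemctl, and it assumes the callback returns only after the restart has finished (as a blocking `systemctl restart` without `--no-block` would). It does not reflect how CRI-O currently triggers the restart; it only demonstrates the coalescing logic.

```go
package main

import (
	"fmt"
	"sync"
)

// restartCoalescer is a hypothetical helper (not CRI-O code) that serializes
// restarts of a service. Requests arriving while a restart is in flight are
// folded into exactly one trailing restart.
type restartCoalescer struct {
	mu      sync.Mutex
	running bool   // a restart loop is active
	pending bool   // another restart was requested during the current one
	restart func() // stand-in for invoking "systemctl restart irqbalance"
	wg      sync.WaitGroup
}

// Request asks for a restart without waiting for it to complete.
func (c *restartCoalescer) Request() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.running {
		c.pending = true // coalesce with the in-flight restart
		return
	}
	c.running = true
	c.wg.Add(1)
	go c.loop()
}

// loop runs restarts until no request arrived during the previous one.
func (c *restartCoalescer) loop() {
	defer c.wg.Done()
	for {
		c.restart()
		c.mu.Lock()
		if !c.pending {
			c.running = false
			c.mu.Unlock()
			return
		}
		c.pending = false
		c.mu.Unlock()
	}
}

// Wait blocks until the restart loop has finished.
func (c *restartCoalescer) Wait() { c.wg.Wait() }

// simulate issues one request, then extra more while the first restart is
// still running, and returns how many restarts were actually performed.
func simulate(extra int) int {
	var mu sync.Mutex
	count := 0
	started := make(chan struct{})
	release := make(chan struct{})
	c := &restartCoalescer{restart: func() {
		mu.Lock()
		count++
		n := count
		mu.Unlock()
		if n == 1 {
			close(started) // first restart is now in flight
			<-release      // keep it running until the extra requests are in
		}
	}}
	c.Request()
	<-started
	for i := 0; i < extra; i++ {
		c.Request()
	}
	close(release)
	c.Wait()
	mu.Lock()
	defer mu.Unlock()
	return count
}

func main() {
	fmt.Println(simulate(3)) // prints 2: the first restart plus one trailing restart
}
```

      With this scheme at most one restart is in flight and at most one is pending, so any configuration written before the last request is picked up by a restart that starts after it.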

      iii) The kubelet can actually request CRI-O to start a replacement container before deleting the old one, leading to an invalid IRQ SMP balance state. See private comment below.
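      A minimal sketch of one way to make cleanup tolerant of this ordering: keep a per-CPU reference count of running containers that need IRQs kept off a CPU, and restore IRQ handling on a CPU only when its count drops to zero. This assumes the replacement container is pinned to the same CPUs as the old one; `cpuRefs`, `add`, and `remove` are hypothetical names, not CRI-O code, and the approach is offered for illustration rather than as the fix chosen for this bug.

```go
package main

import "fmt"

// cpuRefs is a hypothetical per-CPU count of running containers that need
// IRQs kept off that CPU.
type cpuRefs map[int]int

// add records that a container pinned to cpus has started.
func (r cpuRefs) add(cpus []int) {
	for _, c := range cpus {
		r[c]++
	}
}

// remove records that a container pinned to cpus was deleted and returns the
// CPUs whose count reached zero, i.e. where IRQ handling may be restored.
func (r cpuRefs) remove(cpus []int) []int {
	var freed []int
	for _, c := range cpus {
		if r[c] == 0 {
			continue // not tracked (e.g. duplicate delete): nothing to release
		}
		r[c]--
		if r[c] == 0 {
			delete(r, c)
			freed = append(freed, c)
		}
	}
	return freed
}

func main() {
	refs := cpuRefs{}
	pinned := []int{2, 3}
	refs.add(pinned) // original container starts
	refs.add(pinned) // replacement starts before the old one is deleted
	fmt.Println("restore after old delete:", refs.remove(pinned))         // prints []
	fmt.Println("restore after replacement delete:", refs.remove(pinned)) // prints [2 3]
}
```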

      This bug addresses i) and iii), as a workaround already exists for ii). However, ii) might still need to be addressed properly.

              Andreas Karis (akaris@redhat.com)
              Andreas Karis (akaris@redhat.com)
              Min Li
              Votes: 0
              Watchers: 7