Bug
Resolution: Unresolved
Normal
premerge
Quality / Stability / Reliability
False
In Progress
Release Note Not Required
A fix for the race condition was initially merged upstream in https://github.com/cri-o/cri-o/pull/9228.
However, three issues remain outstanding:
i) The prior patch addressing race conditions in this code section was
incomplete, because it used two different locks for the irqbalance configuration
and the IRQ SMP affinity files. This still allowed a race condition with respect
to the irqbalance configuration.
ii) Restarting irqbalance via `systemctl restart irqbalance`. The relevant code comment reads:
+    // If the irqbalance service is enabled, restart it and return.
+    // systemd's StartLimitBurst might cause issues here when container restarts occur in very
+    // quick succession and the parameter must be reconfigured for this to work correctly.
+    // See:
+    // https://github.com/cri-o/cri-o/pull/8834/commits/b96928dcbb7956e0ebde42238e88955831411216
The problem is that PR 8834 is more of a workaround than a fix. In addition, when a systemd service is restarted in very quick succession, the service ignores subsequent restart requests while it is still restarting. This could be an issue and remains to be investigated (e.g. `for i in {1..3}; do systemctl restart irqbalance & done` yields only a single restart).
iii) kubelet can actually request CRI-O to start a replacement container before deleting the old one, which leads to an invalid IRQ SMP balancing state. See the private comment below.
This bug addresses i) and iii), because a workaround for ii) already exists. However, ii) might still need a proper fix.
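Issue i) comes down to two related writes being protected by separate locks. One plausible interleaving: one goroutine updates the affinity files, a second goroutine then updates both files, and the first goroutine finally writes an irqbalance setting derived from its now-stale view. The following is a minimal Go sketch of the single-lock approach. It assumes the irqbalance banned-CPU mask is derived from the SMP affinity mask, as its inverse over all CPUs. The `irqState` type, its fields, `setIRQLoadBalancing` and the 8-CPU mask are hypothetical, and this is not CRI-O's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// allCPUs is a hypothetical 8-CPU host mask used only for this example.
const allCPUs uint64 = 0xFF

// irqState is a hypothetical in-memory stand-in for two host files: the IRQ
// SMP affinity mask and the irqbalance banned-CPU setting. Real code would
// read and write files instead of struct fields.
type irqState struct {
	mu          sync.Mutex // ONE lock guards both values below
	smpAffinity uint64     // CPUs allowed to service IRQs
	bannedCPUs  uint64     // CPUs irqbalance must avoid
}

// setIRQLoadBalancing updates the affinity mask and derives the irqbalance
// setting from it inside the same critical section, so a concurrent caller
// cannot write a banned mask computed from a stale affinity value.
func (s *irqState) setIRQLoadBalancing(cpus uint64, enable bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if enable {
		s.smpAffinity |= cpus
	} else {
		s.smpAffinity &^= cpus
	}
	s.bannedCPUs = ^s.smpAffinity & allCPUs
}

// consistent reports whether the two values still agree with each other.
func (s *irqState) consistent() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.bannedCPUs == ^s.smpAffinity&allCPUs
}

func main() {
	s := &irqState{smpAffinity: allCPUs}
	var wg sync.WaitGroup
	// Eight "containers", each pinned to one CPU, start concurrently and
	// each disables IRQ load balancing on its own CPU.
	for cpu := 0; cpu < 8; cpu++ {
		wg.Add(1)
		go func(mask uint64) {
			defer wg.Done()
			s.setIRQLoadBalancing(mask, false)
		}(uint64(1) << cpu)
	}
	wg.Wait()
	fmt.Printf("affinity=%#x banned=%#x consistent=%v\n", s.smpAffinity, s.bannedCPUs, s.consistent())
}
```

Deriving the second value from the first while still holding the lock removes the window in which two callers can interleave between the two writes.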
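For issue ii), a restart requested while another is still in flight may not be honored separately, so it may not pick up the latest configuration. One generic way to make this explicit on the caller side is to coalesce requests: a request that arrives during a restart only marks one more restart as pending. The following is a minimal Go sketch of that pattern. The `restarter` type and its `restart` hook are hypothetical, and this is neither what PR 8834 does nor a change to how systemd itself handles restart jobs.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// restarter coalesces restart requests for a service. While a restart is in
// progress, new requests only set a pending flag; when the running restart
// finishes, exactly one more restart is issued if anything was requested in
// the meantime. Every request is thus followed by at least one restart that
// begins after it.
type restarter struct {
	mu      sync.Mutex
	running bool
	pending bool
	restart func() // hypothetical hook, e.g. a wrapper that runs "systemctl restart irqbalance"
}

// Request asks for a restart. A caller that finds a restart in flight marks
// it pending and returns immediately (without waiting for it); otherwise the
// caller runs restarts until none are pending.
func (r *restarter) Request() {
	r.mu.Lock()
	if r.running {
		// A restart is already in flight: make sure one more follows it.
		r.pending = true
		r.mu.Unlock()
		return
	}
	r.running = true
	r.mu.Unlock()

	for {
		r.restart() // called without holding the lock
		r.mu.Lock()
		if !r.pending {
			r.running = false
			r.mu.Unlock()
			return
		}
		r.pending = false
		r.mu.Unlock()
	}
}

func main() {
	var (
		countMu sync.Mutex
		count   int
	)
	r := &restarter{restart: func() {
		time.Sleep(10 * time.Millisecond) // simulate a slow service restart
		countMu.Lock()
		count++
		countMu.Unlock()
	}}

	// Three near-simultaneous requests, similar to the shell loop in ii).
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); r.Request() }()
	}
	wg.Wait()
	fmt.Println("restarts issued:", count)
}
```

Unlike firing several `systemctl restart` calls in parallel, this guarantees a restart that starts after the most recent request, while still collapsing bursts of requests into a small number of restarts.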
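The details of iii) are in the private comment and are not reproduced here. Assume the invalid state arises because tearing down the old container re-enables IRQ balancing on CPUs that the already-started replacement still needs disabled. Under that assumption, one way to make start and delete order-independent is to reference-count each CPU. The following is a minimal Go sketch; `cpuRefs`, `newCPURefs`, `disable` and `enable` are hypothetical names, and this is not CRI-O's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// cpuRefs is a hypothetical per-CPU reference count of running containers
// that need IRQ load balancing disabled on that CPU.
type cpuRefs struct {
	mu   sync.Mutex
	refs map[int]int
}

func newCPURefs() *cpuRefs { return &cpuRefs{refs: make(map[int]int)} }

// disable adds one reference per CPU and returns the CPUs that went from 0
// to 1 references, i.e. the only ones whose host state must change.
func (c *cpuRefs) disable(cpus []int) []int {
	c.mu.Lock()
	defer c.mu.Unlock()
	var changed []int
	for _, cpu := range cpus {
		c.refs[cpu]++
		if c.refs[cpu] == 1 {
			changed = append(changed, cpu)
		}
	}
	return changed
}

// enable drops one reference per CPU and returns the CPUs that reached zero,
// i.e. the only ones on which balancing may actually be re-enabled.
func (c *cpuRefs) enable(cpus []int) []int {
	c.mu.Lock()
	defer c.mu.Unlock()
	var changed []int
	for _, cpu := range cpus {
		if c.refs[cpu] == 0 {
			continue // nothing to release; ignore unbalanced calls
		}
		c.refs[cpu]--
		if c.refs[cpu] == 0 {
			delete(c.refs, cpu)
			changed = append(changed, cpu)
		}
	}
	return changed
}

func main() {
	c := newCPURefs()
	// kubelet starts the replacement before the old container is removed.
	fmt.Println("old container starts, disable on:", c.disable([]int{2, 3}))
	fmt.Println("replacement starts, disable on:", c.disable([]int{2, 3}))
	fmt.Println("old container removed, re-enable on:", c.enable([]int{2, 3}))
	fmt.Println("replacement removed, re-enable on:", c.enable([]int{2, 3}))
}
```

With reference counts, removing the old container leaves CPUs 2 and 3 untouched because the replacement still holds a reference. Balancing is only re-enabled when the last container using those CPUs goes away.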