Whenever an etcd compaction batch takes more than 10ms to complete, compaction blocks requests. The upstream PR provides more detail:
https://github.com/etcd-io/etcd/pull/19405
While investigating the kube-apiserver request latency spike in 4.19, we determined that a RHEL 9.6 based host OS appears to yield a 10-20x increase in compaction duration. With this PR, the impact on apiserver request latency is reduced by approximately 50%, yet compaction duration remains high. We are pursuing that separately with the kernel team.
Given that 4.17 and later are on etcd 3.5.16+, it would be good to backport this fix through 4.17.
- blocks: OCPBUGS-53447 etcd compaction can become blocking when it shouldn't in 3.5.16+ (Closed)
- clones: OCPBUGS-51838 etcd compaction can become blocking when it shouldn't in 3.5.16+ (Closed)
- depends on: OCPBUGS-51838 etcd compaction can become blocking when it shouldn't in 3.5.16+ (Closed)
- is cloned by: OCPBUGS-53447 etcd compaction can become blocking when it shouldn't in 3.5.16+ (Closed)
- links to: RHBA-2025:2705 OpenShift Container Platform 4.18.z bug fix update
Since the problem described in this issue should be resolved in a recent advisory, it has been closed.
For information on the advisory (Important: OpenShift Container Platform 4.18.5 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2025:2705