Description:

While running RHACS 4.8.0 on both RHOCP 4.16 and 4.17 clusters, we have seen occasional (but disruptive) bursts of disk I/O read activity from the collector and scanner-db pods on the 4.17 clusters.
These peaks in filesystem read operations are visible in the cAdvisor metrics when querying container_fs_reads_total, and the top offenders on both affected nodes (so far, a control-plane node and an infra node) were the collector and scanner-db.
While running strace against the PIDs of both pods during those read peaks, over a 12-minute window we see extensive disk I/O read syscalls from the collector's PID, but no single syscall or file stands out as an anomalous offender.
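
For reference, a minimal sketch of how the top readers can be ranked from two scrapes of container_fs_reads_total. This is an illustration only, not the exact query we ran: the function name top_readers, the pod and container names, and the sample values are all hypothetical, and the PromQL in the comment assumes the usual cAdvisor pod and container labels are present.

```python
from typing import Dict, List, Tuple

# Roughly equivalent PromQL (assumes the cAdvisor "pod" and "container" labels):
#   topk(5, sum by (pod, container) (rate(container_fs_reads_total[5m])))

# Maps (pod, container) to the cumulative container_fs_reads_total value.
Sample = Dict[Tuple[str, str], float]


def top_readers(before: Sample, after: Sample, interval_s: float,
                n: int = 5) -> List[Tuple[Tuple[str, str], float]]:
    """Rank containers by filesystem reads per second between two scrapes."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for key, end in after.items():
        start = before.get(key)
        # Skip series that are new or whose counter was reset between scrapes.
        if start is None or end < start:
            continue
        rates[key] = (end - start) / interval_s
    # Highest read rate first, truncated to the n largest.
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical counter values taken 60 seconds apart.
    t0 = {("collector-abc", "collector"): 1000.0, ("scanner-db-0", "db"): 500.0}
    t1 = {("collector-abc", "collector"): 7000.0, ("scanner-db-0", "db"): 3500.0}
    for (pod, container), rate in top_readers(t0, t1, 60.0):
        print(f"{pod}/{container}: {rate:.1f} reads/s")
```

Comparing the same ranking on a 4.16 node and a 4.17 node over the same window should show whether the collector and scanner-db read rates differ between the two versions.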

This activity increases disk I/O wait on the two affected nodes and was first noticed because it disrupted the apiserver and etcd pods.