  1. Red Hat Advanced Cluster Security
  2. ROX-31938

Peak disk I/O activity from the collector & scanner-db leading to node instability

    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • None
    • Collector
    • None
    • Incidents & Support
    • False
    • False
    • Rox Sprint 4.10D, Rox Sprint 4.10E
    • Critical

      Description:

      • While using RHACS 4.8.0 on both RHOCP 4.16 and 4.17 clusters, on the latter we have seen occasional (but disruptive) spikes in disk I/O read activity from both the collector and scanner-db pods.
      • These peaks in filesystem read operations are visible in the cAdvisor metrics when querying container_fs_reads_total; the top offenders on both affected nodes (so far, a control-plane node and an infra node) were the collector and scanner-db.
      • While running strace against the PIDs of both pods during those peak filesystem-read windows, we see extensive disk I/O read syscalls from the collector's PID within a 12-minute window, but no single offender stands out as anomalous.
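
For anyone reproducing the container_fs_reads_total ranking described above, the following is a minimal sketch rather than the exact procedure used on the affected clusters. It assumes the metric carries the usual namespace, pod, and container labels, and that two snapshots of the counter were exported some seconds apart. The PromQL string, the helper name top_readers, and all pod names in the usage note are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# A PromQL expression one might run against the cluster monitoring stack.
# Assumption: the metric exposes namespace, pod and container labels.
PROMQL_TOP_READERS = (
    "topk(5, sum by (namespace, pod, container) "
    "(rate(container_fs_reads_total[5m])))"
)

Key = Tuple[str, str]  # (pod, container); hypothetical key layout


def top_readers(
    before: Dict[Key, float],
    after: Dict[Key, float],
    interval_s: float,
    n: int = 5,
) -> List[Tuple[Key, float]]:
    """Rank containers by filesystem reads per second between two snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = []
    for key, end in after.items():
        start = before.get(key)
        if start is None:
            continue  # series appeared mid-window, no baseline to compare
        delta = end - start
        if delta < 0:
            # Counter went backwards (e.g. container restart): count from zero.
            delta = end
        rates.append((key, delta / interval_s))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates[:n]
```

Called with two dictionaries sampled 60 seconds apart, for example {("collector-x", "collector"): 1000} and {("collector-x", "collector"): 7000}, the helper reports 100 reads per second for that container. The pod name here is made up for illustration.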

      The activity described above leads to increased disk I/O wait on two nodes and was initially seen to disrupt the apiserver and etcd pods.
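
To quantify the disk I/O wait on an affected node, one option is to compare two samples of the aggregate "cpu" line from /proc/stat. This is a rough sketch under the assumption that shell or debug access to the node is available. It is not the method used to gather the observations above, and the function names are hypothetical.

```python
import time
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before_line: str, after_line: str) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so only the
    # first eight fields are summed for the total.
    total_delta = sum(after[:8]) - sum(before[:8])
    if total_delta <= 0:
        return 0.0
    return 100.0 * (after[4] - before[4]) / total_delta


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat (the aggregate cpu counters)."""
    with open(path) as handle:
        return handle.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # sampling window in seconds; adjust as needed
    second = read_cpu_line()
    print(f"iowait over window: {iowait_percent(first, second):.1f}%")
```

Sampling this during one of the read peaks and again during a quiet period would give a simple before/after comparison for the node.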

              rh-ee-ovalenti Olivier Valentin
              rhn-support-rsandu Robert Sandu
              ACS Collector