Certain pods, such as the CatalogSource registry pods, are scheduled on master (control plane) nodes. This can increase CPU usage on those nodes, particularly from the Operator Package Manager (OPM) process, which in turn degrades performance. In severe cases, this resource contention can cause Operator installations to fail or be delayed, as noted in https://issues.redhat.com/browse/OCPBUGS-43966
Root Cause:
During testing, we traced the issue to CPU contention on the master nodes. By default, CatalogSource pods carry tolerations that allow them to run on master nodes. In our environment, the master nodes were already under heavy load, which likely triggered the problems observed with the Operator Lifecycle Manager (OLM).
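To confirm the placement described above, a check along the following lines can list which catalog pods are currently running on master nodes. This is a minimal sketch, assuming `oc` is logged in to the cluster, that the catalog pods run in the `openshift-marketplace` namespace and carry OLM's `olm.catalogSource` label, and that control plane nodes have the `node-role.kubernetes.io/master` label. The function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Namespace in which OLM runs the default catalog source pods (assumption).
MARKETPLACE_NS="openshift-marketplace"

# Print "<pod> <node>" for each catalog pod. Assumes the pods carry the
# olm.catalogSource label that OLM sets on catalog registry pods.
catalog_pod_nodes() {
  oc get pods -n "$MARKETPLACE_NS" -l olm.catalogSource \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}'
}

# Print the names of nodes labelled with the master (control plane) role.
master_nodes() {
  oc get nodes -l node-role.kubernetes.io/master \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Print "<pod> <node>" for every catalog pod currently on a master node.
# Pods with no assigned node (for example, Pending pods) are skipped.
catalog_pods_on_masters() {
  local masters
  masters="$(master_nodes)"
  catalog_pod_nodes | while read -r pod node; do
    if [[ -n "$node" ]] && grep -qxF -- "$node" <<<"$masters"; then
      echo "$pod $node"
    fi
  done
}
```

Before the workaround, `catalog_pods_on_masters` would be expected to list the `redhat-operators` pod; afterwards that pod should no longer appear, although the other default catalog sources, which were not re-created, may still be listed.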
Workaround Implemented:
To mitigate this issue, we applied the following workaround:
- Disabled the default Red Hat catalog source:
oc patch operatorhub/cluster --type merge --patch '{"spec":{"sources":[ {"disabled":true,"name":"redhat-operators"}]}}'
- Manually re-created the CatalogSource without tolerations, so that its pod is scheduled on a non-master node.
Outcome:
After this change, the CatalogSource pod was scheduled on a worker node with more available resources. As a result, InstallPlans that had previously been stuck or delayed appeared almost immediately, which resolved the issue.
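The two workaround steps can be sketched as a small script like the one below. It is not the exact manifest used during testing, which is not recorded here. It assumes the replacement keeps the name `redhat-operators` so that existing Subscriptions referencing `source: redhat-operators` keep resolving, that the index image is `registry.redhat.io/redhat/redhat-operator-index` tagged with the cluster's minor version (passed in as an argument), and that the installed OLM accepts `grpcPodConfig.securityContextConfig`. The poll interval and function names are hypothetical. The manifest sets no tolerations and no master node selector, so on clusters where masters carry the default `NoSchedule` taint the registry pod should land on a worker.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Namespace of the default catalog sources (assumption).
MARKETPLACE_NS="openshift-marketplace"

# Step 1: disable the default source with the same patch as above.
# A merge patch replaces the whole spec.sources list.
disable_default_redhat_operators() {
  oc patch operatorhub/cluster --type merge \
    --patch '{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}'
}

# Wait until the default CatalogSource has been removed, polling every
# INTERVAL seconds (default 5) for at most TIMEOUT seconds (default 120).
wait_for_catalogsource_removal() {
  local name="$1" timeout="${2:-120}" interval="${3:-5}" waited=0
  while oc get catalogsource "$name" -n "$MARKETPLACE_NS" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "CatalogSource $name still present after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Step 2: print a replacement CatalogSource manifest with no tolerations and
# no nodeSelector. Name, image tag, security mode and poll interval are
# assumptions, not the values used in the original test.
render_catalogsource() {
  local ocp_version="$1"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators
  namespace: ${MARKETPLACE_NS}
spec:
  sourceType: grpc
  image: registry.redhat.io/redhat/redhat-operator-index:v${ocp_version}
  displayName: Red Hat Operators
  publisher: Red Hat
  grpcPodConfig:
    securityContextConfig: restricted
  updateStrategy:
    registryPoll:
      interval: 10m
EOF
}
```

A possible invocation is `disable_default_redhat_operators && wait_for_catalogsource_removal redhat-operators && render_catalogsource 4.16 | oc apply -f -`, where `4.16` is only an example and should match the cluster's minor version. Because `--type merge` replaces the entire `spec.sources` list, any other sources that are already disabled in OperatorHub need to be included in the same patch.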
--------------------------------