-
Task
-
Resolution: Done
-
Critical
MK - Sprint 221
WHAT
Investigate why nightly tests fail due to lack of resources. It is very likely that the created cluster is too small to accommodate all the resources installed during deployment of kas-fleet-manager.
The following events were observed for two created namespaces:
77m Warning FailedScheduling pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
77m Warning FailedScheduling pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
71m Warning FailedScheduling pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
18m Warning FailedScheduling pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg 0/9 nodes are available: 1 node(s) were unschedulable, 2 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
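To make events like these easier to compare across nightly runs, the scheduler message can be broken down into per-reason node counts. The following is a minimal sketch, not part of the test suite: the function name `parse_failed_scheduling` and the `EXAMPLE` constant are hypothetical, and the parsing assumes the message keeps the "N/M nodes are available: <count> <reason>, ..." shape seen above.

```python
import re
from collections import Counter

# Matches the "0/9 nodes are available: ..." part of a FailedScheduling message.
HEADER = re.compile(r"(\d+)/(\d+) nodes are available:\s*(.*)", re.DOTALL)

# One of the messages from the events above, used only as sample input.
EXAMPLE = (
    "0/9 nodes are available: 2 Insufficient cpu, "
    "2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, "
    "2 node(s) were unschedulable, "
    "3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate."
)


def parse_failed_scheduling(message: str) -> dict:
    """Return available/total node counts and a reason -> node count mapping."""
    match = HEADER.search(message)
    if match is None:
        raise ValueError("not a FailedScheduling message")
    available, total, rest = match.groups()
    reasons = Counter()
    # Split only on commas that are followed by a new "<count> " prefix, so the
    # comma inside "..., that the pod didn't tolerate" does not break a reason.
    for part in re.split(r",\s+(?=\d+\s)", rest.strip().rstrip(".")):
        count, reason = part.split(" ", 1)
        reasons[reason.strip()] += int(count)
    return {
        "available": int(available),
        "total": int(total),
        "reasons": dict(reasons),
    }


if __name__ == "__main__":
    for reason, count in parse_failed_scheduling(EXAMPLE)["reasons"].items():
        print(f"{count} node(s): {reason}")
```

Read this way, only the nodes counted under "Insufficient cpu" were candidates for the pod at all; the master, infra, and unschedulable nodes were excluded for reasons unrelated to capacity.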
Manually adding an additional machine pool with m5.4xlarge instances remediated these issues. It may be worth investigating whether changing the node size in our config to a larger instance type (e.g. m5.4xlarge instead of m5.2xlarge) solves the issue. If not, it may be necessary to add a machine pool programmatically when running the tests.
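As a rough aid for sizing, the raw capacity of the two instance types can be compared for a given number of worker nodes. This is only a sketch: `pool_capacity` is a hypothetical helper, the replica count of 3 is an assumed example value rather than the cluster's actual worker count, and the figures are the published AWS specs (m5.2xlarge: 8 vCPU, 32 GiB; m5.4xlarge: 16 vCPU, 64 GiB). The CPU a node can actually allocate to pods is lower than the raw vCPU count because of system reservations.

```python
# Published AWS specs: instance type -> (vCPU, memory in GiB).
INSTANCE_SPECS = {
    "m5.2xlarge": (8, 32),
    "m5.4xlarge": (16, 64),
}


def pool_capacity(instance_type: str, replicas: int) -> dict:
    """Raw (not allocatable) capacity of a pool of identical nodes."""
    vcpu, memory_gib = INSTANCE_SPECS[instance_type]
    return {"vcpu": vcpu * replicas, "memory_gib": memory_gib * replicas}


if __name__ == "__main__":
    replicas = 3  # assumed example value, not taken from the cluster config
    for instance_type in INSTANCE_SPECS:
        print(instance_type, pool_capacity(instance_type, replicas))
```

For the same node count, moving from m5.2xlarge to m5.4xlarge doubles the raw CPU and memory available to the scheduler.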
WHY
To make nightly tests pass
HOW
<Suggestions for how this may be solved.> [Optional]
DONE
Include the following where applicable:
- <bulleted list of functional acceptance criteria that need to be completed>
- <call out anything on the documentation side that's needed as a result of this task being completed>
- <any metrics, monitoring dashboards and alerts that need to be created or be updated>
- <SOP creation or updates>
Guidelines
The following steps should be adhered to:
- Required tests should be put in place - unit, integration, manual test cases (if necessary)
- CI and all relevant tests passing
- Changes have been verified by one additional reviewer against:
  - each required environment
  - each supported upgrade path
- If the changes could have an impact on the clients (either UI or CLI), a JIRA should be created for making the required changes on the client side and acknowledged by one of the client-side team members.
- PR has been merged