Managed Service - Streams / MGDSTRM-8834

Investigate why nightly tests fail due to lack of resources


    • Type: Task
    • Resolution: Done
    • Priority: Critical
    • Sprint: MK - Sprint 221

      WHAT

      Investigate why the nightly tests fail due to lack of resources. It is very likely that the created cluster is too small to accommodate all of the resources installed during the deployment of kas-fleet-manager.

      The following events were observed in two of the created namespaces:

      77m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
      77m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
      71m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
      18m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 1 node(s) were unschedulable, 2 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. 

      In each of these events, the only schedulable worker nodes reported Insufficient cpu; every other node was either a tainted master or infra node, or was unschedulable. Manually adding an additional machine pool of m5.4xlarge instances remediated these issues. It might be worth investigating whether changing the node size in our config to a bigger instance type (e.g. m5.4xlarge instead of m5.2xlarge, which doubles the vCPUs per node from 8 to 16) solves the issue. If not, it might be required to add a machine pool programmatically when running the tests (see HOW below).
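
      One way to confirm the capacity hypothesis is to sum the allocatable CPU of the schedulable workers and compare it against what the kas-fleet-manager deployment requests. A minimal sketch in Go using client-go (the kubeconfig handling and the skip logic are illustrative, not part of the current test code):

      package main

      import (
          "context"
          "fmt"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          // Build a client from the default kubeconfig (assumed location).
          config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
          if err != nil {
              panic(err)
          }
          clientset, err := kubernetes.NewForConfig(config)
          if err != nil {
              panic(err)
          }

          nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
          if err != nil {
              panic(err)
          }

          total := int64(0)
          for _, node := range nodes.Items {
              // Skip the nodes the scheduler also rejected in the events above:
              // cordoned nodes and tainted master/infra nodes.
              if node.Spec.Unschedulable || len(node.Spec.Taints) > 0 {
                  continue
              }
              cpu := node.Status.Allocatable[corev1.ResourceCPU]
              total += cpu.MilliValue()
          }
          fmt.Printf("allocatable CPU on schedulable workers: %dm\n", total)
      }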

      WHY

      To make the nightly tests pass.

      HOW

      First try increasing the node size in our cluster configuration (e.g. m5.4xlarge instead of m5.2xlarge). If the bigger nodes are still not sufficient, add a machine pool programmatically when the nightly tests are set up, as sketched below.
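
      A minimal sketch of the programmatic route, assuming the test cluster is an OSD cluster managed through OCM, using ocm-sdk-go (the pool name, replica count, and environment variables are illustrative):

      package main

      import (
          "context"
          "fmt"
          "os"

          sdk "github.com/openshift-online/ocm-sdk-go"
          cmv1 "github.com/openshift-online/ocm-sdk-go/clustersmgmt/v1"
      )

      func main() {
          // Hypothetical inputs; the real values would come from the test configuration.
          token := os.Getenv("OCM_TOKEN")
          clusterID := os.Getenv("CLUSTER_ID")

          conn, err := sdk.NewConnectionBuilder().Tokens(token).Build()
          if err != nil {
              fmt.Fprintf(os.Stderr, "can't build connection: %v\n", err)
              os.Exit(1)
          }
          defer conn.Close()

          // Machine pool spec: three m5.4xlarge workers (size taken from the
          // manual remediation described in WHAT; the replica count is a guess).
          pool, err := cmv1.NewMachinePool().
              ID("nightly-tests-pool").
              InstanceType("m5.4xlarge").
              Replicas(3).
              Build()
          if err != nil {
              fmt.Fprintf(os.Stderr, "can't build machine pool: %v\n", err)
              os.Exit(1)
          }

          _, err = conn.ClustersMgmt().V1().
              Clusters().Cluster(clusterID).
              MachinePools().Add().
              Body(pool).
              SendContext(context.Background())
          if err != nil {
              fmt.Fprintf(os.Stderr, "can't create machine pool: %v\n", err)
              os.Exit(1)
          }

          fmt.Println("machine pool created")
      }

      The pool could be created before the test run and deleted afterwards, so the nightly cluster only carries the extra capacity while the tests execute.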

      DONE

      Include the following where applicable:

      • <bulleted list of functional acceptance criteria that need to be completed>
      • <call out anything on the documentation side that's needed as a result of this task being completed>
      • <any metrics, monitoring dashboards and alerts that need to be created or be updated>
      • <SOP creation or updates>

      Guidelines

      The following steps should be adhered to:

      • Required tests should be put in place - unit, integration, manual test cases (if necessary)
      • CI and all relevant tests passing
      • Changes have been verified by one additional reviewer against:
          • each required environment
          • each supported upgrade path
      • If the changes could have an impact on the clients (either UI or CLI), a JIRA should be created for making the required changes on the client side and acknowledged by one of the client-side team members
      • PR has been merged

              Assignee: Manyanda Chitimbo (mchitimb-1)
              Reporter: Pawel Paszki (ppaszki)
              Component: MK - Control Plane
              Votes: 0
              Watchers: 3
