Managed Service - Streams / MGDSTRM-8834

Investigate why nightly tests fail due to lack of resources


    • Type: Task
    • Resolution: Done
    • Priority: Critical
    • Sprint: MK - Sprint 221

      WHAT

      Investigate why the nightly tests fail due to lack of resources. It is very likely that the created cluster is too small to accommodate all of the resources installed during the deployment of kas-fleet-manager.

      The following events were observed in two of the created namespaces:

      77m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
      77m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
      71m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate.
      18m         Warning   FailedScheduling               pod/strimzi-cluster-operator.v0.26.0-11-d9bb4d5-s8djg                 0/9 nodes are available: 1 node(s) were unschedulable, 2 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. 

      In each of these events, the only schedulable worker nodes reported Insufficient cpu; every other node was either a tainted master or infra node, or was unschedulable. Manually adding an additional machine pool of m5.4xlarge instances remediated these issues. It might be worth investigating whether changing the node size in our config to a bigger instance type (e.g. m5.4xlarge instead of m5.2xlarge, which doubles the vCPUs per node from 8 to 16) solves the issue. If not, it might be required to add a machine pool programmatically when running the tests (see HOW below).
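
      One way to confirm the capacity hypothesis is to sum the allocatable CPU of the schedulable workers and compare it against what the kas-fleet-manager deployment requests. A minimal sketch in Go using client-go (the kubeconfig handling and the skip logic are illustrative, not part of the current test code):

      package main

      import (
          "context"
          "fmt"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          // Build a client from the default kubeconfig (assumed location).
          config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
          if err != nil {
              panic(err)
          }
          clientset, err := kubernetes.NewForConfig(config)
          if err != nil {
              panic(err)
          }

          nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
          if err != nil {
              panic(err)
          }

          total := int64(0)
          for _, node := range nodes.Items {
              // Skip the nodes the scheduler also rejected in the events above:
              // cordoned nodes and tainted master/infra nodes.
              if node.Spec.Unschedulable || len(node.Spec.Taints) > 0 {
                  continue
              }
              cpu := node.Status.Allocatable[corev1.ResourceCPU]
              total += cpu.MilliValue()
          }
          fmt.Printf("allocatable CPU on schedulable workers: %dm\n", total)
      }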

      WHY

      To make the nightly tests pass.

      HOW

      First try increasing the node size in our cluster configuration (e.g. m5.4xlarge instead of m5.2xlarge). If the bigger nodes are still not sufficient, add a machine pool programmatically when the nightly tests are set up, as sketched below.
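
      A minimal sketch of the programmatic route, assuming the test cluster is an OSD cluster managed through OCM, using ocm-sdk-go (the pool name, replica count, and environment variables are illustrative):

      package main

      import (
          "context"
          "fmt"
          "os"

          sdk "github.com/openshift-online/ocm-sdk-go"
          cmv1 "github.com/openshift-online/ocm-sdk-go/clustersmgmt/v1"
      )

      func main() {
          // Hypothetical inputs; the real values would come from the test configuration.
          token := os.Getenv("OCM_TOKEN")
          clusterID := os.Getenv("CLUSTER_ID")

          conn, err := sdk.NewConnectionBuilder().Tokens(token).Build()
          if err != nil {
              fmt.Fprintf(os.Stderr, "can't build connection: %v\n", err)
              os.Exit(1)
          }
          defer conn.Close()

          // Machine pool spec: three m5.4xlarge workers (size taken from the
          // manual remediation described in WHAT; the replica count is a guess).
          pool, err := cmv1.NewMachinePool().
              ID("nightly-tests-pool").
              InstanceType("m5.4xlarge").
              Replicas(3).
              Build()
          if err != nil {
              fmt.Fprintf(os.Stderr, "can't build machine pool: %v\n", err)
              os.Exit(1)
          }

          _, err = conn.ClustersMgmt().V1().
              Clusters().Cluster(clusterID).
              MachinePools().Add().
              Body(pool).
              SendContext(context.Background())
          if err != nil {
              fmt.Fprintf(os.Stderr, "can't create machine pool: %v\n", err)
              os.Exit(1)
          }

          fmt.Println("machine pool created")
      }

      The pool could be created before the test run and deleted afterwards, so the nightly cluster only carries the extra capacity while the tests execute.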

      DONE

      Include the following where applicable:

      • <bulleted list of functional acceptance criteria that need to be completed>
      • <call out anything on the documentation side that's needed as a result of this task being completed>
      • <any metrics, monitoring dashboards and alerts that need to be created or be updated>
      • <SOP creation or updates>

      Guidelines

      The following steps should be adhered to:

      • Required tests should be put in place - unit, integration, manual test cases (if necessary)
      • CI and all relevant tests passing
      • Changes have been verified by one additional reviewer against:
          • each required environment
          • each supported upgrade path
      • If the changes could have an impact on the clients (either UI or CLI), a JIRA should be created for making the required changes on the client side and acknowledged by one of the client-side team members
      • PR has been merged

              Assignee: Manyanda Chitimbo (mchitimb-1)
              Reporter: Pawel Paszki (ppaszki)
              Component: MK - Control Plane
              Votes: 0
              Watchers: 3
