-
Bug
-
Resolution: Unresolved
-
4.22
Description of problem:
AWS edge zone jobs are failing in cloud-provider-aws (CCM-AWS) tests when the selected (compute/worker) node is not schedulable.
The cloud-provider-aws tests are exposed through OTE. Edge zone nodes are unschedulable by design: workloads must set a toleration to run on them intentionally, and these tests do not.
Nodes are listed alphabetically, and edge zone nodes are also labeled with node-role.kubernetes.io/worker, so an edge zone node can be selected without checking for node taints and without setting the required tolerations:
taints:
- effect: NoSchedule
  key: node-role.kubernetes.io/edge
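A minimal sketch of the kind of node selection that would avoid this, assuming the test helper can inspect taints before choosing a node. The `Node` and `Taint` types are simplified stand-ins for the Kubernetes API types, and `isSchedulable` and `pickWorkerNode` are hypothetical names, not functions from cloud-provider-aws:

```go
package main

import "sort"

// Taint and Node are simplified, hypothetical stand-ins for the
// Kubernetes core API types, kept minimal so the sketch is self-contained.
type Taint struct {
	Key    string
	Effect string
}

type Node struct {
	Name   string
	Labels map[string]string
	Taints []Taint
}

const workerLabel = "node-role.kubernetes.io/worker"

// isSchedulable reports whether a pod with no tolerations could be
// scheduled on the node, i.e. the node has no NoSchedule/NoExecute taint.
func isSchedulable(n Node) bool {
	for _, t := range n.Taints {
		if t.Effect == "NoSchedule" || t.Effect == "NoExecute" {
			return false
		}
	}
	return true
}

// pickWorkerNode walks the nodes in name order and returns the first
// one that carries the worker label and has no blocking taint.
func pickWorkerNode(nodes []Node) (Node, bool) {
	sorted := append([]Node(nil), nodes...) // copy so the input is not reordered
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for _, n := range sorted {
		if _, ok := n.Labels[workerLabel]; !ok {
			continue
		}
		if isSchedulable(n) {
			return n, true
		}
	}
	return Node{}, false
}
```

With this filter, an edge zone node that sorts first by hostname is skipped and the next untainted worker is used. The alternative is to keep the current selection and add a toleration for node-role.kubernetes.io/edge to the test workloads.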
Tests observed failing in those jobs:
[cloud-provider-aws-e2e] loadbalancer NLB internal should be reachable with hairpinning traffic [Suite:openshift/conformance/parallel]
[cloud-provider-aws-e2e] loadbalancer CLB internal should be reachable with hairpinning traffic [Suite:openshift/conformance/parallel]
Version-Release number of selected component (if applicable):
4.22+
How reproducible:
Always, when the edge zone node is first in the list of worker nodes (which is sorted by hostname).
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
1. cloud-provider-aws tests primarily select worker nodes that are schedulable
Additional info:
Example of a failing job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-edge-zones/2025360636724121600
- is blocked by
  SPLAT-2641 Issue track: CCM/AWS/e2e: <https://github.com/kubernetes/cloud-provider-aws/pull/1340> (support CCM tests on edge zones) - In Progress
- is related to
  SPLAT-2642 CI AWS/edge: run often edge zone nightly to get early issues in mainstream - Closed