Details
-
Bug
-
Resolution: Unresolved
-
Blocker
-
4.13
-
None
-
Critical
-
13
-
Sprint 231, Sprint 232, Sprint 233, Sprint 234, Sprint 235, Sprint 236, Sprint 237, Sprint 238, Sprint 239, Sprint 240, Sprint 241, Sprint 242
-
12
-
Rejected
-
False
-
Description
Kube 1.26 introduced the warning-level TopologyAwareHintsDisabled event. The EndpointSliceController fires TopologyAwareHintsDisabled whenever it reconciles a service that has activated topology aware hints via the service.kubernetes.io/topology-aware-hints annotation, but the existing cluster resources (typically nodes) do not carry enough information to apply the hints.
When rebasing OpenShift onto Kube 1.26, our CI builds are failing (except on AWS) because these events are firing "pathologically", for example:
: [sig-arch] events should not repeat pathologically
events happened too frequently event happened 83 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4 result=reject
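To see which nodes could trigger the "Insufficient Node information" message, a check along these lines may help. This is a minimal sketch, not the controller's code: it assumes nodes are plain dicts shaped like Kubernetes Node objects, that the zone comes from the topology.kubernetes.io/zone label, and that a missing or zero allocatable CPU counts as "not specified". The function name and node names are hypothetical.

```python
ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes zone label


def nodes_missing_topology_info(nodes):
    """Return names of nodes lacking a zone label or allocatable CPU.

    `nodes` is a list of dicts shaped like Kubernetes Node objects
    (an assumption for this sketch; a real check would query the API).
    """
    missing = []
    for node in nodes:
        metadata = node.get("metadata", {})
        labels = metadata.get("labels", {})
        allocatable = node.get("status", {}).get("allocatable", {})
        zone = labels.get(ZONE_LABEL, "")
        cpu = allocatable.get("cpu", "")
        # Assumption: an empty or zero CPU quantity is treated as unspecified.
        if not zone or cpu in ("", "0"):
            missing.append(metadata.get("name", "<unnamed>"))
    return missing


# Hypothetical sample data: only worker-a has both values set.
sample_nodes = [
    {"metadata": {"name": "worker-a", "labels": {ZONE_LABEL: "zone-1"}},
     "status": {"allocatable": {"cpu": "4"}}},
    {"metadata": {"name": "worker-b", "labels": {}},
     "status": {"allocatable": {"cpu": "4"}}},
    {"metadata": {"name": "worker-c", "labels": {ZONE_LABEL: "zone-2"}},
     "status": {"allocatable": {}}},
]
print(nodes_missing_topology_info(sample_nodes))
```

If the list is non-empty on a given platform, that would be consistent with the event text above, though the real controller may apply additional filters (for example, skipping nodes that are not ready).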
AWS nodes seem to have the proper values set. GCP nodes have the values too, but they are not "right" for the purposes of the EndpointSliceController:
event happened 38 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (5 endpoints, 3 zones), addressType: IPv4 result=reject }
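The "5 endpoints, 3 zones" case can be reproduced with a little arithmetic. The following is a rough sketch of how I understand the allocation check, not a copy of the upstream code: it assumes an overload threshold of 20%, that each zone's desired share is its CPU ratio times the endpoint count, and that the minimum per zone is the desired share divided by (1 + threshold), rounded up. The function and zone names are hypothetical.

```python
import math

OVERLOAD_THRESHOLD = 0.2  # assumed value for this sketch


def minimum_endpoints_per_zone(num_endpoints, cpu_ratio_by_zone):
    """Minimum endpoints each zone needs to stay within the overload threshold."""
    return {
        zone: math.ceil(ratio * num_endpoints / (1 + OVERLOAD_THRESHOLD))
        for zone, ratio in cpu_ratio_by_zone.items()
    }


def can_allocate(num_endpoints, cpu_ratio_by_zone):
    """True if the per-zone minimums fit within the available endpoints."""
    minimums = minimum_endpoints_per_zone(num_endpoints, cpu_ratio_by_zone)
    return sum(minimums.values()) <= num_endpoints


# Three zones with equal CPU capacity (hypothetical names).
ratios = {"zone-a": 1 / 3, "zone-b": 1 / 3, "zone-c": 1 / 3}

# 5 endpoints: each zone needs ceil(1.667 / 1.2) = 2, so 6 in total, which exceeds 5.
print(minimum_endpoints_per_zone(5, ratios), can_allocate(5, ratios))
```

Under these assumptions, five endpoints spread over three equally sized zones cannot satisfy the minimums, which would explain why the event fires even though the node values are present.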
https://github.com/openshift/origin/pull/27666 will mask this problem (make it stop erroring in CI), but changes still need to be made in the product so that end users are not subjected to these events.
Now links to:
test=[sig-arch] events should not repeat pathologically for namespace openshift-dns
Issue Links
- blocks
-
OCPBUGS-11449 Excessive TopologyAwareHintsDisabled events due to service/dns-default with topology aware hints activated.
-
- POST
-
- is cloned by
-
OCPBUGS-11449 Excessive TopologyAwareHintsDisabled events due to service/dns-default with topology aware hints activated.
-
- POST
-
- is duplicated by
-
OCPBUGS-13366 DNS operator prone to spamming TopologyAwareHintsDisable events on GCP/Azure since May 5
-
- Closed
-
- relates to
-
OCPBUGS-13209 After custom tolerations of dns pod, the new pod stuck in pending state
-
- Verified
-
- links to