Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.13.z
Affects Version/s: 4.13
Component/s: Networking / DNS
Labels:
None

Severity:
Critical
Regression:
No
Sprint:
Sprint 234, Sprint 235, Sprint 236, Sprint 237, Sprint 238, Sprint 239, Sprint 240, Sprint 241, Sprint 242, Sprint 243, Sprint 244, Sprint 245, Sprint 246, Sprint 247, Sprint 248, Sprint 249, Sprint 250, Sprint 251, Sprint 252, Sprint 254, NE Sprint 255, NE Sprint 256
sprint_count:
22
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Release Note Text:

* Previously, the DNS Operator did not verify that a cluster had ready nodes with available CPU in at least two availability zones, and the DNS daemon set did not use surge for rolling updates. As a result, clusters in which all nodes were in the same availability zone would repeatedly emit TopologyAwareHintsDisabled events for the cluster DNS service. With this release, the TopologyAwareHintsDisabled events are no longer emitted on clusters that do not have nodes in multiple availability zones and the issue has been resolved. (link:https://issues.redhat.com/browse/OCPBUGS-11449[*OCPBUGS-11449*])
Release Note Type:
Bug Fix
Release Note Status:
In Progress
Target Version:

4.13.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:
PX Priority Data:

This is a clone of issue OCPBUGS-5943. The following is the description of the original issue:
—
Kube 1.26 introduced the warning-level TopologyAwareHintsDisabled event. The EndpointSliceController fires TopologyAwareHintsDisabled whenever it reconciles a service that has activated topology-aware hints via the service.kubernetes.io/topology-aware-hints annotation, but the existing cluster resources (typically nodes) do not provide enough information to apply the hints.

When rebasing OpenShift onto Kube 1.26, our CI builds are failing (except on AWS), because these events are firing "pathologically", for example:

: [sig-arch] events should not repeat pathologically
events happened too frequently event happened 83 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4 result=reject

AWS nodes seem to have the proper values. GCP nodes have the values also, but they are not "right" for the purposes of the EndpointSliceController:

event happened 38 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (5 endpoints, 3 zones), addressType: IPv4 result=reject

https://github.com/openshift/origin/pull/27666 will mask this problem (make it stop erroring in CI) but changes still need to be made in the product so end users are not subjected to these events.
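
For illustration, the product-side change could look roughly like the Go sketch below: set the hints annotation only when ready nodes with allocatable CPU span at least two zones. This is a guess at the shape of the logic, based on the release note and the titles of the linked cluster-dns-operator pull requests, not the operator's actual code. The clusterNode type and both functions are hypothetical, and the "auto" annotation value is assumed from upstream Kubernetes.

```go
package main

import "fmt"

// hintsAnnotation is the annotation named in the description above.
const hintsAnnotation = "service.kubernetes.io/topology-aware-hints"

// clusterNode is a hypothetical summary of the node fields that matter here.
type clusterNode struct {
	Ready               bool
	Zone                string // value of the node's zone label, "" if unset
	AllocatableMilliCPU int64
}

// shouldEnableHints reports whether ready nodes with allocatable CPU exist
// in at least two distinct zones.
func shouldEnableHints(nodes []clusterNode) bool {
	zones := map[string]struct{}{}
	for _, n := range nodes {
		if !n.Ready || n.Zone == "" || n.AllocatableMilliCPU <= 0 {
			continue
		}
		zones[n.Zone] = struct{}{}
	}
	return len(zones) >= 2
}

// serviceAnnotations returns the annotations to apply to the DNS service.
// The "auto" value is an assumption taken from upstream Kubernetes.
func serviceAnnotations(nodes []clusterNode) map[string]string {
	if shouldEnableHints(nodes) {
		return map[string]string{hintsAnnotation: "auto"}
	}
	return map[string]string{}
}

func main() {
	singleZone := []clusterNode{
		{Ready: true, Zone: "zone-a", AllocatableMilliCPU: 3500},
		{Ready: true, Zone: "zone-a", AllocatableMilliCPU: 3500},
	}
	fmt.Println(len(serviceAnnotations(singleZone))) // prints 0: hints stay off
}
```

With a gate like this, a single-zone cluster never activates hints on service/dns-default, so the EndpointSliceController has nothing to reject and no event to emit.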

clones

OCPBUGS-5943 Excessive TopologyAwareHintsDisabled events due to service/dns-default with topology aware hints activated.

Closed

is blocked by

OCPBUGS-5943 Excessive TopologyAwareHintsDisabled events due to service/dns-default with topology aware hints activated.

Closed

is duplicated by

OCPBUGS-35713 DNS cluster operator degraded after fresh installation.

Closed

links to

openshift/cluster-dns-operator#366: [release-4.13] OCPBUGS-11449: Set DNS DaemonSet's maxSurge value to 10%

openshift/cluster-dns-operator#367: [release-4.13] OCPBUGS-11449: Enable topology-aware hints if, and only if, nodes have zones

openshift/cluster-dns-operator#417: [release-4.13] OCPBUGS-11449: Ignore max unavailable for status

openshift/cluster-dns-operator#418: [release-4.13] OCPBUGS-11449: Enable topology-aware hints iff nodes in >=2 zones

openshift/origin#27859: OCPBUGS-11449: [release-4.13] Allow cluster daemonsets to use maxSurge

RHBA-2024:4846 OpenShift Container Platform 4.13.z bug fix update

Assignee: Miciah Masters

Reporter: OpenShift Prow Bot

QA Contact: Melvin Joseph

Votes: 0

Watchers: 12

Created: 2023/04/05 6:50 PM

Updated: 2024/07/31 2:32 PM

Resolved: 2024/07/31 2:32 PM
