Bug
Resolution: Unresolved
Critical
4.20
Description of problem:
After upgrading from OpenShift 4.19.22 to 4.20.12, the ingress cluster operator reports Degraded=True because CanaryChecksSucceeding=False. This happens because the ingress-canary DaemonSet reports a desired pod count of 0 and schedules no pods.
The affected environment is a supported compact cluster topology where all nodes are control-plane and unschedulable, with infra labels and taints as documented in KCS 7109585.
In previous versions (4.17–4.19), the DaemonSet correctly inherited the tolerations defined in IngressController `spec.nodePlacement.tolerations` (e.g., `{ operator: Exists }`), allowing pods to run on tainted infra/master nodes.
In 4.20.12, the ingress-canary DaemonSet appears to use only a fixed toleration (key: node-role.kubernetes.io/infra) and no longer applies the custom tolerations defined by the user. As a result, no node is eligible to run canary pods and the ingress operator degrades.
This breaks compatibility with previously working and supported cluster architectures.
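To illustrate why the desired count drops to 0, here is a minimal Python sketch of the standard Kubernetes taint/toleration matching rule (an empty toleration key with `Exists` matches every taint; a keyed toleration matches only its own key). The node taints and the exact operator/effect of the fixed canary toleration are assumptions for illustration, not values taken from the cluster, and all names (`can_schedule`, `COMPACT_NODE_TAINTS`, etc.) are hypothetical. The DaemonSet controller also adds a few automatic tolerations (such as `node.kubernetes.io/unschedulable`), none of which cover role taints; the sketch omits them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""          # empty key is only valid with operator Exists
    operator: str = "Equal"
    value: str = ""
    effect: str = ""       # empty effect matches any taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Mirror of the Kubernetes toleration-matching rule."""
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key matches any taint key.
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return tol.operator == "Exists"


def can_schedule(tolerations: Iterable[Toleration], taints: Iterable[Taint]) -> bool:
    """True if every hard (NoSchedule/NoExecute) taint is tolerated."""
    tols = list(tolerations)
    # PreferNoSchedule is only a soft preference, so it never blocks placement.
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tols) for t in hard)


# Hypothetical taints on a compact-cluster node (values assumed; see KCS 7109585).
COMPACT_NODE_TAINTS = [
    Taint("node-role.kubernetes.io/master", effect="NoSchedule"),
    Taint("node-role.kubernetes.io/infra", value="reserved", effect="NoSchedule"),
]

# Fixed toleration reported in 4.20.12 (operator and effect assumed).
CANARY_TOLERATIONS_4_20 = [
    Toleration(key="node-role.kubernetes.io/infra", operator="Exists", effect="NoSchedule"),
]

# User toleration from IngressController spec.nodePlacement.tolerations.
USER_TOLERATIONS = [Toleration(operator="Exists")]
```

Under these assumptions, `can_schedule(CANARY_TOLERATIONS_4_20, COMPACT_NODE_TAINTS)` is False because nothing tolerates the master taint, so the DaemonSet selects zero nodes, while `can_schedule(USER_TOLERATIONS, COMPACT_NODE_TAINTS)` is True.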
Version-Release number of selected component (if applicable):
4.20.12
How reproducible:
The issue reproduces consistently after upgrading from OpenShift 4.19.x to 4.20.12 in compact clusters where all nodes are control-plane (master), unschedulable, and tainted, and the IngressController is configured with custom tolerations (e.g., operator: Exists).
Steps to Reproduce:
1. Deploy an OpenShift 4.19.x cluster using a compact topology:
- All nodes are control-plane (master), tainted as unschedulable.
- Nodes are labeled as infra: `node-role.kubernetes.io/infra=""`
- IngressController is configured with tolerations:
    spec:
      nodePlacement:
        tolerations:
        - operator: Exists
2. Confirm that the ingress-canary DaemonSet schedules pods and the ingress operator is healthy.
3. Upgrade the cluster to OpenShift 4.20.12.
4. Observe that:
- The ingress cluster operator becomes Degraded with `CanaryChecksSucceeding=False`.
- The ingress-canary DaemonSet shows desired/current pods = 0.
- The toleration from the IngressController is no longer applied to the DaemonSet.
- No pods are scheduled in `openshift-ingress-canary` namespace.
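The observations in step 4 can be checked offline from saved JSON, for example the output of `oc get ingresscontroller/default -n openshift-ingress-operator -o json` and `oc get daemonset/ingress-canary -n openshift-ingress-canary -o json`. The Python sketch below assumes the default IngressController is the one carrying the custom tolerations; the helper names (`missing_tolerations`, `canary_summary`) are hypothetical. It compares IngressController `spec.nodePlacement.tolerations` against the DaemonSet pod template tolerations and reports the scheduled pod counts and the `CanaryChecksSucceeding` condition.

```python
from typing import Any, Dict, List, Tuple


def _norm(tol: Dict[str, Any]) -> Tuple[Any, ...]:
    # Normalize so omitted fields and the implicit "Equal" operator compare equal.
    return (
        tol.get("key", ""),
        tol.get("operator") or "Equal",
        tol.get("value", ""),
        tol.get("effect", ""),
        tol.get("tolerationSeconds"),
    )


def missing_tolerations(ingress_controller: Dict[str, Any], daemonset: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Tolerations set on the IngressController but absent from the canary pod template."""
    wanted = ingress_controller.get("spec", {}).get("nodePlacement", {}).get("tolerations", [])
    pod_spec = daemonset.get("spec", {}).get("template", {}).get("spec", {})
    present = {_norm(t) for t in pod_spec.get("tolerations", [])}
    return [t for t in wanted if _norm(t) not in present]


def canary_summary(ingress_controller: Dict[str, Any], daemonset: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the step-4 symptoms into one dictionary."""
    conditions = {
        c.get("type"): c.get("status")
        for c in ingress_controller.get("status", {}).get("conditions", [])
    }
    status = daemonset.get("status", {})
    return {
        "desired": status.get("desiredNumberScheduled", 0),
        "current": status.get("currentNumberScheduled", 0),
        "canaryChecksSucceeding": conditions.get("CanaryChecksSucceeding", "Unknown"),
        "missingTolerations": missing_tolerations(ingress_controller, daemonset),
    }
```

Load both files with `json.load` and pass the resulting dictionaries to `canary_summary`. On an affected cluster, the expectation from this report would be `desired == 0` and a non-empty `missingTolerations` list containing `{"operator": "Exists"}`.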
Actual results:
After upgrading to OpenShift 4.20.12, the ingress-canary DaemonSet no longer schedules any pods, and its desired/current count is 0. The ingress cluster operator reports Degraded=True with CanaryChecksSucceeding=False. The custom tolerations defined in the IngressController are ignored, and only a fixed toleration (key=node-role.kubernetes.io/infra) is present in the DaemonSet spec. This causes the canary route health checks to fail, even though the cluster otherwise functions correctly.
Expected results:
The ingress-canary DaemonSet should inherit and apply the tolerations defined in the IngressController's spec.nodePlacement.tolerations. In a compact cluster with properly tainted infra/control-plane nodes, the DaemonSet should then schedule pods on those nodes. As in previous OpenShift versions (4.17–4.19), this would keep the ingress operator healthy and the canary checks succeeding.
Additional info:
https://access.redhat.com/solutions/7109585