OpenShift Bugs / OCPBUGS-76370

ingress-canary DaemonSet no longer honors IngressController tolerations in compact clusters after upgrade to OCP 4.20

    • Critical
    • Rejected

      Description of problem:

      After upgrading from OpenShift 4.19.22 to 4.20.12, the ingress cluster operator reports Degraded=True due to CanaryChecksSucceeding=False. This occurs because the ingress-canary DaemonSet has desired=0 and does not schedule any pods.

      The affected environment is a supported compact cluster topology where all nodes are control-plane and unschedulable, with infra labels and taints as documented in KCS 7109585.

      In previous versions (4.17–4.19), the DaemonSet correctly inherited tolerations defined in the IngressController.spec.nodePlacement.tolerations section (e.g., `{ operator: Exists }`), allowing pods to run on tainted infra/master nodes.

      In 4.20.12, the ingress-canary DaemonSet appears to use only a fixed toleration (key: node-role.kubernetes.io/infra) and no longer respects custom tolerations defined by the user, resulting in no schedulable nodes and ingress operator degradation.
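To illustrate why a single infra toleration is not enough on these nodes, the following Python sketch reimplements the upstream Kubernetes taint/toleration matching rules: a toleration with an empty key and operator `Exists` tolerates every taint; otherwise the key must match, and for operator `Equal` the value must match too. The node taints and the operator of the fixed infra toleration are assumptions. The report only gives that toleration's key, and the taint values and effects below are typical of a compact cluster rather than taken from this environment.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats an empty operator as "Equal"
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Mirror of the upstream Kubernetes toleration/taint matching rules."""
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key (valid only with operator Exists) matches every taint key.
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False


def node_admits(tolerations: Sequence[Toleration], taints: Sequence[Taint]) -> bool:
    """A pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


# Hypothetical compact-cluster node: master taint plus infra taint (assumed values).
compact_node_taints = [
    Taint(key="node-role.kubernetes.io/master"),
    Taint(key="node-role.kubernetes.io/infra", value="reserved"),
]

# Assumption: the fixed 4.20.12 toleration uses operator Exists (only its key is known).
fixed_infra_only = [Toleration(key="node-role.kubernetes.io/infra", operator="Exists")]
# The user's IngressController toleration from the report: { operator: Exists }.
user_exists_all = [Toleration(operator="Exists")]

print(node_admits(fixed_infra_only, compact_node_taints))  # False
print(node_admits(user_exists_all, compact_node_taints))   # True
```

Under these assumptions the master taint alone makes every node ineligible for the canary pods, which is consistent with the DaemonSet reporting desired=0.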

      This breaks compatibility with previously working and supported cluster architectures.

      Version-Release number of selected component (if applicable):
      4.20.12

      How reproducible:

      The issue consistently occurs after upgrading from OpenShift 4.19.x to 4.20.12 in compact clusters where all nodes are control-plane (master), unschedulable, and tainted, with the IngressController configured to use custom tolerations (e.g., operator: Exists).

      Steps to Reproduce:

      1. Deploy an OpenShift 4.19.x cluster using a compact topology:

      • All nodes are control-plane (master), tainted as unschedulable.
      • Nodes are labeled as infra: node-role.kubernetes.io/infra=""
      • IngressController is configured with tolerations:

          spec:
            nodePlacement:
              tolerations:
              - operator: Exists

      2. Confirm that the ingress-canary DaemonSet schedules pods and that the ingress operator is healthy.

      3. Upgrade the cluster to OpenShift 4.20.12.

      4. Observe that:

      • The ingress cluster operator becomes Degraded with `CanaryChecksSucceeding=False`.
      • The ingress-canary DaemonSet shows desired/current pods = 0.
      • The toleration from IngressController is no longer respected.
      • No pods are scheduled in `openshift-ingress-canary` namespace.
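A small diagnostic can confirm the symptoms in step 4 by comparing the DaemonSet's scheduling status with the IngressController's tolerations. This is a sketch assuming the affected IngressController is the one named `default` and that both objects are captured as JSON with `oc get ... -o json`. `canary_report` is a hypothetical helper, and comparing tolerations as exact dictionaries is a simplification: semantically equivalent tolerations written differently would be reported as missing.

```python
import json
from typing import Any, Dict, List


def canary_report(ds_json: str, ic_json: str) -> Dict[str, Any]:
    """Summarize canary DaemonSet scheduling and toleration drift.

    ds_json: e.g. `oc get daemonset ingress-canary -n openshift-ingress-canary -o json`
    ic_json: e.g. `oc get ingresscontroller default -n openshift-ingress-operator -o json`
    """
    ds = json.loads(ds_json)
    ic = json.loads(ic_json)

    status = ds.get("status", {})
    pod_spec = ds.get("spec", {}).get("template", {}).get("spec", {})
    ds_tols: List[dict] = pod_spec.get("tolerations", [])
    ic_tols: List[dict] = ic.get("spec", {}).get("nodePlacement", {}).get("tolerations", [])

    # IngressController tolerations that do not appear verbatim in the pod template.
    missing = [t for t in ic_tols if t not in ds_tols]
    return {
        "desired": status.get("desiredNumberScheduled", 0),
        "current": status.get("currentNumberScheduled", 0),
        "missing_tolerations": missing,
    }


# Illustrative input shaped like the symptoms described in this report.
ds_sample = json.dumps({
    "spec": {"template": {"spec": {"tolerations": [
        {"key": "node-role.kubernetes.io/infra", "operator": "Exists"}]}}},
    "status": {"desiredNumberScheduled": 0, "currentNumberScheduled": 0},
})
ic_sample = json.dumps({"spec": {"nodePlacement": {"tolerations": [{"operator": "Exists"}]}}})

report = canary_report(ds_sample, ic_sample)
print(report)
# {'desired': 0, 'current': 0, 'missing_tolerations': [{'operator': 'Exists'}]}
```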

      Actual results:

      After upgrading to OpenShift 4.20.12, the ingress-canary DaemonSet no longer schedules any pods, and its desired/current count is 0. The ingress cluster operator reports Degraded=True with CanaryChecksSucceeding=False. The custom tolerations defined in the IngressController are ignored, and only a fixed toleration (key=node-role.kubernetes.io/infra) is present in the DaemonSet spec. This causes the canary route health checks to fail, even though the cluster otherwise functions correctly.

      Expected results:

      The ingress-canary DaemonSet should inherit and apply the tolerations defined in the IngressController's spec.nodePlacement.tolerations. In a compact cluster with properly tainted infra/control-plane nodes, the DaemonSet should schedule pods accordingly. As in previous OpenShift versions (4.17–4.19), this would ensure the ingress operator remains healthy and canary checks succeed.
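The expected behavior amounts to merging the IngressController's tolerations into the canary DaemonSet's pod template alongside the fixed infra toleration. The sketch below is hypothetical and is not the ingress operator's code; `FIXED_INFRA_TOLERATION` assumes operator `Exists`, since the report only names its key `node-role.kubernetes.io/infra`.

```python
from typing import Dict, List

# Hypothetical baseline toleration; the report only states its key.
FIXED_INFRA_TOLERATION: Dict[str, str] = {
    "key": "node-role.kubernetes.io/infra",
    "operator": "Exists",
}


def expected_canary_tolerations(ic_tolerations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Baseline toleration plus every IngressController toleration, without duplicates."""
    merged = [dict(FIXED_INFRA_TOLERATION)]
    for tol in ic_tolerations:
        if tol not in merged:  # skip exact duplicates only
            merged.append(dict(tol))
    return merged


print(expected_canary_tolerations([{"operator": "Exists"}]))
# [{'key': 'node-role.kubernetes.io/infra', 'operator': 'Exists'}, {'operator': 'Exists'}]
```

With the user's `{ operator: Exists }` toleration included, the canary pods would tolerate the master and infra taints, so the DaemonSet could schedule on the compact cluster's nodes.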

      Additional info:
      https://access.redhat.com/solutions/7109585

              Ryan Fredette (rfredett@redhat.com)
              Ronald Avellaneda (rhn-support-ravellan)
              Anurag Saxena
              Votes: 0
              Watchers: 7