OpenShift Bugs / OCPBUGS-434

After FIPS enabled in S390X, ingress controller in degraded state


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Affects Version: 4.11.0
    • Component: Networking / router
    • Sprint: Sprint 227, Sprint 228, Sprint 229, Sprint 230
    • Release Note Text:
      Cause: The Ingress Operator reported "Degraded=True" in the status conditions for an IngressController when it detected a misscheduled router pod, even if enough properly scheduled router pods were available.

      Consequence: The ingress clusteroperator reported "Degraded=True" status even when the default IngressController was not actually degraded.

      Fix: The Ingress Operator was changed not to report "Degraded=True" in an IngressController's status conditions as long as that IngressController has enough available router pods. As part of this change, the "PodsScheduled" status condition on IngressControllers was removed; the information that was previously reported using this status condition is now reported instead in the "DeploymentReplicasMinAvailable" status condition on the IngressController when the minimum number of router pods is not available.

      Result: The ingress clusteroperator no longer reports "Degraded=True" status when misscheduled router pods are detected, provided enough router pods are available. When the minimum number of router pods is not available for some IngressController and the cause is that some router pods were misscheduled, this information is now reported in the "DeploymentReplicasMinAvailable" status condition message on that IngressController.
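      On a cluster that has this fix, one hedged way to check the condition that replaced "PodsScheduled" is to query the IngressController status directly; the resource name "default" and the openshift-ingress-operator namespace used below are the stock defaults, and the command only reads state:

      oc get ingresscontroller default -n openshift-ingress-operator \
        -o jsonpath='{.status.conditions[?(@.type=="DeploymentReplicasMinAvailable")]}'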
    • Release Note Type: Bug Fix

      Description of problem:

      After enabling FIPS on s390x, the ingress controller repeatedly goes into a degraded state. The ingress controller pod does reach the Running state after a few failures, but failed pods keep being recreated and the operator status remains Degraded.
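      A hedged way to observe the symptom with standard oc commands (the namespace below is the stock openshift-ingress namespace; both commands only read state):

      # watch the ingress clusteroperator status
      oc get co ingress -w

      # watch router pods being recreated
      oc get pods -n openshift-ingress -w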

      Version-Release number of selected component (if applicable):

      OCP Version: 4.11.0-rc.2

      How reproducible:

      Enable FIPS ("fips: true") in the install-config file.

      Steps to Reproduce:
      1. Set "fips: true" in the install-config file before installation (a minimal sketch follows these steps).
      2. Install the cluster.
      3. Run "oc get co".
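      A minimal install-config.yaml sketch for step 1, assuming the standard installer "fips" field; every value other than "fips: true" is a placeholder, and the remaining required fields (compute, controlPlane, networking, platform, pullSecret, and so on) are omitted:

      apiVersion: v1
      baseDomain: example.com       # placeholder
      metadata:
        name: example-cluster       # placeholder
      fips: true                    # enables FIPS mode for the cluster
      # ... other required install-config fields go here ...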

      Actual results:

      oc get co

      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.11.0-rc.2   True        False         False      7h29m
      baremetal                                  4.11.0-rc.2   True        False         False      4d12h
      cloud-controller-manager                   4.11.0-rc.2   True        False         False      4d12h
      cloud-credential                           4.11.0-rc.2   True        False         False      4d12h
      cluster-autoscaler                         4.11.0-rc.2   True        False         False      4d12h
      config-operator                            4.11.0-rc.2   True        False         False      4d12h
      console                                    4.11.0-rc.2   True        False         False      4d11h
      csi-snapshot-controller                    4.11.0-rc.2   True        False         False      4d12h
      dns                                        4.11.0-rc.2   True        False         False      4d12h
      etcd                                       4.11.0-rc.2   True        False         False      4d11h
      image-registry                             4.11.0-rc.2   True        False         False      4d11h
      ingress                                    4.11.0-rc.2   True        False         True       4d11h   The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-84689cdc5f-r87hs" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-r87hs": pod router-default-84689cdc5f-r87hs is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-8z2fh" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-8z2fh": pod router-default-84689cdc5f-8z2fh is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-s7z96" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-s7z96": pod router-default-84689cdc5f-s7z96 is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-hslhn" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-hslhn": pod router-default-84689cdc5f-hslhn is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-nf9vt" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-nf9vt": pod router-default-84689cdc5f-nf9vt is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-mslzf" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-mslzf": pod router-default-84689cdc5f-mslzf is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-mc8th" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-mc8th": pod router-default-84689cdc5f-mc8th is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe")
      insights                                   4.11.0-rc.2   True        False         False      4d12h
      kube-apiserver                             4.11.0-rc.2   True        False         False      4d11h
      kube-controller-manager                    4.11.0-rc.2   True        False         False      4d12h
      kube-scheduler                             4.11.0-rc.2   True        False         False      4d12h
      kube-storage-version-migrator              4.11.0-rc.2   True        False         False      4d11h
      machine-api                                4.11.0-rc.2   True        False         False      4d12h
      machine-approver                           4.11.0-rc.2   True        False         False      4d12h
      machine-config                             4.11.0-rc.2   True        False         False      4d12h
      marketplace                                4.11.0-rc.2   True        False         False      4d12h
      monitoring                                 4.11.0-rc.2   True        False         False      4d11h
      network                                    4.11.0-rc.2   True        False         False      4d12h
      node-tuning                                4.11.0-rc.2   True        False         False      4d11h
      openshift-apiserver                        4.11.0-rc.2   True        False         False      4d11h
      openshift-controller-manager               4.11.0-rc.2   True        False         False      4d12h
      openshift-samples                          4.11.0-rc.2   True        False         False      4d11h
      operator-lifecycle-manager                 4.11.0-rc.2   True        False         False      4d12h
      operator-lifecycle-manager-catalog         4.11.0-rc.2   True        False         False      4d12h
      operator-lifecycle-manager-packageserver   4.11.0-rc.2   True        False         False      4d11h
      service-ca                                 4.11.0-rc.2   True        False         False      4d12h
      storage                                    4.11.0-rc.2   True        False         False      4d12h

      Expected results:

      oc get co

      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.11.0-rc.2   True        False         False      9d
      baremetal                                  4.11.0-rc.2   True        False         False      13d
      cloud-controller-manager                   4.11.0-rc.2   True        False         False      13d
      cloud-credential                           4.11.0-rc.2   True        False         False      13d
      cluster-autoscaler                         4.11.0-rc.2   True        False         False      13d
      config-operator                            4.11.0-rc.2   True        False         False      13d
      console                                    4.11.0-rc.2   True        False         False      13d
      csi-snapshot-controller                    4.11.0-rc.2   True        False         False      13d
      dns                                        4.11.0-rc.2   True        False         False      13d
      etcd                                       4.11.0-rc.2   True        False         False      13d
      image-registry                             4.11.0-rc.2   True        False         False      13d
      ingress                                    4.11.0-rc.2   True        False         False      13d
      insights                                   4.11.0-rc.2   True        False         False      13d
      kube-apiserver                             4.11.0-rc.2   True        False         False      13d
      kube-controller-manager                    4.11.0-rc.2   True        False         False      13d
      kube-scheduler                             4.11.0-rc.2   True        False         False      13d
      kube-storage-version-migrator              4.11.0-rc.2   True        False         False      13d
      machine-api                                4.11.0-rc.2   True        False         False      13d
      machine-approver                           4.11.0-rc.2   True        False         False      13d
      machine-config                             4.11.0-rc.2   True        False         False      13d
      marketplace                                4.11.0-rc.2   True        False         False      13d
      monitoring                                 4.11.0-rc.2   True        False         False      13d
      network                                    4.11.0-rc.2   True        False         False      13d
      node-tuning                                4.11.0-rc.2   True        False         False      13d
      openshift-apiserver                        4.11.0-rc.2   True        False         False      13d
      openshift-controller-manager               4.11.0-rc.2   True        False         False      13d
      openshift-samples                          4.11.0-rc.2   True        False         False      13d
      operator-lifecycle-manager                 4.11.0-rc.2   True        False         False      13d
      operator-lifecycle-manager-catalog         4.11.0-rc.2   True        False         False      13d
      operator-lifecycle-manager-packageserver   4.11.0-rc.2   True        False         False      13d
      service-ca                                 4.11.0-rc.2   True        False         False      13d
      storage                                    4.11.0-rc.2   True        False         False      13d

      Additional info:

      The logs of the running ingress controller are attached.

      Failed ingress controller pods are repeatedly being created in the openshift-ingress namespace.

      It looks like two ingress controller pods are in the Running state, but the other, failed pods were not cleaned up; manually deleting the failed pods fixed the issue (a cleanup sketch follows the pod counts below).

       

      oc get pods -n openshift-ingress | wc -l
      451

      oc get pods -n openshift-ingress | grep Running
      router-default-84689cdc5f-9j44t   1/1     Running     4 (4d12h ago)   4d12h
      router-default-84689cdc5f-qn4gh   1/1     Running     3 (4d12h ago)   4d12h

      oc get pods -n openshift-ingress | grep -v Running | wc -l
      449
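      A hedged cleanup along the lines of what fixed the issue above, mirroring the "grep -v Running" filter (review the pod list first, since this deletes every pod in the namespace whose status is not Running):

      oc get pods -n openshift-ingress --no-headers | grep -v Running | awk '{print $1}' | xargs -r oc delete pod -n openshift-ingress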

              Miciah Masters (mmasters1@redhat.com)
              Madhu Pillai (mapillai)
              Melvin Joseph