OpenShift Bugs / OCPBUGS-434

After FIPS enabled in S390X, ingress controller in degraded state


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Affects Version: 4.11.0
    • Component: Networking / router
    • Sprint: Sprint 227, Sprint 228, Sprint 229, Sprint 230
    • Release Note Text:
      Cause: The Ingress Operator reported "Degraded=True" in the status conditions for an IngressController when it detected a misscheduled router pod, even if enough properly scheduled router pods were available.

      Consequence: The ingress clusteroperator reported "Degraded=True" status even when the default IngressController was not actually degraded.

      Fix: The Ingress Operator was changed not to report "Degraded=True" in an IngressController's status conditions as long as that IngressController has enough available router pods. As part of this change, the "PodsScheduled" status condition on IngressControllers was removed; the information that was previously reported using this status condition is now reported instead in the "DeploymentReplicasMinAvailable" status condition on the IngressController when the minimum number of router pods is not available.

      Result: The ingress clusteroperator no longer reports "Degraded=True" status when misscheduled router pods are detected, provided enough router pods are available. When the minimum number of router pods is not available for some IngressController and the cause is that some router pods were misscheduled, this information is now reported in the "DeploymentReplicasMinAvailable" status condition message on that IngressController.
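      On a cluster that has this fix, one hedged way to check the condition that replaced "PodsScheduled" is to query the IngressController status directly; the resource name "default" and the openshift-ingress-operator namespace used below are the stock defaults, and the command only reads state:

      oc get ingresscontroller default -n openshift-ingress-operator \
        -o jsonpath='{.status.conditions[?(@.type=="DeploymentReplicasMinAvailable")]}'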
    • Release Note Type: Bug Fix

      Description of problem:

      After enabling FIPS on s390x, the ingress controller repeatedly goes into a degraded state. The ingress controller pod does reach the Running state after a few failures, but failed pods keep being recreated and the operator status remains Degraded.
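      A hedged way to observe the symptom with standard oc commands (the namespace below is the stock openshift-ingress namespace; both commands only read state):

      # watch the ingress clusteroperator status
      oc get co ingress -w

      # watch router pods being recreated
      oc get pods -n openshift-ingress -w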

      Version-Release number of selected component (if applicable):

      OCP Version: 4.11.0-rc.2

      How reproducible:

      Enable FIPS ("fips: true") in the install-config file.

      Steps to Reproduce:
      1. Set "fips: true" in the install-config file before installation (a minimal sketch follows these steps).
      2. Install the cluster.
      3. Run "oc get co".
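      A minimal install-config.yaml sketch for step 1, assuming the standard installer "fips" field; every value other than "fips: true" is a placeholder, and the remaining required fields (compute, controlPlane, networking, platform, pullSecret, and so on) are omitted:

      apiVersion: v1
      baseDomain: example.com       # placeholder
      metadata:
        name: example-cluster       # placeholder
      fips: true                    # enables FIPS mode for the cluster
      # ... other required install-config fields go here ...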

      Actual results:

      oc get co

      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.11.0-rc.2   True        False         False      7h29m
      baremetal                                  4.11.0-rc.2   True        False         False      4d12h
      cloud-controller-manager                   4.11.0-rc.2   True        False         False      4d12h
      cloud-credential                           4.11.0-rc.2   True        False         False      4d12h
      cluster-autoscaler                         4.11.0-rc.2   True        False         False      4d12h
      config-operator                            4.11.0-rc.2   True        False         False      4d12h
      console                                    4.11.0-rc.2   True        False         False      4d11h
      csi-snapshot-controller                    4.11.0-rc.2   True        False         False      4d12h
      dns                                        4.11.0-rc.2   True        False         False      4d12h
      etcd                                       4.11.0-rc.2   True        False         False      4d11h
      image-registry                             4.11.0-rc.2   True        False         False      4d11h
      ingress                                    4.11.0-rc.2   True        False         True       4d11h   The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-84689cdc5f-r87hs" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-r87hs": pod router-default-84689cdc5f-r87hs is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-8z2fh" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-8z2fh": pod router-default-84689cdc5f-8z2fh is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-s7z96" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-s7z96": pod router-default-84689cdc5f-s7z96 is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-hslhn" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-hslhn": pod router-default-84689cdc5f-hslhn is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-nf9vt" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-nf9vt": pod router-default-84689cdc5f-nf9vt is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-mslzf" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-mslzf": pod router-default-84689cdc5f-mslzf is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe" Pod "router-default-84689cdc5f-mc8th" is not yet scheduled: SchedulerError: binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding "router-default-84689cdc5f-mc8th": pod router-default-84689cdc5f-mc8th is already assigned to node "worker-0.ocp-m1317001.lnxero1.boe")
      insights                                   4.11.0-rc.2   True        False         False      4d12h
      kube-apiserver                             4.11.0-rc.2   True        False         False      4d11h
      kube-controller-manager                    4.11.0-rc.2   True        False         False      4d12h
      kube-scheduler                             4.11.0-rc.2   True        False         False      4d12h
      kube-storage-version-migrator              4.11.0-rc.2   True        False         False      4d11h
      machine-api                                4.11.0-rc.2   True        False         False      4d12h
      machine-approver                           4.11.0-rc.2   True        False         False      4d12h
      machine-config                             4.11.0-rc.2   True        False         False      4d12h
      marketplace                                4.11.0-rc.2   True        False         False      4d12h
      monitoring                                 4.11.0-rc.2   True        False         False      4d11h
      network                                    4.11.0-rc.2   True        False         False      4d12h
      node-tuning                                4.11.0-rc.2   True        False         False      4d11h
      openshift-apiserver                        4.11.0-rc.2   True        False         False      4d11h
      openshift-controller-manager               4.11.0-rc.2   True        False         False      4d12h
      openshift-samples                          4.11.0-rc.2   True        False         False      4d11h
      operator-lifecycle-manager                 4.11.0-rc.2   True        False         False      4d12h
      operator-lifecycle-manager-catalog         4.11.0-rc.2   True        False         False      4d12h
      operator-lifecycle-manager-packageserver   4.11.0-rc.2   True        False         False      4d11h
      service-ca                                 4.11.0-rc.2   True        False         False      4d12h
      storage                                    4.11.0-rc.2   True        False         False      4d12h

      Expected results:

      oc get co

      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.11.0-rc.2   True        False         False      9d
      baremetal                                  4.11.0-rc.2   True        False         False      13d
      cloud-controller-manager                   4.11.0-rc.2   True        False         False      13d
      cloud-credential                           4.11.0-rc.2   True        False         False      13d
      cluster-autoscaler                         4.11.0-rc.2   True        False         False      13d
      config-operator                            4.11.0-rc.2   True        False         False      13d
      console                                    4.11.0-rc.2   True        False         False      13d
      csi-snapshot-controller                    4.11.0-rc.2   True        False         False      13d
      dns                                        4.11.0-rc.2   True        False         False      13d
      etcd                                       4.11.0-rc.2   True        False         False      13d
      image-registry                             4.11.0-rc.2   True        False         False      13d
      ingress                                    4.11.0-rc.2   True        False         False      13d
      insights                                   4.11.0-rc.2   True        False         False      13d
      kube-apiserver                             4.11.0-rc.2   True        False         False      13d
      kube-controller-manager                    4.11.0-rc.2   True        False         False      13d
      kube-scheduler                             4.11.0-rc.2   True        False         False      13d
      kube-storage-version-migrator              4.11.0-rc.2   True        False         False      13d
      machine-api                                4.11.0-rc.2   True        False         False      13d
      machine-approver                           4.11.0-rc.2   True        False         False      13d
      machine-config                             4.11.0-rc.2   True        False         False      13d
      marketplace                                4.11.0-rc.2   True        False         False      13d
      monitoring                                 4.11.0-rc.2   True        False         False      13d
      network                                    4.11.0-rc.2   True        False         False      13d
      node-tuning                                4.11.0-rc.2   True        False         False      13d
      openshift-apiserver                        4.11.0-rc.2   True        False         False      13d
      openshift-controller-manager               4.11.0-rc.2   True        False         False      13d
      openshift-samples                          4.11.0-rc.2   True        False         False      13d
      operator-lifecycle-manager                 4.11.0-rc.2   True        False         False      13d
      operator-lifecycle-manager-catalog         4.11.0-rc.2   True        False         False      13d
      operator-lifecycle-manager-packageserver   4.11.0-rc.2   True        False         False      13d
      service-ca                                 4.11.0-rc.2   True        False         False      13d
      storage                                    4.11.0-rc.2   True        False         False      13d

      Additional info:

      The logs of the running ingress controller are attached.

      Failed ingress controller pods are repeatedly being created in the openshift-ingress namespace.

      It looks like two ingress controller pods are in the Running state, but the other, failed pods were not cleaned up; manually deleting the failed pods fixed the issue (a cleanup sketch follows the pod counts below).

       

      oc get pods -n openshift-ingress | wc -l
      451

      oc get pods -n openshift-ingress | grep Running
      router-default-84689cdc5f-9j44t   1/1     Running     4 (4d12h ago)   4d12h
      router-default-84689cdc5f-qn4gh   1/1     Running     3 (4d12h ago)   4d12h

      oc get pods -n openshift-ingress | grep -v Running | wc -l
      449
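      A hedged cleanup along the lines of what fixed the issue above, mirroring the "grep -v Running" filter (review the pod list first, since this deletes every pod in the namespace whose status is not Running):

      oc get pods -n openshift-ingress --no-headers | grep -v Running | awk '{print $1}' | xargs -r oc delete pod -n openshift-ingress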

              Miciah Masters (mmasters1@redhat.com)
              Madhu Pillai (mapillai)
              Melvin Joseph