OpenShift Bugs · OCPBUGS-38663

clusteroperator/kube-scheduler blips Degraded=True during upgrade test

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: 4.18
    • Component: kube-scheduler
    • Category: Quality / Stability / Reliability

      Description of problem:

          As part of an effort to ensure that no HA component goes Degraded=True by design during normal e2e or upgrade runs, we are collecting all operators that blip Degraded=True during any payload job run.
      
      This card tracks the kube-scheduler operator, which blips Degraded=True during upgrade runs.
      
      
      Example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade/1843275894844559360
        
      Reason associated with the blip: NodeController_MasterNodesReady
      
      For now we have added an exception to the test, but teams are expected to fix the underlying issues and remove the exceptions once the fixes go in.
      
      See the linked issue for more explanation of the effort.
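The exception mechanism described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual origin test code (which is written in Go); the table, key shape, and helper name are all made up for this example. The idea is that a transition is acceptable only when its operator/condition/reason tuple matches a registered exception, and unmatched transitions fail the test-case.

```python
# Hypothetical sketch of the exception matching described above.
# All names here are illustrative, not from the real monitor test.
KNOWN_EXCEPTIONS = {
    # (operator, condition, reason) -> tracking bug for the excepted blip
    ("kube-scheduler", "Degraded", "NodeController_MasterNodesReady"):
        "https://issues.redhat.com/browse/OCPBUGS-38663",
}

def classify_transition(operator, condition, reason):
    """Return the exception URL if this blip is excepted, else None.

    A None result means the transition is unexpected and fails the
    test-case, as happens with the composite
    GuardController_SyncError::NodeController_MasterNodesReady reason
    shown in the Additional info below.
    """
    return KNOWN_EXCEPTIONS.get((operator, condition, reason))
```

Under this model, removing the exception entry after the fix merges is what turns any recurrence back into a hard test failure.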

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

      Found a new reason, job example:

      : [Monitor:legacy-cvo-invariants][bz-kube-scheduler] clusteroperator/kube-scheduler should not change condition/Degraded (2h2m5s) {  2 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:
      
      Nov 20 07:12:46.091 E clusteroperator/kube-scheduler condition/Degraded reason/GuardController_SyncError::NodeController_MasterNodesReady status/True GuardControllerDegraded: Unable to apply pod openshift-kube-scheduler-guard-ip-10-0-121-171.us-east-2.compute.internal changes: Operation cannot be fulfilled on pods "openshift-kube-scheduler-guard-ip-10-0-121-171.us-east-2.compute.internal": the object has been modified; please apply your changes to the latest version and try again\nNodeControllerDegraded: The master nodes not ready: node "ip-10-0-121-171.us-east-2.compute.internal" not ready since 2025-11-20 07:12:24 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?)
      Nov 20 07:12:46.091 - 2s    E clusteroperator/kube-scheduler condition/Degraded reason/GuardController_SyncError::NodeController_MasterNodesReady status/True GuardControllerDegraded: Unable to apply pod openshift-kube-scheduler-guard-ip-10-0-121-171.us-east-2.compute.internal changes: Operation cannot be fulfilled on pods "openshift-kube-scheduler-guard-ip-10-0-121-171.us-east-2.compute.internal": the object has been modified; please apply your changes to the latest version and try again\nNodeControllerDegraded: The master nodes not ready: node "ip-10-0-121-171.us-east-2.compute.internal" not ready since 2025-11-20 07:12:24 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?)
      
      7 unwelcome but acceptable clusteroperator state transitions during e2e test run.  These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:
      
      Nov 20 07:06:07.804 E clusteroperator/kube-scheduler condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-120-40.us-east-2.compute.internal" not ready since 2025-11-20 07:06:05 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38663)
      Nov 20 07:06:07.804 - 24s   E clusteroperator/kube-scheduler condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-120-40.us-east-2.compute.internal" not ready since 2025-11-20 07:06:05 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38663)
      Nov 20 07:06:32.501 W clusteroperator/kube-scheduler condition/Degraded reason/AsExpected status/False NodeControllerDegraded: All master nodes are ready (exception: Degraded=False is the happy case)
      Nov 20 07:12:48.697 W clusteroperator/kube-scheduler condition/Degraded reason/AsExpected status/False NodeControllerDegraded: All master nodes are ready\nGuardControllerDegraded: Unable to apply pod openshift-kube-scheduler-guard-ip-10-0-121-171.us-east-2.compute.internal changes: Operation cannot be fulfilled on pods "openshift-kube-scheduler-guard-ip-10-0-121-171.us-east-2.compute.internal": the object has been modified; please apply your changes to the latest version and try again (exception: Degraded=False is the happy case)
      Nov 20 07:18:03.320 E clusteroperator/kube-scheduler condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-7-90.us-east-2.compute.internal" not ready since 2025-11-20 07:17:57 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38663)
      Nov 20 07:18:03.320 - 17s   E clusteroperator/kube-scheduler condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-7-90.us-east-2.compute.internal" not ready since 2025-11-20 07:17:57 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38663)
      Nov 20 07:18:20.883 W clusteroperator/kube-scheduler condition/Degraded reason/AsExpected status/False NodeControllerDegraded: All master nodes are ready (exception: Degraded=False is the happy case)
      }    
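For triage it can help to pull these interval lines apart mechanically. Below is a hedged Python sketch; the regex is derived only from the line shapes shown above and is not part of any OpenShift tooling. It extracts the timestamp, optional duration, severity, operator, condition, composite reason, and status.

```python
import re

# Pattern derived from the monitor interval lines above; everything after
# status/ is the free-form message and is ignored here. Illustrative only.
LINE_RE = re.compile(
    r"(?P<ts>\w+ \d+ [\d:.]+)\s+(?:-\s+(?P<duration>\S+)\s+)?"
    r"(?P<level>[EW]) clusteroperator/(?P<operator>\S+) "
    r"condition/(?P<condition>\S+) reason/(?P<reason>\S+) "
    r"status/(?P<status>\S+)"
)

def parse_interval(line):
    """Parse one interval line; composite reasons are split on '::'."""
    m = LINE_RE.search(line)
    if not m:
        return None
    fields = m.groupdict()
    fields["reasons"] = fields["reason"].split("::")
    return fields
```

For the first flagged line above, this yields operator kube-scheduler, condition Degraded, status True, and the two stacked reasons GuardController_SyncError and NodeController_MasterNodesReady.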

              aos-workloads-staff Workloads Team Bot Account
              kenzhang@redhat.com Ken Zhang