Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.15.0
Component/s: Performance Addon Operator
Labels:

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
No
Latest Status Summary:
2024-03-11: The must-gather logs did not show an issue; new logs were requested if possible. A similar issue is linked to this bug. Will follow up with a live deploy session on the environment if no evidence comes up.

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
CNF Compute Sprint 250, CNF Compute Sprint 251
sprint_count:
2

Internal Whiteboard:
RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Applying a performanceprofile fails: after the complete reboot of the cluster nodes that downgrades the cgroup version from v2 to v1, the first node is stuck on SchedulingDisabled.
Two types of failure were observed:
  - first, and most common: the relevant mcp is stuck paused and the node is stuck on Ready,SchedulingDisabled. This happens on the first reboot, the one that has to downgrade the cgroup version from v2 to v1.
  - second: the node rebooted (the additional reboot to apply changes), the changes were applied, and the node is stuck on NotReady,SchedulingDisabled in the Updating state forever.

Version-Release number of selected component (if applicable):

    4.15.0 (GA)

How reproducible:

    always

Steps to Reproduce:

    1. deploy disconnected cluster
    2. apply or create performanceprofile config

    3.1. wait for the relevant mcp node to change state to Ready,SchedulingDisabled

    3.2. wait for the complete cluster reboot
    4. wait for another reboot, of the relevant mcp only, that applies the pp config on the nodes

Actual results:

    first case: relevant mcp node is stuck on Ready,SchedulingDisabled with the mcp paused; no reboot.
    second case: node is stuck on NotReady,SchedulingDisabled in the Updating state
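
The two cases above can be told apart from the node status string and the mcp paused flag. The following is a minimal Python sketch, assuming the status is the comma-separated string as it is written in this report (for example Ready,SchedulingDisabled). The function name classify_failure and its return labels are hypothetical and not part of any OpenShift tooling. A single snapshot cannot show that a node is stuck, only which case its current state resembles.

```python
def classify_failure(node_status: str, mcp_paused: bool) -> str:
    """Map one observed node status to a failure case from this report.

    node_status: comma-separated status, e.g. "Ready,SchedulingDisabled".
    mcp_paused:  whether the relevant mcp is currently paused.
    Returns "first case", "second case", or "not matching".
    (Hypothetical helper; the labels are illustrative only.)
    """
    conditions = {part.strip() for part in node_status.split(",") if part.strip()}

    # Both cases in the report have scheduling disabled on the node.
    if "SchedulingDisabled" not in conditions:
        return "not matching"
    # Second case: node came back from the reboot but never became Ready.
    if "NotReady" in conditions:
        return "second case"
    # First case: node is Ready, cordoned, and the mcp is paused.
    if "Ready" in conditions and mcp_paused:
        return "first case"
    return "not matching"
```

For example, classify_failure("Ready,SchedulingDisabled", True) returns "first case".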

Expected results:

    relevant mcp nodes reboot, the pp config is applied to the nodes, and cgroup is downgraded to v1

Additional info:

    For the first case: manually un-pausing the mcp has no effect; the node stays stuck in the Ready,SchedulingDisabled state and the mcp stays in the Updating state forever. None of the nodes belonging to this mcp changed cgroup to v1.
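
    Whether a node actually switched to cgroup v1 can be checked on the node itself: on a cgroup v2 (unified) hierarchy, the file cgroup.controllers exists at the root of the cgroup mount, normally /sys/fs/cgroup. The following is a minimal Python sketch, assuming it runs on the node or against a copy of its cgroup root; the function name cgroup_version is hypothetical.

```python
from pathlib import Path


def cgroup_version(cgroup_root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if cgroup_root looks like a cgroup v2 unified hierarchy, else 1.

    The check relies on cgroup.controllers, which is present at the root of
    a v2 hierarchy. A missing or non-cgroup directory is also reported as 1,
    so the caller should make sure cgroup_root is the real cgroup mount.
    (Hypothetical helper, not part of any OpenShift tooling.)
    """
    return 2 if (Path(cgroup_root) / "cgroup.controllers").is_file() else 1
```

    A node that still reports 2 after the expected reboot has not been downgraded.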


Logs: must-gather logs for both cases can be found at https://file.emea.redhat.com/~elgerman/OCPBUGS-30064/
(each log was collected on a clean/new ocp deployment)

Assignee: Yanir Quinn

Reporter: Elena German

Need Info From: None

Contributors: None

QA Contact: Gowrishankar Rajaiyan

Doc Contact: None

Votes: 0

Watchers: 6

Created: 2024/02/28 10:37 PM

Updated: 2025/07/23 11:46 AM

Resolved: 2024/03/17 8:25 PM
