-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
4.15.0
-
No
-
CNF Compute Sprint 250, CNF Compute Sprint 251
-
2
-
False
-
-
-
2024-03-11: Must-gather logs did not show an issue; asked for new logs if possible. A similar issue is linked to this bug. Will follow up with a live deploy session on the environment if no evidence comes up.
-
Description of problem:
Applying a performanceprofile fails. After the complete reboot of the cluster nodes that downgrades the cgroup version from v2 to v1, the first node is stuck on SchedulingDisabled. Two types of failure were observed:
- First, and most common: the relevant mcp is stuck on paused and the node is stuck on Ready,SchedulingDisabled. This occurs at the first reboot, the one that had to reduce the cgroup version from v2 to v1.
- Second: the node rebooted (the additional reboot to apply changes), the changes were applied, and the node is stuck on NotReady,SchedulingDisabled in the Updating state forever.
Version-Release number of selected component (if applicable):
4.15.0 (GA)
How reproducible:
always
Steps to Reproduce:
1. Deploy a disconnected cluster.
2. Apply or create a performanceprofile config.
3.1. Wait for the relevant mcp node to change state to Ready,SchedulingDisabled.
3.2. Wait for the complete cluster reboot.
4. Wait for another reboot, only of the relevant mcp, to apply the pp config on the nodes.
Actual results:
First case: the relevant mcp node is stuck on Ready,SchedulingDisabled with a paused mcp; no reboot occurs.
Second case: the node is stuck on NotReady,SchedulingDisabled in the Updating stage.
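
To tell the two cases apart quickly, the STATUS column of `oc get nodes` can be classified with a small script. This is only a sketch: the function name, the case labels, and the sample table are hypothetical and are not taken from the attached logs. A node can also show these states briefly during a healthy rollout, so a match only indicates the state at one point in time.

```python
def classify_cordoned_nodes(oc_get_nodes_output: str) -> dict[str, str]:
    """Map node name -> failure case for nodes marked SchedulingDisabled."""
    result = {}
    for line in oc_get_nodes_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[1]
        conditions = status.split(",")
        if "SchedulingDisabled" not in conditions:
            continue
        if "NotReady" in conditions:
            result[name] = "second case"  # NotReady,SchedulingDisabled
        elif "Ready" in conditions:
            result[name] = "first case"   # Ready,SchedulingDisabled
    return result


# Hypothetical sample in the layout printed by `oc get nodes`.
SAMPLE_NODES = """\
NAME       STATUS                        ROLES    AGE   VERSION
master-0   Ready                         master   1d    v1.x
worker-0   Ready,SchedulingDisabled      worker   1d    v1.x
worker-1   NotReady,SchedulingDisabled   worker   1d    v1.x
"""

print(classify_cordoned_nodes(SAMPLE_NODES))
```

In practice the input would be the captured text of `oc get nodes` rather than the sample string.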
Expected results:
The relevant mcp nodes reboot, the pp config is applied to the nodes, and cgroup is downgraded to v1.
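
Whether a node actually ended up on cgroup v1 can be checked from the cgroup mount itself. The sketch below relies on the fact that a cgroup v2 (unified) hierarchy exposes a `cgroup.controllers` file at its mount root, while a v1 layout does not. The function name and the `root` parameter are hypothetical, and the script assumes a Python interpreter is available in an environment that can see the node's cgroup mount.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a cgroup v2 mount, otherwise 1."""
    # cgroup v2 places cgroup.controllers at the root of the unified hierarchy;
    # its absence is treated here as "v1" (an assumption of this sketch).
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"cgroup v{cgroup_version()}")
```

The `root` argument exists only so that the check can be pointed at a different path if the host's cgroup mount is visible somewhere other than `/sys/fs/cgroup`.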
Additional info:
For the first case: manually un-pausing the mcp has no effect; the node stays stuck in the Ready,SchedulingDisabled state and the mcp stays in the Updating state forever. None of the nodes belonging to this mcp changed cgroup to v1.
Logs: must-gather for both cases can be found at https://file.emea.redhat.com/~elgerman/OCPBUGS-30064/ (each log was collected on a clean, new OCP deployment).
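
When collecting fresh evidence, the paused and Updating state of the pool can be read from the MachineConfigPool object, for example from the output of `oc get mcp <name> -o json`. The sketch below reads `spec.paused` and the `Updating` and `Degraded` entries in `status.conditions`. The function name, the pool name `worker-cnf`, and the sample document are hypothetical and do not come from the must-gather data.

```python
import json


def mcp_summary(mcp_json: str) -> dict:
    """Summarize pause and update state from one MachineConfigPool JSON document."""
    mcp = json.loads(mcp_json)
    # Index conditions by type, e.g. {"Updating": "True", "Degraded": "False"}.
    conditions = {
        c["type"]: c["status"]
        for c in mcp.get("status", {}).get("conditions", [])
    }
    return {
        "name": mcp["metadata"]["name"],
        "paused": bool(mcp.get("spec", {}).get("paused", False)),
        "updating": conditions.get("Updating") == "True",
        "degraded": conditions.get("Degraded") == "True",
    }


# Hypothetical pool resembling the first case: paused and still Updating.
SAMPLE_MCP = json.dumps({
    "metadata": {"name": "worker-cnf"},
    "spec": {"paused": True},
    "status": {"conditions": [
        {"type": "Updated", "status": "False"},
        {"type": "Updating", "status": "True"},
        {"type": "Degraded", "status": "False"},
    ]},
})

print(mcp_summary(SAMPLE_MCP))
```

Running it before and after the manual un-pause would show whether `paused` actually flips while `updating` stays true.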