-
Bug
-
Resolution: Done-Errata
-
Undefined
-
None
-
4.17.z, 4.16.z, 4.18.z, 4.19
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
-
Yes
Description of problem:
The panic no longer occurs and the CGU applies policies in batches as expected, but the status is now reported in a misleading way. CurrentBatchRemediationProgress appears to be set to nil when the CGU completes, and the conditions show Completed instead of TimedOut even when the first batch timed out. The message also states that all clusters are compliant with all policies, even when that is not true.
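The reporting rule this bug expects can be sketched as follows. This is only an illustration under stated assumptions, not TALM's actual code: the clusterOutcome type, the succeededReason function, and the timeout message text are hypothetical names and wording. Only the reasons Completed and TimedOut and the "all clusters are compliant" message come from this report.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterOutcome is a hypothetical, simplified record of what happened to
// one cluster once the CGU has worked through every batch.
type clusterOutcome struct {
	Name      string
	Compliant bool // compliant with all managed policies
	TimedOut  bool // the cluster's batch hit the remediation timeout
}

// succeededReason derives the reason and message for the Succeeded
// condition. A timeout in any batch takes precedence, so a later batch
// succeeding cannot turn the overall result into Completed.
func succeededReason(outcomes []clusterOutcome) (reason, message string) {
	var timedOut []string
	for _, o := range outcomes {
		if o.TimedOut {
			timedOut = append(timedOut, o.Name)
		}
	}
	if len(timedOut) > 0 {
		// Illustrative message text, not the operator's real wording.
		return "TimedOut", fmt.Sprintf("remediation timed out on %d cluster(s): %s",
			len(timedOut), strings.Join(timedOut, ", "))
	}
	return "Completed", "All clusters are compliant with all the managed policies"
}

func main() {
	// Scenario from this report: first batch times out, second succeeds.
	reason, message := succeededReason([]clusterOutcome{
		{Name: "spoke1", TimedOut: true},
		{Name: "spoke2", Compliant: true},
	})
	fmt.Println(reason, "-", message)
}
```

The point of the sketch is that the final reason is computed from the outcomes of all batches rather than from the state of the last batch alone, which is the behavior the reported status seems to lack.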
Version-Release number of selected component (if applicable):
Observed in CI with TALM from brew versions v4.19.0-38, v4.18.1-14, and v4.16.4-9
How reproducible:
Always
Steps to Reproduce:
1. Create a CGU with a max concurrency of 1 and two clusters, where the first cluster is powered off.
2. Wait for the first cluster/batch to time out and the second cluster/batch to succeed.
3. Check the conditions or run oc get on the CGU and notice that it reports Completed with the message "All clusters are compliant with all the managed policies" even though .status.clusters[0].currentPolicy.status is NonCompliant.
Actual results:
Condition type Succeeded has reason Completed
Expected results:
Condition type Succeeded has reason TimedOut
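The mismatch between actual and expected results could be detected automatically from the CGU's JSON (for example, the output of oc get with -o json). The following is a minimal sketch under assumptions: the struct models only the fields named in this report (.status.conditions and .status.clusters[].currentPolicy.status), the name field on each cluster entry is assumed, and misleadingCompleted and the sample document are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cguStatus models only the parts of a CGU needed for this check.
type cguStatus struct {
	Status struct {
		Conditions []struct {
			Type   string `json:"type"`
			Reason string `json:"reason"`
		} `json:"conditions"`
		Clusters []struct {
			Name          string `json:"name"` // assumed field name
			CurrentPolicy *struct {
				Status string `json:"status"`
			} `json:"currentPolicy"`
		} `json:"clusters"`
	} `json:"status"`
}

// misleadingCompleted reports whether the Succeeded condition has reason
// Completed while at least one cluster still has a NonCompliant policy.
func misleadingCompleted(raw []byte) (bool, error) {
	var cgu cguStatus
	if err := json.Unmarshal(raw, &cgu); err != nil {
		return false, err
	}
	completed := false
	for _, c := range cgu.Status.Conditions {
		if c.Type == "Succeeded" && c.Reason == "Completed" {
			completed = true
		}
	}
	if !completed {
		return false, nil
	}
	for _, cl := range cgu.Status.Clusters {
		if cl.CurrentPolicy != nil && cl.CurrentPolicy.Status == "NonCompliant" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hand-written sample mirroring the reported state, not real output.
	sample := []byte(`{"status": {
		"conditions": [{"type": "Succeeded", "reason": "Completed"}],
		"clusters": [
			{"name": "spoke1", "currentPolicy": {"status": "NonCompliant"}},
			{"name": "spoke2"}
		]}}`)
	bad, err := misleadingCompleted(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("misleading Completed status:", bad)
}
```

A check like this would return true for the state described under actual results and false once the condition reason is TimedOut.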
Additional info:
[klaskosk@klaskosk-thinkpadp1gen3 ~]$ KUBECONFIG=~/kniqe16-kubeconfig oc get cgu -n talm-test talm-cgu
NAME       AGE   STATE       DETAILS
talm-cgu   11m   Completed   All clusters are compliant with all the managed policies
The oc get output shows how misleading the status appears. The collected logs are also available in a Google Drive folder.
- blocks
  OCPBUGS-54988 CGU says completed and all clusters compliant when first batch times out (Verified)
- is cloned by
  OCPBUGS-54988 CGU says completed and all clusters compliant when first batch times out (Verified)
- is related to
  OCPBUGS-54348 TALM Soak Annotation evaluates single "FirstCompliantAt" per CGU instead of per Policy (Closed)
  OCPBUGS-54738 CGU second batch fails when one cluster powered off regression (Closed)
- links to
  RHEA-2025:146889 OpenShift Container Platform 4.16.4 CNF vRAN extras update