-
Bug
-
Resolution: Not a Bug
-
Undefined
-
None
-
4.13.z
-
Quality / Stability / Reliability
-
False
-
-
None
-
Important
-
No
-
None
-
None
-
None
-
CNF RAN Sprint 238, CNF RAN Sprint 239, CNF RAN Sprint 240
-
3
Description of problem:
When multiple clusters are specified in a CGU and one of them is offline, policies are not remediated on the cluster that is still operational.
Version-Release number of selected component (if applicable):
TALM 4.13.1, TALM 4.12.4
How reproducible:
Always
Steps to Reproduce:
1. Configure a hub cluster with two managed clusters.
2. Create a CGU which:
   - Specifies both clusters.
   - Concurrency = 1
   - Timeout = 9
3. Power off the first cluster in the specified list.
4. Enable the CGU.
Actual results:
CGU times out on both clusters.
Expected results:
CGU times out on first cluster. Second cluster completes successfully before the CGU times out.
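The expectation above can be made concrete with a small timeline sketch. It assumes the overall timeout is split evenly across batches; the report does not state how TALM derives the per-batch timeout, so `batch_windows` is a hypothetical helper and the even split is an assumption, not TALM's confirmed behavior.

```python
def batch_windows(total_timeout_min, batches):
    """Return (batch, start_minute, end_minute) for each batch.

    ASSUMPTION: the CGU timeout is divided evenly across batches.
    This is an illustration, not TALM's actual scheduling logic.
    """
    per_batch = total_timeout_min / len(batches)
    return [
        (batch, i * per_batch, (i + 1) * per_batch)
        for i, batch in enumerate(batches)
    ]


# One cluster per batch, as implied by Concurrency = 1 and two clusters.
plan = [["worker-0"], ["worker-1"]]

for batch, start, end in batch_windows(9, plan):
    print(f"{batch}: minute {start} to minute {end}")
```

Under this assumption the offline first cluster would use up its 4.5-minute window, and the second cluster would still have 4.5 minutes to become compliant before the 9-minute CGU timeout.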
Additional info:
Hub logs, cluster config can be found here: https://drive.google.com/drive/folders/1fFIeUO9X6h-o9OTGtAQc87ptT89sAFsh?usp=sharing
This happens consistently in CI automation. Running the same automated test as a one-off outside of CI gives inconsistent results, with some passes.
--- Printing CGU spec - talm-test: generated-cgu-multi-spokes-one-unavailable :
backup: false
precaching: false
enable: true
clusters:
- worker-0
- worker-1
clusterselector: []
clusterlabelselectors: []
remediationstrategy:
  canaries: []
  maxconcurrency: 1
  timeout: 9
managedpolicies:
- generated-policy-multi-spokes-one-unavailable
blockingcrs: []
actions:
  beforeenable:
    addclusterlabels: {}
    deleteclusterlabels: {}
  aftercompletion:
    addclusterlabels:
      talmcomplete: ""
    deleteclusterlabels: {}
    deleteobjects: true
batchtimeoutaction: ""
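For reference, the way this spec maps onto batches can be sketched as follows. This is a minimal illustration, assuming clusters are simply chunked in list order by `maxconcurrency` and ignoring canaries (the list is empty here); `remediation_plan` is a hypothetical helper, not TALM code. The result matches the `remediationplan` printed in the CGU status in this report.

```python
def remediation_plan(clusters, max_concurrency):
    """Chunk clusters, in order, into batches of at most max_concurrency.

    ASSUMPTION: plain in-order chunking with no canary handling.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return [
        clusters[i:i + max_concurrency]
        for i in range(0, len(clusters), max_concurrency)
    ]


# Values taken from the CGU spec in this report.
print(remediation_plan(["worker-0", "worker-1"], 1))
```

With one cluster per batch, the offline `worker-0` forms the entire first batch, so `worker-1` cannot start until that batch is finished or abandoned.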
--- Printing CGU status - talm-test: generated-cgu-multi-spokes-one-unavailable :
placementbindings: []
placementrules: []
copiedpolicies: []
conditions:
- type: ClustersSelected
  status: "True"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:45:37-04:00"
  reason: ClusterSelectionCompleted
  message: All selected clusters are valid
- type: Validated
  status: "True"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:45:37-04:00"
  reason: ValidationCompleted
  message: Completed validation
- type: Progressing
  status: "False"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:55:37-04:00"
  reason: TimedOut
  message: Policy remediation took too long
- type: Succeeded
  status: "False"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:55:37-04:00"
  reason: TimedOut
  message: Policy remediation took too long
remediationplan:
- - worker-0
- - worker-1
managedpoliciesns:
  generated-policy-multi-spokes-one-unavailable: talm-test
saferesourcenames: {}
managedpoliciesforupgrade:
- name: generated-policy-multi-spokes-one-unavailable
  namespace: talm-test
managedpoliciescompliantbeforeupgrade: []
managedpoliciescontent: {}
clusters:
- name: worker-0
  state: timedout
  currentpolicy:
    name: generated-policy-multi-spokes-one-unavailable
    status: NonCompliant
- name: worker-1
  state: timedout
  currentpolicy:
    name: generated-policy-multi-spokes-one-unavailable
    status: NonCompliant
status:
  startedat: "2023-06-17T01:45:37-04:00"
  completedat: "2023-06-17T01:55:37-04:00"
  currentbatch: 0
  currentbatchstartedat: "0001-01-01T00:00:00Z"
  currentbatchremediationprogress: {}
precaching: null
backup: null
computedmaxconcurrency: 1
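The gap between this status and the expected result can be expressed as a small check. It is only a sketch: `check_outcome` is a hypothetical helper, and the success state name `complete` is an assumption, since the report only shows the `timedout` state.

```python
# Per-cluster states copied from the CGU status in this report.
observed = {"worker-0": "timedout", "worker-1": "timedout"}


def check_outcome(states, offline, online, done_state="complete"):
    """Return a list of mismatches against the expected result.

    Expected: the offline cluster times out, the online one finishes.
    ASSUMPTION: done_state="complete" is the name of the success state.
    """
    problems = []
    if states.get(offline) != "timedout":
        problems.append(
            f"{offline}: expected 'timedout', got {states.get(offline)!r}"
        )
    if states.get(online) != done_state:
        problems.append(
            f"{online}: expected {done_state!r}, got {states.get(online)!r}"
        )
    return problems


for problem in check_outcome(observed, "worker-0", "worker-1"):
    print(problem)
```

Run against the status above, only `worker-1` is flagged, which is the behavior this report describes.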