Type: Bug
Resolution: Not a Bug
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.13.z
Component/s: TALM Operator
Labels:
- cnf-vran:talm
- telco-priority-3

Severity:
Important
Regression:
No
Sprint:
CNF RAN Sprint 238, CNF RAN Sprint 239, CNF RAN Sprint 240
sprint_count:
3
Blocked:
False
Blocked Reason:
None
Internal Whiteboard:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When multiple clusters are specified in a CGU and one of them is offline, policies are not remediated on the remaining operational cluster.

Version-Release number of selected component (if applicable):

TALM 4.13.1, TALM 4.12.4

How reproducible:

Always

Steps to Reproduce:

1. Configure a hub cluster with two managed clusters.
2. Create a CGU which:
   - Specifies both clusters.
   - Sets maxConcurrency = 1
   - Sets timeout = 9
3. Power off the first cluster in the specified list.
4. Enable the CGU.
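
For reference, a minimal sketch of a CGU manifest matching step 2. The cluster names, policy name, CGU name, namespace, and numeric values are taken from the spec printed under "Additional info". The apiVersion, kind, and camelCase field names are assumptions based on the TALM ClusterGroupUpgrade CRD as generally documented (the dump in this report prints the same fields in lowercase), so verify them against the CRD installed on the hub.

```yaml
# Sketch only: apiVersion/kind and field casing are assumed from the TALM CRD,
# not copied from this report.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: generated-cgu-multi-spokes-one-unavailable
  namespace: talm-test
spec:
  clusters:
  - worker-0          # powered off in step 3
  - worker-1          # expected to be remediated
  enable: false       # set to true in step 4
  managedPolicies:
  - generated-policy-multi-spokes-one-unavailable
  remediationStrategy:
    maxConcurrency: 1 # one cluster per batch, so two batches
    timeout: 9        # understood to be in minutes and to cover the whole CGU
```

With maxConcurrency set to 1, the remediation plan holds one cluster per batch, which matches the two-entry remediationplan shown in the status output.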

Actual results:

CGU times out on both clusters.

Expected results:

CGU times out on the first cluster. The second cluster completes successfully before the CGU times out.

Additional info:

Hub logs, cluster config can be found here: https://drive.google.com/drive/folders/1fFIeUO9X6h-o9OTGtAQc87ptT89sAFsh?usp=sharing

This happens consistently in CI automation. Running the same automated test as a one-off outside of CI gives inconsistent results, with some passes.

--- Printing CGU spec - talm-test: generated-cgu-multi-spokes-one-unavailable :
backup: false
precaching: false
enable: true
clusters:
- worker-0
- worker-1
clusterselector: []
clusterlabelselectors: []
remediationstrategy:
  canaries: []
  maxconcurrency: 1
  timeout: 9
managedpolicies:
- generated-policy-multi-spokes-one-unavailable
blockingcrs: []
actions:
  beforeenable:
    addclusterlabels: {}
    deleteclusterlabels: {}
  aftercompletion:
    addclusterlabels:
      talmcomplete: ""
    deleteclusterlabels: {}
    deleteobjects: true
batchtimeoutaction: ""

--- Printing CGU status - talm-test: generated-cgu-multi-spokes-one-unavailable :
placementbindings: []
placementrules: []
copiedpolicies: []
conditions:
- type: ClustersSelected
  status: "True"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:45:37-04:00"
  reason: ClusterSelectionCompleted
  message: All selected clusters are valid
- type: Validated
  status: "True"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:45:37-04:00"
  reason: ValidationCompleted
  message: Completed validation
- type: Progressing
  status: "False"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:55:37-04:00"
  reason: TimedOut
  message: Policy remediation took too long
- type: Succeeded
  status: "False"
  observedgeneration: 0
  lasttransitiontime: "2023-06-17T01:55:37-04:00"
  reason: TimedOut
  message: Policy remediation took too long
remediationplan:
- - worker-0
- - worker-1
managedpoliciesns:
  generated-policy-multi-spokes-one-unavailable: talm-test
saferesourcenames: {}
managedpoliciesforupgrade:
- name: generated-policy-multi-spokes-one-unavailable
  namespace: talm-test
managedpoliciescompliantbeforeupgrade: []
managedpoliciescontent: {}
clusters:
- name: worker-0
  state: timedout
  currentpolicy:
    name: generated-policy-multi-spokes-one-unavailable
    status: NonCompliant
- name: worker-1
  state: timedout
  currentpolicy:
    name: generated-policy-multi-spokes-one-unavailable
    status: NonCompliant
status:
  startedat: "2023-06-17T01:45:37-04:00"
  completedat: "2023-06-17T01:55:37-04:00"
  currentbatch: 0
  currentbatchstartedat: "0001-01-01T00:00:00Z"
  currentbatchremediationprogress: {}
precaching: null
backup: null
computedmaxconcurrency: 1

Assignee: Jun Chen

Reporter: Joshua Clark

QA Contact: Joshua Clark

Need Info From: Joshua Clark

Votes: 0

Watchers: 4

Created: 2023/06/20 7:49 PM

Updated: 2023/08/15 4:52 PM

Resolved: 2023/08/15 4:52 PM
