Type: Epic
Resolution: Done
Priority: Major
Fix Version/s: ACM 2.8.0
Affects Version/s: None
Component/s: Cluster Lifecycle, QE
Labels:
- Green
- QE-Confidence:Green
- RFE
- Train-03
- doc-required
- pm-slack

Epic Name:
Add the possibility to retry posthooks when using the ClusterCurator resource
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Green
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done

We use the ClusterCurator resource with Ansible prehooks and posthooks to trigger upgrades of OpenShift clusters through ArgoCD. When the posthook fails in the curatorjob pod, the only way to retry it is to run the Ansible posthook manually in AWX.

Solution Proposal:

Per team discussion, here is the new proposal. Following the ArgoCD practice, we will add an operation field at the same level as spec. The end user can set operation.retryPosthook to retry the install or upgrade posthook one time.

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: ClusterCurator
metadata:
  name: xjcluster1
  namespace: xjcluster1
  labels:
    open-cluster-management: curator
operation:
  retryPosthook: installPosthook # or upgradePosthook
spec:
  desiredCuration: install # or upgrade
  install:
    towerAuthSecret: toweraccess
    prehook:
    - name: Demo Job Template
      extra_vars:
        sn_severity: 1
        sn_priority: 1
        appName: prehook job
        target_clusters:
          - my-cluster
    posthook:
    - name: Demo Job Template 2
      extra_vars:
        sn_severity: 2
        sn_priority: 2
        appName: posthook job
        target_clusters:
          - my-cluster
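
The proposed shape could be mirrored in the API types roughly as follows. This is a minimal sketch, not the controller's actual type definitions: the Go names `Operation`, `RetryPosthook` and `parseCurator` are assumptions made for illustration, and only the fields needed to show that `operation` sits next to `spec` are included.

```go
package main

import "encoding/json"

// Operation is a hypothetical type for the proposed top-level "operation" block.
type Operation struct {
	// RetryPosthook names the posthook to rerun once:
	// "installPosthook" or "upgradePosthook".
	RetryPosthook string `json:"retryPosthook,omitempty"`
}

// ClusterCuratorSpec lists only the spec field this sketch needs.
type ClusterCuratorSpec struct {
	DesiredCuration string `json:"desiredCuration,omitempty"`
}

// ClusterCurator is a trimmed-down stand-in for the real resource type.
// Operation is a pointer so that "no retry requested" is simply nil.
type ClusterCurator struct {
	Operation *Operation         `json:"operation,omitempty"`
	Spec      ClusterCuratorSpec `json:"spec"`
}

// parseCurator decodes a JSON-encoded ClusterCurator.
func parseCurator(data []byte) (*ClusterCurator, error) {
	var cc ClusterCurator
	if err := json.Unmarshal(data, &cc); err != nil {
		return nil, err
	}
	return &cc, nil
}
```

Keeping `operation` optional means a ClusterCurator stored in Git without the field decodes to a nil `Operation`, which is the state both ArgoCD and the controller converge on after a retry.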

Case 1: install

1. The end user creates a ClusterCurator CR and sets spec.desiredCuration = install.

2. The cluster curator controller fills in spec.curatorJob when the curation starts.

3. If any curator job fails, the controller updates the ClusterCurator status conditions, removes spec.desiredCuration (this is the current implementation), and removes the operation field as well.

4. Once the cause of the posthook failure is understood, the end user can set operation.retryPosthook in the same ClusterCurator. The controller reconciles the resource and runs only the specified posthook, once. It then updates the ClusterCurator status conditions and removes the operation field.

5. Make sure the retry runs only once: the removal of the operation field must not trigger another reconcile.
To do this, add a check to the cluster curator reconcile predicate function:
https://github.com/stolostron/cluster-curator-controller/blob/4eaa3d1d9db2f908b5507a86ccb4b2d5a811bb08/controllers/clustercurator_controller.go#[…]1

// Skip update events in which the operation field was removed.
if newClusterCurator.Operation != oldClusterCurator.Operation && newClusterCurator.Operation == nil {
    return false
}

6. If the posthook fails again, go back to step 4.
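
Steps 4 to 6 can be sketched as a single helper that runs the requested posthook at most once and always clears the operation afterwards. This is an illustrative sketch only: `retryPosthookOnce`, the flattened `ClusterCurator` struct, the string-based `Conditions` slice and the `run` callback are assumptions, not the controller's real API.

```go
package main

import "fmt"

// Operation and ClusterCurator are simplified, hypothetical stand-ins.
type Operation struct{ RetryPosthook string }

type ClusterCurator struct {
	Operation  *Operation
	Conditions []string // simplified status conditions
}

// retryPosthookOnce runs the posthook named in operation.retryPosthook a
// single time, records the outcome, and removes the operation field so the
// retry is not repeated on later reconciles.
func retryPosthookOnce(cc *ClusterCurator, run func(hook string) error) error {
	if cc.Operation == nil || cc.Operation.RetryPosthook == "" {
		return nil // no retry requested
	}
	hook := cc.Operation.RetryPosthook
	// The operation is cleared on every path below: success, failure,
	// and unsupported value.
	cc.Operation = nil
	if hook != "installPosthook" && hook != "upgradePosthook" {
		return fmt.Errorf("unsupported retryPosthook value %q", hook)
	}
	if err := run(hook); err != nil {
		cc.Conditions = append(cc.Conditions, hook+" retry failed: "+err.Error())
		return err // step 6: the user may set operation again
	}
	cc.Conditions = append(cc.Conditions, hook+" retry succeeded")
	return nil
}
```

Because the field is cleared even when the posthook fails, a further retry always requires the user to set operation.retryPosthook again, which matches the loop back to step 4.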

Case 2: upgrade
This is essentially the same as the install case, except that spec.desiredCuration = "upgrade" in step 1, and in step 3 spec.desiredCuration is retained.
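
The difference between the two cases in step 3 can be illustrated with a small failure handler. This is a sketch under assumptions: `onCurationFailure` and the flattened struct fields are hypothetical names, and status conditions are reduced to plain strings.

```go
package main

// Operation and ClusterCurator are simplified, hypothetical stand-ins.
type Operation struct{ RetryPosthook string }

type ClusterCurator struct {
	Operation       *Operation
	DesiredCuration string   // stands in for spec.desiredCuration
	Conditions      []string // simplified status conditions
}

// onCurationFailure applies step 3: record the failure, drop
// desiredCuration for an install (but keep it for an upgrade), and
// remove the operation field in both cases.
func onCurationFailure(cc *ClusterCurator, reason string) {
	cc.Conditions = append(cc.Conditions, cc.DesiredCuration+" failed: "+reason)
	if cc.DesiredCuration == "install" {
		cc.DesiredCuration = ""
	}
	cc.Operation = nil
}
```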

Note: The ClusterCurator CR may be managed by ArgoCD with auto sync enabled. After the user manually adds the retry operation, the ArgoCD application controller may remove it, because the ClusterCurator in the Git repo is the source of truth. Our curator controller also removes it at the end of the retry. The operation cleanup can therefore happen twice, and we need to make sure that the second cleanup triggers no additional action.
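
One way to keep a double cleanup harmless is to make the cleanup idempotent and to ignore update events that only reflect the removal. The sketch below is an assumption-laden illustration: `clearOperation` and `shouldReconcile` are hypothetical helpers, and `shouldReconcile` is a plain function rather than a real controller-runtime predicate.

```go
package main

// Operation and ClusterCurator are simplified, hypothetical stand-ins.
type Operation struct{ RetryPosthook string }

type ClusterCurator struct{ Operation *Operation }

// clearOperation removes the operation field and reports whether anything
// changed. A second call (for example after ArgoCD already removed the
// field) is a no-op and returns false, so no extra update is needed.
func clearOperation(cc *ClusterCurator) bool {
	if cc.Operation == nil {
		return false
	}
	cc.Operation = nil
	return true
}

// shouldReconcile mirrors the predicate check from step 5: an update in
// which the operation field went from set to unset does not trigger a
// reconcile, whoever performed the removal.
func shouldReconcile(oldCC, newCC *ClusterCurator) bool {
	if oldCC.Operation != nil && newCC.Operation == nil {
		return false
	}
	return true
}
```

With this arrangement it does not matter whether ArgoCD or the curator controller removes the field first: the later cleanup finds nothing to change, and the removal event itself is filtered out.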

ACM Epic Done Checklist

See presentation and details.

Update with "Y" if Epic meets the requirement, "N" if it does not, or "N/A" if not applicable.

N/A FIPS Readiness
Y Works in Disconnected
N/A Global Proxy Support
N/A Installable to Infrastructure Nodes
Y No impacts to Performance and Scalability
N/A Backup and Restorable

Assignee: Feng Xiang

Reporter: Younes Ajbar

QA Contact: Atif Shafi

Votes: 0

Watchers: 8

Created: 2022/11/09 8:41 PM

Updated: 2023/05/24 3:36 PM

Resolved: 2023/05/24 3:36 PM
