OpenShift Bugs / OCPBUGS-60465

RFE: [telco-hub] Implement deterministic sync-wave ordering for telco-hub reference-CRs to fix deployment race conditions


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.20
    • GitOps ZTP
    • Quality / Stability / Reliability
    • False
    • Proposed

      Description of problem:

      The sync-wave configuration of the current telco-hub reference-CRs is inconsistent, which causes deployment failures and race conditions during ArgoCD rollouts.

      Critical Problems:

      • 42 resources have no sync-wave annotation, so they fall into the default wave (0) and their order relative to their dependencies is not controlled
      • MultiClusterHub deploys before MultiClusterEngine, although it depends on it (inverted dependency)
      • Namespaced resources deploy before their namespaces exist
      • Storage consumers deploy before storage validation completes
      • Policies are scattered across unrelated sync-waves instead of running in one phase
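
The first problem can be detected mechanically. The following is a minimal sketch, assuming the reference-CR manifests have already been parsed into Python dictionaries; the helper name `find_unannotated` and the sample resources are hypothetical and not part of the telco-hub repository. It lists every resource that carries no `argocd.argoproj.io/sync-wave` annotation.

```python
SYNC_WAVE_KEY = "argocd.argoproj.io/sync-wave"


def find_unannotated(manifests):
    """Return "Kind/name" for every manifest without a sync-wave annotation."""
    missing = []
    for manifest in manifests:
        meta = manifest.get("metadata", {})
        # "annotations" may be absent or explicitly null in a manifest.
        annotations = meta.get("annotations") or {}
        if SYNC_WAVE_KEY not in annotations:
            missing.append(f"{manifest.get('kind', '?')}/{meta.get('name', '?')}")
    return missing


# Hypothetical sample input, for illustration only.
sample = [
    {
        "kind": "Namespace",
        "metadata": {
            "name": "open-cluster-management",
            "annotations": {SYNC_WAVE_KEY: "-45"},
        },
    },
    {"kind": "MultiClusterHub", "metadata": {"name": "multiclusterhub"}},
]

print(find_unannotated(sample))
```

Running a check like this over all reference-CR files would give the list of resources that still need an annotation.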

      Impact:

      • Unreliable deployments requiring manual intervention
      • Extended deployment times due to retry loops
      • Support escalations for "random" deployment failures
      • Reduced confidence in GitOps automation

      Proposed Solution:

      Implement a structured 8-phase sync-wave ordering that follows Kubernetes-native resource dependencies (phase name, then sync-wave value):

      • Registry Foundation (-50)
      • Namespaces (-45)
      • Namespaced Resources (-40)
      • ArgoCD Resources (-35)
      • Independent Custom Resources (-30)
      • Policies and Validation (-25)
      • Storage-Dependent Services (-10)
      • ZTP Components (100)
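
The phase list above can be captured as a lookup table. The sketch below assumes manifests are available as Python dictionaries; the helper `with_sync_wave` and the sample namespace are hypothetical. Only the phase names and wave numbers come from this issue. Which phase a given resource belongs to is not decided here.

```python
import copy

SYNC_WAVE_KEY = "argocd.argoproj.io/sync-wave"

# Phase names and wave numbers as proposed in this issue.
PHASE_WAVES = {
    "Registry Foundation": -50,
    "Namespaces": -45,
    "Namespaced Resources": -40,
    "ArgoCD Resources": -35,
    "Independent Custom Resources": -30,
    "Policies and Validation": -25,
    "Storage-Dependent Services": -10,
    "ZTP Components": 100,
}


def with_sync_wave(manifest, phase):
    """Return a copy of `manifest` annotated with the wave of `phase`."""
    wave = PHASE_WAVES[phase]  # raises KeyError for an unknown phase
    out = copy.deepcopy(manifest)
    meta = out.setdefault("metadata", {})
    annotations = meta.get("annotations") or {}
    # Kubernetes annotation values must be strings.
    annotations[SYNC_WAVE_KEY] = str(wave)
    meta["annotations"] = annotations
    return out


# Hypothetical example resource.
ns = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "example-ns"}}
annotated = with_sync_wave(ns, "Namespaces")
print(annotated["metadata"]["annotations"])
```

The gaps between wave numbers leave room to insert further phases later without renumbering the existing ones.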

      Acceptance Criteria:

      • [ ] All 68 reference-CRs files have appropriate sync-wave annotations
      • [ ] MCE deploys before MCH
      • [ ] All namespaces deploy before namespaced resources
      • [ ] Storage validation completes before storage consumers deploy
      • [ ] All policies execute in a single, consistent phase
      • [ ] Documentation updated with sync-wave strategy

      This should address long-standing deployment reliability issues and establish a foundation for predictable telco-hub rollouts.

              Leonardo Ochoa (lochoa@redhat.com)
              Alexander Gurenko
              Votes: 0
              Watchers: 4
