OCPBUGS-59876

Restoring AWS cluster via OADP results in nodes not ready


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • 4.20.0
    • HyperShift / ROSA
    • Quality / Stability / Reliability
    • False
    • Important
    • Rejected

      Description of problem:
      After following the steps in the disaster recovery docs for OADP 1.5, the nodes of the restored cluster never become ready. The HyperShift operator also logs errors related to security groups:

      {"level":"error","ts":"2025-07-28T11:34:24Z","msg":"Failed to reconcile NodePool","controller":"nodepool","controllerGroup":"hypershift.openshift.io","controllerKind":"NodePool","NodePool":{"name":"hc1-us-east-1a","namespace":"clusters"},"namespace":"clusters","name":"hc1-us-east-1a","reconcileID":"329e3719-7d3f-4728-b94d-bf4cac23c7bc","error":"failed to create machine template: failed to generate AWSMachineTemplateSpec: the default security group for the HostedCluster has not been created","stacktrace":"github.com/openshift/hypershift/hypershift-operator/controllers/nodepool.(*NodePoolReconciler).Reconcile\n\t/hypershift/hypershift-operator/controllers/nodepool/nodepool_controller.go:236\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
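
      The log entry above is a single JSON object, so the failing NodePool and the innermost error can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the function name `summarize_reconcile_error` is hypothetical, and the sample line is a trimmed copy of the entry above with the stacktrace omitted.

```python
import json
from typing import Optional

# Trimmed copy of the operator log line above (stacktrace omitted for brevity).
LOG_LINE = (
    '{"level":"error","ts":"2025-07-28T11:34:24Z",'
    '"msg":"Failed to reconcile NodePool","controller":"nodepool",'
    '"NodePool":{"name":"hc1-us-east-1a","namespace":"clusters"},'
    '"error":"failed to create machine template: failed to generate '
    'AWSMachineTemplateSpec: the default security group for the '
    'HostedCluster has not been created"}'
)


def summarize_reconcile_error(line: str) -> Optional[dict]:
    """Return the key fields of an error-level JSON log entry, or None otherwise."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a structured log line
    if not isinstance(entry, dict) or entry.get("level") != "error":
        return None
    nodepool = entry.get("NodePool") or {}
    error_chain = entry.get("error", "")
    return {
        "time": entry.get("ts"),
        "nodepool": f'{nodepool.get("namespace")}/{nodepool.get("name")}',
        "message": entry.get("msg"),
        # The wrapped error reads outermost-first, so the last segment is the innermost one.
        "root_cause": error_chain.split(": ")[-1],
    }


summary = summarize_reconcile_error(LOG_LINE)
print(summary)
```

      Splitting on ": " is only a heuristic for reading the wrapped error chain; it would misbehave if an individual message itself contained a colon followed by a space.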
      

      Version-Release number of selected component (if applicable):

      OCP 4.20 (4.20.0-0.nightly-multi-2025-07-23-044404)
      OADP plugin quay.io/redhat-user-workloads/ocp-art-tenant/oadp-hypershift-oadp-plugin-main:main (from July 28)
          

      How reproducible:

      Always    

      Steps to Reproduce:

          1. Create the necessary OADP resources on the management cluster: the OADP operator subscription, DataProtectionApplication, and BackupStorageLocation.
          2. Create the Backup resource and wait for it to complete successfully:
              phase: Completed
              progress:
                itemsBackedUp: 366
                totalItems: 366
          3. Break the hosted cluster:
               Pause the HostedCluster (HC) and NodePool (NP)
               Delete the HCP namespace
               Delete any hung resources (usually the CAPI objects); I use a Go app called termin8 for this
               Make sure the HCP namespace is not stuck in the Terminating state
               Delete the HC and NP
               Remove the finalizers from the HC and NP
               Wait until:
                  the HCP namespace no longer exists
                  the HC and NP no longer exist
          4. Apply the Restore resource and wait for it to complete:
             phase: Completed
             progress:
               itemsRestored: 367
               totalItems: 367
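
      The "break the hosted cluster" part of step 3 can be sketched as a sequence of `oc` invocations. This is an illustrative sketch under several assumptions that are not stated in the report: the HostedCluster is named `hc1` (inferred from the NodePool name in the log), the HCP namespace follows the `<namespace>-<name>` pattern (`clusters-hc1`), pausing is done by setting `spec.pausedUntil` to "true", and finalizers are cleared with a merge patch. The helper `break_hosted_cluster_commands` is hypothetical; it only builds the command lines and does not run them. The cleanup of hung resources (done with termin8 in the report) and the wait steps are not modelled.

```python
import json
import shlex


def break_hosted_cluster_commands(hc_name, nodepool_names,
                                  hc_namespace="clusters", hcp_namespace=None):
    """Build (but do not execute) the oc commands for the destructive part of step 3."""
    # Assumption: the HCP namespace is named "<hc_namespace>-<hc_name>".
    hcp_namespace = hcp_namespace or f"{hc_namespace}-{hc_name}"
    # Assumption: spec.pausedUntil="true" pauses reconciliation of HC and NP.
    pause = json.dumps({"spec": {"pausedUntil": "true"}})
    # A merge patch with finalizers set to null removes all finalizers.
    drop_finalizers = json.dumps({"metadata": {"finalizers": None}})

    targets = [("hostedcluster", hc_name)]
    targets += [("nodepool", name) for name in nodepool_names]

    def patch(kind, name, body):
        return ["oc", "patch", kind, name, "-n", hc_namespace,
                "--type=merge", "-p", body]

    commands = [patch(kind, name, pause) for kind, name in targets]
    commands.append(["oc", "delete", "namespace", hcp_namespace, "--wait=false"])
    # Hung-resource cleanup and the "namespace is not Terminating" check go here.
    commands += [["oc", "delete", kind, name, "-n", hc_namespace, "--wait=false"]
                 for kind, name in targets]
    commands += [patch(kind, name, drop_finalizers) for kind, name in targets]
    return commands


for argv in break_hosted_cluster_commands("hc1", ["hc1-us-east-1a"]):
    print(shlex.join(argv))
```

      Printing the commands instead of running them keeps the sketch safe to try; the order mirrors the sub-steps listed under step 3.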

      Actual results:

      All pods in the HCP namespace are running and the HostedCluster is marked as Completed, but the HyperShift operator throws errors and the nodes in the hosted cluster are not ready.
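
      Since the operator error says the default security group "has not been created", one thing worth inspecting is whether the restored HostedCluster still records a default worker security group in its status. The sketch below is an assumption-heavy illustration, not a confirmed diagnosis: it assumes the value lives at `status.platform.aws.defaultWorkerSecurityGroupID` (to be verified against the HyperShift API), and the sample objects and the security group ID are made up.

```python
from typing import Optional


def default_worker_security_group(hosted_cluster: dict) -> Optional[str]:
    """Return the recorded default worker security group ID, or None if absent.

    Assumption: the ID is stored at status.platform.aws.defaultWorkerSecurityGroupID.
    """
    status = hosted_cluster.get("status") or {}
    aws = (status.get("platform") or {}).get("aws") or {}
    return aws.get("defaultWorkerSecurityGroupID") or None


# Made-up examples: one object with the field populated, one with an empty status.
populated = {
    "kind": "HostedCluster",
    "metadata": {"name": "hc1", "namespace": "clusters"},
    "status": {"platform": {"aws": {"defaultWorkerSecurityGroupID": "sg-0123456789abcdef0"}}},
}
empty_status = {
    "kind": "HostedCluster",
    "metadata": {"name": "hc1", "namespace": "clusters"},
    "status": {},
}

for label, obj in (("populated", populated), ("empty status", empty_status)):
    print(label, "->", default_worker_security_group(obj))
```

      In practice the input would be the JSON of the restored HostedCluster (for example from `oc get hostedcluster -n clusters -o json`) rather than the hand-written dictionaries used here.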

      Expected results:

      Nodes in the hosted cluster are ready.
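
      The expected result can be checked from the output of `oc get nodes -o json` run against the hosted cluster: a node counts as ready when it has a condition of type `Ready` with status `"True"`. The sketch below is illustrative only; the helper name `not_ready_nodes` and the node names in the sample are hypothetical.

```python
def not_ready_nodes(node_list: dict) -> list:
    """Return the names of nodes that lack a Ready=True condition."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(node["metadata"]["name"])
    return names


# Hypothetical NodeList shaped like `oc get nodes -o json` output.
sample = {
    "kind": "List",
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
        {"metadata": {"name": "node-c"}, "status": {}},
    ],
}

print(not_ready_nodes(sample))
```

      A node with no conditions at all is treated as not ready, which matches how a node that never registered its kubelet status would appear.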

      Additional info:

      Link to hypershift dump: https://drive.google.com/file/d/1iPgvy8m8uKVL9FlSdvpFtuh-SAdybc3o/view?usp=sharing    

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              mgencur@redhat.com Martin Gencur