Type: Bug
Resolution: Unresolved
Priority: Critical
Severity: Important
Affects Version: 4.20.0
Impact: Quality / Stability / Reliability
Description of problem:
Following the steps from the Disaster recovery docs with OADP 1.5, the restored cluster does not have its Nodes ready. The Hypershift operator also logs errors related to security groups:
{"level":"error","ts":"2025-07-28T11:34:24Z","msg":"Failed to reconcile NodePool","controller":"nodepool","controllerGroup":"hypershift.openshift.io","controllerKind":"NodePool","NodePool":{"name":"hc1-us-east-1a","namespace":"clusters"},"namespace":"clusters","name":"hc1-us-east-1a","reconcileID":"329e3719-7d3f-4728-b94d-bf4cac23c7bc","error":"failed to create machine template: failed to generate AWSMachineTemplateSpec: the default security group for the HostedCluster has not been created","stacktrace":"github.com/openshift/hypershift/hypershift-operator/controllers/nodepool.(*NodePoolReconciler).Reconcile\n\t/hypershift/hypershift-operator/controllers/nodepool/nodepool_controller.go:236\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
Version-Release number of selected component (if applicable):
OCP 4.20 (4.20.0-0.nightly-multi-2025-07-23-044404); OADP plugin: quay.io/redhat-user-workloads/ocp-art-tenant/oadp-hypershift-oadp-plugin-main:main (from July 28)
How reproducible:
Always
Steps to Reproduce:
1. Create the necessary resources for OADP on the management cluster: the OADP operator subscription, a DataProtectionApplication, and a BackupStorageLocation.
2. Create the Backup resource (a sketch is included after this list) and watch it complete successfully:
   phase: Completed
   progress:
     itemsBackedUp: 366
     totalItems: 366
3. Break the hosted cluster:
   - Pause the HC and NP.
   - Delete the HCP namespace.
   - Delete the hung resources (usually the CAPI objects; I use a Go app called termin8).
   - Make sure the HCP namespace is not stuck in Terminating.
   - Delete the HC and NP.
   - Remove the finalizers of the HC and NP.
   - Wait until the HCP namespace, HC, and NP no longer exist.
4. Apply the Restore resource and watch it complete:
   phase: Completed
   progress:
     itemsRestored: 367
     totalItems: 367
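For reference, a minimal sketch of the Backup and Restore manifests from steps 2 and 4, plus the pause field from step 3. All names, namespaces, and the HCP namespace are illustrative assumptions, not taken from this report:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: hc1-backup          # illustrative name
  namespace: openshift-adp  # assumes the default OADP namespace
spec:
  includedNamespaces:       # HC namespace plus the HCP namespace
  - clusters
  - clusters-hc1
---
# Pausing the HC in step 3 amounts to setting (the same field exists on NodePool):
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: hc1
  namespace: clusters
spec:
  pausedUntil: "true"
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: hc1-restore
  namespace: openshift-adp
spec:
  backupName: hc1-backup
  existingResourcePolicy: update  # update resources that already exist on the cluster
  restorePVs: true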
Actual results:
All pods in the HCP namespace are running and the HostedCluster is marked as Completed, but the Hypershift operator keeps throwing the error above and the Nodes in the hosted cluster are not ready.
Expected results:
The Nodes in the hosted cluster become Ready after the restore.
Additional info:
Link to hypershift dump: https://drive.google.com/file/d/1iPgvy8m8uKVL9FlSdvpFtuh-SAdybc3o/view?usp=sharing
is related to: OCPBUGS-59880 - AWS Public Hypershift clusters cannot be restored automatically (Verified)