Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.20.0
Component/s: HyperShift / ROSA
Labels:
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:
After following the steps from the disaster recovery docs with OADP 1.5, the nodes of the restored cluster do not become Ready. The HyperShift operator also logs errors related to security groups:

{"level":"error","ts":"2025-07-28T11:34:24Z","msg":"Failed to reconcile NodePool","controller":"nodepool","controllerGroup":"hypershift.openshift.io","controllerKind":"NodePool","NodePool":{"name":"hc1-us-east-1a","namespace":"clusters"},"namespace":"clusters","name":"hc1-us-east-1a","reconcileID":"329e3719-7d3f-4728-b94d-bf4cac23c7bc","error":"failed to create machine template: failed to generate AWSMachineTemplateSpec: the default security group for the HostedCluster has not been created","stacktrace":"github.com/openshift/hypershift/hypershift-operator/controllers/nodepool.(*NodePoolReconciler).Reconcile\n\t/hypershift/hypershift-operator/controllers/nodepool/nodepool_controller.go:236\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
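When triaging a long operator log, it can help to reduce each structured entry to the affected NodePool and the innermost error message. The following is a minimal sketch, assuming the log lines are single-line JSON objects shaped like the one above. The function name `summarize_nodepool_error` is hypothetical, and the embedded line is an abridged copy of the entry above with the stacktrace omitted.

```python
import json

# Abridged copy of the operator log entry above (stacktrace and IDs omitted).
LOG_LINE = (
    '{"level":"error","ts":"2025-07-28T11:34:24Z",'
    '"msg":"Failed to reconcile NodePool","controller":"nodepool",'
    '"NodePool":{"name":"hc1-us-east-1a","namespace":"clusters"},'
    '"error":"failed to create machine template: failed to generate '
    'AWSMachineTemplateSpec: the default security group for the '
    'HostedCluster has not been created"}'
)


def summarize_nodepool_error(line):
    """Return (namespace/name, innermost error) for an error entry, else None.

    Hypothetical helper for illustration; not part of HyperShift.
    """
    entry = json.loads(line)
    if entry.get("level") != "error" or "NodePool" not in entry:
        return None
    nodepool = entry["NodePool"]
    # Wrapped Go errors read "outer: inner: cause"; keep the last segment.
    root_cause = entry.get("error", "").split(": ")[-1]
    return f'{nodepool["namespace"]}/{nodepool["name"]}', root_cause


print(summarize_nodepool_error(LOG_LINE))
```

For the entry above, this isolates the NodePool `clusters/hc1-us-east-1a` and the message about the default security group not having been created.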

Version-Release number of selected component (if applicable):

OCP 4.20 (4.20.0-0.nightly-multi-2025-07-23-044404)
OADP plugin quay.io/redhat-user-workloads/ocp-art-tenant/oadp-hypershift-oadp-plugin-main:main (from July 28)

How reproducible:

Always

Steps to Reproduce:

    1. Create the necessary OADP resources on the management cluster: the OADP operator subscription, DataProtectionApplication, and BackupStorageLocation.
    2. Create the Backup resource and wait for it to complete successfully:
        phase: Completed
        progress:
          itemsBackedUp: 366
          totalItems: 366
    3. Break the hosted cluster:
         Pause the HC and NP
         Delete the HCP namespace
         Delete the hung resources. These are usually the CAPI objects; I use a Go app called termin8 for this
         Make sure the HCP namespace is not in the Terminating state
         Delete the HC and NP
         Remove the finalizers of the HC and NP
         Wait until the OCP resources are in this state:
            The HCP namespace should not exist
            The HC and NP should not exist
    4. Apply the Restore resource and wait for it to complete:
        phase: Completed
        progress:
          itemsRestored: 367
          totalItems: 367
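The completion checks in steps 2 and 4 can be expressed as a small predicate. This is only a sketch: the status dictionaries below are hand-copied from the excerpts above rather than fetched from the cluster, and the helper name `is_fully_processed` is hypothetical.

```python
def is_fully_processed(status, counter_key):
    """True if phase is Completed and the processed counter equals totalItems."""
    progress = status.get("progress", {})
    processed = progress.get(counter_key)
    return (
        status.get("phase") == "Completed"
        and processed is not None
        and processed == progress.get("totalItems")
    )


# Values copied from the Backup and Restore status excerpts in the steps above.
backup_status = {
    "phase": "Completed",
    "progress": {"itemsBackedUp": 366, "totalItems": 366},
}
restore_status = {
    "phase": "Completed",
    "progress": {"itemsRestored": 367, "totalItems": 367},
}

print(is_fully_processed(backup_status, "itemsBackedUp"))
print(is_fully_processed(restore_status, "itemsRestored"))
```

Both checks pass for the values in this report, so the Backup and Restore statuses alone do not reveal the problem; the node state and the operator logs still have to be inspected.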

Actual results:

All pods in the HCP namespace are running and the HostedCluster is marked as Completed, but the HyperShift operator logs errors and the nodes in the hosted cluster are not ready.

Expected results:

Nodes in the hosted cluster are ready.

Additional info:

Link to hypershift dump: https://drive.google.com/file/d/1iPgvy8m8uKVL9FlSdvpFtuh-SAdybc3o/view?usp=sharing

causes

OCPSTRAT-2547 DR integration into the Hypershift CLI

In Progress

is related to

CNTRLPLANE-1690 Disaster recovery integration into the Hypershift CLI

In Progress

OCPBUGS-59880 AWS Public Hypershift clusters cannot be restored automatically

Closed

links to

openshift/hypershift#7210: OCPBUGS-59876: feat: implement CLI-based OIDC disaster recovery command

Assignee: Juan Manuel Parrilla Madrid

Reporter: Martin Gencur

Need Info From: None

Contributors: None

QA Contact: Martin Gencur

Doc Contact: None

Votes: 0

Watchers: 9

Created: 2025/07/28 2:49 PM

Updated: 2025/11/17 11:16 AM
