Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: ACM 2.7.Z
Affects Version/s: ACM 2.7.10
Component/s: Server Foundation
Labels:
- BackupAndRecovery

Blocked: False
Blocked Reason: None
Ready: False
Severity: Moderate
Regression: No

Description of problem:

When a restore that includes a ManagedCluster adopted through Hive's cluster adoption feature is applied to a new ACM hub, the ManagedCluster immediately re-joins as expected, but its AVAILABLE status changes to Unknown shortly afterward. After approximately two hours, the ManagedCluster re-joins on its own without manual intervention.

Version-Release number of selected component (if applicable):

2.7.10

How reproducible:

Always

Steps to Reproduce:

1. Adopt a cluster into an ACM hub that has the cluster-backup operator installed and configured. Follow the adoption instructions here: https://github.com/openshift/hive/blob/master/docs/using-hive.md#cluster-adoption
2. Ensure the cluster.open-cluster-management.io/backup=true label is applied to the adopted ManagedCluster's admin-kubeconfig secret.
3. Have the cluster-backup operator take a backup of the ACM hub.
4. Redeploy the ACM hub.
5. Apply the restore to the newly deployed ACM hub.
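
The labeling step above could be sketched as a small shell helper. This is only an illustration: the secret name and namespace are placeholders that this report does not specify, and `OC` is a hypothetical override variable (defaulting to `oc`) so the command can be printed with `echo` instead of being run.

```shell
# Label a secret so that the cluster-backup operator includes it in backups.
# Usage: label_backup_secret <secret-name> <namespace>
# Assumption: OC is an illustrative override, not an oc feature.
# Set OC=echo to print the arguments instead of calling the cluster.
label_backup_secret() {
  secret_name="$1"
  namespace="$2"
  "${OC:-oc}" label secret "$secret_name" -n "$namespace" \
    cluster.open-cluster-management.io/backup=true --overwrite
}
```

For example, `OC=echo label_backup_secret <secret-name> <namespace>` prints the arguments that would be passed to `oc` without touching the cluster.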

Actual results:

After initial restore, all ManagedClusters (deployed by the restored ACM hub as well as adopted clusters) re-join without issue:

[root@bastion.<redacted> ~]# oc get managedclusters
NAME         HUB ACCEPTED   MANAGED CLUSTER URLS          JOINED   AVAILABLE   AGE
<redacted>   true           https://api.<redacted>:6443   True     True        2m52s
<redacted>   true           https://api.<redacted>:6443   True     True        2m52s

Shortly after, the adopted cluster goes into an unknown state:

[root@bastion.<redacted> ~]# oc get managedclusters
NAME            HUB ACCEPTED   MANAGED CLUSTER URLS          JOINED   AVAILABLE   AGE
local-cluster   true           https://api.<redacted>:6443   True     True        4m9s
<redacted>      true           https://api.<redacted>:6443   True     True        25m
<redacted>      true           https://api.<redacted>:6443   True     Unknown     25m

After approximately two hours, the adopted cluster re-joins the ACM hub without any manual intervention:

[root@bastion.<redacted> ~]# oc get managedclusters
NAME            HUB ACCEPTED   MANAGED CLUSTER URLS          JOINED   AVAILABLE   AGE
local-cluster   true           https://api.<redacted>:6443   True     True        5h26m
<redacted>      true           https://api.<redacted>:6443   True     True        120m
<redacted>      true           https://api.<redacted>:6443   True     True        120m
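
To follow the AVAILABLE column through the roughly two-hour window, the listings above could be filtered with a helper along these lines. It is a sketch, not part of the original report: the function name is hypothetical, and it assumes AVAILABLE and AGE are the last two columns and that both are populated on every row.

```shell
# Print the names of ManagedClusters whose AVAILABLE column is not "True".
# Reads the output of `oc get managedclusters --no-headers` on stdin.
# Assumption: AVAILABLE is the second-to-last column and is never blank.
not_available_clusters() {
  awk 'NF >= 2 && $(NF-1) != "True" { print $1 }'
}
```

A possible invocation is `oc get managedclusters --no-headers | not_available_clusters`, repeated at intervals to see when the adopted cluster leaves and returns to the available state.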

Expected results:

Adopted clusters successfully re-join the ACM hub post-restore and stay joined.

Additional info:

This behavior appears to be similar to https://issues.redhat.com/browse/ACM-8746.

Attachment: image-2024-01-26-10-49-57-977.png (300 kB, 2024/01/26 2:49 AM)

Assignee: Jian Zhu
Reporter: Sean Carlisle
QA Contact: Hui Chen
Votes: 0
Watchers: 4
Created: 2024/01/17 3:57 PM
Updated: 2024/07/02 1:52 PM
Resolved: 2024/07/02 1:52 PM
