Bug
Resolution: Unresolved
Important
Description of problem:
In an IPv6-only environment, when a server is replaced via IBBF/IBR, the spoke cluster fails to rejoin the hub: the spoke cluster cannot reach the hub cluster.
The spoke is trying to reach the hub at:
Hub API: https://api.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com:6443
IPv6 address: [2620:52:9:1684:8c7:e6ff:fee3:5e21]:6443
The klusterlet-agent logs show repeated timeouts trying to connect to the hub:
dial tcp [2620:52:9:1684:8c7:e6ff:fee3:5e21]:6443: i/o timeout
The klusterlet status also confirms this:
type: HubConnectionDegraded
status: "True"
reason: BootstrapSecretError,HubKubeConfigSecretMissing
message: |
  Failed to create &SelfSubjectAccessReview... Post "https://api.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com:6443/...": dial tcp [2620:52:9:1684:8c7:e6ff:fee3:5e21]:6443: i/o timeout
This causes the reimport to loop indefinitely because the spoke cluster is never reported as healthy.
2026-02-23T13:53:44.247Z INFO ClusterInstanceController controller/clusterinstance_controller.go:157 Starting reconcile ClusterInstance {"name": "helix82", "namespace": "helix82"}
2026-02-23T13:53:44.247Z INFO ClusterInstanceController controller/clusterinstance_controller.go:178 Loaded ClusterInstance {"name": "helix82", "namespace": "helix82", "version": "3829902"}
2026-02-23T13:53:44.247Z INFO ClusterInstanceController.EnsureClusterIsReimported reinstall/reimport.go:173 Checking cluster health after provisioning {"name": "helix82", "namespace": "helix82", "version": "3829902", "timeSinceProvisioning": "6m32.247818148s", "gracePeriod": "5m0s", "gracePeriodRemaining": "-1m32.247818148s"}
2026-02-23T13:53:44.247Z INFO ClusterInstanceController.EnsureClusterIsReimported reinstall/reimport.go:256 Cluster lease update stopped after grace period - triggering reimport {"name": "helix82", "namespace": "helix82", "version": "3829902", "timeSinceProvisioning": "6m32.247818148s", "AvailableReason": "ManagedClusterLeaseUpdateStopped", "AvailableMessage": "Registration agent stopped updating its lease."}
2026-02-23T13:53:44.247Z INFO ClusterInstanceController.EnsureClusterIsReimported reinstall/reimport.go:269 Grace period expired and cluster still unhealthy, triggering reimport {"name": "helix82", "namespace": "helix82", "version": "3829902", "cluster": "helix82", "timeSinceProvisioning": "6m32.247818148s"}
2026-02-23T13:53:44.247Z INFO ClusterInstanceController.EnsureClusterIsReimported reinstall/reimport.go:278 Reimport already initiated, waiting for cluster to become healthy {"name": "helix82", "namespace": "helix82", "version": "3829902", "conditionMessage": "Klusterlet manifests applied to spoke cluster, waiting for cluster to become healthy"}
2026-02-23T13:53:44.247Z INFO ClusterInstanceController controller/clusterinstance_controller.go:214 Cluster reimport still in progress, requeuing {"name": "helix82", "namespace": "helix82", "version": "3829902"}
2026-02-23T13:53:44.247Z INFO ClusterInstanceController controller/clusterinstance_controller.go:154 Finished reconciling ClusterInstance {"name": "helix82", "namespace": "helix82", "version": "3829902"}
The root cause is a stale IPv6 neighbor cache entry on the hub cluster node running the API VIP. The replaced server is reinstalled with the preserved IP address of the old server but advertises the MAC address of the new hardware, so the hub node keeps sending traffic to the old MAC. Manually flushing the IPv6 neighbor cache on the hub node holding the API VIP fixes the problem.
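The manual workaround can be sketched as two commands. The VIP below is the address from this report; the bridge name (br-ex) is an assumption, so substitute whatever interface actually carries the API VIP on the affected hub node. The sketch only composes and prints the commands rather than executing them, since both require root on a live hub node:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hub API VIP from this report; DEV is an assumed bridge name -- replace it
# with the interface that holds the API VIP on the affected hub node.
VIP="2620:52:9:1684:8c7:e6ff:fee3:5e21"
DEV="br-ex"

# Inspect the neighbor entry first: a MAC that still belongs to the old
# server confirms the stale-cache diagnosis.
SHOW_CMD="ip -6 neigh show ${VIP} dev ${DEV}"

# Deleting the entry forces the kernel to re-resolve the address via
# neighbor discovery, picking up the new server's MAC.
FLUSH_CMD="sudo ip -6 neigh del ${VIP} dev ${DEV}"

echo "inspect: ${SHOW_CMD}"
echo "flush:   ${FLUSH_CMD}"
```

Run the inspect command first; if the cached MAC matches the decommissioned server, the flush should restore connectivity without touching the spoke.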
This has likely been an ongoing problem that was masked by ACM-23801. When the reimport logic was added to IBBF/IBR, this new problem was detected.
Version-Release number of selected component (if applicable): 2.16
How reproducible: Always
Steps to Reproduce:
- Reinstall server via IBR/IBBF
- Monitor managedcluster available=unknown
- Monitor siteconfig pod log, watching for reimport failures
- Monitor klusterlet-agent logs on the spoke cluster, watching for the HubConnectionDegraded condition
- Attempt to ping API VIP from replaced spoke cluster (fails)
- Manually flush the IPv6 neighbor cache on the hub node running the API VIP (sudo ip -6 neigh del <IP> dev <bridge name>)
- Monitor managedcluster available=true
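The monitoring steps above can be run from the hub and the spoke; the cluster name helix82 and the API hostname are taken from the logs in this report, and the commands are again composed as strings rather than executed, since they need live cluster access:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Cluster name and API hostname from this report; adjust for your environment.
CLUSTER="helix82"
API_HOST="api.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com"

# On the hub: watch the ManagedCluster resource until its AVAILABLE column
# flips from Unknown back to True after the neighbor-cache flush.
WATCH_CMD="oc get managedcluster ${CLUSTER} -w"

# On the spoke: ping the API VIP; this fails while the hub's neighbor
# entry is stale and succeeds after the flush.
PING_CMD="ping -6 -c 3 ${API_HOST}"

echo "hub:   ${WATCH_CMD}"
echo "spoke: ${PING_CMD}"
```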
Actual results:
The reinstalled spoke cluster cannot communicate with the hub cluster API.
Expected results:
The spoke cluster rejoins the hub after a successful IBR/IBBF.