Description of problem:
The hive-operator pod ended up in a CrashLoopBackOff state, and its logs report:
E0206 16:21:23.401875 1 reflector.go:205] "Failed to watch" err="failed to list *v1.APIServer: apiservers.config.openshift.io is forbidden: User \"system:serviceaccount:multicluster-engine:hive-operator\" cannot list resource \"apiservers\" in API group \"config.openshift.io\" at the cluster scope" logger="UnhandledError" reflector="k8s.io/client-go@v0.34.2/tools/cache/reflector.go:290" type="*v1.APIServer"
time="2026-02-06T16:21:43Z" level=info msg="calculating metrics for all Hive" controller=metrics
time="2026-02-06T16:21:43Z" level=info msg="reconcile complete" controller=metrics elapsedMillis=0 elapsedMillisGT=0 outcome=unspecified
time="2026-02-06T16:21:43Z" level=error msg="Could not wait for Cache to sync" controller=hive-controller error="failed to wait for hive-controller caches to sync: timed out waiting for cache to be synced for Kind *v1.DaemonSet"
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
> goroutine 299 [running]:
> runtime/debug.Stack()
> runtime/debug/stack.go:26 +0x5e
> sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
> sigs.k8s.io/controller-runtime@v0.22.3/pkg/log/log.go:60 +0xcd
> sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Error(0xc000698c40, {0x58cd9c0, 0xc006041040}, {0x51adcfd, 0x21}, {0x0, 0x0, 0x0})
> sigs.k8s.io/controller-runtime@v0.22.3/pkg/log/deleg.go:139 +0x5d
> github.com/go-logr/logr.Logger.Error({{0x5915ac0?, 0xc000698c40?}, 0x0?}, {0x58cd9c0, 0xc006041040}, {0x51adcfd, 0x21}, {0x0, 0x0, 0x0})
> github.com/go-logr/logr@v1.4.3/logr.go:301 +0x145
> sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1()
> sigs.k8s.io/controller-runtime@v0.22.3/pkg/internal/source/kind.go:76 +0x1a9
> k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1({0x590b3d8?, 0xc000272960?}, 0xc00110fe80?)
> k8s.io/apimachinery@v0.34.2/pkg/util/wait/loop.go:53 +0x62
> k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x590b3d8, 0xc000272960}, {0x58ef2f8, 0xc0004827e0}, 0x1, 0x0, 0xc0013a7fa8)
> k8s.io/apimachinery@v0.34.2/pkg/util/wait/loop.go:54 +0x115
> k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x590b3d8, 0xc000272960}, 0x0?, 0x1, 0xc00110ffa8)
> k8s.io/apimachinery@v0.34.2/pkg/util/wait/poll.go:33 +0x56
> sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1()
> sigs.k8s.io/controller-runtime@v0.22.3/pkg/internal/source/kind.go:64 +0xba
> created by sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start in goroutine 263
> sigs.k8s.io/controller-runtime@v0.22.3/pkg/internal/source/kind.go:56 +0x194
time="2026-02-06T16:21:43Z" level=error msg="error running manager" error="failed to wait for hive-controller caches to sync: timed out waiting for cache to be synced for Kind *v1.DaemonSet"
time="2026-02-06T16:21:43Z" level=info msg="leader lost" id=41200553-0e54-4327-bca2-d3562541a90c
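The first log line shows the hive-operator service account being denied list/watch on apiservers.config.openshift.io at cluster scope, and the manager then exits because one of its caches never syncs, which matches the CrashLoopBackOff. To confirm the RBAC gap independently of the operator, one can ask the API server directly, e.g. with oc auth can-i list apiservers.config.openshift.io --as=system:serviceaccount:multicluster-engine:hive-operator, or with a SubjectAccessReview as in the minimal Go sketch below (a hypothetical standalone check, not part of hive-operator, assuming admin credentials in the default kubeconfig):

    // rbaccheck.go - hypothetical standalone check, not part of hive-operator.
    package main

    import (
        "context"
        "fmt"

        authorizationv1 "k8s.io/api/authorization/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Assumes admin credentials in the default kubeconfig (~/.kube/config).
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        cs, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // Ask the API server whether the hive-operator service account may perform
        // the exact request denied in the log: list apiservers.config.openshift.io
        // at cluster scope.
        sar := &authorizationv1.SubjectAccessReview{
            Spec: authorizationv1.SubjectAccessReviewSpec{
                User: "system:serviceaccount:multicluster-engine:hive-operator",
                ResourceAttributes: &authorizationv1.ResourceAttributes{
                    Group:    "config.openshift.io",
                    Resource: "apiservers",
                    Verb:     "list",
                },
            },
        }
        resp, err := cs.AuthorizationV1().SubjectAccessReviews().Create(context.TODO(), sar, metav1.CreateOptions{})
        if err != nil {
            panic(err)
        }
        fmt.Printf("allowed=%v reason=%q\n", resp.Status.Allowed, resp.Status.Reason)
    }

If allowed comes back false, the cluster-scope RBAC granted to the service account is indeed missing the rule the new client-go watch needs.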
Version-Release number of selected component (if applicable):
advanced-cluster-management.v2.16.0-213
multicluster-engine.v2.11.0-247
OCP 4.22.0-ec.1
How reproducible: Seen once so far, on the first attempt to deploy OCP 4.22 with ACM 2.16.
Steps to Reproduce:
- Install ACM 2.16 on the hub cluster
- Follow the GitOps ZTP deployment procedure to deploy a bare-metal multi-node cluster
- ...
Actual results:
The hive-operator pod is in a CrashLoopBackOff state
The spoke cluster is deployed but is not imported into ACM (see the check sketch under Additional info)
Expected results:
The hive-operator pod is up and running
Additional info:
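Regarding the spoke not being imported (Actual results above): each cluster managed by ACM is represented on the hub by a cluster-scoped ManagedCluster resource (cluster.open-cluster-management.io/v1), and its status conditions show whether the join/import completed. Listing those resources and their conditions is a quick way to capture what "not imported" looks like for this bug. The Go sketch below is only an illustration (a hypothetical standalone check, not part of ACM) and assumes hub-admin credentials in the default kubeconfig:

    // importcheck.go - hypothetical standalone check, not part of ACM itself.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/runtime/schema"
        "k8s.io/client-go/dynamic"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Assumes hub-admin credentials in the default kubeconfig (~/.kube/config).
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        dyn, err := dynamic.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // Cluster-scoped ManagedCluster resources on the hub; the deployed spoke is
        // expected to appear here with healthy conditions once the import completes.
        gvr := schema.GroupVersionResource{
            Group:    "cluster.open-cluster-management.io",
            Version:  "v1",
            Resource: "managedclusters",
        }
        list, err := dyn.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            panic(err)
        }
        for _, mc := range list.Items {
            conds, _, _ := unstructured.NestedSlice(mc.Object, "status", "conditions")
            fmt.Printf("%s conditions=%v\n", mc.GetName(), conds)
        }
    }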
-------------------------------------------------------------------------------------------------------
QE Hand Off Template (fill out when moving to Review) 2/10/26:
Summary of the Work:
What was implemented or fixed? Include a brief description of the problem (if applicable) and how it was addressed.
e.g., "Updated the UI to show validation errors for the form. The previous implementation did not surface backend validation issues."
Key Areas to Verify:
- What functionality should QE focus on? List what was tested or what is most important to validate.
- Ensure the new validation messages appear for required fields
- Confirm the workflow still completes as expected after validation fixes
- Any edge cases or high-risk areas touched by the change
Fix or Feature Availability:
When will this be available in a build?
Code merged on: YYYY-MM-DD
Expected downstream build tag (if known): example-build-tag
(Optional) Related PR(s): Link