Description of the problem:
As a part of Altiostar deployment, when deploying the spoke compact cluster (3 masters only) deployment fails (times out). After investigation hive-operator pod is crashing and nodes report unreachability. If agents are manually restarted and hive-operator pod is re-created there is a new attempt to deploy cluster until hive-operator crash again.
How reproducible:
Occasional, first noticed with 4.8.32 two weeks ago, now happening on 4.8.43
ACM 2.4.6
Steps to reproduce:
1. Start deploying compact cluster
2. Wait for agents to register
3.
Actual results:
Deployment does not proceed with
- The cluster has hosts that are not ready to install.
hive-operator is crashing with following in a log:
time="2022-08-10T11:29:30Z" level=info msg="reconcile complete" controller=hive elapsedMillis=620 elapsedMillisGT=0 outcome=unspecified
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x123e5fc]
Expected results:
Deployment continues
- links to