-
Bug
-
Resolution: Duplicate
-
Undefined
-
None
-
4.13
-
None
-
Critical
-
None
-
Proposed
-
False
-
When a console rollout occurs while one of the managed clusters is unreachable, the pods don't start, and the console deployment gets stuck in creating. Errors like this appear in the log:
E0117 12:02:39.538925 1 auth.go:232] error contacting auth provider (retrying in 10s): Get "https://api.jephilli-latest-01-16-0641.devcluster.openshift.com:6443/.well-known/oauth-authorization-server": dial tcp: lookup api.jephilli-latest-01-16-0641.devcluster.openshift.com on 172.30.0.10:53: no such host
The console needs to tolerate unresponsive managed clusters and must not block a rollout when one of them is not responding. Otherwise this could break cluster upgrades, among other issues.
I tried destroying the unresponsive cluster, but the console operator did not seem to trigger a new rollout. I had to manually delete the stuck pods from the previous rollout to fix the issue.