OpenShift Bugs / OCPBUGS-5921

Multicluster: Unresponsive cluster prevents console pods from starting


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • None
    • 4.13
    • Management Console
    • None
    • Critical
    • None
    • Proposed
    • False

      When a console rollout occurs while one of the managed clusters is unreachable, the new console pods don't start and the console deployment gets stuck in a creating state. Errors like the following appear in the log:

      E0117 12:02:39.538925 1 auth.go:232] error contacting auth provider (retrying in 10s): Get "https://api.jephilli-latest-01-16-0641.devcluster.openshift.com:6443/.well-known/oauth-authorization-server": dial tcp: lookup api.jephilli-latest-01-16-0641.devcluster.openshift.com on 172.30.0.10:53: no such host

      The console needs to tolerate unresponsive managed clusters: a single cluster that is not responding must not block the rollout. Otherwise this could break cluster upgrades, among other problems.

      I tried destroying the unresponsive cluster, but the console operator did not appear to trigger a new rollout. I had to manually delete the previously stuck pods to fix the issue.

              Jon Jackson (rh-ee-jonjacks)
              Samuel Padgett (spadgett@redhat.com)
              YaDan Pei
              Votes: 0
              Watchers: 3