OpenShift Bugs / OCPBUGS-5921

Multicluster: Unresponsive cluster prevents console pods from starting


Details

    • Bug
    • Resolution: Duplicate
    • Undefined
    • None
    • 4.13
    • Management Console
    • None
    • Critical
    • Proposed
    • False

    Description

      When a console rollout occurs while one of the managed clusters is unreachable, the new console pods don't start, and the console deployment gets stuck creating them. Errors like this appear in the log:

      E0117 12:02:39.538925 1 auth.go:232] error contacting auth provider (retrying in 10s): Get "https://api.jephilli-latest-01-16-0641.devcluster.openshift.com:6443/.well-known/oauth-authorization-server": dial tcp: lookup api.jephilli-latest-01-16-0641.devcluster.openshift.com on 172.30.0.10:53: no such host

      The console needs to tolerate unresponsive managed clusters and must not block a rollout when one of them is not responding. Otherwise, a single unreachable cluster could break cluster upgrades, among other issues.

      I tried destroying the unresponsive cluster, but the console operator did not seem to trigger a new rollout. I had to manually delete the stuck pods from the previous rollout to fix the issue.


          People

            rh-ee-jonjacks Jon Jackson
            spadgett@redhat.com Samuel Padgett
            YaDan Pei YaDan Pei