OpenShift Hive / HIVE-2195

CD never clears WaitingForMachines condition when cluster is unreachable


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal

      A customer manually stopped machines for a cluster deployment (CD) that had an unreachable condition, and the powerState in the CD status became stuck at WaitingForMachines.

      Steps to reproduce:

      1. Invalidate the cluster's kubeconfig in some way so that the CD reports an unreachable condition.
      2. Stop a machine for the cluster while no explicit cd.spec.powerState is set, so that the hibernation controller will try to start any machines it finds stopped.
      3. Observe that the hibernation controller starts the machine and sets the CD powerState to WaitingForMachines. The condition never clears, even though the machines have started successfully.
        - lastProbeTime: "2023-03-13T20:26:43Z"
          lastTransitionTime: "2023-03-13T20:26:43Z"
          message: 'Waiting for cluster machines to start. Some machines are not yet running:
            i-0142d17411ffb0202 (step 1/4)'
          reason: WaitingForMachines
          status: "False"
          type: Ready 
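
      The symptom above can be checked mechanically. The following is a minimal sketch, not Hive code: the helper names (find_condition, is_stuck_waiting_for_machines) and the machine-state strings ("running", "stopped") are assumptions made for illustration. It only encodes the observation from this report, namely that the Ready condition still reports WaitingForMachines while every machine is already running.

```python
from typing import Mapping, Optional, Sequence


def find_condition(
    conditions: Sequence[Mapping[str, str]], cond_type: str
) -> Optional[Mapping[str, str]]:
    """Return the first condition with the given type, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def is_stuck_waiting_for_machines(
    conditions: Sequence[Mapping[str, str]],
    machine_states: Mapping[str, str],
) -> bool:
    """Hypothetical check for the symptom described in this report.

    True when Ready is False with reason WaitingForMachines although every
    known machine is in the (assumed) "running" state.
    """
    ready = find_condition(conditions, "Ready")
    if ready is None:
        return False
    waiting = (
        ready.get("status") == "False"
        and ready.get("reason") == "WaitingForMachines"
    )
    # An empty machine map is treated as "unknown", not as "all running".
    all_running = bool(machine_states) and all(
        state == "running" for state in machine_states.values()
    )
    return waiting and all_running


# Condition copied from the report; the machine state is an assumed observation.
conditions = [
    {
        "type": "Ready",
        "status": "False",
        "reason": "WaitingForMachines",
        "lastProbeTime": "2023-03-13T20:26:43Z",
        "lastTransitionTime": "2023-03-13T20:26:43Z",
    }
]
machines = {"i-0142d17411ffb0202": "running"}

print(is_stuck_waiting_for_machines(conditions, machines))
```

      With the condition from this report and the machine observed as running, the check flags the stale state. If the machine were still stopped, the same condition would be legitimate and the check would return False.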

      https://redhat-internal.slack.com/archives/CE3ETN3J8/p1678735174858429

       

              Assignee: Unassigned
              Reporter: Andrew Butcher (abutcher@redhat.com)