OpenShift Bugs / OCPBUGS-9367

OpenShift private cluster fails to install due to missing worker nodes on ASH


    • Quality / Stability / Reliability
    • None
    • None
    • None
    • Important
    • None
    • x86_64
    • Rejected
    • None
    • In Progress
    • Release Note Not Required
    • None
    • None
    • None
    • None
    • None

      Description of problem:

      Installations of private clusters (publish: Internal) fail on Azure Stack Hub (ASH) because the incorrect load balancer is referenced when attempting to create worker nodes.

      Version-Release number of selected component (if applicable):
      4.11.0-0.nightly-2022-07-05-083948 (and previous nightlies)

      How reproducible:

      Always

      Steps to Reproduce:
      1. Attempt to install a private cluster on ASH.
      2. Wait until the installer times out after the bootstrap is removed.
      3. Log in to the cluster and observe that only master nodes exist.

      Actual results:

      Incomplete cluster with no worker nodes

      Expected results:

      Cluster installs successfully

      Additional info:

      core@mghaganproxy:~$ oc get machines -A
      NAMESPACE               NAME                                       PHASE     TYPE              REGION   ZONE   AGE
      openshift-machine-api   mgahagan220706-ff6dd-master-0              Running   Standard_DS4_v2   mtcazs          6h16m
      openshift-machine-api   mgahagan220706-ff6dd-master-1              Running   Standard_DS4_v2   mtcazs          6h16m
      openshift-machine-api   mgahagan220706-ff6dd-master-2              Running   Standard_DS4_v2   mtcazs          6h16m
      openshift-machine-api   mgahagan220706-ff6dd-worker-mtcazs-29hpn   Failed                                      6h8m
      openshift-machine-api   mgahagan220706-ff6dd-worker-mtcazs-c6mr4   Failed                                      6h8m
      openshift-machine-api   mgahagan220706-ff6dd-worker-mtcazs-nl6h5   Failed                                      6h8m
      openshift-machine-api   mgahagan220706-ff6dd-worker-mtcazs-rtgk8   Failed                                      5h14m

      Inspecting one of the failed workers with oc describe shows:

      Error Message: failed to reconcile machine "mgahagan220706-ff6dd-worker-mtcazs-c6mr4": network.LoadBalancersClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Network/loadBalancers/mgahagan220706-ff6dd' under resource group 'mgahagan220706-ff6dd-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"

      Given that this is a private cluster created with publish: Internal, the proper load balancer to bind the worker node's NIC to should be mgahagan220706-ff6dd-internal.
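
      The mismatch can be illustrated with a small sketch. This is not the installer's or the machine API's code: the helper names `expected_load_balancer` and `load_balancer_in_error` are hypothetical, and the naming rule (bare infrastructure ID for the public load balancer, `-internal` suffix for the private one) is an assumption inferred only from the names that appear in this report.

```python
import re
from typing import Optional


def expected_load_balancer(infra_id: str, publish: str) -> str:
    """Hypothetical helper: name of the load balancer a worker NIC should join.

    Assumption taken from this report: a cluster installed with
    publish: Internal uses "<infra_id>-internal", while the failing
    lookup used the bare "<infra_id>" name.
    """
    return f"{infra_id}-internal" if publish == "Internal" else infra_id


def load_balancer_in_error(message: str) -> Optional[str]:
    """Extract the load balancer name from an ARM ResourceNotFound message."""
    match = re.search(r"Microsoft\.Network/loadBalancers/([^'\"\s]+)", message)
    return match.group(1) if match else None


# Message text copied from the failed machine in this report.
error = (
    "The Resource 'Microsoft.Network/loadBalancers/mgahagan220706-ff6dd' "
    "under resource group 'mgahagan220706-ff6dd-rg' was not found."
)

requested = load_balancer_in_error(error)
expected = expected_load_balancer("mgahagan220706-ff6dd", "Internal")

# The two names differ, which matches the 404 seen on the worker machines.
print(f"requested={requested} expected={expected} match={requested == expected}")
```

      Under these assumptions, the name requested by the failed machine is the public-style name, while the expected name for an Internal cluster carries the `-internal` suffix.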

              padillon Patrick Dillon
              mgahagan@redhat.com Mike Gahagan
              None
              None
              Jinyun Ma Jinyun Ma
              None
              Red Hat Employee