OpenShift Bugs / OCPBUGS-45017

OpenShift 4.17 IPI Private cluster installation failure due to non-empty resource group

      Description of problem:

      When creating a private IPI cluster in an existing Azure VNet with preconfigured subnets, as described in

      https://docs.openshift.com/container-platform/4.17/installing/installing_azure/ipi/installing-azure-private.html

      and

      https://docs.openshift.com/container-platform/4.17/installing/installing_azure/ipi/installing-azure-vnet.html

      an error is returned:

      FATAL failed to fetch Cluster API Manifests: failed to generate asset "Cluster API Manifests": failed to generate Azure manifests: failed to get azure ip availability: network.VirtualNetworksClient#CheckIPAddressAvailability: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="PrivateIPAddressNotInAnySubnet" Message="Private static IP address 10.0.0.100 does not belong to the range of any subnet in the virtual network /subscriptions/6e700e21-5667-435d-8f78-f421bedbe936/resourceGroups/RG-AS-OCPNP/providers/Microsoft.Network/virtualNetworks/VN-AS-OCPNP." Details=[]

      This error appears during the creation of the manifests, i.e.

      $ openshift-install create manifests --dir <installation_directory> 

      No cluster creation step was attempted yet.
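
The 400 response above says that 10.0.0.100 does not fall inside any subnet of the existing VNet. A minimal sketch of that membership test follows, using Python's standard `ipaddress` module. The subnet CIDRs are assumptions for illustration only (the report does not state the actual ranges), and `subnet_containing` is a hypothetical helper, not part of the installer.

```python
import ipaddress


def subnet_containing(ip, subnet_cidrs):
    """Return the first CIDR in subnet_cidrs that contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for cidr in subnet_cidrs:
        if addr in ipaddress.ip_network(cidr):
            return cidr
    return None


# Assumed subnet ranges for the existing VNet (hypothetical, not from the report).
existing_subnets = ["10.89.188.0/26", "10.89.188.64/26"]

# The address from the error message is outside both assumed ranges.
print(subnet_containing("10.0.0.100", existing_subnets))
# An address inside the first assumed range is accepted.
print(subnet_containing("10.89.188.4", existing_subnets))
```

Under these assumed ranges, the first call finds no matching subnet, which mirrors the "PrivateIPAddressNotInAnySubnet" condition reported by Azure.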

      For this issue, the customer found a fix in the installer GitHub repository [1], which worked only partially:

      [1] https://github.com/openshift/installer/pull/9144

      After cloning the git repo and applying the PR changes to the file "pkg/asset/manifests/azure/cluster.go", manifest creation succeeded:

      # ./openshift-install create manifests --dir installer_linux
      INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
      INFO Consuming Install Config from target directory
      INFO Adding clusters...
      INFO Manifests created in: installer_linux/cluster-api, installer_linux/manifests and installer_linux/openshift

      The internal LB was now assigned the correct next available IP, which had not happened before:

        networkSpec:
          apiServerLB:
            backendPool:
              name: ocpnp-pz6m2-internal
            frontendIPs:
            - name: ocpnp-pz6m2-internal-frontEnd
              privateIP: 10.89.188.4
            name: ocpnp-pz6m2-internal
            type: Internal 
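
The private IP ending in .4 is consistent with how Azure subnets behave: Azure reserves the first four addresses and the last address of every subnet, so the first assignable address sits at offset 4. The sketch below illustrates that rule only. The subnet CIDR is an assumption (the report does not give it), and `first_free_ip` is a hypothetical helper, not the logic from the PR.

```python
import ipaddress

# Azure reserves the network address and the three addresses after it,
# plus the last (broadcast) address of each subnet.
AZURE_RESERVED_LEADING = 4


def first_free_ip(cidr, in_use=()):
    """Return the first assignable address in cidr that is not in in_use, or None."""
    net = ipaddress.ip_network(cidr)
    taken = {ipaddress.ip_address(a) for a in in_use}
    # Skip the leading reserved addresses and stop before the broadcast address.
    for offset in range(AZURE_RESERVED_LEADING, net.num_addresses - 1):
        candidate = net[offset]
        if candidate not in taken:
            return str(candidate)
    return None


# Assumed control plane subnet (hypothetical CIDR).
print(first_free_ip("10.89.188.0/26"))
print(first_free_ip("10.89.188.0/26", in_use=["10.89.188.4"]))
```

With an empty subnet of the assumed range, the helper returns 10.89.188.4, matching the `privateIP` in the generated manifest.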

      We used the "remotes/origin/release-4.17" branch from https://github.com/openshift/installer

      We then proceeded with a cluster build to test it further, but cluster creation failed, this time with a new error message:

      # ./openshift-install create cluster --dir installer_linux
      INFO Consuming OpenShift Install (Manifests) from target directory
      INFO Consuming Master Machines from target directory
      INFO Consuming Worker Machines from target directory
      INFO Consuming Common Manifests from target directory
      INFO Consuming Openshift Manifests from target directory
      INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
      FATAL failed to fetch Cluster Infrastructure Variables: failed to fetch dependency of "Cluster Infrastructure Variables": failed to generate asset "Platform Provisioning Check": platform.azure.resourceGroupName: Invalid value: "RG-AS-OCPNP": resource group must be empty but it has 7 resources like /subscriptions/6e700e21-5667-435d-8f78-f421bedbe936/resourceGroups/RG-AS-OCPNP/providers/Microsoft.Network/virtualNetworks/VN-AS-OCPNP, /subscriptions/6e700e21-5667-435d-8f78-f421bedbe936/resourceGroups/RG-AS-OCPNP/providers/Microsoft.Network/networkSecurityGroups/NSG-AS-OCPNP ...

      For some reason, when an existing VNet and resource group are used, the installer validates that the resource group is empty, which will never be the case for a resource group that already holds the VNet.
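
To make the failing check concrete, here is a minimal sketch of the kind of validation the error message describes: reject the resource group named in `platform.azure.resourceGroupName` if it contains any resources. This is an illustration reconstructed from the error text only, not the installer's code. `check_resource_group_empty` is a hypothetical name, and the subscription ID is a placeholder.

```python
def check_resource_group_empty(group_name, resource_ids, max_examples=2):
    """Return None if the group holds no resources, else an error string."""
    if not resource_ids:
        return None
    examples = ", ".join(resource_ids[:max_examples])
    # Mark the list as truncated when more resources exist than are shown.
    suffix = " ..." if len(resource_ids) > max_examples else ""
    return (
        f'platform.azure.resourceGroupName: Invalid value: "{group_name}": '
        f"resource group must be empty but it has {len(resource_ids)} "
        f"resources like {examples}{suffix}"
    )


# Placeholder subscription ID; only the two resources named in the report are listed.
prefix = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/RG-AS-OCPNP/providers/Microsoft.Network"
)
existing = [
    f"{prefix}/virtualNetworks/VN-AS-OCPNP",
    f"{prefix}/networkSecurityGroups/NSG-AS-OCPNP",
]

print(check_resource_group_empty("RG-AS-OCPNP", existing))
print(check_resource_group_empty("RG-AS-OCPNP", []))
```

A check of this shape always fails when the group passed to it is the one that already contains the VNet and NSG, which is the behaviour observed in the report.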

      The same error appeared during the v4.17 cluster installation and also with v4.16.x (installer versions .0, .9 and .21 were tried): creating the manifests worked fine, but the error about the non-empty resource group persisted.

      root@ocpjump:~# ./openshift-install version
      ./openshift-install 4.16.9
      built from commit deb993e72a920fb1c68a578d1b4c598071fefea4
      release image quay.io/openshift-release-dev/ocp-release@sha256:115bba6836b9feffb81ad9101791619edd5f19d333580b7f62bd6721eeda82d2
      release architecture amd64
      root@ocpjump:~# ./openshift-install create manifests --dir installer_linux
      INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
      INFO Consuming Install Config from target directory
      INFO Manifests created in: installer_linux/manifests and installer_linux/openshift
      root@ocpjump:~# ./openshift-install create cluster --dir installer_linux
      INFO Consuming Common Manifests from target directory
      INFO Consuming Master Machines from target directory
      INFO Consuming Worker Machines from target directory
      INFO Consuming OpenShift Install (Manifests) from target directory
      INFO Consuming Openshift Manifests from target directory
      INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
      FATAL failed to fetch Cluster Infrastructure Variables: failed to fetch dependency of "Cluster Infrastructure Variables": failed to generate asset "Platform Provisioning Check": platform.azure.resourceGroupName: Invalid value: "RG-AS-OCPNP": resource group must be empty but it has 7 resources like /subscriptions/6e700e21-5667-435d-8f78-f421bedbe936/resourceGroups/RG-AS-OCPNP/providers/Microsoft.Network/virtualNetworks/VN-AS-OCPNP, /subscriptions/6e700e21-5667-435d-8f78-f421bedbe936/resourceGroups/RG-AS-OCPNP/providers/Microsoft.Network/networkSecurityGroups/NSG-AS-OCPNP ...

      Perhaps a private install into an existing VNet and subnets is not a widely used deployment mechanism on Azure, given that this bug seems to have been present for a while. That would be peculiar, as this deployment type offers the best security and customisation with on-premises cloud interconnects.

      Version-Release number of selected component (if applicable):

      v4.17.Z 

      Actual results:

          The private IPI cluster installation fails for OpenShift v4.17 on Azure

      Expected results:

          Private IPI OpenShift v4.17 cluster should be installed successfully on Azure

              rna-afk Aditya Narayanaswamy
              rhn-support-mmarkand Mridul Markandey
              Gaoyun Pei Gaoyun Pei