[vsphere] installation fails when setting user-defined folder in failure domain

      Previously, when installing a cluster on VMware vSphere, the installation could fail if the user specified a user-defined folder in the `failureDomain` section of the `install-config.yaml` file. With this update, the installation program correctly validates user-defined folders in the `failureDomain` section of the `install-config.yaml` file. (link:https://issues.redhat.com/browse/OCPBUGS-3343[*OCPBUGS-3343*])
    • Done

      This is a clone of issue OCPBUGS-1627. The following is the description of the original issue:
      ---
      Description of problem:
      Two issues occur when setting a user-defined folder in a failureDomain.
      1. The installer fails with an error when folder is set to the inventory path of a user-defined folder in a failureDomain.

      failureDomains setting in install-config.yaml:

          failureDomains:
          - name: us-east-1
            region: us-east
            zone: us-east-1a
            server: xxx
            topology:
              datacenter: IBMCloud
              computeCluster: /IBMCloud/host/vcs-mdcnc-workload-1
              networks:
              - multi-zone-qe-dev-1
              datastore: multi-zone-ds-1
              folder: /IBMCloud/vm/qe-jima
          - name: us-east-2
            region: us-east
            zone: us-east-2a
            server: xxx
            topology:
              datacenter: IBMCloud
              computeCluster: /IBMCloud/host/vcs-mdcnc-workload-2
              networks:
              - multi-zone-qe-dev-1
              datastore: multi-zone-ds-2
              folder: /IBMCloud/vm/qe-jima
          - name: us-east-3
            region: us-east
            zone: us-east-3a
            server: xxx
            topology:
              datacenter: IBMCloud
              computeCluster: /IBMCloud/host/vcs-mdcnc-workload-3
              networks:
              - multi-zone-qe-dev-1
              datastore: workload_share_vcsmdcncworkload3_joYiR
              folder: /IBMCloud/vm/qe-jima
          - name: us-west-1
            region: us-west
            zone: us-west-1a
            server: ibmvcenter.vmc-ci.devcluster.openshift.com
            topology:
              datacenter: datacenter-2
              computeCluster: /datacenter-2/host/vcs-mdcnc-workload-4
              networks:
              - multi-zone-qe-dev-1
              datastore: workload_share_vcsmdcncworkload3_joYiR

      Error message from Terraform after the OVA image import completes:

      DEBUG vsphereprivate_import_ova.import[0]: Still creating... [1m40s elapsed] 
      DEBUG vsphereprivate_import_ova.import[3]: Creation complete after 1m40s [id=vm-367860] 
      DEBUG vsphereprivate_import_ova.import[1]: Creation complete after 1m49s [id=vm-367863] 
      DEBUG vsphereprivate_import_ova.import[0]: Still creating... [1m50s elapsed] 
      DEBUG vsphereprivate_import_ova.import[2]: Still creating... [1m50s elapsed] 
      DEBUG vsphereprivate_import_ova.import[2]: Still creating... [2m0s elapsed] 
      DEBUG vsphereprivate_import_ova.import[0]: Still creating... [2m0s elapsed] 
      DEBUG vsphereprivate_import_ova.import[2]: Creation complete after 2m2s [id=vm-367862] 
      DEBUG vsphereprivate_import_ova.import[0]: Still creating... [2m10s elapsed] 
      DEBUG vsphereprivate_import_ova.import[0]: Creation complete after 2m20s [id=vm-367861] 
      DEBUG data.vsphere_virtual_machine.template[0]: Reading... 
      DEBUG data.vsphere_virtual_machine.template[3]: Reading... 
      DEBUG data.vsphere_virtual_machine.template[1]: Reading... 
      DEBUG data.vsphere_virtual_machine.template[2]: Reading... 
      DEBUG data.vsphere_virtual_machine.template[3]: Read complete after 1s [id=42054e33-85d6-e310-7f4f-4c52a73f8338] 
      DEBUG data.vsphere_virtual_machine.template[1]: Read complete after 2s [id=42053e17-cc74-7c89-f5d1-059c9030ecc7] 
      DEBUG data.vsphere_virtual_machine.template[2]: Read complete after 2s [id=4205019f-26d8-f9b4-ac0c-2c073fd70b35] 
      DEBUG data.vsphere_virtual_machine.template[0]: Read complete after 2s [id=4205eaf2-c727-c647-ad44-bd9ad7023c56] 
      ERROR                                              
      ERROR Error: error trying to determine parent targetFolder: folder '/IBMCloud/vm//IBMCloud/vm' not found 
      ERROR                                              
      ERROR   with vsphere_folder.folder["IBMCloud-/IBMCloud/vm/qe-jima"], 
      ERROR   on main.tf line 61, in resource "vsphere_folder" "folder": 
      ERROR   61: resource "vsphere_folder" "folder" {   
      ERROR                                              
      ERROR failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "pre-bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1 
      ERROR                                              
      ERROR Error: error trying to determine parent targetFolder: folder '/IBMCloud/vm//IBMCloud/vm' not found 
      ERROR                                              
      ERROR   with vsphere_folder.folder["IBMCloud-/IBMCloud/vm/qe-jima"], 
      ERROR   on main.tf line 61, in resource "vsphere_folder" "folder": 
      ERROR   61: resource "vsphere_folder" "folder" {   
      ERROR                                              
      ERROR   
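The doubled path in the error above is consistent with the datacenter's VM root being prepended to a `folder` value that is already an absolute inventory path. The following Python sketch is an illustration only, not the installer's or the Terraform provider's code; the helper names `naive_parent` and `normalized_parent` are hypothetical.

```python
import posixpath


def naive_parent(datacenter: str, folder: str) -> str:
    """Prepend the VM root unconditionally, then take the parent (hypothetical)."""
    vm_root = f"/{datacenter}/vm/"
    full_path = vm_root + folder  # wrong when folder is already absolute
    return full_path.rsplit("/", 1)[0]


def normalized_parent(datacenter: str, folder: str) -> str:
    """Prepend the VM root only for relative names, then take the parent."""
    vm_root = f"/{datacenter}/vm"
    if not folder.startswith("/"):
        folder = posixpath.join(vm_root, folder)
    return posixpath.dirname(folder)


# The value from the failing install-config.yaml.
print(naive_parent("IBMCloud", "/IBMCloud/vm/qe-jima"))
print(normalized_parent("IBMCloud", "/IBMCloud/vm/qe-jima"))
```

With the unconditional prefix, the computed parent is the same malformed string that appears in the Terraform error; checking for a leading `/` first yields `/IBMCloud/vm` for both the absolute path and the bare folder name.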

      2. The installer panics when folder is set to the name (not the path) of a user-defined folder in the failure domains.

      failureDomains setting in install-config.yaml:

          failureDomains:
          - name: us-east-1
            region: us-east
            zone: us-east-1a
            server: xxx
            topology:
              datacenter: IBMCloud
              computeCluster: /IBMCloud/host/vcs-mdcnc-workload-1
              networks:
              - multi-zone-qe-dev-1
              datastore: multi-zone-ds-1
              folder: qe-jima
          - name: us-east-2
            region: us-east
            zone: us-east-2a
            server: xxx
            topology:
              datacenter: IBMCloud
              computeCluster: /IBMCloud/host/vcs-mdcnc-workload-2
              networks:
              - multi-zone-qe-dev-1
              datastore: multi-zone-ds-2
              folder: qe-jima
          - name: us-east-3
            region: us-east
            zone: us-east-3a
            server: xxx
            topology:
              datacenter: IBMCloud
              computeCluster: /IBMCloud/host/vcs-mdcnc-workload-3
              networks:
              - multi-zone-qe-dev-1
              datastore: workload_share_vcsmdcncworkload3_joYiR
              folder: qe-jima
          - name: us-west-1
            region: us-west
            zone: us-west-1a
            server: xxx
            topology:
              datacenter: datacenter-2
              computeCluster: /datacenter-2/host/vcs-mdcnc-workload-4
              networks:
              - multi-zone-qe-dev-1
              datastore: workload_share_vcsmdcncworkload3_joYiR                                  

      Panic message from the installer:

      INFO Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/releases/rhcos-4.12/412.86.202208101039-0/x86_64/rhcos-412.86.202208101039-0-vmware.x86_64.ova?sha256=' 
      INFO The file was found in cache: /home/user/.cache/openshift-installer/image_cache/rhcos-412.86.202208101039-0-vmware.x86_64.ova. Reusing... 
      panic: runtime error: index out of range [1] with length 1

      goroutine 1 [running]:
      github.com/openshift/installer/pkg/tfvars/vsphere.TFVars({{0xc0013bd068, 0x3, 0x3}, {0xc000b11dd0, 0x12}, {0xc000b11db8, 0x14}, {0xc000b11d28, 0x14}, {0xc000fe8fc0, ...}, ...})
          /go/src/github.com/openshift/installer/pkg/tfvars/vsphere/vsphere.go:79 +0x61b
      github.com/openshift/installer/pkg/asset/cluster.(*TerraformVariables).Generate(0x1d1ed360, 0x5?)
          /go/src/github.com/openshift/installer/pkg/asset/cluster/tfvars.go:847 +0x4798
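An `index out of range [1] with length 1` panic is what Go raises when code reads the second element of a one-element slice. One pattern that would behave this way (an assumption; the code at `vsphere.go:79` is not shown in this report) is splitting the folder value on a path separator and taking element 1, which exists only when the value is a full inventory path. The Python analogue below uses the hypothetical helpers `unsafe_relative_folder` and `safe_relative_folder` to show the failure and a guarded alternative.

```python
def unsafe_relative_folder(folder: str) -> str:
    """Take the part after '/vm/' without checking that it exists (hypothetical)."""
    return folder.split("/vm/", 1)[1]  # IndexError for a bare name such as 'qe-jima'


def safe_relative_folder(folder: str) -> str:
    """Return the part after '/vm/', or raise a descriptive error for bare names."""
    parts = folder.split("/vm/", 1)
    if len(parts) != 2 or not parts[1]:
        raise ValueError(f"folder {folder!r} must be an absolute inventory path")
    return parts[1]


print(safe_relative_folder("/IBMCloud/vm/qe-jima"))
try:
    unsafe_relative_folder("qe-jima")
except IndexError as err:
    print("unsafe helper failed:", err)
```

Checking the length before indexing turns a crash into a validation error that can name the offending value.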
       

      Based on the explanation of the folder field, a folder name looks like it should be accepted. If a folder name is not allowed, the installer needs to validate the folder value and the explain text needs to be updated.

       

      sh-4.4$ ./openshift-install explain installconfig.platform.vsphere.failureDomains.topology.folder
      KIND:     InstallConfig
      VERSION:  v1

      RESOURCE: <string>
        folder is the name or inventory path of the folder in which the virtual machine is created/located.
       

       

       

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-09-20-095559

      How reproducible:

      always

      Steps to Reproduce:

      see description
      

      Actual results:

      The installation fails when a user-defined folder is set.

      Expected results:

      The installation succeeds when a user-defined folder is set.

      Additional info:

       


            Kai-Uwe Rommel (Inactive) added a comment -

            I don't think this is a "solution". The platform.vsphere.folder parameter is deprecated.

            So one MUST be able to use platform.vsphere.failuredomain[].topology.folder instead to achieve the equivalent functionality.

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory, and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2022:7399


            Shang Gao added a comment -

            QE testing pass on OCP 4.12, thanks.

            1. Set the folder field in the topology to a folder name (without a path); the installer reports that a full path is required.
            sh-4.4$ ./openshift-install create manifests --dir cluster
            ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.vsphere.folder: Invalid value: "folder-sgao": folder must be absolute path: expected prefix /IBMCloud/vm/

            2. Set platform.vsphere.folder to an inventory path, with no vcenters defined and no folders defined in the failure domain topologies. The installer reports an error for the failure domain that uses another datacenter.
            sh-4.4$ ./openshift-install create manifests --dir cluster
            ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.vsphere.folder: Invalid value: "/IBMCloud/vm/sgao-folder": folder must be in datacenter datacenter-2; please define it in a topology

            3. Set platform.vsphere.folder to an inventory path, with no vcenters defined and no folder defined in the topology of failure domains that share the default datacenter. Define an invalid folder in the topology of the failure domain with a different datacenter; the installer reports that the folder does not exist in the correct datacenter.
            sh-4.4$ ./openshift-install create manifests --dir cluster
            ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.vsphere.failureDomains.topology.folder: Invalid value: "/fake2/vm/sgao-folder": the folder defined does not exist in the correct datacenter

            4. Set platform.vsphere.folder to an inventory path, with no vcenters defined, and set a different folder in the failure domain topology.

            sh-4.4$ govc find . -type f | grep sgao
            /IBMCloud/vm/sgao-folder-1
            /IBMCloud/vm/sgao-folder
            /datacenter-2/vm/sgao-folder-2

                platform:
                  vsphere:
                    apiVIPs:
                    - 192.168.121.8
                    cluster: vcs-mdcnc-workload-1
                    datacenter: IBMCloud
                    defaultDatastore: mdcnc-ds-shared
                    ingressVIPs:
                    - 192.168.121.9
                    network: multi-zone-qe-dev-1
                    password: eilio4wohr9Cae+xahp8
                    username: jima@vsphere.local
                    vCenter: ibmvcenter.vmc-ci.devcluster.openshift.com
                    folder: /IBMCloud/vm/sgao-folder
                    defaultMachinePlatform:
                      zones:
                      - "us-east-1"
                      - "us-east-2"
                      - "us-west-1"
                    failureDomains:
                    - name: us-east-1
                      region: us-east
                      zone: us-east-1a
                      server: ibmvcenter.vmc-ci.devcluster.openshift.com
                      topology:
                        datacenter: IBMCloud
                        computeCluster: /IBMCloud/host/vcs-mdcnc-workload-1
                        networks:
                        - multi-zone-qe-dev-1
                        datastore: mdcnc-ds-1
                        folder: /IBMCloud/vm/sgao-folder-1
                    - name: us-east-2
                      region: us-east
                      zone: us-east-2a
                      server: ibmvcenter.vmc-ci.devcluster.openshift.com
                      topology:
                        datacenter: IBMCloud
                        computeCluster: /IBMCloud/host/vcs-mdcnc-workload-2
                        networks:
                        - multi-zone-qe-dev-1
                        datastore: mdcnc-ds-shared
                    - name: us-east-3
                      region: us-east
                      zone: us-east-3a
                      server: ibmvcenter.vmc-ci.devcluster.openshift.com
                      topology:
                        datacenter: IBMCloud
                        computeCluster: /IBMCloud/host/vcs-mdcnc-workload-3
                        networks:
                        - multi-zone-qe-dev-1
                        datastore: mdcnc-ds-shared
                    - name: us-west-1
                      region: us-west
                      zone: us-west-1a
                      server: ibmvcenter.vmc-ci.devcluster.openshift.com
                      topology:
                        datacenter: datacenter-2
                        computeCluster: /datacenter-2/host/vcs-mdcnc-workload-4
                        networks:
                        - multi-zone-qe-dev-1
                        datastore: mdcnc-ds-shared
                        folder: /datacenter-2/vm/sgao-folder-2

            The installation is successful; the OVA images, control plane nodes, and worker nodes are placed in the expected topology and folder.

            5. The explain text for folder in the topology is updated as expected.
            sh-4.4$ ./openshift-install explain installconfig.platform.vsphere.failureDomains.topology.folder
            KIND: InstallConfig
            VERSION: v1

            RESOURCE: <string>
            folder is the inventory path of the folder in which the virtual machine is created/located.

            6. With a subfolder in both platform.vsphere.folder and the failure domain topology, and no vcenters defined, the installation is successful; the OVA images, control plane nodes, and worker nodes are placed in the expected topology and folder.
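The checks exercised in steps 1 to 3 can be summarized in a small sketch. This is an illustration in Python, not the installer's Go validation code; the functions `validate_topology_folder` and `validate_default_folder` and their parameters are hypothetical, and the known-folder set simply reuses the `govc find` output from step 4.

```python
from typing import Optional, Set


def validate_topology_folder(folder: str, datacenter: str,
                             known_folders: Set[str]) -> Optional[str]:
    """Return an error for a failure-domain folder value, or None if it passes."""
    prefix = f"/{datacenter}/vm/"
    if not folder.startswith("/"):
        return f"folder must be absolute path: expected prefix {prefix}"
    if not folder.startswith(prefix) or folder not in known_folders:
        return "the folder defined does not exist in the correct datacenter"
    return None


def validate_default_folder(default_folder: str, datacenter: str) -> Optional[str]:
    """Check that an inherited platform-level folder is in the failure domain's datacenter."""
    if not default_folder.startswith(f"/{datacenter}/vm/"):
        return f"folder must be in datacenter {datacenter}; please define it in a topology"
    return None


# Folder inventory taken from the `govc find` output in step 4.
known = {
    "/IBMCloud/vm/sgao-folder-1",
    "/IBMCloud/vm/sgao-folder",
    "/datacenter-2/vm/sgao-folder-2",
}

print(validate_topology_folder("folder-sgao", "IBMCloud", known))
print(validate_default_folder("/IBMCloud/vm/sgao-folder", "datacenter-2"))
print(validate_topology_folder("/fake2/vm/sgao-folder", "datacenter-2", known))
print(validate_topology_folder("/datacenter-2/vm/sgao-folder-2", "datacenter-2", known))
```

The sketch assumes that a failure domain without its own folder inherits platform.vsphere.folder, which is why the default folder is checked against each failure domain's datacenter separately.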


            Mike Pytlak (Inactive) added a comment - - edited

            Hi jcallen@redhat.com I was working through documenting installation bug fixes for the 4.12 release notes and came across this ticket. Not sure if this fix should be listed in the release notes as a bug fix. Please update the ticket accordingly. You can find details on how to specify release note text and how to indicate if the issue doesn't need to be added to the release notes here [1]. The "Doc Text for Bugs" section is about 2/3 down the page.

            [1] https://source.redhat.com/groups/public/atomicopenshift/atomicopenshift_wiki/openshift_bugzilla_process

