[OCPBUGS-36773] 4.16: in-tree volume tests are failing in UPI multi-zone jobs

    • Bug
    • Resolution: Done-Errata
    • Normal
    • 4.16.z
    • 4.16.z
    • Storage / Operators
    • None
    • Moderate
    • None
    • False
    • Release Note Not Required
    • In Progress

      Description of problem:

         The following tests are failing:
      
      - [sig-storage] In-tree Volumes [Driver: vsphere] [Testpattern: Inline-volume (ext4)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s] 
      - [sig-storage] In-tree Volumes [Driver: vsphere] [Testpattern: Pre-provisioned PV (block volmode)] volumes should store data [Suite:openshift/conformance/parallel] [Suite:k8s] 
      - [sig-storage] In-tree Volumes [Driver: vsphere] [Testpattern: Pre-provisioned PV (ext4)] volumes should store data [Suite:openshift/conformance/parallel] [Suite:k8s]
      
      Job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-upi-zones/1808605014239744000

      Version-Release number of selected component (if applicable):

          4.16 nightly

      How reproducible:

          consistently: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-upi-zones

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (OpenShift Container Platform 4.16.11 bug fix update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHBA-2024:6401

            Wei Duan added a comment -

            From the test history, it seems these cases have not been executed since Jul 18: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-upi-zones

            Jan Safranek added a comment -

            Only the pre-provisioned / in-line tests are failing, because:

            1. The 4.16 API server does not have a valid vSphere cloud config (--cloud-config=""), so the admission plugin initializes an empty one.
            2. The empty cloud config has topology support disabled.

            The tests were removed upstream from newer releases, so I'll fix 4.16 only.

            Jan Safranek added a comment - - edited

            I checked periodic-ci-openshift-release-master-nightly-4.17-e2e-vsphere-ovn-zones and in-tree PVs get correct topology:

                nodeAffinity:
                  required:
                    nodeSelectorTerms:
                    - matchExpressions:
                      - key: topology.csi.vmware.com/openshift-zone
                        operator: In
                        values:
                        - us-west-1a
                      - key: topology.csi.vmware.com/openshift-region
                        operator: In
                        values:
                        - us-west
                storageClassName: thin
                vsphereVolume:
                  fsType: ext4
                  volumePath: '[vsanDatastore-1] 03814766-f2bf-3580-2c91-3cfdfe998a50/_0098/_0016/_004a/a906d86a95d545b9a101b63d1f68ca4c.vmdk'

            (created a thin storage class + PVC + Pod manually)

            I rehearsed the tests in https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/54258/rehearse-54258-periodic-ci-openshift-release-master-nightly-4.17-e2e-vsphere-ovn-zones/1811689985233915904, dumping PVs/PVCs and StorageClasses into https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/54258/rehearse-54258-periodic-ci-openshift-release-master-nightly-4.17-e2e-vsphere-ovn-zones/1811689985233915904/artifacts/e2e-vsphere-ovn-zones/openshift-e2e-test/artifacts/. All tests succeeded.

            I'll try again with 4.16 (oops!)

            Jan Safranek added a comment -

            Theory: PV label admission is not working, and the scheduler tries to use the PVs on a node that is in a different "zone" than the PV.
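
To make the theory above concrete, here is a minimal Python sketch of how a PV's nodeAffinity restricts which nodes may use it, and what happens when the PV carries no topology at all. It is a simplified model under stated assumptions, not the real scheduler code: only the `In` operator is handled, the function names are hypothetical, and the node labels (including the zone `us-west-1b`) are made up for illustration. The affinity values are copied from the PV quoted in the 4.17 comment.

```python
# Simplified, hypothetical model of PV node-affinity matching.
# Only the "In" operator is modelled; this is not the real scheduler code.

def term_matches(term, node_labels):
    """Return True when every matchExpression in the term matches the node."""
    for expr in term["matchExpressions"]:
        if expr["operator"] != "In":
            raise ValueError("unsupported operator: " + expr["operator"])
        if node_labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


def pv_fits_node(node_affinity, node_labels):
    """Return True when the PV may be used on a node with these labels."""
    if node_affinity is None:
        # No topology recorded on the PV: nothing restricts node choice.
        return True
    terms = node_affinity["required"]["nodeSelectorTerms"]
    # Terms are ORed; expressions inside one term are ANDed.
    return any(term_matches(term, node_labels) for term in terms)


ZONE_KEY = "topology.csi.vmware.com/openshift-zone"
REGION_KEY = "topology.csi.vmware.com/openshift-region"

# Affinity copied from the PV quoted in the 4.17 comment.
pv_affinity = {
    "required": {
        "nodeSelectorTerms": [
            {
                "matchExpressions": [
                    {"key": ZONE_KEY, "operator": "In", "values": ["us-west-1a"]},
                    {"key": REGION_KEY, "operator": "In", "values": ["us-west"]},
                ]
            }
        ]
    }
}

# Node labels are assumptions made up for this illustration.
node_same_zone = {ZONE_KEY: "us-west-1a", REGION_KEY: "us-west"}
node_other_zone = {ZONE_KEY: "us-west-1b", REGION_KEY: "us-west"}

print(pv_fits_node(pv_affinity, node_same_zone))
print(pv_fits_node(pv_affinity, node_other_zone))
print(pv_fits_node(None, node_other_zone))
```

The last call models the suspected failure mode: if admission never adds topology to the PV, nothing stops a pod from landing on a node in another zone, where the attach can then fail.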

            Jan Safranek added a comment -

            The tests failed because the CSI driver was not able to attach volumes to a node; the attach failed with "VSLM task failed":

            Jul  3 21:50:26.656: INFO: At 2024-07-03 21:45:25 +0000 UTC - event for vsphere-injector: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "vsphere-nkjs7" : rpc error: code = Internal desc = failed to attach disk: "15d933ab-058d-411f-b8b6-caa24f59a008" with node: "420590a0-3010-65bc-2d9b-6f54de851f9e" err failed to attach cns volume: "15d933ab-058d-411f-b8b6-caa24f59a008" to node vm: "VirtualMachine:vm-989914 [VirtualCenterHost: vcs8e-vc.ocp2.dev.cluster.com, UUID: 420590a0-3010-65bc-2d9b-6f54de851f9e, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-1001, VirtualCenterHost: vcs8e-vc.ocp2.dev.cluster.com]]". fault: "(*types.LocalizedMethodFault)(0xc000ee09e0)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\n  Reason: (string) (len=16) \"VSLM task failed\"\n },\n LocalizedMessage: (string) (len=32) \"CnsFault error: VSLM task failed\"\n})\n". opId: "0e13eecf"
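
When triaging many such events across job runs, it can help to pull the key fields out of the FailedAttachVolume message. The following Python sketch is one possible way to do that, assuming the message keeps the shape seen in the event above. The regular expressions and the `parse_attach_failure` helper are hypothetical, and `SAMPLE` is an abbreviated excerpt of that event (the elided middle is marked with `[...]`).

```python
import re

# Hypothetical patterns, written against the single event quoted above.
EVENT_RE = re.compile(
    r'AttachVolume\.Attach failed for volume "(?P<pv>[^"]+)"'
    r'.*?failed to attach disk: "(?P<disk>[^"]+)" with node: "(?P<node>[^"]+)"'
)
# In the log, the quotes around the reason are escaped as backslash + quote.
REASON_RE = re.compile(r'Reason: \(string\) \(len=\d+\) \\"(?P<reason>[^"\\]+)\\"')


def parse_attach_failure(line):
    """Return PV name, disk ID, node UUID and fault reason, or None."""
    event = EVENT_RE.search(line)
    if event is None:
        return None
    reason = REASON_RE.search(line)
    return {
        "pv": event.group("pv"),
        "disk": event.group("disk"),
        "node": event.group("node"),
        # The reason is optional: not every failure carries a CnsFault dump.
        "reason": reason.group("reason") if reason else None,
    }


# Abbreviated excerpt of the event above; "[...]" marks the elided part.
SAMPLE = (
    r'FailedAttachVolume: AttachVolume.Attach failed for volume "vsphere-nkjs7" : '
    r'rpc error: code = Internal desc = failed to attach disk: '
    r'"15d933ab-058d-411f-b8b6-caa24f59a008" with node: '
    r'"420590a0-3010-65bc-2d9b-6f54de851f9e" err failed to attach cns volume '
    r'[...] Fault: (types.CnsFault) {\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\n'
    r'  Reason: (string) (len=16) \"VSLM task failed\"\n },'
)

print(parse_attach_failure(SAMPLE))
```

Grouping the parsed results by `node` or `reason` would show whether the failures cluster on particular VMs, which is relevant to the zone-mismatch theory.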

              rhn-engineering-jsafrane Jan Safranek
              rhn-support-rvanderp Richard Vanderpool
              Wei Duan Wei Duan