OCPBUGS-11609

Storage e2e tests are using wrong endpoint on nodes for AWS Local Zones


      Description of problem:

      The e2e tests listed in [1] fail in the openshift/conformance suite (openshift-e2e-test CI test) when run on clusters with AWS Local Zone nodes ('edge' compute nodes). Local Zone support was introduced in 4.12 for clusters installed into an existing VPC, and the installer[2] will soon fully automate the cluster installation, including creation of the network components.
      
      The periodic job[3] always fails, so the exclusion list[4] had to be added to filter out the false-positive failures.
      
      We briefly discussed this in the thread at [4]: the region 'extraction' in the e2e code is incorrect and seems to have been missed in some old PRs. The correct approach uses a regex, rather than removing the last character of the zone name, to get the region and build the service endpoint. It was also mentioned that the code has been removed upstream, but I am unsure whether that is related to the in-tree cloud-provider cleanup. Was the bug simply moved to the AWS cloud-provider repo too?
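
To illustrate the difference between the two approaches, here is a minimal Go sketch. It is not the code from the e2e tests or from any cloud-provider repository: the function names `regionByTrim` and `regionFromZone` and the regular expression are hypothetical, chosen only to show why trimming the last character breaks for Local Zone names such as us-east-1-atl-1a while a regex anchored on the region prefix does not.

```go
package main

import (
	"fmt"
	"regexp"
)

// regionPrefix is an assumed pattern for the leading region part of a zone
// name: a two-letter area, one or more words, then a number (for example
// "us-east-1" or "us-gov-west-1").
var regionPrefix = regexp.MustCompile(`^[a-z]{2}(?:-[a-z]+)+-\d+`)

// regionByTrim reproduces the faulty approach: drop the last character.
// It only works for regular zones such as "us-east-1a".
func regionByTrim(zone string) string {
	if zone == "" {
		return ""
	}
	return zone[:len(zone)-1]
}

// regionFromZone extracts the region prefix with the regex above.
// The boolean is false when the zone name does not look like an AWS zone.
func regionFromZone(zone string) (string, bool) {
	region := regionPrefix.FindString(zone)
	return region, region != ""
}

func main() {
	for _, zone := range []string{"us-east-1a", "us-east-1-atl-1a"} {
		region, ok := regionFromZone(zone)
		fmt.Printf("zone=%s trim=%s regex=%s ok=%t\n",
			zone, regionByTrim(zone), region, ok)
	}
}
```

For the Local Zone name, the trimmed value is us-east-1-atl-1, which is exactly the string that appears in the failing hostname in [1], while the regex yields us-east-1.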
      
      Considering this is a bug in the e2e tests, and AWS Local Zones were launched when kube 1.25 already existed, it would be nice to have a fix in e2e for 1.25+, and in the out-of-tree provider if the bug is there too.
      
      [1] test failures: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/36911/rehearse-36911-periodic-ci-openshift-release-master-nightly-4.13-e2e-aws-ovn-localzone-byo-vpc/1645452305094414336
      ~~~
      {  fail [k8s.io/kubernetes@v1.26.1/test/e2e/storage/drivers/in_tree.go:1604]: Apr 10 16:41:32.262: RequestError: send request failed
      caused by: Post "https://ec2.us-east-1-atl-1.amazonaws.com/": dial tcp: lookup ec2.us-east-1-atl-1.amazonaws.com on 172.30.0.10:53: no such host
      Ginkgo exit error 1: exit with code 1}
      ~~~
      [1B] test list: https://github.com/openshift/release/pull/36911/files#diff-36cb2108ae3b93ad7d4446a8ae0813bca3a127be3cb446149210be22a343a23fR13
      ~~~
      \[sig-network\]\[Feature:tap\] should create a pod with a tap interface \[apigroup:k8s.cni.cncf.io\] \[Suite:openshift/conformance/parallel\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Inline-volume (default fs)\] volumes should allow exec of files on the volume \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Inline-volume (default fs)\] volumes should store data \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Inline-volume (ext4)\] volumes should allow exec of files on the volume \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Inline-volume (ext4)\] volumes should store data \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Pre-provisioned PV (block volmode)\] volumeMode should not mount / map unused volumes in a pod \[LinuxOnly\] \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Pre-provisioned PV (block volmode)\] volumes should store data \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Pre-provisioned PV (default fs)\] volumes should allow exec of files on the volume \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Pre-provisioned PV (default fs)\] volumes should store data \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Pre-provisioned PV (ext4)\] volumes should allow exec of files on the volume \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Pre-provisioned PV (ext4)\] volumes should store data \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      \|\[sig-storage\] In-tree Volumes \[Driver: aws\] \[Testpattern: Pre-provisioned PV (filesystem volmode)\] volumeMode should not mount / map unused volumes in a pod \[LinuxOnly\] \[Skipped:NoOptionalCapabilities\] \[Suite:openshift/conformance/parallel\] \[Suite:k8s\]
      ~~~
      
      [2] https://issues.redhat.com/browse/SPLAT-657
      
      [3] periodic-ci-openshift-release-master-nightly-4.13-e2e-aws-ovn-localzone-byo-vpc
      
      [4] https://issues.redhat.com/browse/SPLAT-728?focusedId=22022424&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-22022424
      

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-04-09-095122

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create VPC and Local Zone subnets [A]
      2. Install a cluster into the existing VPC on AWS (providing both regular and Local Zone subnet IDs) [A]
      3. Run the openshift/conformance test suite
      
      [A] https://docs.openshift.com/container-platform/4.12/installing/installing_aws/installing-aws-localzone.html 

      Actual results:

      - [sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Pre-provisioned PV (ext4)] volumes should allow exec of files on the volume [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
      ~~~
      {  fail [k8s.io/kubernetes@v1.26.1/test/e2e/storage/drivers/in_tree.go:1604]: Apr 10 16:41:32.262: RequestError: send request failed
      caused by: Post "https://ec2.us-east-1-atl-1.amazonaws.com/": dial tcp: lookup ec2.us-east-1-atl-1.amazonaws.com on 172.30.0.10:53: no such host
      Ginkgo exit error 1: exit with code 1}
      ~~~
      
      - [sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Pre-provisioned PV (ext4)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
      
      ~~~
      {  fail [k8s.io/kubernetes@v1.26.1/test/e2e/storage/drivers/in_tree.go:1604]: Apr 10 16:42:15.868: RequestError: send request failed
      caused by: Post "https://ec2.us-east-1-atl-1.amazonaws.com/": dial tcp: lookup ec2.us-east-1-atl-1.amazonaws.com on 172.30.0.10:53: no such host
      Ginkgo exit error 1: exit with code 1}
      ~~~
      (...)
      https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/36911/rehearse-36911-periodic-ci-openshift-release-master-nightly-4.13-e2e-aws-ovn-localzone-byo-vpc/1645452305094414336

      Expected results:

      The endpoint used should be the regional one (https://ec2.us-east-1.amazonaws.com/), not one derived from the zone name us-east-1-atl-1a.
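
As a small illustration of the expected versus actual hostnames, the sketch below builds an EC2 endpoint URL from a region string. The helper `ec2Endpoint` is hypothetical and assumes the standard commercial AWS partition (the `amazonaws.com` suffix); it is not how the e2e tests or the AWS SDK resolve endpoints.

```go
package main

import "fmt"

// ec2Endpoint builds an EC2 endpoint URL from a region string.
// Assumption: standard "aws" partition, so the DNS suffix is amazonaws.com.
func ec2Endpoint(region string) string {
	return fmt.Sprintf("https://ec2.%s.amazonaws.com/", region)
}

func main() {
	// Correct input: the region name gives the expected endpoint.
	fmt.Println(ec2Endpoint("us-east-1"))
	// Faulty input: the zone name minus its last character gives the
	// host that fails DNS lookup in the logs above.
	fmt.Println(ec2Endpoint("us-east-1-atl-1"))
}
```

The second call reproduces the URL from the failure message, which shows that the endpoint template itself is fine and only the region value passed into it is wrong.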

      Additional info:

       

            hekumar@redhat.com Hemant Kumar
            rhn-support-mrbraga Marco Braga
            Penghao Wang Penghao Wang