OCPBUGS-1846

Facing issue while configuring egress IP pool in OCP cluster which uses STS


    • Critical
    • None
    • SDN Sprint 225
    • 1
    • Proposed
    • False

      This is a clone of issue OCPBUGS-1629. The following is the description of the original issue:

      Description of problem:

      This is a disconnected cluster on AWS that uses STS, and configuring an egress IP on it fails. The cloud-network-config-controller pod tries to connect to the global STS endpoint "https://sts.amazonaws.com/", whereas it should connect to the regional one, "https://sts.ap-southeast-1.amazonaws.com", which is the sts endpoint set in the Infrastructure object.
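To make the distinction between the two kinds of endpoint concrete, here is a minimal sketch that classifies an STS URL as global or regional. The function name `sts_endpoint_scope` is hypothetical, and the hostname pattern `sts.<region>.amazonaws.com` is assumed to apply only to the standard AWS partition; it is an illustration, not code from cloud-network-config-controller.

```python
from urllib.parse import urlparse


def sts_endpoint_scope(url: str) -> str:
    """Classify an STS URL as 'global', 'regional:<region>', or 'unknown'.

    Hypothetical helper for illustration; assumes the standard AWS
    partition, where regional hosts look like sts.<region>.amazonaws.com.
    """
    host = (urlparse(url).hostname or "").lower()
    if host == "sts.amazonaws.com":
        return "global"
    parts = host.split(".")
    # Regional form: exactly four labels, first is "sts", last two are amazonaws.com
    if len(parts) == 4 and parts[0] == "sts" and parts[2:] == ["amazonaws", "com"]:
        return f"regional:{parts[1]}"
    return "unknown"


if __name__ == "__main__":
    # The URL seen in the controller error, and the sts endpoint from the Infrastructure object
    print(sts_endpoint_scope("https://sts.amazonaws.com/"))
    print(sts_endpoint_scope("https://sts.ap-southeast-1.amazonaws.com"))
```

In a disconnected cluster only the regional host is likely to be reachable (for example through a VPC endpoint), which is consistent with the i/o timeout reported against the global host.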

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a disconnected OCP cluster on AWS.
      $ oc get netnamespace | grep egress
      egress-ip-test                                     2689387    ["172.16.1.24"]
      
      $ oc get hostsubnet
      NAME                                              HOST                                              HOST IP        SUBNET          EGRESS CIDRS   EGRESS IPS
      ip-172-16-1-151.ap-southeast-1.compute.internal   ip-172-16-1-151.ap-southeast-1.compute.internal   172.16.1.151   10.130.0.0/23                  
      ip-172-16-1-53.ap-southeast-1.compute.internal    ip-172-16-1-53.ap-southeast-1.compute.internal    172.16.1.53    10.131.0.0/23                  ["172.16.1.24"]
      ip-172-16-2-15.ap-southeast-1.compute.internal    ip-172-16-2-15.ap-southeast-1.compute.internal    172.16.2.15    10.128.0.0/23                  
      ip-172-16-2-77.ap-southeast-1.compute.internal    ip-172-16-2-77.ap-southeast-1.compute.internal    172.16.2.77    10.128.2.0/23                  
      ip-172-16-3-111.ap-southeast-1.compute.internal   ip-172-16-3-111.ap-southeast-1.compute.internal   172.16.3.111   10.129.0.0/23                  
      ip-172-16-3-79.ap-southeast-1.compute.internal    ip-172-16-3-79.ap-southeast-1.compute.internal    172.16.3.79    10.129.2.0/23                  
      
      $ oc logs sdn-controller-6m5kb -n openshift-sdn
      I0922 04:09:53.348615       1 vnids.go:105] Allocated netid 2689387 for namespace "egress-ip-test"
      E0922 04:24:00.682018       1 egressip.go:254] Ignoring invalid HostSubnet ip-172-16-1-53.ap-southeast-1.compute.internal (host: "ip-172-16-1-53.ap-southeast-1.compute.internal", ip: "172.16.1.53", subnet: "10.131.0.0/23"): related node object "ip-172-16-1-53.ap-southeast-1.compute.internal" has an incomplete annotation "cloud.network.openshift.io/egress-ipconfig", CloudEgressIPConfig: <nil>
      $ oc logs cloud-network-config-controller-5c7556db9f-x78bs -n openshift-cloud-network-config-controller
      E0922 04:26:59.468726       1 controller.go:165] error syncing 'ip-172-16-2-77.ap-southeast-1.compute.internal': error retrieving the private IP configuration for node: ip-172-16-2-77.ap-southeast-1.compute.internal, err: error: cannot list ec2 instance for node: ip-172-16-2-77.ap-southeast-1.compute.internal, err: WebIdentityErr: failed to retrieve credentials
      caused by: RequestError: send request failed
      caused by: Post "https://sts.amazonaws.com/": dial tcp 54.239.29.25:443: i/o timeout, requeuing in node workqueue
      
      $ oc get Infrastructure -o yaml
      apiVersion: v1
      items:
      - apiVersion: config.openshift.io/v1
        kind: Infrastructure
        metadata:
          creationTimestamp: "2022-09-22T03:28:15Z"
          generation: 1
          name: cluster
          resourceVersion: "598"
          uid: 994da301-2a96-43b7-b43b-4b7c18d4b716
        spec:
          cloudConfig:
            name: ""
          platformSpec:
            aws:
              serviceEndpoints:
              - name: sts
                url: https://sts.ap-southeast-1.amazonaws.com
              - name: ec2
                url: https://ec2.ap-southeast-1.amazonaws.com
              - name: elasticloadbalancing
                url: https://elasticloadbalancing.ap-southeast-1.amazonaws.com
            type: AWS
        status:
          apiServerInternalURI: https://api-int.openshiftyy.ocpaws.sadiqueonline.com:6443
          apiServerURL: https://api.openshiftyy.ocpaws.sadiqueonline.com:6443
          controlPlaneTopology: HighlyAvailable
          etcdDiscoveryDomain: ""
          infrastructureName: openshiftyy-wfrpf
          infrastructureTopology: HighlyAvailable
          platform: AWS
          platformStatus:
            aws:
              region: ap-southeast-1
              serviceEndpoints:
              - name: ec2
                url: https://ec2.ap-southeast-1.amazonaws.com
              - name: elasticloadbalancing
                url: https://elasticloadbalancing.ap-southeast-1.amazonaws.com
              - name: sts
                url: https://sts.ap-southeast-1.amazonaws.com
            type: AWS
      kind: List
      metadata:
        resourceVersion: ""
      
      $ oc get secret aws-cloud-credentials -n openshift-machine-api -o json |jq -r .data.credentials |base64 -d 
      [default]
      sts_regional_endpoints = regional
      role_arn = arn:aws:iam::015719942846:role/sputhenp-sts-yy-openshift-machine-api-aws-cloud-credentials
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
       
      [ec2-user@ip-172-17-1-229 ~]$ oc get secret cloud-credential-operator-iam-ro-creds -n openshift-cloud-credential-operator -o json |jq -r .data.credentials |base64 -d 
      [default]
      sts_regional_endpoints = regional
      role_arn = arn:aws:iam::015719942846:role/sputhenp-sts-yy-openshift-cloud-credential-operator-cloud-creden
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
       
      [ec2-user@ip-172-17-1-229 ~]$ oc get secret installer-cloud-credentials -n openshift-image-registry -o json |jq -r .data.credentials |base64 -d 
      [default]
      sts_regional_endpoints = regional
      role_arn = arn:aws:iam::015719942846:role/sputhenp-sts-yy-openshift-image-registry-installer-cloud-credent
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
       
      [ec2-user@ip-172-17-1-229 ~]$ oc get secret cloud-credentials -n openshift-ingress-operator -o json |jq -r .data.credentials |base64 -d 
      [default]
      sts_regional_endpoints = regional
      role_arn = arn:aws:iam::015719942846:role/sputhenp-sts-yy-openshift-ingress-operator-cloud-credentials
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
       
      [ec2-user@ip-172-17-1-229 ~]$ oc get secret cloud-credentials -n openshift-cloud-network-config-controller -o json |jq -r .data.credentials |base64 -d 
      [default]
      sts_regional_endpoints = regional
      role_arn = arn:aws:iam::015719942846:role/sputhenp-sts-yy-openshift-cloud-network-config-controller-cloud-
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
       
      [ec2-user@ip-172-17-1-229 ~]$ oc get secret ebs-cloud-credentials -n openshift-cluster-csi-drivers -o json |jq -r .data.credentials |base64 -d
      [default]
      sts_regional_endpoints = regional
      role_arn = arn:aws:iam::015719942846:role/sputhenp-sts-yy-openshift-cluster-csi-drivers-ebs-cloud-credenti
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
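The credentials secrets above all carry `sts_regional_endpoints = regional`. A small sketch for checking that setting programmatically, assuming the decoded secret is available as plain INI text, could look like this. The function name `sts_regional_setting` is hypothetical, and the sketch only reports what the file says; whether the controller's AWS client honors the setting is the subject of this bug. The `role_arn` value is copied from the output above as shown there.

```python
import configparser
from typing import Optional


def sts_regional_setting(credentials_text: str, profile: str = "default") -> Optional[str]:
    """Return the sts_regional_endpoints value for a profile, or None if absent.

    Hypothetical helper: parses the INI text only, it does not call AWS.
    """
    # Interpolation is disabled so that '%' in values is never treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return None
    return parser.get(profile, "sts_regional_endpoints", fallback=None)


# Decoded content of the cloud-credentials secret, as shown in the steps above
SAMPLE = """\
[default]
sts_regional_endpoints = regional
role_arn = arn:aws:iam::015719942846:role/sputhenp-sts-yy-openshift-cloud-network-config-controller-cloud-
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
"""

if __name__ == "__main__":
    print(sts_regional_setting(SAMPLE))
```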
       

       

      Actual results:

      The egress IP is not configured, and cloud-network-config-controller tries to connect to the global STS endpoint.
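One way to confirm which endpoint the controller actually contacted is to pull the request URLs out of its log. The sketch below assumes the error text has the form seen in the log excerpt in the steps (`Post "<url>": dial tcp ...`); the function name `failed_request_urls` and the pattern are illustrative assumptions, not part of any OpenShift tooling.

```python
import re
from typing import List

# Matches request errors of the form seen in the log: Post "<url>": dial tcp ...
_REQUEST_URL = re.compile(r'\b(?:Get|Post|Put|Head|Delete|Patch) "(https?://[^"]+)"')


def failed_request_urls(log_text: str) -> List[str]:
    """Return every quoted request URL found in the given log text, in order."""
    return _REQUEST_URL.findall(log_text)


# The relevant lines from the cloud-network-config-controller log in this report
LOG = (
    "caused by: RequestError: send request failed\n"
    'caused by: Post "https://sts.amazonaws.com/": dial tcp 54.239.29.25:443: '
    "i/o timeout, requeuing in node workqueue\n"
)

if __name__ == "__main__":
    for url in failed_request_urls(LOG):
        print(url)
```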

      Expected results:

      The egress IP should be configured, and cloud-network-config-controller should connect to the regional STS endpoint instead of the global one.
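The expected behavior can be expressed as a lookup: prefer the `sts` entry in the Infrastructure object's `status.platformStatus.aws.serviceEndpoints`, and otherwise fall back to a regional URL built from `region`. The sketch below is only an illustration of that idea under stated assumptions. The function name `resolve_service_endpoint` is hypothetical, the fallback format `https://<service>.<region>.amazonaws.com` is assumed for the standard AWS partition, and the input is assumed to be a single Infrastructure object (not the List wrapper shown in the steps). It is not the controller's actual implementation or fix.

```python
import json
from typing import Optional


def resolve_service_endpoint(infra: dict, service: str) -> Optional[str]:
    """Pick the endpoint URL for an AWS service from an Infrastructure object.

    Hypothetical helper: an explicit serviceEndpoints override wins; otherwise
    a regional URL is built from the region (standard partition assumed).
    """
    aws = infra.get("status", {}).get("platformStatus", {}).get("aws", {})
    for endpoint in aws.get("serviceEndpoints", []):
        if endpoint.get("name") == service:
            return endpoint.get("url")
    region = aws.get("region")
    if region:
        return f"https://{service}.{region}.amazonaws.com"
    return None


# Trimmed copy of the status shown in this report
INFRA = json.loads("""
{
  "status": {
    "platformStatus": {
      "aws": {
        "region": "ap-southeast-1",
        "serviceEndpoints": [
          {"name": "ec2", "url": "https://ec2.ap-southeast-1.amazonaws.com"},
          {"name": "sts", "url": "https://sts.ap-southeast-1.amazonaws.com"}
        ]
      },
      "type": "AWS"
    }
  }
}
""")

if __name__ == "__main__":
    print(resolve_service_endpoint(INFRA, "sts"))
```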

      Additional info:

       

              pdiak@redhat.com Patryk Diak
              openshift-crt-jira-prow OpenShift Prow Bot
              Jean Chen Jean Chen