OCPBUGS-35056

AWS - CPO can use incorrect CIDR range on the default worker security group

      * Previously, Amazon Web Services (AWS) HyperShift clusters leveraged their Amazon Virtual Private Cloud (VPC)'s primary classless inter-domain routing (CIDR) range to generate security group rules on the data plane. As a consequence, installing AWS HyperShift clusters into an AWS VPC with multiple CIDR ranges could cause the generated security group rules to be insufficient. With this update, security group rules are generated based on the provided Machine CIDR range to resolve this issue. (link:https://issues.redhat.com/browse/OCPBUGS-35056[*OCPBUGS-35056*])

      This is a clone of issue OCPBUGS-34274. The following is the description of the original issue:

      Description of problem:

      AWS VPCs support a primary CIDR range and multiple secondary CIDR ranges: https://aws.amazon.com/about-aws/whats-new/2017/08/amazon-virtual-private-cloud-vpc-now-allows-customers-to-expand-their-existing-vpcs/ 
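
      For context, the setup used below can be built by attaching a secondary CIDR block to an existing VPC and carving a subnet out of it. The following is a minimal sketch using aws-sdk-go-v2 (chosen here only for illustration); the VPC ID, CIDR ranges, and availability zone are placeholders matching the example that follows:

      package main

      import (
          "context"
          "fmt"
          "log"

          "github.com/aws/aws-sdk-go-v2/aws"
          "github.com/aws/aws-sdk-go-v2/config"
          "github.com/aws/aws-sdk-go-v2/service/ec2"
      )

      func main() {
          ctx := context.Background()
          cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-east-2"))
          if err != nil {
              log.Fatal(err)
          }
          client := ec2.NewFromConfig(cfg)

          vpcID := "vpc-069a93c6654464f03" // existing VPC whose primary CIDR is 10.0.0.0/24

          // Attach a secondary CIDR block to the VPC.
          if _, err := client.AssociateVpcCidrBlock(ctx, &ec2.AssociateVpcCidrBlockInput{
              VpcId:     aws.String(vpcID),
              CidrBlock: aws.String("10.1.0.0/24"),
          }); err != nil {
              log.Fatal(err)
          }

          // Create a subnet (subnet-b in the example below) inside the secondary range.
          out, err := client.CreateSubnet(ctx, &ec2.CreateSubnetInput{
              VpcId:            aws.String(vpcID),
              CidrBlock:        aws.String("10.1.0.0/24"),
              AvailabilityZone: aws.String("us-east-2a"),
          })
          if err != nil {
              log.Fatal(err)
          }
          fmt.Println("created subnet:", aws.ToString(out.Subnet.SubnetId))
      }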

      Let's pretend a VPC exists with:

      • Primary CIDR range: 10.0.0.0/24 (subnet-a)
      • Secondary CIDR range: 10.1.0.0/24 (subnet-b)

      and a hostedcontrolplane object like:

        networking:
          ...
          machineNetwork:
          - cidr: 10.1.0.0/24
          ...
        olmCatalogPlacement: management
        platform:
          aws:
            cloudProviderConfig:
              subnet:
                id: subnet-b
              vpc: vpc-069a93c6654464f03

      Even though all EC2 instances will be spun up in subnet-b (10.1.0.0/24), CPO will detect the CIDR range of the VPC as 10.0.0.0/24 (https://github.com/openshift/hypershift/blob/0d10c822912ed1af924e58ccb8577d2bb1fd68be/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L4755-L4765) and create security group rules that only allow inbound traffic from 10.0.0.0/24. This specifically prevents these EC2 instances from communicating with the VPC Endpoint created by the awsendpointservice CR and reaching the hosted control plane pods.
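
      The lookup behaves roughly like the following sketch (written against aws-sdk-go-v2 purely for illustration; the actual CPO code may use a different SDK or call path). The key point is that the top-level CidrBlock field returned by DescribeVpcs is only the VPC's primary CIDR, so secondary ranges are never reflected in the generated rules:

      package example

      import (
          "context"
          "fmt"

          "github.com/aws/aws-sdk-go-v2/aws"
          "github.com/aws/aws-sdk-go-v2/service/ec2"
      )

      // primaryVPCCIDR mirrors the effect of the lookup linked above: it returns
      // only the VPC's primary CIDR and ignores anything listed under
      // CidrBlockAssociationSet.
      func primaryVPCCIDR(ctx context.Context, client *ec2.Client, vpcID string) (string, error) {
          out, err := client.DescribeVpcs(ctx, &ec2.DescribeVpcsInput{VpcIds: []string{vpcID}})
          if err != nil {
              return "", err
          }
          if len(out.Vpcs) == 0 {
              return "", fmt.Errorf("vpc %s not found", vpcID)
          }
          // For the example VPC this returns 10.0.0.0/24, even though the worker
          // nodes live in the secondary range 10.1.0.0/24 (compare the
          // CidrBlockAssociationSet in the describe-vpcs output further below).
          return aws.ToString(out.Vpcs[0].CidrBlock), nil
      }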

      Version-Release number of selected component (if applicable):

          Reproduced on a 4.14.20 ROSA HCP cluster, but the version should not matter

      How reproducible:

      100%    

      Steps to Reproduce:

          1. Create a VPC with at least one secondary CIDR block
          2. Install a ROSA HCP cluster providing the secondary CIDR block as the machine CIDR range and selecting the appropriate subnets within the secondary CIDR range   

      Actual results:

      * Observe that the default security group contains inbound security group rules allowing traffic from the VPC's primary CIDR block (not a CIDR range containing the cluster's worker nodes)
      
      * As a result, the EC2 instances (worker nodes) fail to reach the ignition-server

      Expected results:

      The EC2 instances are able to reach the ignition-server and HCP pods

      Additional info:

      This bug seems like it could be fixed by using the machine CIDR range for the security group instead of the VPC CIDR range. Alternatively, we could duplicate the rules for every secondary CIDR block, but the default AWS quota is 60 inbound rules per security group, so that approach introduces another failure condition to keep in mind if we go that route.
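
      A rough sketch of the first option, again using aws-sdk-go-v2 for illustration (the helper name and the exact spec field access are assumptions, not the merged fix): take the machine CIDR the HostedControlPlane spec already carries (spec.networking.machineNetwork[].cidr in the example above) and use it, rather than the VPC's primary CIDR, when authorizing ingress on the default worker security group.

      package example

      import (
          "context"
          "fmt"
          "net"

          "github.com/aws/aws-sdk-go-v2/aws"
          "github.com/aws/aws-sdk-go-v2/service/ec2"
          ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
      )

      // authorizeWorkerIngressFromMachineCIDR builds the default worker security
      // group ingress rule from the machine CIDR rather than the VPC's primary
      // CIDR. machineCIDR stands in for whatever the controller reads from the
      // HostedControlPlane spec; this is an illustration, not the merged fix.
      func authorizeWorkerIngressFromMachineCIDR(ctx context.Context, client *ec2.Client, sgID, machineCIDR string) error {
          // Validate the CIDR before handing it to EC2.
          if _, _, err := net.ParseCIDR(machineCIDR); err != nil {
              return fmt.Errorf("invalid machine CIDR %q: %w", machineCIDR, err)
          }
          _, err := client.AuthorizeSecurityGroupIngress(ctx, &ec2.AuthorizeSecurityGroupIngressInput{
              GroupId: aws.String(sgID),
              IpPermissions: []ec2types.IpPermission{{
                  IpProtocol: aws.String("-1"), // all protocols for brevity; the real rules are scoped per port
                  IpRanges: []ec2types.IpRange{{
                      CidrIp:      aws.String(machineCIDR),
                      Description: aws.String("worker machine network"),
                  }},
              }},
          })
          return err
      }

      Scoping the rule to the machine network also sidesteps the 60-inbound-rules-per-security-group quota concern mentioned above, since only one range needs to be covered.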

       

      aws ec2 describe-vpcs output for a VPC with secondary CIDR blocks:    
      
      ❯ aws ec2 describe-vpcs --region us-east-2 --vpc-id vpc-069a93c6654464f03
      {
          "Vpcs": [
              {
                  "CidrBlock": "10.0.0.0/24",
                  "DhcpOptionsId": "dopt-0d1f92b25d3efea4f",
                  "State": "available",
                  "VpcId": "vpc-069a93c6654464f03",
                  "OwnerId": "429297027867",
                  "InstanceTenancy": "default",
                  "CidrBlockAssociationSet": [
                      {
                          "AssociationId": "vpc-cidr-assoc-0abbc75ac8154b645",
                          "CidrBlock": "10.0.0.0/24",
                          "CidrBlockState": {
                              "State": "associated"
                          }
                      },
                      {
                          "AssociationId": "vpc-cidr-assoc-098fbccc85aa24acf",
                          "CidrBlock": "10.1.0.0/24",
                          "CidrBlockState": {
                              "State": "associated"
                          }
                      }
                  ],
                  "IsDefault": false,
                  "Tags": [
                      {
                          "Key": "Name",
                          "Value": "test"
                      }
                  ]
              }
          ]
      }
