OpenShift Bugs / OCPBUGS-42564

Kube-Controller reconciling AWS NLB rules for security group incorrectly


      The customer is facing an issue with AWS security group inbound rules. An OCP 4 private IPI cluster on AWS is deployed in an existing VPC, where the customer created the following 6 public subnets with their respective CIDR ranges: 3 after cluster creation for an additional external ingresscontroller NLB, and 3 more recently for other use cases.

       

      - Older public subnets with their CIDRs, used for the additional external ingresscontroller NLB
      --> subnet-0axxxxx - 10.44.102.0/27
      --> subnet-0exxxx - 10.44.102.32/27
      --> subnet-09xxxx - 10.44.102.64/27 
      - New public subnets with their CIDRs
      --> subnet-00axxxx - 10.44.102.96/27
      --> subnet-06xxxxxxx - 10.44.102.128/27
      --> subnet-0fxxxxxx - 10.44.102.160/27
      

      The NLB is deployed in the older public subnets, so, as expected, the master and worker security groups had inbound rules allowing traffic from the older public subnets' CIDRs on the NLB health check port 30441.

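      When the EnsureLoadBalancer function was triggered by kube-controller-manager to reconcile the NLB, it removed the existing inbound rules for the older public subnets and replaced them with rules allowing traffic from the new public subnets' CIDRs on port 30441. This caused the issue: the NLB still resides in the older public subnets, so its health check traffic on port 30441 is no longer covered by any inbound rule.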
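The effect of that reconciliation can be illustrated with a minimal sketch. It assumes the controller compares the CIDRs currently allowed on the health check port with the CIDRs of the subnets it has selected, then revokes and authorizes the difference. The function name `reconcile_ingress_cidrs` is hypothetical and is not taken from cloud-provider-aws; only the CIDRs and the port come from this report.

```python
HEALTH_CHECK_PORT = 30441  # NLB health check port quoted in this report


def reconcile_ingress_cidrs(current, desired):
    """Return (to_add, to_remove) so the allowed CIDRs end up equal to `desired`.

    Hypothetical helper: a plain set difference, assumed to approximate what a
    reconcile loop does with security group rules.
    """
    current, desired = set(current), set(desired)
    return desired - current, current - desired


# CIDRs of the older public subnets (where the NLB actually lives).
old_cidrs = ["10.44.102.0/27", "10.44.102.32/27", "10.44.102.64/27"]
# CIDRs of the newer public subnets (selected by the controller).
new_cidrs = ["10.44.102.96/27", "10.44.102.128/27", "10.44.102.160/27"]

to_add, to_remove = reconcile_ingress_cidrs(old_cidrs, new_cidrs)

for cidr in sorted(to_remove):
    print(f"revoke    tcp/{HEALTH_CHECK_PORT} from {cidr}")
for cidr in sorted(to_add):
    print(f"authorize tcp/{HEALTH_CHECK_PORT} from {cidr}")
```

Under this assumption, every rule for the older subnets is revoked because none of their CIDRs appear in the desired set, which matches the behavior observed on the master and worker security groups.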

      We found that KCM does this on the basis of the following rules:
      --> https://github.com/openshift/cloud-provider-aws/blob/95c03b7b838f7c78efe8957b50c50a22cd625be7/pkg/providers/v1/aws.go#L243
      --> 1. prefer the subnet with the correct role tag: kubernetes.io/role/elb for public and kubernetes.io/role/internal-elb for private access
      --> 2. prefer the subnet with the cluster tag kubernetes.io/cluster/<Cluster Name>
      --> 3. prefer the subnet that is first in lexicographic order

      As per these rules, it checks for the required tags; if they are present, it selects the subnets for the inbound rules in lexicographic order. In other words, the subnets whose CIDRs are added to the inbound rules are chosen in ascending order of subnet ID, instead of picking the subnets in which the NLB is already deployed.
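The three preference rules can be sketched as follows. This is a Python illustration, not the Go code linked above. It assumes that one subnet is chosen per availability zone, and the `Subnet` type, the `pick_subnets` function, the subnet IDs, and the zone name are all hypothetical. Only the tag keys and the CIDRs are taken from this report.

```python
from dataclasses import dataclass

ROLE_TAG_PUBLIC = "kubernetes.io/role/elb"
ROLE_TAG_INTERNAL = "kubernetes.io/role/internal-elb"


@dataclass(frozen=True)
class Subnet:
    subnet_id: str
    az: str
    cidr: str
    tags: frozenset = frozenset()  # tag keys only, for simplicity


def pick_subnets(subnets, cluster_name, internal=False):
    """Pick one subnet per AZ (assumption) using the three preference rules."""
    role_tag = ROLE_TAG_INTERNAL if internal else ROLE_TAG_PUBLIC
    cluster_tag = f"kubernetes.io/cluster/{cluster_name}"

    def preference(s):
        # Tuples compare element by element; False sorts before True, so a
        # subnet that has a tag beats one that lacks it, and the subnet ID
        # only breaks ties (rule 3).
        return (role_tag not in s.tags, cluster_tag not in s.tags, s.subnet_id)

    best = {}
    for s in subnets:
        current = best.get(s.az)
        if current is None or preference(s) < preference(current):
            best[s.az] = s
    return [best[az] for az in sorted(best)]


# Hypothetical IDs and AZ: both subnets carry the role tag, so rule 3 decides.
old = Subnet("subnet-0a-old", "az-a", "10.44.102.0/27", frozenset({ROLE_TAG_PUBLIC}))
new = Subnet("subnet-00-new", "az-a", "10.44.102.96/27", frozenset({ROLE_TAG_PUBLIC}))

chosen = pick_subnets([old, new], cluster_name="demo")
print([(s.subnet_id, s.cidr) for s in chosen])
```

In this sketch the newer subnet wins purely because its ID sorts first, even though the NLB would still be attached to the older one. Nothing in the preference key refers to the subnets the load balancer is actually deployed in.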

      I will share the logs, subnet IDs, and a few other details in a private comment.

       

              joelspeed Joel Speed
              rhn-support-aygarg Ayush Garg
              ying zhou ying zhou