OCPBUGS-60806

[AWS, EgressIP] CNCC and OVN-Kubernetes do not handle 0 capacity correctly in cloud environments (0 and unset are not differentiated; 0 capacity is mistaken for unlimited capacity by OVNK)

    • Quality / Stability / Reliability
    • False
    • CORENET Sprint 278
    • 1
    • Customer Escalated, Customer Facing

      Description of problem:

      When too many CloudPrivateIPConfig objects are scheduled onto the same node, the cloud-network-config-controller (CNCC) cannot assign them because the node has reached the cloud provider's IP limit. The objects then remain in the CloudResponseError state indefinitely instead of being redistributed to other available egress-assignable nodes.
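      The issue title points at a 0 capacity being confused with an unset capacity. A minimal Python sketch of keeping the two apart is shown here. It is not the CNCC or OVN-Kubernetes code; the annotation shape, the `parse_capacity` and `can_assign` names, and the `eni-example` interface ID are all hypothetical and used only for illustration.

```python
import json
from typing import Optional


def parse_capacity(annotation: str, family: str = "ipv4") -> Optional[int]:
    """Return the node capacity for `family`, or None when it is unset.

    Assumed (hypothetical) annotation shape:
        [{"interface": "eni-example", "capacity": {"ipv4": 0}}]
    """
    entries = json.loads(annotation)
    capacity = entries[0].get("capacity", {})
    # Test for key presence, not truthiness. Writing
    # `capacity.get(family) or None` would turn an explicit 0 into
    # "unset", which is the confusion described in the title.
    if family not in capacity:
        return None
    return int(capacity[family])


def can_assign(capacity: Optional[int], already_assigned: int) -> bool:
    """True if one more egress IP fits on the node."""
    if capacity is None:
        # Unset: no limit is known, so treat the node as unlimited.
        return True
    # An explicit 0 never admits an IP.
    return already_assigned < capacity
```

      With this split, a node reporting `"ipv4": 0` is rejected by `can_assign`, while a node with no capacity entry is still accepted.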

      Version-Release number of selected component (if applicable):

       4.16.46   

      How reproducible:

      Intermittent, but consistently reproducible when:      
      
      - Cluster has multiple nodes labeled with k8s.ovn.org/egress-assignable.
      
      - Workload requires more egress IPs than a single node can support (e.g., > X secondary IPs on AWS).
      
      - CNCC continues to assign new CloudPrivateIPConfig to the saturated node.
      

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      - CNCC keeps assigning new IPs to a saturated node.
      - CloudPrivateIPConfig objects remain stuck in CloudResponseError.
      - No automatic redistribution to other egress-assignable nodes.

      Expected results:

      - CNCC should detect that a node has reached its IP/ENI limit.
      - Scheduler logic should redistribute new or failing CloudPrivateIPConfig objects to other available egress-assignable nodes automatically.
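      The expected behaviour above can be sketched as a capacity-aware placement loop. This is an illustrative Python model only, not the actual scheduler: the `Node` class, `pick_node`, and `place` are hypothetical names, and the tie-breaking rule (fewest assigned IPs first) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    capacity: Optional[int]  # None = unset (unlimited), 0 = no room at all
    assigned: int = 0

    def has_room(self) -> bool:
        return self.capacity is None or self.assigned < self.capacity


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Pick the node with room that currently holds the fewest IPs."""
    candidates = [n for n in nodes if n.has_room()]
    if not candidates:
        return None  # nothing fits; do not fall back to a saturated node
    return min(candidates, key=lambda n: n.assigned)


def place(ips: List[str], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Assign each IP to a node with room, or to None if no node has room."""
    placement: Dict[str, Optional[str]] = {}
    for ip in ips:
        node = pick_node(nodes)
        if node is not None:
            node.assigned += 1
        placement[ip] = node.name if node is not None else None
    return placement
```

      An IP that maps to `None` is left unplaced rather than being sent to a saturated node, so the caller can report it or retry once capacity frees up.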

      Additional info:

      • OpenShift version: 4.16.46
      • Cloud provider: AWS
      • Example error from object status:
        status:
          conditions:
          - lastTransitionTime: "2025-08-21T09:41:59Z"
            message: cloud API failed to assign IP: exceeded interface address quota
            reason: CloudResponseError
            status: "False"
            type: Assigned 
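      As a rough illustration of how a controller might recognise this state, the Python sketch below inspects a status shaped like the example above. The function names are hypothetical, and matching on the message text is an assumption made for illustration; a real implementation would more likely rely on a typed error from the cloud API.

```python
from typing import Any, Dict

# Message fragment copied from the status example in this report.
QUOTA_MESSAGE = "exceeded interface address quota"


def assignment_failed(status: Dict[str, Any]) -> bool:
    """True if the Assigned condition is False with reason CloudResponseError."""
    for cond in status.get("conditions", []):
        if (
            cond.get("type") == "Assigned"
            and cond.get("status") == "False"
            and cond.get("reason") == "CloudResponseError"
        ):
            return True
    return False


def is_quota_failure(status: Dict[str, Any]) -> bool:
    """True if a failed assignment mentions the address quota message."""
    if not assignment_failed(status):
        return False
    return any(
        QUOTA_MESSAGE in cond.get("message", "")
        for cond in status.get("conditions", [])
    )


example_status = {
    "conditions": [
        {
            "lastTransitionTime": "2025-08-21T09:41:59Z",
            "message": "cloud API failed to assign IP: exceeded interface address quota",
            "reason": "CloudResponseError",
            "status": "False",
            "type": "Assigned",
        }
    ]
}
```

      Objects for which `is_quota_failure(example_status)` holds are the candidates for being moved to another egress-assignable node.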

              sseethar Surya Seetharaman
              rhn-support-hthakare Harshal Thakare
              Qiong Wang
              Votes: 3
              Watchers: 16