- Bug
- Resolution: Unresolved
- Major
- 4.16.z
- Quality / Stability / Reliability
- False
- CORENET Sprint 278
- 1
- Customer Escalated, Customer Facing
Description of problem:
When too many CloudPrivateIPConfig objects are scheduled onto the same node, the cloud-network-config-controller (CNCC) cannot assign them because the node's network interface has reached the cloud provider's IP address limit. The objects then remain in the CloudResponseError state indefinitely instead of being redistributed to other available egress-assignable nodes.
Version-Release number of selected component (if applicable):
4.16.46
How reproducible:
Intermittent, but consistently reproducible when:
- The cluster has multiple nodes labeled with k8s.ovn.org/egress-assignable.
- The workload requires more egress IPs than a single node can support (e.g., > X secondary IPs on AWS).
- CNCC continues to assign new CloudPrivateIPConfig objects to the saturated node.
Steps to Reproduce:
1. 2. 3.
Actual results:
- CNCC keeps assigning new IPs to a saturated node.
- CloudPrivateIPConfig objects remain stuck in CloudResponseError.
- No automatic redistribution to other egress-assignable nodes.
Expected results:
- CNCC should detect that a node has reached its IP/ENI limit.
- Scheduler logic should redistribute new or failing CloudPrivateIPConfig objects to other available `egress-assignable` nodes automatically.
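To make the expected behavior concrete, here is a minimal Python sketch of capacity-aware placement. It is not the actual CNCC or OVN-Kubernetes code: the `Node` class, its `capacity` field, and the `pick_node` and `assign` helpers are hypothetical names chosen for illustration, and the per-node capacity is assumed to be known in advance.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical view of an egress-assignable node."""
    name: str
    capacity: int      # assumed: max egress IPs the node's interface can hold
    assigned: int = 0  # egress IPs currently placed on the node

    @property
    def free(self) -> int:
        return self.capacity - self.assigned


def pick_node(nodes: list[Node]) -> Optional[Node]:
    """Return the node with the most free slots, or None if all are saturated."""
    candidates = [n for n in nodes if n.free > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free)


def assign(ips: list[str], nodes: list[Node]) -> tuple[dict[str, str], list[str]]:
    """Place each IP on a node with spare capacity; collect IPs that fit nowhere."""
    placed: dict[str, str] = {}
    unplaced: list[str] = []
    for ip in ips:
        node = pick_node(nodes)
        if node is None:
            unplaced.append(ip)  # every node is full: surface it, do not retry forever
            continue
        node.assigned += 1
        placed[ip] = node.name
    return placed, unplaced


if __name__ == "__main__":
    nodes = [Node("node-a", capacity=2), Node("node-b", capacity=1)]
    placed, unplaced = assign(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"], nodes)
    print(placed)
    print(unplaced)
```

The point of the sketch is only that a saturated node is skipped before assignment is attempted, and that IPs with no eligible node are reported separately rather than left retrying against the same node.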
Additional info:
- OpenShift version: 4.16.46
- Cloud provider: AWS
- Example error from object status:
  status:
    conditions:
    - lastTransitionTime: "2025-08-21T09:41:59Z"
      message: cloud API failed to assign IP: exceeded interface address quota
      reason: CloudResponseError
      status: "False"
      type: Assigned
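For triage, the stuck objects can be identified from this condition alone. The following Python sketch assumes the objects have already been loaded as plain dictionaries (for example from JSON output); the `is_stuck` and `stuck_names` helpers and the sample object name are hypothetical, and only the field names visible in the status above are relied on.

```python
def is_stuck(obj: dict) -> bool:
    """True if the object reports Assigned=False with reason CloudResponseError."""
    conditions = obj.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Assigned"
        and c.get("status") == "False"
        and c.get("reason") == "CloudResponseError"
        for c in conditions
    )


def stuck_names(objs: list) -> list:
    """Return the metadata.name of every stuck object, preserving input order."""
    return [o.get("metadata", {}).get("name", "") for o in objs if is_stuck(o)]


# Sample object mirroring the status reported in this bug; the name is made up.
example = {
    "metadata": {"name": "example-ip"},
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2025-08-21T09:41:59Z",
                "message": "cloud API failed to assign IP: exceeded interface address quota",
                "reason": "CloudResponseError",
                "status": "False",
                "type": "Assigned",
            }
        ]
    },
}

if __name__ == "__main__":
    print(stuck_names([example]))
```

Note that the condition's `status` is the string "False", not a boolean, so the comparison is done against the string.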