Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: None
Affects Version/s: 4.18.z
Component/s: Networking / cloud-network-config-controller
Labels:
- SDN:OVNK:EgressIP
- aro

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

In a private ARO cluster with UDR (and possibly other cloud deployments), CNCC fails to assign EgressIPs (EIPs) whose assignment, for whatever reason, triggers an error response from the underlying cloud provider. The corresponding CloudPrivateIPConfig objects remain in the CloudResponseError state indefinitely instead of being redistributed to other available egress-assignable nodes.

Version-Release number of selected component (if applicable):

How reproducible:

    Always in ARO

Steps to Reproduce:

    1. Deploy 150 EIPs across 150 namespaces in a private ARO cluster with UDR that has 3 egress-assignable nodes, i.e. nodes labeled k8s.ovn.org/egress-assignable=true.
    2. Initiate a rolling restart of the worker nodes.
    3. The node annotation states a capacity of 255, but Azure limits each NIC to 300 security rules, so the actual limit is reached at roughly 75 EIPs in this scenario.
    4. CNCC ignores the error response from Azure and continues to assign CloudPrivateIPConfig objects to the saturated node.
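The capacity mismatch in step 3 can be expressed as a small calculation. The Go sketch below is illustrative only: `effectiveCapacity` and its parameters are hypothetical, and the figure of 4 rules per EIP is inferred from the report's own numbers (300 rules / ~75 EIPs), not taken from Azure documentation.

```go
package main

import "fmt"

// effectiveCapacity returns how many EgressIPs a node can actually hold:
// the smaller of the capacity advertised in the node annotation and the
// number that fits under the cloud provider's per-NIC rule limit.
// rulesPerEgressIP is a hypothetical parameter; 4 is inferred from the
// report (300 rules / ~75 EIPs), not from Azure documentation.
func effectiveCapacity(annotated, nicRuleLimit, rulesPerEgressIP int) int {
	if rulesPerEgressIP <= 0 {
		// No per-IP rule cost known: fall back to the annotated value.
		return annotated
	}
	cloudCap := nicRuleLimit / rulesPerEgressIP
	if cloudCap < annotated {
		return cloudCap
	}
	return annotated
}

func main() {
	const annotated, nicRuleLimit, rulesPerEgressIP = 255, 300, 4
	perNode := effectiveCapacity(annotated, nicRuleLimit, rulesPerEgressIP)
	fmt.Println(perNode)             // 75
	fmt.Println(annotated - perNode) // 180 advertised slots that cannot be used
}
```

Under this estimate, while one of the three nodes is down during the rolling restart, the remaining two would have to hold 150 / 2 = 75 EIPs each on average, which is exactly the effective limit and leaves no headroom.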

Actual results:

- CNCC keeps assigning new IPs to a saturated node.
- CloudPrivateIPConfig objects remain stuck in CloudResponseError.
- No automatic redistribution to other egress-assignable nodes.

Expected results:

- CNCC should detect that the cloud provider is returning an error.
- Scheduler logic should redistribute new or failing CloudPrivateIPConfig objects to other available `egress-assignable` nodes automatically.
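The redistribution step in the expected results amounts to choosing another node that still has room and is not the one that just failed. The Go sketch below is a hypothetical illustration: the `node` type, `pickNode`, and the node names are invented for this example, and it makes no claim about where such logic lives in CNCC or ovn-kubernetes.

```go
package main

import "fmt"

// node is a hypothetical, simplified view of an egress-assignable node.
type node struct {
	name     string
	capacity int // effective capacity, not just the annotated value
	assigned int // EgressIPs currently placed on the node
}

// pickNode returns the least-loaded node that still has free capacity and
// is not in excluded (for example, the node whose cloud call just failed).
// Ties are broken by name so the choice is deterministic.
func pickNode(nodes []node, excluded map[string]bool) (string, bool) {
	best := -1
	for i, n := range nodes {
		if excluded[n.name] || n.assigned >= n.capacity {
			continue
		}
		if best == -1 || n.assigned < nodes[best].assigned ||
			(n.assigned == nodes[best].assigned && n.name < nodes[best].name) {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return nodes[best].name, true
}

func main() {
	nodes := []node{
		{name: "worker-a", capacity: 75, assigned: 75}, // saturated
		{name: "worker-b", capacity: 75, assigned: 40},
		{name: "worker-c", capacity: 75, assigned: 35},
	}
	// worker-c just returned CloudResponseError for this IP, so skip it.
	name, ok := pickNode(nodes, map[string]bool{"worker-c": true})
	fmt.Println(name, ok) // worker-b true
}
```

Returning `false` when no node qualifies lets the caller leave the object pending rather than retrying forever against a saturated node.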

Additional info:

We're hitting this undocumented limit because, on ARO, EgressIPs are being added to the backend pool; this is being addressed in OCPBUGS-57447. Regardless, CNCC should detect the errors that the cloud provider returns and act on them.
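Detecting the error also means remembering which node keeps failing, so later assignments can avoid it. The Go sketch below is a hypothetical illustration: `failureTracker`, its threshold, and `errCloud` are invented for this example. It treats every error alike, whereas a real implementation would likely need to tell capacity errors apart from transient ones.

```go
package main

import (
	"errors"
	"fmt"
)

// errCloud stands in for an error returned by the cloud provider API.
var errCloud = errors.New("cloud provider rejected the request")

// failureTracker is a hypothetical helper that counts consecutive cloud
// errors per node and reports a node as saturated once a threshold is hit.
type failureTracker struct {
	threshold int
	failures  map[string]int
}

func newFailureTracker(threshold int) *failureTracker {
	return &failureTracker{threshold: threshold, failures: map[string]int{}}
}

// record updates the counter for node based on the result of a cloud call.
// A nil error resets the counter; any other error increments it.
func (t *failureTracker) record(node string, err error) {
	if err == nil {
		delete(t.failures, node)
		return
	}
	t.failures[node]++
}

// saturated reports whether node has failed threshold times in a row and
// should be skipped when choosing where to place the next IP.
func (t *failureTracker) saturated(node string) bool {
	return t.failures[node] >= t.threshold
}

func main() {
	tr := newFailureTracker(3)
	for i := 0; i < 3; i++ {
		tr.record("worker-a", errCloud)
	}
	fmt.Println(tr.saturated("worker-a")) // true
	tr.record("worker-a", nil)
	fmt.Println(tr.saturated("worker-a")) // false
}
```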

Assignee:: Patryk Diak

Reporter:: John Johansson

QA Contact:: Anurag Saxena

Need Info From:: None

Votes:: 0

Watchers:: 4

Created:: 2025/10/24 9:28 AM

Updated:: 2025/12/08 3:40 PM

Resolved:: 2025/12/08 3:40 PM
