Type: Bug
Resolution: Obsolete
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.14.z, 4.15.z, 4.16.z, 4.17.z, 4.18.z
Component/s: ExternalDNS Operator
Labels:
- ne-triaged

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: 2
Severity: Critical
Regression: None
Target Backport Versions: None
Target Version: None
Release Blocker: Rejected
Sprint: NI&D Sprint 276
sprint_count: 1
Customer Impact: Customer Escalated, Customer Facing
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
Release Note Status: None
Release Note Type: None
Release Note Text: None
Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:

When the ExternalDNS Operator is configured to use Infoblox, the client connects to the WAPI endpoint successfully but fails to process requests, repeatedly logging the following:

time="2025-04-30T15:21:14Z" level=error msg="could not fetch A records from zone 'ti.devfg.rbc.com': WAPI request error: 400('400 Bad Request')\nContents:\n{ \"Error\": \"AdmConProtoError: Result set too large (> 1000)\", \n \"code\": \"Client.Ibap.Proto\", \n \"text\": \"Result set too large (> 1000)\"\n}\n"
time="2025-04-30T15:22:09Z" level=debug msg="fetch records from zone '<custom-domain-address>'"
2025/04/30 15:22:15 WAPI request error: 400('400 Bad Request')
Contents:
{ "Error": "AdmConProtoError: Result set too large (> 1000)",
"code": "Client.Ibap.Proto",
"text": "Result set too large (> 1000)"
}
time="2025-04-30T15:22:15Z" level=error msg="could not fetch A records from zone '<custom-domain-address>': WAPI request error: 400('400 Bad Request')\nContents:\n{ \"Error\": \"AdmConProtoError: Result set too large (> 1000)\", \n \"code\": \"Client.Ibap.Proto\", \n \"text\": \"Result set too large (> 1000)\"\n}\n"
time="2025-04-30T15:23:10Z" level=debug msg="fetch records from zone '<custom-domain-address>'"
2025/04/30 15:23:16 WAPI request error: 400('400 Bad Request')
Contents:
{ "Error": "AdmConProtoError: Result set too large (> 1000)",
"code": "Client.Ibap.Proto",
"text": "Result set too large (> 1000)"
}
time="2025-04-30T15:23:16Z" level=error msg="could not fetch A records from zone '<custom-domain-address>': WAPI request error: 400('400 Bad Request')\nContents:\n{ \"Error\": \"AdmConProtoError: Result set too large (> 1000)\", \n \"code\": \"Client.Ibap.Proto\", \n \"text\": \"Result set too large (> 1000)\"\n}\n"
time="2025-04-30T15:24:10Z" level=debug msg="fetch records from zone '<custom-domain-address>'"
2025/04/30 15:24:16 WAPI request error: 400('400 Bad Request')
Contents:
{ "Error": "AdmConProtoError: Result set too large (> 1000)",
"code": "Client.Ibap.Proto",
"text": "Result set too large (> 1000)"
}    


I suspect that this behavior is the same as outlined here:
https://github.com/kubernetes-sigs/external-dns/pull/953

Tested with some explicit curls:
# curl to check all IPv4 A records (will take longer)
$ time curl -vk 'https://1.2.3.4:443/wapi/v2.7/record:host?_max_results=50000&_proxy_search=GM&_return_fields=extattrs%2Cipv4addrs%2Cname%2Cview%2Czone'

# curl to return just the names (should be much faster)
$ time curl -vk 'https://1.2.3.4:443/wapi/v2.7/record:host?_max_results=50000&_proxy_search=GM&_return_fields=extattrs%2Cname%2Cview%2Czone'

where 1.2.3.4 is the WAPI server

Please find the outputs attached to this Jira. The first query took close to 4 minutes and the second took approximately 100 seconds.

The data returned by the curl indicates that we are returning over 100k records:
less query1-output.json | grep ipv4addr | wc -l
102393

Therefore it is likely that the _max_results value is not being set properly, resulting in a runaway response that far exceeds what the client can handle during the initial GET request.
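For context on _max_results: as I understand the Infoblox WAPI documentation, the parameter defaults to -1000, and a negative value makes the appliance return an error (rather than truncate) once the result set exceeds that size, which would match the "Result set too large (> 1000)" message in the logs. A positive value truncates instead, and larger sets can be fetched page by page using _paging=1 and _return_as_object=1. Below is a minimal sketch of building such paged request URLs. The helper name wapi_page_url, its argument order, and the page size of 1000 are my own assumptions for illustration; it only prints URLs and sends nothing.

```shell
# Hypothetical helper: print the URL for one page of a paged WAPI search.
# Args: host, WAPI version, object type, page size, optional page id.
wapi_page_url() {
  base="https://$1/wapi/v$2/$3"
  if [ -z "${5:-}" ]; then
    # First request: turn paging on and cap the page size.
    printf '%s?_paging=1&_return_as_object=1&_max_results=%s\n' "$base" "$4"
  else
    # Later requests: pass the next_page_id value returned with the previous page.
    printf '%s?_page_id=%s\n' "$base" "$5"
  fi
}

# Example: first page, then a follow-up page (EXAMPLE_PAGE_ID is a placeholder).
wapi_page_url 1.2.3.4:443 2.7 record:host 1000
wapi_page_url 1.2.3.4:443 2.7 record:host 1000 EXAMPLE_PAGE_ID
```

Each printed URL could be passed to the same curl -vk invocation used for the manual tests, which should keep every individual response bounded. Whether the operator's bundled external-dns build can be made to issue requests like this is a separate question that this sketch does not answer.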

A possible workaround appears to be setting an explicit ARG or ENV var that defines the max_results value, although this may not be possible with the existing version of the operator (as opposed to the upstream build).

Looking at the log samples, this call fails roughly once a minute, which suggests to me that some timeout handler is being hit, especially given that the manual curls take about 4 minutes to complete (because they are unbounded).


time="2025-04-29T17:28:45Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...

time="2025-04-29T17:29:45Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...

time="2025-04-29T17:30:48Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...

time="2025-04-29T17:31:46Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...

etc...
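
The spacing between the four error samples above can be checked with a few lines of shell. This is only a sketch: the function names to_seconds and intervals are made up for illustration, and it assumes all timestamps fall on the same day.

```shell
# Convert HH:MM:SS to seconds since midnight.
# A single leading zero is stripped so values like "08" are not read as octal.
to_seconds() {
  h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Print the gap in seconds between each consecutive pair of timestamps.
intervals() {
  prev=""
  for t in "$@"; do
    cur=$(to_seconds "$t")
    if [ -n "$prev" ]; then
      echo $(( cur - prev ))
    fi
    prev=$cur
  done
}

# Times taken from the four error lines above (2025-04-29).
intervals 17:28:45 17:29:45 17:30:48 17:31:46
```

This prints 60, 63 and 58, so the failures are spaced about a minute apart. That is consistent with a roughly 60-second timeout, though a periodic retry on a one-minute schedule would produce the same pattern.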

Version-Release number of selected component (if applicable):

    The ExternalDNS Operator version available on OpenShift clusters (any cluster version) is 1.2.0

How reproducible:

    Every time; unable to deploy

Steps to Reproduce:

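    1. Install the ExternalDNS Operator (1.2.0) on an OpenShift cluster.
    2. Configure it to use the Infoblox provider against a zone holding more than 1000 records.
    3. Check the external-dns logs for the "Result set too large (> 1000)" error.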

Actual results:

    The Infoblox record fetch with the ExternalDNS Operator fails; cannot deploy

Expected results:

    We should not be stalled in this way. The upstream fix suggests we have an opportunity to import the solution into our deployed version of the operator.

Additional info:

Review this linked GitHub pull request and issue:
https://github.com/kubernetes-sigs/external-dns/pull/953
https://github.com/openshift/external-dns-operator/issues/221

Assignee: Andrey Lebedev

Reporter: Will Russell

QA Contact: Hongan Li

Need Info From: None

Votes: 1

Watchers: 9

Created: 2025/05/12 5:56 PM

Updated: 2025/10/10 12:04 AM

Resolved: 2025/09/04 8:39 PM
