- Bug
- Resolution: Obsolete
- Normal
- 4.14.z, 4.15.z, 4.17.z, 4.16.z, 4.18.z
- Quality / Stability / Reliability
- False
- 2
- Critical
- Rejected
- NI&D Sprint 276
- 1
- Customer Escalated, Customer Facing
Description of problem:
When configuring the External DNS Operator to use Infoblox, we observe that the client connects to the WAPI endpoint just fine but fails to process the requests, repeatedly dumping the following log:

time="2025-04-30T15:21:14Z" level=error msg="could not fetch A records from zone 'ti.devfg.rbc.com': WAPI request error: 400('400 Bad Request')\nContents:\n{ \"Error\": \"AdmConProtoError: Result set too large (> 1000)\", \n \"code\": \"Client.Ibap.Proto\", \n \"text\": \"Result set too large (> 1000)\"\n}\n"
time="2025-04-30T15:22:09Z" level=debug msg="fetch records from zone '<custom-domain-address>'"
2025/04/30 15:22:15 WAPI request error: 400('400 Bad Request') Contents: { "Error": "AdmConProtoError: Result set too large (> 1000)", "code": "Client.Ibap.Proto", "text": "Result set too large (> 1000)" }
time="2025-04-30T15:22:15Z" level=error msg="could not fetch A records from zone '<custom-domain-address>': WAPI request error: 400('400 Bad Request')\nContents:\n{ \"Error\": \"AdmConProtoError: Result set too large (> 1000)\", \n \"code\": \"Client.Ibap.Proto\", \n \"text\": \"Result set too large (> 1000)\"\n}\n"
time="2025-04-30T15:23:10Z" level=debug msg="fetch records from zone '<custom-domain-address>'"
2025/04/30 15:23:16 WAPI request error: 400('400 Bad Request') Contents: { "Error": "AdmConProtoError: Result set too large (> 1000)", "code": "Client.Ibap.Proto", "text": "Result set too large (> 1000)" }
time="2025-04-30T15:23:16Z" level=error msg="could not fetch A records from zone '<custom-domain-address>': WAPI request error: 400('400 Bad Request')\nContents:\n{ \"Error\": \"AdmConProtoError: Result set too large (> 1000)\", \n \"code\": \"Client.Ibap.Proto\", \n \"text\": \"Result set too large (> 1000)\"\n}\n"
time="2025-04-30T15:24:10Z" level=debug msg="fetch records from zone '<custom-domain-address>'"
2025/04/30 15:24:16 WAPI request error: 400('400 Bad Request') Contents: { "Error": "AdmConProtoError: Result set too large (> 1000)", "code": "Client.Ibap.Proto", "text": "Result set too large (> 1000)" }

I suspect that this behavior is the same as outlined here: https://github.com/kubernetes-sigs/external-dns/pull/953

Tested with some explicit curls:

# curl to check all IPv4 A records (will take longer)
$ time curl -vk 'https://1.2.3.4:443/wapi/v2.7/record:host?_max_results=50000&_proxy_search=GM&_return_fields=extattrs%2Cipv4addrs%2Cname%2Cview%2Czone'

# curl to just return the names (should be much faster)
$ time curl -vk 'https://1.2.3.4:443/wapi/v2.7/record:host?_max_results=50000&_proxy_search=GM&_return_fields=extattrs%2Cname%2Cview%2Czone'

where 1.2.3.4 is the WAPI server.
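For anyone who wants to repeat the test with different limits, here is a minimal Python sketch that rebuilds the same record:host query URLs with an explicit _max_results value. The function name build_host_query and its defaults are hypothetical and exist only for illustration; the path, port, and query parameters are copied from the curl commands above. It only constructs the URL and does not show how the operator itself issues requests.

```python
from urllib.parse import urlencode

# Hypothetical helper: mirrors the query string used in the manual curl tests.
def build_host_query(server, max_results,
                     fields=("extattrs", "ipv4addrs", "name", "view", "zone"),
                     wapi_version="2.7"):
    """Return a record:host query URL with an explicit _max_results value."""
    params = {
        "_max_results": max_results,
        "_proxy_search": "GM",
        "_return_fields": ",".join(fields),  # commas are percent-encoded as %2C
    }
    return f"https://{server}:443/wapi/v{wapi_version}/record:host?{urlencode(params)}"

# Full query (includes ipv4addrs) and the lighter names-only variant.
print(build_host_query("1.2.3.4", 50000))
print(build_host_query("1.2.3.4", 50000, fields=("extattrs", "name", "view", "zone")))
```

The resulting strings can be passed to curl -vk as in the tests above, which makes it easy to compare response times for several limits.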
Please find the outputs attached below in this Jira. The first query took close to 4 minutes and the second query took approximately 100 seconds.

The data returned by the curl indicates that we are returning over 100k records:

less query1-output.json | grep ipv4addr | wc -l
102393

Therefore it is likely that we are seeing an issue where the _max_results value is not being set properly, resulting in a runaway response that far exceeds the capacity of the client to return the results during the initial GET request.

A possible workaround (perhaps not available in the existing version of the operator, relative to the upstream build) appears to be setting an explicit ARG or ENV var that defines the max_results value.

Looking at the log samples, this call fails roughly once a minute, which indicates to me that some timeout handler is being reached, especially given that the manual curls take about 4 minutes to complete (because they are unbounded):

time="2025-04-29T17:28:45Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...
time="2025-04-29T17:29:45Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...
time="2025-04-29T17:30:48Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...
time="2025-04-29T17:31:46Z" level=error msg="could not fetch A records from zone '<custom-domain>' ...
etc.
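To put a number on the "every 1 minute or so" observation, the following Python sketch computes the gaps between the four error timestamps quoted above. The helper name failure_intervals is hypothetical; the timestamps are taken directly from the log sample. Note that the spacing alone cannot tell a timeout apart from a periodic retry, so this only quantifies the cadence.

```python
from datetime import datetime

# Error timestamps copied from the log sample in this report.
error_times = [
    "2025-04-29T17:28:45Z",
    "2025-04-29T17:29:45Z",
    "2025-04-29T17:30:48Z",
    "2025-04-29T17:31:46Z",
]

# Hypothetical helper: seconds elapsed between consecutive log entries.
def failure_intervals(stamps):
    parsed = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(parsed, parsed[1:])]

intervals = failure_intervals(error_times)
print(intervals)                        # [60, 63, 58]
print(sum(intervals) / len(intervals))  # average gap in seconds
```

The gaps of 60, 63, and 58 seconds are consistent with the failure recurring about once a minute.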
Version-Release number of selected component (if applicable):
The External DNS Operator version available on OpenShift clusters (any cluster version) is 1.2.0.
How reproducible:
Every time; the operator cannot be deployed.
Steps to Reproduce:
1. 2. 3.
Actual results:
The Infoblox record fetch by the External DNS Operator fails, so the operator cannot be deployed.
Expected results:
We should not be stalled in this way. The upstream fix implies that we have an opportunity to import the solution into our deployed version of the operator.
Additional info:
Review the linked GitHub pull request and issue:
https://github.com/kubernetes-sigs/external-dns/pull/953
https://github.com/openshift/external-dns-operator/issues/221