OpenShift Bugs / OCPBUGS-37185

NMState requires root-servers.net to be resolvable even in disconnected (air-gapped) environments.


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • Version: 4.15

      Description of problem:

          When an NNCP (NodeNetworkConfigurationPolicy) is created in a disconnected (air-gapped) cluster where root-servers.net is not resolvable, there is an extra delay of about two minutes before the NNCP is successfully applied, because the nmstate DNS probe keeps retrying a lookup of root-servers.net until it gives up.

      This has happened to me twice in the last month while running OCP PoCs in air-gapped environments.

      I would like this either to be documented explicitly (for example, in the requirements for air-gapped OCP installations) or to have an option to disable the DNS probe.

      Please note that expecting root-servers.net to be resolvable in an air-gapped installation may be unrealistic, as it can require changes to the core network infrastructure of highly restricted environments.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Always, when the local DNS servers do not resolve root-servers.net.

      Steps to Reproduce:

      1) Create an NNCP. The nmstate-handler logs from one of the nodes:
      
      {"level":"info","ts":"2024-07-17T10:32:14.383Z","logger":"enactmentstatus","msg":"status: {DesiredState:interfaces:\n- bridge:\n    port:\n    - name: enp8s0\n  name: ovn-backup\n  state: up\n  type: ovs-bridge\novn:\n  bridge-mappings:\n  - bridge: ovn-backup\n    localnet: backup\n    state: present\n DesiredStateMetaInfo:{Version: TimeStamp:0001-01-01 00:00:00 +0000 UTC} CapturedStates:map[] PolicyGeneration:3 Conditions:[]}","enactment":"rhocp-compute-0.ovn-backup"}
      {"level":"info","ts":"2024-07-17T10:32:14.414Z","logger":"enactmentconditions","msg":"NotifyProgressing","enactment":"rhocp-compute-0.ovn-backup"}
      {"level":"info","ts":"2024-07-17T10:32:14.422Z","logger":"enactmentstatus","msg":"status: {DesiredState:interfaces:\n- bridge:\n    port:\n    - name: enp8s0\n  name: ovn-backup\n  state: up\n  type: ovs-bridge\novn:\n  bridge-mappings:\n  - bridge: ovn-backup\n    localnet: backup\n    state: present\n DesiredStateMetaInfo:{Version: TimeStamp:0001-01-01 00:00:00 +0000 UTC} CapturedStates:map[] PolicyGeneration:3 Conditions:[{Type:Progressing Status:True Reason:ConfigurationProgressing Message:Applying desired state LastHeartbeatTime:2024-07-17 10:32:14.422332523 +0000 UTC m=+429667.913220316 LastTransitionTime:2024-07-17 10:32:14.422332523 +0000 UTC m=+429667.913220316} {Type:Failing Status:Unknown Reason:ConfigurationProgressing Message: LastHeartbeatTime:2024-07-17 10:32:14.422333071 +0000 UTC m=+429667.913220822 LastTransitionTime:2024-07-17 10:32:14.422333071 +0000 UTC m=+429667.913220822} {Type:Available Status:Unknown Reason:ConfigurationProgressing Message: LastHeartbeatTime:2024-07-17 10:32:14.422334054 +0000 UTC m=+429667.913221805 LastTransitionTime:2024-07-17 10:32:14.422334054 +0000 UTC m=+429667.913221805} {Type:Pending Status:False Reason:ConfigurationProgressing Message: LastHeartbeatTime:2024-07-17 10:32:14.422334503 +0000 UTC m=+429667.913222254 LastTransitionTime:2024-07-17 10:32:14.422334503 +0000 UTC m=+429667.913222254} {Type:Aborted Status:False Reason:ConfigurationProgressing Message: LastHeartbeatTime:2024-07-17 10:32:14.42233463 +0000 UTC m=+429667.913222384 LastTransitionTime:2024-07-17 10:32:14.42233463 +0000 UTC m=+429667.913222384}]}","enactment":"rhocp-compute-0.ovn-backup"}
      {"level":"error","ts":"2024-07-17T10:32:14.972Z","logger":"probe","msg":"failed checking DNS connectivity","error":"[lookup root-servers.net on 10.195.14.253:53: server misbehaving]","stacktrace":"github.com/nmstate/kubernetes-nmstate/pkg/probe.runDNS\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:255\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.dnsCondition.func1\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:219\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:49\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:50\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:48\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.Select\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:277\ngithub.com/nmstate/kubernetes-nmstate/pkg/client.ApplyDesiredState\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/client/client.go:159\ngithub.com/nmstate/kubernetes-nmstate/controllers/handler.(*NodeNetworkConfigurationPolicyReconciler).Reconcile\n\t/go/src/github.com/openshift/kubernetes-nmstate/controllers/handler/nodenetworkconfigurationpolicy_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
      
      
      
      (...)
      After two minutes:
      
      
      
      {"level":"error","ts":"2024-07-17T10:34:14.534Z","logger":"probe","msg":"failed checking DNS connectivity","error":"[lookup root-servers.net on 10.195.14.253:53: server misbehaving]","stacktrace":"github.com/nmstate/kubernetes-nmstate/pkg/probe.runDNS\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:255\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.dnsCondition.func1\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:219\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:73\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:74\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:48\ngithub.com/nmstate/kubernetes-nmstate/pkg/probe.Select\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:277\ngithub.com/nmstate/kubernetes-nmstate/pkg/client.ApplyDesiredState\n\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/client/client.go:159\ngithub.com/nmstate/kubernetes-nmstate/controllers/handler.(*NodeNetworkConfigurationPolicyReconciler).Reconcile\n\t/go/src/github.com/openshift/kubernetes-nmstate/controllers/handler/nodenetworkconfigurationpolicy_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
      {"level":"info","ts":"2024-07-17T10:34:14.705Z","logger":"probe","msg":"WARNING not selecting dns probe"}
      {"level":"info","ts":"2024-07-17T10:34:16.538Z","logger":"probe","msg":"Running 'ping' probe"}
      {"level":"info","ts":"2024-07-17T10:34:16.952Z","logger":"probe","msg":"Running 'api-server' probe"}
      {"level":"info","ts":"2024-07-17T10:34:16.997Z","logger":"probe","msg":"Running 'node-readiness' probe"}
      {"level":"info","ts":"2024-07-17T10:34:17.022Z","logger":"controllers.NodeNetworkConfigurationPolicy","msg":"nmstate","nodenetworkconfigurationpolicy":{"name":"ovn-backup"},"output":"setOutput: route-rules: {}\nroutes: {}\ninterfaces:\n- name: ovn-backup\n  type: ovs-bridge\n  state: up\n  bridge:\n    port:\n    - name: enp8s0\novn:\n  bridge-mappings:\n  - localnet: backup\n    state: present\n    bridge: ovn-backup\n\n \n"}
      {"level":"info","ts":"2024-07-17T10:34:17.023Z","logger":"enactmentconditions","msg":"NotifySuccess","enactment":"rhocp-compute-0.ovn-backup"}
      {"level":"info","ts":"2024-07-17T10:34:17.031Z","logger":"enactmentstatus","msg":"status: {DesiredState:interfaces:\n- bridge:\n    port:\n    - name: enp8s0\n  name: ovn-backup\n  state: up\n  type: ovs-bridge\novn:\n  bridge-mappings:\n  - bridge: ovn-backup\n    localnet: backup\n    state: present\n DesiredStateMetaInfo:{Version: TimeStamp:0001-01-01 00:00:00 +0000 UTC} CapturedStates:map[] PolicyGeneration:3 Conditions:[{Type:Progressing Status:False Reason:SuccessfullyConfigured Message: LastHeartbeatTime:2024-07-17 10:34:17.031838985 +0000 UTC m=+429790.522726737 LastTransitionTime:2024-07-17 10:34:17.031838985 +0000 UTC m=+429790.522726737} {Type:Failing Status:False Reason:SuccessfullyConfigured Message: LastHeartbeatTime:2024-07-17 10:34:17.031838839 +0000 UTC m=+429790.522726590 LastTransitionTime:2024-07-17 10:34:17.031838839 +0000 UTC m=+429790.522726590} {Type:Available Status:True Reason:SuccessfullyConfigured Message:successfully reconciled LastHeartbeatTime:2024-07-17 10:34:17.031838506 +0000 UTC m=+429790.522726281 LastTransitionTime:2024-07-17 10:34:17.031838506 +0000 UTC m=+429790.522726281} {Type:Pending Status:False Reason:SuccessfullyConfigured Message: LastHeartbeatTime:2024-07-17 10:34:17.031839155 +0000 UTC m=+429790.522726908 LastTransitionTime:2024-07-17 10:32:14 +0000 UTC} {Type:Aborted Status:False Reason:SuccessfullyConfigured Message: LastHeartbeatTime:2024-07-17 10:34:17.031839373 +0000 UTC m=+429790.522727124 LastTransitionTime:2024-07-17 10:32:14 +0000 UTC}]}","enactment":"rhocp-compute-0.ovn-backup"}
      {"level":"info","ts":"2024-07-17T10:34:17.048Z","logger":"controllers.NodeNetworkConfigurationPolicy.forceNNSRefresh","msg":"forcing NodeNetworkState refresh after NNCP applied","node":"rhocp-compute-0"}
      
      
      
      Output of dig, run from the node:
      sh-4.4# dig root-servers.net. @10.195.14.253

      ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8_6.7 <<>> root-servers.net. @10.195.14.253
      ;; global options: +cmd
      ;; Got answer:
      ;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 61858
      ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

      ;; OPT PSEUDOSECTION:
      ; EDNS: version: 0, flags:; udp: 1232
      ;; QUESTION SECTION:
      ;root-servers.net.              IN      A

      ;; Query time: 0 msec
      ;; SERVER: 10.195.14.253#53(10.195.14.253)
      ;; WHEN: Wed Jul 17 11:23:01 UTC 2024
      ;; MSG SIZE  rcvd: 45
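      To see what the probe sees, independently of dig, the same query can be sent through Go's own resolver to the node's DNS server. This is a sketch, not part of kubernetes-nmstate: newResolver is a hypothetical helper, and the default server address and name are copied from the dig output above. Since the stack traces show the handler is written in Go, a REFUSED reply like the one dig reports should surface here as the same "server misbehaving" error that appears in the probe log.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// newResolver (hypothetical helper) returns a pure-Go resolver that sends every
// query to server (host:port), ignoring the nameservers in /etc/resolv.conf.
func newResolver(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	// Defaults taken from the dig output; override with: <server:port> <name>.
	server, name := "10.195.14.253:53", "root-servers.net"
	if len(os.Args) > 2 {
		server, name = os.Args[1], os.Args[2]
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	addrs, err := newResolver(server).LookupHost(ctx, name)
	if err != nil {
		fmt.Println("lookup failed:", err)
		os.Exit(1)
	}
	fmt.Println("resolved:", addrs)
}
```

      Run it from a host that can reach the lab DNS server, for example `go run main.go 10.195.14.253:53 root-servers.net`.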
      
      
      dnsmasq configuration (in the lab where I confirmed this behaviour):
      
      port=53
      domain-needed
      bogus-priv
      no-resolv
      no-poll
      log-queries
      address=/apps.ocp4.openshift.one/10.195.14.101
      address=/api.ocp4.openshift.one/10.195.14.100
      address=/api-int.ocp4.openshift.one/10.195.14.100
      address=/quay.ocp4.openshift.one/10.195.14.253
      address=/bastion.ocp4.openshift.one/10.195.14.253
      interface=eth1
      no-dhcp-interface=eth1
      bind-interfaces
      no-hosts
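      This configuration sets no-resolv and has no server= line, so dnsmasq has no upstream resolver, and root-servers.net matches none of the address=/domain/ip entries (each entry answers for its domain and all of its subdomains). dnsmasq therefore has no way to answer the query, which is consistent with the REFUSED status dig reports. The Go sketch below checks that mechanically. It is a hypothetical helper that models only the no-resolv, server= and address= options, not a full dnsmasq parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// coverage describes how a dnsmasq configuration could answer one name.
type coverage struct {
	localDomain string // matching address=/domain/ entry, "" if none
	mayForward  bool   // an upstream exists (plain server= line, or resolv.conf is read)
}

// analyze (hypothetical, simplified) reads only no-resolv, server= and address=.
func analyze(conf, name string) coverage {
	var c coverage
	noResolv := false
	name = strings.TrimSuffix(strings.ToLower(name), ".")
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "no-resolv":
			noResolv = true
		case strings.HasPrefix(line, "server=") && !strings.HasPrefix(line, "server=/"):
			// A plain server=IP line is a general upstream resolver.
			c.mayForward = true
		case strings.HasPrefix(line, "address=/"):
			// address=/dom1/dom2/.../ip answers for each domain and its subdomains.
			parts := strings.Split(strings.TrimPrefix(line, "address="), "/")
			for _, d := range parts[1 : len(parts)-1] {
				d = strings.ToLower(d)
				if d == "#" || name == d || strings.HasSuffix(name, "."+d) {
					c.localDomain = d
				}
			}
		}
	}
	if !noResolv {
		c.mayForward = true // upstreams would be read from /etc/resolv.conf
	}
	return c
}

// labConf is the dnsmasq configuration quoted in this report.
const labConf = `port=53
domain-needed
bogus-priv
no-resolv
no-poll
log-queries
address=/apps.ocp4.openshift.one/10.195.14.101
address=/api.ocp4.openshift.one/10.195.14.100
address=/api-int.ocp4.openshift.one/10.195.14.100
address=/quay.ocp4.openshift.one/10.195.14.253
address=/bastion.ocp4.openshift.one/10.195.14.253
interface=eth1
no-dhcp-interface=eth1
bind-interfaces
no-hosts`

func main() {
	for _, name := range []string{"root-servers.net", "api.ocp4.openshift.one", "console-openshift-console.apps.ocp4.openshift.one"} {
		c := analyze(labConf, name)
		fmt.Printf("%-50s local=%q forward=%v\n", name, c.localDomain, c.mayForward)
	}
}
```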
      
      

      Actual results:

          

      Expected results:

          

      Additional info:

          

              Mat Kowalski (mkowalsk@redhat.com)
              Rafal Szmigiel (rszmigie)
              Qiong Wang
              Votes: 0
              Watchers: 3
