  1. OpenShift Bugs
  2. OCPBUGS-11200

Cluster setup fails because master nodes could not determine TopologyVersion

    • Critical
    • No
    • Rejected
    • False

      Description of problem:

      Cluster setup sometimes fails because one or two master nodes never become ready.
      
      The following logs show that the ovnkube node does not finish starting, with the message `Could not determine TopologyVersion via configmap`:
      
      *************************
      I0330 04:08:11.632855    4488 default_node_network_controller.go:614] Gateway and management port readiness took 1.291059656s
      I0330 04:08:11.633066    4488 ovs.go:200] Exec(82): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ex -
      I0330 04:08:11.633316    4488 node_ip_handler_linux.go:406] Skipping non-useable IP address for host: 127.0.0.1
      I0330 04:08:11.633337    4488 node_ip_handler_linux.go:406] Skipping non-useable IP address for host: 169.254.169.2
      I0330 04:08:11.633344    4488 node_ip_handler_linux.go:406] Skipping non-useable IP address for host: 10.130.0.2
      I0330 04:08:11.633351    4488 node_ip_handler_linux.go:406] Skipping non-useable IP address for host: ::1
      I0330 04:08:11.633359    4488 node_ip_handler_linux.go:406] Skipping non-useable IP address for host: fe80::75e3:9163:af34:53eb
      I0330 04:08:11.633365    4488 node_ip_handler_linux.go:406] Skipping non-useable IP address for host: fe80::f833:a0ff:fed6:68d1
      I0330 04:08:11.633374    4488 node_ip_handler_linux.go:406] Skipping non-useable IP address for host: fe80::843a:a8ff:fe44:2a82
      I0330 04:08:11.636214    4488 node_upgrade.go:84] Could not determine TopologyVersion via configmap, falling back to Nodes
      I0330 04:08:11.636360    4488 obj_retry.go:389] Stop channel got triggered: will stop retrying failed objects of type *factory.serviceForGateway
      I0330 04:08:11.636414    4488 obj_retry.go:389] Stop channel got triggered: will stop retrying failed objects of type *factory.endpointSliceForGateway
      I0330 04:08:11.657043    4488 ovs.go:203] Exec(82): stdout: ""
      I0330 04:08:11.657092    4488 ovs.go:204] Exec(82): stderr: ""
      I0330 04:08:11.657250    4488 reflector.go:227] Stopping reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:150
      I0330 04:08:11.657317    4488 reflector.go:227] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:150
      I0330 04:08:11.657383    4488 reflector.go:227] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:150
      I0330 04:08:11.657319    4488 handler.go:198] Sending *v1.Service event handler 1 for removal
      I0330 04:08:11.657362    4488 reflector.go:227] Stopping reflector *v1.EndpointSlice (0s) from k8s.io/client-go/informers/factory.go:150
      I0330 04:08:11.657404    4488 reflector.go:227] Stopping reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:150
      I0330 04:08:11.658029    4488 handler.go:198] Sending *v1.EndpointSlice event handler 2 for removal
      I0330 04:08:11.658070    4488 handler.go:212] Removed *v1.EndpointSlice event handler 2
      ****************************************
      
      This looks like a timing (race) issue. One master set the ConfigMap at `04:08:11.924904`:
      
      I0330 04:08:11.924904       1 topology_version.go:38] Updated ConfigMap openshift-ovn-kubernetes/control-plane-status topology version to 5
      
      However, both failed nodes tried to read the value before that, at `04:08:11.636214`, which I guess is why they failed:
      
      I0330 04:08:11.636214    4488 node_upgrade.go:84] Could not determine TopologyVersion via configmap, falling back to Nodes
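
The ordering of the two events can be checked directly from the klog timestamps quoted in this report. The following is a minimal sketch, not part of ovn-kubernetes: the helper name `klog_time` is hypothetical, and it assumes both lines were logged on the same day (both carry `0330`) and that the clocks on the two hosts are reasonably in sync, which the logs alone cannot confirm.

```python
from datetime import datetime


def klog_time(line: str) -> datetime:
    """Parse the time of day from a klog header like 'I0330 04:08:11.636214 ...'.

    Hypothetical helper for illustration. Only the HH:MM:SS.ffffff field is
    parsed; the MMDD part is ignored, so both lines must be from the same day.
    """
    fields = line.split()
    # fields[0] is severity + MMDD (e.g. 'I0330'); fields[1] is the time of day.
    return datetime.strptime(fields[1], "%H:%M:%S.%f")


# The two log lines quoted in this report.
node_read = (
    "I0330 04:08:11.636214    4488 node_upgrade.go:84] "
    "Could not determine TopologyVersion via configmap, falling back to Nodes"
)
master_write = (
    "I0330 04:08:11.924904       1 topology_version.go:38] "
    "Updated ConfigMap openshift-ovn-kubernetes/control-plane-status "
    "topology version to 5"
)

# Positive gap means the node tried to read before the master wrote.
gap = klog_time(master_write) - klog_time(node_read)
print(f"node read preceded master write: {gap.total_seconds() > 0}")
print(f"gap in microseconds: {gap.microseconds}")
```

With these two lines the node's read comes out roughly 289 ms ahead of the master's write, which is consistent with the race suspected above but does not by itself prove it is the cause of the failure.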
      
      
      
      whole logs: 
      https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-azure-ipi-ovn-ipsec-p1-f4/1641284864571346944/artifacts/azure-ipi-ovn-ipsec-p1-f4/ipi-install-install/artifacts/

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      not always

      Steps to Reproduce:

      1. Set up a cluster with ovn-kubernetes.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

              bpickard@redhat.com Ben Pickard
              zzhao1@redhat.com Zhanqi Zhao
              Anurag Saxena Anurag Saxena