OpenShift Bugs / OCPBUGS-43385

[Pre-merge Testing] ovnkube panic/crashloopbackoff during ovnkube restart/rollout


    • Critical
    • No
    • Proposed
    • False

      Description of problem: The ovnkube pod on a newly scaled node crashes (CrashLoopBackOff), which leaves the node stuck in the NotReady state.

      Version-Release number of selected component (if applicable): Cluster bot build from 4.18 with openshift/ovn-kubernetes#2314 and openshift/api#1997
      How reproducible: Always

      Steps to Reproduce:

      1. Build a cluster bot build 

      2. Launch cluster on that build

      3. Create a couple of namespaces, each with NADs (NetworkAttachmentDefinitions) inside it

      4. Scale up the cluster by adding a node

      Actual results: The ovnkube pod on the newly scaled node crashes repeatedly, so the node stays in the NotReady state.

      Expected results: The node should reach the Ready state with no OVN pods in CrashLoopBackOff (CLBO).

      Additional info: Excerpt from `oc logs ovnkube-node -c ovnkube-controller` on the scaled node

      I1015 18:35:03.063817    4850 reflector.go:359] Caches populated for *v1.NetworkAttachmentDefinition from github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/client/informers/externalversions/factory.go:117
      I1015 18:35:03.089307    4850 reflector.go:296] Starting reflector *v1.UserDefinedNetwork (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/userdefinednetwork/v1/apis/informers/externalversions/factory.go:140
      I1015 18:35:03.089353    4850 reflector.go:332] Listing and watching *v1.UserDefinedNetwork from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/userdefinednetwork/v1/apis/informers/externalversions/factory.go:140
      I1015 18:35:03.091039    4850 reflector.go:359] Caches populated for *v1.UserDefinedNetwork from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/userdefinednetwork/v1/apis/informers/externalversions/factory.go:140
      I1015 18:35:03.161951    4850 controller.go:103] Adding controller [node-network-controller-manager NAD controller] event handlers
      I1015 18:35:03.161999    4850 shared_informer.go:313] Waiting for caches to sync for [node-network-controller-manager NAD controller]
      I1015 18:35:03.162006    4850 shared_informer.go:320] Caches are synced for [node-network-controller-manager NAD controller]
      I1015 18:35:03.162362    4850 controller.go:127] Starting controller [node-network-controller-manager NAD controller] with 1 workers
      I1015 18:35:03.163086    4850 network_manager.go:215] [node-network-controller-manager network manager]: syncing all networks
      I1015 18:35:03.163179    4850 network_manager.go:153] [node-network-controller-manager network manager]: finished syncing network l3-network-e2e-test-networking-udn-zmbjr, took 11.612µs
      I1015 18:35:03.163258    4850 network_attach_def_controller.go:149] [node-network-controller-manager NAD controller]: shutting down
      I1015 18:35:03.163725    4850 network_attach_def_controller.go:183] [node-network-controller-manager NAD controller]: finished syncing NAD e2e-test-networking-udn-s28lk/l3-network-e2e-test-networking-udn-s28lk, took 1.260204ms
      I1015 18:35:03.164667    4850 network_attach_def_controller.go:183] [node-network-controller-manager NAD controller]: finished syncing NAD e2e-test-networking-udn-zmbjr/l3-network-e2e-test-networking-udn-zmbjr, took 908.775µs
      panic: close of closed channel 
      
      goroutine 118 [running]:
      github.com/ovn-org/ovn-kubernetes/go-controller/pkg/controller.(*controller[...]).stop(0x29f4740)
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/controller/controller.go:142 +0x25
      github.com/ovn-org/ovn-kubernetes/go-controller/pkg/controller.Stop(...)
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/controller/controller.go:313
      github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-attach-def-controller.(*networkManagerImpl).Stop(0xc0008606c0)
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-attach-def-controller/network_manager.go:81 +0x4f
      github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-attach-def-controller.(*NetAttachDefinitionController).Stop(0xc0004f23f0)
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-attach-def-controller/network_attach_def_controller.go:151 +0x153
      github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-controller-manager.(*nodeNetworkControllerManager).Stop(0xc0004f2360)
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-controller-manager/node_network_controller_manager.go:246 +0x4a
      github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-controller-manager.(*nodeNetworkControllerManager).Start.func1()
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-controller-manager/node_network_controller_manager.go:175 +0x25
      github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-controller-manager.(*nodeNetworkControllerManager).Start(0xc0004f2360, {0x29d4ea8, 0xc00023dae0})
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-controller-manager/node_network_controller_manager.go:197 +0x344
      main.runOvnKube.func4()
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/cmd/ovnkube/ovnkube.go:560 +0x2ea
      created by main.runOvnKube in goroutine 1
          /go/src/github.com/openshift/ovn-kubernetes/go-controller/cmd/ovnkube/ovnkube.go:536 +0x58f
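
      Note on the panic: Go raises "close of closed channel" when close() is called on a channel that is already closed. In the trace above the panic is raised in the controller's stop(), reached through the Stop() chain invoked from nodeNetworkControllerManager.Start, and the log shows the NAD controller "shutting down" right after it started. That is consistent with the stop path running more than once for the same controller, although the trace alone does not confirm where the first close happened. The sketch below only illustrates the failure mode and one common way to make a stop function idempotent (sync.Once). All type and function names in it are hypothetical; it is not the ovn-kubernetes code and not necessarily how this bug should be fixed.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a hypothetical interface for this sketch only; it is not
// the ovn-kubernetes controller API.
type stopper interface{ stop() }

// naiveController closes its stop channel unconditionally, so a second
// stop() call panics with "close of closed channel".
type naiveController struct {
	stopCh chan struct{}
}

func (c *naiveController) stop() { close(c.stopCh) }

// onceController guards the close with sync.Once, which makes stop()
// safe to call any number of times.
type onceController struct {
	stopCh   chan struct{}
	stopOnce sync.Once
}

func (c *onceController) stop() {
	c.stopOnce.Do(func() { close(c.stopCh) })
}

// stopTwice calls stop() twice and returns the recovered panic value,
// or nil if neither call panicked.
func stopTwice(s stopper) (recovered any) {
	defer func() { recovered = recover() }()
	s.stop()
	s.stop()
	return nil
}

func main() {
	fmt.Println("naive:", stopTwice(&naiveController{stopCh: make(chan struct{})}))
	fmt.Println("once: ", stopTwice(&onceController{stopCh: make(chan struct{})}))
}
```

      Guarding the close keeps repeated Stop() calls harmless, but it can also hide a caller that stops a controller it has already stopped, so the duplicate call site is still worth tracking down.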
      
      

            sseethar Surya Seetharaman
            anusaxen Anurag Saxena
            Anurag Saxena Anurag Saxena