Project: OpenShift Virtualization
Issue: CNV-29074

capk crashloop with external infra


    • Quality / Stability / Reliability
    • CNV I/U Operators Sprint 237, CNV I/U Operators Sprint 238

      In the external infra use case, capk enters a crashloop on the mgmt cluster if CNV is not installed there. The cause is the VMI eviction controller within capk, which attempts to watch VMIs on the mgmt cluster instead of the infra cluster.

       

      To resolve this, the capk eviction controller should watch KubevirtMachines and then use the infra client to fetch the VMI associated with each KubevirtMachine during the reconcile loop. The eviction controller should also run on a resync timer so that KubevirtMachines are processed periodically to check whether a VMI needs to be evicted. It's possible this already occurs, but it should be verified.

       

      The capk error output looks like the following:

       

      E0524 21:11:49.063969       1 logr.go:279] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
      E0524 21:11:54.854598       1 controller.go:203] controller/virtualmachineinstance "msg"="Could not wait for Cache to sync" "error"="failed to wait for virtualmachineinstance caches to sync: timed out waiting for cache to be synced" "reconciler group"="kubevirt.io" "reconciler kind"="VirtualMachineInstance" 
      I0524 21:11:54.854658       1 logr.go:261]  "msg"="Stopping and waiting for non leader election runnables"  
      I0524 21:11:54.854673       1 logr.go:261]  "msg"="Stopping and waiting for leader election runnables"  
      I0524 21:11:54.854692       1 controller.go:240] controller/kubevirtcluster "msg"="Shutdown signal received, waiting for all workers to finish" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" 
      I0524 21:11:54.854707       1 controller.go:240] controller/kubevirtmachine "msg"="Shutdown signal received, waiting for all workers to finish" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 
      I0524 21:11:54.854732       1 controller.go:242] controller/kubevirtcluster "msg"="All workers finished" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" 
      I0524 21:11:54.854749       1 controller.go:242] controller/kubevirtmachine "msg"="All workers finished" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 
      I0524 21:11:54.854763       1 logr.go:261]  "msg"="Stopping and waiting for caches"  
      I0524 21:11:54.854866       1 reflector.go:225] Stopping reflector *v1beta1.Machine (1m2.391574275s) from pkg/mod/k8s.io/client-go@v0.23.1/tools/cache/reflector.go:167
      I0524 21:11:54.854922       1 reflector.go:225] Stopping reflector *v1alpha1.KubevirtCluster (1m2.204012497s) from pkg/mod/k8s.io/client-go@v0.23.1/tools/cache/reflector.go:167
      I0524 21:11:54.854938       1 logr.go:261]  "msg"="Stopping and waiting for webhooks"  
      I0524 21:11:54.854962       1 reflector.go:225] Stopping reflector *v1alpha1.KubevirtMachine (1m3.526932727s) from pkg/mod/k8s.io/client-go@v0.23.1/tools/cache/reflector.go:167
      I0524 21:11:54.855028       1 reflector.go:225] Stopping reflector *v1beta1.Cluster (1m4.381634941s) from pkg/mod/k8s.io/client-go@v0.23.1/tools/cache/reflector.go:167
      I0524 21:11:54.855070       1 reflector.go:225] Stopping reflector *v1.Secret (55.061758027s) from pkg/mod/k8s.io/client-go@v0.23.1/tools/cache/reflector.go:167
      I0524 21:11:54.855121       1 logr.go:261] controller-runtime/webhook "msg"="shutting down webhook server"  
      I0524 21:11:54.855185       1 logr.go:261]  "msg"="Wait completed, proceeding to shutdown the manager"  
      E0524 21:11:54.855240       1 logr.go:279] setup "msg"="problem running manager" "error"="failed to wait for virtualmachineinstance caches to sync: timed out waiting for cache to be synced"  
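The first error line above ("if kind is a CRD, it should be installed before calling Start") points at a complementary safeguard: check via API discovery whether the VirtualMachineInstance kind is served before wiring a watch for it. A minimal sketch, with a hypothetical `kindServed` helper over a simplified resource list (the real code would use client-go discovery):

```go
package main

import "fmt"

// discoveredResource is a simplified stand-in for an entry in the API
// resource lists a discovery client returns.
type discoveredResource struct {
	GroupVersion string
	Kind         string
}

// kindServed reports whether the given group/version and kind is served by
// the cluster, so a controller can skip a watch that would otherwise fail
// with "no matches for kind" and crashloop the manager.
func kindServed(resources []discoveredResource, gv, kind string) bool {
	for _, r := range resources {
		if r.GroupVersion == gv && r.Kind == kind {
			return true
		}
	}
	return false
}

func main() {
	// A mgmt cluster without CNV installed serves no kubevirt.io resources.
	mgmt := []discoveredResource{
		{GroupVersion: "infrastructure.cluster.x-k8s.io/v1alpha1", Kind: "KubevirtMachine"},
	}
	// false: skip starting the VMI watch on this cluster.
	fmt.Println(kindServed(mgmt, "kubevirt.io/v1", "VirtualMachineInstance"))
}
```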
       

              nunnatsa Nahshon Unna Tsameret
              rhn-engineering-dvossel David Vossel (Inactive)