OCPBUGS-57642

FLAKE: Cilium CNI temporarily breaks node readiness during live migration of Kubevirt VMs


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.19.z
    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

      As seen in this run, the following test fails:

      [sig-kubevirt] migration when running openshift cluster on KubeVirt virtual machines and live migrate hosted control plane workers [Early] should maintain node readiness [Suite:openshift/conformance/parallel]   

      Version-Release number of selected component (if applicable):

      4.19, Cilium 1.15.1

      How reproducible:

          Often but not always

      Steps to Reproduce:

          1. Run the conformance tests against a hosted cluster that runs on KubeVirt and uses the Cilium CNI (the setup can be found in this job: https://github.com/openshift/release/pull/63263).
      
          

      Actual results:

      The test sometimes fails, and the failure also breaks other monitoring tests that track node readiness throughout the whole test suite run.

          [sig-kubevirt] migration when running openshift cluster on KubeVirt virtual machines and live migrate hosted control plane workers [Early] [It] should maintain node readiness [Suite:openshift/conformance/parallel]
        github.com/openshift/origin/test/extended/kubevirt/migration.go:61    [FAILED] Failed after 45.931s.
          nodes should maintain ready state
          Expected
              <int>: 2
          to equal
              <int>: 3
          In [It] at: github.com/openshift/origin/test/extended/kubevirt/migration.go:83 @ 06/16/25 11:53:41.306
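
The assertion above compares the number of Ready nodes (2) with the expected number (3). A minimal sketch of that kind of check follows. It is not the code from migration.go: the `Node` and `NodeCondition` types, the helper names, and the `worker-a` / `worker-b` node names are hypothetical stand-ins, and only the `Ready` condition type and its `"True"` status come from the node status in this report.

```go
package main

import "fmt"

// NodeCondition is a hypothetical, reduced stand-in for the condition
// entries shown in the node status (type and status only).
type NodeCondition struct {
	Type   string
	Status string
}

// Node is a hypothetical, reduced stand-in for a cluster node.
type Node struct {
	Name       string
	Conditions []NodeCondition
}

// isReady reports whether the node has a Ready condition with status "True".
// Any other status ("False", "Unknown") or a missing condition counts as not ready.
func isReady(n Node) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// countReady returns how many of the given nodes are currently Ready.
func countReady(nodes []Node) int {
	ready := 0
	for _, n := range nodes {
		if isReady(n) {
			ready++
		}
	}
	return ready
}

func main() {
	// Two healthy workers (hypothetical names) and the node from the events,
	// modelled here as not ready during its migration window.
	nodes := []Node{
		{Name: "worker-a", Conditions: []NodeCondition{{Type: "Ready", Status: "True"}}},
		{Name: "worker-b", Conditions: []NodeCondition{{Type: "Ready", Status: "True"}}},
		{Name: "8dee117a6a36f0104e27-gnm4n-khmwg", Conditions: []NodeCondition{{Type: "Ready", Status: "False"}}},
	}
	fmt.Printf("ready nodes: %d of %d\n", countReady(nodes), len(nodes))
}
```

With one of three nodes not ready, the count is 2, which matches the mismatch reported by the failed assertion.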

      From the node status (lastTransitionTime shows when the node became ready again):

        - lastHeartbeatTime: "2025-06-16T12:42:58Z"
          lastTransitionTime: "2025-06-16T11:54:07Z"
          message: kubelet is posting ready status
          reason: KubeletReady
          status: "True"
          type: Ready 

      Events around 11:5x:xx from the event filter:

      11:53:39 -     Node 8dee117a6a36f0104e27-gnm4n-khmwg status is now: NodeNotReady
      11:54:07 (x2) - Node 8dee117a6a36f0104e27-gnm4n-khmwg status is now: NodeReady 

      Expected results:

          Nodes stay ready the whole time and do not lose readiness.

      Additional info:

      The root cause is unknown.

              kmajcher@redhat.com Krzysztof Majcher
              mgencur@redhat.com Martin Gencur
              Martin Gencur