OpenShift Virtualization / CNV-52279

VMs with network bridges intermittently enter the Failed state before crashing and recovering


      Description of problem: Seen multiple times on the 4.18 network T2 gating lanes; it causes T2 gating tests to fail.

       
      e.g.:
      https://main-jenkins-csb-cnvqe.apps.ocp-c1.prod.psi.redhat.com/job/test-pytest-cnv-4.18-network-gating/50/ - 3 tests
      https://main-jenkins-csb-cnvqe.apps.ocp-c1.prod.psi.redhat.com/job/test-pytest-cnv-4.18-network-gating/40/testReport/ - 3 tests

      Version-Release number of selected component (if applicable):

      v4.18.0.rhel9-397 and v4.18.0.rhel9-351
      

      How reproducible:

      Intermittent: seen twice in the last week.
      

      Steps to Reproduce:

      1. Create VMs for network tests.
      

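      Actual results:

      VMs pass through the Failed state (a SyncFailed event followed by "The VirtualMachineInstance crashed.") before eventually being created, and the gating tests fail.

      Expected results:

      VMs are created without entering the Failed state, and the gating tests pass.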

      Additional info:
      VMs for these tests were created, and the collected events show that they initially failed scheduling:

      0/6 nodes are available: 1 Insufficient bridge.network.kubevirt.io/br6test, 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 No preemption victims found for incoming pod, 5 Preemption is not helpful for scheduling.
      

      Next, a SyncFailed event appears with the message "failed to configure vmi network: setup failed, err: Critical network error: failed to setup link [k6t-d9ccdbb6cf3]: interrupted system call".
      A "The VirtualMachineInstance crashed." event follows, then a "Killing" event with the message "Stopping container compute". After that, the VM is created successfully. This sequence of events causes the gating tests to fail, because the tests catch the VM in the "Failed" status and assume it is not recoverable.
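To illustrate the failure mode described above, here is a minimal sketch of a wait helper that tolerates a short-lived "Failed" status instead of aborting on the first one. It is not the gating suite's code: `wait_for_running`, `get_phase`, `failed_grace`, and the "Running" target status are hypothetical names and assumptions chosen for illustration, and the status source is passed in as a plain callable so that no client API is implied.

```python
import time
from typing import Callable, Optional


def wait_for_running(
    get_phase: Callable[[], str],  # hypothetical: returns the VM's current status
    timeout: float = 300.0,
    failed_grace: float = 60.0,  # assumed tolerance for a transient Failed status
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the status is "Running", False on timeout.

    A "Failed" status is tolerated for up to `failed_grace` seconds, so a
    VM that crashes and then recovers is not reported as a hard failure.
    """
    start = clock()
    failed_since: Optional[float] = None  # when the current Failed streak began
    while clock() - start < timeout:
        phase = get_phase()
        if phase == "Running":
            return True
        if phase == "Failed":
            if failed_since is None:
                failed_since = clock()
            elif clock() - failed_since >= failed_grace:
                return False  # still Failed after the grace period
        else:
            failed_since = None  # left Failed, so reset the streak
        sleep(interval)
    return False
```

Whether tolerating the transient failure is appropriate for gating is a separate question; the sketch only shows why a check that treats the first "Failed" observation as final will trip on the event sequence above.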

      I will attach the collected must-gather.
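For triage, the scheduling message quoted earlier can be broken down per reason. The sketch below is a hypothetical helper (`parse_unschedulable` is not part of any Kubernetes or KubeVirt tooling) and assumes the message keeps the layout seen in this report: a "N/M nodes are available:" prefix, comma-separated "count reason" items, and a trailing ". preemption:" clause.

```python
def parse_unschedulable(message: str) -> dict[str, int]:
    """Map each scheduling-failure reason to the number of nodes it affected."""
    # Drop the trailing preemption clause, if present.
    head = message.split(". preemption:", 1)[0]
    # Keep only the text after the "N/M nodes are available: " prefix.
    _, _, reasons = head.partition(": ")
    counts: dict[str, int] = {}
    for item in reasons.split(", "):
        count, _, reason = item.strip().partition(" ")
        counts[reason] = int(count)
    return counts


# The message below is copied from the events in this report.
message = (
    "0/6 nodes are available: 1 Insufficient bridge.network.kubevirt.io/br6test, "
    "2 node(s) didn't match Pod's node affinity/selector, "
    "3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. "
    "preemption: 0/6 nodes are available: 1 No preemption victims found for "
    "incoming pod, 5 Preemption is not helpful for scheduling."
)
for reason, nodes in parse_unschedulable(message).items():
    print(nodes, reason)
```

Applied to that message, the counts add up to all six nodes: one node lacks the `bridge.network.kubevirt.io/br6test` resource, two do not match the node affinity/selector, and three carry the master taint.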

              phoracek@redhat.com Petr Horacek
              rhn-support-dbasunag Debarati Basu-Nag
              Yossi Segev Yossi Segev