OpenShift Virtualization / CNV-75726

High IO for a short duration (5 min) causes VirtualMachineInstance crash


    • Bug
    • Resolution: Unresolved
    • Undefined
    • Storage Platform
    • Quality / Stability / Reliability
    • 0.42
    • False
    • False

      Description of problem:

      High IO for a short duration (5 min) on all worker nodes causes some VirtualMachineInstances to crash.

      Version-Release number of selected component (if applicable):

      fence-agents-remediation.v0.6.0.yaml
      kubevirt-hyperconverged-operator.v4.18.23.yaml
      node-healthcheck-operator.v0.10.1.yaml

      odf-operator.v4.18.14-rhodf.yaml
      mcg-operator.v4.18.14-rhodf.yaml
      odf-csi-addons-operator.v4.18.14-rhodf.yaml
      cephcsi-operator.v4.18.14-rhodf.yaml
      recipe.v4.18.14-rhodf.yaml
      ocs-operator.v4.18.14-rhodf.yaml
      ocs-client-operator.v4.18.14-rhodf.yaml
      odf-prometheus-operator.v4.18.14-rhodf.yaml
      rook-ceph-operator.v4.18.14-rhodf.yaml
      odf-dependencies.v4.18.14-rhodf.yaml

       

      How reproducible:

      Run stress-ng with the values below for 5 minutes on all nodes:

      io-block-size: 4k
      io-write-bytes: 2g
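
For a sense of scale, the two values above can be turned into a write count. This is only a sketch: it assumes the "k" and "g" suffixes are binary (KiB, GiB) and that each stress worker writes io-write-bytes in chunks of io-block-size per pass. Neither assumption is stated in this report, and the helper name to_bytes is made up for illustration.

```python
# Assumption: "k", "m" and "g" are binary suffixes (KiB, MiB, GiB).
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def to_bytes(size: str) -> int:
    """Convert a size string such as '4k' or '2g' into bytes."""
    size = size.strip().lower()
    if size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


block_size = to_bytes("4k")    # io-block-size from the report
write_bytes = to_bytes("2g")   # io-write-bytes from the report

# Number of individual block-sized writes needed to write the full amount once.
writes_per_pass = write_bytes // block_size
print(writes_per_pass)
```

Under these assumptions a single pass is a little over half a million small writes per worker, repeated for the whole 5 minute window.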

      Steps to Reproduce:

      1. Run the io-hog test using krkn (https://github.com/krkn-chaos/krkn).
      Use the config below for the io-hog scenario:
      duration: 300
      workers: '' # leave empty ('') for node CPU auto-detection
      hog-type: io
      image: quay.io/krkn-chaos/krkn-hog
      namespace: default
      io-block-size: 4k
      io-write-bytes: 2g
      io-target-pod-folder: /hog-data
      # node-name: "worker-0" # Uncomment to target a specific node by name
      io-target-pod-volume:
        name: node-volume
        hostPath:
          path: /root # a path writable by kubelet in the root filesystem of the node
      node-selector: "node-role.kubernetes.io/worker="
      number-of-nodes: ''
      taints: [] # example: ["node-role.kubernetes.io/master:NoSchedule"]
      

      Actual results:

      openshift-kni-infra   16m   Warning   Unhealthy   pod/keepalived-e10-h27-000-r660          Liveness probe failed: command timed out
      virt-clone-clones     14m   Warning   Stopped     virtualmachineinstance/clone-vm-0-108    The VirtualMachineInstance crashed.

      Expected results:

      VirtualMachineInstances should be able to withstand high IO usage for a short duration.

      Additional info:

      Below is iostat output from one of the nodes. IO operations were done on sda (sda4), the root disk of the node.
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 3.50    0.00   86.68    0.08    0.00    9.74
      Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
      nvme0c0n1        0.00      0.00     0.00   0.00    0.00     0.00   18.50    146.00    17.50  48.61    0.03     7.89    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.80
      nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00   18.50    146.00     0.00   0.00    0.00     7.89    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.80
      rbd0             0.00      0.00     0.00   0.00    0.00     0.00    3.00    523.00     0.00   0.00 1209.83   174.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.63  52.75
      rbd1             0.00      0.00     0.00   0.00    0.00     0.00    3.00     12.00     0.00   0.00  679.50     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    2.04  83.70
      rbd10            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd11            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd12            0.00      0.00     0.00   0.00    0.00     0.00    1.50    522.00     0.00   0.00 2699.33   348.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    4.05  40.00
      rbd13            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd15            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd16            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd17            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd18            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd19            0.00      0.00     0.00   0.00    0.00     0.00    2.00     42.00     0.00   0.00  250.25    21.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.50  67.90
      rbd2             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd20            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd21            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd22           15.00    524.00     0.00   0.00   39.03    34.93    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.59  63.75
      rbd23            0.00      0.00     0.00   0.00    0.00     0.00    1.00     26.00     0.00   0.00 1369.50    26.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.37 100.00
      rbd3             0.00      0.00     0.00   0.00    0.00     0.00    0.50    512.00     0.00   0.00 1848.00  1024.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.92 100.00
      rbd4             0.00      0.00     0.00   0.00    0.00     0.00    0.50     64.00     0.00   0.00 3047.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.52  26.15
      rbd5             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd6             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd7             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      rbd8            42.50   1564.00     0.00   0.00    9.94    36.80    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.42  36.05
      rbd9             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
      sda              0.00      0.00     0.00   0.00    0.00     0.00 1948.50  34840.00    72.00   3.56    0.07    17.88    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.13  12.40
      sdb              1.50      6.00     0.00   0.00    0.00     4.00  482.50   3730.00    12.00   2.43    0.54     7.73    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.26  55.50
      sdc              0.50      2.00     0.00   0.00    0.00     4.00  603.00   4756.00    11.00   1.79    0.56     7.89    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.34  65.10
      
      Although the stress was applied to sda, its utilization is quite low (12.40%) compared to the rbd devices, which show high utilization and high write wait times (for example, rbd3 at 100% utilization with a w_await of 1848 ms). The disks used for Ceph are sdb and sdc.
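
The comparison between sda and the rbd devices can be pulled out of the iostat rows mechanically. The sketch below is illustrative only: the column names and the four sample rows are copied from the output in this report, while the function names and the 100 ms w_await threshold are arbitrary choices made for the example, not values from krkn, Ceph, or KubeVirt.

```python
# Column names copied from the iostat header in this report.
HEADER = (
    "Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm "
    "w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await "
    "aqu-sz %util"
).split()

# Four rows copied from the iostat output (whitespace collapsed).
SAMPLE = """\
rbd3 0.00 0.00 0.00 0.00 0.00 0.00 0.50 512.00 0.00 0.00 1848.00 1024.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.92 100.00
rbd12 0.00 0.00 0.00 0.00 0.00 0.00 1.50 522.00 0.00 0.00 2699.33 348.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.05 40.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 1948.50 34840.00 72.00 3.56 0.07 17.88 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 12.40
sdb 1.50 6.00 0.00 0.00 0.00 4.00 482.50 3730.00 12.00 2.43 0.54 7.73 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.26 55.50
"""


def parse_rows(text):
    """Turn iostat device rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(HEADER):
            continue  # skip blank lines or anything that is not a device row
        row = {"Device": fields[0]}
        row.update({name: float(v) for name, v in zip(HEADER[1:], fields[1:])})
        rows.append(row)
    return rows


def slow_writers(rows, w_await_ms=100.0):
    """Devices whose average write wait exceeds the (hypothetical) threshold."""
    return [r["Device"] for r in rows if r["w_await"] > w_await_ms]


rows = parse_rows(SAMPLE)
print(slow_writers(rows))
for r in rows:
    print(r["Device"], r["w/s"], r["w_await"], r["%util"])
```

On these rows, sda carries by far the most writes per second yet has a sub-millisecond w_await, while the rbd devices issue very few writes and still wait seconds for each one.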
      
      The Ceph health check (ceph -s) did not report any "slow ops" while stress-ng ran for 5 minutes.

              alitke@redhat.com Adam Litke
              ysubrama@redhat.com Yogananth Subramanian
              Natalie Gavrielov Natalie Gavrielov