OpenShift Virtualization / CNV-21035

[2124406] Memory dump hp-volume pod sometimes stays in Pending


    • Priority: Medium

      Description of problem:
      We noticed that sometimes after a VM restart the VM gets scheduled to another node. When we then trigger the memory dump again, the hp-volume pod stays in Pending because the memory-dump PVC has a volume node affinity conflict with the new node, and a new memory dump cannot be started because the previous one has not finished.
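      A minimal way to confirm the node mismatch (assuming the claim name memoryvolume and namespace default from the steps below; <pv-name> is a placeholder for the volume name returned by the PVC query):
      $ oc get vmi vm-fedora-datavolume -n default -o jsonpath='{.status.nodeName}{"\n"}'
      $ oc get pvc memoryvolume -n default -o jsonpath='{.spec.volumeName}{"\n"}'
      $ oc get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}{"\n"}'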

      Version-Release number of selected component (if applicable):
      CNV-v4.12.0-450

      How reproducible:
      Sometimes

      Steps to Reproduce:
      1. Create a VM
      2. Do a memory dump: $ virtctl memory-dump get vm-fedora-datavolume --claim-name=memoryvolume --create-claim
      3. Restart the VM - sometimes the VM is scheduled to another node (see the check after this list)
      4. Do a memory dump again: $ virtctl memory-dump get vm-fedora-datavolume
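      One way to see whether the VM actually moved is to compare the NODE column before and after the restart (the wide output of the VMI is assumed to include that column):
      $ oc get vmi vm-fedora-datavolume -n default -o wide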

      Actual results:
      $ oc get pod -n default
      NAME                                       READY   STATUS    RESTARTS   AGE
      hp-volume-w4nz8                            0/1     Pending   0          19h
      virt-launcher-vm-fedora-datavolume-qjzxj   1/1     Running   0          19h

      Events:
      Type     Reason            Age                    From               Message
      ----     ------            ----                   ----               -------
      Warning  FailedScheduling  118m (x2088 over 19h)  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
      Warning  FailedScheduling  14m                    default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
      Warning  FailedScheduling  14m                    default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
      Warning  FailedScheduling  9m52s                  default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 1 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
      Warning  FailedScheduling  8m28s                  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector, 5 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
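      The memory dump state recorded on the VM can also be checked; the memoryDumpRequest status field is an assumption based on the KubeVirt API and may differ by version:
      $ oc get vm vm-fedora-datavolume -n default -o jsonpath='{.status.memoryDumpRequest}{"\n"}'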

      Expected results:
      It would be helpful to have a clearer warning (for example, an event or condition on the VM) explaining why the memory dump cannot proceed in this situation, e.g. that the memory-dump PVC has a volume node affinity conflict with the node the VM is now running on.

      Additional info:
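      A possible workaround sketch, assuming the memory-dump remove subcommand is available in this virtctl build: remove the stuck dump association, delete the old claim, and request a fresh dump so the new claim can bind on the VM's current node.
      $ virtctl memory-dump remove vm-fedora-datavolume
      $ oc delete pvc memoryvolume -n default
      $ virtctl memory-dump get vm-fedora-datavolume --claim-name=memoryvolume --create-claim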

              Assignee: Shelly Kagan (skagan@redhat.com)
              Reporter: Yan Du (yadu1@redhat.com)