  1. OpenShift Virtualization
  2. CNV-22522

[2143018] VM on heterogeneous AMD cluster may experience significant timejumps and soft locks


    • Urgent
    • None

      Description of problem:

      Reference: Bug 2125671

      On a heterogeneous AMD cluster where each node has its own tsc-frequency, the lowest frequency (1800000000) is set as a schedulable label on all nodes, as expected:
      > $ for i in $(oc get node -o name);do echo $i;oc describe $i | grep tsc-freq; done
      > node/cnv-qe-infra-25.cnvqe2.lab.eng.rdu2.redhat.com
      > cpu-timer.node.kubevirt.io/tsc-frequency=1800000000
      > scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
      > node/cnv-qe-infra-26.cnvqe2.lab.eng.rdu2.redhat.com
      > cpu-timer.node.kubevirt.io/tsc-frequency=2500000000
      > scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
      > scheduling.node.kubevirt.io/tsc-frequency-2500000000=true
      > node/cnv-qe-infra-27.cnvqe2.lab.eng.rdu2.redhat.com
      > cpu-timer.node.kubevirt.io/tsc-frequency=3000000000
      > scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
      > scheduling.node.kubevirt.io/tsc-frequency-3000000000=true
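
To confirm which frequency should end up shared by all nodes, the label output above can be reduced to its lowest `cpu-timer` value. This is only a minimal sketch: `lowest_tsc_freq` is a hypothetical helper name, and the sample input is copied from the output in this report.

```shell
# Hypothetical helper: print the smallest cpu-timer tsc-frequency value
# found in label lines read from stdin.
lowest_tsc_freq() {
  sed -n 's/.*cpu-timer\.node\.kubevirt\.io\/tsc-frequency=\([0-9][0-9]*\).*/\1/p' \
    | sort -n \
    | head -n 1
}

# Sample label lines copied from the node output in this report.
sample_labels='cpu-timer.node.kubevirt.io/tsc-frequency=1800000000
scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
cpu-timer.node.kubevirt.io/tsc-frequency=2500000000
scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
scheduling.node.kubevirt.io/tsc-frequency-2500000000=true
cpu-timer.node.kubevirt.io/tsc-frequency=3000000000
scheduling.node.kubevirt.io/tsc-frequency-1800000000=true
scheduling.node.kubevirt.io/tsc-frequency-3000000000=true'

printf '%s\n' "$sample_labels" | lowest_tsc_freq

# Against a live cluster the same filter could be fed from:
#   oc describe nodes | lowest_tsc_freq
```

The result can then be compared with the `scheduling.node.kubevirt.io/tsc-frequency-<value>=true` label that every node is expected to carry.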

      The VM requests this frequency:
      > bash-4.4$ virsh dumpxml 1 | grep tsc
      > <timer name='tsc' frequency='1800000000'/>

      However, the VM may log time jumps right after it starts or after a migration:

      > Nov 15 13:22:28 rhel-tsc-10 systemd[4839]: Startup finished in 27ms.
      > Nov 15 13:22:28 rhel-tsc-10 systemd[1]: Started User Manager for UID 1000.
      > Nov 15 13:22:28 rhel-tsc-10 systemd[1]: Started Session 2 of user fedora.
      > Nov 15 16:20:18 rhel-tsc-10 kernel: clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
      > Nov 15 16:20:18 rhel-tsc-10 kernel: clocksource: 'kvm-clock' wd_now: a007fc3247f wd_last: 5368ba9ca5 mask: ffffffffffffffff
      > Nov 15 16:20:18 rhel-tsc-10 kernel: clocksource: 'tsc' cs_now: 1200f50582d2 cs_last: 96329863ce mask: ffffffffffffffff
      > Nov 15 16:20:18 rhel-tsc-10 kernel: tsc: Marking TSC unstable due to clocksource watchdog
      > Nov 15 16:20:18 rhel-tsc-10 systemd[1]: Starting dnf makecache...
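
The size of the jump can be estimated from the log itself. The sketch below computes the wall-clock gap between the last normal entry and the first watchdog entry, and separately the difference between `wd_now` and `wd_last`. The helper names are hypothetical, and reading the watchdog values as nanoseconds is an assumption (that kvm-clock counts in nanoseconds), not something stated in this report.

```shell
# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
# awk is used so that fields such as "08" are not parsed as octal.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: gap in seconds between two same-day timestamps.
gap_seconds() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Timestamps taken from the log excerpt above.
gap_seconds 13:22:28 16:20:18

# Watchdog readings from the 'kvm-clock' line above.
# Assumption: these values are in nanoseconds.
wd_delta_ns=$(( 0xa007fc3247f - 0x5368ba9ca5 ))
echo "watchdog delta: $(( wd_delta_ns / 1000000000 )) s"
```

If the nanosecond assumption holds, both figures should land close to each other, which would suggest the jump is visible to the watchdog clocksource and is not just a logging artifact.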

      The guest then switches its clocksource from tsc to kvm-clock:

      > # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
      > kvm-clock
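
When reproducing this across several guests, the same check can be scripted. This is a minimal sketch: `check_clocksource` is a hypothetical helper name, and it accepts an optional file path so it can be tried against a sample file instead of the real sysfs entry.

```shell
# Hypothetical helper: report whether the guest still uses tsc.
# With no argument it reads the sysfs path used in this report.
check_clocksource() {
  cs_file="${1:-/sys/devices/system/clocksource/clocksource0/current_clocksource}"
  current="$(cat "$cs_file")" || return 2
  if [ "$current" = "tsc" ]; then
    echo "ok: clocksource is tsc"
  else
    echo "warning: clocksource is $current, not tsc"
    return 1
  fi
}

# Inside the guest:
#   check_clocksource
```

A non-zero exit status makes it easy to flag affected guests from a loop or a test job.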

      Version-Release number of selected component (if applicable):
      4.11

              iholder@redhat.com Itamar Holder
              dshchedr@redhat.com Denys Shchedrivyi
              Denys Shchedrivyi Denys Shchedrivyi