RHEL / RHEL-8666

cpuunclaimed script from bcc-tools fails with ERROR "CPU samples arrived at skewed offsets"

    • Type: Bug
    • Resolution: Can't Do
    • Priority: Undefined
    • rhel-9.1.0
    • bcc
    • rhel-sst-kernel-tps
    • ssg_core_kernel

      Description of problem:

      The cpuunclaimed script from bcc-tools exits with the error "CPU samples arrived at skewed offsets".

      Version-Release number of selected component (if applicable):

      # cat /etc/redhat-release
        Red Hat Enterprise Linux release 9.1 (Plow)
      # uname -a
        Linux hp-bl460cg9-1.gsslab.pnq2.redhat.com 5.14.0-162.6.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 30 07:36:03 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
      # dmidecode | grep -iA 7 "system information"
        System Information
        Manufacturer: HP
        Product Name: ProLiant BL460c Gen9
        Version: Not Specified
        Serial Number: SGH512VBBE
        UUID: 30373237-3132-4753-4835-313256424245
        Wake-up Type: Power Switch
        SKU Number: 727021-B21
      # rpm -qa | grep bcc
        bcc-tools-0.24.0-4.el9.x86_64
        bcc-0.24.0-4.el9.x86_64
        python3-bcc-0.24.0-4.el9.noarch

      How reproducible:

      Always

      Steps to Reproduce:

      Go to /usr/share/bcc/tools and run:

      # ./cpuunclaimed

      OR

      # /usr/share/bcc/tools/cpuunclaimed

      # ./cpuunclaimed
        Sampling run queues... Output every 1 seconds. Hit Ctrl-C to end.
        ERROR: CPU samples arrived at skewed offsets (CPUs may have powered down when idle), spanning 7161382 ns (expected < 4040404 ns). Debug with -J, and see the man page. As output may begin to be unreliable, exiting.
      # ./cpuunclaimed 5 10
        Sampling run queues... Output every 5 seconds. Hit Ctrl-C to end.
        ERROR: CPU samples arrived at skewed offsets (CPUs may have powered down when idle), spanning 6643347 ns (expected < 4040404 ns). Debug with -J, and see the man page. As output may begin to be unreliable, exiting.

      When the script is run by its full path, it prints one line of output before exiting with the same error:

      # /usr/share/bcc/tools/cpuunclaimed
        Sampling run queues... Output every 1 seconds. Hit Ctrl-C to end.
        %CPU 0.00%, unclaimed idle 0.00% <<---
        ERROR: CPU samples arrived at skewed offsets (CPUs may have powered down when idle), spanning 5163630 ns (expected < 4040404 ns). Debug with -J, and see the man page. As output may begin to be unreliable, exiting.
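The limit printed in these messages can be cross-checked with a little arithmetic. The sketch below assumes a 99 Hz sampling rate and a tolerance of 40% of one sample period. Both values are inferred from the printed numbers (0.4 x 1e9 / 99 truncates to 4040404), and neither is read from the script source. Under that assumption, every span reported in the three failing runs above is beyond the limit:

```python
# Cross-check the skew limit printed by cpuunclaimed.
# ASSUMPTION: 99 Hz sampling and a 40%-of-period tolerance, inferred from
# the printed limit of 4040404 ns, not taken from the script itself.
NSEC_PER_SEC = 1_000_000_000
SAMPLE_HZ = 99          # assumed sampling frequency
SKEW_FRACTION = 0.4     # assumed fraction of one period that is tolerated

period_ns = NSEC_PER_SEC / SAMPLE_HZ          # about 10101010.1 ns
limit_ns = int(period_ns * SKEW_FRACTION)     # truncated to whole ns

# Spans copied from the three failing runs in this report.
reported_spans_ns = [7161382, 6643347, 5163630]

for span in reported_spans_ns:
    print(f"span {span} ns = {span / period_ns:.0%} of a period, "
          f"exceeds limit {limit_ns} ns: {span > limit_ns}")
```

Each span works out to roughly half to three quarters of a sample period, not just slightly over the limit.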

      I also ran a stress-ng test in parallel in another terminal, and the script still throws the error.

      Actual results:

      The script exits with the error:

      ERROR: CPU samples arrived at skewed offsets (CPUs may have powered down when idle)
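The wording of the error ("spanning X ns (expected < Y ns)") suggests a per-round consistency check. The samples from all CPUs for one sampling round should arrive close together, and the run is abandoned when the earliest and latest samples are too far apart. Below is a minimal sketch of such a check. It assumes one timestamp per CPU per round. The helper name `check_round_skew` and its data shape are hypothetical, and this is not the tool's actual code:

```python
from typing import Sequence


def check_round_skew(timestamps_ns: Sequence[int], limit_ns: int) -> int:
    """Return the spread of one sampling round, raising if it is too wide.

    Hypothetical helper: timestamps_ns holds one sample time per CPU for a
    single round; limit_ns is the smallest spread that is rejected.
    """
    if not timestamps_ns:
        raise ValueError("no samples in round")
    span = max(timestamps_ns) - min(timestamps_ns)
    if span >= limit_ns:
        raise RuntimeError(
            f"CPU samples arrived at skewed offsets, spanning {span} ns "
            f"(expected < {limit_ns} ns)")
    return span


# A well-aligned round: four CPUs within a few microseconds of each other.
print(check_round_skew([1_000_000, 1_002_000, 1_001_500, 1_003_000], 4_040_404))

# A round where one CPU fired about 7 ms late, similar to the first failing run.
try:
    check_round_skew([1_000_000, 1_002_000, 8_161_382, 1_003_000], 4_040_404)
except RuntimeError as err:
    print(err)
```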

      Expected results:

      The script should not exit with "ERROR: CPU samples arrived at skewed offsets".

      Additional info:

      There appears to be a related fix in upstream bcc, commit 77f4f663ad567e1ecf4528d25f00af548ac746b9:

      $ git show 77f4f663
      commit 77f4f663ad567e1ecf4528d25f00af548ac746b9
      Author: yonghong-song <ys114321@gmail.com>
      Date: Thu Jan 24 12:48:25 2019 -0800

      fix cpuunclaimed.py with cfs_rq structure change (#2164)

      Similar to runqlen.py, make proper adjustment for
      cfs_rq_partial structure so it can align with
      what the kernel expects.

      Signed-off-by: Yonghong Song <yhs@fb.com>

      diff --git a/tools/cpuunclaimed.py b/tools/cpuunclaimed.py
      index b862bad2..75ee9324 100755
      --- a/tools/cpuunclaimed.py
      +++ b/tools/cpuunclaimed.py
      @@ -62,8 +62,9 @@ from time import sleep, strftime
      from ctypes import c_int
      import argparse
      import multiprocessing
      -from os import getpid, system
      +from os import getpid, system, open, close, dup, unlink, O_WRONLY
      import ctypes as ct
      +from tempfile import NamedTemporaryFile

      # arguments
      examples = """examples:
      @@ -98,6 +99,66 @@ wakeup_s = float(1) / wakeup_hz
      ncpu = multiprocessing.cpu_count() # assume all are online
      debug = 0

      +# Linux 4.15 introduced a new field runnable_weight
      +# in linux_src:kernel/sched/sched.h as
      +#     struct cfs_rq {
      +#         struct load_weight load;
      +#         unsigned long runnable_weight;
      +#         unsigned int nr_running, h_nr_running;
      +#         ......
      +#     }
      +# and this tool requires to access nr_running to get
      +# runqueue len information.
      +#
      +# The commit which introduces cfs_rq->runnable_weight
      +# field also introduces the field sched_entity->runnable_weight
      +# where sched_entity is defined in linux_src:include/linux/sched.h.
      +#
      +# To cope with pre-4.15 and 4.15/post-4.15 releases,
      +# we run a simple BPF program to detect whether
      +# field sched_entity->runnable_weight exists. The existence of
      +# this field should infer the existence of cfs_rq->runnable_weight.
      +#
      +# This will need maintenance as the relationship between these
      +# two fields may change in the future.
      +#
      .
      .
      .
      <..>
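The commit above rests on two ideas. First, the offset of nr_running inside struct cfs_rq moves when runnable_weight is present. Second, the tool detects which layout applies by trying to compile a small BPF program.

Both ideas can be sketched in plain Python:

* The ctypes structures mirror only the leading fields quoted in the commit comment. They assume x86_64 sizes, with kernel `unsigned long` as 8 bytes.
* `probe_compiles()` is a hypothetical helper. It hides C-level stderr with `os.dup`/`os.dup2` while a compile attempt runs, in the spirit of the `dup`/`close`/`open` imports the diff adds. It is not a copy of the upstream function.

```python
import ctypes
import os
from typing import Callable


# ASSUMPTION: x86_64 sizes (kernel "unsigned long" = 8 bytes, "u32" = 4).
class LoadWeight(ctypes.Structure):
    _fields_ = [("weight", ctypes.c_uint64),
                ("inv_weight", ctypes.c_uint32)]


class CfsRqPre415(ctypes.Structure):
    """Leading fields of struct cfs_rq without runnable_weight."""
    _fields_ = [("load", LoadWeight),
                ("nr_running", ctypes.c_uint32),
                ("h_nr_running", ctypes.c_uint32)]


class CfsRq415(ctypes.Structure):
    """Leading fields of struct cfs_rq as quoted in the commit comment."""
    _fields_ = [("load", LoadWeight),
                ("runnable_weight", ctypes.c_uint64),
                ("nr_running", ctypes.c_uint32),
                ("h_nr_running", ctypes.c_uint32)]


# A partial struct with the wrong layout reads nr_running 8 bytes off.
print("nr_running offset without runnable_weight:", CfsRqPre415.nr_running.offset)  # 16
print("nr_running offset with runnable_weight:   ", CfsRq415.nr_running.offset)     # 24


def probe_compiles(compile_fn: Callable[[], object]) -> bool:
    """Hypothetical helper: return True if compile_fn() runs without raising.

    File descriptor 2 is pointed at /dev/null during the attempt so that
    diagnostics written by C code (such as a compiler) are hidden, and the
    original stderr is restored afterwards in every case.
    """
    saved_stderr = os.dup(2)
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, 2)
        compile_fn()
        return True
    except Exception:
        return False
    finally:
        os.dup2(saved_stderr, 2)
        os.close(saved_stderr)
        os.close(devnull)
```

With bcc installed, one could pass a compile attempt such as `lambda: BPF(text=check_text)`. Here `check_text` is a hypothetical C snippet that dereferences `sched_entity->runnable_weight`, and the resulting boolean would select which partial struct layout to use. That wiring is an illustration, not code taken from this report.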

              Jerome Marchand (jmarchan@redhat.com)
              Pinak Ray (rhn-support-pray)
              Ziqian (Zamir) SUN
              Votes: 0
              Watchers: 3
