RHEL-9279

TCP timewait timer causes interference

Details

    • Major
    • sst_kernel_rts
    • ssg_core_kernel
    • 5
    • 10/30: Yellow. Possible fix identified upstream, still too early to forecast delivery.

      10/02: Long term resolution still under investigation
    • False
    • None
    • CK-May-2024

    Description

      Description of problem:

      I am seeing OSLAT spikes above 20 usec. This is easily reproducible.
      The spikes are caused by a timer (tw_timer_handler) firing on isolated cores:

      oslat-79159 [060] d...2.. 2480.504049: sched_switch: prev_comm=oslat prev_pid=79159 prev_prio=98 prev_state=R ==> next_comm=ktimers/60 next_pid=573 next_prio=88
      <...>-573 [060] ...s.12 2480.504050: softirq_entry: vec=1 [action=TIMER]
      <...>-573 [060] d..s113 2480.504051: timer_cancel: timer=00000000276551ad
      <...>-573 [060] ...s.13 2480.504051: timer_expire_entry: timer=00000000276551ad function=tw_timer_handler now=4297146368 baseclk=4297146368
      <...>-573 [060] ...s.13 2480.504054: timer_expire_exit: timer=00000000276551ad
      <...>-573 [060] ...s.12 2480.504054: softirq_exit: vec=1 [action=TIMER]
      <...>-573 [060] ...s.12 2480.504054: softirq_entry: vec=7 [action=SCHED]
      <...>-573 [060] ...s.12 2480.504056: softirq_exit: vec=7 [action=SCHED]
      <...>-573 [060] d...2.. 2480.504057: sched_switch: prev_comm=ktimers/60 prev_pid=573 prev_prio=88 prev_state=S ==> next_comm=ksoftirqd/60 next_pid=574 next_prio=88
      <...>-574 [060] d...2.. 2480.504059: sched_switch: prev_comm=ksoftirqd/60 prev_pid=574 prev_prio=88 prev_state=S ==> next_comm=oslat next_pid=79159 next_prio=98
      oslat-79159 [060] ....... 2480.504075: tracing_mark_write: oslat: Trace threshold (20 us) triggered with 26 us!
      oslat-79159 [060] d...4.. 2480.504078: sched_wakeup: comm=oslat pid=79155 prio=98 target_cpu=010
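
To make excerpts like the one above easier to scan across a full trace, here is a minimal Python sketch that reports each window in which oslat was switched out on a CPU and which timer callbacks ran in that window. The function names (parse_line, preemptions) are hypothetical, and the regular expressions assume the exact line layout shown in this report; other ftrace output formats would need adjusting.

```python
import re
from decimal import Decimal

# Assumed layout, taken from the excerpt in this report:
#   task-pid [cpu] flags timestamp: event: details
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<details>.*)$"
)
SWITCH_RE = re.compile(r"prev_comm=(?P<prev>\S+).*==> next_comm=(?P<next>\S+)")
FUNC_RE = re.compile(r"function=(\S+)")


def parse_line(line):
    """Split one trace line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "cpu": int(m["cpu"]),
        "ts": Decimal(m["ts"]),  # Decimal avoids float rounding on microseconds
        "event": m["event"],
        "details": m["details"],
    }


def preemptions(lines, comm="oslat"):
    """Yield (cpu, switched_out_us, timer_functions) for each window in which
    `comm` was switched out and later switched back in on the same CPU."""
    windows = {}  # cpu -> (switch-out timestamp, timer callbacks seen so far)
    for ev in filter(None, map(parse_line, lines)):
        cpu = ev["cpu"]
        if ev["event"] == "sched_switch":
            sw = SWITCH_RE.search(ev["details"])
            if sw is None:
                continue
            if sw["prev"] == comm:
                windows[cpu] = (ev["ts"], [])
            elif sw["next"] == comm and cpu in windows:
                start, funcs = windows.pop(cpu)
                yield cpu, int((ev["ts"] - start) * 1_000_000), funcs
        elif ev["event"] == "timer_expire_entry" and cpu in windows:
            fn = FUNC_RE.search(ev["details"])
            if fn:
                windows[cpu][1].append(fn[1])
```

Passing the lines of the trace to preemptions() should list a window on CPU 60 that contains tw_timer_handler. The switched-out time it reports covers only the span between the two sched_switch events, so it need not match the latency that oslat itself prints.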

      BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-a9e8439864eb99b1974346c32fcd39c1b98563f7bc525ad6a13d4751aefc09fc/vmlinuz-5.14.0-284.23.1.rt14.308.el9_2.x86_64 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/a9e8439864eb99b1974346c32fcd39c1b98563f7bc525ad6a13d4751aefc09fc/0 root=UUID=cfbc2768-6425-4d00-9d48-21af39937a31 rw rootflags=prjquota boot=UUID=cc388c72-9224-4f92-979c-50e59f30384c systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=1 crashkernel=512M skew_tick=1 nohz=on rcu_nocbs=2-55,58-111 tuned.non_isolcpus=03000000,00000003 systemd.cpu_affinity=0,1,56,57 intel_iommu=on iommu=pt isolcpus=managed_irq,2-55,58-111 nohz_full=2-55,58-111 nosoftlockup nmi_watchdog=0 mce=off rcutree.kthread_prio=11 default_hugepagesz=1G hugepagesz=1G hugepages=32 rcupdate.rcu_normal_after_boot=0 efi=runtime module_blacklist=irdma intel_pstate=disable tsc=reliable
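
As a quick cross-check that CPU 60 from the trace really is in the isolated set, the following Python sketch expands the isolcpus= and nohz_full= lists from a kernel command line like the one above. The helper names (cmdline_params, parse_cpu_list) are hypothetical, and the parser only handles plain numbers and N-M ranges; any other token, such as the managed_irq flag or stride syntax, is skipped.

```python
def cmdline_params(cmdline):
    """Map each 'key=value' word of a kernel command line to its value.
    Words without '=' (e.g. 'nosoftlockup') map to None."""
    params = {}
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        params[key] = value if sep else None
    return params


def parse_cpu_list(spec):
    """Expand a simple CPU list such as '2-55,58-111' into a set of ints.
    Tokens that are not a number or an N-M range are ignored."""
    cpus = set()
    for token in spec.split(","):
        lo, sep, hi = token.partition("-")
        if sep and lo.isdigit() and hi.isdigit():
            cpus.update(range(int(lo), int(hi) + 1))
        elif token.isdigit():
            cpus.add(int(token))
    return cpus


def is_isolated(cmdline, cpu):
    """True if `cpu` appears in both the isolcpus= and nohz_full= lists."""
    params = cmdline_params(cmdline)
    isolcpus = parse_cpu_list(params.get("isolcpus") or "")
    nohz_full = parse_cpu_list(params.get("nohz_full") or "")
    return cpu in isolcpus and cpu in nohz_full
```

Calling is_isolated() with the command line above (or the contents of /proc/cmdline on the affected node) and CPU 60 shows whether the core that ran tw_timer_handler is one of the cores meant to be isolated.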

      Full trace attached

      Version-Release number of selected component (if applicable):
      OCP: 4.13.5
      kernel: 5.14.0-284.23.1.rt14.308.el9_2.x86_64

      How reproducible:
      Cannot get through a 1hr run without hitting this

      Steps to Reproduce:
      1. Run OSLAT on a SNO

      Actual results:

      Expected results:

      Additional info:

          People

            rh-ee-vschneid Valentin Schneider
            browsell@redhat.com Brent Rowsell
            Joshua Clark, Yang Liu
            Valentin Schneider
            Qiao Zhao
