RHEL-93579

Inconsistent clock jumps observed when deleting and re-creating test workloads on SNO with BlueField-3 NIC


    • None
    • Important
    • rhel-net-drivers
    • ssg_networking
    • None
    • Aug 18: Failed to reproduce on RHEL when running the VDU simulation manually. New direction: add a RHEL machine as a worker node to an MNO cluster and run the original test. Platform availability: end of August.

      Aug 6: Waiting for mstconfig output. A platform is assigned for creating a reproducer and will be available after August 21.
    • False
    • False
    • None
    • Yes
    • None
    • None
    • None
    • Known Issue
    • Done
    • Unspecified
    • Unspecified
    • Unspecified
    • None

      Description of problem:

      Clock jumps observed when deleting and re-creating test workloads on SNO with BlueField-3 NIC 

      Version-Release number of selected component (if applicable):

      OpenShift: 4.19.0-rc.3

      Kernel: 5.14.0-570.16.1.el9_6.aarch64+64k
      
      NAME="Red Hat Enterprise Linux CoreOS"
      VERSION="9.6.20250514-0 (Plow)"
      ID="rhel"
      ID_LIKE="fedora"
      VERSION_ID="9.6"
      PLATFORM_ID="platform:el9"
      PRETTY_NAME="Red Hat Enterprise Linux CoreOS 9.6.20250514-0 (Plow)"
      ANSI_COLOR="0;31"
      LOGO="fedora-logo-icon"
      CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
      HOME_URL="https://www.redhat.com/"
      DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
      BUG_REPORT_URL="https://issues.redhat.com/"
      REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
      REDHAT_BUGZILLA_PRODUCT_VERSION=9.6
      REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
      REDHAT_SUPPORT_PRODUCT_VERSION="9.6"
      OSTREE_VERSION='9.6.20250514-0'
      VARIANT=CoreOS
      VARIANT_ID=coreos
      OPENSHIFT_VERSION="4.19"

      How reproducible:

      Intermittent. The clock jumps cannot be correlated with a specific action, but they can be reproduced by deleting and re-creating the test workloads several times.

      Steps to Reproduce:

          1. Delete and re-create the test workloads in https://gitlab.cee.redhat.com/ocp-edge-qe/vdu-workload-emulator for several iterations
          2. Watch the linuxptp-daemon-container logs
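
      The two steps above could be automated roughly as follows. This is a minimal sketch, not the procedure used by the reporter: the manifest name `vdu-workload.yaml`, the `openshift-ptp` namespace, the `ds/linuxptp-daemon` target, the iteration count, and the settle time are all assumptions to adjust for the actual cluster. It assumes `oc` is logged in to the SNO.

```python
import subprocess
import time

# Hypothetical values: adjust to the real workload manifest and cluster layout.
MANIFEST = "vdu-workload.yaml"
PTP_NAMESPACE = "openshift-ptp"
PTP_DAEMONSET = "ds/linuxptp-daemon"
PTP_CONTAINER = "linuxptp-daemon-container"


def build_cycle_commands(manifest, iterations):
    """Return the oc invocations for `iterations` delete/re-create cycles."""
    commands = []
    for _ in range(iterations):
        commands.append(
            ["oc", "delete", "-f", manifest, "--wait=true", "--ignore-not-found"]
        )
        commands.append(["oc", "apply", "-f", manifest])
    return commands


def build_log_command(since="1h"):
    """Return the oc invocation that dumps recent linuxptp container logs."""
    return [
        "oc", "logs", "-n", PTP_NAMESPACE, PTP_DAEMONSET,
        "-c", PTP_CONTAINER, f"--since={since}",
    ]


def count_clock_jumps(log_text):
    """Count ptp4l clockcheck jump messages in a block of log text."""
    return sum(
        "clockcheck: clock jumped" in line for line in log_text.splitlines()
    )


def run_reproducer(manifest=MANIFEST, iterations=5, settle_seconds=60):
    """Cycle the workloads, then report how many clock jumps were logged."""
    for command in build_cycle_commands(manifest, iterations):
        subprocess.run(command, check=True)
        time.sleep(settle_seconds)  # let the workload start or terminate
    result = subprocess.run(
        build_log_command(), check=True, capture_output=True, text=True
    )
    return count_clock_jumps(result.stdout)
```

      Calling `run_reproducer()` returns the number of clockcheck messages found in the last hour of logs. Because the issue is intermittent, a zero count after a few iterations does not rule it out.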
      
          

      Actual results:

      2025-05-26T09:23:52.781656152+00:00 stdout F ptp4l[305718.539]: [ptp4l.0.config:4] clockcheck: clock jumped forward or running faster than expected!
      2025-05-26T09:31:23.749294172+00:00 stdout F ptp4l[306169.506]: [ptp4l.0.config:4] clockcheck: clock jumped backward or running slower than expected!
      2025-05-26T09:31:24.782181116+00:00 stdout F ptp4l[306170.539]: [ptp4l.0.config:4] clockcheck: clock jumped forward or running faster than expected!
      2025-05-26T09:33:16.094255680+00:00 stdout F ptp4l[306281.851]: [ptp4l.0.config:4] clockcheck: clock jumped forward or running faster than expected!
      2025-05-26T09:39:18.103485020+00:00 stdout F ptp4l[306643.860]: [ptp4l.0.config:4] clockcheck: clock jumped backward or running slower than expected!
      2025-05-26T09:57:16.652874146+00:00 stdout F ptp4l[307722.394]: [ptp4l.0.config:4] clockcheck: clock jumped backward or running slower than expected!
      2025-05-26T09:57:17.665695589+00:00 stdout F ptp4l[307723.421]: [ptp4l.0.config:4] clockcheck: clock jumped forward or running faster than expected!
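
      For triage, log lines like the ones above can be tallied by direction, and the wall-clock timestamp added by the container runtime can be compared with the ptp4l timestamp in brackets. The sketch below assumes the bracketed value is a monotonic-clock reading in seconds and that the log timestamps are in UTC (`+00:00`); both are assumptions about the log format, not something stated in this report. A near-zero difference between the two deltas would only suggest that the system clock did not step between events; it says nothing about the PHC itself.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches the container log prefix, the ptp4l timestamp, and the jump direction.
LINE_RE = re.compile(
    r"(?P<wall>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(?P<nanos>\d{9})\+00:00 "
    r"\S+ \S+ ptp4l\[(?P<mono>\d+\.\d+)\]: "
    r".*clockcheck: clock jumped (?P<direction>forward|backward)"
)


def parse_clockcheck(line):
    """Return (wall_ns, mono_seconds, direction) or None if the line differs."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    wall = datetime.strptime(match["wall"], "%Y-%m-%dT%H:%M:%S")
    wall = wall.replace(tzinfo=timezone.utc)
    wall_ns = int(wall.timestamp()) * 10**9 + int(match["nanos"])
    return wall_ns, float(match["mono"]), match["direction"]


def summarize(lines):
    """Count jumps per direction and compare wall vs. ptp4l time deltas."""
    events = [e for e in map(parse_clockcheck, lines) if e is not None]
    counts = Counter(direction for _, _, direction in events)
    drifts = []
    for (wall0, mono0, _), (wall1, mono1, _) in zip(events, events[1:]):
        # Seconds by which the wall-clock delta exceeds the ptp4l delta.
        drifts.append((wall1 - wall0) / 1e9 - (mono1 - mono0))
    return counts, drifts


# Two lines copied from this report, used as sample input.
SAMPLE = [
    "2025-05-26T09:23:52.781656152+00:00 stdout F ptp4l[305718.539]: "
    "[ptp4l.0.config:4] clockcheck: clock jumped forward or running faster than expected!",
    "2025-05-26T09:31:23.749294172+00:00 stdout F ptp4l[306169.506]: "
    "[ptp4l.0.config:4] clockcheck: clock jumped backward or running slower than expected!",
]

if __name__ == "__main__":
    jump_counts, deltas = summarize(SAMPLE)
    print(dict(jump_counts), deltas)
```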
      

      Expected results:

         No clock jumps observed 

      Additional info:

      ethtool -i enp1s0f0np0
      driver: mlx5_core
      version: 5.14.0-570.16.1.el9_6.aarch64+6
      firmware-version: 32.45.1020 (MT_0000000884)
      expansion-rom-version: 
      bus-info: 0000:01:00.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: no
      supports-priv-flags: yes
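
      When comparing driver and firmware levels across hosts, the `ethtool -i` output above can be turned into a dictionary. This is a small illustrative helper, not part of any existing tooling; the function name is hypothetical.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i` style "key: value" lines into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI addresses
        # ("0000:01:00.0") stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Sample input copied from this report.
SAMPLE_OUTPUT = """\
driver: mlx5_core
firmware-version: 32.45.1020 (MT_0000000884)
expansion-rom-version:
bus-info: 0000:01:00.0
"""

if __name__ == "__main__":
    print(parse_ethtool_info(SAMPLE_OUTPUT))
```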
      
      We also tried a newer kernel driver (6.15.0-0.rc7.58.eln148.aarch64), but the issue still reproduced.
      
      The must-gather and sosreport are available at https://drive.google.com/drive/folders/1uKgFWLxKYkxbtix9ji6ozfx_MLwARKl8?usp=drive_link

      The SNO configuration is available at https://gitlab.cee.redhat.com/ocp-edge-qe/ztp-site-configs/-/tree/cnfdg46-4.19?ref_type=heads

       

              rh-ee-bpoirier Benjamin Poirier
              mcornea@redhat.com Marius Cornea
              NVIDIA (Mellanox) Confidential Group
              Benjamin Poirier
              Yalin Li
              Katie Drake
              Votes: 0
              Watchers: 16
