OpenShift Bugs / OCPBUGS-59281

"chrony-wait.service" fails on all Nodes with "506 Cannot talk to daemon"


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version: 4.18.z
    • Component: RHCOS
    • Quality / Stability / Reliability
    • Severity: Important

      Description of problem:

      After upgrading to OpenShift Container Platform 4.18.15, the customer noticed that "chrony-wait.service" is in a failed state on all Nodes:

      $ sudo systemctl status chrony-wait.service 
      × chrony-wait.service - Wait for chrony to synchronize system clock
           Loaded: loaded (/usr/lib/systemd/system/chrony-wait.service; disabled; preset: disabled)
           Active: failed (Result: timeout) since Wed 2025-07-09 12:27:39 UTC; 42min ago
             Docs: man:chronyc(1)
         Main PID: 1434 (code=exited, status=1/FAILURE)
              CPU: 113ms
      
      Jul 09 12:24:39 xxx-01-worker-az1-dp6ck systemd[1]: Starting Wait for chrony to synchronize system clock...
      Jul 09 12:27:39 xxx-01-worker-az1-dp6ck systemd[1]: chrony-wait.service: start operation timed out. Terminating.
      Jul 09 12:27:39 xxx-01-worker-az1-dp6ck systemd[1]: chrony-wait.service: Main process exited, code=exited, status=1/FAILURE
      Jul 09 12:27:39 xxx-01-worker-az1-dp6ck systemd[1]: chrony-wait.service: Failed with result 'timeout'.
      Jul 09 12:27:39 xxx-01-worker-az1-dp6ck systemd[1]: Failed to start Wait for chrony to synchronize system clock.

      We can see that chronyd is running as expected:

      # systemctl status chronyd
      ● chronyd.service - NTP client/server
           Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; preset: enabled)
          Drop-In: /usr/lib/systemd/system/chronyd.service.d
                   └─platform-chrony.conf
           Active: active (running) since Wed 2025-07-09 12:24:39 UTC; 1h 32min ago
             Docs: man:chronyd(8)
                   man:chrony.conf(5)
         Main PID: 1415 (chronyd)
            Tasks: 1 (limit: 1650126)
           Memory: 6.5M
              CPU: 214ms
           CGroup: /system.slice/chronyd.service
                   └─1415 /usr/sbin/chronyd -F 2
      
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 10.x.x.x online
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 162.x.x.x online
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 10.x.x.x online
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 162.x.x.x online
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 10.x.x.x online
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 162.x.x.148 online
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 10.x.x.x online
      Jul 09 12:24:49 xxx-dp6ck chronyd[1415]: Source 162.x.x.x online
      Jul 09 12:25:15 xxx-dp6ck chronyd[1415]: Selected source 162.x.x.148 (time.example.com)
      Jul 09 12:25:15 xxx-dp6ck chronyd[1415]: System clock TAI offset set to 37 seconds

      Connectivity to the time server seems to work as expected:

      # chronyc tracking
      Reference ID    : XXXXXXXX (censored.time.example.com)
      Stratum         : 3
      Ref time (UTC)  : Wed Jul 09 13:56:45 2025
      System time     : 0.000163174 seconds fast of NTP time
      Last offset     : +0.000122818 seconds
      RMS offset      : 0.000050753 seconds
      Frequency       : 31.114 ppm slow
      Residual freq   : +0.005 ppm
      Skew            : 0.048 ppm
      Root delay      : 0.001586515 seconds
      Root dispersion : 0.000255515 seconds
      Update interval : 518.0 seconds
      Leap status     : Normal
      
      # chronyc sources
      MS Name/IP address         Stratum Poll Reach LastRx Last sample               
      ===============================================================================
      ^+ cxx2.time.example.com         2  10   377   185   +103us[ +221us] +/-  942us
      ^+ cxx3.time.example.com         2   9   377   424   +184us[ +295us] +/-  725us
      ^* cxx4.time.example.com         2   9   377    39    +59us[ +182us] +/-  980us
      ^+ cxx5.time.example.com         2  10   377   335    -92us[  +22us] +/- 1085us
      ^- cxx6.time.example.com         2   9   377   384    +26us[ +138us] +/- 2331us
      ^- cxx7.time.example.com         2  10   377   875   +222us[ +364us] +/- 1990us
      ^- cxx8.time.example.com         2  10   377   126    +62us[ +182us] +/- 2346us
      ^- cxx9.time.example.com         2  10   377   359   -159us[  -46us] +/- 2516us

      When executing the command from the systemd unit manually, we see the following output:

      # /usr/bin/chronyc -h 127.0.0.1,::1 waitsync 0 0.1 0.0 1
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon
      506 Cannot talk to daemon

      We observe this issue on all clusters that were upgraded to OpenShift Container Platform 4.18.15.
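      A common reason for "506 Cannot talk to daemon" when chronyd itself is healthy is that chronyc cannot reach chronyd's command interface: with "-h 127.0.0.1,::1" chronyc is forced onto the UDP command port (323 by default) instead of the Unix domain socket, so a disabled ("cmdport 0" in chrony.conf) or filtered command port fails every probe. This is only a hypothesis for this report, not a confirmed root cause; the sketch below (paths and checks assumed, not taken from the bug) shows how one might narrow it down on a Node:

```shell
# Hypothetical diagnostic sketch for "506 Cannot talk to daemon".
# Assumption: the failure is in reaching chronyd's UDP command port,
# not in chronyd itself.

diagnose_cmdport() {
    # 1. Is anything listening on the UDP command port (323)?
    #    "cmdport 0" in chrony.conf disables it entirely.
    if command -v ss >/dev/null 2>&1; then
        ss -uln 2>/dev/null | grep -w 323 || echo "no UDP command port open"
    fi

    # 2. Any cmdport/bindcmdaddress/cmdallow overrides in the configuration?
    grep -hE '^(cmdport|bindcmdaddress|cmdallow|cmddeny)' \
        /etc/chrony.conf /etc/chrony.d/*.conf 2>/dev/null

    # 3. Does chronyc work WITHOUT -h? Run as root it prefers the Unix
    #    domain socket (/run/chrony/chronyd.sock), which bypasses UDP,
    #    so success here would point at the command port, not chronyd.
    if command -v chronyc >/dev/null 2>&1; then
        chronyc waitsync 1 0.1 0.0 1 \
            || echo "waitsync failed over Unix socket too"
    fi
    echo "diagnostics done"
}

diagnose_cmdport
```

      If step 3 succeeds while the "-h 127.0.0.1,::1" form fails, the UDP command port is the likely culprit (for example, disabled by a custom chrony.conf delivered via MachineConfig).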

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.18.15

      How reproducible:

      Always on two customer clusters

      Steps to Reproduce:

      1. Install a cluster with OpenShift Container Platform 4.18.15
      2. Log into an OpenShift Node using SSH
      3. Observe that the login message already shows there is a failed service ("chrony-wait.service")
      4. Execute "sudo systemctl status chrony-wait.service"

      Actual results:

      The service shows: "chrony-wait.service: Failed with result 'timeout'."

      Expected results:

      • On login, there are no failed services.
      • The chrony-wait service finishes as expected.

      Additional info:

      • sosreport available in attached Support Case
      • must-gather available in attached Support Case

              Assignee: Unassigned
              Reporter: Simon Krenger (rhn-support-skrenger)
              Michael Nguyen
              Votes: 0
              Watchers: 5