Details
- Bug
- Resolution: Unresolved
- Normal
- None
- 4.12
- Moderate
- No
- SDN Sprint 250
- 1
- False
- No mention of the cause for the initial time drift; the workaround is to delete the sdn pod in openshift-sdn
Description
Description of problem:
The sdn container in the sdn pod can get stuck in an unready state after a time drift event on the node. If the node boots with the clock set to some hour and chronyd then steps the clock back (for example, to one hour earlier), the sdn container stays unready for an abnormally long time: the container usually becomes ready within seconds, but during such time drift events it remains unready for several minutes. The sdn pod recovers after being deleted. Although the container does not report ready, its logs show that it is functioning properly. For example:
- The message 'openshift-sdn network plugin ready' is logged.
- The CNI_ADD and CNI_DEL events happen properly.
See more detail in the linked MG.
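The cause of the stall is not identified in this report, but a hypothetical illustration (not the actual sdn code) of how the symptom can arise: a wait loop that compares wall-clock timestamps stalls for the full drift interval when chronyd steps the clock backwards, because the deadline was computed against the old clock.

```shell
#!/usr/bin/env bash
# Sketch only: a readiness-style wait built on wall-clock time. If the system
# clock is stepped back by an hour between computing $deadline and polling it,
# this loop keeps spinning for that extra hour, even though the work is done.
deadline=$(( $(date +%s) + 3 ))          # "should be ready in ~3 seconds"
while [ "$(date +%s)" -lt "$deadline" ]; do
    sleep 1                              # a monotonic clock would not stall here
done
echo ready
```

A loop keyed to a monotonic clock (or to the actual readiness condition rather than elapsed wall time) would not be affected by a clock step.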
Version-Release number of selected component (if applicable):
4.12.39
How reproducible:
I was able to reproduce the bug with the actions below.

- Cluster version:
~~~
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.39   True        False         104m    Cluster version is 4.12.39
~~~
- At the moment of the reboot, I configured the below date on the OCP node:
~~~
$ ssh quickcluster@worker-0.bgomes41239.lab -- sudo date 022008492024
~~~
- The sdn pod got stuck in an unready state:
~~~
-- Reboot --
Feb 20 08:52:10 worker-0.bgomes41239.lab. chronyd[974]: Selected source x.x.x.x (clock.redhat.com)
Feb 20 08:52:10 worker-0.bgomes41239.lab. chronyd[974]: System clock wrong by 3722.629599 seconds <-------
Feb 20 09:54:12 worker-0.bgomes41239.lab. chronyd[974]: System clock was stepped by 3722.629599 seconds

$ date && oc get pod -n openshift-sdn -owide sdn-9dr2f
Tue Feb 20 09:53:05 AM WET 2024 <------ ~3 minutes in unready state, where the container usually takes seconds to become ready; some events are delayed by more than 20 minutes
NAME        READY   STATUS    RESTARTS   AGE   IP           NODE                        NOMINATED NODE   READINESS GATES
sdn-9dr2f   1/2     Running   14         41m   10.0.89.78   worker-0.bgomes41239.lab.   <none>           <none>

- lastProbeTime: null
  lastTransitionTime: "2024-02-20T09:50:56Z"
  message: 'containers with unready status: [sdn]'
  reason: ContainersNotReady
  status: "False"
~~~
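The workaround mentioned above (deleting the sdn pod) can be applied as follows; the pod name is from this reproduction, and since sdn is managed by a DaemonSet, the pod is recreated automatically and comes up ready:

```shell
# Delete the stuck sdn pod on the affected node; the DaemonSet recreates it.
oc -n openshift-sdn delete pod sdn-9dr2f
```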
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: