Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-18360

ptp clock does not stay locked in GM configuration, offsets are very high

XMLWordPrintable

    • No
    • False
    • Hide

      None

      Show
      None
    • Hide
      9:/14: Apply new modified ptp config with required ublox command fixed offset issue.
      9/14: GM is stable now : side loaded with intree kernel, Modified ptp config t disable all constination except for gps and added surveyin command to execute via pluign. Also removed existing OC and BC ptp config. Qe to retest with other oc/BC ptpconfig loaded
      9/13: cnfde4 is blocked due to deployment failure. QE is going to try same test with intree driver to ublock testing and mark dependcy on intree driver as known issue
      9/12: Cnfde22 was due to mis configured FIFO schd with ice-gnss, cnfde4 has been recovered.
      Waiting for QE to confirm if the labs is recovered
      9/2/2023: Issues are seen in both dev and qe labs + reported by other teams. Discussing with the kernel/driver team for known issues. Marked as a blocker. Yellow
      Show
      9:/14: Apply new modified ptp config with required ublox command fixed offset issue. 9/14: GM is stable now : side loaded with intree kernel, Modified ptp config t disable all constination except for gps and added surveyin command to execute via pluign. Also removed existing OC and BC ptp config. Qe to retest with other oc/BC ptpconfig loaded 9/13: cnfde4 is blocked due to deployment failure. QE is going to try same test with intree driver to ublock testing and mark dependcy on intree driver as known issue 9/12: Cnfde22 was due to mis configured FIFO schd with ice-gnss, cnfde4 has been recovered. Waiting for QE to confirm if the labs is recovered 9/2/2023: Issues are seen in both dev and qe labs + reported by other teams. Discussing with the kernel/driver team for known issues. Marked as a blocker. Yellow

      Description of problem:

      ptp clocks were initially all locked, then went back to freerun state and offsets were very high.
      
      
      

      Version-Release number of selected component (if applicable):

      ptp-operator.v4.14.0-202308291102

      How reproducible:

      100%

      Steps to Reproduce:

      1. Deploy SNO with PTP GM configuration
      
      2. Check ptp metrics:
      oc -n openshift-ptp exec $(oc get pods -n openshift-ptp|awk '/linuxptp-daemon.*Running/ {print $1}') -c linuxptp-daemon-container -- curl -s 127.0.0.1:9091/metrics
      [...]
      # TYPE openshift_ptp_clock_state gauge
      openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
      openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="GM"} 1
      openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="gnss"} 1
      openshift_ptp_clock_state{iface="ens3fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ts2phc"} 1
      openshift_ptp_clock_state{iface="ens5fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
      openshift_ptp_clock_state{iface="ens7fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
      
      3. Run regression tests
      
      4. Check ptp metrics again:
      [kni@registry.ran-vcl01 ~]$ oc -n openshift-ptp exec $(oc get pods -n openshift-ptp|awk '/linuxptp-daemon.*Running/ {print $1}') -c linuxptp-daemon-container -- curl -s 127.0.0.1:9091/metrics # HELP cne_api_events_published Metric to get number of events published by the rest api [...]
      # HELP openshift_ptp_clock_class  # TYPE openshift_ptp_clock_class gauge openshift_ptp_clock_class{node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 248 # HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER # TYPE openshift_ptp_clock_state gauge openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 0 openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="GM"} 1 openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="gnss"} 1 openshift_ptp_clock_state{iface="ens3fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ts2phc"} 0 openshift_ptp_clock_state{iface="ens5fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 0 openshift_ptp_clock_state{iface="ens7fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 0
      
      5. Check linuxptp-daemon logs: 
      oc logs -n openshift-ptp linuxptp-daemon-cszdm linuxptp-daemon-container
      [...]
      phc2sys[70911.867]: [ptp4l.4.config] CLOCK_REALTIME phc offset 249495482 s2 freq +100000000 delay    510
      ptp4l[70911.869]: [ptp4l.0.config] master offset    6267876 s2 freq -446715 path delay     12096
      ptp4l[70911.872]: [ptp4l.2.config] master offset    6202717 s2 freq -492415 path delay     30616
      ptp4l[70911.881]: [ptp4l.1.config] master offset    6181769 s2 freq -323470 path delay     20653
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNRMC,204243.00,A,4233.24463,N,07126.64561,W,0.000,,300823,,,A,V
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGGA,204243.00,4233.24463,N,07126.64561,W,1,12,99.99,98.6,M,-33.1,M,,
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGSA,A,3,10,12,02,21,23,24,25,28,32,,,,99.99,99.99,99.99,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGSA,A,3,04,09,10,11,12,19,21,36,,,,,99.99,99.99,99.99,3
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGSA,A,3,,,,,,,,,,,,,99.99,99.99,99.99,5
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GPGSV,3,1,11,02,15,318,38,10,85,275,51,12,29,095,44,15,05,091,30,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GPGSV,3,2,11,18,05,184,36,21,19,311,43,23,52,142,47,24,31,052,43,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GPGSV,3,3,11,25,24,142,39,28,26,220,46,32,50,290,50,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GAGSV,3,1,11,04,73,338,49,09,29,280,44,10,28,215,44,11,38,247,42,7
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GAGSV,3,2,11,12,13,195,34,19,32,076,42,20,,,39,21,29,066,45,7
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GAGSV,3,3,11,27,01,024,07,31,06,171,38,36,27,306,45,7
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGST,204243.00,14,2.8,2.8,0.0000,1.2,1.2,1.2
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNZDA,204243.00,30,08,2023,00,00
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGBS,204243.00,1.2,1.2,1.2,,,,,,
      ptp4l[70911.909]: [ptp4l.3.config] master offset    6054511 s2 freq -280247 path delay     27383
      phc2sys[70911.930]: [ptp4l.4.config] CLOCK_REALTIME phc offset 242287787 s2 freq +100000000 delay    507
      ptp4l[70911.932]: [ptp4l.0.config] master offset    6045556 s2 freq -205958 path delay     16924
      ptp4l[70911.934]: [ptp4l.2.config] master offset    5988211 s2 freq -244767 path delay     30616
      ptp4l[70911.944]: [ptp4l.1.config] master offset    5955775 s2 freq  -97507 path delay     20653
      ptp4l[70911.972]: [ptp4l.3.config] master offset    5815542 s2 freq  -89027 path delay     37203

       

      Actual results:

      Following deployment of SNO, ptp clocks were locked:
      
      oc -n openshift-ptp exec $(oc get pods -n openshift-ptp|awk '/linuxptp-daemon.*Running/ {print $1}') -c linuxptp-daemon-container -- curl -s 127.0.0.1:9091/metrics
      [...]
      # TYPE openshift_ptp_clock_state gauge openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1 openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="GM"} 1 openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="gnss"} 1 openshift_ptp_clock_state{iface="ens3fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ts2phc"} 1 openshift_ptp_clock_state{iface="ens5fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1 openshift_ptp_clock_state{iface="ens7fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
      
      
      Following ptp regression test:
      
      [kni@registry.ran-vcl01 ~]$ oc -n openshift-ptp exec $(oc get pods -n openshift-ptp|awk '/linuxptp-daemon.*Running/ {print $1}') -c linuxptp-daemon-container -- curl -s 127.0.0.1:9091/metrics
      # HELP cne_api_events_published Metric to get number of events published by the rest api
      [...]
      # HELP openshift_ptp_clock_class 
      # TYPE openshift_ptp_clock_class gauge
      openshift_ptp_clock_class{node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 248
      # HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
      # TYPE openshift_ptp_clock_state gauge
      openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 0
      openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="GM"} 1
      openshift_ptp_clock_state{iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="gnss"} 1
      openshift_ptp_clock_state{iface="ens3fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ts2phc"} 0
      openshift_ptp_clock_state{iface="ens5fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 0
      openshift_ptp_clock_state{iface="ens7fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 0
      [...]
      # HELP openshift_ptp_max_offset_ns # TYPE openshift_ptp_max_offset_ns gaugeopenshift_ptp_max_offset_ns{from="master",iface="ens3fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ts2phc"} 1.110001577e+09openshift_ptp_max_offset_ns{from="master",iface="ens5fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 6.33229302e+08openshift_ptp_max_offset_ns{from="master",iface="ens7fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 6.54748878e+08openshift_ptp_max_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 4.757662272e+09# HELP openshift_ptp_offset_ns # TYPE openshift_ptp_offset_ns gaugeopenshift_ptp_offset_ns{from="gnss",iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="gnss"} 5openshift_ptp_offset_ns{from="master",iface="ens3fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ts2phc"} -9.99602805e+08openshift_ptp_offset_ns{from="master",iface="ens5fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 3.17721399e+08openshift_ptp_offset_ns{from="master",iface="ens7fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 3.41675695e+08openshift_ptp_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -2.1703469e+08
      
      [kni@registry.ran-vcl01 ~]$ oc logs -n openshift-ptp linuxptp-daemon-cszdm linuxptp-daemon-container
      [...] phc2sys[70911.867]: [ptp4l.4.config] CLOCK_REALTIME phc offset 249495482 s2 freq +100000000 delay    510
      ptp4l[70911.869]: [ptp4l.0.config] master offset    6267876 s2 freq -446715 path delay     12096
      ptp4l[70911.872]: [ptp4l.2.config] master offset    6202717 s2 freq -492415 path delay     30616
      ptp4l[70911.881]: [ptp4l.1.config] master offset    6181769 s2 freq -323470 path delay     20653
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNRMC,204243.00,A,4233.24463,N,07126.64561,W,0.000,,300823,,,A,V
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGGA,204243.00,4233.24463,N,07126.64561,W,1,12,99.99,98.6,M,-33.1,M,,
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGSA,A,3,10,12,02,21,23,24,25,28,32,,,,99.99,99.99,99.99,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGSA,A,3,04,09,10,11,12,19,21,36,,,,,99.99,99.99,99.99,3
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGSA,A,3,,,,,,,,,,,,,99.99,99.99,99.99,5
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GPGSV,3,1,11,02,15,318,38,10,85,275,51,12,29,095,44,15,05,091,30,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GPGSV,3,2,11,18,05,184,36,21,19,311,43,23,52,142,47,24,31,052,43,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GPGSV,3,3,11,25,24,142,39,28,26,220,46,32,50,290,50,1
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GAGSV,3,1,11,04,73,338,49,09,29,280,44,10,28,215,44,11,38,247,42,7
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GAGSV,3,2,11,12,13,195,34,19,32,076,42,20,,,39,21,29,066,45,7
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GAGSV,3,3,11,27,01,024,07,31,06,171,38,36,27,306,45,7
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGST,204243.00,14,2.8,2.8,0.0000,1.2,1.2,1.2
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNZDA,204243.00,30,08,2023,00,00
      ts2phc[70911.905]: [ts2phc.4.config] nmea sentence: GNGBS,204243.00,1.2,1.2,1.2,,,,,,
      ptp4l[70911.909]: [ptp4l.3.config] master offset    6054511 s2 freq -280247 path delay     27383
      phc2sys[70911.930]: [ptp4l.4.config] CLOCK_REALTIME phc offset 242287787 s2 freq +100000000 delay    507
      ptp4l[70911.932]: [ptp4l.0.config] master offset    6045556 s2 freq -205958 path delay     16924
      ptp4l[70911.934]: [ptp4l.2.config] master offset    5988211 s2 freq -244767 path delay     30616
      ptp4l[70911.944]: [ptp4l.1.config] master offset    5955775 s2 freq  -97507 path delay     20653
      ptp4l[70911.972]: [ptp4l.3.config] master offset    5815542 s2 freq  -89027 path delay     37203 

      Expected results:

      ptp clock is in sync.
      offsets within clock lock range.

      Additional info:

       

      Restarting processes did result in clocks returning to locked state with the exception of ts2phc:
      [core@cnfde4 ~]$ ps -ef | grep ts2phc
      root 1803051 1276453 0 21:01 ? 00:00:00 /usr/sbin/ts2phc -f /var/run/ts2phc.4.config
      core 1806840 1806396 0 21:03 pts/0 00:00:00 grep --color=auto ts2phc
      [core@cnfde4 ~]$ sudo kill -9 1803051
      [core@cnfde4 ~]$ exit
      logout
      Connection to cnfde4.ptp.lab.eng.bos.redhat.com closed.
      [kni@registry.ran-vcl01 ~]$ oc -n openshift-ptp exec $(oc get pods -n openshift-ptp|awk '/linuxptp-daemon.*Running/{print $1}') -c linuxptp-daemon-container – curl -s 127.0.0.1:9091/metrics | grep clock_state
      
      HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER TYPE openshift_ptp_clock_state gauge
      openshift_ptp_clock_state {iface="CLOCK_REALTIME",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
      openshift_ptp_clock_state {iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="GM"} 1
      openshift_ptp_clock_state {iface="ens3f0",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="gnss"} 1
      openshift_ptp_clock_state {iface="ens3fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ts2phc"} 0
      openshift_ptp_clock_state {iface="ens5fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
      openshift_ptp_clock_state {iface="ens7fx",node="cnfde4.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
      [kni@registry.ran-vcl01 ~]$ oc logs -n openshift-ptp linuxptp-daemon-cszdm linuxptp-daemon-container
      [...]
      ts2phc[77885.232]: [ts2phc.4.config] nmea: no valid rmc fix
      ts2phc[77885.232]: [ts2phc.4.config] ens3f0 extts index 0 at 1693435173.078106223 corr 0 src 10666.4096 diff 1693424507078106223
      ts2phc[77885.232]: [ts2phc.4.config] ens3f0 ignoring invalid master time stamp
      phc2sys[77885.255]: [ptp4l.4.config] CLOCK_REALTIME phc offset         1 s2 freq -5780734 delay    574
      ptp4l[77885.269]: [ptp4l.1.config] master offset         -3 s2 freq -5687808 path delay       856
      ptp4l[77885.272]: [ptp4l.0.config] master offset          0 s2 freq -5687802 path delay       857
      ptp4l[77885.273]: [ptp4l.3.config] master offset          3 s2 freq -5690268 path delay       573
      ptp4l[77885.275]: [ptp4l.2.config] master offset         -1 s2 freq -5690276 path delay       859
      phc2sys[77885.317]: [ptp4l.4.config] CLOCK_REALTIME phc offset         6 s2 freq -5780729 delay    565
      ptp4l[77885.332]: [ptp4l.1.config] master offset         -1 s2 freq -5687805 path delay       856
      ptp4l[77885.335]: [ptp4l.0.config] master offset         -3 s2 freq -5687808 path delay       858
      ptp4l[77885.336]: [ptp4l.3.config] master offset         -1 s2 freq -5690275 path delay       573
      ptp4l[77885.337]: [ptp4l.2.config] master offset         -5 s2 freq -5690283 path delay       859
      

       

            aputtur@redhat.com Aneesh Puttur
            bblock@redhat.com Bonnie Block
            Bonnie Block Bonnie Block
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated: