Loading...

XML

Word

Printable

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: 4.12
Component/s: Networking / ptp
Labels:
- telco-4.12.z
- telco-priority-2

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
Important
Regression:
No
Latest Status Summary:

Hide
6/23: QE is installing a 4.14 system today with the aim of validating this on Sunday. Then we can line up a 4.13 backport and regression.
6/22: pr merged; working logistics with QE for testing and backporting.
6/21: a short term fix is underway to address this issue. A PR has already been posted.

Show
6/23: QE is installing a 4.14 system today with the aim of validating this on Sunday. Then we can line up a 4.13 backport and regression. 6/22: pr merged; working logistics with QE for testing and backporting. 6/21: a short term fix is underway to address this issue. A PR has already been posted.

Target Backport Versions:
None
Target Version:

4.14.0
Release Blocker:
None
Sprint:
None

Internal Whiteboard:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Priority Data:
PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Step1: OCP linuxptp is synced from interface interface ens43f1, incomming clock class is 6 at first.

Step2: In OCP monitor console web (such as https://console-openshift-console.apps.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/), add metrics query for openshift_ptp_clock_class, it's value is shown with 6. 

Step3: In ptp master side, do some change to make the ptp packets with clock class 248 (restart ptp master relative service to trigger clock class to 248, and there will be several seconds without ptp packets sending out), checking in linuxptp side, such as checking with pmc tool (kubectl get pods -n openshift-ptp |grep -i linuxptp |awk '{print $1}' |xargs -i kubectl exec -it -n openshift-ptp tp-daemon-container -- pmc -f /var/run/ptp4l.0.config -u -d 24 "GET PARENT_DATA_SET"), it shows the gm.ClockClass is already changed to 248, and also in metrics query for openshift_ptp_clock_class, it's value is also changed from 6 to 248.

Step4: After some seconds in ptp matster side, the clock class is changed from 248 back to 6, checing in linuxptp side, the gm.ClockClass is changed from 248 back to 6, but in 
metrics query for openshift_ptp_clock_class, it's value is always 248, and checking inside prometheus container "kubectl exec -it -n openshift-monitoring prometheus-k8s-0 -c prometheus -- bash" with "curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://ptp-monitor-service.openshift-ptp.svc.cluster.local:8443/metrics", it shows clock class always 248

Actual results:

[3. Actual Result:]
The openshift_ptp_clock_class in prometheus metrics is always 248
workaround: two ways can fix this issue, 1, trigger ptp packets lost a while then recovery soon; or 2, delete linuxptp pods in OCP side to trigger new linuxptp pod creating.

Expected results:

[2. Expected Result:]
The openshift_ptp_clock_class in prometheus metrics should be changed back to 6

Additional info:

[4. Analysis of Logs:]
#In linuxptp container logs (20230530linuxptp1.log), we can see when there is some setting in ptp master side (restart ptp master to trigger clock class to 248), linuxptp logs shows annouce messege timeout for a while and later when packets incomming the clock class is 248:
2023-05-30T01:42:12.483007360Z ptp4l[1454728.799]: [ptp4l.0.config] port 1: SLAVE to LISTENING on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
2023-05-30T01:42:12.483048396Z ptp4l[1454728.799]: [ptp4l.0.config] selected local clock b49691.fffe.d13271 as best master
2023-05-30T01:42:13.183202792Z phc2sys[1454729.499]: [ptp4l.0.config] port b49691.fffe.d13271-1 changed state
2023-05-30T01:42:13.183304437Z phc2sys[1454729.499]: [ptp4l.0.config] reconfiguring after port state change
2023-05-30T01:42:13.183570266Z phc2sys[1454729.500]: [ptp4l.0.config] selecting ens43f1 for synchronization
2023-05-30T01:42:13.183614620Z phc2sys[1454729.500]: [ptp4l.0.config] nothing to synchronize
2023-05-30T01:42:23.779279531Z I0530 01:42:23.779242   60724 main.go:120] ticker pull
2023-05-30T01:42:38.761658342Z ptp4l[1454755.078]: [ptp4l.0.config] selected best master clock 4ea8aa.fffe.7a3006
2023-05-30T01:42:38.761692895Z ptp4l[1454755.078]: [ptp4l.0.config] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
2023-05-30T01:42:38.763098842Z I0530 01:42:38.763067   60724 daemon.go:443] clock change event identified
2023-05-30T01:42:38.763129016Z ptp4l[1685410958]:[ptp4l.0.config] CLOCK_CLASS_CHANGE 248.000000

#later in ptp master side the clock class is changed to 6 (please check ptp wireshark logs 20230530ptp.pcap fetched in OCP interface ens43f10), frome frame 9922 announce message to frame 9952, and in OCP linuxptp pmc tool output, we can see the gm clockclass is also changed from 248 to 6:
[core@master0 ~]$ kubectl exec -it -n openshift-ptp linuxptp-daemon-bgmcr -c linuxptp-daemon-container -- pmc -f /var/run/ptp4l.0.co    nfig -u -d 24 "GET PARENT_DATA_SET"
sending: GET PARENT_DATA_SET
b49691.fffe.d13271-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
parentPortIdentity                    4ea8aa.fffe.7a3006-2
parentStats                           0
observedParentOffsetScaledLogVariance 0xffff
observedParentClockPhaseChangeRate    0x7fffffff
grandmasterPriority1                  128
gm.ClockClass                         248
gm.ClockAccuracy                      0x21
gm.OffsetScaledLogVariance            0x4e5d
grandmasterPriority2                  128
grandmasterIdentity                   4ea8aa.fffe.7a3006
[core@master0 ~]$ kubectl exec -it -n openshift-ptp linuxptp-daemon-bgmcr -c linuxptp-daemon-container -- pmc -f /var/run/ptp4l.0.co    nfig -u -d 24 "GET PARENT_DATA_SET"
sending: GET PARENT_DATA_SET
b49691.fffe.d13271-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
parentPortIdentity                    4ea8aa.fffe.7a3006-2
parentStats                           0
observedParentOffsetScaledLogVariance 0xffff
observedParentClockPhaseChangeRate    0x7fffffff
grandmasterPriority1                  128
gm.ClockClass                         6
gm.ClockAccuracy                      0x21
gm.OffsetScaledLogVariance            0x4e5d
grandmasterPriority2                  128
grandmasterIdentity                   4ea8aa.fffe.7a3006

#But in console web, the clock class is always 248 (please check the har logs console-openshift-console.apps.hztt-pz-25ae06-1-1r-10-69-38-82.har of web console):
"text": "{\"status\":\"success\",\"data\":{\"resultType\":\"matrix\",\"result\":[{\"metric\":{\"__name__\":\"openshift_ptp_clock_class\",\"container\":\"kube-rbac-proxy\",\"endpoint\":\"metrics\",\"instance\":\"10.69.38.82:8443\",\"job\":\"ptp-monitor-service\",\"namespace\":\"openshift-ptp\",\"node\":\"master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net\",\"pod\":\"linuxptp-daemon-bgmcr\",\"process\":\"ptp4l\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"ptp-monitor-service\"},\"values\":[[1685413389.496,\"248\"],[1685413395.496,\"248\"],[1685413401.496,\"248\"],[1685413407.496,\"248\"],[1685413413.496,\"248\"],[1685413419.496,\"248\"],[1685413425.496,\"248\"],[1685413431.496,\"248\"],[1685413437.496,\"248\"],[1685413443.496,\"248\"],[1685413449.496,\"248\"],[1685413455.496,\"248\"],[1685413461.496,\"248\"],[1685413467.496,\"248\"],[1685413473.496,\"248\"],[1685413479.496,\"248\"],[1685413485.496,\"248\"]
,[1685415111.496,\"248\"],[1685415117.496,\"248\"],[1685415123.496,\"248\"],[1685415129.496,\"248\"],[1685415135.496,\"248\"],[1685415141.496,\"248\"],[1685415147.496,\"248\"],[1685415153.496,\"248\"],[1685415159.496,\"248\"],[1685415165.496,\"248\"],[1685415171.496,\"248\"],[1685415177.496,\"248\"],[1685415183.496,\"248\"],[1685415189.496,\"248\"]]}]}}\n"
},

#And inside prometheus container (promethus_output.txt) it's 248:
[core@master0 ~]$ kubectl exec -it -n openshift-monitoring prometheus-k8s-0 -- bash
bash-4.4$ curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://ptp-monitor-service.openshift-ptp.svc.cluster.local:8443/metrics
# HELP cne_api_events_published Metric to get number of events published by the rest api
# TYPE cne_api_events_published gauge
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/lock-state",status="success"} 45
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/ptp-clock-class-change",status="success"} 7
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/sync-status/os-clock-sync-state",status="success"} 25
# HELP cne_api_publishers Metric to get number of publishers
# TYPE cne_api_publishers gauge
cne_api_publishers{status="active"} 3
# HELP cne_events_ack Metric to get number of events produced
# TYPE cne_events_ack gauge
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/lock-state"} 45
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/ptp-clock-class-change"} 7
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/sync-status/os-clock-sync-state"} 25
# HELP openshift_ptp_clock_class
# TYPE openshift_ptp_clock_class gauge
openshift_ptp_clock_class{node="master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net",process="ptp4l"} 248

is cloned by

OCPBUGS-15413 Clone-4.13.z: openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed

Closed

OCPBUGS-15429 [4.12 clone] openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed

Closed

is depended on by

OCPBUGS-15413 Clone-4.13.z: openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed

Closed

links to

openshift/linuxptp-daemon#144: OCPBUGS-14681: fix clock class updates

Assignee:: Vitaly Grinberg

Reporter:: Nigel Smith

Need Info From:: None

Contributors:: None

QA Contact:: Ofer Bochan

Doc Contact:: None

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Created:: 2023/06/07 2:22 PM

Updated:: 2025/10/29 8:19 AM

Resolved:: 2023/07/04 8:34 AM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates