-
Bug
-
Resolution: Done
-
Major
-
None
-
4.12
Description of problem:
openshift_ptp_clock_class is not updated in Prometheus metrics when the clock class of the incoming PTP packets changes.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
Step 1: OCP linuxptp is synced from interface ens43f1; the incoming clock class is initially 6.
Step 2: In the OCP monitoring web console (such as https://console-openshift-console.apps.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/), add a metrics query for openshift_ptp_clock_class; its value is shown as 6.
Step 3: On the PTP master side, make a change so that the PTP packets carry clock class 248 (restart the PTP master related service to trigger clock class 248; no PTP packets are sent for several seconds). Check on the linuxptp side, for example with the pmc tool (kubectl get pods -n openshift-ptp |grep -i linuxptp |awk '{print $1}' |xargs -i kubectl exec -it -n openshift-ptp tp-daemon-container -- pmc -f /var/run/ptp4l.0.config -u -d 24 "GET PARENT_DATA_SET"). It shows that gm.ClockClass has already changed to 248, and the metrics query for openshift_ptp_clock_class also changes from 6 to 248.
Step 4: After some seconds, the clock class on the PTP master side changes from 248 back to 6. On the linuxptp side, gm.ClockClass changes from 248 back to 6, but the metrics query for openshift_ptp_clock_class still returns 248. Checking inside the Prometheus container ("kubectl exec -it -n openshift-monitoring prometheus-k8s-0 -c prometheus -- bash") with "curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://ptp-monitor-service.openshift-ptp.svc.cluster.local:8443/metrics" also shows the clock class stuck at 248.
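The manual comparison in Step 3 and Step 4 (pmc output versus the /metrics endpoint) could be scripted. The following is a minimal sketch, assuming the pmc output and the metrics text have already been captured as strings with the commands above. The helper names gm_clock_class, metric_clock_class and is_stale are hypothetical and not part of any OpenShift or linuxptp tooling; the sample strings are trimmed from the output quoted in this report.

```python
import re

# Sample inputs trimmed from the outputs quoted in this report.
PMC_SAMPLE = """sending: GET PARENT_DATA_SET
b49691.fffe.d13271-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
parentPortIdentity 4ea8aa.fffe.7a3006-2
grandmasterPriority1 128
gm.ClockClass 6
gm.ClockAccuracy 0x21
"""

METRICS_SAMPLE = """# HELP openshift_ptp_clock_class
# TYPE openshift_ptp_clock_class gauge
openshift_ptp_clock_class{node="master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net",process="ptp4l"} 248
"""


def gm_clock_class(pmc_output: str) -> int:
    """Return gm.ClockClass from captured pmc "GET PARENT_DATA_SET" output."""
    match = re.search(r"gm\.ClockClass\s+(\d+)", pmc_output)
    if match is None:
        raise ValueError("gm.ClockClass not found in pmc output")
    return int(match.group(1))


def metric_clock_class(metrics_text: str, process: str = "ptp4l") -> int:
    """Return the openshift_ptp_clock_class sample for one process label."""
    pattern = re.compile(
        r'^openshift_ptp_clock_class\{[^}]*process="'
        + re.escape(process)
        + r'"[^}]*\}\s+(\S+)\s*$',
        re.MULTILINE,
    )
    match = pattern.search(metrics_text)
    if match is None:
        raise ValueError("openshift_ptp_clock_class sample not found")
    # Gauge values are exposed as floats; the clock class is an integer.
    return int(float(match.group(1)))


def is_stale(pmc_output: str, metrics_text: str) -> bool:
    """True when the exported metric disagrees with what pmc reports."""
    return gm_clock_class(pmc_output) != metric_clock_class(metrics_text)


if __name__ == "__main__":
    print("metric stale:", is_stale(PMC_SAMPLE, METRICS_SAMPLE))
```

With the Step 4 values (pmc reports 6, metric reports 248), is_stale returns True, which is the condition this bug describes.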
Actual results:
[3. Actual Result:] The openshift_ptp_clock_class in Prometheus metrics stays at 248.
Workaround: either of two actions fixes the issue: 1. trigger a brief loss of PTP packets followed by a quick recovery; or 2. delete the linuxptp pods on the OCP side so that new linuxptp pods are created.
Expected results:
[2. Expected Result:] The openshift_ptp_clock_class in Prometheus metrics should change back to 6.
Additional info:
[4. Analysis of Logs:]
# In the linuxptp container logs (20230530linuxptp1.log), after the change on the PTP master side (restarting the PTP master to trigger clock class 248), the linuxptp logs show an announce message timeout for a while; later, when packets arrive again, the clock class is 248:
2023-05-30T01:42:12.483007360Z ptp4l[1454728.799]: [ptp4l.0.config] port 1: SLAVE to LISTENING on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
2023-05-30T01:42:12.483048396Z ptp4l[1454728.799]: [ptp4l.0.config] selected local clock b49691.fffe.d13271 as best master
2023-05-30T01:42:13.183202792Z phc2sys[1454729.499]: [ptp4l.0.config] port b49691.fffe.d13271-1 changed state
2023-05-30T01:42:13.183304437Z phc2sys[1454729.499]: [ptp4l.0.config] reconfiguring after port state change
2023-05-30T01:42:13.183570266Z phc2sys[1454729.500]: [ptp4l.0.config] selecting ens43f1 for synchronization
2023-05-30T01:42:13.183614620Z phc2sys[1454729.500]: [ptp4l.0.config] nothing to synchronize
2023-05-30T01:42:23.779279531Z I0530 01:42:23.779242 60724 main.go:120] ticker pull
2023-05-30T01:42:38.761658342Z ptp4l[1454755.078]: [ptp4l.0.config] selected best master clock 4ea8aa.fffe.7a3006
2023-05-30T01:42:38.761692895Z ptp4l[1454755.078]: [ptp4l.0.config] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
2023-05-30T01:42:38.763098842Z I0530 01:42:38.763067 60724 daemon.go:443] clock change event identified
2023-05-30T01:42:38.763129016Z ptp4l[1685410958]:[ptp4l.0.config] CLOCK_CLASS_CHANGE 248.000000
# Later, on the PTP master side, the clock class is changed to 6 (see the PTP Wireshark capture 20230530ptp.pcap fetched on OCP interface ens43f10, announce messages from frame 9922 to frame 9952), and the pmc output on OCP linuxptp shows the GM clock class also changing from 248 to 6:
[core@master0 ~]$ kubectl exec -it -n openshift-ptp linuxptp-daemon-bgmcr -c linuxptp-daemon-container -- pmc -f /var/run/ptp4l.0.config -u -d 24 "GET PARENT_DATA_SET"
sending: GET PARENT_DATA_SET
    b49691.fffe.d13271-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
        parentPortIdentity 4ea8aa.fffe.7a3006-2
        parentStats 0
        observedParentOffsetScaledLogVariance 0xffff
        observedParentClockPhaseChangeRate 0x7fffffff
        grandmasterPriority1 128
        gm.ClockClass 248
        gm.ClockAccuracy 0x21
        gm.OffsetScaledLogVariance 0x4e5d
        grandmasterPriority2 128
        grandmasterIdentity 4ea8aa.fffe.7a3006
[core@master0 ~]$ kubectl exec -it -n openshift-ptp linuxptp-daemon-bgmcr -c linuxptp-daemon-container -- pmc -f /var/run/ptp4l.0.config -u -d 24 "GET PARENT_DATA_SET"
sending: GET PARENT_DATA_SET
    b49691.fffe.d13271-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
        parentPortIdentity 4ea8aa.fffe.7a3006-2
        parentStats 0
        observedParentOffsetScaledLogVariance 0xffff
        observedParentClockPhaseChangeRate 0x7fffffff
        grandmasterPriority1 128
        gm.ClockClass 6
        gm.ClockAccuracy 0x21
        gm.OffsetScaledLogVariance 0x4e5d
        grandmasterPriority2 128
        grandmasterIdentity 4ea8aa.fffe.7a3006
# But in the web console, the clock class stays at 248 (see the HAR log console-openshift-console.apps.hztt-pz-25ae06-1-1r-10-69-38-82.har of the web console):
"text":
"{\"status\":\"success\",\"data\":{\"resultType\":\"matrix\",\"result\":[{\"metric\":{\"__name__\":\"openshift_ptp_clock_class\",\"container\":\"kube-rbac-proxy\",\"endpoint\":\"metrics\",\"instance\":\"10.69.38.82:8443\",\"job\":\"ptp-monitor-service\",\"namespace\":\"openshift-ptp\",\"node\":\"master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net\",\"pod\":\"linuxptp-daemon-bgmcr\",\"process\":\"ptp4l\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"ptp-monitor-service\"},\"values\":[[1685413389.496,\"248\"],[1685413395.496,\"248\"],[1685413401.496,\"248\"],[1685413407.496,\"248\"],[1685413413.496,\"248\"],[1685413419.496,\"248\"],[1685413425.496,\"248\"],[1685413431.496,\"248\"],[1685413437.496,\"248\"],[1685413443.496,\"248\"],[1685413449.496,\"248\"],[1685413455.496,\"248\"],[1685413461.496,\"248\"],[1685413467.496,\"248\"],[1685413473.496,\"248\"],[1685413479.496,\"248\"],[1685413485.496,\"248\"] ,[1685415111.496,\"248\"],[1685415117.496,\"248\"],[1685415123.496,\"248\"],[1685415129.496,\"248\"],[1685415135.496,\"248\"],[1685415141.496,\"248\"],[1685415147.496,\"248\"],[1685415153.496,\"248\"],[1685415159.496,\"248\"],[1685415165.496,\"248\"],[1685415171.496,\"248\"],[1685415177.496,\"248\"],[1685415183.496,\"248\"],[1685415189.496,\"248\"]]}]}}\n" },
# And inside the Prometheus container (promethus_output.txt) it is also 248:
[core@master0 ~]$ kubectl exec -it -n openshift-monitoring prometheus-k8s-0 -- bash
bash-4.4$ curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://ptp-monitor-service.openshift-ptp.svc.cluster.local:8443/metrics
# HELP cne_api_events_published Metric to get number of events published by the rest api
# TYPE cne_api_events_published gauge
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/lock-state",status="success"} 45
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/ptp-clock-class-change",status="success"} 7
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/sync-status/os-clock-sync-state",status="success"} 25
# HELP cne_api_publishers Metric to get number of publishers
# TYPE cne_api_publishers gauge
cne_api_publishers{status="active"} 3
# HELP cne_events_ack Metric to get number of events produced
# TYPE cne_events_ack gauge
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/lock-state"} 45
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/ptp-clock-class-change"} 7
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/sync-status/os-clock-sync-state"} 25
# HELP openshift_ptp_clock_class
# TYPE openshift_ptp_clock_class gauge
openshift_ptp_clock_class{node="master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net",process="ptp4l"} 248
- clones
-
OCPBUGS-14681 openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed
- Closed
- depends on
-
OCPBUGS-15413 Clone-4.13.z: openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed
- Closed