- 
    
Bug
 - 
    Resolution: Done
 - 
    
Major
 - 
    None
 - 
    4.12
 
- 
        Quality / Stability / Reliability
 - 
        False
 - 
        
 - 
        None
 - 
        Important
 - 
        No
 - 
        
 
- 
        None
 - 
        None
 - 
        None
 
- 
        None
 - 
        None
 - 
        None
 
- 
        None
 - 
        None
 - 
        None
 - 
        None
 
Description of problem:
openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
Step1: OCP linuxptp is synced from interface interface ens43f1, incomming clock class is 6 at first.
Step2: In OCP monitor console web (such as https://console-openshift-console.apps.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/), add metrics query for openshift_ptp_clock_class, it's value is shown with 6. 
Step3: In ptp master side, do some change to make the ptp packets with clock class 248 (restart ptp master relative service to trigger clock class to 248, and there will be several seconds without ptp packets sending out), checking in linuxptp side, such as checking with pmc tool (kubectl get pods -n openshift-ptp |grep -i linuxptp |awk '{print $1}' |xargs -i kubectl exec -it -n openshift-ptp tp-daemon-container -- pmc -f /var/run/ptp4l.0.config -u -d 24 "GET PARENT_DATA_SET"), it shows the gm.ClockClass is already changed to 248, and also in metrics query for openshift_ptp_clock_class, it's value is also changed from 6 to 248.
Step4: After some seconds in ptp matster side, the clock class is changed from 248 back to 6, checing in linuxptp side, the gm.ClockClass is changed from 248 back to 6, but in 
metrics query for openshift_ptp_clock_class, it's value is always 248, and checking inside prometheus container "kubectl exec -it -n openshift-monitoring prometheus-k8s-0 -c prometheus -- bash" with "curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://ptp-monitor-service.openshift-ptp.svc.cluster.local:8443/metrics", it shows clock class always 248 
Actual results:
[3. Actual Result:] The openshift_ptp_clock_class in prometheus metrics is always 248 workaround: two ways can fix this issue, 1, trigger ptp packets lost a while then recovery soon; or 2, delete linuxptp pods in OCP side to trigger new linuxptp pod creating.
Expected results:
[2. Expected Result:] The openshift_ptp_clock_class in prometheus metrics should be changed back to 6
Additional info:
[4. Analysis of Logs:]
#In linuxptp container logs (20230530linuxptp1.log), we can see when there is some setting in ptp master side (restart ptp master to trigger clock class to 248), linuxptp logs shows annouce messege timeout for a while and later when packets incomming the clock class is 248:
2023-05-30T01:42:12.483007360Z ptp4l[1454728.799]: [ptp4l.0.config] port 1: SLAVE to LISTENING on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
2023-05-30T01:42:12.483048396Z ptp4l[1454728.799]: [ptp4l.0.config] selected local clock b49691.fffe.d13271 as best master
2023-05-30T01:42:13.183202792Z phc2sys[1454729.499]: [ptp4l.0.config] port b49691.fffe.d13271-1 changed state
2023-05-30T01:42:13.183304437Z phc2sys[1454729.499]: [ptp4l.0.config] reconfiguring after port state change
2023-05-30T01:42:13.183570266Z phc2sys[1454729.500]: [ptp4l.0.config] selecting ens43f1 for synchronization
2023-05-30T01:42:13.183614620Z phc2sys[1454729.500]: [ptp4l.0.config] nothing to synchronize
2023-05-30T01:42:23.779279531Z I0530 01:42:23.779242   60724 main.go:120] ticker pull
2023-05-30T01:42:38.761658342Z ptp4l[1454755.078]: [ptp4l.0.config] selected best master clock 4ea8aa.fffe.7a3006
2023-05-30T01:42:38.761692895Z ptp4l[1454755.078]: [ptp4l.0.config] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
2023-05-30T01:42:38.763098842Z I0530 01:42:38.763067   60724 daemon.go:443] clock change event identified
2023-05-30T01:42:38.763129016Z ptp4l[1685410958]:[ptp4l.0.config] CLOCK_CLASS_CHANGE 248.000000
#later in ptp master side the clock class is changed to 6 (please check ptp wireshark logs 20230530ptp.pcap fetched in OCP interface ens43f10), frome frame 9922 announce message to frame 9952, and in OCP linuxptp pmc tool output, we can see the gm clockclass is also changed from 248 to 6:
[core@master0 ~]$ kubectl exec -it -n openshift-ptp linuxptp-daemon-bgmcr -c linuxptp-daemon-container -- pmc -f /var/run/ptp4l.0.co    nfig -u -d 24 "GET PARENT_DATA_SET"
sending: GET PARENT_DATA_SET
b49691.fffe.d13271-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
parentPortIdentity                    4ea8aa.fffe.7a3006-2
parentStats                           0
observedParentOffsetScaledLogVariance 0xffff
observedParentClockPhaseChangeRate    0x7fffffff
grandmasterPriority1                  128
gm.ClockClass                         248
gm.ClockAccuracy                      0x21
gm.OffsetScaledLogVariance            0x4e5d
grandmasterPriority2                  128
grandmasterIdentity                   4ea8aa.fffe.7a3006
[core@master0 ~]$ kubectl exec -it -n openshift-ptp linuxptp-daemon-bgmcr -c linuxptp-daemon-container -- pmc -f /var/run/ptp4l.0.co    nfig -u -d 24 "GET PARENT_DATA_SET"
sending: GET PARENT_DATA_SET
b49691.fffe.d13271-0 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
parentPortIdentity                    4ea8aa.fffe.7a3006-2
parentStats                           0
observedParentOffsetScaledLogVariance 0xffff
observedParentClockPhaseChangeRate    0x7fffffff
grandmasterPriority1                  128
gm.ClockClass                         6
gm.ClockAccuracy                      0x21
gm.OffsetScaledLogVariance            0x4e5d
grandmasterPriority2                  128
grandmasterIdentity                   4ea8aa.fffe.7a3006
#But in console web, the clock class is always 248 (please check the har logs console-openshift-console.apps.hztt-pz-25ae06-1-1r-10-69-38-82.har of web console):
"text": "{\"status\":\"success\",\"data\":{\"resultType\":\"matrix\",\"result\":[{\"metric\":{\"__name__\":\"openshift_ptp_clock_class\",\"container\":\"kube-rbac-proxy\",\"endpoint\":\"metrics\",\"instance\":\"10.69.38.82:8443\",\"job\":\"ptp-monitor-service\",\"namespace\":\"openshift-ptp\",\"node\":\"master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net\",\"pod\":\"linuxptp-daemon-bgmcr\",\"process\":\"ptp4l\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"ptp-monitor-service\"},\"values\":[[1685413389.496,\"248\"],[1685413395.496,\"248\"],[1685413401.496,\"248\"],[1685413407.496,\"248\"],[1685413413.496,\"248\"],[1685413419.496,\"248\"],[1685413425.496,\"248\"],[1685413431.496,\"248\"],[1685413437.496,\"248\"],[1685413443.496,\"248\"],[1685413449.496,\"248\"],[1685413455.496,\"248\"],[1685413461.496,\"248\"],[1685413467.496,\"248\"],[1685413473.496,\"248\"],[1685413479.496,\"248\"],[1685413485.496,\"248\"]
,[1685415111.496,\"248\"],[1685415117.496,\"248\"],[1685415123.496,\"248\"],[1685415129.496,\"248\"],[1685415135.496,\"248\"],[1685415141.496,\"248\"],[1685415147.496,\"248\"],[1685415153.496,\"248\"],[1685415159.496,\"248\"],[1685415165.496,\"248\"],[1685415171.496,\"248\"],[1685415177.496,\"248\"],[1685415183.496,\"248\"],[1685415189.496,\"248\"]]}]}}\n"
},
#And inside prometheus container (promethus_output.txt) it's 248:
[core@master0 ~]$ kubectl exec -it -n openshift-monitoring prometheus-k8s-0 -- bash
bash-4.4$ curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://ptp-monitor-service.openshift-ptp.svc.cluster.local:8443/metrics
# HELP cne_api_events_published Metric to get number of events published by the rest api
# TYPE cne_api_events_published gauge
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/lock-state",status="success"} 45
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/ptp-clock-class-change",status="success"} 7
cne_api_events_published{address="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/sync-status/os-clock-sync-state",status="success"} 25
# HELP cne_api_publishers Metric to get number of publishers
# TYPE cne_api_publishers gauge
cne_api_publishers{status="active"} 3
# HELP cne_events_ack Metric to get number of events produced
# TYPE cne_events_ack gauge
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/lock-state"} 45
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/ptp-status/ptp-clock-class-change"} 7
cne_events_ack{status="success",type="/cluster/node/master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net/sync/sync-status/os-clock-sync-state"} 25
# HELP openshift_ptp_clock_class
# TYPE openshift_ptp_clock_class gauge
openshift_ptp_clock_class{node="master0.hztt-pz-25ae06-1-1r-10-69-38-82.ocp.hz.nsn-rdnet.net",process="ptp4l"} 248
- is cloned by
 - 
                    
OCPBUGS-15413 Clone-4.13.z: openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed
-         
 - Closed
 
 -         
 - 
                    
OCPBUGS-15429 [4.12 clone] openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed
-         
 - Closed
 
 -         
 
- is depended on by
 - 
                    
OCPBUGS-15413 Clone-4.13.z: openshift_ptp_clock_class is not updated in Prometheus metrics when the incomming ptp packets clock class is changed
-         
 - Closed
 
 -         
 
- links to