OpenShift Bugs / OCPBUGS-57082

ptp pods do not die on error

    • Quality / Stability / Reliability: True
    • Depends on the release of OCPBUGS-55732
    • Important
    • 9/18: dependent PR for OCPBUGS-61180 is merging in 4.16
      7/3: depending on OCPBUGS-55732
      6/30: on hold; will wait for the unicast_master_table bug. Please mark this as on hold until it is backported. We are moving all clusters to 4.16 now, so that would be our goal version, but we do want to follow up on this regardless.
    • CNF RAN Sprint 277

      Description of problem:

          A linuxptp daemon pod can end up in a state where ptp4l repeatedly fails to start (a configuration parse error), yet the pod remains Running and healthy according to OCP, so it is never restarted automatically.

      Version-Release number of selected component (if applicable):

        PTP operator version 4.15.0-202505132237.     

      How reproducible:

          Systemic, but random.

      Steps to Reproduce:

          1. Install the PTP operator and upgrade OpenShift from 4.14 to 4.15.
          2. A linuxptp daemon pod can get stuck in an error state and never refreshes itself. It needs monitoring so that it is automatically recycled (see the sketch after these steps).
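
      As a stopgap for the monitoring mentioned in step 2, an external watchdog along these lines could recycle a stuck pod. This is a minimal sketch, not part of the PTP operator; the namespace, label selector, container name, and failure threshold are assumptions that would need to match the actual deployment.

      ```
      // Hypothetical external watchdog (not part of the PTP operator): tails the
      // linuxptp daemon container logs and deletes the pod when the ptp4l restart
      // loop is detected, so the DaemonSet recreates it.
      package main

      import (
          "bufio"
          "context"
          "log"
          "strings"
          "time"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/rest"
      )

      func main() {
          cfg, err := rest.InClusterConfig() // assumes the watchdog runs in-cluster
          if err != nil {
              log.Fatal(err)
          }
          client, err := kubernetes.NewForConfig(cfg)
          if err != nil {
              log.Fatal(err)
          }

          const ns = "openshift-ptp" // assumed namespace of the linuxptp daemon pods
          for {
              pods, err := client.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{
                  LabelSelector: "app=linuxptp-daemon", // assumed label
              })
              if err != nil {
                  log.Printf("list pods: %v", err)
                  time.Sleep(time.Minute)
                  continue
              }
              for _, pod := range pods.Items {
                  if inErrorLoop(client, ns, pod.Name) {
                      log.Printf("deleting %s: ptp4l restart loop detected", pod.Name)
                      _ = client.CoreV1().Pods(ns).Delete(context.TODO(), pod.Name, metav1.DeleteOptions{})
                  }
              }
              time.Sleep(time.Minute)
          }
      }

      // inErrorLoop counts how often the ptp4l config parse error appears in the
      // last few hundred log lines; repeated hits mean the daemon is stuck
      // recreating ptp4l without ever recovering.
      func inErrorLoop(client *kubernetes.Clientset, ns, name string) bool {
          tail := int64(300)
          req := client.CoreV1().Pods(ns).GetLogs(name, &corev1.PodLogOptions{
              Container: "linuxptp-daemon-container", // assumed container name
              TailLines: &tail,
          })
          stream, err := req.Stream(context.TODO())
          if err != nil {
              return false
          }
          defer stream.Close()

          hits := 0
          scanner := bufio.NewScanner(stream)
          for scanner.Scan() {
              if strings.Contains(scanner.Text(), "failed to parse configuration file") {
                  hits++
              }
          }
          return hits >= 5 // arbitrary threshold for "stuck"
      }
      ```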
       
          

      Actual results:

          This morning we had an outage that blocked trading activities due to clock sync errors. The root cause was that the linuxptp pods scheduled by the PTP operator were not working correctly. The pods were in a healthy state according to OCP while printing the following lines a few times a second:
      
      ```
      phc2sys[150296.077]: [ptp4l.0.config] Waiting for ptp4l...
      I0602 00:33:25.546967  393813 daemon.go:745] Starting ptp4l...
      I0602 00:33:25.546939  393813 daemon.go:844] Recreating ptp4l...
      I0602 00:33:25.546974  393813 daemon.go:746] ptp4l cmd: /bin/chrt -f 10 /usr/sbin/ptp4l -f /var/run/ptp4l.0.config  -s -m 
      I0602 00:33:25.547052  393813 daemon.go:674] ptp4l[1748824405]:[ptp4l.0.config] PTP_PROCESS_STATUS:1
      failed to parse configuration file /var/run/ptp4l.0.config
      line 72: missing table_id
      E0602 00:33:25.548539  393813 daemon.go:829] CmdRun() error waiting for ptp4l: exit status 254
      I0602 00:33:25.548583  393813 daemon.go:674] ptp4l[1748824405]:[ptp4l.0.config] PTP_PROCESS_STATUS:0
      phc2sys[150297.077]: [ptp4l.0.config] Waiting for ptp4l...
      I0602 00:33:26.548933  393813 daemon.go:745] Starting ptp4l...
      I0602 00:33:26.548904  393813 daemon.go:844] Recreating ptp4l...
      I0602 00:33:26.548942  393813 daemon.go:746] ptp4l cmd: /bin/chrt -f 10 /usr/sbin/ptp4l -f /var/run/ptp4l.0.config  -s -m 
      I0602 00:33:26.548998  393813 daemon.go:674] ptp4l[1748824406]:[ptp4l.0.config] PTP_PROCESS_STATUS:1
      failed to parse configuration file /var/run/ptp4l.0.config
      line 72: missing table_id
      ```

      This was fixed by a simple pod restart. Configurations were not changed; whatever the problem was, it was likely a temporary condition caused by the upgrade. Another node in the same cluster, with the same pod and the same configuration, never had this issue.
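
      For context on the parse error itself: in linuxptp, a [unicast_master_table] section must declare table_id before any master address entries, otherwise ptp4l aborts with exactly this "missing table_id" error. That appears to line up with the unicast_master_table bug (OCPBUGS-55732) referenced in the status notes above. A well-formed section looks roughly like the following; the interval and address values are made-up examples, not taken from this cluster's generated ptp4l.0.config:

      ```
      [unicast_master_table]
      table_id                1
      logQueryInterval        2
      UDPv4                   10.0.0.1
      ```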

      Expected results:

          The pod should automatically detect that it is in an error state and restart; a minimal sketch of one possible approach follows.
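
      One possible shape for that behavior, assuming the daemon's ptp4l restart loop can be changed: count consecutive start failures and exit the process once a threshold is hit, so that the container's restartPolicy brings up a fresh pod instead of the daemon retrying forever. This is an illustrative sketch, not the actual linuxptp-daemon code; runPtp4l and maxConsecutiveFailures are made-up names.

      ```
      // Illustrative fail-fast wrapper around a ptp4l restart loop. runPtp4l
      // stands in for whatever starts ptp4l and blocks until it exits.
      package main

      import (
          "log"
          "os"
          "os/exec"
          "time"
      )

      const maxConsecutiveFailures = 10 // give transient errors a chance to clear

      // runPtp4l starts ptp4l with the generated config and waits for it to exit.
      func runPtp4l() error {
          cmd := exec.Command("/bin/chrt", "-f", "10", "/usr/sbin/ptp4l",
              "-f", "/var/run/ptp4l.0.config", "-s", "-m")
          cmd.Stdout = os.Stdout
          cmd.Stderr = os.Stderr
          return cmd.Run()
      }

      func main() {
          failures := 0
          for {
              err := runPtp4l()
              if err == nil {
                  failures = 0
                  continue
              }
              failures++
              log.Printf("ptp4l exited with error (%d consecutive): %v", failures, err)
              if failures >= maxConsecutiveFailures {
                  // Exit non-zero instead of retrying forever: kubelet then restarts
                  // the container per the pod's restartPolicy, which is what finally
                  // cleared the stuck state reported above.
                  log.Fatalf("ptp4l failed %d times in a row; exiting so the pod restarts", failures)
              }
              time.Sleep(time.Second)
          }
      }
      ```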

      Additional info:

          
