OpenShift Bugs / OCPBUGS-57028

CLOCK_REALTIME state reports 1 (locked) when both active and inactive interfaces are down in BC/OC HA environment


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • None
    • 4.19
    • Networking / ptp
    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • Yes
    • 6/5: PR waiting to be reviewed
    • None
    • CNF RAN Sprint 271, CNF RAN Sprint 272
    • 2
    • Done
    • Release Note Not Required
    • N/A
    • None
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-53247. The following is the description of the original issue:

      Description of problem:

      A PTP test case fails in the weekend CI run.

      Environment is BC/OC HA:
      PTP configs:
      bc1: with ens3f2 as slave interface
      bc2: with ens2f0 as slave interface
      slave-bc1 (ens1f0): downstream slave for bc1
      slave-bc2 (ens1f1): downstream slave for bc2
      boundary-ha haProfiles: bc1,bc2

      When both the active and the inactive slave interfaces are down, CLOCK_REALTIME continues to report LOCKED (1).

      The metrics show that CLOCK_REALTIME and the downstream slave (ens1fx) still report LOCKED while the bc1 and bc2 slave interfaces are down:

      # TYPE openshift_ptp_clock_state gauge
      openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="helix73.telcoqe.eng.rdu2.dc.redhat.com",process="phc2sys"} 1
      openshift_ptp_clock_state{iface="ens1fx",node="helix73.telcoqe.eng.rdu2.dc.redhat.com",process="ptp4l"} 1
      openshift_ptp_clock_state{iface="ens2fx",node="helix73.telcoqe.eng.rdu2.dc.redhat.com",process="ptp4l"} 0
      openshift_ptp_clock_state{iface="ens3fx",node="helix73.telcoqe.eng.rdu2.dc.redhat.com",process="ptp4l"} 0 
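The metric dump above can be checked mechanically. The following is a minimal sketch, not part of the ptp-operator or the CI suite: `parse_clock_states` is a hypothetical helper that reads Prometheus text-format lines and returns the state per interface and process. It assumes only the two state values that appear in this report (0 for FREERUN, 1 for LOCKED).

```python
import re

# One sample line of the Prometheus text format, for example:
# openshift_ptp_clock_state{iface="ens1fx",node="n",process="ptp4l"} 1
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)\s*$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Only the two values seen in this report are named here.
STATE_NAMES = {0: "FREERUN", 1: "LOCKED"}


def parse_clock_states(text):
    """Return {(iface, process): state} for openshift_ptp_clock_state samples."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # TYPE / # HELP comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != "openshift_ptp_clock_state":
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("iface"), labels.get("process"))
        states[key] = int(float(match.group("value")))
    return states
```

Fed the four samples above, the result maps `("CLOCK_REALTIME", "phc2sys")` and `("ens1fx", "ptp4l")` to 1, and the two upstream interfaces to 0, which is the mismatch this bug describes.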

      Version-Release number of selected component (if applicable):

      Cluster version is 4.19.0-0.nightly-2025-03-14-061055
      ptp-operator.v4.19.0-202503132339    

      How reproducible:

      100%    

      Steps to Reproduce:

          1.  Deploy a 4.19 SNO cluster with the PTP HA BC/OC configuration.
          2.  Bring down both the bc1 and bc2 slave interfaces (ens3f2 and ens2f0) with ip link set.
          3.  Monitor openshift_ptp_clock_state in the metrics.
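Step 2 could be scripted roughly as follows. This is a sketch only: `link_down_commands` and `bring_down` are hypothetical helpers, the interface names are taken from the PTP configs in this report, and actually running the commands requires root on the node. By default the sketch only returns the commands it would run.

```python
import subprocess


def link_down_commands(ifaces):
    """Build one `ip link set dev <iface> down` argv list per interface."""
    return [["ip", "link", "set", "dev", iface, "down"] for iface in ifaces]


def bring_down(ifaces, dry_run=True):
    """Run the link-down commands, or only return them when dry_run is True."""
    commands = link_down_commands(ifaces)
    for argv in commands:
        if not dry_run:
            # check=True raises CalledProcessError if ip exits non-zero
            subprocess.run(argv, check=True)
    return commands
```

For this environment the call would be `bring_down(["ens3f2", "ens2f0"], dry_run=False)`.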
          

      Actual results:

      CLOCK_REALTIME and the downstream slave report LOCKED while the bc1 and bc2 slave interfaces are down.

      Expected results:

      CLOCK_REALTIME should report FREERUN (0) while the bc1 and bc2 slave interfaces are down.
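The expectation can be written as a small rule. This sketch encodes only what the report asks for, not the daemon's actual logic: `expected_clock_realtime` is a hypothetical function, and the LOCKED branch (at least one HA profile still locked) is an assumption, since the report states only the case where both profiles are down.

```python
FREERUN, LOCKED = 0, 1  # state values as used in this report


def expected_clock_realtime(ha_profile_states):
    """Expected CLOCK_REALTIME state given {profile name: ptp4l state}.

    FREERUN when every HA profile's slave interface is in FREERUN (the case
    in this report); LOCKED when at least one is LOCKED (an assumption).
    """
    if not ha_profile_states:
        raise ValueError("at least one HA profile is required")
    if any(state == LOCKED for state in ha_profile_states.values()):
        return LOCKED
    return FREERUN
```

With both profiles down, `expected_clock_realtime({"bc1": 0, "bc2": 0})` gives 0, whereas the metrics in this report show 1.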

      Additional info:

      CI test log:

        should move to FREERUN state when active and inactive interfaces are down [73094, test_id:73094]
        /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:266
        > Enter [BeforeEach] PTP Events and Metrics - interface down - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:40 @ 03/16/25 22:28:15.646
      2025/03/16 22:28:20 Reached PTP clock state 1 for all interfaces
        < Exit [BeforeEach] PTP Events and Metrics - interface down - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:40 @ 03/16/25 22:28:20.826 (5.179s)
        > Enter [It] should move to FREERUN state when active and inactive interfaces are down - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:266 @ 03/16/25 22:28:20.826
      2025/03/16 22:28:20 found profile bc1 with status 1
      2025/03/16 22:28:20 found profile bc2 with status 0
        STEP: getting the active and inactive ha interfaces - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:283 @ 03/16/25 22:28:20.834
        STEP: checking the interfaces are not ocp interfaces - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:295 @ 03/16/25 22:28:21.234
        STEP: bringing active and inactive interfaces down - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:307 @ 03/16/25 22:28:21.234
      2025/03/16 22:28:22 ens3f2 is successfully set to down
      2025/03/16 22:28:23 ens2f0 is successfully set to down
        STEP: validating ptp4l clock states are FREERUN - /var/lib/jenkins/workspace/ocp-far-edge-vran-tests/cnf-gotests/test/ran/ptp/tests/ptp_interfaces.go:317 @ 03/16/25 22:28:23.695
      2025/03/16 22:28:49 Reached PTP clock state 0 for ens3fx for 10s seconds
      2025/03/16 22:29:04 Reached PTP clock state 0 for ens2fx for 10s seconds
      2025/03/16 22:30:14 PTP clock states metrics:
      ptp_clock_state is 1 for iface CLOCK_REALTIME and process phc2sys
      ptp_clock_state is 1 for iface ens1fx and process ptp4l
        [FAILED] Unexpected error:
            <*errors.errorString | 0xc001302780>: 
            openshift_ptp_clock_state has value 1 for ens1fx ptp4l, that is different than expected: 0
            {
                s: "openshift_ptp_clock_state has value 1 for ens1fx ptp4l, that is different than expected: 0",    
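The log lines "Reached PTP clock state 0 for ens3fx for 10s" suggest the test waits until a state has been held for a period. The following is a minimal sketch of such a check under that assumption; it is not the cnf-gotests implementation. `wait_for_stable_state` and its parameters are hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time


def wait_for_stable_state(read_state, expected, hold_seconds=10.0, timeout=120.0,
                          poll_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until it has returned `expected` continuously for
    hold_seconds. Return True on success, False if the timeout expires first."""
    deadline = clock() + timeout
    held_since = None  # time at which the current run of `expected` began
    while True:
        now = clock()
        if read_state() == expected:
            if held_since is None:
                held_since = now
            if now - held_since >= hold_seconds:
                return True
        else:
            held_since = None  # any other value resets the hold window
        if now >= deadline:
            return False
        sleep(poll_interval)
```

In the failing run, a check of this kind for ens1fx with an expected state of 0 would time out, because the metric stays at 1.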

       

              aputtur@redhat.com Aneesh Puttur
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Bonnie Block
              None