OpenShift Bugs: OCPBUGS-20017

linuxptp-daemon-container log flooded with 'uds port: management forward failed' entries


Details

    • Important
    • No
    • CNF RAN Sprint 243
    • 1
    • False
    • None
    • N/A
    • Release Note Not Required
    • 9/28: PR ready.
      9/26: Built a dev version that updates PMC with b0 and deployed it to a QE node to verify that it fixes the UDS flooding. An upstream ptp4l patch was also created to change this log message to debug level.
      9/21: The issue was reported only in 4.14 and is reproducible only on qe22.
      9/19: In discussion with the linuxptp team to understand usage of the event socket.
      9/14: A new test build with the uds_ro_address update for clock class events is available; spike task in progress.
      9/11: Spike task in progress to use the uds_ro_address ptp event socket to subscribe to clock class changes.
      9/2/2023: The PMC command is causing this; the code has been refactored to reduce PMC calls. More testing is required this week. Yellow.
      8/28: Triaged the bug; the issue is with the PMC call in BC/OC. PR will be ready this week. Green.
      8/23: Assigned to QE to verify whether this is still an issue after OCPBUGS-17971 is merged.
      8/14: Potentially a duplicate of OCPBUGS-17539. We will retest once that bug is addressed.
      8/09: Jack is investigating further.
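The 9/26 note mentions updating PMC "with b0". Assuming this refers to the `pmc` option `-b 0` (boundary hops, whose `pmc` default is 1), a management query sent with zero boundary hops is handled by the local ptp4l instance instead of being forwarded to its other ports, and in linuxptp's clock code the "uds port: management forward failed" message appears to be logged on that forwarding path. The following is a minimal sketch of such a query, not the daemon's actual code: the function name `query_parent_data_set` and the `PMC_BIN` override are hypothetical, while the domain number 24 and the socket path /var/run/ptp4l come from the ptpconfig in this issue (the daemon may use a different per-profile socket path).

```shell
#!/usr/bin/env bash
# Hypothetical helper: query ptp4l's PARENT_DATA_SET, which includes the
# grandmaster clock class (gm.ClockClass), over the local UDS socket.
#   -u      use the UNIX domain socket transport
#   -b 0    boundary hops 0: ptp4l answers locally and does not forward
#           the management message to its other ports
#   -d 24   domainNumber from the boundary PtpConfig
#   -s PATH ptp4l UDS server address (uds_address in ptp4lConf)
# PMC_BIN is an illustrative override for dry runs; it defaults to pmc.
query_parent_data_set() {
  local socket="${1:-/var/run/ptp4l}"
  "${PMC_BIN:-pmc}" -u -b 0 -d 24 -s "$socket" 'GET PARENT_DATA_SET'
}
```

On a node, `query_parent_data_set /var/run/ptp4l` runs the query; with `PMC_BIN=echo` the function only prints the argument list, which makes it easy to check the flags without a running ptp4l.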

    Description

      This is a clone of issue OCPBUGS-17513. The following is the description of the original issue:

      Description of problem:

      On a 4.14 setup, the linuxptp-daemon-container log is flooded with `uds port: management forward failed` entries (91192 in ~8h).
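      For scale, assuming the ~8h window is exactly 8 hours, the reported count works out to roughly three entries per second:

```latex
\text{rate} \approx \frac{91192\ \text{entries}}{8 \times 3600\ \text{s}}
            = \frac{91192\ \text{entries}}{28800\ \text{s}}
            \approx 3.17\ \text{entries/s}
            \quad (= 11399\ \text{entries/h})
```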

      Version-Release number of selected component (if applicable):

      ptp-operator.v4.14.0-202308022126

      How reproducible:

      100%

      Steps to Reproduce:

      1. oc -n openshift-ptp logs ds/linuxptp-daemon -c linuxptp-daemon-container | grep 'uds port: management forward failed' | wc -l  

      Actual results:

      24373

      Expected results:

      No failures.

      Additional info:

      There are also a couple of `send failed: <nil>` occurrences, though these are less frequent.
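      The reproduction step counts only the first message. A minimal sketch that counts both messages named in this report from one log read on stdin follows; the function name `count_ptp_failures` is hypothetical, and the pipe from `oc logs` in the comment is the same command used in Steps to Reproduce.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count each failure message from this bug in a log
# read from stdin. Example:
#   oc -n openshift-ptp logs ds/linuxptp-daemon \
#     -c linuxptp-daemon-container | count_ptp_failures
count_ptp_failures() {
  local log msg n
  log="$(cat)"
  for msg in 'uds port: management forward failed' 'send failed: <nil>'; do
    # grep -c prints 0 (and exits 1) when nothing matches; ignore that status.
    n="$(printf '%s\n' "$log" | grep -cF -- "$msg" || true)"
    # Output: <count><TAB><message>
    printf '%s\t%s\n' "$n" "$msg"
  done
}
```

Reading the log once and matching with `grep -F` keeps the messages as fixed strings, so characters such as `<` and `>` in `send failed: <nil>` need no escaping.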
      
      ptpconfig:
      
      apiVersion: v1
      items:
      - apiVersion: ptp.openshift.io/v1
        kind: PtpConfig
        metadata:
          creationTimestamp: "2023-08-09T10:14:23Z"
          generation: 3
          name: boundary
          namespace: openshift-ptp
          resourceVersion: "75060"
          uid: 88154072-0caf-42a0-bdc4-66f5cce2c030
        spec:
          profile:
          - name: boundary
            phc2sysOpts: -a -r -n 24
            ptp4lConf: |
              [ens1f1]
              masterOnly 0
              [ens1f0]
              masterOnly 1
              [ens1f2]
              masterOnly 1
              [ens1f3]
              masterOnly 1
              [global]
              #
              # Default Data Set
              #
              twoStepFlag 1
              slaveOnly 0
              priority1 128
              priority2 128
              domainNumber 24
              #utc_offset 37
              clockClass 248
              clockAccuracy 0xFE
              offsetScaledLogVariance 0xFFFF
              free_running 0
              freq_est_interval 1
              dscp_event 0
              dscp_general 0
              dataset_comparison G.8275.x
              G.8275.defaultDS.localPriority 128
              #
              # Port Data Set
              #
              logAnnounceInterval -3
              logSyncInterval -4
              logMinDelayReqInterval -4
              logMinPdelayReqInterval -4
              announceReceiptTimeout 3
              syncReceiptTimeout 0
              delayAsymmetry 0
              fault_reset_interval -4
              neighborPropDelayThresh 20000000
              masterOnly 0
              G.8275.portDS.localPriority 128
              #
              # Run time options
              #
              assume_two_step 0
              logging_level 6
              path_trace_enabled 0
              follow_up_info 0
              hybrid_e2e 0
              inhibit_multicast_service 0
              net_sync_monitor 0
              tc_spanning_tree 0
              tx_timestamp_timeout 50
              unicast_listen 0
              unicast_master_table 0
              unicast_req_duration 3600
              use_syslog 1
              verbose 0
              summary_interval 0
              kernel_leap 1
              check_fup_sync 0
              clock_class_threshold 7
              #
              # Servo Options
              #
              pi_proportional_const 0.0
              pi_integral_const 0.0
              pi_proportional_scale 0.0
              pi_proportional_exponent -0.3
              pi_proportional_norm_max 0.7
              pi_integral_scale 0.0
              pi_integral_exponent 0.4
              pi_integral_norm_max 0.3
              step_threshold 2.0
              first_step_threshold 0.00002
              max_frequency 900000000
              clock_servo pi
              sanity_freq_limit 200000000
              ntpshm_segment 0
              #
              # Transport options
              #
              transportSpecific 0x0
              ptp_dst_mac 01:1B:19:00:00:00
              p2p_dst_mac 01:80:C2:00:00:0E
              udp_ttl 1
              udp6_scope 0x0E
              uds_address /var/run/ptp4l
              #
              # Default interface options
              #
              clock_type BC
              network_transport L2
              delay_mechanism E2E
              time_stamping hardware
              tsproc_mode filter
              delay_filter moving_median
              delay_filter_length 10
              egressLatency 0
              ingressLatency 0
              boundary_clock_jbod 0
              #
              # Clock description
              #
              productDescription ;;
              revisionData ;;
              manufacturerIdentity 00:00:00
              userDescription ;
              timeSource 0xA0
            ptp4lOpts: -2 --summary_interval -4
            ptpSchedulingPolicy: SCHED_FIFO
            ptpSchedulingPriority: 10
            ptpSettings:
              logReduce: "true"
          recommend:
          - match:
            - nodeLabel: node-role.kubernetes.io/master
            priority: 4
            profile: boundary
      kind: List
      metadata:
        resourceVersion: ""

    People

      Jack Ding (jacding@redhat.com)
      OpenShift Prow Bot (openshift-crt-jira-prow)
      Marius Cornea
      Votes: 0
      Watchers: 5
