OCPBUGS-55309

RH PTP pods exporting metrics for non-existing interfaces resulting in Grafana/Prometheus showing interfaces with openshift_ptp_offset_ns as 999999


    • Quality / Stability / Reliability
    • Important
    • 7/03: 4.14 code merged, but not verified
    • In Progress
    • Bug Fix
      *Cause*: The summary metrics were not being masked correctly.
      *Consequence*: When a port went into a faulty state, the part of the code responsible for setting the fault offset correctly masked the name. However, because the initial summary metrics weren't masked, a new interface appeared with a value that remained unchanged, effectively displaying incorrect/stale data.
      *Fix*: The masking/aliasing of the summary metrics was implemented.
      *Result*: The bug no longer occurs.
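
The cause and fix described above can be illustrated with a toy model. This is a minimal sketch of one reading of the release note, not the daemon's actual code: the `alias` rule, the `OffsetGauge` class, and the `run` helper are all hypothetical, and the offsets are made up. Only the 999999 value and the interface name come from this report.

```python
from typing import Callable, Dict

FAULTY_OFFSET_NS = 999999  # value reported in Grafana for the unknown interface


def alias(iface: str) -> str:
    """Hypothetical masking rule, for illustration only."""
    return f"alias({iface})"


class OffsetGauge:
    """Toy stand-in for a labelled gauge: one series per distinct label value."""

    def __init__(self) -> None:
        self.series: Dict[str, int] = {}

    def set(self, label: str, value: int) -> None:
        self.series[label] = value


def run(summary_label: Callable[[str], str]) -> Dict[str, int]:
    """Replay summary updates around one fault event and return all series."""
    gauge = OffsetGauge()
    iface = "ens8f0np0.20"
    gauge.set(summary_label(iface), 12)       # summary update before the fault
    gauge.set(alias(iface), FAULTY_OFFSET_NS)  # fault path: always masks the name
    for offset in (-7, 3):                     # summary updates after recovery
        gauge.set(summary_label(iface), offset)
    return gauge.series


buggy = run(lambda name: name)  # summary path does not mask the name
fixed = run(alias)              # summary path masks the name, as in the fix

print(buggy)
print(fixed)
```

In the unmasked run, the fault path creates a second series under the masked name that no later summary update touches, so it stays at 999999. In the masked run, both paths write to the same series and the next summary update overwrites the fault value.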

      Description of problem:

      The RH PTP pod linux-ptp-daemon-xx exports metrics for non-existent interfaces after it has been running for more than 2-3 hours.

      Version-Release number of selected component (if applicable):

      OCP v4.14

      How reproducible:

      Observed in a customer environment.

      Steps to Reproduce:

      1. Apply a PtpConfig that references VLAN interfaces, e.g. ens8f0np0.20, ens9f1np1.400.
      2. Wait for a few hours and watch the Prometheus/Grafana logs.
      3. See the non-existent interfaces reported as 999999 ns in Grafana.
          

      Actual results:

      Unknown interfaces are seen with a high openshift_ptp_offset_ns value.

      Expected results:

      Only the interfaces mentioned in the PtpConfig are reported.
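
One way to check the expected result against a scrape is to list every openshift_ptp_offset_ns series whose interface is not in the PtpConfig. The following is a minimal sketch under several assumptions: the `interface` label name is taken from the query used in this report, the sample text is made up (including the `phantom0` name and the `node` label), and `unexpected_interfaces` is a hypothetical helper that handles only simple label sets in the Prometheus text format.

```python
import re
from typing import Dict, Iterable

# Matches one sample line of the metric, e.g. name{labels} value
SAMPLE_RE = re.compile(r'^openshift_ptp_offset_ns\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def unexpected_interfaces(exposition: str, configured: Iterable[str]) -> Dict[str, float]:
    """Return {interface: value} for series whose interface is not configured."""
    allowed = set(configured)
    found: Dict[str, float] = {}
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # comments, blank lines, and other metrics are skipped
        labels = dict(LABEL_RE.findall(match.group("labels")))
        iface = labels.get("interface")
        if iface is not None and iface not in allowed:
            found[iface] = float(match.group("value"))
    return found


# Made-up scrape output for illustration only.
sample = """
# TYPE openshift_ptp_offset_ns gauge
openshift_ptp_offset_ns{interface="ens8f0np0.20",node="worker-0"} -4
openshift_ptp_offset_ns{interface="ens9f1np1.400",node="worker-0"} 7
openshift_ptp_offset_ns{interface="phantom0",node="worker-0"} 999999
"""

result = unexpected_interfaces(sample, ["ens8f0np0.20", "ens9f1np1.400"])
print(result)
```

An empty result would correspond to the expected behaviour; any entry is a series that the PtpConfig does not account for.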

      Additional info:

      As a workaround, I suggested that they create a node-level Prometheus filter rule, which gave them the correct metrics:
      openshift_ptp_offset_ns{interface=~"ens9f1np1.400|ens8f0np0.20"}
      
      This confirms that incorrect metrics are also being exported by ptp4l for some unknown interfaces.
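
A side note on the workaround query: `=~` takes a regular expression that is matched against the whole label value, and an unescaped `.` matches any character, so the filter is slightly broader than the two VLAN names. The sketch below shows how an exact-match selector could be built from a list of interface names. The helper names (`interface_pattern`, `promql_selector`, `matches`) and the look-alike name `ens9f1np1x400` are hypothetical; Python's `re.fullmatch` is used only to approximate the fully anchored matching.

```python
import re
from typing import Iterable


def interface_pattern(names: Iterable[str]) -> str:
    """Regex alternation that matches exactly the given interface names."""
    return "|".join(re.escape(name) for name in names)


def promql_selector(names: Iterable[str]) -> str:
    """Selector text; backslashes are doubled for the double-quoted string."""
    quoted = interface_pattern(names).replace("\\", "\\\\")
    return f'openshift_ptp_offset_ns{{interface=~"{quoted}"}}'


def matches(pattern: str, value: str) -> bool:
    """Whole-value match, approximating how =~ is anchored."""
    return re.fullmatch(pattern, value) is not None


names = ["ens9f1np1.400", "ens8f0np0.20"]
original = "ens9f1np1.400|ens8f0np0.20"  # pattern from the workaround above
escaped = interface_pattern(names)

print(matches(original, "ens9f1np1x400"))  # unescaped dot accepts any character
print(matches(escaped, "ens9f1np1x400"))
print(promql_selector(names))
```

The original query still selects both configured interfaces, so the workaround holds; escaping only matters if a look-alike name could appear.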

              micosta@redhat.com Michele Tomaso Costa
              rhn-support-adubey Akash Dubey
              Bonnie Block Bonnie Block