  1. OpenShift Builds
  2. BUILD-1146

TargetDown Alert on shared-resource-csi-driver-node-metrics Service if Cluster Monitoring is Enabled


    • Cause: Cluster monitoring is enabled in the openshift-builds namespace.
    • Consequence: A TargetDown alert is raised for the shared-resource-csi-driver-node-metrics Service.
    • Fix: The ServerName in the CSI driver metrics ServiceMonitor is updated to the correct address.
    • Result: Cluster monitoring can query the correct address.
    • Bug Fix
    • Done
    • Builds Sprint #19, Builds Sprint #20
    • 2
    • Moderate
    • Customer Facing
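
      The Fix entry above refers to the ServerName of the CSI driver metrics ServiceMonitor. As a rough illustration only, a ServiceMonitor of that kind might look like the sketch below. The port name, selector label, CA file path, and the exact serverName value are assumptions made for this example and are not taken from the operator's manifests; only the resource name and namespace come from the alert text. The serverName shown follows the usual <service>.<namespace>.svc form of OpenShift service serving certificates.

```yaml
# Illustrative sketch, not the operator's actual manifest.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: shared-resource-csi-driver-node-metrics   # name as it appears in the alert
  namespace: openshift-builds
spec:
  endpoints:
    - port: metrics                                # assumed port name
      scheme: https
      tlsConfig:
        # Assumed CA bundle path, commonly mounted into cluster monitoring Prometheus.
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
        # Must match a name in the Service's serving certificate; if it does not,
        # the TLS handshake fails and the target is reported as down.
        # Assumed value, following the <service>.<namespace>.svc convention.
        serverName: shared-resource-csi-driver-node-metrics.openshift-builds.svc
  selector:
    matchLabels:
      app: shared-resource-csi-driver-node         # assumed label
```

      If the serverName does not match the certificate presented by the metrics endpoint, every scrape fails, which would be consistent with 100% of the targets being reported as unreachable.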

      Description of problem:

      After installing the Builds for Red Hat OpenShift operator 1.1 on OCP 4.16.16, the following alert fires permanently:
      "100% of the shared-resource-csi-driver-node-metrics/shared-resource-csi-driver-node-metrics targets in openshift-builds namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support."
      Please find the attached must-gather.

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

      1. Install OCP 4.16.16
      2. Install Builds for Red Hat OpenShift 1.1
      3. Enable cluster monitoring in the openshift-builds namespace by adding the openshift.io/cluster-monitoring label:
      apiVersion: v1
      kind: Namespace
      metadata:
        labels:
          openshift.io/cluster-monitoring: "true"
        name: openshift-builds 

       

      Actual results:

      The following TargetDown alert is raised:

      "100% of the shared-resource-csi-driver-node-metrics/shared-resource-csi-driver-node-metrics targets in openshift-builds namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support."

      Expected results: No alert

      Reproducibility (Always/Intermittent/Only Once): Always

      Acceptance criteria: 

       

      Definition of Done:

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

      The must-gather is attached.


              avinkuma@redhat.com Avinal Kumar
              laurent.tourreau Laurent TOURREAU