  1. Red Hat Advanced Cluster Management
  2. ACM-12707

observabilityaddon degraded due to tls: failed to verify certificate: x509: certificate signed by unknown authority on newly installed SNOs during ZTP Scale Testing


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • ACM 2.11.0
    • Observability
    • 1
    • False
    • None
    • False
    • Moderate
    • None

      Description of problem:

      We observed that 3 of 3628 managed SNOs show the observabilityaddon as degraded, as shown below.

      # oc get observabilityaddon -A -ojson | jq -r '.items[] | "\(.status.conditions[] | select(.type=="Degraded" and .status=="True").lastTransitionTime) \(.metadata.namespace)"'

      2024-07-11T20:02:51Z vm01681
      2024-07-11T19:34:23Z vm03095
      2024-07-11T16:47:40Z vm03544


      These three clusters are not shown in the Grafana UI. In the metrics-collector pod log (metrics-collector-deployment_pod.log), we see:

      level=error caller=logger.go:60 ts=2024-07-11T19:34:23.212608332Z component=collectrule/evaluator msg="failed to evaluate collect rule" err="Get \"https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=%281+-avg%28rate%28node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B5m%5D%29%29%29%2A+100+%3E+70\": tls: failed to verify certificate: x509: certificate signed by unknown authority" rule="(1 - avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100 > 70"
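
To gauge how often the collector hits this error, the attached log can be scanned for the x509 message. This is a minimal sketch that assumes every line follows the format shown above, with `ts=` immediately followed by `component=`. The function name `x509_failures` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print the timestamp and component of every log line
# that reports the x509 verification failure. Reads the log on stdin.
x509_failures() {
  grep 'x509: certificate signed by unknown authority' |
    sed -n 's/.*ts=\([^ ]*\) component=\([^ ]*\).*/\1 \2/p'
}

# Usage (file name taken from the attachment list):
#   x509_failures < metrics-collector-deployment_pod.log
```

The first failing timestamp can then be compared with the Degraded lastTransitionTime reported for the same cluster.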

      After talking to rh-ee-coquadro, it was suggested that we delete the observability-controller-open-cluster-management.io-observability-signer-client-cert. Once it was deleted, the pod was recreated and the cluster connected to the observability server.
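
The workaround could be scripted roughly as follows. This is only a sketch under assumptions the report does not state: that the certificate is stored as a Secret, and that it lives in the `open-cluster-management-addon-observability` namespace on the affected managed cluster. The helper names are hypothetical. Verify both assumptions on your cluster before running anything.

```shell
# Assumptions, not stated in the report: the certificate is a Secret in the
# open-cluster-management-addon-observability namespace of the managed cluster.
# Override OC or NS if your setup differs.
OC="${OC:-oc}"
NS="${NS:-open-cluster-management-addon-observability}"
CERT_NAME="observability-controller-open-cluster-management.io-observability-signer-client-cert"

# Hypothetical helper: delete the client certificate so that it is reissued.
delete_signer_client_cert() {
  "$OC" delete secret "$CERT_NAME" -n "$NS"
}

# Hypothetical helper: list the addon pods to confirm that a new one started.
list_addon_pods() {
  "$OC" get pods -n "$NS"
}
```

Run `delete_signer_client_cert` with a kubeconfig that points at the affected SNO, then call `list_addon_pods` to check that the metrics-collector pod was recreated.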

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:

      3. ...

      Actual results:

      Expected results:

      Additional info:

        1. hub-acm-must-gather.tar.gz (105.41 MB, Alex Krzos)
        2. hub-acm-must-gather-2.tar.gz (121.61 MB, Alex Krzos)
        3. metrics-collector-deployment_pod.log (24.01 MB, Ting Xue)
        4. vm00024-obs-acm-must-gather.tar.gz (3.20 MB, Alex Krzos)
        5. vm00024-obs-ocp-must-gather.tar.gz (31.64 MB, Alex Krzos)
        6. vm00767-obsaddon-degraded-must-gather.tar.gz (19.28 MB, Alex Krzos)
        7. vm01296-obsaddon-degraded-must-gather.tar.gz (19.01 MB, Alex Krzos)

              rh-ee-coquadro Coleen Iona Quadros
              rhn-support-txue Ting Xue
              Xiang Yin Xiang Yin
              ACM QE Team
