RH-SSO / RHSSO-3277

'rhsso-operator-metrics/rhsso-operator-metrics targets' alert after updating to 'rhsso-operator.7.6.12-opr-001'


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 7.6.12
    • OpenShift - xPaaS
    • None
    • False
    • False

      After updating to 'rhsso-operator.7.6.12-opr-001', the following alert is reported:

      - alerts:
        - activeAt: "2025-08-21T14:07:13.890274457Z"
          annotations:
            description: 100% of the rhsso-operator-metrics/rhsso-operator-metrics targets
              in nn-rhsso namespace have been unreachable for more than 15 minutes. This
              may be a symptom of network connectivity issues, down nodes, or failures within
              these components. Assess the health of the infrastructure and nodes running
              these targets and then contact support.
            runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/TargetDown.md
            summary: Some targets were not reachable from the monitoring server for an extended
              period of time.
          labels:
            alertname: TargetDown
            job: rhsso-operator-metrics
            namespace: nn-rhsso
            service: rhsso-operator-metrics
            severity: warning
          state: firing
          value: "1e+02"
        annotations:
          description: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service
            }} targets in {{ $labels.namespace }} namespace have been unreachable for more
            than 15 minutes. This may be a symptom of network connectivity issues, down
            nodes, or failures within these components. Assess the health of the infrastructure
            and nodes running these targets and then contact support.'
          runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/TargetDown.md
          summary: Some targets were not reachable from the monitoring server for an extended
            period of time.
        duration: 900
        evaluationTime: 0.014825242
        health: ok
        keepFiringFor: 0
        labels:
          severity: warning
        lastEvaluation: "2025-08-26T12:19:43.89164352Z"
        name: TargetDown
        query: 100 * ((1 - sum by (job, namespace, service) (up and on (namespace, pod)
          kube_pod_info) / count by (job, namespace, service) (up and on (namespace, pod)
          kube_pod_info)) or (count by (job, namespace, service) (up == 0) / count by (job,
          namespace, service) (up))) > 10
        state: firing
        type: alerting 
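The reported value of 1e+02 follows from the second branch of the rule expression, 100 * count(up == 0) / count(up), compared against the threshold of 10. A minimal sketch of that arithmetic, where `target_down_pct` is a hypothetical helper and the count of two targets is an assumption (one per Service port), not a figure stated in the alert:

```shell
#!/bin/sh
# Hypothetical helper mirroring one branch of the TargetDown expression:
#   100 * count(up == 0) / count(up)
# $1 = number of targets with up == 0, $2 = total number of targets.
target_down_pct() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.4g\n", 100 * d / t }'
}

# The rule fires when the percentage is strictly greater than 10.
target_down_fires() {
  awk -v p="$(target_down_pct "$1" "$2")" 'BEGIN { exit !(p > 10) }'
}

# Assumed scenario: both scrape targets of the job are down.
target_down_pct 2 2
```

Whenever every target of a job is down, the result is 100 regardless of how many targets there are, which matches the "100% of the ... targets" wording in the description.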

       

      The problem is that the Service and the ServiceMonitor have ports tcp/8383 and tcp/8686 configured, but the rhsso-operator only listens on tcp/8081:

      • service
        apiVersion: v1
        kind: Service
        metadata:
          creationTimestamp: "2025-09-03T15:38:19Z"
          labels:
            monitoring-key: middleware
            name: rhsso-operator
          name: rhsso-operator-metrics
          namespace: <namespace>
          ownerReferences:
          - apiVersion: apps/v1
            blockOwnerDeletion: true
            controller: true
            kind: Deployment
            name: rhsso-operator
            uid: 9b0716b2-5477-4527-8b7b-3a1abfbd306c
          resourceVersion: "1154230"
          uid: cd447a64-7e8a-4dcd-ae42-091d9ed254d6
        spec:
          clusterIP: 172.30.73.224
          clusterIPs:
          - 172.30.73.224
          internalTrafficPolicy: Cluster
          ipFamilies:
          - IPv4
          ipFamilyPolicy: SingleStack
          ports:
          - name: http-metrics
            port: 8383
            protocol: TCP
            targetPort: 8383
          - name: cr-metrics
            port: 8686
            protocol: TCP
            targetPort: 8686
          selector:
            name: rhsso-operator
          sessionAffinity: None
          type: ClusterIP
      • operator listening ports:

      oc debug node/<node>
      sh-5.1# chroot /host
      sh-5.1# NS=<namespace>
      sh-5.1# POD=<rhsso-operator-pod-name>
      sh-5.1# POD_ID=$( crictl pods --namespace=$NS --name=$POD -o json | jq -r '.items[].id' )
      sh-5.1# crictl inspectp $POD_ID | jq -r '.info.runtimeSpec.linux.namespaces[] | select( .type=="network" ) | .path'
      /var/run/netns/d900d288-f7c9-4ab0-b584-4e92928a3c43
      sh-5.1# nsenter --net=/var/run/netns/d900d288-f7c9-4ab0-b584-4e92928a3c43 ss -tulpn
      Netid    State     Recv-Q    Send-Q        Local Address:Port         Peer Address:Port    Process
      tcp      LISTEN    0         4096                      *:8081                    *:*        users:(("manager",pid=3503301,fd=3))

      • In previous versions, for example 'rhsso-operator.7.6.11-opr-006', the rhsso-operator listens on tcp/8383 and tcp/8686, the ports defined in the Service and ServiceMonitor:
      NS=<namespace>
      POD=<rhsso-operator-pod-name>
      
      sh-5.1# POD_ID=$( crictl pods --namespace=$NS --name=$POD -o json |  jq -r '.items[].id' )
      sh-5.1# crictl inspectp $POD_ID | jq -r '.info.runtimeSpec.linux.namespaces[] | select( .type=="network" ) | .path'
      /var/run/netns/c486e14c-3c50-4a93-b541-6679a12e92c1
      sh-5.1# nsenter --net=/var/run/netns/c486e14c-3c50-4a93-b541-6679a12e92c1 ss -tulpn 
      Netid    State     Recv-Q    Send-Q        Local Address:Port         Peer Address:Port    Process                                          
      tcp      LISTEN    0         4096                      *:8383                    *:*        users:(("keycloak-operat",pid=3482046,fd=5))    
      tcp      LISTEN    0         4096                      *:8686                    *:*        users:(("keycloak-operat",pid=3482046,fd=6))  
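The comparison between the Service targetPorts and the listeners reported by `ss -tulpn` can be scripted. The sketch below is illustrative only: `missing_ports` is a hypothetical helper, and the sample input is the 7.6.12 `ss` output from this report with whitespace collapsed. It only considers TCP sockets in the LISTEN state.

```shell
#!/bin/sh
# Hypothetical helper: print the expected ports that have no TCP listener.
# Usage: <ss -tulpn output> | missing_ports "<space-separated ports>"
missing_ports() {
  expected="$1"
  missing=""
  # Field 5 of `ss -tulpn` is Local Address:Port; keep what follows the last ':'.
  listening=$(awk '$2 == "LISTEN" { n = split($5, a, ":"); print a[n] }')
  for port in $expected; do
    # -x matches whole lines, so 8383 is not satisfied by e.g. 18383.
    if ! printf '%s\n' "$listening" | grep -qxF "$port"; then
      missing="${missing:+$missing }$port"
    fi
  done
  printf '%s\n' "$missing"
}

# Sample data from the report: listener seen with rhsso-operator.7.6.12-opr-001.
ss_7_6_12='Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp LISTEN 0 4096 *:8081 *:* users:(("manager",pid=3503301,fd=3))'

# Ports 8383 and 8686 are the targetPort values of rhsso-operator-metrics.
printf '%s\n' "$ss_7_6_12" | missing_ports "8383 8686"
```

With the sample input, both Service ports are reported as missing. On a live cluster the expected ports could be read with `oc get service rhsso-operator-metrics -n <namespace> -o jsonpath='{.spec.ports[*].targetPort}'` instead of being hard-coded.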

       

      Thank you!

       

              Assignee: Unassigned
              Reporter: Maria Del Mar Alonso (rhn-support-malonso)