  1. OpenShift Bugs
  2. OCPBUGS-72596

Getting node-tuning-operator down alert in HCP cluster

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.19
    • HyperShift

      Description of problem:

      An alert reporting node-tuning-operator as down fires for all HCP (hosted control plane) clusters.

      Version-Release number of selected component (if applicable):

      4.19   

      How reproducible:

       Always

      Additional Details:

      The hub cluster reports the node-tuning-operator as down for every hosted cluster. However, the operator is up and working fine in the hosted clusters.

       

      $ oc get servicemonitor/node-tuning-operator -n clusters-test -ojson | jq .spec.selector
      {
        "matchLabels": {
          "hypershift.openshift.io/control-plane-component": "cluster-node-tuning-operator",
          "name": "node-tuning-operator"
        }
      }
      
      $ oc get svc -l hypershift.openshift.io/control-plane-component=cluster-node-tuning-operator -n clusters-test
      NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
      node-tuning-operator   ClusterIP   None         <none>        60000/TCP   164m 
      
      $ oc get svc -l hypershift.openshift.io/control-plane-component=cluster-node-tuning-operator -n clusters-test
      NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
      node-tuning-operator   ClusterIP   None         <none>        60000/TCP   19h
      
      $ oc -n clusters-test rsh cluster-node-tuning-operator-867f8dcc55-9lv6g
      
      sh-5.1$ curl localhost:60000
      curl: (7) Failed to connect to localhost port 60000: Connection refused
      
      sh-5.1$ ss -tunlp 
      Netid              State               Recv-Q              Send-Q                           Local Address:Port                           Peer Address:Port             Process             
      tcp                LISTEN              0                   0                                            *:8080                                      *:*                 users:(("cluster-node-tu",pid=1,fd=6))

      In short, the operator process listens only on port 8080 and serves metrics at :8080/metrics; nothing listens on port 60000, the port the Service exposes.
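The port check in the transcript above can be scripted so it is easy to repeat across hosted clusters. The following is a minimal sketch, not an official tool: `port_is_listening` is a hypothetical helper name, and it assumes the column layout of the `ss -tunlp` output shown in this report (State in column 2, Local Address:Port in column 5).

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or ss): read `ss -tunlp` style output
# on stdin and succeed only if a LISTEN socket is bound to the given port.
port_is_listening() {
    port="$1"
    # Assumed layout: column 2 is State, column 5 is Local Address:Port.
    awk -v port="$port" '
        $2 == "LISTEN" {
            # Take the text after the last ":" so "*:8080" and "[::]:8080" both work.
            n = split($5, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (requires cluster access; pod name taken from this report):
#   oc -n clusters-test exec cluster-node-tuning-operator-867f8dcc55-9lv6g -- ss -tunlp \
#     | port_is_listening 60000 || echo "nothing is listening on 60000"
```

The helper exits non-zero when the port is absent, so it can be used directly in an `if` or `||` chain.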

       

      My assumption is that the fix should be made in the Service and ServiceMonitor YAML linked below:

      https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/assets/cluster-node-tuning-operator/service.yaml#L10-L12

      https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/assets/cluster-node-tuning-operator/servicemonitor.yaml#L14
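Before changing either manifest, it may help to see which container port each Service port actually forwards to. A small sketch, assuming `jq` is installed (it is already used earlier in this report) and that the object follows the standard Kubernetes Service schema; `service_port_map` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read a Service object as JSON on stdin and print one
# "name port targetPort" line per entry in .spec.ports.
# An unnamed port prints "-"; a missing targetPort falls back to port.
service_port_map() {
    jq -r '.spec.ports[] | "\(.name // "-") \(.port) \(.targetPort // .port)"'
}

# Example (requires cluster access):
#   oc get svc node-tuning-operator -n clusters-test -o json | service_port_map
```

If the targetPort printed for the metrics port is not the port the operator listens on, scrapes through the ServiceMonitor would fail even though the operator itself is healthy.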

       

       

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              rhn-support-hchaturv Himank Chaturvedi
              Jim Ma