Bug
Resolution: Unresolved
Normal
None
4.19
None
Description of problem:
An alert reports the node-tuning operator as down for all HCP clusters.
Version-Release number of selected component (if applicable):
4.19
How reproducible:
Always
Additional Details:
The hub cluster shows the node tuning operator as down for all hosted clusters. However, the operator is up and working fine in the hosted clusters.
$ oc get servicemonitor/node-tuning-operator -n clusters-test -ojson | jq .spec.selector
{
  "matchLabels": {
    "hypershift.openshift.io/control-plane-component": "cluster-node-tuning-operator",
    "name": "node-tuning-operator"
  }
}

$ oc get svc -l hypershift.openshift.io/control-plane-component=cluster-node-tuning-operator -n clusters-test
NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
node-tuning-operator   ClusterIP   None         <none>        60000/TCP   164m

$ oc get svc -l hypershift.openshift.io/control-plane-component=cluster-node-tuning-operator -n clusters-test
NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
node-tuning-operator   ClusterIP   None         <none>        60000/TCP   19h

$ oc -n clusters-test rsh cluster-node-tuning-operator-867f8dcc55-9lv6g
sh-5.1$ curl localhost:60000
curl: (7) Failed to connect to localhost port 60000: Connection refused
sh-5.1$ ss -tunlp
Netid  State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
tcp    LISTEN  0       0       *:8080              *:*                users:(("cluster-node-tu",pid=1,fd=6))
In short, the metrics endpoint is served at :8080/metrics, not on port 60000, which is the port the Service exposes.
My assumption is that the fix should be made in the Service YAML.
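
The port mismatch described in this report can be checked mechanically. The following is a minimal sketch, not part of any operator tooling: `port_is_listening` is a hypothetical helper name, and it assumes `ss -tunlp`-style output in which the state is the second column and the local address:port is the fifth. The sample input is the `ss` output captured from the pod in this report.

```shell
# Hypothetical helper: succeeds if the given port shows up as a LISTEN socket
# in `ss -tunlp`-style output read from stdin.
port_is_listening() {
  awk -v p="$1" '
    $2 == "LISTEN" {
      # Column 5 is Local Address:Port; the port is the last ":"-separated part.
      n = split($5, parts, ":")
      if (parts[n] == p) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Sample copied from the pod output in this report.
ss_output='Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp LISTEN 0 0 *:8080 *:* users:(("cluster-node-tu",pid=1,fd=6))'

# Compare the port the Service exposes (60000) with the one actually bound (8080).
for port in 60000 8080; do
  if printf '%s\n' "$ss_output" | port_is_listening "$port"; then
    echo "port $port: listening"
  else
    echo "port $port: not listening"
  fi
done
```

Against a live pod, the same function could be fed directly, for example `oc -n clusters-test rsh cluster-node-tuning-operator-867f8dcc55-9lv6g ss -tunlp | port_is_listening 60000`, assuming `ss` is available in the container as it was here.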