Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: 4.22.0
Affects Version/s: 4.18, 4.19, 4.20
Component/s: Node Tuning Operator
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Moderate
Regression:
None

Target Backport Versions:

4.18, 4.19, 4.20
Target Version:

4.21.z
Release Blocker:
None
Sprint:
None

RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
Done
Release Note Type:
Bug Fix
Release Note Text:

* Cause: The cluster-node-tuning-operator Pod of a hosted cluster does not listen on port 60000.
* Consequence: No metrics are available for the cluster-node-tuning-operator Pod in hosted clusters.
* Fix: Enable the NTO metrics server in HyperShift hosted clusters.
* Result: Metrics are available for the cluster-node-tuning-operator Pod in hosted clusters.

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

The cluster-node-tuning-operator Pod of the hosted cluster, located in the clusters-<HostedCluster> namespace, is not listening on port 60000. As a result, the scrape target defined by the node-tuning-operator ServiceMonitor is down.

$ oc -n clusters-yhe-hosted get servicemonitor node-tuning-operator -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2025-04-25T10:24:25Z"
  generation: 1
  name: node-tuning-operator
  namespace: clusters-yhe-hosted
  ownerReferences:
  - apiVersion: hypershift.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: HostedControlPlane
    name: yhe-hosted
    uid: 3b3d9487-8360-45e1-90ef-5ceab0782bc5
  resourceVersion: "97791"
  uid: 4ef2e56b-ce66-42cd-9ccd-41b5e9adb5b0
spec:
  endpoints:
  - metricRelabelings:
    - action: keep
      regex: nto_profile_calculated_total
      sourceLabels:
      - __name__
    - action: replace
      replacement: 63bed89b-b7bc-4687-9737-5e509989b333
      targetLabel: _id
    path: /metrics
    relabelings:
    - action: replace
      replacement: 63bed89b-b7bc-4687-9737-5e509989b333
      targetLabel: _id
    scheme: https
    targetPort: 60000
    tlsConfig:
      ca:
        configMap:
          key: ca.crt
          name: root-ca
      cert:
        secret:
          key: tls.crt
          name: metrics-client
      keySecret:
        key: tls.key
        name: metrics-client
      serverName: node-tuning-operator.clusters-yhe-hosted.svc
  namespaceSelector:
    matchNames:
    - clusters-yhe-hosted
  selector:
    matchLabels:
      hypershift.openshift.io/control-plane-component: cluster-node-tuning-operator
      name: node-tuning-operator 

$ oc -n clusters-yhe-hosted get svc node-tuning-operator -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2025-04-25T10:08:19Z"
  labels:
    hypershift.openshift.io/control-plane-component: cluster-node-tuning-operator
    name: node-tuning-operator
  name: node-tuning-operator
  namespace: clusters-yhe-hosted
  ownerReferences:
  - apiVersion: hypershift.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: HostedControlPlane
    name: yhe-hosted
    uid: 3b3d9487-8360-45e1-90ef-5ceab0782bc5
  resourceVersion: "87670"
  uid: b6469241-f657-4f73-ac54-aaa144c617e4
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: metrics
    port: 60000
    protocol: TCP
    targetPort: 60000
  selector:
    name: cluster-node-tuning-operator
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

$ oc -n clusters-yhe-hosted get pod cluster-node-tuning-operator-87c75fd87-7shj4 -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP            NODE                                            NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-87c75fd87-7shj4   1/1     Running   1          21h   10.129.0.97   ip-10-0-89-80.ap-northeast-1.compute.internal   <none>           <none>

$ oc -n clusters-yhe-hosted rsh cluster-node-tuning-operator-87c75fd87-7shj4
sh-5.1$ ss -tnlp
State    Recv-Q   Send-Q     Local Address:Port       Peer Address:Port   Process
LISTEN   0        0                      *:8080                  *:*       users:(("cluster-node-tu",pid=1,fd=7))

$ oc -n openshift-user-workload-monitoring rsh prometheus-user-workload-0
sh-5.1$ curl -k --cacert /etc/prometheus/certs/1_clusters-yhe-hosted_root-ca_ca.crt --cert /etc/prometheus/certs/0_clusters-yhe-hosted_metrics-client_tls.crt --key /etc/prometheus/certs/0_clusters-yhe-hosted_metrics-client_tls.key --resolve node-tuning-operator.clusters-yhe-hosted.svc:60000:10.129.0.97 https://node-tuning-operator.clusters-yhe-hosted.svc:60000/metrics
curl: (7) Failed to connect to node-tuning-operator.clusters-yhe-hosted.svc port 60000: Connection refused

Version-Release number of selected component (if applicable):

N/A

How reproducible:

Always

Steps to Reproduce:

1. Create a hosted cluster
2. Check which ports the cluster-node-tuning-operator Pod is listening on (for example, with ss -tnlp inside the Pod)

Actual results:

Only port 8080 is shown

Expected results:

Ports 8080, 4343, and 60000 are shown
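
The comparison between the listening ports and the expected ones can be scripted. The following is a minimal sketch, assuming POSIX sh and awk are available where the output is processed; missing_ports is a hypothetical helper name used for illustration only and is not part of oc or the Node Tuning Operator.

```shell
# missing_ports PORT...
# Hypothetical helper: reads `ss -tnlp` output on stdin and prints every
# expected port (given as arguments) that has no LISTEN socket.
missing_ports() {
  # The 4th column of `ss -tnlp` is "Local Address:Port"; keep the part
  # after the last colon so IPv6 addresses such as [::]:8080 also work.
  listening=$(awk '$1 == "LISTEN" { n = split($4, a, ":"); print a[n] }')
  for p in "$@"; do
    printf '%s\n' "$listening" | grep -qx "$p" || printf '%s\n' "$p"
  done
}
```

One possible way to use it against the Pod from this report, assuming ss is present in the container image as shown above, is: oc -n clusters-yhe-hosted exec cluster-node-tuning-operator-87c75fd87-7shj4 -- ss -tnlp | missing_ports 8080 4343 60000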

Additional info:

For comparison, the result in a regular (non-HyperShift) OCP cluster is as follows:

$ oc -n openshift-cluster-node-tuning-operator rsh cluster-node-tuning-operator-78c7d9d6d6-m8k9m
sh-5.1$ ss -tnlp
State     Recv-Q    Send-Q         Local Address:Port          Peer Address:Port    Process
LISTEN    0         0                          *:60000                    *:*        users:(("cluster-node-tu",pid=1,fd=14))
LISTEN    0         0                          *:4343                     *:*        users:(("cluster-node-tu",pid=1,fd=9))
LISTEN    0         0                          *:8080                     *:*        users:(("cluster-node-tu",pid=1,fd=7))

$ oc -n openshift-monitoring rsh prometheus-k8s-0
sh-5.1$ curl -k --cacert /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt --cert /etc/prometheus/secrets/metrics-client-certs/tls.crt --key /etc/prometheus/secrets/metrics-client-certs/tls.key --resolve node-tuning-operator.openshift-cluster-node-tuning-operator.svc:60000:10.128.0.22 https://node-tuning-operator.openshift-cluster-node-tuning-operator.svc:60000/metrics
# HELP nto_build_info A metric with a constant '1' value labeled version from which Node Tuning Operator was built.
# TYPE nto_build_info gauge
nto_build_info{version="v4.18.0-202502110432.p0.gb707be6.assembly.stream.el9-0-g743b132-dirty"} 1
# HELP nto_degraded_info Indicates whether the Node Tuning Operator is degraded.
# TYPE nto_degraded_info gauge
nto_degraded_info 0
# HELP nto_pod_labels_used_info Is the Pod label functionality turned on (1) or off (0)?
# TYPE nto_pod_labels_used_info gauge
nto_pod_labels_used_info 0
# HELP nto_profile_calculated_total The number of times a Tuned profile was calculated for a given node.
# TYPE nto_profile_calculated_total counter
nto_profile_calculated_total{node="ip-10-0-54-226.ap-northeast-1.compute.internal",profile="openshift-control-plane"} 94
nto_profile_calculated_total{node="ip-10-0-8-25.ap-northeast-1.compute.internal",profile="openshift-control-plane"} 91
nto_profile_calculated_total{node="ip-10-0-89-80.ap-northeast-1.compute.internal",profile="openshift-control-plane"} 88
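
The ServiceMonitor in the hosted control plane namespace keeps only the nto_profile_calculated_total series, so once the endpoint responds, a quick sanity check is to confirm that this counter appears in the output. The following is a minimal sketch, assuming POSIX sh and awk and assuming the samples carry no trailing timestamp (as in the output above); profile_calculated_total is a hypothetical helper name used for illustration only.

```shell
# profile_calculated_total
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the sum of all nto_profile_calculated_total samples.
# "# HELP" and "# TYPE" lines start with "#", so the anchored pattern
# skips them. Assumes the sample value is the last field on the line.
profile_calculated_total() {
  awk '/^nto_profile_calculated_total[{ ]/ { sum += $NF } END { print sum + 0 }'
}
```

Piping the curl output shown above into profile_calculated_total would print 273 for these three samples (94 + 91 + 88). A result of 0 would mean the series is absent from the response.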

relates to

OCPBUGS-63175 Non-functional HostedCluster node-tuning-operator ServiceMonitor should be removed

Closed

links to

openshift/cluster-node-tuning-operator#1438: OCPBUGS-55399:: Fix metrics for HyperShift

openshift/hypershift#7355: OCPBUGS-55399: Add NTO Service and ServiceMonitor

Assignee: Jiri Mencak

Reporter: Yiyong He

QA Contact: Wen Wang

Doc Contact: Andrew Taylor

Need Info From: Wen Wang

Votes: 0

Watchers: 14

Created: 2025/04/26 7:23 AM

Updated: 2026/02/17 3:38 PM

Resolved: 2026/02/17 3:38 PM
