OpenShift Bugs / OCPBUGS-62863

'hypershift dump' should collect ServiceMonitor resources

      Description of problem

      hypershift dump collects many resources, but currently not servicemonitors.monitoring.coreos.com. The control-plane operator creates several ServiceMonitors, such as this one, and dump should learn how to collect them.

      Version-Release number of selected component

      Seen in 4.20-era CI (see Additional info below), and confirmed in modern HyperShift code (see the Description of problem above).

      How reproducible

      Every time.

      Steps to Reproduce

      1. Run hypershift dump against a hosted cluster.
      2. Inspect the API-group directories gathered under the hosted cluster's namespace.
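The check in step 2 can be sketched as a small shell helper. This is illustrative only: the directory names below are a local mock of what a dump currently gathers, not real hypershift dump output.

```shell
# Given a per-namespace dump directory, report whether the
# monitoring.coreos.com group was gathered alongside apps/batch/core.
check_dump() {
  local ns_dir=$1
  if [ -d "$ns_dir/monitoring.coreos.com" ]; then
    echo "monitoring.coreos.com collected"
  else
    echo "monitoring.coreos.com missing"  # current behavior, per this bug
  fi
}

# Mock what the dump currently produces: core groups only.
mkdir -p mockdump/core mockdump/apps mockdump/batch
check_dump mockdump  # prints "monitoring.coreos.com missing"
```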

      Actual results

      Lots of entries like apps, batch, and core.

      Expected results

      A monitoring.coreos.com directory with ServiceMonitor (and PodMonitor?) children.
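A mocked-up sketch of that expected layout (the namespace and file names below are hypothetical examples, not actual dump output):

```shell
# Simulate a dump where the monitoring.coreos.com group is collected,
# next to the groups the dump already gathers today.
base=expected-dump/namespaces/e2e-clusters-example
mkdir -p "$base/core" "$base/apps" "$base/batch" \
         "$base/monitoring.coreos.com/servicemonitors"
touch "$base/monitoring.coreos.com/servicemonitors/cluster-version-operator.yaml"
ls "$base"  # apps  batch  core  monitoring.coreos.com
```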

      Additional info

      While investigating OCPBUGS-62851, I was looking at e2e-hypershift output in the regressing pull. The gathered artifacts show the management cluster's Prometheus complaining about something in the hosted-cluster ServiceMonitors:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1215/pull-ci-openshift-cluster-version-operator-main-e2e-hypershift/1952739873462947840/artifacts/e2e-hypershift/dump-management-cluster/artifacts/artifacts.tar | tar -xOz logs/artifacts/output/hostedcluster-d44932313dd1be2d3560-mgmt/namespaces/openshift-monitoring/pods/prometheus-k8s-0/prometheus/prometheus/logs/current.log | grep cluster-version-operator
      2025-08-05T15:53:48.316469617Z time=2025-08-05T15:53:48.316Z level=ERROR source=manager.go:176 msg="error reloading target set" component="scrape manager" err="invalid config id:serviceMonitor/e2e-clusters-ghd95-node-pool-6dl4k/cluster-version-operator/0"
      2025-08-05T15:53:48.316543150Z time=2025-08-05T15:53:48.316Z level=ERROR source=manager.go:176 msg="error reloading target set" component="scrape manager" err="invalid config id:serviceMonitor/e2e-clusters-bmg8g-proxy-jplkn/cluster-version-operator/0"
      2025-08-05T15:53:48.316617911Z time=2025-08-05T15:53:48.316Z level=ERROR source=manager.go:176 msg="error reloading target set" component="scrape manager" err="invalid config id:serviceMonitor/e2e-clusters-qnv7p-create-cluster-sxsvl/cluster-version-operator/0"
      

      But there was nothing about those ServiceMonitors in the dumps, e.g. in the e2e-clusters-qnv7p-create-cluster-sxsvl dump.
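For reference, the affected hosted-cluster namespaces can be pulled out of the quoted Prometheus log mechanically. The relevant error lines are inlined here so the pipeline is self-contained:

```shell
# Reproduce the rejected-ServiceMonitor errors from the prometheus-k8s-0 log.
cat > current.log <<'EOF'
... err="invalid config id:serviceMonitor/e2e-clusters-ghd95-node-pool-6dl4k/cluster-version-operator/0"
... err="invalid config id:serviceMonitor/e2e-clusters-bmg8g-proxy-jplkn/cluster-version-operator/0"
... err="invalid config id:serviceMonitor/e2e-clusters-qnv7p-create-cluster-sxsvl/cluster-version-operator/0"
EOF

# The namespace is the second path segment of each rejected serviceMonitor id.
grep -o 'serviceMonitor/[^/]*' current.log | cut -d/ -f2 | sort -u
```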

              Assignee: Seth Jennings (sjenning)
              Reporter: W. Trevor King (trking)
              QA Contact: XiuJuan Wang