  1. Red Hat OpenShift Data Science
  2. RHODS-7978

redhat-ods-applications namespace does not get cluster-monitoring=true label


    • ML Serving Sprint 1.27, ML Serving Sprint 1.28
    • Medium

      Description of problem:

      The redhat-ods-applications namespace does not get the cluster-monitoring=true label, and there is no option to enable cluster monitoring at operator install time.

      The label appears to be set manually on the operator namespace only (which causes a separate issue), here: https://github.com/red-hat-data-services/odh-deployer/blob/main/deploy.sh#L148

      It is never set on the redhat-ods-applications namespace.

      With the label missing on the redhat-ods-applications namespace, user workload monitoring (UWM) raises alerts and logs warnings like this:

      level=warn ts=2023-04-13T18:57:37.539617816Z caller=operator.go:2255 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via bearer token file which Prometheus specification prohibits" servicemonitor=redhat-ods-applications/odh-model-controller-metrics-monitor namespace=openshift-user-workload-monitoring prometheus=user-workload
      
      Alertmanager alert: PrometheusOperatorRejectedResources
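
To see which ServiceMonitors are being rejected, the warning lines can be filtered on their servicemonitor= field. The following is a minimal sketch, assuming logfmt-style lines like the one quoted above; the function name skipped_servicemonitors is hypothetical, and the deployment and container names in the commented usage line are assumptions about how the UWM Prometheus Operator is deployed.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus Operator log lines on stdin and
# print the unique namespace/name of every ServiceMonitor that was skipped.
skipped_servicemonitors() {
  grep 'msg="skipping servicemonitor"' |
    sed -n 's/.* servicemonitor=\([^ ]*\).*/\1/p' |
    sort -u
}

# Possible usage (assumes deployment/prometheus-operator with a container
# named prometheus-operator in the UWM namespace):
# oc logs -n openshift-user-workload-monitoring deployment/prometheus-operator \
#   -c prometheus-operator | skipped_servicemonitors
```

For the log line in this report, the filter would print redhat-ods-applications/odh-model-controller-metrics-monitor.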

      Prerequisites (if any, like setup, operators/versions):

      N/A

      Steps to Reproduce

      1. Perform a fresh install of the operator.

      Actual results: Alerts and warning logs

      Expected results: No alerts or warn logs

      Reproducibility (Always/Intermittent/Only Once): Always

      Build Details: 1.24

      Workaround: Add the label manually: https://access.redhat.com/solutions/6706741
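
The manual workaround could look roughly like the sketch below. It assumes the full label key is openshift.io/cluster-monitoring (the report abbreviates it to cluster-monitoring) and that the caller has permission to label namespaces. The function name label_for_cluster_monitoring and the OC override variable are hypothetical, added so the command can be previewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: apply the cluster-monitoring label to a namespace.
# Set OC=echo to print the arguments instead of calling oc.
label_for_cluster_monitoring() {
  ns="$1"
  # Assumed full label key; --overwrite makes the command safe to re-run.
  "${OC:-oc}" label namespace "$ns" openshift.io/cluster-monitoring=true --overwrite
}

# Example (requires sufficient cluster privileges):
# label_for_cluster_monitoring redhat-ods-applications
# oc get namespace redhat-ods-applications --show-labels
```

The second commented command lists the namespace labels so the change can be confirmed. The linked solution article remains the authoritative description of the workaround.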

      Additional info:

            vmahabal@redhat.com Vedant Mahabaleshwarkar
            rhn-support-mrobson Matt Robson
            Jorge Garcia Oncins Jorge Garcia Oncins