OpenShift Bugs / OCPBUGS-45683

cluster-monitoring-operator not creating the telemeter-client deployment



      Description of problem:

      While auditing ROSA clusters that were not reporting metrics to our telemetry server, we discovered that a 4.14 cluster did not have the telemeter-client deployed at all. Deleting the existing cluster-monitoring-operator pod did not cause the replacement to create the deployment.

      CMO's logs did indicate it ran some logic related to the telemeter-client:

      $ oc logs cluster-monitoring-operator-6658cf5b88-tjrxb -n openshift-monitoring | grep -i tel
      I1204 20:11:57.405533       1 operator.go:593] Triggering an update due to ConfigMap or Secret: openshift-monitoring/telemeter-trusted-ca-bundle
      I1204 20:11:57.486751       1 base_controller.go:67] Waiting for caches to sync for OpenShiftMonitoringTelemeterClientCertRequester
      I1204 20:11:57.486801       1 base_controller.go:73] Caches are synced for OpenShiftMonitoringTelemeterClientCertRequester
      I1204 20:11:57.486813       1 base_controller.go:110] Starting #1 worker of OpenShiftMonitoringTelemeterClientCertRequester controller ...
      I1204 20:12:00.773638       1 tasks.go:69] running task 11 of 15: Updating Telemeter client
      I1204 20:12:00.858478       1 tasks.go:75] ran task 11 of 15: Updating Telemeter client
      

      However, watching deployments in the openshift-monitoring namespace (oc get deployment -n openshift-monitoring -w) while restarting the operator never showed a telemeter-client deployment. This indicates that the deployment is never created in the first place, rather than being deleted by some third-party component immediately after creation.
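
To make the log evidence above easier to check, here is a minimal Python sketch that pairs the "running task" and "ran task" lines from the excerpt and reports how long each task took. The regular expression and the helper name task_durations are illustrative assumptions, not part of CMO; the two sample lines are copied from the log output in this report. Only the time of day is parsed, so a task that spans midnight would not be handled.

```python
import re
from datetime import datetime

# klog-style header: severity letter + MMDD, time with microseconds,
# PID, source file:line, then the message. (Pattern is an assumption
# based on the lines quoted in this report.)
KLOG_RE = re.compile(
    r"^[IWEF]\d{4} (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<source>[\w.]+:\d+)\] (?P<msg>.*)$"
)
TASK_RE = re.compile(r"^(running|ran) task \d+ of \d+: (.+)$")


def task_durations(lines):
    """Return {task name: seconds} for each 'running task'/'ran task' pair."""
    started, durations = {}, {}
    for line in lines:
        header = KLOG_RE.match(line.strip())
        if not header:
            continue
        task = TASK_RE.match(header["msg"])
        if not task:
            continue
        # Time of day only; the date part of the header is ignored.
        ts = datetime.strptime(header["time"], "%H:%M:%S.%f")
        state, name = task.groups()
        if state == "running":
            started[name] = ts
        elif name in started:
            durations[name] = (ts - started.pop(name)).total_seconds()
    return durations


# Sample input: the two task lines from the log excerpt in this report.
LOG = """\
I1204 20:12:00.773638       1 tasks.go:69] running task 11 of 15: Updating Telemeter client
I1204 20:12:00.858478       1 tasks.go:75] ran task 11 of 15: Updating Telemeter client
"""

if __name__ == "__main__":
    for name, seconds in task_durations(LOG.splitlines()).items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

For the lines quoted here, the task starts and completes roughly 85 ms apart with no error logged in between. That matches the observation that the task runs but no deployment ever appears.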

      The cluster-monitoring-config configmap has the following values set for the telemeter-client:

      ...
      telemeterClient:
        nodeSelector:
          node-role.kubernetes.io/infra: ''
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          operator: Exists
        telemeterServerURL: https://infogw.api.openshift.com
      ...
      

      Version-Release number of selected component (if applicable):

      4.14.35
          

      How reproducible:

      Unknown - other 4.14 ROSA clusters have been observed running the telemeter-client deployment
          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      CMO never creates the telemeter-client deployment
          

      Expected results:

      CMO creates the telemeter-client
          

      Additional info:

      Cluster history indicates that the pull-secret may have been modified on November 15th. This was initially ruled out as the cause, since I would expect CMO to still create the deployment object, with the resulting pod(s) entering an ImagePullBackOff state if the pull-secret were malformed.
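
One way to revisit the pull-secret theory is sketched below. It rests on an assumption this report does not confirm: that CMO takes the telemeter token from the cloud.openshift.com entry of the cluster pull secret and skips the deployment entirely when no token is available. If that holds, a modified pull secret would lead to a missing deployment rather than a pod failing to pull its image. The function name has_cloud_token is hypothetical, and the sample values are placeholders rather than data from the affected cluster.

```python
import json


def has_cloud_token(dockerconfigjson: str,
                    registry: str = "cloud.openshift.com") -> bool:
    """Return True if the decoded pull secret has a non-empty auth for `registry`.

    `dockerconfigjson` is the decoded .dockerconfigjson document as a string.
    The default registry key is an assumption about where the token lives.
    """
    try:
        auths = json.loads(dockerconfigjson).get("auths", {})
        entry = auths.get(registry)
    except (json.JSONDecodeError, AttributeError):
        # Not valid JSON, or not the expected object shape.
        return False
    return isinstance(entry, dict) and bool(entry.get("auth"))


if __name__ == "__main__":
    # Placeholder documents, not taken from the affected cluster.
    intact = json.dumps({"auths": {
        "cloud.openshift.com": {"auth": "PLACEHOLDER"},
        "quay.io": {"auth": "PLACEHOLDER"},
    }})
    modified = json.dumps({"auths": {"quay.io": {"auth": "PLACEHOLDER"}}})
    print(has_cloud_token(intact))
    print(has_cloud_token(modified))
```

The input would be the decoded .dockerconfigjson value of the pull-secret Secret, which normally lives in the openshift-config namespace. A False result on the affected cluster and a True result on a healthy 4.14 ROSA cluster would make the pull-secret change worth a second look, though it would not prove causation.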
          

              spasquie@redhat.com Simon Pasquier
              tnierman.openshift Trevor Nierman
              Junqi Zhao Junqi Zhao