Bug
Resolution: Not a Bug
4.14.z
Important
Description of problem:
While auditing ROSA clusters that were not reporting metrics to our telemetry server, we discovered that a 4.14 cluster did not have the telemeter-client deployed at all. Deleting the existing cluster-monitoring-operator (CMO) pod did not cause the replacement pod to create the deployment.
CMO's logs show that it did run its telemeter-client logic, including the "Updating Telemeter client" task:
    $ oc logs cluster-monitoring-operator-6658cf5b88-tjrxb -n openshift-monitoring | grep -i tel
    I1204 20:11:57.405533 1 operator.go:593] Triggering an update due to ConfigMap or Secret: openshift-monitoring/telemeter-trusted-ca-bundle
    I1204 20:11:57.486751 1 base_controller.go:67] Waiting for caches to sync for OpenShiftMonitoringTelemeterClientCertRequester
    I1204 20:11:57.486801 1 base_controller.go:73] Caches are synced for OpenShiftMonitoringTelemeterClientCertRequester
    I1204 20:11:57.486813 1 base_controller.go:110] Starting #1 worker of OpenShiftMonitoringTelemeterClientCertRequester controller
    ...
    I1204 20:12:00.773638 1 tasks.go:69] running task 11 of 15: Updating Telemeter client
    I1204 20:12:00.858478 1 tasks.go:75] ran task 11 of 15: Updating Telemeter client
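When comparing several clusters, it can help to check captured CMO logs for whether a given task both started ("running task") and finished ("ran task"). The following is a minimal Python sketch, assuming the klog line format shown in the log excerpt; the helper name `task_status` is hypothetical and not part of any CMO tooling. Note that a "ran task" line only tells us the task returned; it does not by itself say what the task decided to deploy.

```python
import re

# Matches klog task lines such as:
#   I1204 20:12:00.773638 1 tasks.go:69] running task 11 of 15: Updating Telemeter client
# \s+ tolerates the padded thread-ID column that klog normally emits.
TASK_RE = re.compile(
    r"^[IWEF]\d{4} [\d:.]+\s+\d+ \S+\] (?P<verb>running|ran) task "
    r"(?P<num>\d+) of (?P<total>\d+): (?P<name>.+)$"
)


def task_status(log_lines, task_name):
    """Return 'not started', 'started' or 'completed' for the named CMO task."""
    started = completed = False
    for line in log_lines:
        m = TASK_RE.match(line.strip())
        if not m or m.group("name") != task_name:
            continue  # not a task line, or a different task
        if m.group("verb") == "running":
            started = True
        else:
            completed = True
    if completed:
        return "completed"
    return "started" if started else "not started"
```

Feeding it the two `tasks.go` lines from the excerpt returns "completed" for "Updating Telemeter client".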
However, running oc get deployment -n openshift-monitoring -w while restarting the operator never listed the telemeter-client deployment. This indicates that the deployment is never created in the first place, rather than being created and then immediately deleted by some third-party component.
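Because `oc get deployment -w` prints a row each time a deployment changes, its output can be captured (for example by piping it through `tee`) and scanned afterwards, so that a short-lived deployment is not missed by eye. A minimal sketch, assuming the default column layout (NAME, READY, UP-TO-DATE, AVAILABLE, AGE); `deployment_seen` is a hypothetical helper.

```python
def deployment_seen(watch_lines, name="telemeter-client"):
    """Return True if `name` ever appears as a row in `oc get deployment -w` output.

    Each row starts with the deployment name, followed by the READY,
    UP-TO-DATE, AVAILABLE and AGE columns; the header row starts with NAME.
    """
    for line in watch_lines:
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

A True result would point toward the "created then deleted" scenario; False over a full operator restart matches what was observed here.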
The cluster-monitoring-config configmap has the following values set for the telemeter-client:
    ...
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
      telemeterServerURL: https://infogw.api.openshift.com
    ...
Version-Release number of selected component (if applicable):
4.14.35
How reproducible:
Unknown; other 4.14 ROSA clusters have been observed running the telemeter-client deployment.
Actual results:
CMO never creates the telemeter-client deployment
Expected results:
CMO creates the telemeter-client deployment
Additional info:
Cluster history indicates that the pull-secret may have been modified on November 15th. This was initially ruled out as the cause: even with a malformed pull-secret, I would expect CMO to still create the deployment object, with the resulting pod(s) entering an ImagePullBackOff state.
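OpenShift's documented way to opt out of remote health reporting is to remove the `cloud.openshift.com` entry from the global cluster pull secret, so one way to test the pull-secret hypothesis is to check whether that entry survived the November 15th change. The sketch below is a simplified, hypothetical model, not CMO's code: it assumes (label: ASSUMPTION) that CMO takes its telemeter token from that pull-secret entry and skips the telemeter-client deployment entirely, rather than creating a failing one, when the token or cluster ID is missing. Both function names are illustrative.

```python
import json


def telemetry_token(pull_secret_json: str) -> str:
    """Return the cloud.openshift.com auth value from a decoded pull secret, or "".

    ASSUMPTION: mirrors how CMO is understood to source the telemeter token;
    check the CMO source for the running release before relying on it.
    """
    auths = json.loads(pull_secret_json).get("auths", {})
    return auths.get("cloud.openshift.com", {}).get("auth", "")


def telemeter_client_enabled(cluster_id: str, token: str) -> bool:
    """Hypothetical, simplified model of CMO's enablement check.

    ASSUMPTION: the telemeter-client is only deployed when both a cluster ID
    and a telemeter token are available; otherwise no deployment is created.
    """
    return bool(cluster_id) and bool(token)
```

The decoded pull secret can be fetched with `oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}'` and passed to `telemetry_token`. Under the assumptions above, an empty result would support the pull-secret hypothesis and would also be consistent with the "Updating Telemeter client" task completing without a deployment appearing.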