OCP Technical Release Team / TRT-534

Update service monitor to use client certificates for multiple repositories.


    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal

      While investigating TRT-413, we discovered that many service monitors are configured to use bearer token authentication. Per this document, https://github.com/deads2k/openshift-enhancements/blob/master/enhancements/monitoring/client-cert-scraping.md, we should use client certificate authentication for metrics scraping instead. This ensures that metrics collection keeps working even when the apiserver is unavailable.

       

      So far, the following ServiceMonitors and repositories have been identified as needing this fix:

       

      ServiceMonitor Name                  Namespace                               PRs
      cloud-credential-operator            openshift-cloud-credential-operator     https://github.com/openshift/cloud-credential-operator/pull/483
      csi-driver-controller-monitor        openshift-cluster-csi-drivers           https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/103
                                           openshift-cluster-csi-drivers           https://github.com/openshift/csi-driver-manila-operator/pull/153
                                           openshift-cluster-csi-drivers           https://github.com/openshift/csi-driver-shared-resource-operator/pull/54
                                           openshift-cluster-csi-drivers           https://github.com/openshift/gcp-filestore-csi-driver-operator/pull/6
                                           openshift-cluster-csi-drivers           https://github.com/openshift/ovirt-csi-driver-operator/pull/102
      cluster-machine-approver             openshift-cluster-machine-approver      https://github.com/openshift/cluster-machine-approver/pull/169
      node-tuning-operator                 openshift-cluster-node-tuning-operator  https://github.com/openshift/cluster-node-tuning-operator/pull/427
      cluster-samples-operator             openshift-cluster-samples-operator      https://github.com/openshift/cluster-samples-operator/pull/464
      cluster-storage-operator             openshift-cluster-storage-operator      https://github.com/openshift/cluster-storage-operator/pull/306
      cluster-version-operator             openshift-cluster-version               https://github.com/openshift/cluster-version-operator/pull/816
      config-operator                      openshift-config-operator               https://github.com/openshift/cluster-config-operator/pull/259
      console                              openshift-console                       https://github.com/openshift/console-operator/pull/668
      console-operator                     openshift-console-operator              https://github.com/openshift/console-operator/pull/668
      dns-default                          openshift-dns                           Didn't find the source
      dns-operator                         openshift-dns-operator                  https://github.com/openshift/cluster-dns-operator/pull/334
      image-registry                       openshift-image-registry                https://github.com/openshift/cluster-image-registry-operator/pull/796
      image-registry-operator              openshift-image-registry                https://github.com/openshift/cluster-image-registry-operator/pull/796
      router-default                       openshift-ingress                       Didn't find the source
      ingress-operator                     openshift-ingress-operator              https://github.com/openshift/cluster-ingress-operator/pull/816
      kube-scheduler                       openshift-kube-scheduler                https://github.com/openshift/cluster-kube-scheduler-operator/pull/434
      cluster-autoscaler-operator          openshift-machine-api                   https://github.com/openshift/cluster-autoscaler-operator/pull/249
      machine-api-controllers              openshift-machine-api                   https://github.com/openshift/machine-api-operator/pull/1054
      machine-api-operator                 openshift-machine-api                   https://github.com/openshift/machine-api-operator/pull/1054
      machine-config-controller            openshift-machine-config-operator       https://github.com/openshift/machine-config-operator/pull/3277
      machine-config-daemon                openshift-machine-config-operator       https://github.com/openshift/machine-config-operator/pull/3277
      marketplace-operator                 openshift-marketplace                   https://github.com/operator-framework/operator-marketplace/pull/482
      cluster-monitoring-operator          openshift-monitoring                    https://github.com/openshift/cluster-monitoring-operator/pull/1738
      openshift-state-metrics              openshift-monitoring                    https://github.com/openshift/cluster-monitoring-operator/pull/1738
      prometheus-adapter                   openshift-monitoring                    https://github.com/openshift/cluster-monitoring-operator/pull/1738
      monitor-multus-admission-controller  openshift-multus                        https://github.com/openshift/cluster-network-operator/pull/1522
      monitor-network                      openshift-multus                        https://github.com/openshift/cluster-network-operator/pull/1522
      network-operator                     openshift-network-operator              https://github.com/openshift/cluster-network-operator/pull/1522
      catalog-operator                     openshift-operator-lifecycle-manager    https://github.com/openshift/operator-framework-olm/pull/350
      olm-operator                         openshift-operator-lifecycle-manager    https://github.com/openshift/operator-framework-olm/pull/350
      monitor-ovn-master-metrics           openshift-ovn-kubernetes                https://github.com/openshift/cluster-network-operator/pull/1522
      monitor-ovn-node                     openshift-ovn-kubernetes                https://github.com/openshift/cluster-network-operator/pull/1522
      monitor-sdn                          openshift-sdn                           https://github.com/openshift/cluster-network-operator/pull/1522
      monitor-sdn-controller               openshift-sdn                           https://github.com/openshift/cluster-network-operator/pull/1522

       

      Additionally, we discovered that kube-rbac-proxy does not automatically pick up updates to the client CA certificate. That issue is addressed by https://issues.redhat.com/browse/TRT-464. Until the fix lands in OpenShift, the above changes will not be effective for repositories that use kube-rbac-proxy.

       

      For repositories that do not use kube-rbac-proxy (e.g. storage operator), the above changes can be merged and verified now.

       

      How to verify

      1. Make sure the corresponding ServiceMonitor object contains certFile and keyFile. 
      2. Make sure ServiceMonitor does NOT have bearerTokenFile configured. 
      3. With the ServiceMonitor configuration verified as above, check Prometheus to make sure scraping of the service in the corresponding namespace still works. A simple "up{namespace=''}" query, with the namespace of the ServiceMonitor filled in, should be good enough.
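
      The three checks above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the ServiceMonitor has already been fetched as JSON (for example from "oc get servicemonitor ... -o json"), that certFile and keyFile live under each endpoint's tlsConfig and bearerTokenFile sits directly on the endpoint, and that the "up" query result comes from the Prometheus instant query API. The function names and the sample file paths are hypothetical and used for illustration only.

```python
from __future__ import annotations


def check_service_monitor(sm: dict) -> list[str]:
    """Steps 1 and 2: return a list of problems; an empty list means both pass."""
    problems = []
    endpoints = sm.get("spec", {}).get("endpoints") or []
    if not endpoints:
        problems.append("no endpoints defined")
    for i, ep in enumerate(endpoints):
        tls = ep.get("tlsConfig") or {}
        # Step 1: both the client certificate and its key must be configured.
        for field in ("certFile", "keyFile"):
            if not tls.get(field):
                problems.append(f"endpoint {i}: tlsConfig.{field} is missing")
        # Step 2: bearer token authentication must be gone.
        if ep.get("bearerTokenFile"):
            problems.append(f"endpoint {i}: bearerTokenFile is still configured")
    return problems


def scrape_ok(query_response: dict) -> bool:
    """Step 3: True if the `up` query returned at least one series and all are 1."""
    if query_response.get("status") != "success":
        return False
    results = query_response.get("data", {}).get("result") or []
    # Each instant-vector sample is [timestamp, "value-as-string"].
    return bool(results) and all(float(r["value"][1]) == 1.0 for r in results)


if __name__ == "__main__":
    # Hypothetical ServiceMonitor fragment; the paths are illustrative only.
    example = {
        "spec": {
            "endpoints": [
                {
                    "bearerTokenFile": "/path/to/token",
                    "tlsConfig": {"certFile": "/path/to/tls.crt"},
                }
            ]
        }
    }
    for problem in check_service_monitor(example):
        print(problem)
```

      An empty "up" result is treated as a failure here, on the assumption that a namespace with a ServiceMonitor should always have at least one scrape target.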

       

       

       

              kenzhang@redhat.com Ken Zhang
              rhn-engineering-dgoodwin Devan Goodwin