OCPBUGS-42098

HostedClusterConfigOperator used wrong certificate for Kube certificate authority

    • Bug
    • Resolution: Done-Errata
    • Normal
    • None
    • 4.16
    • HyperShift
    • Important
    • No
    • False
    • * Previously, during root certificate rotation, the `metrics-server` pod in the data plane failed to start correctly. This happened because of a certificate issue. With this release, the `hostedClusterConfigOperator` resource sends the correct certificate to the data plane so that the `metrics-server` pod starts as expected. (link:https://issues.redhat.com/browse/OCPBUGS-42098[*OCPBUGS-42098*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-41328. The following is the description of the original issue:

      Description of problem:

          Rotating the root certificate authority (root CA) requires multiple certificates during the rotation process, to prevent downtime while the server and client certificates are updated in the control and data planes. Currently, the HostedClusterConfigOperator uses the cluster-signer-ca from the control plane to create the kubelet-serving-ca on the data plane. The cluster-signer-ca contains only a single certificate, which is used for signing certificates for the kube-controller-manager.
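      One way to check whether a CA bundle carries a single certificate or several is to count its PEM blocks. The following is a minimal sketch, assuming the bundle has already been saved to a local PEM file; the `countCerts` helper and the file argument are illustrative names, not part of HyperShift.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

// countCerts returns how many CERTIFICATE blocks in a PEM bundle
// parse as X.509 certificates. Blocks of other types are skipped.
func countCerts(bundle []byte) (int, error) {
	n := 0
	for {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			// No further PEM blocks in the remaining input.
			return n, nil
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		if _, err := x509.ParseCertificate(block.Bytes); err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	// The path is supplied by the caller; how the bundle is exported
	// from the cluster is outside the scope of this sketch.
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: countcerts <bundle.pem>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	n, err := countCerts(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse error:", err)
		return
	}
	fmt.Printf("%d certificate(s) in bundle\n", n)
}
```

      A count of one means the bundle cannot trust an old and a new CA at the same time.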
      
      During a rotation, the kubelet-serving-ca is updated with the new CA, which triggers the metrics-server pod to restart and use the new CA. The metrics-server then fails to scrape metrics, because the kubelet has yet to pick up the new certificate:
      
      E0808 16:57:09.829746       1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.240.0.29:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate signed by unknown authority" node="pres-cqogb7a10b7up68kvlvg-rkcpsms0805-default-00000130"
      
      rkc@rmac ~> kubectl get pods -n openshift-monitoring
      NAME                                                     READY   STATUS    RESTARTS   AGE
      metrics-server-594cd99645-g8bj7                          0/1     Running   0          2d20h
      metrics-server-594cd99645-jmjhj                          1/1     Running   0          46h 
      
      The HostedClusterConfigOperator should likely use the KubeletClientCABundle from the control plane for the kubelet-serving-ca in the data plane. This CA bundle contains both the new and the old CA, so all data plane components can remain up during the rotation process.
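      The effect of trusting a bundle rather than a single CA can be reproduced outside a cluster. The sketch below is an illustration under stated assumptions, not the HyperShift code: it creates two throwaway CAs standing in for the old and new root, issues a serving certificate from the old one (a kubelet that has not rotated yet), and verifies it against a pool holding only the new CA and then against a pool holding both. The `authority` type and the `makeCA`, `issueServing`, and `verify` helpers are hypothetical names.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// authority is a hypothetical helper type: a CA certificate and its signing key.
type authority struct {
	cert *x509.Certificate
	key  *ecdsa.PrivateKey
}

// sign creates a certificate from tmpl, signed by parent (self-signed when parent is nil).
func sign(tmpl *x509.Certificate, parent *authority) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	issuerCert, issuerKey := tmpl, key
	if parent != nil {
		issuerCert, issuerKey = parent.cert, parent.key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, issuerCert, &key.PublicKey, issuerKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// makeCA creates a self-signed CA with the given common name.
func makeCA(name string) (*authority, error) {
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: name},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	cert, key, err := sign(tmpl, nil)
	if err != nil {
		return nil, err
	}
	return &authority{cert: cert, key: key}, nil
}

// issueServing issues a server certificate signed by this CA.
func (a *authority) issueServing(name string) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: name},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	cert, _, err := sign(tmpl, a)
	return cert, err
}

// verify checks leaf against a root pool built from the trusted CAs.
func verify(leaf *x509.Certificate, trusted ...*authority) error {
	pool := x509.NewCertPool()
	for _, ca := range trusted {
		pool.AddCert(ca.cert)
	}
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool})
	return err
}

func main() {
	oldCA, err := makeCA("old-ca")
	if err != nil {
		panic(err)
	}
	rotated, err := makeCA("new-ca")
	if err != nil {
		panic(err)
	}
	// The serving certificate is still signed by the old CA.
	leaf, err := oldCA.issueServing("kubelet")
	if err != nil {
		panic(err)
	}
	fmt.Println("new CA only:  ", verify(leaf, rotated))
	fmt.Println("old + new CAs:", verify(leaf, oldCA, rotated))
}
```

      Verification against the new CA alone returns an error, while the pool holding both CAs accepts the certificate, which is the property the rotation relies on.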


            rcradick Ryan Cradick
            openshift-crt-jira-prow OpenShift Prow Bot
            Jie Zhao Jie Zhao