This is a clone of issue OCPBUGS-41328. The following is the description of the original issue:
—
Description of problem:
Rotating the root certificates (root CA) requires multiple certificates to be trusted during the rotation process to prevent downtime while the server and client certificates are updated in the control and data planes.

Currently, the HostedClusterConfigOperator uses the cluster-signer-ca from the control plane to create the kubelet-serving-ca on the data plane. The cluster-signer-ca contains only a single certificate, which is used for signing certificates for the kube-controller-manager. During a rotation, the kubelet-serving-ca is updated with the new CA, which triggers the metrics-server pod to restart and use the new CA. The metrics-server then fails to scrape metrics because the kubelet has not yet picked up a certificate signed by the new CA:

E0808 16:57:09.829746 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.240.0.29:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate signed by unknown authority" node="pres-cqogb7a10b7up68kvlvg-rkcpsms0805-default-00000130"

rkc@rmac ~> kubectl get pods -n openshift-monitoring
NAME                              READY   STATUS    RESTARTS   AGE
metrics-server-594cd99645-g8bj7   0/1     Running   0          2d20h
metrics-server-594cd99645-jmjhj   1/1     Running   0          46h

The HostedClusterConfigOperator should likely be using the KubeletClientCABundle from the control plane for the kubelet-serving-ca in the data plane. This CA bundle contains both the new and the old CA, so all data plane components can remain up during the rotation process.
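The failure mode described above can be reproduced in isolation with Go's standard crypto/x509 package. The following is only a minimal sketch, not HostedClusterConfigOperator code: the helpers newCA, newServingCert, and trusts are hypothetical names introduced for illustration, and the certificates are throwaway ones generated in memory. It issues a serving certificate from an "old" CA and then verifies it against a trust pool holding only the "new" CA, and against a bundle holding both.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a throwaway self-signed CA (hypothetical helper, illustration only).
func newCA(cn string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	// Self-signed: the template is both the subject and the issuer.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newServingCert issues a leaf serving certificate signed by the given CA.
func newServingCert(ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "example-kubelet-serving"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// trusts reports whether leaf chains to any of the given CAs.
func trusts(leaf *x509.Certificate, cas ...*x509.Certificate) bool {
	pool := x509.NewCertPool()
	for _, ca := range cas {
		pool.AddCert(ca)
	}
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	})
	return err == nil
}

func main() {
	oldCA, oldKey, err := newCA("old-ca")
	if err != nil {
		panic(err)
	}
	rotatedCA, _, err := newCA("new-ca")
	if err != nil {
		panic(err)
	}
	// The serving certificate is still signed by the old CA mid-rotation.
	leaf, err := newServingCert(oldCA, oldKey)
	if err != nil {
		panic(err)
	}
	fmt.Println("trusted with new CA only:", trusts(leaf, rotatedCA))
	fmt.Println("trusted with old+new bundle:", trusts(leaf, oldCA, rotatedCA))
}
```

With only the new CA in the pool, verification of the old-CA-signed certificate fails, which corresponds to the "certificate signed by unknown authority" error in the metrics-server log. With both CAs in the pool it succeeds, which is the behaviour a bundle containing the old and new CA is expected to give during rotation.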
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
- clones: OCPBUGS-42098 HostedClusterConfigOperator used wrong certificate for Kube certificate authority (Closed)
- is blocked by: OCPBUGS-42098 HostedClusterConfigOperator used wrong certificate for Kube certificate authority (Closed)
- links to: RHBA-2024:8260 OpenShift Container Platform 4.16.z bug fix update