Epic
Resolution: Done
Major
MCD Certificate Improvements
To Do
TELCOSTRAT-87 - Single Core CPU CaaS Budget for DU Deployment w/ Single-Node OpenShift on Sapphire Rapids Platform
OCPSTRAT-832 MCO should manage certificates via non-MC path
0% To Do, 0% In Progress, 100% Done
Latest status as of 4.14 freeze:
The MCD no longer uses MachineConfigs to update certs; it reads them directly from our internal "controllerconfig" resource. The MachineConfig path still exists but is a no-op (although the MCO still falsely claims an update is pending as a result). The MachineConfig removal work is ready, but it is waiting for windows-MCO to change its workflow so that the removal does not break it.
--------------------------------
The logic for handling certificate rotation should live outside of the MachineConfig-files path as it stands today. This will allow certs to rotate live, even through paused pools, without generating additional churn in rendered configs. Most, if not all, certificates do not require drains/reboots of the node.
Context
The MCO has, since the beginning of time, managed certificates. The general flow is a cluster configmap -> MCO -> controllerconfig -> MCC -> renderedconfig -> MCD -> laid down to disk as a regular file.
When we talk about certs, the MCD actually manages 4 (originally 5) certs: see https://docs.google.com/document/d/1ehdOYDY-SvUU9ffdIKlt7XaoMaZ0ioMNZMu31-Mo1l4/edit (this document is a bit outdated)
Of these, the only one we care about is "/etc/kubernetes/kubelet-ca.crt", which is a bundle of 5 (now 7) certs. This will be expanded on below.
Unlike regular files though, certificates rotate automatically at some set cadence. Prior to 4.7, this would cause the MCD to seemingly randomly start an update and reboot nodes, much to the annoyance of customers, so we made it disruptionless.
There was still one more problem: a lot of users pause pools for additional safety (which is their way of saying "we don't want you to disrupt our workloads"), and a paused pool still gated the certificate from actually rotating in when it updated. In 4.12 and earlier versions, this means that at 80% of the 1-year mark, a new kube-apiserver-to-kubelet-signer cert would be generated. After ~12 hours, this would affect some operations (oc logs, etc.), since the old signer no longer matched the apiserver's new cert. At the one-year mark, this would break the kubelet entirely. To combat this, we added an alert, MachineConfigControllerPausedPoolKubeletCA, to warn users about the effects and expiry, which was OK since this should only be an annual occurrence.
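The timing described above can be worked through with a small calculation. This is only a sketch: the 365-day validity and the 80% rotation point come from the text, while the issue date and the function name `rotation_timeline` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the description above; the real validity and rotation
# point are decided by the operators that own the signer.
VALIDITY = timedelta(days=365)
ROTATE_AT_FRACTION = 0.8


def rotation_timeline(issued: datetime) -> dict:
    """Return when a new signer is generated and when the old one expires."""
    rotate_at = issued + VALIDITY * ROTATE_AT_FRACTION
    expires_at = issued + VALIDITY
    return {
        "new_signer_generated": rotate_at,
        "old_signer_expires": expires_at,
        # How long a paused pool keeps running on the old signer before
        # the kubelet breaks entirely.
        "paused_pool_window": expires_at - rotate_at,
    }


# Hypothetical issue date, chosen only to make the arithmetic concrete.
timeline = rotation_timeline(datetime(2023, 1, 1, tzinfo=timezone.utc))
print("new signer generated:", timeline["new_signer_generated"].date())
print("old signer expires:  ", timeline["old_signer_expires"].date())
print("days until breakage: ", timeline["paused_pool_window"].days)
```

With a one-year signer, a pool paused at the rotation point has roughly 73 days before the old signer expires, which is why a single annual alert was considered acceptable.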
Updates for 4.13
In 4.13, we realized that the kubelet-ca cert was being read from the wrong location, one that updated the kube-apiserver-to-kubelet-signer mentioned above but not some other certs. This had not been a problem since nobody was depending on them, but in 4.13 monitoring was updated to use the right certs, which caused KubeletDown alerts to fire. David Eads fixed this via https://github.com/openshift/machine-config-operator/pull/3458
So now instead of expired certs we have correct certs, which is great, but we also realized that cert rotation will now happen much more frequently.
Previously on the system, we had:
admin-kubeconfig-signer, kubelet-signer, kube-apiserver-to-kubelet-signer, kube-control-plane-signer, kubelet-bootstrap-kubeconfig-signer
Now, with the correct certs, right after install we get:
admin-kubeconfig-signer, kube-csr-signer_@1675718562, kubelet-signer, kube-apiserver-to-kubelet-signer, kube-control-plane-signer, kubelet-bootstrap-kubeconfig-signer, openshift-kube-apiserver-operator_node-system-admin-signer@1675718563
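To make the difference between the two bundles explicit, the two lists above can be compared as sets. This is a sketch only: the names are copied from the lists above, and the `normalize` helper, which strips the `@<timestamp>` suffix, is a hypothetical convenience.

```python
OLD_BUNDLE = [
    "admin-kubeconfig-signer",
    "kubelet-signer",
    "kube-apiserver-to-kubelet-signer",
    "kube-control-plane-signer",
    "kubelet-bootstrap-kubeconfig-signer",
]

NEW_BUNDLE = [
    "admin-kubeconfig-signer",
    "kube-csr-signer_@1675718562",
    "kubelet-signer",
    "kube-apiserver-to-kubelet-signer",
    "kube-control-plane-signer",
    "kubelet-bootstrap-kubeconfig-signer",
    "openshift-kube-apiserver-operator_node-system-admin-signer@1675718563",
]


def normalize(name: str) -> str:
    """Drop the '@<timestamp>' suffix (and any trailing '_') from a cert name."""
    return name.split("@", 1)[0].rstrip("_")


old = {normalize(n) for n in OLD_BUNDLE}
new = {normalize(n) for n in NEW_BUNDLE}

added = sorted(new - old)
removed = sorted(old - new)
print("added:  ", added)
print("removed:", removed)
```

Nothing was removed; the corrected bundle adds kube-csr-signer and openshift-kube-apiserver-operator_node-system-admin-signer.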
The most immediate issue was bootstrap drift, which John solved via https://github.com/openshift/machine-config-operator/pull/3513
But the issue here is that we are now updating two certs:
- kube-csr-signer, rotated every month
- openshift-kube-controller-manager-operator_csr-signer-signer (called kubelet-signer until the first rotation), rotated every two months
This means that every month we would be generating at least 2 new machineconfigs (one for the new cert rotating in, one for the old cert rotating out) to manage this.
During install, because of how the certs are set up (the bootstrap ones expire in 24h), you get 5 MCs within 24 hours: bootstrap bundle, incluster bundle, incluster bundle with 1 new, incluster bundle with 2 new, incluster bundle with 2 new 2 old removed
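The churn follows from the rendered config being derived from its contents: any change to the bundle file produces a new rendered config. The sketch below illustrates this with the five install-time bundle states listed above. The cert labels are placeholders, and `rendered_name` with its SHA-256 hashing is an assumption for illustration, not the MCO's actual naming code.

```python
import hashlib

KUBELET_CA_PATH = "/etc/kubernetes/kubelet-ca.crt"


def rendered_name(pool: str, files: dict[str, str]) -> str:
    """Hypothetical content-addressed name for a rendered config."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path].encode())
        digest.update(b"\0")
    return f"rendered-{pool}-{digest.hexdigest()[:32]}"


# Placeholder labels standing in for the five bundle states seen in the
# first 24 hours after install.
BUNDLE_STATES = [
    ["bootstrap-a", "bootstrap-b"],            # bootstrap bundle
    ["old-1", "old-2"],                        # incluster bundle
    ["old-1", "old-2", "new-1"],               # 1 new
    ["old-1", "old-2", "new-1", "new-2"],      # 2 new
    ["new-1", "new-2"],                        # 2 new, 2 old removed
]

names = [
    rendered_name("worker", {KUBELET_CA_PATH: "\n".join(bundle)})
    for bundle in BUNDLE_STATES
]
for name in names:
    print(name)
print("distinct rendered configs:", len(set(names)))
```

Every bundle state yields a different name, so each rotation step shows up as a new rendered config even though nothing else on the node changed.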
On top of this, the cluster previously chugged along past the expiry with only the warning, but now, when the old certs rotate while the pools are paused, TargetDown and KubeletDown fire after a few hours, which is very bad from a user perspective.
Solutions
Solution1: don't do anything
Nothing should break badly, but users will get critical alerts after ~1 month if they pause pools and upgrade to 4.13. Not a great UX.
Solution2: revert the monitoring change or mask the alert
A bit late, but potentially doable? Masking the alert will likely mask real issues, though.
Solution3: MVP MCD changes (Estimate: 1week)
The MCD update, MCD verification, and MCD config drift monitor all ignore the kubelet-ca cert file. The MCD gets a new routine to update the file, reading from a configmap the MCC manages. The MCC still renders the cert, but the cert will be updated even if the pool is paused.
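A minimal sketch of what such an update routine could look like is shown below. It is written in Python for brevity, and the function name `sync_cert_bundle` is hypothetical. The routine takes the desired bundle bytes (which, in this solution, would come from the configmap the MCC manages) and writes them to disk only when they differ, without consulting the pool's paused state.

```python
import os
import tempfile
from pathlib import Path


def sync_cert_bundle(desired: bytes, path: Path) -> bool:
    """Write `desired` to `path` if the on-disk content differs.

    Returns True when the file was (re)written. The write goes through a
    temporary file plus os.replace so readers never see a partial bundle.
    """
    try:
        if path.read_bytes() == desired:
            return False  # already up to date, nothing to do
    except FileNotFoundError:
        pass  # first write

    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(desired)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True


if __name__ == "__main__":
    # Demo in a scratch directory rather than /etc/kubernetes.
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch) / "kubelet-ca.crt"
        print(sync_cert_bundle(b"bundle-v1", target))  # first write
        print(sync_cert_bundle(b"bundle-v1", target))  # unchanged
        print(sync_cert_bundle(b"bundle-v2", target))  # rotated
```

Because the routine is idempotent, it can run on every sync of the source without causing extra writes, and it needs no drain or reboot.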
Solution4: MVP MCC changes (Estimate: a few days)
Have the controller splice in changes even when the pool is paused. John has an MVP here: https://github.com/openshift/machine-config-operator/compare/master...jkyros:machine-config-operator:mco-77-bypass-pause
This is a cleaner solution compared to 3, but will cause the pool to go into updating briefly. If there are other operations causing nodes to be cordoned, etc., we would have to consider overriding that.
Solution5: MCD cert management path (full, Estimate: 1 sprint)
The cert is removed from the rendered-config. The MCC will read it off the controllerconfig and render it into a custom configmap. The MCS will add this additional file when serving content, but it is not part of the rendered-MC otherwise. The MCD will have a new routine to manage the certs live directly.
The bootstrap MCS will also need to have a way to render it into the initial served configuration without it being part of the MachineConfigs (this is especially important for HyperShift). We will have to make sure the in-place updater doesn't break.
We may also have to solve config drift problems from bootstrap to in-cluster, for self-driving and HyperShift in-place updates.
We also have to make sure the file isn't deleted when a node updates to the new management path. Otherwise the MCD would see that the file is missing from the new config, treat it as a diff, and delete it, so the certs would disappear for a while.
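The deletion hazard can be illustrated with a small sketch of the file diff. The function `files_to_delete` and the `EXTERNALLY_MANAGED` set are hypothetical names. The idea is that paths now owned by the separate cert routine are excluded when computing which files the update should remove.

```python
KUBELET_CA_PATH = "/etc/kubernetes/kubelet-ca.crt"

# Paths owned by the separate cert-management routine (assumption: only the
# kubelet CA bundle is handled this way).
EXTERNALLY_MANAGED = frozenset({KUBELET_CA_PATH})


def files_to_delete(old_files, new_files, externally_managed=EXTERNALLY_MANAGED):
    """Return paths present in the old config but absent from the new one,
    skipping any path that is managed outside the rendered config."""
    stale = set(old_files) - set(new_files)
    return sorted(stale - set(externally_managed))


# The old rendered config still carries the cert; the new one does not.
old_config = [KUBELET_CA_PATH, "/etc/example/old.conf", "/etc/example/kept.conf"]
new_config = ["/etc/example/kept.conf"]

print(files_to_delete(old_config, new_config))
print(files_to_delete(old_config, new_config, externally_managed=()))
```

The second call shows the naive behavior: without the exclusion, the cert bundle lands in the deletion list during the first update to the new management path.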
DOCS (WIP)
https://docs.google.com/document/d/1qXYV9Hj98QhJSKx_2IWBbU_bxu30YQtbR21mmOtBsIg/edit?usp=sharing
blocks
- IR-351 Make Image Registry Operator Optional [Release Pending]
is depended on by
- OCPSTRAT-1308 OpenShift Optional Capabilities (Phase 7) [New]
- OCPSTRAT-36 OpenShift Optional Capabilities (Phase 4) [Closed]
- OCPSTRAT-421 OpenShift Optional Capabilities (Phase 5) [Closed]
- OCPSTRAT-731 OpenShift Optional Capabilities (Phase 6) [Closed]
is related to
- MCO-638 MCO behavior to consider during cluster hibernation [Closed]
- IR-230 Move node-ca to another operator [Closed]