-
Feature Request
-
Resolution: Unresolved
-
openshift-4.18
1. Proposed title of this feature request
Boot image skew limits
2. What is the nature and description of the request?
Provide a way to warn customers (and potentially block 4.(y+1) updates) when excessively old boot images are in use. Not all customers actively provision new machines, so while having a way to proactively warn customers who have configured MachineSets and such with outdated boot images would be nice, rejecting new machines at initial-Ignition-request-time may be sufficient. Error messages can discuss managedBootImages and link to the KCS about manually updating boot images to help impacted customers unstick themselves.
3. Why does the customer need this? (List the business requirements here)
There are occasional issues when new clusters attempt to use old boot images (MCO-540, MCO-519, MCO-1212, COS-1942). New features like ClusterImagePolicy also lead to machine-config server Ignition content that needs to be compatible with the boot image that's being asked to pivot to the new OS image. Currently, the machine-config server makes its compatibility calls based on the Ignition spec version in the request's Accept header. For example:
$ oc -n openshift-machine-config-operator logs -l k8s-app=machine-config-server --tail 1
I0828 22:22:41.449488 1 api.go:116] Pool worker requested by address:"10.0.146.248:57159" User-Agent:"Ignition/2.15.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.4.0, */*;q=0.1"
I0828 21:18:39.328816 1 api.go:116] Pool worker requested by address:"10.0.183.35:62744" User-Agent:"Ignition/2.15.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.4.0, */*;q=0.1"
I0828 22:28:43.677961 1 api.go:116] Pool worker requested by address:"10.0.183.35:28692" User-Agent:"Ignition/2.15.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.4.0, */*;q=0.1"
So we know to serve that node 3.4.0-compatible Ignition. But "which Ignition version?" is only part of the compatibility exposure. It doesn't cover things like "will Podman understand the policy.json config knobs I'm setting?", and it doesn't cover things like "RHCOS 410.8.20190520.0? Nobody is shipping security patches for 4.1 RHCOS anymore".
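As a rough illustration of the header-based decision described above, the following Python sketch pulls the Ignition spec version out of an Accept header like the ones in the log lines. It is not the machine-config server's actual code (which is not shown here); the function name and constant are hypothetical, and the parsing is deliberately minimal.

```python
import re
from typing import Optional, Tuple

# Media type taken from the Accept-Header values in the log lines above.
IGNITION_MEDIA_TYPE = "application/vnd.coreos.ignition+json"


def ignition_version_from_accept(accept: str) -> Optional[Tuple[int, int, int]]:
    """Return the highest Ignition spec version named in an Accept header.

    Hypothetical helper for illustration only. Returns None when the header
    names no Ignition media type with a well-formed version parameter.
    """
    best = None
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0] != IGNITION_MEDIA_TYPE:
            continue  # e.g. the "*/*;q=0.1" fallback entry
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() != "version":
                continue
            match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", value.strip())
            if match:
                version = tuple(int(g) for g in match.groups())
                if best is None or version > best:
                    best = version
    return best


header = "application/vnd.coreos.ignition+json;version=3.4.0, */*;q=0.1"
print(ignition_version_from_accept(header))
```

The point of the sketch is what it cannot see: the header carries a spec version and nothing about the boot image's RHCOS release or the Podman it ships.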
A more robust check in the machine-config server would make for more accessible messaging. An alert like "you have a recent Ignition request with an incompatibly old boot image, please see..." is more actionable than the "hey, some of these Machines are failing to join the cluster, good luck rooting around in their serial console output" experience we'd generate today when we serve an old boot image some new Ignition content it can't handle.
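One possible shape for such a check is sketched below in Python, assuming the only signal available is the User-Agent string seen in the logs. The Ignition binary version is at best a coarse proxy for boot image age, the threshold is invented for illustration, and every name here is hypothetical rather than part of the machine-config server.

```python
import re
from typing import Optional, Tuple

# Hypothetical threshold, chosen only for illustration; no real policy implied.
MIN_IGNITION_BINARY = (2, 16, 0)


def parse_user_agent(user_agent: str) -> Optional[Tuple[int, int, int]]:
    """Parse a User-Agent such as "Ignition/2.15.0" into a version tuple."""
    match = re.fullmatch(r"Ignition/(\d+)\.(\d+)\.(\d+)", user_agent.strip())
    return tuple(int(g) for g in match.groups()) if match else None


def check_boot_image_skew(
    pool: str,
    address: str,
    user_agent: str,
    minimum: Tuple[int, int, int] = MIN_IGNITION_BINARY,
) -> Optional[str]:
    """Return an admin-facing warning if the requester looks too old, else None."""
    version = parse_user_agent(user_agent)
    if version is None:
        return None  # unrecognized client; this sketch does not guess
    if version >= minimum:
        return None
    found = ".".join(str(n) for n in version)
    wanted = ".".join(str(n) for n in minimum)
    return (
        f"Ignition request for pool {pool} from {address} came from an "
        f"incompatibly old boot image (Ignition {found}, need >= {wanted}); "
        "consider managedBootImages or manually updating boot images."
    )


print(check_boot_image_skew("worker", "10.0.146.248:57159", "Ignition/2.15.0"))
```

Whether the result is only logged, surfaced as an alert, or used to reject the request is a policy choice the sketch leaves open.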
And besides being more accessible to cluster admins, having documented skew guards here would allow component teams to understand when they could reliably use new features that older RHCOS might not be familiar with (OCPBUGS-38809).
Work like managedBootImages, new as a tech preview in 4.16, can help reduce skew, but only in clusters where it is enabled. And in some disconnected/restricted-network or bare-metal-y situations, enabling new boot images requires mirroring and admin work that the cluster is unlikely to be able to automate. So this kind of skew guard would be useful even in a world where managedBootImages were GA for more cloud providers.
4. List any affected packages or components.
RHCOS/Ignition/MCO. Maybe HyperShift, which also handles new-Machine Ignition?
- blocks
OCPNODE-2619 Move ClusterImagePolicy to v1 (New)
OCPNODE-2690 Move ImagePolicy to v1 (New)