-
Feature Request
-
Resolution: Unresolved
-
Normal
-
None
-
openshift-4.15, openshift-4.16, openshift-4.17
-
Improvement
1. Proposed title of this feature request
Add a warning to the `machine-config` CO and avoid pushing changes to the nodes when the `image.config/cluster` object is invalid.
2. What is the nature and description of the request?
Make the product more robust by having it detect invalid configuration changes and warn about them instead of applying them.
3. Why does the customer need this? (List the business requirements here)
If the `image.config/cluster` object is updated incorrectly, as in the example below, nodes can end up in the NotReady state when they recover from an unexpected shutdown. No new pods can be deployed on any OCP node because images cannot be pulled.
~~~
$ oc get image.config/cluster -o yaml
....
....
spec:
  registrySources:
    allowedRegistries:
    - quay.io
    - registry.redhat.io
    - docker.io
    - registry.connect.redhat.com
    - registry.access.redhat.com
    - registry.redhat.io/httpd/httpd:v1@shaid:dadasdsa
$ oc describe pod httpd-9d8c8d96d-jmnx9
Warning Failed 6s kubelet Failed to pull image "registry.redhat.io/ubi8/httpd-24@sha256:2c1e57b646a5190cde0b2b638c7d1b473d4fba090e6f0e7ff44e9fe6a7341b8c": invalid policy in "/etc/containers/policy.json": Unknown key "registry.redhat.io/httpd/httpd:v1@shaid:dadasdsa"
~~~
The `nodeip-configuration.service` unit also fails, which prevents the node from coming up in the Ready state:
~~~
$ systemctl status nodeip-configuration.service -l --no-pager
Aug 29 12:08:35 sno-node.dpateriy-sno.example.com bash[3648179]: Error: invalid policy in "/etc/containers/policy.json": Unknown key "registry.redhat.io/httpd/httpd:v1@shaid:dadasdsa"
~~~
If changes of this type are made incorrectly and go unnoticed, they can cause both application and cluster outages for the customer. The idea is to avoid pushing invalid changes to the nodes and to add a warning to the machine-config CO stating that the changes made to the `image.config/cluster` object are invalid and need to be reviewed.
This enhancement would make OCP more robust and reduce the support case count, because customers would be alerted to invalid changes.
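As an illustration of the kind of pre-flight check being requested, here is a minimal Python sketch that flags suspicious `allowedRegistries` entries before they would be rendered into `/etc/containers/policy.json`. The grammar it uses (`[*.]host[:port]`, optionally followed by a repository path and then either a tag or a `sha256` digest, but not both) is a simplified assumption for this example and not the MCO's actual validation logic. The function name `invalid_entries` is hypothetical.

```python
import re

# Simplified, ASSUMED grammar for one allowedRegistries entry.
# This is not the MCO's real rule set; it only illustrates the idea.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
_HOST = r"(?:\*\.)?" + _LABEL + r"(?:\." + _LABEL + r")*(?::[0-9]+)?"
_PATH = r"(?:/[a-z0-9]+(?:[._-]+[a-z0-9]+)*)+"
_TAG = r":[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}"
_DIGEST = r"@sha256:[0-9a-f]{64}"

# host, optionally followed by a path that may end in a tag OR a digest.
_ENTRY = re.compile(_HOST + r"(?:" + _PATH + r"(?:" + _TAG + r"|" + _DIGEST + r")?)?")


def invalid_entries(entries):
    """Return the entries that do not match the assumed grammar (hypothetical helper)."""
    return [e for e in entries if not _ENTRY.fullmatch(e)]


# The allowedRegistries list from the report above.
allowed = [
    "quay.io",
    "registry.redhat.io",
    "docker.io",
    "registry.connect.redhat.com",
    "registry.access.redhat.com",
    "registry.redhat.io/httpd/httpd:v1@shaid:dadasdsa",
]

bad = invalid_entries(allowed)
if bad:
    # In the requested feature, this is where the operator would warn
    # instead of rolling the change out to the nodes.
    print("invalid allowedRegistries entries:", bad)
```

With the list from this report, only the last entry is flagged, because it combines a tag with a malformed digest. A real implementation would presumably reuse the scope validation of the container tooling that consumes `policy.json` rather than a hand-written pattern like this one.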
4. List any affected packages or components.
Machine Config Operator (MCO)