-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.13, 4.12, 4.11, 4.14, 4.15
-
Critical
-
No
-
3
-
OTA 242, OTA 243, OTA 244
-
3
-
Rejected
-
False
-
Bug Fix
-
Proposed
This is a clone of issue OCPBUGS-19465 / OCPBUGS-18386. The following is the description of the original issue:
—
How reproducible:
Always
Steps to Reproduce:
1. The Kubernetes API introduces a new Pod template parameter (`ephemeral`).
2. This parameter is not in the allowed list of the default SCC.
3. Customers are not allowed to edit the default SCCs, and AFAIK we have no mechanism in place to update the built-in SCCs.
4. Users of existing clusters cannot use the new parameter without manually creating SCCs and assigning them to service accounts themselves, which is clunky. This is documented in https://access.redhat.com/articles/6967808
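To illustrate why the pod is rejected, here is a minimal sketch of the volume-type check, assuming a simplified model in which an SCC is just its list of allowed volume types and `*` is treated as a wildcard. The function name and the sample lists are hypothetical; real SCC admission evaluates many more fields than this.

```python
def disallowed_volume_types(scc_volumes, pod_volume_types):
    """Return the pod's volume types that the SCC's allowed list does not cover."""
    if "*" in scc_volumes:
        # Assumption: a wildcard entry allows every volume type.
        return []
    return [v for v in pod_volume_types if v not in scc_volumes]


# Hypothetical allowed list of an SCC on a cluster that was upgraded
# without the SCC being updated: "ephemeral" is missing.
old_scc_volumes = ["configMap", "emptyDir", "secret"]
# Hypothetical allowed list as shipped by the newer release.
new_scc_volumes = ["configMap", "emptyDir", "ephemeral", "secret"]

pod_volume_types = ["emptyDir", "ephemeral"]

print(disallowed_volume_types(old_scc_volumes, pod_volume_types))
print(disallowed_volume_types(new_scc_volumes, pod_volume_types))
```

With the stale list, the `ephemeral` volume is the only type reported as disallowed, which matches the actual result described in this report.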
Actual results:
Users of existing clusters cannot use ephemeral volumes after an upgrade
Expected results:
Users of existing clusters can use ephemeral volumes after an upgrade
Solution for OCP 4.13
The root cause of the bug is that the CVO was (incorrectly) not reconciling SCC resources to the form specified by the manifest in the release image. This has two consequences:
1. Changes in SCC resources mandated by new OCP versions were not applied to the cluster during upgrade (this is the culprit of the bug as it was reported)
2. User modifications of SCC resources were not reverted, allowing users to operate workloads on the clusters depending on the modified SCCs
The CVO was fixed in development and in 4.14 to reconcile SCC resources as intended. Because of point (2) above, however, we do not want to deliver the same strict fix to 4.13 and earlier, since it could break existing workloads that depend on user-modified SCCs. We decided to ship the smallest possible fix for the bug as reported: reconcile only the Volumes field, to the union of the items mandated by the manifest and the items in the existing cluster state. No other fields are reconciled (we have not shipped updates to these fields' values in any SCC in 4.12 and 4.13, so leaving them alone should be safe).
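The union-only reconciliation can be sketched as follows. This is an illustration, not the CVO code (the CVO is written in Go): SCCs are modeled as plain dictionaries, and the function names and sample values are hypothetical.

```python
def merge_volumes(manifest_volumes, cluster_volumes):
    """Union of both lists: cluster items first, then missing manifest items."""
    merged = list(cluster_volumes)
    for volume in manifest_volumes:
        if volume not in merged:
            merged.append(volume)
    return merged


def reconcile_scc(manifest_scc, cluster_scc):
    """Reconcile only the volumes field; keep every other field as found in the cluster."""
    reconciled = dict(cluster_scc)
    reconciled["volumes"] = merge_volumes(
        manifest_scc.get("volumes", []), cluster_scc.get("volumes", [])
    )
    return reconciled


# Hypothetical manifest from the release image.
manifest = {"allowPrivilegeEscalation": False,
            "volumes": ["configMap", "ephemeral", "secret"]}
# Hypothetical cluster state: user added hostPath, flipped another field,
# and the cluster never received "ephemeral".
cluster = {"allowPrivilegeEscalation": True,
           "volumes": ["configMap", "secret", "hostPath"]}

print(reconcile_scc(manifest, cluster))
```

The user-added `hostPath` item and the modified `allowPrivilegeEscalation` value survive, while the missing `ephemeral` item is added, which is the behavior the 4.13 backport aims for.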
To ensure a safe update to 4.14, where the CVO starts reconciling everything, the 4.13 backport makes the CVO detect any user modifications to SCC resources. If a modification is detected, the CVO flags Upgradeable=False to prevent the cluster from upgrading to 4.14 until the modifications are removed (which can include modifying the workloads so that they can run with unmodified system SCCs). The Upgradeable=False message will link to a KB article that describes the resolution: https://access.redhat.com/solutions/7033949.
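A rough sketch of the detection logic, under the assumption that an SCC counts as modified when it would still differ from the manifest after the Volumes union is applied (so a deleted volume item does not count, but an added one does). SCCs are modeled as dictionaries; the function names and the message wording are hypothetical and do not reproduce the real CVO message.

```python
CREATE_ONLY = "release.openshift.io/create-only"


def is_create_only(manifest_scc):
    annotations = manifest_scc.get("metadata", {}).get("annotations", {})
    return annotations.get(CREATE_ONLY) == "true"


def is_modified(manifest_scc, cluster_scc):
    """True if the cluster SCC would still differ from the manifest after the volumes union."""
    for key in sorted(set(manifest_scc) | set(cluster_scc)):
        if key == "metadata":
            continue  # simplification: metadata is ignored in this sketch
        if key == "volumes":
            extra = set(cluster_scc.get("volumes", [])) - set(manifest_scc.get("volumes", []))
            if extra:
                return True  # user-added items survive the union
        elif manifest_scc.get(key) != cluster_scc.get(key):
            return True
    return False


def upgradeable_guard(manifests, cluster):
    """Return a hypothetical Upgradeable=False condition, or None if nothing is modified."""
    modified = sorted(
        name for name, manifest_scc in manifests.items()
        if name in cluster
        and not is_create_only(manifest_scc)
        and is_modified(manifest_scc, cluster[name])
    )
    if not modified:
        return None
    return {"type": "Upgradeable", "status": "False",
            "message": "Modified SCCs detected: " + ", ".join(modified)}
```

Create-only SCCs are skipped, so they never trigger the guard, and the message lists every modified SCC by name.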
Testing notes
These are the main items I recommend to test:
- Modify an SCC which is not annotated with release.openshift.io/create-only: "true", such as restricted-v2
  - Modify any field other than Volumes. Both before and after the fix, the SCC should not be reconciled back to the form in the manifest. After the fix, the Upgradeable=False guard should show up, mentioning the modified SCC's name
  - Remove the allowPrivilegeEscalation field. Both before and after the fix, the field will be set to true immediately (defaulting), no matter what is in the manifest. After the fix, if this field has a false value in the manifest, the Upgradeable=False guard should show up, mentioning the modified SCC's name
  - Add an item to the Volumes field. Both before and after the fix, the SCC should not be reconciled back to the form in the manifest (the added item should stay there). After the fix, the Upgradeable=False guard should show up, mentioning the modified SCC's name
  - Delete an item from the Volumes field. Before the fix, the SCC is not reconciled back to the form in the manifest. After the fix, the SCC is reconciled back to the form in the manifest. The Upgradeable=False guard does not show up.
  - Add an item to the Volumes field and delete another one. Before the fix, the SCC is not reconciled at all. After the fix, the SCC should have both the added item and the previously deleted item. The Upgradeable=False guard should show up.
  - Modify two such SCCs in a way that should flag Upgradeable=False. Make sure the message mentions both afterwards.
  - While the Upgradeable=False guard is up...
    - Delete the modified SCC. Before the fix, the SCC is recreated to the form in the manifest. After the fix, the SCC is recreated the same way and the Upgradeable=False guard disappears.
    - Manually edit the modified SCC and restore it to the form in the manifest. The Upgradeable=False guard disappears.
    - While two SCCs are modified and mentioned in the Upgradeable=False message, delete one of them. After the fix, it gets recreated without the modification; the Upgradeable=False guard stays up but mentions only one SCC (the one that was not deleted and recreated), not two
    - Force an upgrade to any recent 4.14 with --force. The upgrade should proceed and finish. After the upgrade, there is no Upgradeable=False guard and the modified SCC is reconciled to the form in the 4.14 payload image that was used (the modification is removed)
- Modify an SCC which does have the release.openshift.io/create-only: "true" annotation, such as restricted. The behavior for these should be identical before and after the fix.
  - No matter what modifications you make, the CVO never modifies the SCCs back to their manifest forms
  - If you delete the allowPrivilegeEscalation field, it should get defaulted to true.
  - If you delete such an SCC entirely, it gets recreated to the manifest form
  - No matter what you do with such an SCC, it never causes Upgradeable=False to appear
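When working through these checks, the guard can be read from the ClusterVersion status, for example from the JSON printed by `oc get clusterversion version -o json`. The helper below is a small sketch that picks the Upgradeable condition out of that JSON and checks whether its message mentions the expected SCC names. The function names and the sample document (including the message text) are hypothetical.

```python
import json


def upgradeable_condition(cluster_version):
    """Return the Upgradeable condition from a parsed ClusterVersion, or None if absent."""
    for condition in cluster_version.get("status", {}).get("conditions", []):
        if condition.get("type") == "Upgradeable":
            return condition
    return None


def guard_mentions(condition, scc_names):
    """True if the condition is Upgradeable=False and its message names every given SCC."""
    if condition is None or condition.get("status") != "False":
        return False
    message = condition.get("message", "")
    return all(name in message for name in scc_names)


# Hypothetical sample; the real message wording will differ.
sample = json.loads("""
{
  "status": {
    "conditions": [
      {"type": "Available", "status": "True"},
      {"type": "Upgradeable", "status": "False",
       "message": "Detected modified SCCs: anyuid, restricted-v2"}
    ]
  }
}
""")

condition = upgradeable_condition(sample)
print(guard_mentions(condition, ["anyuid", "restricted-v2"]))
```

A missing Upgradeable condition is treated here as "guard not up", which is a simplification for the purposes of these test notes.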
- blocks: OCPBUGS-22198 Cluster Version Operator does not correctly reconcile SCC resources (Closed)
- clones: OCPBUGS-19465 Cluster Version Operator does not correctly reconcile SCC resources (Closed)
- is blocked by: OCPBUGS-19465 Cluster Version Operator does not correctly reconcile SCC resources (Closed)
- is cloned by: OCPBUGS-22198 Cluster Version Operator does not correctly reconcile SCC resources (Closed)
- links to: RHBA-2023:6130 OpenShift Container Platform 4.13.z bug fix update