- Bug
- Resolution: Done-Errata
- Critical
- 4.13, 4.12, 4.11, 4.14, 4.15
- Critical
- No
- 2
- OTA 244
- 1
- Rejected
- False
- Bug Fix
- Proposed
This is a clone of issue OCPBUGS-19472. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-19465 / OCPBUGS-18386. The following is the description of the original issue:
—
How reproducible:
Always
Steps to Reproduce:
1. The Kubernetes API introduces a new Pod Template parameter (`ephemeral`).
2. This parameter is not in the allowed list of the default SCC.
3. Customers are not allowed to edit the default SCCs, and AFAIK we have no mechanism in place to update the built-in SCCs.
4. Users of existing clusters cannot use the new parameter without manually creating SCCs and assigning them to service accounts themselves, which is clunky. This is documented in https://access.redhat.com/articles/6967808
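The steps above come down to a membership check on the SCC's list of allowed volume types. The following is only a rough sketch of that check, not the actual admission code: the function name and the volume lists are illustrative assumptions, and the real admission logic considers more than this one field.

```python
def volume_allowed(scc: dict, volume_type: str) -> bool:
    """Return True if the SCC's allowed-volumes list permits volume_type.

    Simplified model: "*" is treated as "allow every volume type".
    """
    allowed = scc.get("volumes") or []
    return "*" in allowed or volume_type in allowed


# Illustrative (assumed) SCC contents, not copied from a real release manifest.
scc_before_upgrade = {
    "metadata": {"name": "restricted-v2"},
    "volumes": ["configMap", "downwardAPI", "emptyDir",
                "persistentVolumeClaim", "projected", "secret"],
}
scc_in_new_manifest = {
    "metadata": {"name": "restricted-v2"},
    "volumes": scc_before_upgrade["volumes"] + ["ephemeral"],
}

print(volume_allowed(scc_before_upgrade, "ephemeral"))   # stale SCC on the cluster
print(volume_allowed(scc_in_new_manifest, "ephemeral"))  # SCC as shipped in the manifest
```

Because the SCC on an existing cluster was never updated to match the new manifest, the first check fails even after the upgrade.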
Actual results:
Users of existing clusters cannot use ephemeral volumes after an upgrade
Expected results:
Users of existing clusters can use ephemeral volumes after an upgrade
Solution for OCP 4.13 and 4.12
The root issue of the bug is that the CVO was (incorrectly) not reconciling SCC resources to the form specified by the manifest from the release image. This has two consequences:
1. Changes in SCC resources mandated by new OCP versions were not applied to the cluster during upgrade (this is the culprit of the bug as it was reported)
2. User modifications of SCC resources were not reverted, allowing users to operate workloads on the clusters depending on the modified SCCs
The CVO was fixed in development and in 4.14 to reconcile the SCC resources as intended, but because of point (2) above, we do not want to deliver the same strict fix to 4.13 and earlier, to avoid breaking existing workloads that depend on user-modified SCCs. We decided to ship the smallest possible fix for the bug as reported: reconcile only the Volumes field, to the union of the items mandated by the manifest and the existing state of the cluster. No other fields are reconciled (we have not shipped updates to these fields' values in any SCC in 4.12 or 4.13, so this should be safe).
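The union behaviour described above can be sketched as follows. This is a minimal illustration and not the CVO implementation (the CVO is not written in Python); the function name `merge_volumes` is hypothetical, and preserving the cluster's existing order is an assumption made for readability.

```python
def merge_volumes(existing: list[str], required: list[str]) -> list[str]:
    """Union of the cluster's Volumes list and the manifest's Volumes list.

    Items already on the cluster are kept (including user-added ones);
    items required by the manifest but missing on the cluster are appended.
    """
    merged = list(existing)
    for volume in required:
        if volume not in merged:
            merged.append(volume)
    return merged


manifest_volumes = ["configMap", "secret", "ephemeral"]

# User added "hostPath" and removed "secret"; the cluster never received "ephemeral".
cluster_volumes = ["configMap", "hostPath"]

print(merge_volumes(cluster_volumes, manifest_volumes))
# ['configMap', 'hostPath', 'secret', 'ephemeral']
```

Nothing is ever removed by this merge, which is why user additions survive while missing manifest items are restored.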
To ensure a safe update to 4.14, where the CVO starts reconciling everything, the 4.13 backport makes the CVO detect any user modifications to SCC resources. If any modification is detected, it sets Upgradeable=False to prevent the cluster from upgrading to 4.14 until the modifications are removed (which can include modifying the workloads so that they can run with unmodified system SCCs). The Upgradeable=False message will link to a KB article that describes the resolution: https://access.redhat.com/solutions/7033949.
In 4.12, the reconciliation behavior will be the same as in 4.13, but the Upgradeable=False gate will not be present.
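The difference between the 4.13 and 4.12 backports can be pictured with a small sketch. Everything here is an assumption made for illustration: the function name, the `gate_enabled` flag, the `ModifiedSCCs` reason string, and the use of plain equality to detect a modification. How the CVO actually detects modifications is not described in this issue.

```python
KB_URL = "https://access.redhat.com/solutions/7033949"


def upgradeable_condition(manifest_sccs: dict, cluster_sccs: dict, gate_enabled: bool):
    """Return an Upgradeable=False condition if the gate is on and SCCs were modified.

    manifest_sccs / cluster_sccs map SCC name -> SCC content (plain dicts).
    Returns None when there is nothing to report.
    """
    modified = sorted(
        name
        for name, manifest in manifest_sccs.items()
        if name in cluster_sccs and cluster_sccs[name] != manifest
    )
    if not gate_enabled or not modified:
        return None
    return {
        "type": "Upgradeable",
        "status": "False",
        "reason": "ModifiedSCCs",  # hypothetical reason string
        "message": f"Modified SCCs detected: {', '.join(modified)}. See {KB_URL}",
    }


manifest = {"restricted-v2": {"volumes": ["configMap", "secret"]}}
cluster = {"restricted-v2": {"volumes": ["configMap", "secret", "hostPath"]}}

print(upgradeable_condition(manifest, cluster, gate_enabled=True))   # 4.13-like: gate present
print(upgradeable_condition(manifest, cluster, gate_enabled=False))  # 4.12-like: no gate
```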
Testing notes for 4.12
These are the main items I recommend to test:
- Modify an SCC which is not annotated with release.openshift.io/create-only: "true", such as restricted-v2
  - Modify any field other than Volumes. Both before and after the fix, the SCC should not be reconciled back to the form in the manifest. Delete the modified SCC. Both before and after the fix, the SCC is recreated in the form in the manifest.
  - Remove the allowPrivilegeEscalation field. Both before and after the fix, the field will be set to true immediately (=defaulting), no matter what is in the manifest.
  - Add an item to the Volumes field. Both before and after the fix, the SCC should not be reconciled back to the form in the manifest (the added item should stay there).
  - Delete an item from the Volumes field. Before the fix, the SCC is not reconciled back to the form in the manifest. After the fix, the SCC is reconciled back to the form in the manifest.
  - Add an item to the Volumes field and delete another one. Before the fix, the SCC is not reconciled at all. After the fix, the SCC should have the added item and also the previously deleted item.
- Modify an SCC which does have the release.openshift.io/create-only: "true" annotation, such as restricted. The behavior for these should be identical before and after the fix.
  - No matter what modifications you make, the CVO never modifies the SCCs back to their manifest forms.
  - If you delete the allowPrivilegeEscalation field, it should get defaulted to true.
  - If you delete such an SCC entirely, it gets recreated in the manifest form.
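The expected post-fix outcomes in the test items above can be summarised as a small decision model, which may help when checking results by hand. This is a sketch of the expected behaviour only, not CVO code. The function name is hypothetical, reading the annotation from the manifest (rather than from the cluster object) is an assumption, and the allowPrivilegeEscalation defaulting is not modelled.

```python
import copy

CREATE_ONLY = "release.openshift.io/create-only"


def expected_after_fix(manifest: dict, existing):
    """Model the SCC state expected after reconciliation with the fix applied."""
    if existing is None:
        # A deleted SCC is recreated in the manifest form in every case.
        return copy.deepcopy(manifest)
    annotations = manifest.get("metadata", {}).get("annotations", {})
    if annotations.get(CREATE_ONLY) == "true":
        # Create-only SCCs are never touched once they exist.
        return existing
    result = copy.deepcopy(existing)
    volumes = list(existing.get("volumes", []))
    for volume in manifest.get("volumes", []):
        if volume not in volumes:
            volumes.append(volume)  # restore items required by the manifest
    result["volumes"] = volumes     # only Volumes is reconciled (as a union)
    return result


manifest = {"metadata": {"name": "restricted-v2"}, "priority": None,
            "volumes": ["configMap", "secret"]}
modified = {"metadata": {"name": "restricted-v2"}, "priority": 10,
            "volumes": ["configMap", "hostPath"]}

print(expected_after_fix(manifest, modified))
print(expected_after_fix(manifest, None))
```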
- clones
  - OCPBUGS-19472 Cluster Version Operator does not correctly reconcile SCC resources (Closed)
- is blocked by
  - OCPBUGS-19472 Cluster Version Operator does not correctly reconcile SCC resources (Closed)
- links to
  - RHBA-2023:6276 OpenShift Container Platform 4.12.z bug fix update