OpenShift Bugs: OCPBUGS-22198

Cluster Version Operator does not correctly reconcile SCC resources


Details

    • Critical
    • No
    • 2
    • OTA 244
    • 1
    • Rejected
    • False
      * Previously, the Cluster Version Operator (CVO) did not reconcile `SecurityContextConstraints` (SCC) resources as expected. The CVO now properly reconciles the `volumes` field in the `SecurityContextConstraints` resources towards the state defined in the release image. User modifications to system SCC resources are tolerated. For more information on how SCC resources can impact updating, see link:https://access.redhat.com/solutions/7033949[Resolving Detected modified SecurityContextConstraints update gate before upgrading to 4.14]. (link:https://issues.redhat.com/browse/OCPBUGS-22198[*OCPBUGS-22198*])

      Previously, the Cluster Version Operator (CVO) did not reconcile SecurityContextConstraints (SCC) resources as expected. The CVO now properly reconciles the Volumes field in SecurityContextConstraints resources towards the state defined in the release image. User modifications to system SCC resources are tolerated.

      Future OCP versions will stop tolerating user modifications of system SCC resources. See https://access.redhat.com/solutions/7033949 for more information.
    • Bug Fix
    • Proposed

    Description

      This is a clone of issue OCPBUGS-19472. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-19465 / OCPBUGS-18386. The following is the description of the original issue:

      How reproducible:
      Always
      Steps to Reproduce:
      1. The Kubernetes API introduces a new Pod template volume type (`ephemeral`).
      2. This volume type is not in the allowed list of the default SCC.
      3. Customers are not allowed to edit the default SCCs, and, as far as we know, there is no mechanism in place to update the built-in SCCs.
      4. Users of existing clusters cannot use the new volume type without manually creating SCCs and assigning them to service accounts themselves, which is clunky. This is documented in https://access.redhat.com/articles/6967808
      Actual results:
      Users of existing clusters cannot use ephemeral volumes after an upgrade
      Expected results:
      Users of existing clusters can use ephemeral volumes after an upgrade
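The failure mode above reduces to a membership check. The following is a minimal Python sketch, assuming SCC admission rejects a pod when any of its volume types is missing from the SCC's `volumes` list, where a `*` entry allows every type. The helper `disallowed_volume_types` and the example allowed list are hypothetical and illustrative; this is not the real SCC admission plugin, which also weighs every SCC available to the pod's service account.

```python
def disallowed_volume_types(pod_volume_types, scc_volumes):
    """Return the pod's volume types that the SCC's `volumes` list does not allow."""
    if "*" in scc_volumes:
        # A wildcard entry allows every volume type.
        return []
    return [v for v in pod_volume_types if v not in scc_volumes]


# Illustrative allowed list on a cluster whose SCC predates `ephemeral` (hypothetical).
pre_upgrade_volumes = ["configMap", "downwardAPI", "emptyDir",
                       "persistentVolumeClaim", "projected", "secret"]

print(disallowed_volume_types(["configMap", "ephemeral"], pre_upgrade_volumes))
# ['ephemeral']
```

Because nothing updated the existing SCC during the upgrade, the `ephemeral` entry never appears in the list, so the pod stays rejected until an administrator supplies a custom SCC.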

      Solution for OCP 4.13 and 4.12

      The root issue of the bug is that the CVO was (incorrectly) not reconciling SCC resources to the form specified by a manifest from the release image. This has two consequences:

      1. Changes in SCC resources mandated by new OCP versions were not applied to the cluster during upgrades (this is the culprit of the bug as it was reported).
      2. User modifications of SCC resources were not reverted, allowing users to run workloads on the clusters that depend on the modified SCCs.

      The CVO was fixed in development and in 4.14 to reconcile the SCC resources as intended, but because of point (2) above, we do not want to deliver the same strict fix to 4.13 and earlier, to avoid breaking existing workloads that depend on user-modified SCCs. We decided to ship the smallest possible fix for the bug as reported: only reconcile the Volumes field, to the union of the items mandated by the manifest and the items in the existing cluster state. Do not reconcile any other fields (across all SCCs, we haven't shipped updates to these fields' values in 4.12 and 4.13, so it should be safe to do so).
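The Volumes-only union rule can be illustrated with a minimal Python sketch, assuming SCCs are plain dictionaries and that items already on the cluster keep their order while missing manifest items are appended. The function names `reconcile_volumes` and `reconcile_scc` and the ordering choice are hypothetical; the actual CVO backport is written in Go and may order or compare items differently.

```python
def reconcile_volumes(manifest_volumes, cluster_volumes):
    """Union of cluster and manifest volume types, without duplicates.

    Cluster items keep their existing order; manifest items missing on the
    cluster are appended in manifest order. Nothing is ever removed, so
    volume types a user added survive reconciliation.
    """
    result, seen = [], set()
    for volume in list(cluster_volumes) + list(manifest_volumes):
        if volume not in seen:
            seen.add(volume)
            result.append(volume)
    return result


def reconcile_scc(manifest, cluster):
    """Return the SCC this sketch would write back: only `volumes` changes."""
    updated = dict(cluster)  # shallow copy; every other field keeps the cluster value
    updated["volumes"] = reconcile_volumes(manifest.get("volumes", []),
                                           cluster.get("volumes", []))
    return updated


manifest = {"volumes": ["configMap", "ephemeral", "secret"], "priority": None}
cluster = {"volumes": ["configMap", "secret", "hostPath"], "priority": 10}
print(reconcile_scc(manifest, cluster))
# {'volumes': ['configMap', 'secret', 'hostPath', 'ephemeral'], 'priority': 10}
```

Taking the union rather than overwriting is what keeps this backport non-breaking: the new `ephemeral` type is delivered, while a user-added type such as `hostPath` and the user's `priority` value stay untouched.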

      To ensure a safe update to 4.14, where the CVO starts reconciling everything, the 4.13 backport makes the CVO detect any user modifications to SCC resources. If any modification is detected, the CVO sets Upgradeable=False to prevent the cluster from upgrading to 4.14 until the modifications are removed (which can include modifying the workloads so that they are able to run with unmodified system SCCs). The Upgradeable=False message will link to a KB article that describes the resolution: https://access.redhat.com/solutions/7033949.
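The upgrade gate can be sketched in Python as a comparison of each live system SCC against its manifest. This is a minimal sketch under loud assumptions: the `Condition` dataclass, the helpers `is_modified` and `upgradeable_condition`, and the reason strings `ModifiedSCCs` and `NoModifiedSCCs` are all hypothetical. The comparison is deliberately naive: it treats any top-level difference (including list order and API-server defaulting such as `allowPrivilegeEscalation`) as a modification, which a real implementation would have to account for.

```python
from dataclasses import dataclass

KB_URL = "https://access.redhat.com/solutions/7033949"


@dataclass
class Condition:
    type: str
    status: str
    reason: str
    message: str


def is_modified(manifest, cluster, ignored=("metadata",)):
    """True if any top-level field (except ignored ones) differs from the manifest."""
    keys = (set(manifest) | set(cluster)) - set(ignored)
    return any(manifest.get(k) != cluster.get(k) for k in keys)


def upgradeable_condition(manifests, cluster_state):
    """Hypothetical gate: Upgradeable=False while any system SCC is modified."""
    modified = sorted(
        name for name, manifest in manifests.items()
        if name in cluster_state and is_modified(manifest, cluster_state[name])
    )
    if modified:
        return Condition(
            type="Upgradeable",
            status="False",
            reason="ModifiedSCCs",  # hypothetical reason string
            message=("Detected modified SecurityContextConstraints: "
                     + ", ".join(modified) + f". See {KB_URL}"),
        )
    return Condition("Upgradeable", "True", "NoModifiedSCCs", "")  # hypothetical reason


manifests = {"restricted-v2": {"allowPrivilegeEscalation": False, "volumes": ["configMap", "secret"]}}
live = {"restricted-v2": {"allowPrivilegeEscalation": False, "volumes": ["configMap", "secret", "hostPath"]}}
print(upgradeable_condition(manifests, live).status)
# False
```

In 4.12 the same reconciliation applies but, per the note below, this gate is absent, so a check like this would simply not run there.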

      In 4.12, the reconciliation behavior will be the same as in 4.13, but the Upgradeable=False gate will not be present.

      Testing notes for 4.12

      These are the main items I recommend testing:

      1. Modify an SCC that is not annotated with release.openshift.io/create-only: "true", such as restricted-v2
        1. Modify any field other than Volumes. Both before and after the fix, the SCC should not be reconciled back to the form in manifest. Delete the modified SCC. Both before and after the fix, the SCC is recreated to the form in the manifest.
        2. Remove the allowPrivilegeEscalation field. Both before and after the fix, the field will be set to true immediately (=defaulting), no matter what is in the manifest.
        3. Add an item to the Volumes field. Both before and after the fix, the SCC should not be reconciled back to the form in manifest (the added item should stay there).
        4. Delete an item from the Volumes field. Before the fix, the SCC is not reconciled back to the form in the manifest. After the fix, the SCC is reconciled back to the form in manifest.
        5. Add an item to the Volumes field and delete another one. Before the fix, the SCC is not reconciled at all. After the fix, the SCC should have the added item, but also the previously deleted item too.
      2. Modify an SCC that does have the release.openshift.io/create-only: "true" annotation, such as restricted. The behavior for these should be identical before and after the fix.
        1. No matter what modifications you make, the CVO never reverts the SCCs to their manifest forms.
        2. If you delete the allowPrivilegeEscalation field, it should get defaulted to true.
        3. If you delete such an SCC entirely, it gets recreated in the manifest form.
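The expected outcomes in these testing notes can be encoded as a small executable model. The following is a minimal Python sketch, assuming the semantics stated above: a deleted SCC is recreated from the manifest, a missing `allowPrivilegeEscalation` is defaulted to true, SCCs annotated `release.openshift.io/create-only: "true"` are never reverted, and after the fix only `volumes` converges to the union. The functions `union`, `apply_defaults`, and `reconcile` are hypothetical stand-ins and the volume lists are illustrative, not the shipped manifests. In a real cluster the defaulting happens when the object is written to the API server, not in the CVO.

```python
CREATE_ONLY = "release.openshift.io/create-only"


def union(cluster_items, manifest_items):
    """Order-preserving union: cluster items first, then missing manifest items."""
    out = []
    for item in list(cluster_items) + list(manifest_items):
        if item not in out:
            out.append(item)
    return out


def apply_defaults(scc):
    """Model API-server defaulting: a missing allowPrivilegeEscalation becomes True."""
    scc = dict(scc)
    scc.setdefault("allowPrivilegeEscalation", True)
    return scc


def reconcile(manifest, cluster, fixed):
    """Model one CVO pass over a single SCC; `cluster` is None if it was deleted."""
    if cluster is None:
        # Deleted SCCs are recreated from the manifest, before and after the fix.
        return apply_defaults(manifest)
    annotations = manifest.get("metadata", {}).get("annotations", {})
    if not fixed or annotations.get(CREATE_ONLY) == "true":
        # Before the fix, and always for create-only SCCs, user edits are kept.
        return apply_defaults(cluster)
    # After the fix, only `volumes` converges (to the union); other fields are kept.
    updated = dict(cluster)
    updated["volumes"] = union(cluster.get("volumes", []), manifest.get("volumes", []))
    return apply_defaults(updated)


manifest = {
    "metadata": {"name": "restricted-v2"},
    "allowPrivilegeEscalation": False,
    "volumes": ["configMap", "ephemeral", "secret"],
}

# Test item 1.5: add hostPath and delete secret in the same edit.
edited = dict(manifest, volumes=["configMap", "ephemeral", "hostPath"])
print(reconcile(manifest, edited, fixed=False)["volumes"])
# ['configMap', 'ephemeral', 'hostPath']
print(reconcile(manifest, edited, fixed=True)["volumes"])
# ['configMap', 'ephemeral', 'hostPath', 'secret']
```

Keeping the model table-like makes it easy to check each numbered item above against the same `reconcile` function with `fixed=False` and `fixed=True`.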

            People

              afri@afri.cz Petr Muller
              openshift-crt-jira-prow OpenShift Prow Bot
              Jian Li Jian Li
              Votes: 0
              Watchers: 7
