  1. OpenShift Bugs
  2. OCPBUGS-776

Missing securityContext.seccompProfile.type on OLM's packageserver deployment is blocking OCP upgrade to 4.12


Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • 4.12, 4.11.z
    • OLM
    • Important
    • [OLM-224] FBC/PSA - Pikachu
    • 1
    • False

    Description

      Description of problem: OCPBUGS-575 uncovered an issue where the cluster-version operator was failing to reconcile seccompProfile for some OLM deployments which were declared as CVO manifests. I manually recovered from that issue by deleting the three Deployments:

      $ oc --as system:admin -n openshift-operator-lifecycle-manager delete deployments catalog-operator olm-operator package-server-manager
      

      The cluster is now stuck a bit later on:

      $ oc --as system:admin adm upgrade
      info: An upgrade is in progress. Unable to apply 4.12.0-ec.2: the cluster operator operator-lifecycle-manager-packageserver is not available
      $ oc --as system:admin get -o json clusteroperator operator-lifecycle-manager-packageserver | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
      2020-05-21T19:36:51Z Degraded=False : 
      2022-08-31T16:53:50Z Available=False ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install failed: deployment packageserver not ready before timeout: deployment "packageserver" exceeded its progress deadline
      2022-08-31T17:14:24Z Progressing=False : Failed to deploy 0.17.0
      2020-05-21T19:36:51Z Upgradeable=True : Safe to upgrade
      $ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
      2022-08-31T17:04:24Z Available=False MinimumReplicasUnavailable: Deployment does not have minimum availability.
      2022-08-31T17:04:24Z ReplicaFailure=True FailedCreate: pods "packageserver-647975568d-q46k2" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "packageserver" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "packageserver" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "packageserver" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "packageserver" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
      2022-08-31T17:14:24Z Progressing=False ProgressDeadlineExceeded: ReplicaSet "packageserver-647975568d" has timed out progressing.
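
The FailedCreate message above names four checks. As a rough illustration of what each one looks at, here is a minimal Python sketch that evaluates a pod spec (as parsed from `oc get -o json`) against just those four. It is not the real PodSecurity admission code, which enforces further rules for the restricted level; the function name `restricted_violations` and the sample specs are hypothetical.

```python
from typing import Any, Dict, List


def restricted_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """List violations of the four checks quoted in the FailedCreate message.

    Simplified illustration only; the real "restricted" level has more rules.
    """
    violations: List[str] = []
    pod_sc = pod_spec.get("securityContext") or {}

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        # allowPrivilegeEscalation exists only at the container level.
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation != false")

        # capabilities.drop is also container-only and must include "ALL".
        drop = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in drop:
            violations.append(f"{name}: unrestricted capabilities")

        # runAsNonRoot may be set on the pod or on the container;
        # a container-level value takes precedence.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        if run_as_non_root is not True:
            violations.append(f"{name}: runAsNonRoot != true")

        # seccompProfile may likewise come from either level.
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            violations.append(f"{name}: seccompProfile")

    return violations


# Hypothetical pod spec with an empty securityContext: all four checks fail.
bare = {"securityContext": {}, "containers": [{"name": "packageserver"}]}
for line in restricted_violations(bare):
    print(line)

# Hypothetical pod spec that sets every field the message asks for.
hardened = {
    "securityContext": {
        "runAsNonRoot": True,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [
        {
            "name": "packageserver",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "capabilities": {"drop": ["ALL"]},
            },
        }
    ],
}
print(restricted_violations(hardened))
```

The split between pod-level and container-level fields mirrors the wording of the message: "container ... must set" for the first two checks, "pod or container ... must set" for the last two.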
      

      The Deployment's pod template does list a securityContext key, but judging by the PodSecurity rejection above it does not set the required fields:

      $ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq -r '.spec.template.spec | keys[]'
      affinity
      containers
      dnsPolicy
      nodeSelector
      priorityClassName
      restartPolicy
      schedulerName
      securityContext
      serviceAccount
      serviceAccountName
      terminationGracePeriodSeconds
      tolerations
      volumes
      

      The Deployment helpfully points at its owning resource:

      $ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq .metadata.ownerReferences
      [
        {
          "apiVersion": "operators.coreos.com/v1alpha1",
          "blockOwnerDeletion": false,
          "controller": false,
          "kind": "ClusterServiceVersion",
          "name": "packageserver",
          "uid": "dcf114e1-9f65-408d-8bc3-2b1638c9f151"
        }
      ]
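
Note that the reference above has `"controller": false`, so the CSV is recorded as an owner but not as the managing controller. A small Python sketch of reading that distinction out of the metadata follows; the helper name `describe_owners` is hypothetical, and the sample data is copied from the output above.

```python
from typing import Any, Dict, List


def describe_owners(metadata: Dict[str, Any]) -> List[str]:
    """Summarize ownerReferences as "Kind/name (role)" strings."""
    owners = []
    for ref in metadata.get("ownerReferences", []):
        # Only a reference with controller=true marks the managing controller.
        role = "controller" if ref.get("controller") else "owner"
        owners.append(f'{ref["kind"]}/{ref["name"]} ({role})')
    return owners


# ownerReferences as shown in the Deployment's metadata.
deployment_metadata = {
    "ownerReferences": [
        {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "blockOwnerDeletion": False,
            "controller": False,
            "kind": "ClusterServiceVersion",
            "name": "packageserver",
            "uid": "dcf114e1-9f65-408d-8bc3-2b1638c9f151",
        }
    ]
}
print(describe_owners(deployment_metadata))
```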
      

      And the CSV is also missing a pod securityContext declaration:

      $ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json clusterserviceversion packageserver | jq -r '.spec.install.spec.deployments[].spec.template.spec | keys[]'
      affinity
      containers
      nodeSelector
      priorityClassName
      serviceAccountName
      tolerations
      volumes
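
Comparing the two key listings makes the gap easier to see. The Python sketch below takes the pod spec keys from the live Deployment and from the CSV, both copied from the outputs above, and prints the keys that exist only on the live object. The helper name `only_in_live` is hypothetical. The reading that these extra keys, securityContext included, come from API server defaulting rather than from the CSV is an assumption, not something the outputs prove.

```python
from typing import Iterable, List


def only_in_live(live: Iterable[str], declared: Iterable[str]) -> List[str]:
    """Return keys present on the live object but absent from the declared one."""
    return sorted(set(live) - set(declared))


# Pod spec keys of the live Deployment, copied from the jq output.
deployment_keys = [
    "affinity", "containers", "dnsPolicy", "nodeSelector",
    "priorityClassName", "restartPolicy", "schedulerName",
    "securityContext", "serviceAccount", "serviceAccountName",
    "terminationGracePeriodSeconds", "tolerations", "volumes",
]

# Pod spec keys of the CSV's deployment template, copied from the jq output.
csv_keys = [
    "affinity", "containers", "nodeSelector", "priorityClassName",
    "serviceAccountName", "tolerations", "volumes",
]

for key in only_in_live(deployment_keys, csv_keys):
    print(key)
```

Under that assumption, the securityContext key on the Deployment would be an empty default, which is consistent with the PodSecurity rejection.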
      

      The CSV sadly does not declare ownerReferences or managedFields:

      $ oc --as system:admin -n openshift-operator-lifecycle-manager get --show-managed-fields -o json clusterserviceversion packageserver | jq -r '.metadata | keys[]'
      annotations
      creationTimestamp
      generation
      labels
      name
      namespace
      resourceVersion
      uid
      

      But it seems to be based on the manifest which grew a securityContext entry here. It is still unclear to me why the 4.12.0-ec.2 OLM operator is failing to update the in-cluster CSV to include that property; the reconciliation logic appears to be here, and I don't see anything obviously amiss.

            People

              anik120 Anik Bhattacharjee
              rhn-support-jiazha Jian Zhang
              Jian Zhang Jian Zhang