Description of problem:
OCPBUGS-575 uncovered an issue where the cluster-version operator was failing to reconcile seccompProfile for some OLM deployments which were declared as CVO manifests. I manually recovered from that issue by deleting the three deployments:
$ oc --as system:admin -n openshift-operator-lifecycle-manager delete deployments catalog-operator olm-operator package-server-manager
The cluster is now stuck a bit later on:
$ oc --as system:admin adm upgrade
info: An upgrade is in progress. Unable to apply 4.12.0-ec.2: the cluster operator operator-lifecycle-manager-packageserver is not available
$ oc --as system:admin get -o json clusteroperator operator-lifecycle-manager-packageserver | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2020-05-21T19:36:51Z Degraded=False :
2022-08-31T16:53:50Z Available=False ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install failed: deployment packageserver not ready before timeout: deployment "packageserver" exceeded its progress deadline
2022-08-31T17:14:24Z Progressing=False : Failed to deploy 0.17.0
2020-05-21T19:36:51Z Upgradeable=True : Safe to upgrade
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-08-31T17:04:24Z Available=False MinimumReplicasUnavailable: Deployment does not have minimum availability.
2022-08-31T17:04:24Z ReplicaFailure=True FailedCreate: pods "packageserver-647975568d-q46k2" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "packageserver" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "packageserver" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "packageserver" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "packageserver" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2022-08-31T17:14:24Z Progressing=False ProgressDeadlineExceeded: ReplicaSet "packageserver-647975568d" has timed out progressing.
The Deployment is indeed missing a pod securityContext declaration:
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq -r '.spec.template.spec | keys'
The Deployment helpfully points at its controlling resource:
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq .metadata.ownerReferences
And the CSV is also missing a pod securityContext declaration:
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json clusterserviceversion packageserver | jq -r '.spec.install.spec.deployments[].spec.template.spec | keys'
The CSV sadly does not declare ownerReferences or managedFields:
$ oc --as system:admin -n openshift-operator-lifecycle-manager get --show-managed-fields -o json clusterserviceversion packageserver | jq -r '.metadata | keys'
But it seems to be based on the manifest which grew a securityContext entry here. It is still unclear to me why the 4.12.0-ec.2 OLM operator is failing to update the in-cluster CSV to include that property. The reconciliation logic appears to be here, and I don't see anything obviously amiss in it.
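For reference, the four violations in the ReplicaFailure message above each map onto a specific securityContext field. The following is a minimal sketch of that mapping, not the actual PodSecurity admission code: the function name restricted_violations is hypothetical, it only covers the four checks named in that message, and it only looks at spec.containers (the real "restricted" profile checks more than this).

```python
ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def restricted_violations(pod_spec):
    """Return the violations from the error message above that apply to pod_spec.

    Simplified on purpose: only the four checks named in that message, and
    only spec.containers (no initContainers or ephemeralContainers).
    """
    pod_sc = pod_spec.get("securityContext") or {}
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        # These two can only be set at the container level.
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation != false")
        drop = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in drop:
            violations.append(f"{name}: unrestricted capabilities")

        # These two may be set on the pod or the container; a container-level
        # value takes precedence over the pod-level one when present.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        if run_as_non_root is not True:
            violations.append(f"{name}: runAsNonRoot != true")
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ALLOWED_SECCOMP:
            violations.append(f"{name}: seccompProfile")
    return violations
```

To use it against the cluster, the JSON from the `get -o json deployment packageserver` command above can be loaded with json.load and its ["spec"]["template"]["spec"] passed in. A pod spec with no securityContext at all, as seen here, should report all four violations for the packageserver container.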