Description of problem: OCPBUGS-575 uncovered an issue where the cluster-version operator was failing to reconcile seccompProfile for some OLM Deployments that were declared as CVO manifests. I manually recovered from that issue by deleting the three affected Deployments:
$ oc --as system:admin -n openshift-operator-lifecycle-manager delete deployments catalog-operator olm-operator package-server-manager
The cluster is now stuck a bit later on:
$ oc --as system:admin adm upgrade
info: An upgrade is in progress. Unable to apply 4.12.0-ec.2: the cluster operator operator-lifecycle-manager-packageserver is not available
$ oc --as system:admin get -o json clusteroperator operator-lifecycle-manager-packageserver | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2020-05-21T19:36:51Z Degraded=False : 
2022-08-31T16:53:50Z Available=False ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install failed: deployment packageserver not ready before timeout: deployment "packageserver" exceeded its progress deadline
2022-08-31T17:14:24Z Progressing=False : Failed to deploy 0.17.0
2020-05-21T19:36:51Z Upgradeable=True : Safe to upgrade
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-08-31T17:04:24Z Available=False MinimumReplicasUnavailable: Deployment does not have minimum availability.
2022-08-31T17:04:24Z ReplicaFailure=True FailedCreate: pods "packageserver-647975568d-q46k2" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "packageserver" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "packageserver" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "packageserver" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "packageserver" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2022-08-31T17:14:24Z Progressing=False ProgressDeadlineExceeded: ReplicaSet "packageserver-647975568d" has timed out progressing.
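The four checks named in the FailedCreate message can be reproduced offline against a pod spec before it ever reaches the cluster. The following is a minimal sketch, assuming jq is installed; the function name restricted_violations is hypothetical, and it covers only the four checks quoted above, not the full PodSecurity "restricted" profile (which also inherits the baseline checks on host namespaces, privileged containers, volume types, and so on). Following the "pod or container" wording of the message, a container-level runAsNonRoot or seccompProfile overrides the pod-level one.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or OLM): read a pod spec, e.g. a
# Deployment's .spec.template.spec, on stdin and print one line per failure
# of the four PodSecurity "restricted" checks quoted in the FailedCreate
# message. Requires jq.
#   - allowPrivilegeEscalation and capabilities.drop are container-level only.
#   - runAsNonRoot and seccompProfile may be set on the pod or the container;
#     a container-level value takes precedence over the pod-level one.
restricted_violations() {
  jq -r '
    (.securityContext // {}) as $pod
    | (.containers + (.initContainers // []))[]
    | .name as $c
    | (.securityContext // {}) as $sc
    | (if $sc.allowPrivilegeEscalation != false
        then "\($c): allowPrivilegeEscalation != false" else empty end),
      (if (($sc.capabilities.drop // []) | any(.[]; . == "ALL"))
        then empty else "\($c): capabilities.drop does not include ALL" end),
      (if (if ($sc | has("runAsNonRoot")) then $sc.runAsNonRoot
           else $pod.runAsNonRoot end) == true
        then empty else "\($c): runAsNonRoot != true" end),
      (if ((if ($sc | has("seccompProfile")) then $sc.seccompProfile.type
            else $pod.seccompProfile.type end)
           | . == "RuntimeDefault" or . == "Localhost")
        then empty else "\($c): seccompProfile.type is not RuntimeDefault or Localhost" end)
  '
}

# Example usage against the live Deployment (requires cluster access):
#   oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver \
#     | jq '.spec.template.spec' | restricted_violations
```

An empty output means the spec passes these four checks; for the packageserver pod spec described here, all four lines would be expected for the packageserver container.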
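The Deployment is indeed missing the required pod securityContext settings (a securityContext key is present in the listing below, but, per the FailedCreate message above, it sets neither runAsNonRoot nor seccompProfile):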
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq -r '.spec.template.spec | keys[]'
affinity
containers
dnsPolicy
nodeSelector
priorityClassName
restartPolicy
schedulerName
securityContext
serviceAccount
serviceAccountName
terminationGracePeriodSeconds
tolerations
volumes
The Deployment helpfully points at its owning resource, the packageserver ClusterServiceVersion:
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | jq .metadata.ownerReferences
[
  {
    "apiVersion": "operators.coreos.com/v1alpha1",
    "blockOwnerDeletion": false,
    "controller": false,
    "kind": "ClusterServiceVersion",
    "name": "packageserver",
    "uid": "dcf114e1-9f65-408d-8bc3-2b1638c9f151"
  }
]
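When walking ownership chains like this one, a one-line summary per owner is often easier to read than the full ownerReferences array. The sketch below assumes jq is installed; owner_refs is a hypothetical helper name. It also surfaces the controller flag, which is false in the reference above, so Kubernetes records the CSV as an owner of the Deployment but not as its managing controller.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): read any Kubernetes object as JSON on
# stdin and print each of its ownerReferences as "Kind/name controller=<bool>".
# Objects without ownerReferences produce no output. Requires jq.
owner_refs() {
  jq -r '.metadata.ownerReferences[]? | "\(.kind)/\(.name) controller=\(.controller // false)"'
}

# Example usage (requires cluster access):
#   oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver | owner_refs
```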
And the CSV is also missing a pod securityContext declaration:
$ oc --as system:admin -n openshift-operator-lifecycle-manager get -o json clusterserviceversion packageserver | jq -r '.spec.install.spec.deployments[].spec.template.spec | keys[]'
affinity
containers
nodeSelector
priorityClassName
serviceAccountName
tolerations
volumes
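Comparing the two key listings is easier with a set difference than by eye. The sketch below assumes jq is installed and that each pod spec has first been saved to a file; pod_spec_key_diff and the file names are hypothetical. For the two listings above it would print dnsPolicy, restartPolicy, schedulerName, securityContext, serviceAccount, and terminationGracePeriodSeconds. These all look like fields the API server fills in with defaults on the Deployment, so the presence of a securityContext key there likely says little; printing its value is more informative.

```shell
#!/bin/sh
# Hypothetical helper: print the pod-spec keys present in the JSON document in
# file $1 but absent from the JSON document in file $2, one per line, sorted.
# Requires jq (--slurpfile wraps each file's contents in an array).
pod_spec_key_diff() {
  jq -n -r --slurpfile a "$1" --slurpfile b "$2" '($a[0] | keys) - ($b[0] | keys) | .[]'
}

# Example usage (requires cluster access; file names are arbitrary):
#   oc --as system:admin -n openshift-operator-lifecycle-manager get -o json deployment packageserver \
#     | jq '.spec.template.spec' > deploy-podspec.json
#   oc --as system:admin -n openshift-operator-lifecycle-manager get -o json clusterserviceversion packageserver \
#     | jq '.spec.install.spec.deployments[0].spec.template.spec' > csv-podspec.json
#   pod_spec_key_diff deploy-podspec.json csv-podspec.json
#   jq '.securityContext' deploy-podspec.json
```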
The CSV sadly does not declare ownerReferences or managedFields:
$ oc --as system:admin -n openshift-operator-lifecycle-manager get --show-managed-fields -o json clusterserviceversion packageserver | jq -r '.metadata | keys[]'
annotations
creationTimestamp
generation
labels
name
namespace
resourceVersion
uid
It does, however, appear to be based on the manifest, which grew a securityContext entry here. What is still unclear to me is why the 4.12.0-ec.2 OLM operator fails to update the in-cluster CSV to include that property: the reconciliation logic appears to be here, and I don't see anything obviously amiss in it.
- clones: OCPBUGS-575 The lacking securityContext.seccompProfile.type of OLM deployments is blocking OCP upgrade to 4.12 (Closed)
- is caused by: OCPBUGS-858 package-server-manager does not migrate packageserver CSV from v0.17.0 to v0.18.3 on OCP 4.8 -> 4.9 upgrade (Closed)
- is related to: OCPBUGS-862 failure to upgrade OLM packageserver does not block OCP upgrade (Closed)