  1. OpenShift Bugs
  2. OCPBUGS-15332

Cluster Resource Override Operator should not override resources while removing finalizers


    • Moderate
    • No
    • 3
    • WINC - Sprint 241, WINC - Sprint 242
    • 2
    • False

      Description of problem:

      If a pod was created before the cluster resource override operator was enabled in its namespace and the operator is enabled later, the operator may try to rewrite the resources in the pod spec whenever the pod object is patched, even if the pod is already running and such a spec change is forbidden. The kube-apiserver rejects the resulting request during validation, which runs after the mutating webhooks have modified the request, and the rejection invalidates the whole original patch. This is a serious problem in some scenarios.
      
      The most problematic case is a pod that has a finalizer set when something (or somebody) tries to remove it. The cluster resource override operator intercepts the patch that removes the finalizer from the metadata and adds a spec modification to it, so validation fails. The pod finalizers can then never be removed, and the pod stays stuck in the API forever.
      
      This can become even worse if the pod deletion is part of a drain during a cluster upgrade, because it blocks the upgrade.
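
      The sequence described above can be illustrated with a toy model. This is only a sketch: it is not kube-apiserver or operator code, and the function names, the forced CPU value and the pod contents are all made up for illustration. It shows a metadata-only update being rejected because a mutating step added a spec change before validation ran.

```python
import copy

# Toy model of the admission flow described in this report. None of this is
# real kube-apiserver or operator code; names and values are illustrative.

def override_webhook(pod):
    """Stand-in for the override webhook: rewrites resources on every request."""
    mutated = copy.deepcopy(pod)
    for container in mutated["spec"]["containers"]:
        # Hypothetical override: force the CPU request to a configured value.
        container["resources"]["requests"]["cpu"] = "250m"
    return mutated

def validate_update(old_pod, new_pod):
    """Stand-in for update validation: resources of an existing pod are immutable."""
    old_res = [c["resources"] for c in old_pod["spec"]["containers"]]
    new_res = [c["resources"] for c in new_pod["spec"]["containers"]]
    if old_res != new_res:
        raise ValueError("spec: Forbidden")

def apply_update(stored_pod, requested_pod):
    """Mutating admission runs first, validation second."""
    mutated = override_webhook(requested_pod)
    validate_update(stored_pod, mutated)
    return mutated

stored = {
    "metadata": {"name": "example", "finalizers": ["batch.kubernetes.io/job-tracking"]},
    "spec": {"containers": [{"name": "main", "resources": {"requests": {"cpu": "1"}}}]},
}

# The client only removes the finalizer; it does not touch the spec.
requested = copy.deepcopy(stored)
requested["metadata"]["finalizers"] = []

try:
    apply_update(stored, requested)
    outcome = "accepted"
except ValueError:
    # The whole update is lost, so the finalizer stays on the stored pod.
    outcome = "rejected"

print(outcome)
```

      In this model the stored pod keeps its finalizer after the failed update, which matches the stuck-pod behaviour reported here.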
      

      Version-Release number of selected component (if applicable):

      Tested on 4.12.0-202305262042
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Start with the cluster resource override operator installed and configured, and a namespace WITHOUT the clusterresourceoverrides.admission.autoscaling.openshift.io/enabled=true label.
      2. Create a job that takes several minutes to complete (e.g. a sleep), has a pod template spec that violates the cluster resource override operator configuration, and has the batch.kubernetes.io/job-tracking annotation (this forces pods to be created with a job tracking finalizer, more details here: https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-tracking-with-finalizers).
      3. While the job is running, add the label clusterresourceoverrides.admission.autoscaling.openshift.io/enabled=true to the namespace, so that the operator now tracks this namespace.
      4. A few minutes later, delete the job.
      5. When the pod is stuck deleting, try to remove the finalizer manually with `oc -n ${NAMESPACE} patch pod/${POD} --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'` or look at the kube-controller-manager pod logs.
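
      As a convenience for the steps above, the following sketch generates a Job manifest and the JSON patch from step 5 as data. The Job name, image, sleep duration and resource figures are assumptions, not values from this report; choose resources that your own override configuration would rewrite. The annotation is included because step 2 calls for it, although depending on the cluster version the control plane may set it by itself.

```python
import json

# Hypothetical values: the Job name, image, sleep duration and resource
# figures are not from the report.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "sleep-job",
        # Annotation named in step 2 of the reproduction steps.
        "annotations": {"batch.kubernetes.io/job-tracking": ""},
    },
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "sleep",
                    "image": "registry.access.redhat.com/ubi9/ubi-minimal",
                    "command": ["sleep", "600"],
                    "resources": {
                        "requests": {"cpu": "100m", "memory": "64Mi"},
                        "limits": {"cpu": "1", "memory": "512Mi"},
                    },
                }],
            },
        },
    },
}

# JSON patch from step 5, built as data instead of a hand-quoted string.
remove_finalizers_patch = [{"op": "remove", "path": "/metadata/finalizers"}]

manifest_json = json.dumps(job, indent=2)
patch_json = json.dumps(remove_finalizers_patch)

print(manifest_json)
print(patch_json)
```

      Saved to a file, the manifest could be passed to `oc -n ${NAMESPACE} create -f`, and the patch string could be used as the `--patch` value in step 5.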
      
      

      Actual results:

      
      The pod is stuck terminating forever, and removing the finalizer fails with an error like the following (whether you try it manually or check the kube-controller-manager logs):
      
      The Pod "whatever-12345678-xxxxx" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations` (only additions to existing tolerations) or `spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
      
      This error occurs even though the user (or the kube-controller-manager, which tries to remove the finalizer because it has already tracked the job status) did not try to patch the spec.
      
      Running the patch at a higher log level shows the full diff: a spec update is attempted, and it tries to reconcile the pod spec to what the cluster resource override operator would have produced if the pod were being created at that moment.
      
      

      Expected results:

      
      The cluster resource override operator should not touch the spec of a pod in situations where that is forbidden, especially while a finalizer is being removed.
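
      One way to picture the expected behaviour is a guard that only mutates on pod creation. This is a hypothetical sketch, not the operator's actual code or its eventual fix; `should_override` is an invented name, and the input is assumed to be the `request` object of an AdmissionReview decoded into a dict.

```python
def should_override(admission_request):
    """Return True only for requests where the pod spec is still mutable.

    `admission_request` is assumed to be the `request` object of an
    AdmissionReview, decoded into a dict.
    """
    # Only pod creation may have its resources rewritten. UPDATE requests,
    # such as a finalizer removal, pass through untouched.
    return admission_request.get("operation") == "CREATE"


create_request = {"operation": "CREATE"}
finalizer_removal = {"operation": "UPDATE"}

print(should_override(create_request))
print(should_override(finalizer_removal))
```

      A similar effect might be achievable by limiting the webhook registration to CREATE operations, but whether the operator does either of these is not stated in this report.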
      
      

      Additional info:

      
      Temporarily removing the clusterresourceoverrides.admission.autoscaling.openshift.io/enabled=true label works as a workaround, because it stops the cluster resource override operator from touching that namespace.
      
      

            jkyros@redhat.com John Kyros
            rhn-support-palonsor Pablo Alonso Rodriguez
            Weinan Liu Weinan Liu