OpenShift Bugs / OCPBUGS-35731

Race condition when deleting ServiceAccount


    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • 4.16
    • Image Registry
    • Yes
    • False

      Description of problem:

      A ServiceAccount is not deleted due to a race condition in the controller manager. When deleting the SA, this is logged in the controller manager:

      2024-06-17T15:57:47.793991942Z I0617 15:57:47.793942       1 image_pull_secret_controller.go:233] "Internal registry pull secret auth data does not contain the correct number of entries" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" expected=3 actual=0
      2024-06-17T15:57:47.794120755Z I0617 15:57:47.794080       1 image_pull_secret_controller.go:163] "Refreshing image pull secret" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" serviceaccount="sink-eguqqiwm"

      As a result, the Secret is updated, and the controller also updates the ServiceAccount that owns the Secret via a server-side apply operation, as can be seen in the managedFields:

      {
         "apiVersion":"v1",
         "imagePullSecrets":[
            {
               "name":"default-dockercfg-vdck9"
            },
            {
               "name":"kn-test-image-pull-secret"
            },
            {
               "name":"sink-eguqqiwm-dockercfg-vh8mw"
            }
         ],
         "kind":"ServiceAccount",
         "metadata":{
            "annotations":{
               "openshift.io/internal-registry-pull-secret-ref":"sink-eguqqiwm-dockercfg-vh8mw"
            },
            "creationTimestamp":"2024-06-17T15:57:47Z",
            "managedFields":[
               {
                  "apiVersion":"v1",
                  "fieldsType":"FieldsV1",
                  "fieldsV1":{
                     "f:imagePullSecrets":{
                        
                     },
                     "f:metadata":{
                        "f:annotations":{
                           "f:openshift.io/internal-registry-pull-secret-ref":{
                              
                           }
                        }
                     },
                     "f:secrets":{
                        "k:{\"name\":\"sink-eguqqiwm-dockercfg-vh8mw\"}":{
                           
                        }
                     }
                  },
                  "manager":"openshift.io/image-registry-pull-secrets_service-account-controller",
                  "operation":"Apply",
                  "time":"2024-06-17T15:57:47Z"
               }
            ],
            "name":"sink-eguqqiwm",
            "namespace":"test-qtreoisu",
            "resourceVersion":"104739",
            "uid":"eaae8d0e-8714-4c2e-9d20-c0c1a221eecc"
         },
         "secrets":[
            {
               "name":"sink-eguqqiwm-dockercfg-vh8mw"
            }
         ]
      }
      "Events":{
         "metadata":{
            
         },
         "items":null
      } 

      The ServiceAccount then hangs there and is NOT deleted.
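
      To check whether a stuck ServiceAccount went through the same refresh path, the controller manager log can be filtered for the two messages quoted above. The following is a minimal sketch, not part of the original report: the function name pull_secret_events is hypothetical, and it assumes the pull secret name starts with the ServiceAccount name, as it does in the log excerpt. It reads a saved log (for example from a must-gather) on stdin.

```shell
#!/usr/bin/env bash

# Print log lines emitted by image_pull_secret_controller.go that mention
# a quoted name starting with the given ServiceAccount name.
# Assumption: the pull secret is named "<sa-name>-dockercfg-<suffix>",
# as in the log excerpt, so a prefix match also catches the secret.
# Note: this also matches other ServiceAccounts sharing the same prefix.
# Usage: pull_secret_events <sa-name> < controller-manager.log
pull_secret_events() {
	local sa="$1"
	grep -F 'image_pull_secret_controller.go' | grep -F -- "\"${sa}"
}
```

      A "Refreshing image pull secret" line with a timestamp at or after the delete request would be consistent with the race described here.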

      We have seen this only on OCP 4.16 (not on older versions), but already several times, for example in this CI run, which also has must-gather logs that can be investigated.

      Another run is here

      The controller code is new in 4.16 and it seems to be a regression.

      Version-Release number of selected component (if applicable):

      4.16.0-0.nightly-2024-06-14-130320

      How reproducible:

      It happens intermittently in our CI runs: the test deletes a ServiceAccount, but the ServiceAccount remains. The test attempts the deletion only once and does not retry.

      Steps to Reproduce:

      The following reproducer works for me. Some service accounts keep hanging there after running the script:

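      #!/usr/bin/env bash

      # Create the namespace used by all ServiceAccounts below.
      kubectl create namespace test

      # Create, wait for, and delete 100 ServiceAccounts in parallel.
      for i in $(seq 100); do
      	(
      		kubectl create sa "my-sa-${i}" -n test
      		# Wait until the controller has set the pull secret annotation.
      		kubectl wait --for=jsonpath="{.metadata.annotations.openshift\\.io/internal-registry-pull-secret-ref}" "sa/my-sa-${i}" -n test
      		kubectl delete "sa/my-sa-${i}" -n test
      		# Times out if the ServiceAccount is still present after 60s.
      		kubectl wait --for=delete "sa/my-sa-${i}" -n test --timeout=60s
      	)&
      done

      # Wait for all background subshells to finish.
      wait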
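
      After the reproducer finishes, the ServiceAccounts that were not deleted can be listed with a small filter. This is a sketch only: the function name leftover_sas is hypothetical, and it assumes the my-sa-<n> naming and the test namespace used by the script above.

```shell
#!/usr/bin/env bash

# Read `kubectl get sa -o name` output on stdin and print only the
# reproducer's ServiceAccounts (my-sa-<n>) that still exist.
# `|| true` keeps the exit status zero when nothing is left over.
leftover_sas() {
	grep -E '^serviceaccount/my-sa-[0-9]+$' || true
}

# Example (requires cluster access):
#   kubectl get sa -n test -o name | leftover_sas
```

      An empty result means every ServiceAccount was deleted; any printed name is a ServiceAccount that is still present.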
      

      Actual results:

      ServiceAccount not deleted

      Expected results:

      ServiceAccount deleted

      Additional info:

       

            lusanche@redhat.com Luis Sanchez
            mgencur@redhat.com Martin Gencur
            Wen Wang Wen Wang