Issue Type: Bug
Resolution: Done-Errata
Priority: Normal
Target Version: 4.16
This is a clone of issue OCPBUGS-35731. The following is the description of the original issue:
—
Description of problem:
A ServiceAccount is not deleted due to a race condition in the controller manager. When deleting the SA, this is logged in the controller manager:
2024-06-17T15:57:47.793991942Z I0617 15:57:47.793942 1 image_pull_secret_controller.go:233] "Internal registry pull secret auth data does not contain the correct number of entries" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" expected=3 actual=0
2024-06-17T15:57:47.794120755Z I0617 15:57:47.794080 1 image_pull_secret_controller.go:163] "Refreshing image pull secret" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" serviceaccount="sink-eguqqiwm"
As a result, the Secret is updated, and the ServiceAccount that owns the Secret is updated by the controller via a server-side apply operation, as can be seen in the managedFields:
{ "apiVersion":"v1", "imagePullSecrets":[ { "name":"default-dockercfg-vdck9" }, { "name":"kn-test-image-pull-secret" }, { "name":"sink-eguqqiwm-dockercfg-vh8mw" } ], "kind":"ServiceAccount", "metadata":{ "annotations":{ "openshift.io/internal-registry-pull-secret-ref":"sink-eguqqiwm-dockercfg-vh8mw" }, "creationTimestamp":"2024-06-17T15:57:47Z", "managedFields":[ { "apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":{ "f:imagePullSecrets":{ }, "f:metadata":{ "f:annotations":{ "f:openshift.io/internal-registry-pull-secret-ref":{ } } }, "f:secrets":{ "k:{\"name\":\"sink-eguqqiwm-dockercfg-vh8mw\"}":{ } } }, "manager":"openshift.io/image-registry-pull-secrets_service-account-controller", "operation":"Apply", "time":"2024-06-17T15:57:47Z" } ], "name":"sink-eguqqiwm", "namespace":"test-qtreoisu", "resourceVersion":"104739", "uid":"eaae8d0e-8714-4c2e-9d20-c0c1a221eecc" }, "secrets":[ { "name":"sink-eguqqiwm-dockercfg-vh8mw" } ] }"Events":{ "metadata":{ }, "items":null }
The ServiceAccount then hangs there and is NOT deleted.
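A quick way to check the state of such a stuck ServiceAccount is to look at its deletionTimestamp, finalizers, creationTimestamp, and managedFields. The following is a minimal diagnostic sketch (not part of the original report); it reuses the namespace and ServiceAccount name from the example above and assumes jq is available:

# Show whether a deletion is pending (deletionTimestamp/finalizers), whether the
# object was re-created after the delete (creationTimestamp), and which field
# manager applied it last (managedFields).
kubectl get sa sink-eguqqiwm -n test-qtreoisu -o json \
  | jq '{creationTimestamp: .metadata.creationTimestamp, deletionTimestamp: .metadata.deletionTimestamp, finalizers: .metadata.finalizers, managedFields: .metadata.managedFields}'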
We have seen this only on OCP 4.16 (not on older versions), but already several times, for example in this CI run, which also has must-gather logs that can be investigated.
Another run is here.
The controller code is new in 4.16, so this appears to be a regression.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-06-14-130320
How reproducible:
It happens intermittently in our CI runs: we delete a ServiceAccount and it keeps hanging there. The test tries to delete it only once and does not retry.
Steps to Reproduce:
The following reproducer works for me. Some service accounts keep hanging there after running the script:
#!/usr/bin/env bash

kubectl create namespace test

for i in `seq 100`; do
  (
    kubectl create sa "my-sa-${i}" -n test
    # Wait until the controller has set the internal-registry pull secret annotation
    kubectl wait --for=jsonpath="{.metadata.annotations.openshift\\.io/internal-registry-pull-secret-ref}" sa/my-sa-${i} -n test
    kubectl delete sa/my-sa-${i} -n test
    kubectl wait --for=delete sa/my-sa-${i} -n test --timeout=60s
  ) &
done
wait
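After the script finishes, the leftover ServiceAccounts can be listed to confirm the problem. This is a minimal check I would add, assuming the same test namespace and naming scheme as in the reproducer:

# Any ServiceAccount from the reproducer that survived its delete + wait hit the
# race; an empty list means this run did not reproduce the issue.
kubectl get sa -n test -o name | grep '^serviceaccount/my-sa-' || echo "no leftover ServiceAccounts"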
Actual results:
ServiceAccount not deleted
Expected results:
ServiceAccount deleted
Additional info:
- clones: OCPBUGS-35731 Race condition when deleting ServiceAccount (Closed)
- is blocked by: OCPBUGS-35731 Race condition when deleting ServiceAccount (Closed)
- links to: RHBA-2024:6004 OpenShift Container Platform 4.16.z bug fix update