Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.16.z
Affects Version/s: 4.16
Component/s: Image Registry
Labels:
- groomed

Regression: Yes
Blocked: False
Blocked Reason: None
Release Note Type: Release Note Not Required
Release Note Status: In Progress
Target Version: 4.16.z
Target Backport Versions: 4.16.z
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue ~~OCPBUGS-35731~~. The following is the description of the original issue:
—
Description of problem:

A ServiceAccount is not deleted due to a race condition in the controller manager. When deleting the SA, this is logged in the controller manager:

2024-06-17T15:57:47.793991942Z I0617 15:57:47.793942       1 image_pull_secret_controller.go:233] "Internal registry pull secret auth data does not contain the correct number of entries" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" expected=3 actual=0
2024-06-17T15:57:47.794120755Z I0617 15:57:47.794080       1 image_pull_secret_controller.go:163] "Refreshing image pull secret" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" serviceaccount="sink-eguqqiwm"

As a result, the Secret is updated, and the controller also updates the ServiceAccount that owns the Secret via a server-side apply operation, as can be seen in the managedFields:

{
   "apiVersion":"v1",
   "imagePullSecrets":[
      {
         "name":"default-dockercfg-vdck9"
      },
      {
         "name":"kn-test-image-pull-secret"
      },
      {
         "name":"sink-eguqqiwm-dockercfg-vh8mw"
      }
   ],
   "kind":"ServiceAccount",
   "metadata":{
      "annotations":{
         "openshift.io/internal-registry-pull-secret-ref":"sink-eguqqiwm-dockercfg-vh8mw"
      },
      "creationTimestamp":"2024-06-17T15:57:47Z",
      "managedFields":[
         {
            "apiVersion":"v1",
            "fieldsType":"FieldsV1",
            "fieldsV1":{
               "f:imagePullSecrets":{
                  
               },
               "f:metadata":{
                  "f:annotations":{
                     "f:openshift.io/internal-registry-pull-secret-ref":{
                        
                     }
                  }
               },
               "f:secrets":{
                  "k:{\"name\":\"sink-eguqqiwm-dockercfg-vh8mw\"}":{
                     
                  }
               }
            },
            "manager":"openshift.io/image-registry-pull-secrets_service-account-controller",
            "operation":"Apply",
            "time":"2024-06-17T15:57:47Z"
         }
      ],
      "name":"sink-eguqqiwm",
      "namespace":"test-qtreoisu",
      "resourceVersion":"104739",
      "uid":"eaae8d0e-8714-4c2e-9d20-c0c1a221eecc"
   },
   "secrets":[
      {
         "name":"sink-eguqqiwm-dockercfg-vh8mw"
      }
   ]
}

"Events":{
   "metadata":{
      
   },
   "items":null
}
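To check which field managers wrote to a ServiceAccount, something like the following could be used. This is only a sketch and not part of the original report: the helper names `list_field_managers` and `controller_apply_entries` are hypothetical, and the manager name is copied from the managedFields dump above.

```shell
#!/usr/bin/env bash

# Manager name copied from the managedFields dump in this report.
CONTROLLER_MANAGER='openshift.io/image-registry-pull-secrets_service-account-controller'

# Hypothetical helper: print "manager<TAB>operation<TAB>time" for each
# managedFields entry of the given ServiceAccount.
list_field_managers() {
	local ns="$1" name="$2"
	kubectl get sa "$name" -n "$ns" --show-managed-fields \
		-o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.operation}{"\t"}{.time}{"\n"}{end}'
}

# Hypothetical helper: read the tab-separated lines above on stdin and keep
# only the Apply entries written by the pull-secret controller.
controller_apply_entries() {
	awk -F'\t' -v m="$CONTROLLER_MANAGER" '$1 == m && $2 == "Apply"'
}
```

For the ServiceAccount shown above, this would be run as `list_field_managers test-qtreoisu sink-eguqqiwm | controller_apply_entries`.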

The ServiceAccount then hangs there and is NOT deleted.

We have seen this only on OCP 4.16 (not on older versions), but already several times, for example in this CI run, which also has must-gather logs that can be investigated.

Another run is here

The controller code is new in 4.16 and it seems to be a regression.

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-06-14-130320

How reproducible:

It happens intermittently in our CI runs: the test deletes a ServiceAccount, but the ServiceAccount keeps hanging there. The test tries the deletion only once and does not retry.

Steps to Reproduce:

The following reproducer works for me. Some ServiceAccounts keep hanging there after running the script:

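#!/usr/bin/env bash

# Create the namespace used by the reproducer.
kubectl create namespace test

# Create and delete 100 ServiceAccounts in parallel to trigger the race.
for i in $(seq 100); do
	(
		kubectl create sa "my-sa-${i}" -n test
		# Wait until the controller has annotated the ServiceAccount with its pull secret.
		kubectl wait --for=jsonpath="{.metadata.annotations.openshift\\.io/internal-registry-pull-secret-ref}" "sa/my-sa-${i}" -n test
		kubectl delete "sa/my-sa-${i}" -n test
		kubectl wait --for=delete "sa/my-sa-${i}" -n test --timeout=60s
	)&
done

# Wait for all background subshells to finish.
wait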
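After the reproducer has finished, the ServiceAccounts that are still present can be listed with something like the sketch below. It is not part of the original report: the helper names `leftover_reproducer_sas` and `list_leftovers` are hypothetical, and it assumes the `my-sa-<n>` naming and the `test` namespace used by the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: from `kubectl get sa -o name` output on stdin, keep
# only the ServiceAccounts created by the reproducer (my-sa-<number>).
# `|| true` keeps the exit status zero when nothing is left over.
leftover_reproducer_sas() {
	grep -E '^serviceaccount/my-sa-[0-9]+$' || true
}

# Hypothetical helper: list reproducer ServiceAccounts that still exist in
# the given namespace (defaults to "test", as in the reproducer).
list_leftovers() {
	local ns="${1:-test}"
	kubectl get sa -n "$ns" -o name | leftover_reproducer_sas
}
```

Running `list_leftovers test` prints one line per ServiceAccount that was not deleted; empty output means none were left behind.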

Actual results:

ServiceAccount not deleted

Expected results:

ServiceAccount deleted

Additional info:

clones

OCPBUGS-35731 Race condition when deleting ServiceAccount

Closed

is blocked by

OCPBUGS-35731 Race condition when deleting ServiceAccount

Closed

links to

openshift/openshift-controller-manager#324: [release-4.16] OCPBUGS-37526: Race condition when deleting ServiceAccount

RHBA-2024:6004 OpenShift Container Platform 4.16.z bug fix update

Assignee: Luis Sanchez
Reporter: OpenShift Prow Bot
QA Contact: Wen Wang
Votes: 0
Watchers: 6
Created: 2024/07/24 4:05 PM
Updated: 2024/09/03 7:14 PM
Resolved: 2024/09/03 7:14 PM
