- Bug
- Resolution: Done
- Undefined
- 4.13.0
- None
- None
- False
This is a clone of issue OCPBUGS-3195. The following is the description of the original issue:
—
Description of problem:
The service-ca controller's start func seems to return an error ("stopped") as soon as its context is cancelled, which appears to happen the moment the first signal is received: https://github.com/openshift/service-ca-operator/blob/42088528ef8a6a4b8c99b0f558246b8025584056/pkg/controller/starter.go#L24
That apparently triggers os.Exit(1) immediately: https://github.com/openshift/service-ca-operator/blob/42088528ef8a6a4b8c99b0f55824[…]om/openshift/library-go/pkg/controller/controllercmd/builder.go
The lock release doesn't happen until the periodic renew tick breaks out: https://github.com/openshift/service-ca-operator/blob/42088528ef8a6a4b8c99b0f55824[…]/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go
It seems unlikely that the call to le.release() would be reached before the call to os.Exit(1) in the other goroutine.
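For illustration only, here is a minimal Go sketch of the race described above (this is not the actual service-ca-operator or library-go code; startControllers and releaseLock are placeholder names standing in for the start func and le.release()):

// Sketch of the race: the goroutine that releases the leader lock only
// runs after the cancelled context is observed, but the main goroutine
// calls os.Exit(1) as soon as the controller start func returns an
// error on context cancellation, so the release is usually never reached.
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func startControllers(ctx context.Context) error {
	<-ctx.Done()
	// Mirrors starter.go: an error ("stopped") is returned as soon as
	// the context is cancelled by the first SIGTERM.
	return fmt.Errorf("stopped")
}

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer cancel()

	releaseLock := func() {
		// In the real code this corresponds to le.release() inside
		// client-go's leaderelection, reached only after the renew
		// loop notices the cancelled context.
		fmt.Println("leader lock released")
	}

	go func() {
		<-ctx.Done()
		releaseLock() // typically loses the race against os.Exit(1) below
	}()

	if err := startControllers(ctx); err != nil {
		// Mirrors builder.go: log "graceful termination failed,
		// controllers failed with error: stopped" and exit immediately,
		// before releaseLock gets a chance to run.
		fmt.Printf("graceful termination failed, controllers failed with error: %v\n", err)
		os.Exit(1)
	}
}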
Version-Release number of selected component (if applicable):
4.13.0
How reproducible:
~always
Steps to Reproduce:
1. oc delete -n openshift-service-ca pod <service-ca pod>
Actual results:
The old pod's logs show:
W1103 09:59:14.370594 1 builder.go:106] graceful termination failed, controllers failed with error: stopped
When a new pod comes up to replace it, it has to wait a while before acquiring the leader lock:
I1103 16:46:00.166173 1 leaderelection.go:248] attempting to acquire leader lease openshift-service-ca/service-ca-controller-lock...
.... waiting ....
I1103 16:48:30.004187 1 leaderelection.go:258] successfully acquired lease openshift-service-ca/service-ca-controller-lock
Expected results:
The new pod can acquire the leader lease without waiting for the old pod's lease to expire.
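For reference, a hedged sketch of how client-go leader election can release the lease on graceful shutdown (this is not the operator's actual configuration; the namespace and lock name are taken from the logs above, while the identity and durations are illustrative):

// Minimal example, assuming in-cluster credentials: ReleaseOnCancel makes
// the elector release the lease when the context is cancelled, as long as
// the process does not exit before RunOrDie returns.
package main

import (
	"context"
	"os/signal"
	"syscall"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer cancel()

	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "service-ca-controller-lock", // lock name from the logs above
			Namespace: "openshift-service-ca",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: "example-pod-identity"}, // placeholder identity
	}

	// RunOrDie only returns after OnStoppedLeading; with ReleaseOnCancel
	// the lease is released first, so a replacement pod can acquire it
	// immediately instead of waiting for the lease to expire.
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second, // illustrative values
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				<-ctx.Done() // run controllers until shutdown
			},
			OnStoppedLeading: func() {},
		},
	})
}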
Additional info:
- clones: OCPBUGS-3195 Service-ca controller exits immediately with an error on sigterm (Closed)
- is blocked by: OCPBUGS-3195 Service-ca controller exits immediately with an error on sigterm (Closed)
- links to