Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.22
Component/s: Machine Config Operator
Labels: mco-triaged
Activity Type: None
Blocked: False
Blocked Reason: None
Story Points: None
Severity: Moderate
Regression: None
Target Backport Versions: None
Target Version: None
Release Blocker: None
Sprint: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
Release Note Status: None
Release Note Type: None
Release Note Text: None
Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:


When we configure the TLS security profile in the "cluster" apiserver resource, some operator pods go into CrashLoopBackOff, for example:


$ omc get pods
NAME                           READY   STATUS             RESTARTS   AGE
etcd-operator-5fb7c9bc-r66js   0/1     CrashLoopBackOff   63         10h

$ omc logs etcd-operator-5fb7c9bc-r66js
2026-02-16T22:15:29.874393249Z I0216 22:15:29.874325       1 cmd.go:253] Using service-serving-cert provided certificates
2026-02-16T22:15:29.874393249Z I0216 22:15:29.874375       1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
2026-02-16T22:15:29.874528883Z F0216 22:15:29.874507       1 cmd.go:182] open /var/run/secrets/kubernetes.io/serviceaccount/token: permission denied

Version-Release number of selected component (if applicable):

4.22

How reproducible:

Intermittent; it happens rarely.

Steps to Reproduce:

    1. Configure the TLS security profile in the apiserver "cluster" resource:

$ oc patch apiserver cluster --type json -p '[{ "op": "add", "path": "/spec/tlsSecurityProfile", "value":  {"type": "Old","old": {}}}]'

Actual results:


Many operator pods go into CrashLoopBackOff and cannot start. Because they cannot start, they cannot recreate evicted pods, and the update gets stuck on PodDisruptionBudgets.

Expected results:

No operator pod should go into CrashLoopBackOff.

Additional info:


Deleting the affected pod manually fixes the issue: the replacement pod starts and runs without problems.
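
The manual workaround can be sketched as two small shell helpers. This is only an illustration and not part of the original report: the function names `crashlooping_pods` and `delete_listed_pods` and the `DRY_RUN` variable are hypothetical. The filter matches every pod in CrashLoopBackOff, not only the ones hit by this bug, so review the dry-run output before deleting anything.

```shell
#!/bin/sh
# Sketch: pick CrashLoopBackOff pods out of `oc get pods -A --no-headers` output.
# With -A the columns are: NAMESPACE NAME READY STATUS RESTARTS AGE
crashlooping_pods() {
  awk '$4 == "CrashLoopBackOff" { print $1, $2 }'
}

# Hypothetical helper: delete every pod listed on stdin as "<namespace> <name>".
# Set DRY_RUN=1 to print the commands instead of running them.
delete_listed_pods() {
  while read -r ns name; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "oc delete pod $name -n $ns"
    else
      # Redirect stdin so oc cannot consume the remaining list entries.
      oc delete pod "$name" -n "$ns" </dev/null
    fi
  done
}

# Usage (assumes a logged-in oc session with permission to delete pods):
#   oc get pods -A --no-headers | crashlooping_pods | DRY_RUN=1 delete_listed_pods
```

Running the pipeline first with `DRY_RUN=1` prints the `oc delete pod` commands that would be issued; dropping the variable performs the deletions.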

relates to: MCO-2110 Migrate mco_security tests (Closed)

Assignee: Team MCO
Reporter: Sergio Regidor de la Rosa
Need Info From: None
Contributors: None
QA Contact: Rahul Gangwar
Doc Contact: None
Votes: 0
Watchers: 5
Created: 2026/02/17 12:10 PM
Updated: 2026/02/25 4:11 PM
