OpenShift Bugs / OCPBUGS-76990

Operator pods in CrashLoopBackOff state after configuring TLS in the "cluster" apiserver resource


      Description of problem:

      
      When a TLS security profile is configured in the "cluster" APIServer resource, some operator pods go into CrashLoopBackOff, for example:
      
      
      $ omc get pods
      NAME                           READY   STATUS             RESTARTS   AGE
      etcd-operator-5fb7c9bc-r66js   0/1     CrashLoopBackOff   63         10h
      
      $ omc logs etcd-operator-5fb7c9bc-r66js
      2026-02-16T22:15:29.874393249Z I0216 22:15:29.874325       1 cmd.go:253] Using service-serving-cert provided certificates
      2026-02-16T22:15:29.874393249Z I0216 22:15:29.874375       1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
      2026-02-16T22:15:29.874528883Z F0216 22:15:29.874507       1 cmd.go:182] open /var/run/secrets/kubernetes.io/serviceaccount/token: permission denied
      
          

      Version-Release number of selected component (if applicable):

      4.22
          

      How reproducible:

      Intermittent; happens rarely.
          

      Steps to Reproduce:

          1. Configure a TLS security profile (here, the Old profile) in the "cluster" APIServer resource:
      
      $ oc patch apiserver cluster --type json -p '[{ "op": "add", "path": "/spec/tlsSecurityProfile", "value": {"type": "Old", "old": {}}}]'
          

      Actual results:

      
      Many operator pods go into CrashLoopBackOff and cannot start. Because these operators are not running, they cannot recreate pods that were evicted, and the update gets stuck on PodDisruptionBudgets.
      
      
          

      Expected results:

      No operator pod should go into CrashLoopBackOff.
      
      
          

      Additional info:

      
      Manually deleting the affected pod works around the issue: the replacement pod starts and runs without problems.
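      When several operators are affected, the manual workaround can be scripted. The following is a minimal sketch that rests on a few assumptions. It expects a logged-in `oc` session that is allowed to delete pods in the operator namespaces. It also expects the default column order of `oc get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...). The function names `crashlooping_pods` and `delete_crashlooping_pods` are hypothetical helpers and are not part of `oc`. Deleting a pod only clears the symptom. It does not address the underlying "token: permission denied" failure.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the manual workaround (delete the crashlooping pod so
# its controller recreates it). Helper names are hypothetical.

# Reads `oc get pods -A --no-headers` output on stdin and prints
# "<namespace> <pod>" for every pod whose STATUS column (field 4)
# is CrashLoopBackOff.
crashlooping_pods() {
  awk '$4 == "CrashLoopBackOff" { print $1, $2 }'
}

# Lists all pods cluster-wide, keeps the crashlooping ones, and deletes each.
# Assumes a logged-in `oc` session with permission to delete these pods.
delete_crashlooping_pods() {
  oc get pods -A --no-headers | crashlooping_pods |
    while read -r ns pod; do
      echo "deleting ${ns}/${pod}"
      oc delete pod "${pod}" -n "${ns}"
    done
}
```

      To check which pods are affected before deleting anything, run `oc get pods -A --no-headers | crashlooping_pods` on its own, then call `delete_crashlooping_pods`.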
      
          

              Team MCO
              Sergio Regidor de la Rosa (sregidor@redhat.com)
              Rahul Gangwar