OpenShift Service Mesh / OSSM-5541

istio-operator Pod keeps waiting for leader lease for over 30 minutes without timeout.


    • Ticket
    • Resolution: Done-Errata
    • OSSM 3.0-TP1
    • Maistra
    • Release Notes
      OSSM-5541 Previously, an istio operator pod might keep waiting for the leader lease in some restart conditions. Now, the leader election implementation has been enhanced to avoid this issue.
    • Enhancement

      The istio-operator Pod keeps waiting for the leader lease for over 30 minutes without timeout.

      The issue can be reproduced with the following procedure:

      1. Stop the node where the istio-operator pod is running.
      2. Wait for about 6 minutes.
      3. The node becomes NotReady, the old istio-operator pod enters Terminating, and a new istio-operator pod is created but shows 0/1 in the READY column.

      Actual Result:
      The new istio-operator pod remains in 0/1 (NotReady) status for over 30 minutes, possibly indefinitely.

      Expected Result:
      The new istio-operator pod should acquire the leader lease within a bounded timeout, for example 5 minutes after it is created.
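
One way to check the expected behavior during a reproduction is to wait on the pod's Ready condition with a deadline. The sketch below is a hypothetical helper (the name `check_ready_within` is not from this ticket). It assumes the current project is the one where the operator runs, and it defaults to 300 seconds to mirror the 5 minute figure above.

```shell
# Hypothetical helper: report whether a pod becomes Ready within a timeout.
# Usage: check_ready_within <pod-name> [timeout-seconds]
check_ready_within() {
  crw_pod="$1"
  crw_timeout="${2:-300}"   # assumption: 300s stands in for "5 minutes"
  # `oc wait` blocks until the condition is met or the timeout expires.
  if oc wait --for=condition=Ready "pod/${crw_pod}" \
       --timeout="${crw_timeout}s" >/dev/null 2>&1; then
    echo "PASS: ${crw_pod} became Ready within ${crw_timeout}s"
  else
    echo "FAIL: ${crw_pod} not Ready after ${crw_timeout}s"
    return 1
  fi
}
```

For example, `check_ready_within istio-operator-5446f67ff6-rdvn4` would be expected to print the FAIL line on an affected cluster.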

      Additional Information:
      ~~~
      $ oc get pods -o wide
      NAME                              READY   STATUS        RESTARTS   AGE   IP            NODE                                              NOMINATED NODE   READINESS GATES
      istio-operator-5446f67ff6-rdvn4   0/1     Running       0          37m   10.131.0.11   ip-10-0-146-212.ap-northeast-1.compute.internal   <none>           <none>
      istio-operator-5446f67ff6-trq75   1/1     Terminating   0          97m   10.129.2.11   ip-10-0-178-52.ap-northeast-1.compute.internal    <none>           <none>
      kiali-operator-7874d8d6cf-n7zrr   1/1     Running       0          37m   10.131.0.10   ip-10-0-146-212.ap-northeast-1.compute.internal   <none>           <none>
      kiali-operator-7874d8d6cf-qp7qz   1/1     Terminating   0          97m   10.129.2.10   ip-10-0-178-52.ap-northeast-1.compute.internal    <none>           <none>

      $ oc logs istio-operator-5446f67ff6-rdvn4
      ......

      {"level":"info","ts":1701768009.1261392,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1701768027.0609295,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1701768045.693518,"logger":"leader","msg":"Not the leader. Waiting."}

      ~~~
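
To quantify how long a pod has been stuck, the `ts` fields in these log entries (Unix epoch seconds) can be summarized. The following is a minimal sketch, assuming the log has one JSON object per line and using only `grep`, `sed`, and `awk`; the function name `leader_wait_summary` is made up for illustration.

```shell
# Hypothetical helper: summarize "Not the leader. Waiting." log entries.
# Reads operator log lines on stdin, one JSON object per line.
leader_wait_summary() {
  grep 'Not the leader. Waiting.' |
    # Extract the numeric value of the "ts" field from each matching line.
    sed -n 's/.*"ts":\([0-9.]*\).*/\1/p' |
    # Count the entries and report the time between the first and last one.
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END { if (NR > 0) printf "%d messages over %.1f seconds\n", NR, last - first }'
}
```

Usage: `oc logs istio-operator-5446f67ff6-rdvn4 | leader_wait_summary`. For the three entries shown above, the first and last timestamps are about 36.6 seconds apart, which suggests a retry roughly every 18 seconds in this excerpt.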

      Feasible workaround:
      Manually delete the istio-operator-lock configmap. The new istio-operator pod can then acquire the leader lease and reach 1/1 Ready status.
      ~~~
      $ oc delete configmap istio-operator-lock
      configmap "istio-operator-lock" deleted
      ~~~
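
If the workaround is scripted, it may be worth guarding the delete so the lock is removed only while the new pod is still not ready. The sketch below is one possible shape, not a vetted procedure: `release_stale_lock` is a hypothetical name, and it assumes the operator pod has a single container and that the current project contains the `istio-operator-lock` configmap.

```shell
# Hypothetical helper: delete the leader lock only if the given pod is not Ready.
# Usage: release_stale_lock <new-operator-pod-name>
release_stale_lock() {
  rsl_pod="$1"
  # Read the readiness of the pod's first container ("true" or "false").
  rsl_ready="$(oc get pod "$rsl_pod" \
    -o jsonpath='{.status.containerStatuses[0].ready}')"
  if [ "$rsl_ready" = "true" ]; then
    echo "skip: $rsl_pod is already Ready"
    return 0
  fi
  # Same command as the manual workaround above.
  oc delete configmap istio-operator-lock
}
```

Call it as `release_stale_lock istio-operator-5446f67ff6-rdvn4`, then confirm the result with `oc get pods -o wide`.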

            yuaxu@redhat.com Yuanlin Xu
            rhn-support-yhe Yiyong He