OpenShift Bugs / OCPBUGS-11486

Extend leaseDurationSeconds in SNO


    • Quality / Stability / Reliability
    • Severity: Moderate
    • Status updates:
      8/22: verification pending
      8/10: All Pull Requests for Extension merged, now waiting for QA Verification
      8/8: Upstream TopoLVM PR merged. Downstream TopoLVM rebased. Finalizing the plumbing to connect LVMS to TopoLVM and enable proper leader election.
      7/4: The associated case is closed, but the customer wants to track the resolution. This requires a forecast.
      6/22: The case related to this bug is closed; unlinked this bug from the case. This bug is still needed to address the issue.
    • Sprints: OCP VE Sprint 239, OCP VE Sprint 240, OCP VE Sprint 241, OCP VE Sprint 242, OCP VE Sprint 243, OCP VE Sprint 244
    • Status: In Progress
    • Release Note Not Required

      Description of problem:

      In an SNO scenario, if the kube-apiserver restarts, the topolvm-controller and lvm-operator Pods crash because leader election fails.
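
      Background: both Pods hold a coordination.k8s.io/v1 Lease (openshift-storage/topolvm) and must renew it continuously; once renewal fails for longer than the renew deadline, the process abandons leadership and exits. The following is a minimal sketch, assuming a standard controller-runtime manager setup with the stock timings — it is not the actual operator code. With these values, any kube-apiserver outage longer than roughly 10 seconds is fatal:

      package main

      import (
          "os"
          "time"

          ctrl "sigs.k8s.io/controller-runtime"
      )

      func main() {
          // Stock timings (assumed here for illustration; these are the
          // controller-runtime defaults):
          leaseDuration := 15 * time.Second // lease stays valid this long without renewal
          renewDeadline := 10 * time.Second // leader gives up if it cannot renew within this window
          retryPeriod := 2 * time.Second    // interval between renewal attempts

          mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
              LeaderElection:          true,
              LeaderElectionID:        "topolvm",
              LeaderElectionNamespace: "openshift-storage",
              LeaseDuration:           &leaseDuration,
              RenewDeadline:           &renewDeadline,
              RetryPeriod:             &retryPeriod,
          })
          if err != nil {
              os.Exit(1)
          }
          // If the kube-apiserver stays unreachable past renewDeadline, Start
          // returns "leader election lost" and the Pod crash-loops.
          if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
              os.Exit(1)
          }
      }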

      Version-Release number of selected component (if applicable):

      OCP 4.12

      How reproducible:

      100%

      Steps to Reproduce:

      1. Kill the kube-apiserver in SNO:
      $ oc exec -it -n openshift-kube-apiserver kube-apiserver-XXXXXX -c kube-apiserver -- /bin/sh -c "kill 1"
      2. Watch the topolvm-controller and lvm-operator Pods
      3. Check the logs before the crash:
      2023-04-06T17:45:52.566409204+08:00 stderr F E0406 09:45:52.566357       1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
      2023-04-06T17:45:54.566987764+08:00 stderr F E0406 09:45:54.566945       1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
      2023-04-06T17:45:56.567006728+08:00 stderr F E0406 09:45:56.566959       1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
      2023-04-06T17:46:02.570057660+08:00 stdout F leader election lost
      2023-04-06T17:46:02.570087514+08:00 stderr F E0406 09:46:02.565764       1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": context deadline exceeded
      2023-04-06T17:46:02.570087514+08:00 stderr F I0406 09:46:02.565823       1 leaderelection.go:283] failed to renew lease openshift-storage/topolvm: timed out waiting for the condition
      2023-04-06T17:46:02.570087514+08:00 stderr F {"level":"error","ts":1680774362.5658588,"logger":"setup","msg":"problem running manager","error":"leader election lost","stacktrace":"github.com/topolvm/topolvm/pkg/topolvm-controller/cmd.subMain\n\t/remote-source/app/pkg/topolvm-controller/cmd/run.go:145\ngithub.com/topolvm/topolvm/pkg/topolvm-controller/cmd.glob..func1\n\t/remote-source/app/pkg/topolvm-controller/cmd/root.go:34\ngithub.com/spf13/cobra.(*Command).execute\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902\ngithub.com/topolvm/topolvm/pkg/topolvm-controller/cmd.Execute\n\t/remote-source/app/pkg/topolvm-controller/cmd/root.go:41\nmain.main\n\t/remote-source/app/pkg/hypertopolvm/main.go:44\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
      2023-04-06T17:46:02.570087514+08:00 stderr F Error: leader election lost 
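
      Note the timing: the renewal errors repeat every 2 seconds (the retry period) starting at 17:45:52, and the election is abandoned exactly 10 seconds later, at 17:46:02, when the renew deadline expires. Exiting at that point is deliberate in the client-go leader-election pattern, sketched below with hypothetical wiring (runWithLeaderElection, clientset, identity, and run are illustrative names, not TopoLVM code):

      import (
          "context"
          "os"
          "time"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/leaderelection"
          "k8s.io/client-go/tools/leaderelection/resourcelock"
      )

      // runWithLeaderElection is a hypothetical helper showing the pattern;
      // OnStoppedLeading is the path that prints "Error: leader election lost".
      func runWithLeaderElection(ctx context.Context, clientset kubernetes.Interface,
          identity string, run func(context.Context)) {
          leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
              Lock: &resourcelock.LeaseLock{
                  LeaseMeta:  metav1.ObjectMeta{Name: "topolvm", Namespace: "openshift-storage"},
                  Client:     clientset.CoordinationV1(),
                  LockConfig: resourcelock.ResourceLockConfig{Identity: identity},
              },
              LeaseDuration: 15 * time.Second, // lease validity without renewal
              RenewDeadline: 10 * time.Second, // the 10s window visible in the timestamps above
              RetryPeriod:   2 * time.Second,  // the 2s cadence of the "connection refused" errors
              Callbacks: leaderelection.LeaderCallbacks{
                  OnStartedLeading: run, // controller work runs only while leading
                  OnStoppedLeading: func() {
                      // Renewal failed past the deadline: exit so a fresh replica can campaign.
                      os.Exit(1)
                  },
              },
          })
      }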

      Actual results:

      The affected Pods crash and restart because the leader-election lease (leaseDurationSeconds) is too short to survive the kube-apiserver restart

      Expected results:

      The affected Pods should survive a kube-apiserver restart without losing leadership

      Additional info:
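
      The fix direction, per the title, is to extend the leader-election timings so that a single-node control-plane restart fits inside the renewal window. The values merged by the linked PRs are not quoted in this ticket; the ones below are illustrative only, following the extended profile OpenShift applies elsewhere for SNO, and are passed through the same ctrl.Options fields as in the sketch under "Description of problem":

      // Illustrative SNO-friendly timings (hypothetical values; the merged
      // fix may choose differently):
      leaseDuration := 270 * time.Second // was 15s
      renewDeadline := 240 * time.Second // was 10s; must outlast a kube-apiserver restart
      retryPeriod := 60 * time.Second    // was 2s

      client-go validates that LeaseDuration > RenewDeadline and RenewDeadline > JitterFactor * RetryPeriod (JitterFactor is 1.2), so all three values have to be raised together.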

       

              rh-ee-jmoller Jakob Moeller (Inactive)
              rhn-support-cchen Chen Chen
              Rahul Deore