Uploaded image for project: 'OpenShift Etcd'
  1. OpenShift Etcd
  2. ETCD-234

Reduce time to sync in clustermembercontroller

XMLWordPrintable

    • Icon: Task Task
    • Resolution: Won't Do
    • Icon: Undefined Undefined
    • None
    • None
    • None
    • False
    • False

      Currently we see this issue:

       
      Aug 28 00:02:20.755103 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:20.755067 1366 prober.go:116] "Probe failed" probeType="Readiness" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7" podUID=5b79def2-9e56-4c93-b8ab-1d04db0f552f containerName="guard" probeResult=failure output=""
      then few seconds later
      Aug 28 00:02:25.797258 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:25.797231 1366 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7"
      Try to improve the clustermembercontroller sync loop for health status or just improve to not fail there on probe quard during install at least or scale. Instead of maybe operator status use metrics to track this.

      Slack for more context https://coreos.slack.com/archives/C027U68LP/p1630506922034600

       

      AC: 

      • come up with a solution which approach we want to take and present in the team meeting
      • implement the proposed solution

              Unassigned Unassigned
              lcosic@redhat.com Ljiljana Cosic (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: