  1. OpenShift Etcd
  2. ETCD-234

Reduce time to sync in clustermembercontroller


    • Type: Task
    • Resolution: Won't Do
    • Priority: Undefined

      Currently we see this issue:

      Aug 28 00:02:20.755103 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:20.755067 1366 prober.go:116] "Probe failed" probeType="Readiness" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7" podUID=5b79def2-9e56-4c93-b8ab-1d04db0f552f containerName="guard" probeResult=failure output=""
      Then, a few seconds later:
      Aug 28 00:02:25.797258 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:25.797231 1366 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7"
      Improve the clustermembercontroller sync loop for health status, or at least stop the quorum guard readiness probe from failing during install or scaling. Consider using metrics instead of operator status to track this.
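
      To put a number on the flap, the gap between the two log lines above can be computed directly. This is only a minimal sketch: the helper names are made up for illustration, and it assumes the journal timestamps parse under an English/C locale (for the "Aug" month abbreviation).

```python
from datetime import datetime

# Journal timestamps copied from the two kubelet log lines above.
PROBE_FAILED = "Aug 28 00:02:20.755103"
PROBE_READY = "Aug 28 00:02:25.797258"


def parse_journal_ts(ts: str) -> datetime:
    """Parse a journal timestamp without a year (hypothetical helper).

    strptime fills in the year 1900, which is fine here because only
    the difference between two timestamps is used.
    """
    return datetime.strptime(ts, "%b %d %H:%M:%S.%f")


def flap_seconds(failed: str, ready: str) -> float:
    """Seconds between a readiness failure and the next ready status."""
    return (parse_journal_ts(ready) - parse_journal_ts(failed)).total_seconds()


gap = flap_seconds(PROBE_FAILED, PROBE_READY)
print(f"not-ready window: {gap:.3f}s")
```

      For the lines quoted here, the guard pod was reported not ready for roughly five seconds before the kubelet saw it ready again.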

      Slack thread for more context: https://coreos.slack.com/archives/C027U68LP/p1630506922034600



      • Decide which approach to take and present it in the team meeting
      • Implement the proposed solution

            Unassigned
            lcosic@redhat.com Ljiljana Cosic (Inactive)