OpenShift Etcd / ETCD-234

Reduce time to sync in clustermembercontroller


Details

    • Task
    • Resolution: Won't Do

    Description

      Currently the etcd-quorum-guard readiness probe fails:

       
      Aug 28 00:02:20.755103 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:20.755067 1366 prober.go:116] "Probe failed" probeType="Readiness" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7" podUID=5b79def2-9e56-4c93-b8ab-1d04db0f552f containerName="guard" probeResult=failure output=""
      and a few seconds later it reports ready again:
      Aug 28 00:02:25.797258 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:25.797231 1366 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7"
      Try to improve the clustermembercontroller sync loop for member health status, or at least avoid failing the quorum guard probe during install and scale operations. Instead of reporting this through the operator status, consider using metrics to track it.

      See Slack for more context: https://coreos.slack.com/archives/C027U68LP/p1630506922034600

       

      AC: 

      • decide which approach to take and present it in the team meeting
      • implement the proposed solution


          People

            Assignee: Unassigned
            Reporter: Ljiljana Cosic (Inactive), lcosic@redhat.com