OpenShift Bugs / OCPBUGS-48694

Etcd client can unsafely retry timeouts on mutating requests


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • 4.18.0, 4.19.0
    • kube-apiserver
    • Yes
    • Approved
    • False

      Description of problem:

      Our carry patch, which is intended to retry requests that fail because of an etcd leader change, retries any etcd error with code "Unavailable": https://github.com/openshift/kubernetes/blob/4b2db1ec33faa3ffc305e5ffa7376908cc955370/staging/src/k8s.io/apiserver/pkg/storage/etcd3/etcd3retry/retry_etcdclient.go#L135-L145. That code also covers reasons such as "timeout", and the patch does not distinguish between writes and reads. As a result, a write that fails with a "timeout" error may be retried, even though a timeout observed by the client does not mean that the write was not persisted. Retrying such a write is only safe if the write is idempotent.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          

      Actual results:

          

      Expected results:

          

      Additional info:

          

              People: Ben Luddy (bluddy), Ke Wang
              Votes: 0
              Watchers: 9
