• Task
    • Resolution: Unresolved
    • Icon: Undefined Undefined
    • None
    • None
    • CI
    • None
    • Quality / Stability / Reliability
    • False
    • False
    • Not Selected

      Overview:

      Currently the timeout for image prefetching is purely heuristic and is the same for all images. We follow a very simple exponential pattern: the first download attempt has a 30-second timeout, the next one 1 minute, then 2 minutes, and so on.
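
A minimal sketch of the schedule described above, for illustration only. The function name `exponential_timeouts` and its parameters are hypothetical and are not taken from the prefetcher code; only the 30-second start and the doubling come from this ticket.

```python
from datetime import timedelta


def exponential_timeouts(attempts, base=timedelta(seconds=30), factor=2):
    """Return one timeout per attempt: base, base*factor, base*factor**2, ...

    Every image gets the same schedule, regardless of its size.
    """
    return [base * (factor ** i) for i in range(attempts)]


schedule = exponential_timeouts(4)
print([int(t.total_seconds()) for t in schedule])  # [30, 60, 120, 240]
```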

      The problem is that some images are tiny (so we waste time waiting too long on stalled downloads), while others are large and cannot possibly be fetched in under 30 seconds (so we waste the first one or two attempts).

      One thing that would maximize the robustness of prefetching would be to add peer-to-peer communication between prefetcher pods, so that they can exchange information on how long a successful prefetch of a given image took.

      This way each prefetcher pod would be able to fine-tune the timeout for each image separately, and thus make more download attempts rather than sit idle while a stalled download waits out an unrealistically high deadline calculated by exponential backoff.
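
One way a pod could turn durations reported by its peers into a per-image timeout is sketched below. Everything here is an assumption made for illustration: the class `ObservedDurations`, the safety factor of 2, the 5-second floor, and the image names. The ticket does not specify how the timeout should be derived, nor how pods would exchange the data.

```python
from collections import defaultdict


class ObservedDurations:
    """Per-image record of how long successful prefetches took (in seconds)."""

    def __init__(self, default_timeout=30.0, safety_factor=2.0, min_timeout=5.0):
        self.default_timeout = default_timeout  # used when nothing is known yet
        self.safety_factor = safety_factor      # headroom over the slowest success
        self.min_timeout = min_timeout          # never go below this floor
        self._durations = defaultdict(list)

    def record(self, image, seconds):
        """Store a successful prefetch duration reported locally or by a peer."""
        self._durations[image].append(seconds)

    def timeout_for(self, image):
        """Return a timeout tuned to this image, or the default if unknown."""
        samples = self._durations.get(image)
        if not samples:
            return self.default_timeout
        return max(self.min_timeout, max(samples) * self.safety_factor)


observed = ObservedDurations()
observed.record("example.io/tiny:1", 3.0)    # hypothetical small image
observed.record("example.io/large:1", 90.0)  # hypothetical large image
print(observed.timeout_for("example.io/tiny:1"))
print(observed.timeout_for("example.io/large:1"))
print(observed.timeout_for("example.io/unknown:1"))
```

With such a record, a tiny image gets a short deadline, so a stalled download is abandoned quickly, and a large image starts with a deadline it can realistically meet.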

      In the failed CI job for the linked ticket, this could even have doubled the number of attempts, increasing the probability of success.
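
The effect on the number of attempts can be illustrated with a small worst-case calculation in which every attempt stalls until its deadline. The 450-second budget and the 60-second tuned timeout are hypothetical numbers chosen for the example; they are not measurements from the failed CI job.

```python
from itertools import count


def attempts_within_budget(budget, timeouts):
    """Count how many attempts fit in `budget` seconds if each one times out."""
    used = 0.0
    attempts = 0
    for timeout in timeouts:
        if used + timeout > budget:
            break
        used += timeout
        attempts += 1
    return attempts


budget = 450  # hypothetical total time available for prefetching, in seconds

exponential = (30 * 2 ** i for i in count())  # 30, 60, 120, 240, ...
tuned = (60 for _ in count())                 # constant per-image timeout

print(attempts_within_budget(budget, exponential))  # 4
print(attempts_within_budget(budget, tuned))        # 7
```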

      Acceptance Criteria:

      A list of specific needs or objectives that this task must deliver in order to be considered complete. Complete during Refinement status.

              Unassigned
              mowsiany@redhat.com Marcin Owsiany
              ACS Install
