OpenShift API for Data Protection
OADP-2681

[Upstream] DataMover - datauploads and datadownloads resources aren't distributed equally among the workers.


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • Version: OADP 1.3.0
    • Component: data-mover
    • Fixed in Build: oadp-operator-bundle-container-1.3.0-147

      Description of problem:

      Tracking bug for upstream Velero 1.12 - https://github.com/vmware-tanzu/velero/issues/6734

       

      While running DataMover backup and restore tests and monitoring the DataUpload and DataDownload resources, we noticed that the resources were not distributed equally among the worker nodes.
      This causes the tests to run for a long duration, and repeating cycles of the same test produces inconsistent results.

      Another issue: the test does not run at the maximum expected concurrency (1 resource per node). A quick way to check the current per-node concurrency is sketched below.
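
      A minimal sketch of such a check, assuming `oc` access to the cluster, the velero.io DataUpload CRD, and the namespace below (adjust it for an upstream Velero install), counts how many DataUploads are InProgress on each node at a given moment:

          # Sketch: snapshot of concurrently running DataUploads per node.
          # Assumes the velero.io DataUpload CRD is installed and `oc` is logged in.
          import json, subprocess
          from collections import Counter

          ns = "openshift-adp"  # assumption: OADP namespace; upstream Velero typically uses "velero"
          items = json.loads(subprocess.check_output(
              ["oc", "get", "datauploads.velero.io", "-n", ns, "-o", "json"]))["items"]

          # status.node is populated once a node-agent pod picks the DataUpload up
          running = Counter(
              i.get("status", {}).get("node", "<unassigned>")
              for i in items
              if i.get("status", {}).get("phase") == "InProgress")
          print(dict(running))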

       

      Version-Release number of selected component (if applicable):

      OCP 4.12.9

      ODF 4.12.7

      Upstream Velero 1.12

       

      How reproducible:

       

      Steps to Reproduce:
      1. Run a DataMover backup / restore.
      2. Monitor the DataUpload / DataDownload resources (see the tally sketch after this list).
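
      To reproduce the per-worker counts shown under Additional info, one can tally the node recorded in each DataUpload (or DataDownload) status once a cycle completes. A minimal sketch under the same assumptions as above:

          # Sketch: tally DataUploads (or DataDownloads) per worker node for one cycle.
          import json, subprocess
          from collections import Counter

          def tally(kind, ns="openshift-adp"):  # kind: "datauploads" or "datadownloads"
              items = json.loads(subprocess.check_output(
                  ["oc", "get", f"{kind}.velero.io", "-n", ns, "-o", "json"]))["items"]
              return Counter(i.get("status", {}).get("node", "<unassigned>") for i in items)

          for node, count in sorted(tally("datauploads").items()):
              print(f"{node}: {count}")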

      Actual results:

      The resources are not distributed evenly among all worker nodes.

      Expected results:

      The resources should be distributed among all worker nodes as evenly as possible.

       

       

      Additional info:

       

      Duration and DataUpload distribution for 5 backup cycles (namespace with 100 PVs):
      -0:23:48
      worker000-r640 : 3, worker001-r640 : 61, worker002-r640 : 0, worker003-r640 : 4, worker004-r640 : 32, worker005-r640 : 0
      -0:16:39
      worker000-r640 : 11, worker001-r640 : 23, worker002-r640 : 14, worker003-r640 : 20, worker004-r640 : 19, worker005-r640 : 13
      -0:17:53
      worker000-r640 : 20, worker001-r640 : 17, worker002-r640 : 15, worker003-r640 : 16, worker004-r640 : 9, worker005-r640 : 23
      -0:18:45
      worker000-r640 : 24, worker001-r640 : 15, worker002-r640 : 22, worker003-r640 : 17, worker004-r640 : 6, worker005-r640 : 16
      -0:28:39
      worker000-r640 : 26, worker001-r640 : 15, worker002-r640 : 20, worker003-r640 : 20, worker004-r640 : 2, worker005-r640 : 17

      DataDownload distribution for 5 restore cycles (namespace with 100 PVs):
      -worker000-r640 : 5, worker001-r640 : 51, worker002-r640 : 0, worker003-r640 : 17, worker004-r640 : 27, worker005-r640 : 0
      -worker000-r640 : 24, worker001-r640 : 13, worker002-r640 : 0, worker003-r640 : 22, worker004-r640 : 24, worker005-r640 : 17
      -worker000-r640 : 28, worker001-r640 : 12, worker002-r640 : 10, worker003-r640 : 23, worker004-r640 : 14, worker005-r640 : 13
      -worker000-r640 : 21, worker001-r640 : 18, worker002-r640 : 10, worker003-r640 : 18, worker004-r640 : 17, worker005-r640 : 16
      -worker000-r640 : 15, worker001-r640 : 17, worker002-r640 : 11, worker003-r640 : 21, worker004-r640 : 21, worker005-r640 : 15
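
      For reference, a perfectly even spread over the 6 workers would be roughly 16-17 resources per node. A small sketch quantifying the skew of the first backup cycle above (the numbers are copied from that cycle):

          # Sketch: spread of the first backup cycle's DataUpload counts.
          from statistics import mean, pstdev
          cycle1 = [3, 61, 0, 4, 32, 0]  # worker000..worker005 from the first cycle
          print(f"mean={mean(cycle1):.1f}, stddev={pstdev(cycle1):.1f}")  # an even spread would have stddev near 0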
