OpenShift API for Data Protection / OADP-7132

openshift-velero-plugin panics with 'concurrent map writes'



      Description of problem:

      While running backups, some of them (but not all) are marked as 'PartiallyFailed'.
      This happens because the openshift-velero-plugin exits with 'status 2', as seen in the backup logs.
      After enabling debug logging in Velero to gather more information, we can see a goroutine panic ('concurrent map writes') in this plugin.

      This is believed to be the cause of the backups being marked as PartiallyFailed: the plugin process crashes because of the panic.

      Version-Release number of selected component (if applicable):

      ROSA OCP cluster
      OADP 1.5.3

      How reproducible:

      Intermittent in the customer cluster

      Steps to Reproduce:
      1. Have multiple schedules that run at the same time, or run multiple backups concurrently.
      Nothing out of the ordinary, other than the number of worker threads enabled in the DPA (itemBlockWorkerCount: 8)

      Actual results:

      Random backups are marked with 'PartiallyFailed' phase due to openshift-velero-plugin crashing

      Expected results:

      openshift-velero-plugin should not crash and backup should complete without issues.

              sseago Scott Seago
              rhn-support-jcoscia Javier Coscia