  1. Red Hat Advanced Cluster Security
  2. ROX-27782

Pruning should stop deleting batches after context timeout


    • Type: Bug
    • Resolution: Done
    • 4.7.0
    • Rox Sprint 4.7G - Global

      During pruning of a large quantity of alerts, we discovered that the process was not completing within the allotted timeout. In 4.6 we updated pruning to continue on error: if a batch failed, we would process the next batch instead of returning an error and rolling back the transaction.

      However, if a given batch fails because the context is cancelled or has timed out, all subsequent batches will also fail, because the context remains in that state. At a minimum, we should detect the context timeout and process no further batches. We may also want to go a step further and check for transient Postgres errors before continuing.

      https://github.com/stackrox/stackrox/blob/285e8e1efa6b4beb56e862c5f5c1e4f04d47021f/pkg/search/postgres/store.go#L538

              rh-ee-aheflin AJ Heflin
              rh-ee-dashrews David Shrewsberry
              ACS Core Workflows