  1. Migration Toolkit for Applications
  2. MTA-6081

5k bulk analysis fails to complete


    • Bug
    • Resolution: Unresolved
    • Critical
    • MTA 8.1.0
    • MTA 8.0.0
    • Hub, Scale&Perf-QE
    • None
    • Quality / Stability / Reliability
    • False
    • False
    • Important
    • Yes

      Description of problem:

      Attempting a 5k bulk analysis results in only partial success: some pods succeed, while the majority end up in a failed or cancelled state.

      The analysis pods run on a bare-metal cluster with 12 workers, with enough available capacity in cores, memory, and storage.

      The environment uses a resourcequota restricting it to 250 total pods, a reduction from 7.3, where a maximum of 460 pods was allowed. This restriction takes into account the increased resource demands of task pods, but the 5k analysis still does not complete as expected.

      This has failed on both 8.0.0-24 and 8.0.0-52; I have not been able to successfully complete a bulk analysis of 5k Maven apps.

       

      MTA 8.0.0.52

      Server Version: 4.18.15

      Kubernetes Version: v1.31.8

       

      How reproducible:

      Each time 

      Steps to Reproduce:

      1. Create 5k Maven repo apps and import the CSV as a managed import

      2. Wait for the managed import process and related tasks (lang/tech discoveries) to complete

      3. From the UI, select all 5k Maven apps and click analysis (select Linux, containers, and azure/cloudreadiness)

      4. Let the workers churn through the tasks until the analysis of all apps is completed.

      Version-Release number of selected component (if applicable):

      Actual results:

      Some analyses of Maven apps succeed, some fail.

      I can't say how many, because I can't filter the UI tasks by kind and status.
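A minimal sketch of how the missing per-kind, per-status count could be produced outside the UI, assuming the task records have already been exported as a list of dictionaries (for example from the hub API or the attached sqlite database). The field names `kind` and `state` and the sample values are assumptions for illustration, not confirmed against the hub schema.

```python
from collections import Counter


def tally_tasks(tasks):
    """Count task records per (kind, state) pair.

    `tasks` is any iterable of dicts; the "kind" and "state" keys are
    assumed field names and may differ in the real hub data.
    """
    counts = Counter()
    for task in tasks:
        kind = task.get("kind") or "unknown"
        state = task.get("state") or "unknown"
        counts[(kind, state)] += 1
    return counts


# Hypothetical sample records, not taken from the attached database.
sample = [
    {"id": 1, "kind": "analyzer", "state": "Succeeded"},
    {"id": 2, "kind": "analyzer", "state": "Failed"},
    {"id": 3, "kind": "analyzer", "state": "Canceled"},
    {"id": 4, "kind": "language-discovery", "state": "Succeeded"},
    {"id": 5, "kind": "analyzer", "state": "Failed"},
]

for (kind, state), n in sorted(tally_tasks(sample).items()):
    print(f"{kind:20} {state:10} {n}")
```

Records with a missing kind or state are grouped under "unknown" rather than dropped, so the totals still add up to the number of tasks.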

      Expected results:

      5k apps to be completed successfully, taking longer than the previous 7.3 runs because the total pod count in the resourcequota is reduced.
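As a rough illustration of why the run is expected to take longer, the sketch below computes a lower bound on the number of scheduling "waves" needed under each quota. It assumes one pod per analysis task and that every pod slot in the quota is available to analysis tasks, which overstates the real capacity since other pods also count against the quota. The function name `min_waves` is hypothetical.

```python
import math


def min_waves(total_tasks, max_concurrent_pods):
    """Lower bound on sequential waves when at most
    `max_concurrent_pods` task pods can run at the same time."""
    return math.ceil(total_tasks / max_concurrent_pods)


print(min_waves(5000, 250))  # 8.0 quota: 20 waves
print(min_waves(5000, 460))  # 7.3 quota: 11 waves
```

Under these assumptions the 250-pod quota needs roughly twice as many waves as the 460-pod quota, so a longer but still successful run is the expected outcome.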

       

      Additional info:

      Browse to this link: https://drive.google.com/drive/folders/1EEWtcuRiKcZKQDA_cIWOU2v9z2DV0Nxh?usp=sharing

       

      1. mta-logs
      2. sqlite database of mta-hub
      3. resource graphs to show no saturation is occurring
      4. describe output for a failed analysis task, showing the reason for termination as 'error'

      Note: mta-hub.log shows numerous database queries and "Zombie detected" messages.
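To put a number on "numerous", a simple substring count over the log could be used. This is a minimal sketch: it assumes each message sits on its own line and matches the literal text "Zombie detected"; the placeholder lines below are made up for the example and are not real hub output.

```python
def count_matching_lines(lines, needle="Zombie detected"):
    """Count the lines that contain `needle` as a plain substring."""
    return sum(1 for line in lines if needle in line)


# Placeholder lines for illustration only, not actual mta-hub.log content.
placeholder_lines = [
    "placeholder line A: Zombie detected",
    "placeholder line B: unrelated message",
    "placeholder line C: Zombie detected",
]
print(count_matching_lines(placeholder_lines))

# Against the real file (path assumed):
# with open("mta-hub.log", errors="replace") as log:
#     print(count_matching_lines(log))
```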

       

              jortel Jeff Ortel
              mlehrer@redhat.com Mordechai Lehrer
              Mordechai Lehrer Mordechai Lehrer