Bug
Resolution: Unresolved
Critical
MTA 8.0.0
None
Quality / Stability / Reliability
False
False
Important
Yes
Description of problem:
Attempting a bulk analysis of 5k applications only partially succeeds: some analysis pods succeed, while the majority end in a failed or cancelled state.
The analysis pods run on a bare-metal cluster with 12 worker nodes. The cluster has ample available capacity in cores, memory, and storage.
The environment uses a ResourceQuota that limits the namespace to 250 total pods. This is down from 7.3, where a maximum of 460 pods was allowed. The lower limit accounts for the increased resource demands of task pods, yet the 5k analysis still does not complete as expected.
This has failed on both 8.0.0-24 and 8.0.0-52. I have not been able to complete a bulk analysis of 5k Maven apps successfully on either build.
MTA 8.0.0.52
Server Version: 4.18.15
Kubernetes Version: v1.31.8
How reproducible:
Each time
Steps to Reproduce:
1. Create 5k Maven repository apps and import the CSV as a managed import.
2. Wait for the managed import process and its related tasks (language/technology discovery) to complete.
3. From the UI, select all 5k Maven apps and start an analysis (select the Linux, containers, and azure/cloud-readiness targets).
4. Let the workers churn through the tasks until analysis has completed for all apps.
Version-Release number of selected component (if applicable):
Actual results:
Some analyses of Maven apps succeed and others fail.
I cannot report exact counts because the UI does not allow filtering tasks by kind and status.
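The UI cannot filter tasks by kind and status, so one possible workaround is to count them directly in the attached mta-hub sqlite database. The sketch below is illustrative only. The table name `Task` and the columns `Kind` and `State` are assumptions, not the confirmed hub schema. Check the real schema first (for example with `.schema` in the `sqlite3` shell) and adjust the parameters to match.

```python
import sqlite3
import sys
from collections import Counter


def tally_tasks(conn: sqlite3.Connection, table: str = "Task",
                kind_col: str = "Kind", state_col: str = "State") -> Counter:
    """Count tasks grouped by (kind, state).

    ASSUMPTION: the default table and column names are hypothetical and
    must be checked against the actual mta-hub schema before use.
    """
    query = (f'SELECT "{kind_col}", "{state_col}", COUNT(*) '
             f'FROM "{table}" GROUP BY 1, 2')
    return Counter({(kind, state): n for kind, state, n in conn.execute(query)})


if __name__ == "__main__" and len(sys.argv) > 1:
    # Open the database read-only so the attached copy is not modified.
    conn = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    for (kind, state), n in tally_tasks(conn).most_common():
        print(f"{kind}\t{state}\t{n}")
    conn.close()
```

Pass the path to the sqlite file as the first argument. The output lists each (kind, state) pair with its count, most frequent first.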
Expected results:
All 5k apps complete analysis successfully. The run is expected to take longer than the previous 7.3 runs because the total pod count in the ResourceQuota is reduced.
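The following sketch gives a rough lower bound on the expected slowdown. It makes several simplifying assumptions: each analysis task occupies exactly one pod, all tasks take about the same time, and `reserved_pods` (a hypothetical parameter) covers non-task pods, such as the hub and UI, that also count against the quota.

```python
import math


def min_waves(total_tasks: int, pod_quota: int, reserved_pods: int = 0) -> int:
    """Lower bound on sequential scheduling waves if every task needs one pod.

    reserved_pods is a hypothetical allowance for non-task pods that also
    count against the ResourceQuota.
    """
    slots = pod_quota - reserved_pods
    if slots <= 0:
        raise ValueError("quota leaves no room for task pods")
    return math.ceil(total_tasks / slots)


if __name__ == "__main__":
    for release, quota in (("7.3", 460), ("8.0.0", 250)):
        print(f"{release}: at least {min_waves(5000, quota)} waves")
```

With no reserved pods, this gives 11 waves at 460 pods and 20 waves at 250 pods. That is roughly 1.8 times the wall-clock time under these assumptions. The quota reduction would therefore explain a longer run, but not, by itself, failed or cancelled tasks.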
Additional info:
The following artifacts are available at https://drive.google.com/drive/folders/1EEWtcuRiKcZKQDA_cIWOU2v9z2DV0Nxh?usp=sharing:
- mta-logs
- sqlite database of mta-hub
- resource graphs to show no saturation is occurring
- describe output for a failed analysis task, showing that the termination reason is 'error'
Note: the mta-hub.log shows numerous database queries and "Zombie detected" messages.
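A simple line count is a starting point for quantifying the "Zombie detected" messages in the attached mta-hub.log. This minimal sketch only does substring matching and assumes nothing about the log's format. Any extra markers you pass in (for example, a string that identifies the database query lines) are your own choice, not known log text.

```python
import sys
from collections import Counter
from typing import Iterable, Sequence


def count_markers(lines: Iterable[str],
                  markers: Sequence[str] = ("Zombie detected",)) -> Counter:
    """Count how many lines contain each marker substring."""
    counts = Counter({m: 0 for m in markers})
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is illustrative): python count_markers.py mta-hub.log ["other marker" ...]
    markers = sys.argv[2:] or ["Zombie detected"]
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for marker, n in count_markers(fh, markers).items():
            print(f"{n}\t{marker}")
```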