-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
premerge
-
Important
-
No
-
False
-
Description of problem:
ARO runs Hive on AKS. We have been restricted to running an older version because of a memory leak in Hive. Working with the Hive team, we narrowed the leak down to Hive commit 33c5cd37cd, where it first appears. In larger regions the leak shows up within an hour, and memory keeps growing until Kubernetes OOM-kills the pod. The leak is most noticeable in the hibernation controller reconciliation loop.
Version-Release number of selected component (if applicable):
https://github.com/openshift/hive/tree/33c5cd37cd and all later commits are affected.
How reproducible:
Always. Every Hive version from commit 33c5cd37cd onward shows the leak on ARO AKS clusters.
Steps to Reproduce:
1. Create an AKS cluster.
2. Deploy an affected Hive version to the AKS cluster.
3. Run the ARO RP on the Hive cluster.
4. Create a cluster using the ARO RP.
5. Invalidate the cluster's service principal credentials so that the leak progresses faster.
6. Watch the memory consumption of the hive-controller pod slowly increase.
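To put a number on the last step rather than eyeballing it, the growth rate can be estimated from periodic memory samples. The following is a minimal sketch, not part of Hive or the ARO RP: the helper names `parse_mib` and `memory_growth_rate` are hypothetical, the sample values are illustrative rather than measured, and it assumes memory readings arrive as Kubernetes-style quantities such as `512Mi` (the form printed by `kubectl top pod` when a metrics server is available).

```python
from typing import Sequence, Tuple


def parse_mib(quantity: str) -> float:
    """Convert a Kubernetes memory quantity such as '512Mi' or '2Gi' to MiB.

    Only the binary suffixes Ki, Mi and Gi are handled in this sketch.
    """
    units = {"Ki": 1.0 / 1024.0, "Mi": 1.0, "Gi": 1024.0}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def memory_growth_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Return the least-squares slope of memory over time, in MiB per hour.

    Each sample is (seconds since the first reading, memory in MiB).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 3600.0


if __name__ == "__main__":
    # Illustrative readings taken every 10 minutes (not real measurements).
    readings = [(0, "500Mi"), (600, "550Mi"), (1200, "600Mi"), (1800, "650Mi")]
    samples = [(float(t), parse_mib(q)) for t, q in readings]
    print(f"{memory_growth_rate(samples):.1f} MiB/hour")
```

A slope that stays clearly positive across many samples is consistent with a leak, while a stable controller should hover near zero once its caches have warmed up.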
Actual results:
Hive leaks memory until the hive-controller pod is eventually OOM-killed.
Expected results:
Hive does not have a memory leak and memory usage is stable.
Additional info:
Relevant thread: https://redhat-internal.slack.com/archives/CE3ETN3J8/p1688416480405189
Linked story on ARO side: https://issues.redhat.com/browse/ARO-3639