Details

Type: Feature Request
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Monitoring
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Hierarchy Progress:
0

SFDC Cases Counter:
SFDC Cases Links:

Description

Description of problem:

Ref: https://issues.redhat.com/browse/OHSS-30820

Regarding the HC Prometheus pod placement policy: we recently noticed that a topologySpreadConstraint has been added to the pods that keeps them on infra nodes when infra nodes are available. This guarantees that one of the pods lands on an infra node, but it leaves the second replica free to pick any node (worker or infra). In practice the second replica mostly ends up on a worker node, even after 3-4 attempts, which is working as expected according to the spec.

However, the PerfScale workloads (in Prow as well as in IBM Lakehouse testing) need both replicas on the infra pool because of Prometheus resource consumption under high-scale load. The constraints currently set on the pods are:

    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          app: prometheus
      maxSkew: 1
      nodeAffinityPolicy: Honor
      nodeTaintsPolicy: Honor
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
    - labelSelector:
        matchLabels:
          app: prometheus
      maxSkew: 2
      nodeAffinityPolicy: Honor
      nodeTaintsPolicy: Honor
      topologyKey: node-role.kubernetes.io/infra
      whenUnsatisfiable: ScheduleAnyway

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Create an HC.
2. Add a machinepool with the required labels and taints for infra.
3. Migrate the Prometheus pods to the infra nodes.
4. Wait for the migration to finish.
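For context, a toleration only permits a pod to run on a tainted node; it does not require it to. A migration in step 3 that relies on tolerations alone is therefore consistent with one replica staying on a worker. The fragment below is a sketch of such a tolerations-only migration, assuming the monitoring stack reads the OpenShift `cluster-monitoring-config` ConfigMap and that the infra machinepool taints its nodes with the key `node-role.kubernetes.io/infra` and effect `NoSchedule`. Both are assumptions: adjust the taint to what step 2 actually applies, and whether a hosted cluster exposes this ConfigMap to the user in the same way is not confirmed here.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # Assumed infra taint. This ALLOWS prometheus-k8s onto infra nodes
      # but does not force it there; placement is still left to scoring.
      tolerations:
      - key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
```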

Actual results:

One of the prometheus-k8s pods gets migrated to an infra node and the other one stays on a worker node. The second pod may eventually land on an infra node, but only after multiple attempts, because its placement is effectively random.

Expected results:

A policy is needed to ensure that Prometheus pods with the right tolerations get scheduled on infra nodes.
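In plain Kubernetes terms, the gap between the current behaviour and the requested policy is soft versus hard placement. With `whenUnsatisfiable: ScheduleAnyway`, the scheduler still places the pod and merely prioritizes nodes that minimize the skew, whereas a required node affinity (or a `nodeSelector`) filters out every node that lacks the infra label. The pod-spec fragment below is an illustrative sketch of the hard variant only, assuming infra nodes carry the `node-role.kubernetes.io/infra` label and a `NoSchedule` taint with that key. Because prometheus-k8s is operator-managed, it is not meant to be applied to the StatefulSet directly; the actual knob would have to come from the monitoring configuration.

```yaml
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only nodes that have the infra role label are eligible
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/infra
            operator: Exists
  # Still needed so the pods are permitted onto the (assumed) tainted infra nodes
  tolerations:
  - key: node-role.kubernetes.io/infra
    operator: Exists
    effect: NoSchedule
```

The trade-off is that with a hard requirement a replica stays Pending if no infra node has room for it, which is exactly the situation `ScheduleAnyway` is designed to avoid.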

Additional info:

Slack thread - https://redhat-internal.slack.com/archives/C02LM9FABFW/p1705412300117649

Attachments


Screenshot 2024-01-29 at 15.19.14.png
387 kB
2024/01/29 2:21 PM


People

Assignee: Roger Florén

Reporter: Murali Krishnasamy

Need Info From: Murali Krishnasamy

QA Contact: Jie Zhao

Votes: 0

Watchers: 5

Dates

Created: 2024/01/16 2:01 PM

Updated: 2024/02/09 11:12 AM