[OCPBUGS-33742] External-dns pod by hypershift operator is running with lower Priority

Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.14.0
Component/s: HyperShift
Labels:
- perfscale-rosahcp

Severity:
Critical
Regression:
No
Sprint:
Hypershift Sprint 254
sprint_count:
1
Blocked:
False
Blocked Reason:
None
Target Version:

4.17.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The HyperShift operator pods run with a high-priority PriorityClass, but external-dns is left on the default class, which has a lower priority. As a result, the external-dns pod was preempted during migration.
This was observed while performance testing the dynamic serving spec migration on an MC.

# oc get pods -n hypershift 
NAME                           READY   STATUS    RESTARTS   AGE
external-dns-7f95b5cdc-9hnjs   0/1     Pending   0          23m
operator-956bdb486-djjvb       1/1     Running   0          116m
operator-956bdb486-ppgzt       1/1     Running   0          115m

external-dns pod.spec
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  priorityClassName: default

operator pods.spec
  preemptionPolicy: PreemptLowerPriority
  priority: 100003000
  priorityClassName: hypershift-operator
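
The mismatch between the two specs above can be checked mechanically. The following is a minimal Python sketch, assuming the pod list is available as JSON in the shape produced by `oc get pods -n hypershift -o json`. The function name `pods_below_priority` and the inlined sample are illustrative only and are not part of HyperShift.

```python
import json

# Sample shaped like `oc get pods -n hypershift -o json`, trimmed to the
# fields used here. Values are taken from the pod specs quoted above.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "external-dns-7f95b5cdc-9hnjs"},
         "spec": {"priority": 0, "priorityClassName": "default"}},
        {"metadata": {"name": "operator-956bdb486-djjvb"},
         "spec": {"priority": 100003000,
                  "priorityClassName": "hypershift-operator"}},
        {"metadata": {"name": "operator-956bdb486-ppgzt"},
         "spec": {"priority": 100003000,
                  "priorityClassName": "hypershift-operator"}},
    ]
})


def pods_below_priority(pod_list_json, threshold):
    """Return (name, priority, priorityClassName) for pods under threshold."""
    flagged = []
    for pod in json.loads(pod_list_json)["items"]:
        spec = pod["spec"]
        # A pod with no resolved priority is treated as 0 in this sketch.
        priority = spec.get("priority", 0)
        if priority < threshold:
            flagged.append((pod["metadata"]["name"], priority,
                            spec.get("priorityClassName", "")))
    return flagged


if __name__ == "__main__":
    # Use the operator pods' priority as the reference value.
    for name, priority, cls in pods_below_priority(SAMPLE, 100003000):
        print(f"{name}: priority={priority} class={cls}")
```

With the operator priority as the threshold, only the external-dns pod is flagged, which matches the observation in this report.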

Version-Release number of selected component (if applicable):

On Management Cluster 4.14.7

How reproducible:

Always

Steps to Reproduce:

    1. Setup a MC with request serving and autoscaling machinesets
    2. Load up the MC to its max capacity
    3. Watch the external-dns pod get preempted when other pods need its resources
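
The effect of these steps can be illustrated with a toy model. This is a deliberately simplified sketch of priority-based preemption on a single full node, not the actual kube-scheduler algorithm. The `Pod` class, the `pick_victims` function, the CPU figures, and the incoming pod's priority are all hypothetical. Only the two priority values 0 and 100003000 come from this report.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # requested CPU in millicores (hypothetical values)


def pick_victims(running: List[Pod], incoming: Pod,
                 capacity: int) -> Optional[List[str]]:
    """Return the names of pods to evict so that `incoming` fits.

    Returns [] if it already fits, and None if evicting every
    lower-priority pod still would not free enough CPU.
    """
    free = capacity - sum(p.cpu for p in running)
    if free >= incoming.cpu:
        return []
    # Only pods with strictly lower priority are eligible; lowest goes first.
    candidates = sorted(
        (p for p in running if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    victims = []
    for pod in candidates:
        victims.append(pod.name)
        free += pod.cpu
        if free >= incoming.cpu:
            return victims
    return None


if __name__ == "__main__":
    incoming = Pod("incoming-workload", 1000, 500)
    before = [Pod("external-dns", 0, 500), Pod("operator", 100003000, 3500)]
    after = [Pod("external-dns", 100003000, 500),
             Pod("operator", 100003000, 3500)]
    print(pick_victims(before, incoming, 4000))  # external-dns is evicted
    print(pick_victims(after, incoming, 4000))   # nothing can be evicted
```

In this model, a pod at priority 0 is the first eviction candidate on a full node, while a pod at the operator's priority is not eligible at all for the same incoming workload.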

Actual results:

The external-dns pod stays in the Pending state until a new node comes up.

Expected results:

external-dns is as critical as the HyperShift operator, because an outage affects HC DNS configuration. It should therefore also run with a higher priority.

Additional info:

stage: perf3 sector

links to

openshift/hypershift#4050: OCPBUGS-33742: setting higher priority class for external-dns pods

RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update

Errata Tool added a comment - 2024/10/01 5:31 PM

Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

For information on the advisory (Moderate: OpenShift Container Platform 4.17.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2024:3718

Murali Krishnasamy added a comment - 2024/05/23 2:20 PM

Verified in the perf3 sector using the latest operator SHA 'ab64627e765f64503f03b457d7aae00a15b78ec3'.

# oc get pods -n hypershift external-dns-5f8976df99-m2qdj -o yaml | grep -i priority
  preemptionPolicy: PreemptLowerPriority
  priority: 100003000
  priorityClassName: hypershift-operator

OpenShift Jira Bot added a comment - 2024/05/22 11:47 AM

Hi mukrishn@redhat.com,

Bugs should not be moved to Verified without first providing a Release Note Type("Bug Fix" or "No Doc Update") and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

Murali Krishnasamy added a comment - 2024/05/21 12:34 PM

Pods were not preempted from the node. They continued to run on the default-worker pool because we did not disable autoscaling on it.

Used config - https://gitlab.cee.redhat.com/service/app-interface/-/merge_requests/106374

Assignee: Murali Krishnasamy

Reporter: Murali Krishnasamy

QA Contact: Jie Zhao

Votes: 0

Watchers: 5

Created: 2024/05/15 11:17 PM

Updated: 2024/10/01 5:31 PM

Resolved: 2024/10/01 5:31 PM
