Type: Bug
Resolution: Done-Errata
Priority: Normal
Target Version: 4.11.z
Sprint: CLOUD Sprint 253, CLOUD Sprint 254, CLOUD Sprint 255
Story Points: 3
Release Note Type: Release Note Not Required
Status: Done
Description of problem:
- Pods managed by DaemonSets are being evicted.
- This causes some pods of OCP components, such as the CSI drivers (and possibly more), to be evicted before the application pods, leaving those application pods in an Error status (because the CSI pod cannot tear down their volumes).
- As the application pods remain in Error status, the drain operation also fails after the maxPodGracePeriod.
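A quick way to observe the symptom (a diagnostic sketch, not from the original report: the deployment name cluster-autoscaler-default is inferred from the autoscaler pod name in the audit log below, and the grep pattern is an assumption):

$ # List application pods stuck in Error (phase Failed) after the scale-down
$ oc get pods -A --field-selector=status.phase=Failed
$ # Look for drain/eviction activity in the autoscaler logs
$ oc -n openshift-machine-api logs deployment/cluster-autoscaler-default | grep -i drain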
Version-Release number of selected component (if applicable):
- 4.11
How reproducible:
- Wait for a new scale-down event
Steps to Reproduce:
1. Wait for a new scale-down event.
2. Monitor the csi pods (or dns, or ingress...): you will notice that they are evicted, and as they come from DaemonSets, they are scheduled again as new pods (see the commands after this list).
3. More evidence can be found in the kube-api audit logs.
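For steps 2 and 3, a minimal sketch of how to watch the evictions and confirm them in the audit logs (assuming a default install where audit events are written as compact JSON lines):

$ # Watch the CSI driver pods get evicted and rescheduled
$ oc get pods -n openshift-cluster-csi-drivers -w
$ # Search the kube-apiserver audit logs for eviction requests from the autoscaler
$ oc adm node-logs --role=master --path=kube-apiserver/audit.log \
    | grep '"subresource":"eviction"' \
    | grep cluster-autoscaler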
Actual results:
- From the audit logs we can see that the pods are evicted by the cluster-autoscaler:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "ec999193-2c94-4710-a8c7-ff9460e30f70",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/openshift-cluster-csi-drivers/pods/aws-efs-csi-driver-node-2l2xn/eviction",
  "verb": "create",
  "user": {
    "username": "system:serviceaccount:openshift-machine-api:cluster-autoscaler",
    "uid": "44aa427b-58a4-438a-b56e-197b88aeb85d",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:openshift-machine-api",
      "system:authenticated"
    ],
    "extra": {
      "authentication.kubernetes.io/pod-name": [
        "cluster-autoscaler-default-5d4c54c54f-dx59s"
      ],
      "authentication.kubernetes.io/pod-uid": [
        "d57837b1-3941-48da-afeb-179141d7f265"
      ]
    }
  },
  "sourceIPs": [
    "10.0.210.157"
  ],
  "userAgent": "cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format",
  "objectRef": {
    "resource": "pods",
    "namespace": "openshift-cluster-csi-drivers",
    "name": "aws-efs-csi-driver-node-2l2xn",
    "apiVersion": "v1",
    "subresource": "eviction"
  },
  "responseStatus": {
    "metadata": {},
    "status": "Success",
    "code": 201
  }
}

## Even if they come from a DaemonSet:

$ oc get ds -n openshift-cluster-csi-drivers
NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
aws-ebs-csi-driver-node   8         8         8       8            8           kubernetes.io/os=linux   146m
aws-efs-csi-driver-node   8         8         8       8            8           kubernetes.io/os=linux   127m
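A sketch for pulling just the relevant eviction events out of the audit logs (assumes newline-delimited JSON audit events, as in the default audit log format; the field paths match the event shown above):

$ oc adm node-logs --role=master --path=kube-apiserver/audit.log \
    | jq -c 'select(.objectRef.subresource == "eviction"
                    and .user.username == "system:serviceaccount:openshift-machine-api:cluster-autoscaler")
             | {namespace: .objectRef.namespace, pod: .objectRef.name, code: .responseStatus.code}'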
Expected results:
DaemonSet pods should not be evicted (or at least not before the application pods, so that the CSI drivers can still tear down the volumes).
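Not part of the original report, but a possible workaround sketch: the upstream cluster-autoscaler honors the per-pod annotation cluster-autoscaler.kubernetes.io/enable-ds-eviction: "false", which excludes a DaemonSet pod from eviction during scale-down. Note that in OCP the CSI DaemonSets are operator-managed, so a manual patch like the one below may be reverted by the operator:

$ # Annotate the DaemonSet pod template so the autoscaler skips its pods (assumption: workaround, not the fix)
$ oc -n openshift-cluster-csi-drivers patch daemonset aws-efs-csi-driver-node \
    --type merge \
    -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/enable-ds-eviction":"false"}}}}}'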
Additional info:
- Links to: RHEA-2024:3718 (OpenShift Container Platform 4.17.z bug fix update)