OpenShift Bugs / OCPBUGS-23000

ClusterAutoscaler is evicting DaemonSet Pods


    • No
    • CLOUD Sprint 253, CLOUD Sprint 254, CLOUD Sprint 255
    • 3
    • False

      Description of problem:

      - Pods managed by DaemonSets are being evicted.
      - As a result, pods of OCP components, such as CSI drivers (and possibly others), are evicted before the application pods. Those application pods then go into an Error status because the CSI pod is no longer available to tear down their volumes.
      - Because the application pods remain in Error status, the drain operation also fails once maxPodGracePeriod elapses.

      Version-Release number of selected component (if applicable):

      - 4.11

      How reproducible:

      - Wait for a new scale-down event

      Steps to Reproduce:

      1. Wait for a new scale-down event.
      2. Monitor the CSI pods (or DNS, ingress, ...). They are evicted and, because they belong to DaemonSets, scheduled again as new pods.
      3. Further evidence can be found in the kube-apiserver audit logs.
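
To make step 3 easier to repeat, the following is a minimal sketch (not part of the original report) that scans audit log lines for pod evictions requested by the autoscaler. It assumes the audit log is available as one JSON event per line and relies only on fields visible in the excerpt under "Actual results" (`verb`, `objectRef`, `user.username`). The function name `autoscaler_evictions` and the default user name constant are illustrative choices, not an existing tool.

```python
import json
from typing import Iterable, Iterator, Tuple

# Service account seen in the audit excerpt of this report; adjust if yours differs.
AUTOSCALER_USER = "system:serviceaccount:openshift-machine-api:cluster-autoscaler"


def autoscaler_evictions(
    lines: Iterable[str], username: str = AUTOSCALER_USER
) -> Iterator[Tuple[str, str]]:
    """Yield (namespace, pod name) for each eviction created by `username`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        user = event.get("user") or {}
        if (
            event.get("verb") == "create"
            and ref.get("resource") == "pods"
            and ref.get("subresource") == "eviction"
            and user.get("username") == username
        ):
            yield ref.get("namespace", ""), ref.get("name", "")
```

A typical use would be to open a locally saved audit log (the file name is up to you) and print each pair, for example `for ns, pod in autoscaler_evictions(open("audit.log")): print(ns, pod)`.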

      Actual results:

      - The audit logs show that the pods are evicted by the cluster-autoscaler service account:
      
        "kind": "Event",
        "apiVersion": "audit.k8s.io/v1",
        "level": "Metadata",
        "auditID": "ec999193-2c94-4710-a8c7-ff9460e30f70",
        "stage": "ResponseComplete",
        "requestURI": "/api/v1/namespaces/openshift-cluster-csi-drivers/pods/aws-efs-csi-driver-node-2l2xn/eviction",
        "verb": "create",
        "user": {
          "username": "system:serviceaccount:openshift-machine-api:cluster-autoscaler",
          "uid": "44aa427b-58a4-438a-b56e-197b88aeb85d",
          "groups": [
            "system:serviceaccounts",
            "system:serviceaccounts:openshift-machine-api",
            "system:authenticated"
          ],
          "extra": {
            "authentication.kubernetes.io/pod-name": [
              "cluster-autoscaler-default-5d4c54c54f-dx59s"
            ],
            "authentication.kubernetes.io/pod-uid": [
              "d57837b1-3941-48da-afeb-179141d7f265"
            ]
          }
        },
        "sourceIPs": [
          "10.0.210.157"
        ],
        "userAgent": "cluster-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format",
        "objectRef": {
          "resource": "pods",
          "namespace": "openshift-cluster-csi-drivers",
          "name": "aws-efs-csi-driver-node-2l2xn",
          "apiVersion": "v1",
          "subresource": "eviction"
        },
        "responseStatus": {
          "metadata": {},
          "status": "Success",
          "code": 201
      
      ## Even though they belong to a DaemonSet
      $ oc get ds -n openshift-cluster-csi-drivers
      NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
      aws-ebs-csi-driver-node   8         8         8       8            8           kubernetes.io/os=linux   146m
      aws-efs-csi-driver-node   8         8         8       8            8           kubernetes.io/os=linux   127m
      

      Expected results:

      DaemonSet Pods should not be evicted
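
To check which pods this expectation applies to, a pod's `metadata.ownerReferences` can be inspected for an owner of kind `DaemonSet`. The sketch below is an illustration only, not part of the original report. It assumes the input is the JSON text of a pod list (for example the output of `oc get pods -n <namespace> -o json`), and the helper names `is_daemonset_pod` and `daemonset_pod_names` are hypothetical.

```python
import json
from typing import Any, Dict, List


def is_daemonset_pod(pod: Dict[str, Any]) -> bool:
    """Return True if any owner reference of the pod has kind DaemonSet."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return any(owner.get("kind") == "DaemonSet" for owner in owners)


def daemonset_pod_names(pod_list_json: str) -> List[str]:
    """Return the names of DaemonSet-owned pods from a JSON pod list."""
    items = json.loads(pod_list_json).get("items", [])
    return [pod["metadata"]["name"] for pod in items if is_daemonset_pod(pod)]
```

Cross-referencing these names with the pods that appear in eviction requests in the audit log gives the set of DaemonSet pods that were evicted contrary to the expected behavior.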

      Additional info:

       

            rh-ee-tbarberb Theo Barber-Bany
            rhn-support-lperezbe Luis Perez Besa
            Zhaohua Sun Zhaohua Sun