OpenShift Bugs / OCPBUGS-24662

kubelet fails to rotate pod logs fast enough, causing pods to be evicted when ephemeral-storage is configured


      Description of problem:

      Running an application such as quay.io/rhn_support_sreber/go-faker:latest with an ephemeral-storage limit triggers constant pod eviction: the kubelet fails to rotate the container logs fast enough, so ephemeral storage fills up, which in turn triggers the eviction.
      
      $ oc get events -n project-300 | grep go-faker-f4b5c8d56-df89c 
      71s         Normal    Scheduled           pod/go-faker-f4b5c8d56-df89c     Successfully assigned project-300/go-faker-f4b5c8d56-df89c to sandbox-s7mtw-worker-eastus1-455nk
      71s         Normal    AddedInterface      pod/go-faker-f4b5c8d56-df89c     Add eth0 [10.131.2.26/23] from ovn-kubernetes
      71s         Normal    Pulling             pod/go-faker-f4b5c8d56-df89c     Pulling image "quay.io/rhn_support_sreber/go-faker:latest"
      71s         Normal    Pulled              pod/go-faker-f4b5c8d56-df89c     Successfully pulled image "quay.io/rhn_support_sreber/go-faker:latest" in 83.911998ms (83.948998ms including waiting)
      71s         Normal    Created             pod/go-faker-f4b5c8d56-df89c     Created container go-faker
      71s         Normal    Started             pod/go-faker-f4b5c8d56-df89c     Started container go-faker
      45s         Warning   Evicted             pod/go-faker-f4b5c8d56-df89c     Pod ephemeral local storage usage exceeds the total limit of containers 1Gi.
      45s         Normal    Killing             pod/go-faker-f4b5c8d56-df89c     Stopping container go-faker
      72s         Normal    SuccessfulCreate    replicaset/go-faker-f4b5c8d56    Created pod: go-faker-f4b5c8d56-df89c
      
      Raising ephemeral-storage to other values such as 5Gi does not prevent the eviction either. A very large amount of ephemeral-storage therefore has to be requested to cover this case, even though the kubelet defaults for containerLogMaxFiles and containerLogMaxSize should cap the amount of log data.
      
      This appears to be a known problem: a similar issue is reported upstream in [Kubelet does not respect container-log-max-size on time, during heavy log writes from container|https://github.com/kubernetes/kubernetes/issues/110630], but no solution is available as of now ([kubelet: enable configurable rotation duration and parallel rotate|https://github.com/kubernetes/kubernetes/pull/114301] might be an approach).
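
To illustrate why size-based rotation can overshoot under heavy writes, here is a rough model. It assumes the kubelet checks log sizes at a fixed interval (10 seconds is used here as an assumption taken from the upstream discussion) and that the container writes at a constant rate. The function name and the 40 MiB/s write rate are hypothetical and are not measurements from this cluster.

```python
def peak_log_size_mib(write_rate_mib_s: float,
                      check_interval_s: float,
                      max_size_mib: float) -> float:
    """Worst-case size of the active log file when it is finally rotated.

    Simplified model: the file may sit just under max_size_mib at one
    check, so it is not rotated, and then keeps growing for a full
    interval until the next check notices it.
    """
    return max_size_mib + write_rate_mib_s * check_interval_s


# Hypothetical inputs: 40 MiB/s writer, 10 s check interval (assumed),
# 50 MiB maximum size per log file (assumed default).
peak = peak_log_size_mib(write_rate_mib_s=40, check_interval_s=10, max_size_mib=50)
print(f"active log file can reach about {peak:.0f} MiB before rotation")
```

Under these assumptions a single log file can grow to several times the configured maximum size before it is rotated, which is consistent with the evictions described above. The model ignores the time spent on the rotation itself, so it is likely an underestimate.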

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.13.24, but this seems to affect all versions of OpenShift Container Platform 4

      How reproducible:

      Always  

      Steps to Reproduce:

      1. Install OpenShift Container Platform 4 using the preferred installation method
      2. Deploy quay.io/rhn_support_sreber/go-faker:latest and configure ephemeral-storage:
      
      spec:
        progressDeadlineSeconds: 600
        replicas: 0
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            app: go-faker
        strategy:
          rollingUpdate:
            maxSurge: 25%
            maxUnavailable: 25%
          type: RollingUpdate
        template:
          metadata:
            creationTimestamp: null
            labels:
              app: go-faker
              deployment: go-faker
          spec:
            containers:
            - image: quay.io/rhn_support_sreber/go-faker:latest
              imagePullPolicy: Always
              name: go-faker
              resources:
                limits:
                  cpu: 500m
                  ephemeral-storage: 5Gi
                  memory: 512Mi
                requests:
                  cpu: 100m
                  ephemeral-storage: 5Gi
                  memory: 256Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            dnsPolicy: ClusterFirst
            restartPolicy: Always
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
      
      
      3. Wait and observe that the pod is evicted because it exceeds the configured ephemeral-storage.

      Actual results:

      45s         Warning   Evicted             pod/go-faker-f4b5c8d56-df89c     Pod ephemeral local storage usage exceeds the total limit of containers 1Gi.

      Expected results:

      With the OpenShift Container Platform 4 defaults, logs should use only 250 MiB, or at most 300 MiB depending on how rotation is done. An ephemeral-storage limit of 512 MiB or slightly more should therefore not trigger pod eviction; the kubelet should rotate logs in time and according to its configuration so that eviction does not happen.
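
The expected figures can be reproduced with a short calculation. This sketch assumes the defaults are containerLogMaxFiles=5 and containerLogMaxSize=50Mi, which is inferred from the 250 MiB figure above rather than read from the cluster. The helper name and the single extra file during rotation are illustrative assumptions.

```python
def log_budget_mib(max_files: int, max_size_mib: int, extra_files: int = 0) -> int:
    """Upper bound on log usage if rotation always happens on time."""
    return (max_files + extra_files) * max_size_mib


# Assumed defaults: 5 files of 50 MiB each.
steady_mib = log_budget_mib(max_files=5, max_size_mib=50)
# Assumption: one additional file may exist briefly while rotation is in progress.
during_rotation_mib = log_budget_mib(max_files=5, max_size_mib=50, extra_files=1)

limit_mib = 512  # ephemeral-storage limit considered sufficient above
fits = during_rotation_mib <= limit_mib

print(steady_mib, during_rotation_mib, fits)
```

With timely rotation, even the larger bound stays well below 512 MiB, so the eviction at a 1Gi limit indicates that the actual log usage is far above the configured cap.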

      Additional info:

          

            rh-ee-kehannon Kevin Hannon
            rhn-support-sreber Simon Reber
            Sunil Choudhary Sunil Choudhary