Description of problem:
Running an application such as quay.io/rhn_support_sreber/go-faker:latest with an ephemeral-storage limit triggers constant pod eviction. The kubelet fails to rotate the container logs in time, so the logs fill the pod's ephemeral storage, which in turn triggers the eviction.

$ oc get events -n project-300 | grep go-faker-f4b5c8d56-df89c
71s   Normal    Scheduled          pod/go-faker-f4b5c8d56-df89c    Successfully assigned project-300/go-faker-f4b5c8d56-df89c to sandbox-s7mtw-worker-eastus1-455nk
71s   Normal    AddedInterface     pod/go-faker-f4b5c8d56-df89c    Add eth0 [10.131.2.26/23] from ovn-kubernetes
71s   Normal    Pulling            pod/go-faker-f4b5c8d56-df89c    Pulling image "quay.io/rhn_support_sreber/go-faker:latest"
71s   Normal    Pulled             pod/go-faker-f4b5c8d56-df89c    Successfully pulled image "quay.io/rhn_support_sreber/go-faker:latest" in 83.911998ms (83.948998ms including waiting)
71s   Normal    Created            pod/go-faker-f4b5c8d56-df89c    Created container go-faker
71s   Normal    Started            pod/go-faker-f4b5c8d56-df89c    Started container go-faker
45s   Warning   Evicted            pod/go-faker-f4b5c8d56-df89c    Pod ephemeral local storage usage exceeds the total limit of containers 1Gi.
45s   Normal    Killing            pod/go-faker-f4b5c8d56-df89c    Stopping container go-faker
72s   Normal    SuccessfulCreate   replicaset/go-faker-f4b5c8d56   Created pod: go-faker-f4b5c8d56-df89c

Setting ephemeral-storage to other values such as 5Gi does not prevent the eviction either. A massive amount of ephemeral-storage therefore has to be requested to cover this case, even though the kubelet defaults for containerLogMaxFiles and containerLogMaxSize should limit the amount of log data kept on disk.

This appears to be a known problem: a similar issue is reported in [Kubelet does not respect container-log-max-size on time, during heavy log writes from container|https://github.com/kubernetes/kubernetes/issues/110630], but no solution is available as of now ([kubelet: enable configurable rotation duration and parallel rotate|https://github.com/kubernetes/kubernetes/pull/114301] might be an approach).
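To illustrate why periodic rotation can fall behind a fast writer, here is a rough Python model, not kubelet code. It assumes the kubelet only checks log sizes at a fixed interval (10 seconds is assumed here), that a file is rotated only when a check finds it over the size limit, and that rotated files are not compressed. The function name, the write rates, and the 120 s / 600 s durations are all hypothetical values chosen for illustration; the actual write rate of go-faker is not known from this report.

```python
MIB = 1024 * 1024


def peak_log_usage(rate, max_size, max_files, interval, duration):
    """Peak bytes on disk for one container's logs in a simplified model.

    rate      -- bytes written per second by the container (hypothetical)
    max_size  -- size limit per log file (containerLogMaxSize)
    max_files -- number of files kept, including the active one (containerLogMaxFiles)
    interval  -- seconds between rotation checks (assumed, not taken from the report)
    duration  -- seconds to simulate
    """
    active = 0      # size of the file currently being written
    rotated = []    # sizes of already rotated files, newest first
    peak = 0
    for _ in range(0, duration, interval):
        active += rate * interval                 # data written since the last check
        peak = max(peak, active + sum(rotated))   # usage right before the check
        if active > max_size:
            rotated.insert(0, active)             # rotate the oversized file
            active = 0
            del rotated[max_files - 1:]           # keep max_files in total
    return peak


# Hypothetical writers: 20 MiB/s and 1 MiB/s, with 50 MiB x 5 files assumed.
fast = peak_log_usage(20 * MIB, 50 * MIB, 5, 10, 120)
slow = peak_log_usage(1 * MIB, 50 * MIB, 5, 10, 600)
print(fast // MIB, slow // MIB)
```

In this model the slow writer stays close to the nominal cap, while the fast writer ends up with files several times larger than the configured size, because each file is only inspected once per interval. That matches the behaviour described in the linked upstream issue, but the numbers themselves are only as good as the assumptions above.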
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.13.24, but it seems to affect all versions of OpenShift Container Platform 4
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4 via the preferred installation method.
2. Deploy quay.io/rhn_support_sreber/go-faker:latest and configure ephemeral-storage:

spec:
  progressDeadlineSeconds: 600
  replicas: 0
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: go-faker
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: go-faker
        deployment: go-faker
    spec:
      containers:
      - image: quay.io/rhn_support_sreber/go-faker:latest
        imagePullPolicy: Always
        name: go-faker
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 5Gi
            memory: 512Mi
          requests:
            cpu: 100m
            ephemeral-storage: 5Gi
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

3. Wait and observe that the pod is evicted because it exceeds the configured ephemeral-storage limit.
Actual results:
45s Warning Evicted pod/go-faker-f4b5c8d56-df89c Pod ephemeral local storage usage exceeds the total limit of containers 1Gi.
Expected results:
With the OpenShift Container Platform 4 defaults, logs should use only 250 MiB, or at most 300 MiB, depending on how rotation is done. An ephemeral-storage limit of 512 MiB or slightly more should therefore not trigger pod eviction. The kubelet should rotate logs in time and according to the configuration so that eviction does not happen.
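For reference, the 250 MiB and 300 MiB figures can be reproduced with the small calculation below. It assumes the defaults are containerLogMaxSize=50Mi and containerLogMaxFiles=5 (the report does not state the values explicitly), and it treats the 300 MiB case as one extra file briefly existing during rotation, which is only one possible reading. The helper name is made up for this sketch.

```python
MIB = 1024 * 1024


def nominal_log_cap(max_size_bytes, max_files, extra_files=0):
    """Log bytes on disk if every file were rotated exactly at the size limit."""
    return max_size_bytes * (max_files + extra_files)


# Assumed defaults: containerLogMaxSize=50Mi, containerLogMaxFiles=5.
cap = nominal_log_cap(50 * MIB, 5)
# Assumption: one additional file may exist for a short time while rotating.
cap_during_rotation = nominal_log_cap(50 * MIB, 5, extra_files=1)

print(cap // MIB, cap_during_rotation // MIB)  # 250 300
```

Both values are well below the 1Gi and 5Gi limits used in this report, which is why eviction is unexpected if rotation keeps up.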
Additional info: