Description of problem:
Once the audit log volume becomes full, audit log messages are no longer written to the audit log file, rotation does not appear to kick in, and writing does not resume even after disk space is freed up. No warning about the full volume is surfaced in the pod logs.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce:
- Deploy the app. The Helm Chart is used here because it creates an audit log PVC out of the box since RHIDP-3240 (https://github.com/redhat-developer/rhdh-chart/pull/36), but the same issue should occur regardless of where the audit log files are stored.
$ git clone https://github.com/redhat-developer/rhdh-chart.git && cd rhdh-chart
$ helm upgrade --install my-backstage-with-audit-log charts/backstage \
    --set upstream.backstage.appConfig.auditLog.rotateFile.frequency=5m \
    --set global.clusterRouterBase=$(oc get ingress.config.openshift.io/cluster '-o=jsonpath={.spec.domain}')
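Optionally, confirm the custom rotation frequency was recorded on the release (helm get values lists the user-supplied overrides):
$ helm get values my-backstage-with-audit-log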
- Wait until the application starts
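For example, watch the rollout of the Deployment created by the chart (same name as used in the commands below):
$ kubectl rollout status deploy/my-backstage-with-audit-log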
- Check the audit log PVC volume is mounted in the app container, e.g.:
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- df -hT /var/log/redhat-developer-hub/audit
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme7n1   ext4  974M   32K  958M   1% /var/log/redhat-developer-hub/audit
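You can also check that the audit log PVC itself is Bound (the exact PVC name depends on the chart/release, so just list them all):
$ kubectl get pvc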
- Interact with the application and make sure the audit log volume is being written to, e.g.:
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- ls -lh /var/log/redhat-developer-hub/audit/
total 108K
drwxrws---. 2 root 1000780000 16K Oct 22 09:02 lost+found
-rw-rw-r--. 1 1000780000 1000780000 87K Oct 22 09:31 redhat-developer-hub-audit-2024-10-22.log
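Optionally, tail the dated log file listed above to confirm entries are being appended (the file name contains the current date, so adjust it accordingly):
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- tail -n 2 /var/log/redhat-developer-hub/audit/redhat-developer-hub-audit-2024-10-22.log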
- Also make sure audit logs are being displayed in the pod logs
$ kubectl logs deploy/my-backstage-with-audit-log | grep isAuditLog
[...]
{"actor":{"actorId":"user:default/rm3l","hostname":"my-backstage-with-audit-log-my-ns.apps.rosa.wg27p-yympk-d2y.1khx.p3.openshiftapps.com","ip":"::ffff:10.128.0.15","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"},"eventName":"BulkImportFindAllImports","isAuditLog":true,"level":"info","message":"'GET /imports' endpoint hit by user:default/rm3l","meta":{},"plugin":"bulk-import","request":{"body":{},"method":"GET","params":{},"query":{"pagePerIntegration":"1","search":"","sizePerIntegration":"5"},"url":"/api/bulk-import/imports?pagePerIntegration=1&sizePerIntegration=5&search="},"response":{"status":304},"service":"backstage","stage":"completion","status":"succeeded","timestamp":"2024-10-22 09:35:47"}
[...]
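If jq is available locally, the JSON audit entries are easier to read pretty-printed, e.g.:
$ kubectl logs deploy/my-backstage-with-audit-log | grep isAuditLog | tail -n 1 | jq .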
- Now manually fill the audit log volume with fake data to simulate the disk becoming full before the audit log file is rotated with the default settings (midnight, local system time), as defined in https://github.com/redhat-developer/rhdh-chart/pull/45/files#diff-8060d9a38501197eddb026fced7fe56f104f4f5143210c6d47781b831a7097f4R50-R52
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- fallocate -l 1G /var/log/redhat-developer-hub/audit/my-big-file
fallocate: fallocate failed: No space left on device
command terminated with exit code 1
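If fallocate is not available in the container image, dd (if present) can be used instead; it keeps writing zeroes until the volume is full and then fails with the same "No space left on device" error:
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- dd if=/dev/zero of=/var/log/redhat-developer-hub/audit/my-big-file bs=1M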
- Verify that the disk is full
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- df -hT /var/log/redhat-developer-hub/audit
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme7n1   ext4  974M  958M     0 100% /var/log/redhat-developer-hub/audit
- Interact with the application and make sure the audit logs are still being displayed in the pod logs
$ kubectl logs deploy/my-backstage-with-audit-log | grep isAuditLog
[...]
{"actor":{"actorId":"user:default/rm3l","hostname":"my-backstage-with-audit-log-my-ns.apps.rosa.wg27p-yympk-d2y.1khx.p3.openshiftapps.com","ip":"::ffff:10.128.0.15","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"},"eventName":"BulkImportFindImportStatusByRepo","isAuditLog":true,"level":"info","message":"'GET /import/by-repo' endpoint hit by user:default/rm3l","meta":{},"plugin":"bulk-import","request":{"body":{},"method":"GET","params":{},"query":{"defaultBranch":"main","repo":"https://github.com/lemra-org-ent-2/test-starter"},"url":"/api/bulk-import/import/by-repo?repo=https://github.com/lemra-org-ent-2/test-starter&defaultBranch=main"},"response":{"status":304},"service":"backstage","stage":"completion","status":"succeeded","timestamp":"2024-10-22 09:51:10"}

$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- date
Tue Oct 22 09:51:11 UTC 2024
- If we look at the audit log file in the volume, we can see that it is no longer being written to (the last audit log message was written at 2024-10-22 09:45:57), and there is no new audit log file (which would not even be possible anyway, since the audit log volume is full due to the unrelated big file)
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- ls -lh /var/log/redhat-developer-hub/audit/
drwxrws---. 2 root       1000780000  16K Oct 22 09:02 lost+found
-rw-r--r--. 1 1000780000 1000780000 958M Oct 22 09:45 my-big-file
-rw-rw-r--. 1 1000780000 1000780000 172K Oct 22 09:45 redhat-developer-hub-audit-2024-10-22.log

$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- tail /var/log/redhat-developer-hub/audit/redhat-developer-hub-audit-2024-10-22.log
[...]
{"actor":{"actorId":"plugin:bulk-import","hostname":"localhost","ip":"::1","userAgent":"node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"},"eventName":"CatalogLocationFetch","isAuditLog":true,"level":"info","message":"Fetch attempt of locations by plugin:bulk-import succeeded","meta":{},"plugin":"catalog","request":{"body":{},"method":"GET","params":{},"query":{},"url":"/api/catalog/locations"},"response":{"status":200},"service":"backstage","stage":"completion","status":"succeeded","timestamp":"2024-10-22 09:45:57"}
{"actor":{"actorId":"user:default/rm3l","hostname":"my-backstage-with-audit-log-my-ns.apps.rosa.wg27p-yympk-d2y.1khx.p3.openshiftapps.com","ip":"::ffff:
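- To also reproduce the follow-on behaviour described under Actual results (writes do not resume even after space is freed), remove the fake file, check that the volume has free space again, and keep interacting with the application:
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- rm /var/log/redhat-developer-hub/audit/my-big-file
$ kubectl exec -it deploy/my-backstage-with-audit-log -c backstage-backend -- df -hT /var/log/redhat-developer-hub/audit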
Actual results:
- New audit log messages are no longer written to the file
- Rotation does not appear to be triggered: the file is never rotated/overwritten, even with the frequency set to 5 minutes as above. At least, it was not clear how to verify whether rotation happened or not.
- Worse, audit logs cease to be written to the file even after the disk is cleaned up and is no longer full (once it has been full)
- No warning message in the pod logs that the audit log volume is full
Expected results:
- Maybe try to rotate/overwrite the audit log file when the volume is full, if possible
- Audit logs should continue to be written to the file once the disk is cleaned up and has free capacity again
- At the very least, I would expect the application to handle audit log file write errors and surface them in the main container logs
Reproducibility (Always/Intermittent/Only Once):
Always, when the audit log volume is full before the audit log file is rotated (which might happen if the application is being used extensively by users).
Also, this can be impactful right now since, as far as I can tell, there is no way to override the audit log PVC size (or storage class), at least when using the Helm Chart.
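For reference, the size and storage class currently requested by the audit log PVC can be inspected with, e.g. (PVC names depend on the release name):
$ kubectl get pvc -o custom-columns=NAME:.metadata.name,REQUESTED:.spec.resources.requests.storage,STORAGECLASS:.spec.storageClassName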
Build Details:
Additional info (Such as Logs, Screenshots, etc):
Slack thread: https://redhat-internal.slack.com/archives/C06SGFD65NC/p1729160334651819?thread_ts=1729095907.039489&cid=C06SGFD65NC
- is triggered by: RHIDP-4852 "Remove Audit Log PVC from Developer Hub" (In Progress)