Type: Bug
Resolution: Won't Do
Version: 4.19
Quality / Stability / Reliability
Description of problem:
While debugging an Azure cloud slowness issue, we noticed that the audit log from kube-apiserver does not contain all of the timing information. Here is an example run:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade/1915075239528632320

One audit entry:

ci-op-9wsygd82-b6ce9-c2cgt-master-0-audit-2025-04-23T18-32-07.575.log.gz:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"81972b7c-9914-43a9-bdfb-3d49b582ec8d","stage":"ResponseComplete","requestURI":"/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ci-op-9wsygd82-b6ce9-c2cgt-worker-centralus1-kvfqs?timeout=10s","verb":"update","user":{"username":"system:node:ci-op-9wsygd82-b6ce9-c2cgt-worker-centralus1-kvfqs","groups":["system:nodes","system:authenticated"],"extra":{"authentication.kubernetes.io/credential-id":["X509SHA256=62f209954c2ceef1c7c9f3f302d859fb5e5744233dc5c6ad2131a65b2c6e0a47"]}},"sourceIPs":["10.0.128.6"],"userAgent":"kubelet/v1.31.7 (linux/amd64) kubernetes/9b80662","objectRef":{"resource":"leases","namespace":"kube-node-lease","name":"ci-op-9wsygd82-b6ce9-c2cgt-worker-centralus1-kvfqs","uid":"950c5000-aa1a-4c12-96d4-99cb65c88234","apiGroup":"coordination.k8s.io","apiVersion":"v1","resourceVersion":"59698"},"responseStatus":{"metadata":{},"status":"Failure","message":"Timeout: request did not complete within requested timeout - context deadline exceeded","reason":"Timeout","details":{},"code":504},"requestReceivedTimestamp":"2025-04-23T18:06:38.951557Z","stageTimestamp":"2025-04-23T18:06:48.956890Z","annotations":{"apiserver.latency.k8s.io/apf-queue-wait":"51.4µs","apiserver.latency.k8s.io/response-write":"800ns","apiserver.latency.k8s.io/serialize-response-object":"4.245971ms","apiserver.latency.k8s.io/total":"10.005333436s","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}

Here are the annotations:

"annotations":{"apiserver.latency.k8s.io/apf-queue-wait":"51.4µs","apiserver.latency.k8s.io/response-write":"800ns","apiserver.latency.k8s.io/serialize-response-object":"4.245971ms","apiserver.latency.k8s.io/total":"10.005333436s","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}

Notice that the total in this case is 10.005333436s, but the individual annotations only account for a few milliseconds; the segment that actually took that long is not recorded. We suspect the missing part is the etcd delay. It would be nice to have that listed in the audit log as well.
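For reference, this is a rough sketch (not part of the original report) of how one can scan an audit log for entries like the one above, where the apiserver.latency.k8s.io/* annotations do not add up to the reported total. The 1s threshold and the file/struct names are arbitrary choices for illustration; the program reads newline-delimited audit events from stdin.

// audit_latency_gap.go (illustrative sketch): flag audit events whose recorded
// latency annotations leave a large part of the total unaccounted for.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
	"time"
)

type auditEvent struct {
	AuditID     string            `json:"auditID"`
	Verb        string            `json:"verb"`
	RequestURI  string            `json:"requestURI"`
	Annotations map[string]string `json:"annotations"`
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 1024*1024), 16*1024*1024) // audit lines can be large

	for scanner.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
			continue // skip malformed lines
		}

		totalStr, ok := ev.Annotations["apiserver.latency.k8s.io/total"]
		if !ok {
			continue
		}
		total, err := time.ParseDuration(totalStr)
		if err != nil {
			continue
		}

		// Sum every other apiserver.latency.k8s.io/* annotation that is present.
		var accounted time.Duration
		for k, v := range ev.Annotations {
			if k == "apiserver.latency.k8s.io/total" ||
				!strings.HasPrefix(k, "apiserver.latency.k8s.io/") {
				continue
			}
			if d, err := time.ParseDuration(v); err == nil {
				accounted += d
			}
		}

		// Report requests where more than 1s (arbitrary threshold) is unexplained,
		// i.e. the latency that we suspect is spent waiting on etcd.
		if gap := total - accounted; gap > time.Second {
			fmt.Printf("%s %s %s total=%s accounted=%s unaccounted=%s\n",
				ev.AuditID, ev.Verb, ev.RequestURI, total, accounted, gap)
		}
	}
}

Running it over the example entry above would report roughly 10s of unaccounted time, since only ~4.25ms of the 10.005333436s total is covered by the apf-queue-wait, response-write, and serialize-response-object annotations.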
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: