Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-55358

api server audit logs do not contain all timing info

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Do
    • Icon: Undefined Undefined
    • None
    • 4.19
    • kube-apiserver
    • None
    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem:

      While debugging an azure cloud slowness issue, we noticed that the audit log from kube-apiserver do not have all the timing information. 
      
      Here is an example run:
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade/1915075239528632320
      
      One audit entry:
      
      ci-op-9wsygd82-b6ce9-c2cgt-master-0-audit-2025-04-23T18-32-07.575.log.gz:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"81972b7c-9914-43a9-bdfb-3d49b582ec8d","stage":"ResponseComplete","requestURI":"/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ci-op-9wsygd82-b6ce9-c2cgt-worker-centralus1-kvfqs?timeout=10s","verb":"update","user":{"username":"system:node:ci-op-9wsygd82-b6ce9-c2cgt-worker-centralus1-kvfqs","groups":["system:nodes","system:authenticated"],"extra":{"authentication.kubernetes.io/credential-id":["X509SHA256=62f209954c2ceef1c7c9f3f302d859fb5e5744233dc5c6ad2131a65b2c6e0a47"]}},"sourceIPs":["10.0.128.6"],"userAgent":"kubelet/v1.31.7 (linux/amd64) kubernetes/9b80662","objectRef":{"resource":"leases","namespace":"kube-node-lease","name":"ci-op-9wsygd82-b6ce9-c2cgt-worker-centralus1-kvfqs","uid":"950c5000-aa1a-4c12-96d4-99cb65c88234","apiGroup":"coordination.k8s.io","apiVersion":"v1","resourceVersion":"59698"},"responseStatus":{"metadata":{},"status":"Failure","message":"Timeout: request did not complete within requested timeout - context deadline exceeded","reason":"Timeout","details":{},"code":504},"requestReceivedTimestamp":"2025-04-23T18:06:38.951557Z","stageTimestamp":"2025-04-23T18:06:48.956890Z","annotations":{"apiserver.latency.k8s.io/apf-queue-wait":"51.4µs","apiserver.latency.k8s.io/response-write":"800ns","apiserver.latency.k8s.io/serialize-response-object":"4.245971ms","apiserver.latency.k8s.io/total":"10.005333436s","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}
      
      Here is the annotations:
      
      "annotations":{"apiserver.latency.k8s.io/apf-queue-wait":"51.4µs","apiserver.latency.k8s.io/response-write":"800ns","apiserver.latency.k8s.io/serialize-response-object":"4.245971ms","apiserver.latency.k8s.io/total":"10.005333436s","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}
      
      Notice that the total in this case is 10.005333436s. But if you look for the segment that really take long, it is not recorded. We suspect the missing part is the etcd delay. But it will be nice to have that listed in the audit log. 

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

              Unassigned Unassigned
              kenzhang@redhat.com Ken Zhang
              None
              None
              Ke Wang Ke Wang
              None
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved: