Bug
Resolution: Unresolved
Undefined
4.17.z, 4.16.z, 4.18.z, 4.19.z, 4.20.z
Description of problem:
kube-apiserver is crashing regularly on audit log events with:
fatal error: concurrent map iteration and map write

goroutine 258531290 [running]:
reflect.mapiterinit(0x456109?, 0x180?, 0x4d9cf80?)
    runtime/map.go:1392 +0x13
reflect.(*MapIter).Next(0x49fbe80?)
    reflect/value.go:1935 +0x57
encoding/json.mapEncoder.encode({0xc2674f8ad0?}, 0xc3f9138f40, {0x49fbe80?, 0xc4509c1a30?, 0xc4509c1970?}, {0xe?, 0x0?})
    encoding/json/encode.go:744 +0x33e
encoding/json.structEncoder.encode({{{0xc09e531900, 0x12, 0x23}, 0xc0329222a0, 0xc0329222d0}}, 0xc3f9138f40, {0x53dc700?, 0xc4509c1900?, 0x2?}, {0x0, ...})
    encoding/json/encode.go:706 +0x21e
encoding/json.ptrEncoder.encode({0x1?}, 0xc3f9138f40, {0x5385800?, 0xc4509c1900?, 0x5385800?}, {0x0?, 0x3e?})
    encoding/json/encode.go:878 +0x20f
encoding/json.(*encodeState).reflectValue(0xc2674f8d90?, {0x5385800?, 0xc4509c1900?, 0x4092a6?}, {0x60?, 0x46?})
    encoding/json/encode.go:323 +0x73
encoding/json.(*encodeState).marshal(0x408e5d?, {0x5385800?, 0xc4509c1900?}, {0xc?, 0x0?})
    encoding/json/encode.go:295 +0xb9
encoding/json.(*Encoder).Encode(0xc2674f8ea8, {0x5385800, 0xc4509c1900})
    encoding/json/stream.go:209 +0xdf
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).doEncode(0x2?, {0x5e4e2a8?, 0xc4509c1900?}, {0x5e320e0, 0xc3ffc5e000})
    k8s.io/apimachinery@v0.29.0/pkg/runtime/serializer/json/json.go:246 +0x175
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).Encode(0xc000815bd0, {0x5e4e2a8, 0xc4509c1900}, {0x5e320e0, 0xc3ffc5e000})
    k8s.io/apimachinery@v0.29.0/pkg/runtime/serializer/json/json.go:220 +0xf2
k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).doEncode(0xc000521720, {0x5e46bb0, 0xc3ae9b3530}, {0x5e320e0, 0xc3ffc5e000}, {0x0?, 0x0?})
    k8s.io/apimachinery@v0.29.0/pkg/runtime/serializer/versioning/versioning.go:284 +0x9ea
k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).encode(0xc000521720, {0x5e46bb0, 0xc3ae9b3530}, {0x5e320e0, 0xc3ffc5e000}, {0x0?, 0x0?})
    k8s.io/apimachinery@v0.29.0/pkg/runtime/serializer/versioning/versioning.go:214 +0x15c
k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).Encode(0x17d0c97?, {0x5e46bb0?, 0xc3ae9b3530?}, {0x5e320e0?, 0xc3ffc5e000?})
    k8s.io/apimachinery@v0.29.0/pkg/runtime/serializer/versioning/versioning.go:207 +0x2d
k8s.io/apimachinery/pkg/runtime.Encode({0x7ff02416ffb0, 0xc000521720}, {0x5e46bb0, 0xc3ae9b3530})
    k8s.io/apimachinery@v0.29.0/pkg/runtime/codec.go:49 +0x5e
k8s.io/apiserver/plugin/pkg/audit/log.(*backend).logEvent(0xc00150ce10, 0xc3ae9b3530)
    k8s.io/apiserver@v0.29.0/plugin/pkg/audit/log/backend.go:76 +0x65
k8s.io/apiserver/plugin/pkg/audit/log.(*backend).ProcessEvents(0x4130e5?, {0xc439f37820, 0x1, 0xc3e79b8201?})
    k8s.io/apiserver@v0.29.0/plugin/pkg/audit/log/backend.go:65 +0x3c
k8s.io/apiserver/pkg/server/options.(*ignoreErrorsBackend).ProcessEvents(0x5e71e58?, {0xc439f37820?, 0x4130e5?, 0x98?})
    k8s.io/apiserver@v0.29.0/pkg/server/options/audit.go:385 +0x22
k8s.io/apiserver/pkg/endpoints/filters.processAuditEvent({0x5e71e58, 0xc194e1a1b0}, {0x7ff0342a8310, 0xc00148ee60}, 0xc3ae9b3530, {0xc00148ee00?, 0x1, 0x19b0f70?})
    k8s.io/apiserver@v0.29.0/pkg/endpoints/filters/audit.go:198 +0x230
k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithAudit.func7.1()
    k8s.io/apiserver@v0.29.0/pkg/endpoints/filters/audit.go:113 +0x3bb
k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithAudit.func7({0x5e5dde0, 0xc092b5dec0}, 0xb7e75d?)
    k8s.io/apiserver@v0.29.0/pkg/endpoints/filters/audit.go:116 +0x3a8
net/http.HandlerFunc.ServeHTTP(0x0?, {0x5e5dde0?, 0xc092b5dec0?}, 0x0?)
    net/http/server.go:2141 +0x29
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x5e5dde0, 0xc092b5dec0}, 0xc2674ba100)
    k8s.io/apiserver@v0.29.0/pkg/endpoints/filterlatency/filterlatency.go:84 +0x192
net/http.HandlerFunc.ServeHTTP(0x10?, {0x5e5dde0?, 0xc092b5dec0?}, 0xc48f64f960?)
    net/http/server.go:2141 +0x29
k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithStartupEarlyAnnotation.withStartupEarlyAnnotation.func30({0x5e5dde0, 0xc092b5dec0}, 0xc2674ba100)
    k8s.io/apiserver@v0.29.0/pkg/server/filters/with_early_late_annotations.go:127 +0x2a7
net/http.HandlerFunc.ServeHTTP(0x5138300?, {0x5e5dde0?, 0xc092b5dec0?}, 0x5e20e88?)
    net/http/server.go:2141 +0x29
k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.TrackCompleted.trackCompleted.func32({0x5e5dde0?, 0xc092b5dec0}, 0xc2674ba100)
    k8s.io/apiserver@v0.29.0/pkg/endpoints/filterlatency/filterlatency.go:110 +0x177
net/http.HandlerFunc.ServeHTTP(0x5e71e58?, {0x5e5dde0?, 0xc092b5dec0?}, 0x5e17fa0?)
    net/http/server.go:2141 +0x29
k8s.io/apiserver/pkg/endpoints/filters.withAuthentication.func1({0x5e5dde0, 0xc092b5dec0}, 0xc2674ba100)
    k8s.io/apiserver@v0.29.0/pkg/endpoints/filters/authentication.go:120 +0x7e5
net/http.HandlerFunc.ServeHTTP(0x5e71e58?, {0x5e5dde0?, 0xc092b5dec0?}, 0x5e20e88?)
    net/http/server.go:2141 +0x29
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x5e5dde0, 0xc092b5dec0}, 0xc266edfe00)
    k8s.io/apiserver@v0.29.0/pkg/endpoints/filterlatency/filterlatency.go:94 +0x37a
net/http.HandlerFunc.ServeHTTP(0xc092b5def0?, {0x5e5dde0?, 0xc092b5dec0?}, 0xc339871ee8?)
    net/http/server.go:2141 +0x29
k8s.io/apiserver/pkg/server/filters.WithCORS.func1({0x5e5dde0, 0xc092b5dec0}, 0xc266edfe00)
    k8s.io/apiserver@v0.29.0/pkg/server/filters/cors.go:61 +0x630
net/http.HandlerFunc.ServeHTTP(0xc339871fd0?, {0x5e5dde0?, 0xc092b5dec0?}, 0x5e47930?)
    net/http/server.go:2141 +0x29
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1()
    k8s.io/apiserver@v0.29.0/pkg/server/filters/timeout.go:115 +0x62
created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP in goroutine 250263206
    k8s.io/apiserver@v0.29.0/pkg/server/filters/timeout.go:101 +0x1b2
In the largest production cluster, this is causing API disruptions and further failures due to LIST/WATCH storms when the API abruptly fails.
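For illustration only (this is not apiserver code; auditEvent below is a hypothetical stand-in for the real audit Event type), the panic class in the trace can be reproduced with a few lines of Go: one goroutine keeps writing to an annotations map while encoding/json iterates it, and the runtime aborts with exactly this fatal error.

package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// auditEvent is a hypothetical stand-in for the audit Event type; the only
// relevant detail is that it carries a map that gets JSON-encoded.
type auditEvent struct {
	Annotations map[string]string `json:"annotations"`
}

func main() {
	ev := &auditEvent{Annotations: map[string]string{"authorization.k8s.io/decision": "allow"}}

	// Writer: keeps adding annotations while the event is being serialized,
	// mimicking filters that annotate the event after it has been handed to
	// the audit backend.
	go func() {
		for i := 0; ; i++ {
			ev.Annotations[fmt.Sprintf("k%d", i)] = "v"
		}
	}()

	// Encoder: encoding/json iterates the map; with the writer above this
	// quickly dies with "fatal error: concurrent map iteration and map write".
	enc := json.NewEncoder(io.Discard)
	for {
		_ = enc.Encode(ev)
	}
}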
Version-Release number of selected component (if applicable):
4.16.34
How reproducible:
Random, but it happens every few days.
Steps to Reproduce:
No deterministic reproduction steps are known; the panic occurs at random under normal audit logging load.
Actual results:
The API server crashes, causing the other two to be stormed with kubelet LIST/WATCH requests, which sometimes degrades them or causes them to crash temporarily before recovering.
Expected results:
No panics
Additional info:
This is related to upstream issue https://github.com/kubernetes/kubernetes/issues/120507
and should be fixed by https://github.com/kubernetes/kubernetes/pull/129472.
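For context, one common way to avoid this class of race is sketched below. This is an assumption about the general approach, not the exact change in the upstream PR: annotations are only ever mutated under a lock, and the serializer is handed a private copy taken under that same lock. The auditEvent type and the Annotate/snapshot helpers are hypothetical illustrations, not the real audit package API.

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// auditEvent is a hypothetical stand-in for an audit event whose annotations
// can be written concurrently with serialization.
type auditEvent struct {
	mu          sync.Mutex
	annotations map[string]string
}

// Annotate records an annotation under the lock, so writes never race with copies.
func (e *auditEvent) Annotate(key, value string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.annotations == nil {
		e.annotations = map[string]string{}
	}
	e.annotations[key] = value
}

// snapshot returns a private copy of the annotations that is safe to encode.
func (e *auditEvent) snapshot() map[string]string {
	e.mu.Lock()
	defer e.mu.Unlock()
	out := make(map[string]string, len(e.annotations))
	for k, v := range e.annotations {
		out[k] = v
	}
	return out
}

func main() {
	ev := &auditEvent{}

	// Writer: keeps adding annotations, as request filters do after the event
	// has been handed to the audit backend.
	go func() {
		for i := 0; ; i++ {
			ev.Annotate(fmt.Sprintf("k%d", i), "v")
		}
	}()

	// Encoder: only ever sees the immutable snapshot, never the live map, so
	// the "concurrent map iteration and map write" panic cannot occur here.
	enc := json.NewEncoder(io.Discard)
	for i := 0; i < 1_000_000; i++ {
		if err := enc.Encode(map[string]any{"annotations": ev.snapshot()}); err != nil {
			panic(err)
		}
	}
	fmt.Println("encoded 1,000,000 snapshots without a map race")
}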
blocks: OCPBUGS-61347 Kube API Server Crashes on Audit Log Event (Verified)
is cloned by: OCPBUGS-61347 Kube API Server Crashes on Audit Log Event (Verified)
links to: