- Bug
- Resolution: Done-Errata
- Major
- None
- 4.16.0
This is a clone of issue OCPBUGS-35036. The following is the description of the original issue:
—
Description of problem:
The following logs are from namespaces/openshift-apiserver/pods/apiserver-6fcd57c747-57rkr/openshift-apiserver/openshift-apiserver/logs/current.log
2024-06-06T15:57:06.628216833Z E0606 15:57:06.628186 1 finisher.go:175] FinishRequest: post-timeout activity - time-elapsed: 139.823053ms, panicked: true, err: <nil>, panic-reason: runtime error: invalid memory address or nil pointer dereference
2024-06-06T15:57:06.628216833Z goroutine 192790 [running]:
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers/finisher.finishRequest.func1.1()
2024-06-06T15:57:06.628216833Z 	k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/finisher/finisher.go:105 +0xa5
2024-06-06T15:57:06.628216833Z panic({0x498ac60?, 0x74a51c0?})
2024-06-06T15:57:06.628216833Z 	runtime/panic.go:914 +0x21f
2024-06-06T15:57:06.628216833Z github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer.(*ImageStreamImporter).importImages(0xc0c5bf0fc0, {0x5626bb0, 0xc0a50c7dd0}, 0xc07055f4a0, 0xc0a2487600)
2024-06-06T15:57:06.628216833Z 	github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer/importer.go:263 +0x1cf5
2024-06-06T15:57:06.628216833Z github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer.(*ImageStreamImporter).Import(0xc0c5bf0fc0, {0x5626bb0, 0xc0a50c7dd0}, 0x0?, 0x0?)
2024-06-06T15:57:06.628216833Z 	github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer/importer.go:110 +0x139
2024-06-06T15:57:06.628216833Z github.com/openshift/openshift-apiserver/pkg/image/apiserver/registry/imagestreamimport.(*REST).Create(0xc0033b2240, {0x5626bb0, 0xc0a50c7dd0}, {0x5600058?, 0xc07055f4a0?}, 0xc08e0b9ec0, 0x56422e8?)
2024-06-06T15:57:06.628216833Z 	github.com/openshift/openshift-apiserver/pkg/image/apiserver/registry/imagestreamimport/rest.go:337 +0x1574
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers.(*namedCreaterAdapter).Create(0x55f50e0?, {0x5626bb0?, 0xc0a50c7dd0?}, {0xc0b5704000?, 0x562a1a0?}, {0x5600058?, 0xc07055f4a0?}, 0x1?, 0x2331749?)
2024-06-06T15:57:06.628216833Z 	k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/create.go:254 +0x3b
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers.CreateResource.createHandler.func1.1()
2024-06-06T15:57:06.628216833Z 	k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/create.go:184 +0xc6
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers.CreateResource.createHandler.func1.2()
2024-06-06T15:57:06.628216833Z 	k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/create.go:209 +0x39e
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers/finisher.finishRequest.func1()
2024-06-06T15:57:06.628216833Z 	k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/finisher/finisher.go:117 +0x84
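The trace points at a nil pointer dereference inside (*ImageStreamImporter).importImages. As a minimal Go sketch only (not the actual openshift-apiserver importer code; the manifest type and parseManifest helper below are hypothetical simplifications), this is the general failure pattern such a trace suggests: a truncated registry response leaves a nested field nil, and a later dereference panics.

package main

import (
	"encoding/json"
	"fmt"
)

// manifest stands in for the image manifest structure built from a
// registry response (hypothetical simplification of the real types).
type manifest struct {
	Config *struct {
		Digest string `json:"digest"`
	} `json:"config"`
}

// parseManifest mimics a parse step whose error is not acted on, so a
// truncated body ("unexpected end of JSON input") leaves Config nil.
func parseManifest(body []byte) *manifest {
	var m manifest
	_ = json.Unmarshal(body, &m) // error dropped: m.Config stays nil
	return &m
}

func main() {
	// A cache server under load returns a truncated manifest body.
	truncated := []byte(`{"config":`)
	m := parseManifest(truncated)
	// Dereferencing without a nil check panics with
	// "runtime error: invalid memory address or nil pointer dereference".
	fmt.Println(m.Config.Digest)
}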
Version-Release number of selected component (if applicable):
We applied the change to all clusters in CI and checked 3 of them; all 3 show the same errors.
oc --context build09 get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-rc.3   True        False         3d9h    Error while reconciling 4.16.0-rc.3: the cluster operator machine-config is degraded

oc --context build02 get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-rc.2   True        False         15d     Error while reconciling 4.16.0-rc.2: the cluster operator machine-config is degraded

oc --context build03 get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.16   True        False         34h     Error while reconciling 4.15.16: the cluster operator machine-config is degraded
How reproducible:
We applied this PR https://github.com/openshift/release/pull/52574/files to the clusters.
It broke at least 3 of them.
"qci-pull-through-cache-us-east-1-ci.apps.ci.l2s4.p1.openshiftapps.com" is a registry cache server; its configuration is at https://github.com/openshift/release/blob/master/clusters/app.ci/quayio-pull-through-cache/qci-pull-through-cache-us-east-1.yaml
Additional info:
There are lots of image imports in OpenShift CI jobs.
It feels like the registry cache server returns unexpected results to the openshift-apiserver:
2024-06-06T18:13:13.781520581Z E0606 18:13:13.781459 1 strategy.go:60] unable to parse manifest for "sha256:c5bcd0298deee99caaf3ec88de246f3af84f80225202df46527b6f2b4d0eb3c3": unexpected end of JSON input
Our theory is that the volume of import requests from all CI clusters overwhelmed the cache server, which then returned unexpected data and caused the openshift-apiserver to panic.
The expected behaviour is that if the image cannot be pulled from the first mirror in the ImageDigestMirrorSet, the import fails over to the next mirror instead of panicking (see the sketch below).
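A minimal Go sketch of that expected fail-over behaviour, assuming a list of mirrors tried in ImageDigestMirrorSet order. The fetchManifest and importFromMirrors names are illustrative only and are not the openshift-apiserver implementation; the cache-server hostname is the one from this report and here is simply simulated to return the truncated-JSON error.

package main

import (
	"errors"
	"fmt"
)

// fetchManifest is a placeholder for pulling a manifest from one mirror.
// The cache server in this report is simulated as returning truncated JSON.
func fetchManifest(mirror, digest string) ([]byte, error) {
	if mirror == "qci-pull-through-cache-us-east-1-ci.apps.ci.l2s4.p1.openshiftapps.com" {
		return nil, errors.New("unexpected end of JSON input")
	}
	return []byte(`{"schemaVersion": 2}`), nil
}

// importFromMirrors tries each mirror in order and only fails after the
// last one, rather than panicking on one bad response.
func importFromMirrors(mirrors []string, digest string) ([]byte, error) {
	var lastErr error
	for _, m := range mirrors {
		body, err := fetchManifest(m, digest)
		if err == nil {
			return body, nil
		}
		lastErr = fmt.Errorf("mirror %s: %w", m, err)
	}
	return nil, fmt.Errorf("all mirrors failed: %w", lastErr)
}

func main() {
	mirrors := []string{
		"qci-pull-through-cache-us-east-1-ci.apps.ci.l2s4.p1.openshiftapps.com",
		"quay.io", // the upstream source as the final fallback
	}
	body, err := importFromMirrors(mirrors, "sha256:c5bcd0298deee99caaf3ec88de246f3af84f80225202df46527b6f2b4d0eb3c3")
	fmt.Println(string(body), err)
}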
- blocks: OCPBUGS-42724 openshift-apiserver panicked with runtime error (Closed)
- clones: OCPBUGS-35036 openshift-apiserver panicked with runtime error (Verified)
- is blocked by: OCPBUGS-35036 openshift-apiserver panicked with runtime error (Verified)
- is cloned by: OCPBUGS-42724 openshift-apiserver panicked with runtime error (Closed)
- links to: RHBA-2024:7922 OpenShift Container Platform 4.17.z bug fix update