- Bug
- Resolution: Done
- Critical
- 4.10
- Important
- None
- OCPNODE Sprint 233 (Blue), OCPNODE Sprint 234 (Blue), OCPNODE Sprint 235 (Blue), OCPNODE Sprint 236 (Blue), OCPNODE Sprint 237 (Green)
- 5
- Rejected
- Unspecified
- If docs needed, set a value
- Customer Escalated
Description of problem:
The OLM registry-server container fails to reach the "Ready" state.
$ oc get pod -n openshift-marketplace
NAME READY STATUS RESTARTS AGE
marketplace-operator-7749b7db8d-br5p8 1/1 Running 4 (27h ago) 28h
rh-du-operators-ml5pl 0/1 Running 0 27h
Conditions on the pod show:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-05-05T15:56:21Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-05-05T15:56:21Z"
    message: 'containers with unready status: [registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-05-05T15:56:21Z"
    message: 'containers with unready status: [registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-05-05T15:56:21Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c
    image: e24-h01-000-r640.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index:v4.9
    imageID: e24-h01-000-r640.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index@sha256:86efa7af19dfaa7afe0f3469250ad6101c4eed44c7366e3628e7e865834dc43e
    lastState: {}
    name: registry-server
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-05-05T15:56:36Z"
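Pods stuck in this state can be spotted from saved status output. A minimal sketch (the `unready_containers` helper name is mine), assuming the pod YAML has been saved with `oc get pod <pod> -n openshift-marketplace -o yaml > pod.yaml`:

```shell
# Hypothetical helper: pull the container names out of the
# ContainersNotReady message in a saved pod status file.
unready_containers() {
  # The message has the form:
  #   message: 'containers with unready status: [registry-server]'
  sed -n 's/.*containers with unready status: \[\(.*\)\].*/\1/p' "$1" | head -n1
}
```

For the status above this prints `registry-server`.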
Journal logs on the node show that the kubelet fails to add the readiness and liveness probe exec PIDs to the container's cgroup.procs file:
May 06 19:06:41 sno00251 bash[26314]: E0506 19:06:41.901014 26314 remote_runtime.go:704] "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = command error: time=\"2022-05-06T19:06:41Z\" level=error msg=\"exec failed: unable to start container process: error adding pid 3616061 to cgroups: failed to write 3616061: open /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/crio-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope/cgroup.procs: no such file or directory\"\n, stdout: , stderr: , exit code -1" containerID="bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c" cmd=[grpc_health_probe -addr=:50051]
May 06 19:06:41 sno00251 bash[26314]: E0506 19:06:41.901135 26314 prober.go:118] "Probe errored" err="rpc error: code = Unknown desc = command error: time=\"2022-05-06T19:06:41Z\" level=error msg=\"exec failed: unable to start container process: error adding pid 3616061 to cgroups: failed to write 3616061: open /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/crio-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope/cgroup.procs: no such file or directory\"\n, stdout: , stderr: , exit code -1" probeType="Liveness" pod="openshift-marketplace/rh-du-operators-ml5pl" podUID=e1d577c2-09dc-4859-aada-0a157e0b07f0 containerName="registry-server"
May 06 19:06:41 sno00251 bash[26314]: E0506 19:06:41.907801 26314 remote_runtime.go:704] "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = command error: time=\"2022-05-06T19:06:41Z\" level=error msg=\"exec failed: unable to start container process: error adding pid 3616065 to cgroups: failed to write 3616065: open /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/crio-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope/cgroup.procs: no such file or directory\"\n, stdout: , stderr: , exit code -1" containerID="bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c" cmd=[grpc_health_probe -addr=:50051]
May 06 19:06:41 sno00251 bash[26314]: E0506 19:06:41.907938 26314 prober.go:118] "Probe errored" err="rpc error: code = Unknown desc = command error: time=\"2022-05-06T19:06:41Z\" level=error msg=\"exec failed: unable to start container process: error adding pid 3616065 to cgroups: failed to write 3616065: open /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/crio-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope/cgroup.procs: no such file or directory\"\n, stdout: , stderr: , exit code -1" probeType="Readiness" pod="openshift-marketplace/rh-du-operators-ml5pl" podUID=e1d577c2-09dc-4859-aada-0a157e0b07f0 containerName="registry-server"
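Each of these errors carries the failing PID and the missing cgroup.procs path. When triaging journals from many nodes, both can be extracted mechanically; a sketch (variable names are mine), using one abbreviated message from the logs above as sample input:

```shell
# Sample (abbreviated) error message from the kubelet journal above.
line='exec failed: unable to start container process: error adding pid 3616061 to cgroups: failed to write 3616061: open /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/crio-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope/cgroup.procs: no such file or directory'

# PID that could not be moved into the cgroup.
pid=$(printf '%s\n' "$line" | sed -n 's/.*adding pid \([0-9][0-9]*\) .*/\1/p')
# cgroup.procs path that was missing.
cgpath=$(printf '%s\n' "$line" | sed -n 's/.*open \(\/[^:]*\): no such file.*/\1/p')
echo "$pid $cgpath"
```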
The crio-bff4c3....scope cgroup directory does not exist, although the corresponding crio-conmon-....scope directory does:
[root@sno00251 core]# ls -l /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/crio-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope/
ls: cannot access '/sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/crio-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope/': No such file or directory
[root@sno00251 core]# ls -l /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1d577c2_09dc_4859_aada_0a157e0b07f0.slice/
total 0
-rw-r--r--. 1 root root 0 May  6 19:07 cgroup.clone_children
-rw-r--r--. 1 root root 0 May  6 19:07 cgroup.procs
drwxr-xr-x. 2 root root 0 May  5 15:56 crio-conmon-bff4c347d3fc6a20064926fdfd1ea3c76e039c56205c6b282d3b6c8e2f13233c.scope
-rw-r--r--. 1 root root 0 May  6 19:07 notify_on_release
-rw-r--r--. 1 root root 0 May  6 19:07 tasks
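The tell-tale signature is a crio-conmon-*.scope directory whose sibling crio-*.scope directory is missing. A detection sketch (the `find_orphaned_scopes` helper is mine; pass the pod's .slice directory as the argument):

```shell
# Hypothetical helper: list conmon scope cgroups whose matching container
# scope directory is missing, as in the listing above.
find_orphaned_scopes() {
  for conmon in "$1"/crio-conmon-*.scope; do
    [ -d "$conmon" ] || continue   # glob matched nothing
    # Matching container scope has the same ID without the "conmon-" infix.
    scope=$(printf '%s\n' "$conmon" | sed 's/crio-conmon-/crio-/')
    [ -d "$scope" ] || printf 'orphaned conmon scope: %s\n' "$conmon"
  done
}
```

Running this over every kubepods-*.slice directory on an affected node would surface all pods with this signature, not just the one noticed via probes.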
Version-Release number of selected component (if applicable): 4.10.13
How reproducible: 6 out of ~2200 clusters deployed in scale testing have this signature.
Steps to Reproduce:
The OLM registry-server pod is created through automated (rapid) manipulation of CatalogSources:
1. Disable default sources in OperatorHub CR
2. Create new CatalogSource pointing to disconnected registry
3. Create subscriptions making use of the new CatalogSource
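The three steps can be sketched as follows. The CatalogSource name and index image are taken from the pod status above; the Subscription's operator name and channel are placeholders, not the operators actually deployed in the scale test:

```shell
# 1. Disable the default catalog sources in the OperatorHub cluster CR.
oc patch operatorhub cluster --type merge -p '{"spec":{"disableAllDefaultSources":true}}'

# 2. Create a CatalogSource pointing at the disconnected registry mirror.
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: rh-du-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: e24-h01-000-r640.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index:v4.9
EOF

# 3. Create a Subscription that uses the new CatalogSource
#    (operator name and channel below are placeholders).
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator
  namespace: openshift-operators
spec:
  channel: stable
  name: example-operator
  source: rh-du-operators
  sourceNamespace: openshift-marketplace
EOF
```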
Actual results: CatalogSource remains in "TRANSIENT_FAILURE" state.
Expected results: CatalogSource becomes ready.
Additional info: