-
Bug
-
Resolution: Can't Do
-
Major
-
openshift-4.14, openshift-4.15, openshift-4.16, openshift-4.17
-
Quality / Stability / Reliability
-
False
-
Customer Reported
Documentation is needed for identifying and resolving the following edge case. For this case to occur, ALL of these must be true:
- MicroShift is installed on an ostree system, such as RH Device Edge.
- A user workload is deployed onto the system via a non-ostree path, e.g. by applying manifests directly to the cluster (via helm, oc, kubectl, etc.).
- The user's workload shares a container image layer with a MicroShift workload, for instance, ubi9/ubi-minimal.
When all three conditions hold, the impact appears when the user upgrades or downgrades MicroShift: the user's workloads fail to start after the system is rebooted.
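The first condition can be checked directly on the host. The following is a minimal sketch, assuming that ostree-booted systems expose the conventional marker file /run/ostree-booted; the function name and its optional path argument are hypothetical and are not part of MicroShift or its documentation.

```shell
# Hypothetical helper: succeeds if the host was booted from an ostree deployment.
# Assumption: ostree-based systems create the marker file /run/ostree-booted.
is_ostree_booted() {
    marker="${1:-/run/ostree-booted}"   # optional override, useful for testing
    [ -e "$marker" ]
}

if is_ostree_booted; then
    echo "ostree system: the first condition is met"
else
    echo "not an ostree system: this edge case does not apply"
fi
```

The remaining two conditions depend on how the workload was deployed and which image layers it uses, so this sketch does not attempt to check them.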
Indicators that the user has stumbled into this edge case are:
- Pod statuses show "CreateContainerError"
$ oc get pod -n my-ns
NAMESPACE   NAME     READY   STATUS                 RESTARTS   AGE
my-ns       my-pod   2/2     CreateContainerError   0          24h
AND
- Pod Describe contains this event:
$ oc describe pod -n my-ns my-pod
<...omitted...>
Warning Failed 15m (x3 over 16m) kubelet (combined from similar events): Error: failed to mount container k8s_<POD>-7685458cdf_xxxx_301a0d64-1993-45b9-a040-0a94e7fb6b5b_0(c75248228fe35a43c43b4875183d314d50d48165512659745dcf10fedb4d7f13): readlink /var/lib/containers/storage/overlay/l/KBKAXATHXY65BFU6TBGFTMOCWS: no such file or directory
Further indicators appear in journal output:
$ journalctl -u crio
Sep 19 19:17:19 edgenius crio[1408]: time="2024-09-19 19:17:19.412205267Z" level=warning msg="Can't stat lower layer \"/var/lib/containers/storage/overlay/l/QX7R7TM2AO4PWCREA35WV3KGXF\" because it does not exist. Going through storage to recreate the missing symlinks."
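The three indicators above can be checked with simple text matches. The following is a rough sketch rather than a supported diagnostic: the function names are hypothetical, each helper only reads command output on stdin, and the layer-ID pattern (uppercase letters and digits) is an assumption based on the two sample IDs in this report.

```shell
# Hypothetical helpers: each reads command output on stdin and succeeds
# if the corresponding indicator from this report is present.

# Pod status indicator (from `oc get pod`).
has_create_container_error() {
    grep -q 'CreateContainerError'
}

# Event indicator (from `oc describe pod`): readlink fails on an overlay layer symlink.
# Assumption: layer link IDs consist of uppercase letters and digits.
has_missing_layer_event() {
    grep -Eq 'readlink /var/lib/containers/storage/overlay/l/[A-Z0-9]+: no such file or directory'
}

# Journal indicator (from `journalctl -u crio`).
has_crio_missing_layer_warning() {
    grep -q "Can't stat lower layer"
}

# Example usage against a live system, with the namespace and pod names used above:
#   oc get pod -n my-ns | has_create_container_error && echo "status indicator present"
#   oc describe pod -n my-ns my-pod | has_missing_layer_event && echo "event indicator present"
#   journalctl -u crio | has_crio_missing_layer_warning && echo "journal indicator present"
```

Keeping each check as a stdin filter means the same helpers work on saved logs or must-gather style output as well as on a live system.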
Proposed Solution:
- Add a troubleshooting sub-chapter that provides the above characterization to help users diagnose the edge case, proposes a production-ready solution, and proposes a developer workaround.
- Add in-line warnings under "Embedding in a RHEL for Edge image" -> "Chapter 1. Embedding in a RHEL for Edge image using image builder" and -> "Chapter 3. Embedding in a RHEL for Edge image for offline use". These warnings should make the user aware of the risk of deploying workloads directly instead of embedding the workload container images in an ostree layer.
- is caused by: RHEL-58891 Containers are in 'CreateContainerError' state after upgrade of OStree (Closed)
- relates to: OCPBUGS-34705 Containers are in 'CreateContainerError' state after upgrade of OS (Closed)