Bug
Resolution: Not a Bug
Normal
None
4.11.z
No
False
-
Description of problem:
We've seen pods failing to start with errors like: exec /usr/local/bin/cephcsi: no such file or directory
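For reference, a rough sketch of how affected pods could be spotted (the openshift-storage namespace and the app=csi-cephfsplugin label are assumptions about a typical ODF install, not confirmed values from this cluster):

```python
#!/usr/bin/env python3
"""Sketch: list csi-cephfsplugin pods whose containers report the
'exec /usr/local/bin/cephcsi: no such file or directory' error.

Assumptions (not confirmed from this report): the plugin pods live in the
openshift-storage namespace and carry the app=csi-cephfsplugin label.
"""
import json
import subprocess

NAMESPACE = "openshift-storage"          # assumption
SELECTOR = "app=csi-cephfsplugin"        # assumption
ERROR_SNIPPET = "exec /usr/local/bin/cephcsi: no such file or directory"

out = subprocess.run(
    ["oc", "get", "pods", "-n", NAMESPACE, "-l", SELECTOR, "-o", "json"],
    check=True, capture_output=True, text=True,
).stdout

for pod in json.loads(out)["items"]:
    node = pod["spec"].get("nodeName", "<unscheduled>")
    for cs in pod["status"].get("containerStatuses", []):
        waiting = cs["state"].get("waiting") or {}
        terminated = cs.get("lastState", {}).get("terminated") or {}
        messages = " ".join(
            str(m) for m in (waiting.get("message"), terminated.get("message")) if m
        )
        if ERROR_SNIPPET in messages:
            print(f"{pod['metadata']['name']} ({cs['name']}) on {node}: exec error")
```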
Version-Release number of selected component (if applicable):
How reproducible:
This seems to happen on roughly 20% of new nodes on this cluster (which uses autoscaling).
Steps to Reproduce:
1. Create a new node (a rough sketch for churning nodes is below)
2. Get unlucky :(
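There is no deterministic trigger, so reproduction is a matter of adding nodes until one hits the window. A sketch of how that could be scripted (the MachineSet name is a placeholder, and given the ~20% hit rate several iterations may be needed):

```python
#!/usr/bin/env python3
"""Sketch only: scale a MachineSet up by one node so the csi-cephfsplugin
DaemonSet schedules onto a brand-new node. MACHINESET is a placeholder, not a
value from this report; there is no reliable trigger, so repeat as needed."""
import subprocess

MACHINESET = "my-cluster-worker-us-east-1a"   # placeholder
NAMESPACE = "openshift-machine-api"

def current_replicas() -> int:
    out = subprocess.run(
        ["oc", "get", "machineset", MACHINESET, "-n", NAMESPACE,
         "-o", "jsonpath={.spec.replicas}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out)

# Add one node; the DaemonSet pod is created as soon as the node registers,
# which is the window where an interrupted image pull can occur.
subprocess.run(
    ["oc", "scale", "machineset", MACHINESET, "-n", NAMESPACE,
     f"--replicas={current_replicas() + 1}"],
    check=True,
)
print("New node requested; watch its csi-cephfsplugin pod for the exec error.")
```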
Actual results:
The pod fails to start
Expected results:
The pod starts
Additional info:
This seems to be related to https://access.redhat.com/solutions/5972661. Our suspicion: since cephfsplugin runs as a daemonset, it starts on new nodes very early, and while its image is still being pulled, the machine config operator reboots the node, corrupting the image layers that were mid-pull.
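If the corrupted-layer theory is right, one possible cleanup would be to delete the suspect image from the affected node so kubelet pulls a fresh copy. A rough sketch, assuming that mechanism (the node name and image reference are placeholders, not values from this report):

```python
#!/usr/bin/env python3
"""Sketch of one possible cleanup if an image really has corrupted layers on a
node: remove it with crictl from an `oc debug node/...` session so kubelet
pulls a fresh copy. NODE and IMAGE are placeholders."""
import subprocess

NODE = "worker-new-1"                              # placeholder
IMAGE = "registry.redhat.io/odf4/cephcsi-rhel8"    # placeholder image reference

def on_node(*cmd: str) -> subprocess.CompletedProcess:
    """Run a command on the node's host filesystem via a debug pod."""
    return subprocess.run(
        ["oc", "debug", f"node/{NODE}", "--", "chroot", "/host", *cmd],
        check=True, capture_output=True, text=True,
    )

# Remove the suspected-corrupt image; the failing pod may also need to be
# deleted so kubelet retries the pull promptly. crictl rmi can fail while a
# container still references the image, in which case delete the pod first.
print(on_node("crictl", "rmi", IMAGE).stdout)
```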