Bug
Resolution: Unresolved
Major
Description of problem:
The pod error Error: failed to create subPath directory for volumeMount "ws-xxx" of container "step-yyy" is observed intermittently on large OpenShift Pipelines. It causes the PipelineRun to fail even though Kubernetes would eventually recover and start the required pod.
This problem also appears to be known upstream (see https://github.com/tektoncd/pipeline/issues/6960), but there does not appear to be a solution so far.
Workaround
Restarting the PipelineRun will likely work, as the problem occurs randomly and has not been observed for the same Pipeline twice in a row.
This nevertheless causes a lot of frustration for users: they have to work out what happened and whether the failure is due to something they did or to an issue on the platform. Only after that assessment can they take action, which is time consuming even though simply restarting the PipelineRun would resolve it.
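To shorten the assessment described above, the failure message could be checked against the error text quoted in this report. The following is only a minimal sketch: the function name is hypothetical, the pattern is derived solely from the message quoted here, and how the message is obtained from the failed pod is left out.

```python
import re

# Pattern derived from the error text quoted in this report. The volume and
# container names differ per run, so they are matched loosely. The closing
# quote is optional because the message is sometimes reported truncated.
SUBPATH_ERROR = re.compile(
    r'failed to create subPath directory for volumeMount "[^"]*" '
    r'of container "[^"]*"?'
)


def is_subpath_platform_error(message: str) -> bool:
    """Return True if the message looks like the intermittent subPath error.

    Hypothetical helper: a True result suggests a platform problem that a
    restart is likely to fix, not an error in the Pipeline itself.
    """
    return SUBPATH_ERROR.search(message) is not None
```

A True result would indicate that restarting the PipelineRun is the likely remedy; any other failure message still needs the usual investigation.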
Prerequisites (if any, like setup, operators/versions):
N/A
Steps to Reproduce
Run a PipelineRun that uses subPath to mount volumes.
Actual results:
The PipelineRun fails and has to be restarted manually once it is clear that the cause is a platform problem and not the Pipeline itself.
Expected results:
Either a way to automatically retry the PipelineRun when it fails with this error, or a grace period before failing, to see whether Kubernetes eventually recovers so that the Pipeline can proceed.
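The requested behaviour could look roughly like the sketch below. It is an illustration under assumptions, not an existing OpenShift Pipelines or Tekton feature: start_run is a hypothetical callable that starts a PipelineRun, waits for it to finish, and returns None on success or the failure message otherwise. The attempt count and grace period are arbitrary example values.

```python
import time
from typing import Callable, Optional

# Fragment of the error text quoted in this report.
SUBPATH_MARKER = "failed to create subPath directory for volumeMount"


def run_with_retry(
    start_run: Callable[[], Optional[str]],  # hypothetical: None means success
    max_attempts: int = 3,                   # assumed example value
    grace_seconds: float = 30.0,             # assumed example value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a run only when it failed with the subPath error."""
    for attempt in range(1, max_attempts + 1):
        failure = start_run()
        if failure is None:
            return True  # run succeeded
        if SUBPATH_MARKER not in failure:
            return False  # a different failure: do not mask it by retrying
        if attempt < max_attempts:
            sleep(grace_seconds)  # grace period before the next attempt
    return False  # still failing with the subPath error after all attempts
```

Retrying only on the known message keeps genuine Pipeline failures visible to the user while hiding the intermittent platform error.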
Reproducibility (Always/Intermittent/Only Once):
Intermittent
Acceptance criteria:
If the error Error: failed to create subPath directory for volumeMount "ws-xxx" of container "step-yyy" is reported, the PipelineRun should not fail. It should either wait for Kubernetes to resolve the issue or be restarted automatically to see whether the next attempt succeeds.