Bug
Resolution: Done-Errata
Major
4.19.0
Quality / Stability / Reliability
uShift Sprint 268, uShift Sprint 269, uShift Sprint 270
Done
Bug Fix
Description of problem:
Adding a MicroShift manifest that includes a webhook for a particular CR and an instance of that CR (in the same manifest or a different one) prevents MicroShift from starting successfully.
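For illustration, a manifest along these lines reproduces the situation; the group, kind, and names below are hypothetical placeholders (the real case is the ai-model-serving webhook and its ServingRuntime CRs), and the CRD for the custom resource is assumed to be defined elsewhere in the manifest set:

# A validating webhook for a custom resource, plus an instance of that
# resource in the same manifest set. All names are hypothetical.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-validator
webhooks:
  - name: validate.widgets.example.com
    clientConfig:
      service:
        name: example-webhook-svc      # backed by a pod that cannot run before kubelet is up
        namespace: example-system
        path: /validate
    rules:
      - apiGroups: ["example.com"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["widgets"]
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions: ["v1"]
---
# Applying this CR requires an admission call to the webhook above,
# which cannot succeed until the webhook pod is scheduled and running.
apiVersion: example.com/v1
kind: Widget
metadata:
  name: example-widget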
Version-Release number of selected component (if applicable):
Found on main (4.19), but most likely applicable to all MicroShift versions.
How reproducible:
always
Steps to Reproduce:
1. Install microshift & microshift-ai-model-serving RPMs
2. Start MicroShift: systemctl start microshift
Actual results:
systemctl start microshift fails. Inspecting journalctl -u microshift shows the messages "Failed to initialize CSINode after retrying: timed out waiting for the condition" and "microshift.service: Main process exited, code=exited, status=255/EXCEPTION".
Expected results:
systemctl start microshift is successful
Additional info:
Investigation findings:
- When kubelet starts, it first wants to create kubepods.slice before it registers with the API server.
- Another goroutine in kubelet creates the CSINode object, but it needs the v1/Node first and is only allowed to retry for ~27 seconds.
- kubepods.slice is created after microshift.service becomes ready (because kubepods.slice has a dependency on microshift.service).
- Adding a manifest that cannot actually be applied (because it contains a CR and there is a webhook for that CR, but the webhook is not running yet since kubelet is not really up) results in the kustomizer sub-service taking a long time until it fails.
- In the meantime, the CSINode creation times out and kills MicroShift with exit(255).

In the happy flow (i.e. no CR for the webhook to validate):
- The kustomizer does not block microshift.service readiness.
- microshift.service becomes ready and kubepods.slice is created.
- kubelet registers the Node, and the CSINode is created before the 27s timeout, so it does not kill MicroShift.

The best solution seems to be extracting the kustomizer from MicroShift readiness, so it does not block microshift.service becoming ready.

This issue can go unnoticed in tests that do not run `systemctl start` directly: after MicroShift is killed, kubepods.slice is created immediately, so the next time kubelet starts it does not have to create it and proceeds to Node registration.

If I comment out the deployment of ServingRuntimes in ai-model-serving, then MicroShift starts normally (i.e. the creation of ServingRuntimes is not blocked by a webhook that is not operational yet, so the kustomizer finishes quickly, microshift.service becomes ready, and kubepods.slice is created before the CSINode creation loop kills the binary).
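As a rough sketch of that workaround, the ai-model-serving kustomization can be trimmed so the CR instances are not applied at all; the directory layout and file names below are assumptions for illustration, not the actual RPM content:

# kustomization.yaml (illustrative layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - kserve                 # CRDs, webhook configuration, controller, etc.
  # - servingruntimes      # commented out: the ServingRuntime CR instances whose
  #                        # admission would hang on the not-yet-running webhook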
Links to:
RHEA-2024:11040: Red Hat build of MicroShift 4.19.z bug fix and enhancement update