-
Bug
-
Resolution: Unresolved
-
Critical
-
4.21.0
-
Quality / Stability / Reliability
-
False
-
-
0
-
Important
-
uShift Sprint 279, uShift Sprint 280
-
2
Description of problem:
FIX IS BEING WORKED ON IN OVNK UPSTREAM. FOR THE TIME BEING, RECOMMENDED SOLUTION: revert and lock the ovnk image for MicroShift rebases, so ART's rebases do not overwrite it with the faulty ref.

The most recent ovn-k image is causing trouble for MicroShift. The root cause is this commit: https://github.com/openshift/ovn-kubernetes/commit/2871cdae138b10a282a3a930c5f5c516f9c523bc

The linked commit changed the code to use `RunOVNControllerAppCtl`, which has one problem: the PID file is read only once, and the same PID is then used for all 200 retries (one every 2 seconds, so about 400 seconds, almost 7 minutes, before it gives up).

For some reason, on MicroShift, ovn-controller always fails to start cleanly on first boot. After it restarts, it creates a new socket with the new PID in its filename. Because ovnkube-master does not re-read the PID file, it does not see that the PID changed and that there is a new socket to use. The old socket still exists (ovn-controller did not delete it), but nothing is listening on the other end, so ovnkube-master continuously gets a "Connection refused" error.

MicroShift becomes healthy again only after ovnkube-master runs out of retries, quits, and restarts (so in ~7 minutes).
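To make the expected behavior concrete, here is a minimal sketch of a retry loop that re-reads the PID file on every attempt instead of caching the first PID. It is written in Python purely for illustration and is not the upstream Go code or the actual fix. The helper names (`ctl_socket_path`, `call_with_fresh_pid`), the `attempt` callback, and the `ovn-controller.<pid>.ctl` socket naming are assumptions made for this example; the 200 retries and 2 second delay mirror the values described above.

```python
import tempfile
import time
from pathlib import Path


def ctl_socket_path(pid_file: Path, run_dir: Path) -> Path:
    """Build the control-socket path from the PID currently in pid_file.

    Assumption: the socket is named ovn-controller.<pid>.ctl inside run_dir.
    """
    pid = pid_file.read_text().strip()
    return run_dir / f"ovn-controller.{pid}.ctl"


def call_with_fresh_pid(pid_file, run_dir, attempt, retries=200, delay=2.0):
    """Call attempt(socket_path) until it succeeds.

    The PID file is re-read before every attempt, so a restarted
    ovn-controller (new PID, new socket) is picked up on the next retry.
    `attempt` is a hypothetical callback standing in for the real appctl call.
    """
    last_error = None
    for _ in range(retries):
        try:
            return attempt(ctl_socket_path(pid_file, run_dir))
        except (ConnectionRefusedError, FileNotFoundError) as err:
            last_error = err
            time.sleep(delay)
    raise TimeoutError(f"gave up after {retries} attempts") from last_error


if __name__ == "__main__":
    run_dir = Path(tempfile.mkdtemp())
    pid_file = run_dir / "ovn-controller.pid"
    pid_file.write_text("100\n")  # PID of the first, failed start

    def fake_appctl(sock: Path) -> str:
        # Simulate the restart: the stale socket refuses the connection,
        # and by the next attempt the PID file holds the new PID.
        if sock.name != "ovn-controller.200.ctl":
            pid_file.write_text("200\n")
            raise ConnectionRefusedError(str(sock))
        return f"connected via {sock.name}"

    print(call_with_fresh_pid(pid_file, run_dir, fake_appctl, delay=0))
```

With this shape, the stale socket costs one failed attempt rather than the full retry budget, because the second attempt already resolves the new socket path.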
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Deploy MicroShift main
2. Observe ovn-k pods & logs
Actual results:
ovn-k takes about 7 minutes to start, which prevents all other (non-hostnetwork) Pods from starting
Expected results:
ovnk notices that the PID changed and switches to the correct socket
Additional info:
See that ovnkube-master restarted 7 minutes after ovnkube-node:

# oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS      AGE
ovnkube-master-l2d7x   4/4     Running   1 (11m ago)   18m
ovnkube-node-r8m7x     1/1     Running   1 (18m ago)   18m

~7 minute log of ovnkube-master continuously reading the same socket file: https://drive.google.com/file/d/18MLEV2HkcJClWVn0Dq8Z5X0p5e3Hy_0R/view?usp=sharing

SOS report: https://drive.google.com/file/d/1omPiTfTpZ3oCpiYN4MaPELQoc_V-RE3W/view?usp=sharing
- is duplicated by
-
USHIFT-6309 Dual stack tests failing with latest 4.21 ovnk image
-
- Closed