Type: Bug
Resolution: Unresolved
Version: 2.9.3
Category: Quality / Stability / Reliability
Severity: Moderate
Description of problem:
Almost every morning the following happens in my cluster:
1. Power up the OCP cluster.
2. The RHV setup is still powered off; the vSphere setup is still powered off.
3. At some point during the day I power up the RHV setup; the RHV provider in MTV becomes ready.
4. At some point during the day I power up the vSphere setup; the vSphere provider never becomes ready unless I force a forklift-controller restart.
Version-Release number of selected component (if applicable):
2.9.3
How reproducible:
Almost always, every morning
Steps to Reproduce:
1. Power up the OCP cluster with vSphere still powered down.
2. Let it run for a while.
3. Power up vSphere.
Actual results:
The vSphere provider stays down; I need to kill the controller pod to make it try to reconnect. The logs go silent about vSphere after the very first connection attempts.
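Workaround I use today: delete the controller pod so it restarts and re-runs the connection test. The label selector below is an assumption on my part (adjust it if the forklift-controller pods carry different labels in your install):

    oc -n openshift-mtv delete pod -l app=forklift-controller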
Expected results:
The vSphere provider becomes ready once vSphere is powered back up, without requiring a controller restart.
Additional info:
1. Power up the OCP cluster; the inventory container starts:

   {"level":"info","ts":"2025-08-28 20:35:23.907","logger":"entrypoint","msg":"setting up prometheus endpoint :2112/metrics"}
   ....

2. The connection to vSphere fails (it's down, expected):

   {"level":"info","ts":"2025-08-28 20:35:24.088","logger":"provider|f9b9d","msg":"Condition added.","provider":{"name":"vmware","namespace":"openshift-mtv"},"condition":{"type":"ConnectionTestFailed","status":"True","reason":"Tested","category":"Critical","message":"dial tcp: lookup vmware-vcenter.virt.home.arpa on 172.30.0.10:53: read udp 10.129.4.39:56946->172.30.0.10:53: read: connection refused","lastTransitionTime":null}}

3. The connection to RHV fails (it's down, expected):

   {"level":"info","ts":"2025-08-28 20:35:24.088","logger":"provider","msg":"Connection test failed.","reason":"got status != 200 from oVirt"}

4. Everything is fine and expected up to this point.

5. I power up vSphere.

6. Nothing happens; the provider is still down.

7. The last vSphere connection attempt was just a retry shortly after startup. After the entries below, the logs go quiet about vSphere:

   {"level":"debug","ts":"2025-08-28 20:35:30.341","logger":"events","msg":"dial tcp 192.168.3.5:443: connect: no route to host","type":"Warning","object":{"kind":"Provider","namespace":"openshift-mtv","name":"vmware","uid":"503fec06-858c-49f9-b679-f92baa71a80a","apiVersion":"forklift.konveyor.io/v1beta1","resourceVersion":"13025367"},"reason":"ConnectionTestFailed"}
   {"level":"info","ts":"2025-08-28 20:35:30.350","logger":"provider|b75hn","msg":"Reconcile ended.","provider":{"name":"vmware","namespace":"openshift-mtv"},"reQ":0}

8. Nothing about this provider appears in the logs again; it stays down until I kill the pod to force a forklift restart.

9. If I power up RHV, everything works. It's only vSphere that shows this behaviour:

   {"level":"info","ts":"2025-08-29 00:56:29.882","logger":"provider|42nbx","msg":"Condition added.","provider":{"name":"rhv","namespace":"openshift-mtv"},"condition":{"type":"ConnectionTestSucceeded","status":"True","reason":"Tested","category":"Required","message":"Connection test, succeeded.","lastTransitionTime":null}}

So unless vSphere is up and running when forklift starts, it does not seem to retry the connection later.
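For whoever picks this up: the "Reconcile ended." entry with "reQ":0 suggests the reconciler returns without scheduling a requeue after the failed connection test, so nothing ever retests the connection. Below is a minimal sketch of that pattern with controller-runtime, assuming forklift's provider reconciler follows the standard shape; ProviderReconciler and testConnection are hypothetical stand-ins, not forklift's actual code.

    // sketch.go - illustrative only; names are hypothetical, not forklift's.
    package sketch

    import (
        "context"
        "time"

        ctrl "sigs.k8s.io/controller-runtime"
    )

    type ProviderReconciler struct{}

    func (r *ProviderReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        if err := testConnection(ctx); err != nil {
            // Returning an empty Result with a nil error here would match the
            // observed `"msg":"Reconcile ended.","reQ":0` log: controller-runtime
            // does not requeue, so the connection test never runs again until
            // some other event touches the Provider object.
            //
            //   return ctrl.Result{}, nil
            //
            // Scheduling a retry instead keeps polling until vSphere is back:
            return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
        }
        return ctrl.Result{}, nil
    }

    // testConnection is a hypothetical placeholder for the vSphere
    // connection test performed by the provider reconciler.
    func testConnection(ctx context.Context) error {
        return nil
    }

Returning the error itself would also work, since controller-runtime requeues with backoff on a non-nil error; either way, something has to reschedule the object or the test never runs again.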