-
Bug
-
Resolution: Won't Do
-
Normal
-
None
-
Quality / Stability / Reliability
-
0.42
-
False
-
False
-
NEW
-
Moderate
-
No
Description of problem:
While testing VM migration, we noticed that the SSH connection to the VM sometimes fails.
To narrow this down, ping the VM from one of the nodes right after migration and check whether there is any packet loss.
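The ping check suggested above could be scripted roughly as follows. This is only a sketch: the helper names (parse_packet_loss, ping_packet_loss) and the default count and interval are illustrative assumptions, not part of the existing test suite, and the parser assumes the summary line format printed by Linux iputils ping.

```python
import re
import subprocess


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage from a Linux `ping` summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def ping_packet_loss(vm_ip: str, count: int = 20, interval: float = 0.2) -> float:
    """Ping vm_ip from the local node and return the observed loss percentage.

    Hypothetical helper: count and interval are arbitrary illustrative values.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-i", str(interval), vm_ip],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero when packets are lost
    )
    return parse_packet_loss(result.stdout)
```

Running ping_packet_loss against the VM IP immediately after the migration completes would show whether the SSH timeout coincides with a short network outage.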
Version-Release number of selected component (if applicable):
CNV 4.9
Steps to Reproduce:
1. Create a migratable VM with OCS storage and create an SSH service for the VM
2. Migrate VM
3. Connect via SSH
4. Pause and unpause the VM
5. Connect to the VM via SSH again
Actual results:
Once in a while the SSH connection fails; we see this frequently in the automation runs.
Additional info:
(from the mail thread)
1.
We ran an automated test to gather statistics on the issue:
For an hour, ran a loop of VM migration followed by SSH connection (after each migration, the sequence ssh_vm, pause, unpause, ssh_vm was performed 10 times):
---------------------------------------------------
import time

# vm and ping_process_in_fedora_os come from the test suite
vm = golden_image_vm_object_from_template_multi_fedora_os_multi_storage_scope_class
iter_pass = 0
iter_fail = 0
with open('test.log', 'w') as ff:
    while True:
        ff.write("----------------Migrate VM----------------\n")
        migrate_vm_and_verify(vm=vm, check_ssh_connectivity=True)
        for i in range(0, 10):
            try:
                validate_pause_unpause_linux_vm(vm=vm, pre_pause_pid=ping_process_in_fedora_os)
                iter_pass += 1
            except Exception:
                ff.write("FAIL!!!\n")
                iter_fail += 1
            time.sleep(1)
        # Log the running totals
        ff.write(f"PASSED: {iter_pass}\n")
        ff.write(f"FAILED: {iter_fail}\n")
---------------------------------------------------
1) migrate_vm_and_verify migrates the VM, checks that the migration succeeded, and checks the SSH connection
2) validate_pause_unpause_linux_vm connects via SSH and creates a ping process, pauses and unpauses the VM, then connects via SSH again and checks the process ID
(the counters track validate_pause_unpause_linux_vm only)
The result is:
PASSED: 396
FAILED: 14
Meaning:
validate_pause_unpause_linux_vm fails only occasionally, and ONLY on the first iteration after a migration (the remaining 9 iterations succeed)
SSH failure: socket.timeout: 10.1.156.18: timeout(10.0)
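To put the counts above in perspective, the failure rate can be worked out in two ways. The sketch below assumes every migration cycle completed all 10 iterations (410 attempts is consistent with 41 full cycles) and relies on the observation that failures occur only on the first iteration after a migration.

```python
passed = 396
failed = 14
iterations_per_migration = 10

attempts = passed + failed                          # 410 pause/unpause checks
migrations = attempts // iterations_per_migration   # 41, assuming complete cycles

# Rate over all pause/unpause checks
overall_failure_rate = failed / attempts
# Rate over first-iteration checks only, since only those ever failed
first_iteration_failure_rate = failed / migrations

print(f"overall: {overall_failure_rate:.1%}")
print(f"first check after migration: {first_iteration_failure_rate:.1%}")
```

Under these assumptions the overall rate is about 3.4%, but roughly one in three migrations (about 34.1%) is followed by a failed first SSH attempt, which points at the period right after migration rather than at pause/unpause itself.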
2.
Ran a loop of 400 SSH connections to the VM (without migration): all passed.
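Since plain SSH connections are stable and the timeout appears only right after migration, it may help to measure how long the SSH port stays unreachable once a migration finishes. The following is a minimal sketch using only the Python standard library; the function name time_until_port_open and its default timeouts are illustrative assumptions and not part of the test suite.

```python
import socket
import time
from typing import Optional


def time_until_port_open(
    host: str,
    port: int = 22,
    deadline: float = 60.0,
    attempt_timeout: float = 1.0,
) -> Optional[float]:
    """Retry a TCP connect until it succeeds.

    Returns the seconds elapsed until the first successful connection,
    or None if the port never opened within `deadline` seconds.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        try:
            with socket.create_connection((host, port), timeout=attempt_timeout):
                return time.monotonic() - start
        except OSError:
            # Covers timeouts and refused connections; back off briefly and retry
            time.sleep(0.1)
    return None
```

Calling this with the VM address immediately after the migration step and logging the returned value would show whether the unreachable window sometimes exceeds the 10 second SSH timeout seen in the failure.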