Type: Bug
Status: ASSIGNED
Resolution: Cannot Reproduce
Priority: Major
Severity: Important
Related: CNV-net-QE-253
Version-Release number of selected component (if applicable):
4.13.0
How reproducible:
Every time
Expected results:
Able to Live Migrate to any OCP worker node without losing guest network connectivity
Additional Info:
OCP 4.13.0 BareMetal
Worker/Infra nodes have 4 links in a bond configuration
OCP Virt 4.13.0
ODF 4.12
nncp & nad: cnv-bridge
The issue is as follows:
- Created a new VM (rhel8) with a dual network stack (default pod network + bridge network)
- Single rootdisk as part of the initial creation (backed by ODF Ceph RBD)
- Started the VM
- Started pinging the guest VM from a utility host
- Live migrated the guest VM to each of the 3 workers. I did this several times to make sure I hit every worker node in the cluster.
- Pinging works (it drops maybe 1-2 pings while the guest VM moves between worker nodes, but it resumes normally)
- Network inside the guest VM works (can ping other hosts on the network, as well as the gateway)
Here is where I run into issues:
- Hot-added a disk to the guest VM (blank PVC from the ODF Ceph RBD StorageClass)
- Verified the disk was added via the console
- (pinging still working)
- Initiated a Live Migrate
- Waited for the guest VM to finish migrating
- (pinging stops responding)
- Logged into the guest console and checked a few things (ip route, ip neigh, etc.)
- Issued a systemctl restart NetworkManager in the guest VM; although this succeeds, I still can't ping anything, not other hosts on the same bridge network and not even the gateway.
For pinging (or the guest VM network in general) to resume, I do another Live Migrate and hope the guest VM lands on the original worker node where I hot-added the disk. <- this part is interesting: why does the network resume only if the guest VM lands back on the worker node it was on when I hot-added the disk?
I verified this with other test VMs: I wrote down the worker node, tested the network, added one additional disk, and then Live Migrated. The network breaks until the guest VM is back on the worker node where I added the disk. A rough command-line sketch of these steps is below.
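For reference, a rough command-line equivalent of the flow above, using virtctl/oc and the names from this report (I mostly drove this from the console, so treat it as an approximation):

# Start the VM and see which worker it lands on
virtctl start dvuulvocpvmi02 -n test-vmis
oc get vmi dvuulvocpvmi02 -n test-vmis -o wide

# Live migrate a few times; ping from the utility host recovers each time
virtctl migrate dvuulvocpvmi02 -n test-vmis

# Hot-plug the blank ODF Ceph RBD volume (volume name taken from the VMI below)
virtctl addvolume dvuulvocpvmi02 -n test-vmis --volume-name=dvuulvocpvmi02-disk-little-walrus

# Migrate again; ping to the bridge-network IP (10.176.192.151) stops responding
virtctl migrate dvuulvocpvmi02 -n test-vmis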
NNCP Config:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-bond0-policy
spec:
  desiredState:
    interfaces:
      - name: br1
        description: Bond0 br1
        type: linux-bridge
        state: up
        ipv4:
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: bond0
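In case it helps triage, this is roughly how I confirm the policy above applied cleanly on every worker (standard kubernetes-nmstate objects; node name from this cluster):

# Policy and per-node enactment status
oc get nncp br1-bond0-policy
oc get nnce

# Reported network state of a worker (bond/bridge/VLAN details)
oc get nns dvuuopwkr03 -o yaml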
NAD Config:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    description: Hypervisor
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br1
  generation: 2
  name: br1-vlan192
  namespace: test-vmis
spec:
  config: >-
    {"name":"br1-vlan192","type":"cnv-bridge","cniVersion":"0.3.1","bridge":"br1","vlan":192,"macspoofchk":true,"ipam":{},"preserveDefaultVlan": false}
NetworkManager keyfiles from the worker nodes (this one from wrk1):
[connection]
id=bond0.104
uuid=912add91-19a5-4ac1-9f6c-1f137453dddd
type=vlan
interface-name=bond0.104
autoconnect=true
autoconnect-priority=1
[ethernet]
[vlan]
flags=1
id=104
parent=208a8ef4-8a95-4425-b4ad-58c7431614b9
[ipv4]
address1=10.176.104.170/22
dhcp-client-id=mac
dns=209.196.203.128;
dns-priority=40
dns-search=corp.CLIENTNAME.com;
method=manual
route1=0.0.0.0/0,10.176.107.254
route1_options=table=254
[ipv6]
addr-gen-mode=eui64
dhcp-duid=ll
dhcp-iaid=mac
method=disabled
[proxy]
----------------------------------------------------
[connection]
id=bond0
uuid=208a8ef4-8a95-4425-b4ad-58c7431614b9
type=bond
autoconnect-priority=1
autoconnect-slaves=1
interface-name=bond0
master=eda12b69-4e74-47b2-b7bf-6497855f226e
slave-type=bridge
timestamp=1685497889
[ethernet]
cloned-mac-address=3C:EC:EF:74:4D:80
[bond]
miimon=100
mode=802.3ad
[bridge-port]
vlans=2-4094
----------------------------------------------------
[connection]
id=br1
uuid=eda12b69-4e74-47b2-b7bf-6497855f226e
type=bridge
autoconnect-slaves=1
interface-name=br1
timestamp=1685726538
[ethernet]
[bridge]
stp=false
vlan-filtering=true
[ipv4]
method=disabled
[ipv6]
addr-gen-mode=default
method=disabled
[proxy]
[user]
nmstate.interface.description=Bond0 br1
----------------------------------------------------
[connection]
id=eno1np0
uuid=e469b9bd-c767-4819-80b2-5363f17ba870
type=ethernet
interface-name=eno1np0
master=208a8ef4-8a95-4425-b4ad-58c7431614b9
slave-type=bond
autoconnect=true
autoconnect-priority=1
----------------------------------------------------
[connection]
id=eno2np1
uuid=9d5a4724-54d7-4851-bd45-5262a7990908
type=ethernet
interface-name=eno2np1
master=208a8ef4-8a95-4425-b4ad-58c7431614b9
slave-type=bond
autoconnect=true
autoconnect-priority=1
----------------------------------------------------
[connection]
id=enp1s0f0
uuid=333f032e-42a8-41b3-94aa-5872ddb647e4
type=ethernet
interface-name=enp1s0f0
master=208a8ef4-8a95-4425-b4ad-58c7431614b9
slave-type=bond
autoconnect=true
autoconnect-priority=1
----------------------------------------------------
[connection]
id=enp1s0f1
uuid=3542e9b7-0bad-4e36-b734-0eff79071cac
type=ethernet
interface-name=enp1s0f1
master=208a8ef4-8a95-4425-b4ad-58c7431614b9
slave-type=bond
autoconnect=true
autoconnect-priority=1
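Since the guest network only recovers when the VM lands back on the node where the disk was hot-plugged, comparing the bridge state on a "working" vs. a "broken" worker may be useful. This is what I would run on each node (via SSH or oc debug node/...); interface names and the guest MAC are taken from the configs in this report:

# Is VLAN 192 allowed on bond0 and on the VM's tap port?
bridge vlan show

# Where is the guest's bridge-NIC MAC currently learned?
bridge fdb show br br1 | grep -i 02:bb:06:00:00:07

# Bridge details (vlan_filtering, port states)
ip -d link show br1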
VMI:
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
    vm.kubevirt.io/flavor: small
    vm.kubevirt.io/os: rhel8
    vm.kubevirt.io/workload: server
  creationTimestamp: "2023-06-06T15:42:49Z"
  finalizers:
    - kubevirt.io/virtualMachineControllerFinalize
    - foregroundDeleteVirtualMachine
  generation: 15
  labels:
    kubevirt.io/domain: dvuulvocpvmi02
    kubevirt.io/nodeName: dvuuopwkr03
    kubevirt.io/size: small
  name: dvuulvocpvmi02
  namespace: test-vmis
  ownerReferences:
    - apiVersion: kubevirt.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: VirtualMachine
      name: dvuulvocpvmi02
      uid: 4c3b425a-4e4c-4533-9184-f0680cbf185d
  resourceVersion: "12322288"
  uid: 1811fa44-e509-431c-a937-3ad32e8d127f
spec:
  domain:
    cpu:
      cores: 2
      model: host-model
      sockets: 1
      threads: 1
    devices:
      disks:
        - bootOrder: 2
          disk:
            bus: virtio
          name: rootdisk
        - disk:
            bus: scsi
          name: disk-little-walrus
      interfaces:
        - macAddress: 02:bb:06:00:00:06
          masquerade: {}
          model: virtio
          name: default
        - bridge: {}
          macAddress: 02:bb:06:00:00:07
          model: virtio
          name: nic-liable-mollusk
      networkInterfaceMultiqueue: true
      rng: {}
    features:
      acpi:
        enabled: true
    firmware:
      uuid: 5a07e466-7638-51a5-9fdd-8ab5e24aebe4
    machine:
      type: pc-q35-rhel9.2.0
    resources:
      requests:
        memory: 4Gi
  evictionStrategy: LiveMigrate
  networks:
    - name: default
      pod: {}
    - multus:
        networkName: test-vmis/br1-vlan192
      name: nic-liable-mollusk
  terminationGracePeriodSeconds: 180
  volumes:
    - dataVolume:
        name: dvuulvocpvmi02
      name: rootdisk
    - dataVolume:
        hotpluggable: true
        name: dvuulvocpvmi02-disk-little-walrus
      name: disk-little-walrus
status:
  activePods:
    abd603fb-a0ff-4bd0-bf38-ac515cf49c83: dvuuopwkr03
  conditions:
    - lastProbeTime: null
      lastTransitionTime: "2023-06-06T15:43:10Z"
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: null
      status: "True"
      type: LiveMigratable
    - lastProbeTime: "2023-06-06T15:43:31Z"
      lastTransitionTime: null
      status: "True"
      type: AgentConnected
  guestOSInfo:
    id: rhel
    kernelRelease: 4.18.0-425.19.2.el8_7.x86_64
    kernelVersion: '#1 SMP Fri Mar 17 01:52:38 EDT 2023'
    name: Red Hat Enterprise Linux
    prettyName: Red Hat Enterprise Linux 8.7 (Ootpa)
    version: "8.7"
    versionId: "8.7"
  interfaces:
    - infoSource: domain, guest-agent
      interfaceName: enp1s0
      ipAddress: 192.168.6.231
      ipAddresses:
        - 192.168.6.231
      mac: 02:bb:06:00:00:06
      name: default
      queueCount: 2
    - infoSource: domain, guest-agent
      interfaceName: enp2s0
      ipAddress: 10.176.192.151
      ipAddresses:
        - 10.176.192.151
        - fe80::bb:6ff:fe00:7
      mac: 02:bb:06:00:00:07
      name: nic-liable-mollusk
      queueCount: 2
  launcherContainerImageVersion: registry.redhat.io/container-native-virtualization/virt-launcher-rhel9@sha256:8d493a50ff05c3b9f30d3ccdd93acec3b1d7fdc07324ce4b92521c6b084496b3
  migrationMethod: LiveMigration
  migrationTransport: Unix
  nodeName: dvuuopwkr03
  phase: Running
  phaseTransitionTimestamps:
    - phase: Pending
      phaseTransitionTimestamp: "2023-06-06T15:42:49Z"
    - phase: Scheduling
      phaseTransitionTimestamp: "2023-06-06T15:42:49Z"
    - phase: Scheduled
      phaseTransitionTimestamp: "2023-06-06T15:43:10Z"
    - phase: Running
      phaseTransitionTimestamp: "2023-06-06T15:43:13Z"
  qosClass: Burstable
  runtimeUser: 107
  selinuxContext: system_u:object_r:container_file_t:s0:c714,c978
  virtualMachineRevisionName: revision-start-vm-4c3b425a-4e4c-4533-9184-f0680cbf185d-17
  volumeStatus:
    - hotplugVolume:
        attachPodName: hp-volume-7vnjd
        attachPodUID: 843df1d6-191f-4581-a1e1-4ba9daa15a49
      message: Successfully attach hotplugged volume disk-little-walrus to VM
      name: disk-little-walrus
      persistentVolumeClaimInfo:
        accessModes:
          - ReadWriteMany
        capacity:
          storage: 5Gi
        filesystemOverhead: "0"
        requests:
          storage: "5368709120"
        volumeMode: Block
      phase: Ready
      reason: VolumeReady
      target: sda
    - name: rootdisk
      persistentVolumeClaimInfo:
        accessModes:
          - ReadWriteMany
        capacity:
          storage: 100Gi
        filesystemOverhead: "0"
        requests:
          storage: "107374182400"
        volumeMode: Block
      target: vda
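One more observation from the status above: the hot-plugged volume is attached through a helper pod (hp-volume-7vnjd). I don't know whether that pod follows the VM on migration, but these are the commands I'd use to check where it and the virt-launcher pod end up after each Live Migrate (names from this report):

# Where the attach pod and the virt-launcher pod are currently running
oc get pod hp-volume-7vnjd -n test-vmis -o wide
oc get pods -n test-vmis -o wide | grep -E 'virt-launcher|hp-volume'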