Type: Bug
Resolution: Unresolved
Priority: Critical
Affects Version/s: 4.20
Severity: Critical
CPU: x86_64
Environment: Production
Keywords: Customer Escalated
Description of problem:
Cluster config: ROKS 4.20 with OVN default CNI.
Observation: CNV VMs attached to a UDN lose their IP address and DHCP availability during cluster master operations (tested with a patch version update).
Version-Release number of selected component (if applicable):
ROKS 4.20.12.
CNV 4.20.3
How reproducible: 100%
Steps to Reproduce:
0. Start from a ROKS cluster with master version 4.20.12.
1. Create a namespace with the primary UDN label. In this example, the namespace is named `green`.
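A namespace manifest for step 1 might look like the following sketch; the label shown is the ovn-kubernetes primary-UDN namespace label, included here as the assumed mechanism rather than copied from the affected cluster:
```
apiVersion: v1
kind: Namespace
metadata:
  name: green
  labels:
    # marks the namespace as served by a primary user-defined network (assumed label)
    k8s.ovn.org/primary-user-defined-network: ""
```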
2. Create a layer2 UDN with the following NAD:
```
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: green-net
namespace: green
spec:
config: '{"allowPersistentIPs":true,"cniVersion":"1.0.0","joinSubnet":"100.65.0.0/16,fd99::/64","name":"cluster_udn_green-net","netAttachDefName":"green/green-net","role":"primary","subnets":"10.203.0.0/26","topology":"layer2","type":"ovn-k8s-cni-overlay"}'
```
3. Install CNV 4.20.3 and create some CentOS VMs in the `green` namespace so that they are attached to the UDN. Verify that the VMs get their IPs from the UDN over DHCP without any issue.
```
$ k get vmi -A
NAMESPACE NAME AGE PHASE IP NODENAME READY
green example1 69m Running 10.203.0.12 test-d6is4ks20q37r7i27big-gergo03022-default-00000244 True
green example2 69m Running 10.203.0.13 test-d6is4ks20q37r7i27big-gergo03022-default-0000036a True
green example3 61m Running 10.203.0.44 test-d6is4ks20q37r7i27big-gergo03022-default-00000244 True
green example4 58m Running 10.203.0.46 test-d6is4ks20q37r7i27big-gergo03022-default-0000036a True
green example5 57m Running 10.203.0.47 test-d6is4ks20q37r7i27big-gergo03022-default-0000036a True
```
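A minimal VirtualMachine manifest for step 3 might look like the following sketch; the VM name, container disk image, and the `l2bridge` core binding are assumptions for illustration, not copied from the affected cluster:
```
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example1
  namespace: green
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          interfaces:
          - name: primary-udn
            binding:
              name: l2bridge   # core binding used for primary-UDN attachment (assumed)
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        memory:
          guest: 2Gi
      networks:
      - name: primary-udn
        pod: {}                # the primary UDN takes over the default pod network
      volumes:
      - name: rootdisk
        containerDisk:
          image: quay.io/containerdisks/centos-stream:9   # illustrative image
```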
4. Update the master to a new patch level (in this case the target level is 4.20.13). Wait until the DHCP lease expires in the CentOS VMs (typically around 30 minutes).
5. Check the attached IPs again.
```
$ k get vmi -A
NAMESPACE NAME AGE PHASE IP NODENAME READY
green example1 141m Running test-d6is4ks20q37r7i27big-gergo03022-default-00000244 True
green example2 141m Running fe80::858:aff:fecb:d test-d6is4ks20q37r7i27big-gergo03022-default-0000036a True
green example3 133m Running test-d6is4ks20q37r7i27big-gergo03022-default-00000244 True
green example4 129m Running fe80::858:aff:fecb:2e test-d6is4ks20q37r7i27big-gergo03022-default-0000036a True
green example5 129m Running fe80::858:aff:fecb:2f test-d6is4ks20q37r7i27big-gergo03022-default-0000036a True
```
Actual results: VMs lose their DHCP-assigned IPv4 addresses and only link-local IPv6 addresses remain. VM logs show an unavailable DHCP service.
Expected results: DHCP is not disrupted.
Additional info:
I am not sure whether this issue is specific to this master patch update; it may be just one way to trigger it.
Restarting the VM guest OS does not recover from the issue, nor does restarting the DHCP client.
Since the VMI is owned by a VM object, deleting the VMI resolves the issue, as a new VMI comes up with a working address.
Assigning the same IP statically on the guest instead of using the DHCP client also works, suggesting that the OVS datapath is generally available and only DHCP is affected. Pods attached to the same UDN are also not disrupted, as they do not use DHCP.
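The static-IP workaround inside the guest can be sketched as a NetworkManager keyfile; the interface name, address, and gateway below are assumptions based on the UDN subnet `10.203.0.0/26` and the lease the VM previously held:
```
# /etc/NetworkManager/system-connections/eth0-static.nmconnection (hypothetical)
[connection]
id=eth0-static
type=ethernet
interface-name=eth0

[ipv4]
method=manual
# reuse the address the VM previously got over DHCP (assumed values)
addresses=10.203.0.12/26
gateway=10.203.0.1
```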
Affected Platforms:
The issue is present and reproducible on any IBM Cloud managed OpenShift (ROKS) cluster with OVN as the default CNI.