Details
-
Bug
-
Resolution: Obsolete
-
Undefined
-
None
-
4.8
-
Moderate
-
Unspecified
Description
OCP Version: 4.8.36
Description of problem:
A customer with an already installed and running production cluster has OpenShift nodes with 10.88.x.x IP addresses. When they ran a test container via podman to run an fio disk test, podman created a cni-podman0 interface with a 10.88.0.0/16 route on the node. Because they ran the test on all the control plane nodes, those nodes lost connectivity to the infra nodes. Investigation showed that the interface's network range is hardcoded in /etc/cni/net.d/87-podman-bridge.conflist.
The interface persisted after the container exited and was removed.
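The range podman will claim can be checked on a node before any container is run by reading it from the bridge configuration. A minimal sketch, assuming the file stores its range as a JSON "subnet" key; the function name list_cni_subnets is hypothetical and not part of podman or OpenShift:

```shell
# List every subnet declared in a CNI configuration file.
# Assumption: subnets appear as JSON pairs of the form "subnet": "<cidr>".
list_cni_subnets() {
    # $1 is the path to the configuration file.
    grep -o '"subnet"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
        sed 's/.*"\([^"]*\)"$/\1/'   # keep only the quoted CIDR value
}
```

On a node, after `chroot /host`, running `list_cni_subnets /etc/cni/net.d/87-podman-bridge.conflist` would be expected to print the 10.88.0.0/16 range reported above.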
To reproduce:
$ for master in $( oc get nodes -l node-role.kubernetes.io/master -oname | awk -F/ '{print $2}' ) ; do echo $master ; oc debug node/$master -- chroot /host podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf &>/tmp/${master}_fio.lst ; done
master-0.odile.redhat.com
master-1.odile.redhat.com
master-2.odile.redhat.com
$ oc debug node/master-0.odile.redhat.com
Starting pod/master-0odileredhatcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.93.98
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.0.95.254 0.0.0.0 UG 0 0 0 br-ex
10.0.88.0 0.0.0.0 255.255.248.0 U 0 0 0 br-ex
10.88.0.0 0.0.0.0 255.255.0.0 U 0 0 0 cni-podman0
10.128.0.0 0.0.0.0 255.255.254.0 U 0 0 0 ovn-k8s-mp0
10.128.0.0 10.128.0.1 255.252.0.0 UG 0 0 0 ovn-k8s-mp0
169.254.169.0 10.0.95.254 255.255.255.252 UG 0 0 0 br-ex
169.254.169.3 10.128.0.1 255.255.255.255 UGH 0 0 0 ovn-k8s-mp0
169.254.169.254 10.0.88.10 255.255.255.255 UGH 0 0 0 br-ex
172.30.0.0 10.0.95.254 255.255.0.0 UG 0 0 0 br-ex
If the customer has OCP nodes with IP addresses that fall within 10.88.0.0/16, the cluster nodes will no longer be able to communicate with them once the cni-podman0 interface and its 10.88.0.0/16 route are created.
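Whether a given node address is at risk can be checked before running podman on the cluster. The sketch below tests an IPv4 address against a CIDR range using plain shell arithmetic; the helper names ip_to_int and in_cidr are hypothetical, and the code assumes well-formed dotted-quad input without leading zeros:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if address $1 lies inside CIDR $2, e.g. 10.88.0.0/16.
in_cidr() {
    ip_int=$(ip_to_int "$1")
    net_int=$(ip_to_int "${2%/*}")   # network part, before the slash
    prefix=${2#*/}                   # prefix length, after the slash
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    fi
    [ $(( ip_int & mask )) -eq $(( net_int & mask )) ]
}
```

For example, `in_cidr 10.0.93.98 10.88.0.0/16` fails, while any 10.88.x.x node address succeeds and so would be shadowed by the cni-podman0 route. Node addresses to test can be taken from the INTERNAL-IP column of `oc get nodes -o wide`.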
Additional info:
I looked in the documentation for any statement or warning against using 10.88.0.0/16 for cluster nodes, but could not find one. I also could not find whether changing the interface's configuration is supported, or whether such a change would persist if made by a customer on an OpenShift cluster.
This BZ [0] seems related, although my customer did not encounter the issue during installation, only on an already running cluster when following the steps in [1] to run the fio test.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1723798
[1] https://access.redhat.com/solutions/4885641