OpenShift Bugs: OCPBUGS-49433

nmstate: after reboot wait-for-br-ex-up.service stuck

    • Moderate
    • No
    • False
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-46072. The following is the description of the original issue:

      Description of problem:

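      With a balance-slb bond configured through nmstate, a node got stuck during reboot: wait-for-br-ex-up.service stays in the running state and the jobs queued behind it keep waiting.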

      [root@master-1 core]# systemctl list-jobs
      JOB UNIT                                 TYPE  STATE
      307 wait-for-br-ex-up.service            start running
      341 afterburn-checkin.service            start waiting
      187 multi-user.target                    start waiting
      186 graphical.target                     start waiting
      319 crio.service                         start waiting
      292 kubelet.service                      start waiting
      332 afterburn-firstboot-checkin.service  start waiting
      306 node-valid-hostname.service          start waiting
      293 kubelet-dependencies.target          start waiting
      321 systemd-update-utmp-runlevel.service start waiting
      
      
      systemctl status wait-for-br-ex-up.service
      Dec 10 20:11:39 master-1.ostest.test.metalkube.org systemd[1]: Starting Wait for br-ex up event from NetworkManager...
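
The job list above has a single job in the running state with everything else waiting. A minimal sketch for picking the running job(s) out of `systemctl list-jobs` output, assuming the four-column JOB/UNIT/TYPE/STATE layout shown above; the helper name `running_jobs` is hypothetical:

```shell
# Hypothetical helper: print the units whose job state is "running".
# Reads `systemctl list-jobs` output on stdin and assumes the columns
# JOB UNIT TYPE STATE, as in the output above. The header line and any
# trailing summary line are skipped because their first field is not numeric.
running_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $4 == "running" { print $2 }'
}

# Usage on a live node:
#   systemctl list-jobs --no-legend | running_jobs
```

Applied to the list above, this would print only wait-for-br-ex-up.service, which is the job the rest of the boot is queued behind.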
      
          

      Version-Release number of selected component (if applicable):

      4.18.0-0.nightly-2024-12-04-113014
      

      How reproducible:

      Sometimes

      Steps to Reproduce:

      1. Create an nmstate config:

      interfaces:
        - name: bond0
          type: bond
          state: up
          copy-mac-from: eno2
          ipv4:
            enabled: false
          link-aggregation:
            mode: balance-xor
            options:
              xmit_hash_policy: vlan+srcmac
              balance-slb: 1
            port:
              - eno2
              - eno3
        - name: br-ex
          type: ovs-bridge
          state: up
          ipv4:
            enabled: false
            dhcp: false
          ipv6:
            enabled: false
            dhcp: false
          bridge:
            port:
              - name: bond0
              - name: br-ex
        - name: br-ex
          type: ovs-interface
          state: up
          copy-mac-from: eno2
          ipv4:
            enabled: true
            address:
              - ip: "192.168.111.111"
                prefix-length: 24
          ipv6:
            enabled: false
            dhcp: false
        - name: eno1
          type: interface
          state: up
          ipv4:
            enabled: false
          ipv6:
            enabled: false
      dns-resolver:
        config:
          server:
            - 192.168.111.1
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: 192.168.111.1
            next-hop-interface: br-ex
      

      2. Reboot the node.

      Actual results:

      systemctl status wait-for-br-ex-up.service
      Dec 10 20:11:39 master-1.ostest.test.metalkube.org systemd[1]: Starting Wait for br-ex up event from NetworkManager...
      

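      bond0 fails to come up (NO-CARRIER, state DOWN) and the network is left in an inconsistent state: eno3 holds a DHCP address and is not attached to bond0, while br-ex carries the static address.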

      [root@master-1 core]# ip -c a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 90:e2:ba:ca:9f:28 brd ff:ff:ff:ff:ff:ff
          altname enp181s0f0
          inet6 fe80::92e2:baff:feca:9f28/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 30:d0:42:56:66:bb brd ff:ff:ff:ff:ff:ff
          altname enp23s0f0
      4: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 90:e2:ba:ca:9f:29 brd ff:ff:ff:ff:ff:ff
          altname enp181s0f1
          inet6 fe80::92e2:baff:feca:9f29/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      5: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
          link/ether 30:d0:42:56:66:bc brd ff:ff:ff:ff:ff:ff
          altname enp23s0f1
      6: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 30:d0:42:56:66:bd brd ff:ff:ff:ff:ff:ff
          altname enp23s0f2
          inet 192.168.111.34/24 brd 192.168.111.255 scope global dynamic noprefixroute eno3
             valid_lft 3576sec preferred_lft 3576sec
          inet6 fe80::32d0:42ff:fe56:66bd/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      7: eno4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 30:d0:42:56:66:be brd ff:ff:ff:ff:ff:ff
          altname enp23s0f3
      8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 56:92:14:97:ed:10 brd ff:ff:ff:ff:ff:ff
      9: ovn-k8s-mp0: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
          link/ether ae:b9:9e:dc:17:d1 brd ff:ff:ff:ff:ff:ff
      10: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
          link/ether e6:68:4d:df:e0:bd brd ff:ff:ff:ff:ff:ff
          inet6 fe80::e468:4dff:fedf:e0bd/64 scope link
             valid_lft forever preferred_lft forever
      11: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
          link/ether 32:5b:1f:35:ce:f5 brd ff:ff:ff:ff:ff:ff
      12: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue master ovs-system state DOWN group default qlen 1000
          link/ether aa:c8:8c:e3:71:aa brd ff:ff:ff:ff:ff:ff
      13: bond0.104@bond0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master ovs-system state LOWERLAYERDOWN group default qlen 1000
          link/ether aa:c8:8c:e3:71:aa brd ff:ff:ff:ff:ff:ff
      14: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          link/ether 30:d0:42:56:66:bd brd ff:ff:ff:ff:ff:ff
          inet 192.168.111.111/24 brd 192.168.111.255 scope global noprefixroute br-ex
             valid_lft forever preferred_lft forever
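
To spot the links without carrier in a long listing like the one above, a small filter can help. This is a sketch only, assuming the standard `ip a` line layout (index, name, flags); the helper name `no_carrier_ifaces` is hypothetical:

```shell
# Hypothetical helper: list interfaces whose flags include NO-CARRIER.
# Reads `ip a` (or `ip link`) output on stdin. Only the header line of each
# interface (first field like "12:") is inspected; address lines are ignored.
no_carrier_ifaces() {
    awk '$1 ~ /^[0-9]+:$/ && $3 ~ /NO-CARRIER/ {
        name = $2
        sub(/:$/, "", name)    # drop the trailing colon
        sub(/@.*$/, "", name)  # drop the "@parent" suffix of VLAN links
        print name
    }'
}

# Usage on a live node (plain `ip a`, without -c, so that no color
# escape codes end up in the piped output):
#   ip a | no_carrier_ifaces
```

On the listing above this would report eno2, bond0, and bond0.104.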
      
         

      Expected results:

      System reboots correctly.

      Additional info:

      Bringing br-ex down and back up re-generates the event that the service is waiting for:

      [root@master-1 core]# nmcli device down br-ex ; nmcli device up br-ex
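
The manual step above could be wrapped in a guard so that br-ex is only bounced while the wait service is still starting. This is a hedged workaround sketch, not a fix for the underlying bug; the function name `bounce_br_ex_if_stuck` is hypothetical, and it assumes `systemctl is-active` reports `activating` for a unit that is stuck in start:

```shell
# Hypothetical workaround sketch: if wait-for-br-ex-up.service is still
# starting, take br-ex down and up again so that a fresh "up" event is
# generated (per the observation above). Does nothing in any other state.
bounce_br_ex_if_stuck() {
    # is-active exits non-zero for anything but "active"; ignore the status
    # and look at the printed state instead.
    state=$(systemctl is-active wait-for-br-ex-up.service) || true
    if [ "$state" = "activating" ]; then
        nmcli device down br-ex
        nmcli device up br-ex
    fi
}

# Usage (as root on the affected node):
#   bounce_br_ex_if_stuck
```

Checking the unit state first avoids interrupting br-ex on a node that booted normally.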
      
      

              bnemec@redhat.com Benjamin Nemec
              openshift-crt-jira-prow OpenShift Prow Bot
              Ross Brattain Ross Brattain