  1. Red Hat Advanced Cluster Management
  2. ACM-7899

Submariner 0.16.0 - Upgrade of cluster ocp 4.13 OVN to 4.14 OVN-K fails

    • Submariner Sprint 2023-12, Submariner Sprint 2023-13
    • Important
    • No

      Description of problem:

      ACM 2.9 / Submariner 0.16.0

      When a cluster with Submariner deployed is upgraded from OCP 4.13 (OVN) to 4.14 (OVN-K), the upgrade flow fails.

      Note: this bug is similar to one that has already been fixed:
      https://issues.redhat.com/browse/ACM-7469
      This time, however, the bug does not occur in a fresh deployment, only during an upgrade.

      An upgrade of the cluster in the same topology without Submariner succeeded with no issues.

      How reproducible:

      Deploy an ACM 2.9 / Submariner 0.16.0 environment.
      One of the clusters should run OCP 4.13 with the OVN CNI.
      Install Submariner.
      Start an upgrade of the OCP cluster from 4.13 to 4.14, which should upgrade the CNI to OVN-K.
      The cluster upgrade flow fails.

      Additional info:

      The following error appears in the nbdb container of the ovnkube-node-g2hc9 pod in the openshift-ovn-kubernetes namespace:

      2023-10-11T10:03:03.486Z|00162|socket_util_unix|WARN|unlinking "/var/run/ovn/ovnnb_db.sock": Is a directory
      2023-10-11T10:03:03.486Z|00163|fatal_signal|WARN|could not unlink "/var/run/ovn/ovnnb_db.sock" (Is a directory)
      2023-10-11T10:03:03.486Z|00164|stream_unix|ERR|/var/run/ovn/ovnnb_db.sock: binding failed: Is a directory
      2023-10-11T10:03:03.486Z|00165|ovsdb_jsonrpc_server|ERR|Dropped 23 log messages in last 57 seconds (most recently, 2 seconds ago) due to excessive rate
      2023-10-11T10:03:03.486Z|00166|ovsdb_jsonrpc_server|ERR|punix:/var/run/ovn/ovnnb_db.sock: listen failed: Is a directory
      ++ quit
      +++ date -Iseconds
      2023-10-11T10:03:04+00:00 - stopping nbdb
      ++ echo '2023-10-11T10:03:04+00:00 - stopping nbdb'
      ++ /usr/share/ovn/scripts/ovn-ctl stop_nb_ovsdb
      2023-10-11T10:03:04.138Z|00167|socket_util_unix|WARN|unlinking "/var/run/ovn/ovnnb_db.sock": Is a directory
      2023-10-11T10:03:04.138Z|00168|fatal_signal|WARN|could not unlink "/var/run/ovn/ovnnb_db.sock" (Is a directory)
      2023-10-11T10:03:04.138Z|00169|stream_unix|ERR|/var/run/ovn/ovnnb_db.sock: binding failed: Is a directory
      2023-10-11T10:03:04.139Z|00170|socket_util_unix|WARN|unlinking "/var/run/ovn/ovnnb_db.sock": Is a directory
      2023-10-11T10:03:04.139Z|00171|fatal_signal|WARN|could not unlink "/var/run/ovn/ovnnb_db.sock" (Is a directory)
      2023-10-11T10:03:04.139Z|00172|stream_unix|ERR|/var/run/ovn/ovnnb_db.sock: binding failed: Is a directory
      2023-10-11T10:03:04.153Z|00173|socket_util_unix|WARN|unlinking "/var/run/ovn/ovnnb_db.sock": Is a directory
      2023-10-11T10:03:04.153Z|00174|fatal_signal|WARN|could not unlink "/var/run/ovn/ovnnb_db.sock" (Is a directory)
      2023-10-11T10:03:04.153Z|00175|stream_unix|ERR|/var/run/ovn/ovnnb_db.sock: binding failed: Is a directory
      Exiting ovnnb_db (374066).
      [1]+  Done                    exec /usr/share/ovn/scripts/ovn-ctl ${OVN_ARGS} --ovn-nb-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m" run_nb_ovsdb
      +++ date -Iseconds
      2023-10-11T10:03:04+00:00 - nbdb stopped
      ++ echo '2023-10-11T10:03:04+00:00 - nbdb stopped'
      ++ rm -f /var/run/ovn/ovnnb_db.pid
      ++ exit 0 
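
      The log above shows nbdb failing to bind its Unix socket because the socket path is a directory. A minimal sketch of how that condition could be confirmed is given below. The helper name classify_path is hypothetical and not part of OVN or Submariner, and the script is assumed to run somewhere /var/run/ovn is visible (for example on the affected node).

```python
import os
import stat


def classify_path(path: str) -> str:
    """Return 'socket', 'directory', 'missing', or 'other' for the given path.

    Hypothetical helper for illustration only.
    """
    try:
        # lstat is used so that a symlink is reported as 'other'
        # instead of being followed to its target.
        mode = os.lstat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    # Path taken from the log above.
    print(classify_path("/var/run/ovn/ovnnb_db.sock"))
```

      On a node in the failing state described here, the result would be expected to be "directory" rather than "socket".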

      The following logs are attached to the bug:

      • Cluster must-gather
      • YAML file of the "ovnkube-node" daemonset
      • Log files of all containers in the "ovnkube-node" pod
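
      When going through the attached container logs, it may help to separate the "Is a directory" records from the surrounding shell trace. The sketch below assumes the pipe-separated layout seen in the nbdb excerpt (timestamp|sequence|module|level|message). The names LogRecord, parse_line and count_is_a_directory are hypothetical helpers written for this illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    sequence: int
    module: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one 'timestamp|seq|module|level|message' line, or return None."""
    # Split at most four times so that any '|' inside the message is kept.
    parts = line.strip().split("|", 4)
    if len(parts) != 5 or not parts[1].isdigit():
        # Shell trace lines such as '++ quit' do not match the layout.
        return None
    timestamp, sequence, module, level, message = parts
    return LogRecord(timestamp, int(sequence), module, level, message)


def count_is_a_directory(lines: Iterable[str]) -> Counter:
    """Count 'Is a directory' records per (module, level) pair."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and "Is a directory" in record.message:
            counts[(record.module, record.level)] += 1
    return counts
```

      Passing the lines of a saved nbdb log file to count_is_a_directory gives a quick view of which modules report the failure and at what level.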

            skitt@redhat.com Stephen Kitt
            mbabushk@redhat.com Maxim Babushkin
            Maxim Babushkin
            ACM QE Team