OpenShift Bugs / OCPBUGS-25871

Failure in OCP Upgrade from 4.11 to 4.12 Due to Network Operator Degradation with OVN-K Network type


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • Affects Version/s: 4.11
    • Severity: Important

      Description of problem:

      During the multi-version upgrade job for OCP starting from version 4.10 with the OVNKubernetes network type on OSP 16.2, the upgrade from version 4.11 to 4.12 failed: the 'network' cluster operator is degraded.
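
      For reference, one way to confirm the degraded state and capture the operator's error message (a minimal sketch, assuming a logged-in oc session against the affected cluster):

      $ oc get clusterversion
      $ oc get co network
      $ oc get co network -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'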

      Version-Release number of selected component (if applicable):

      4.11.55 to 4.12.46
      OVNKubernetes network type
      RHOS-16.2-RHEL-8-20230510.n.1

      How reproducible:

      Always

      Steps to Reproduce:

      1. Begin the OCP upgrade process starting from version 4.10 (example upgrade commands below).
      2. Upgrade from 4.10 to 4.11.
      3. Upgrade from 4.11 to 4.12.
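
      A minimal sketch of the upgrade commands with oc, assuming the cluster follows the stable channels and the exact versions used in this run (channel names are assumptions; the CI job may drive the upgrade differently):

      # 4.10 -> 4.11
      $ oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.11"}}'
      $ oc adm upgrade --to=4.11.55
      # 4.11 -> 4.12 (the step that leaves the network operator degraded)
      $ oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.12"}}'
      $ oc adm upgrade --to=4.12.46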

      Actual results:

      The 'network' cluster operator is degraded.

      Expected results:

      A smooth upgrade from 4.11 to 4.12, with no cluster operators degraded.

      Additional info:

      $ oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.46   True        False         False      14h     
      baremetal                                  4.12.46   True        False         False      16h     
      cloud-controller-manager                   4.12.46   True        False         False      16h     
      cloud-credential                           4.12.46   True        False         False      16h     
      cluster-autoscaler                         4.12.46   True        False         False      16h     
      config-operator                            4.12.46   True        False         False      16h     
      console                                    4.12.46   True        False         False      14h     
      control-plane-machine-set                  4.12.46   True        False         False      13h     
      csi-snapshot-controller                    4.12.46   True        False         False      16h     
      dns                                        4.11.55   True        False         False      16h     
      etcd                                       4.12.46   True        False         False      16h     
      image-registry                             4.12.46   True        False         False      13h     
      ingress                                    4.12.46   True        False         False      15h     
      insights                                   4.12.46   True        False         False      16h     
      kube-apiserver                             4.12.46   True        False         False      16h     
      kube-controller-manager                    4.12.46   True        False         False      16h     
      kube-scheduler                             4.12.46   True        False         False      16h     
      kube-storage-version-migrator              4.12.46   True        False         False      14h     
      machine-api                                4.12.46   True        False         False      16h     
      machine-approver                           4.12.46   True        False         False      16h     
      machine-config                             4.11.55   True        False         False      14h     
      marketplace                                4.12.46   True        False         False      16h     
      monitoring                                 4.12.46   True        False         False      15h     
      network                                    4.11.55   True        False         True       16h     Error while updating operator configuration: could not apply (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: failed to apply / update (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: DaemonSet.apps "ovnkube-master" is invalid: [spec.template.spec.containers[1].lifecycle.preStop: Required value: must specify a handler type, spec.template.spec.containers[3].lifecycle.preStop: Required value: must specify a handler type]
      node-tuning                                4.12.46   True        False         False      13h     
      openshift-apiserver                        4.12.46   True        False         False      16h     
      openshift-controller-manager               4.12.46   True        False         False      13h     
      openshift-samples                          4.12.46   True        False         False      13h     
      operator-lifecycle-manager                 4.12.46   True        False         False      16h     
      operator-lifecycle-manager-catalog         4.12.46   True        False         False      16h     
      operator-lifecycle-manager-packageserver   4.12.46   True        False         False      16h     
      service-ca                                 4.12.46   True        False         False      16h     
      storage                                    4.12.46   True        False         False      16h  
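
      The validation failure reported by the network operator comes from the Kubernetes API: in apps/v1, a lifecycle.preStop hook must specify exactly one handler type (exec, httpGet, or tcpSocket), and the rendered ovnkube-master DaemonSet apparently carries empty preStop stanzas on two containers. One way to see which containers define preStop hooks on the currently running (still 4.11) DaemonSet, as a starting point for comparison (a diagnostic sketch, not part of the original report):

      $ oc get daemonset ovnkube-master -n openshift-ovn-kubernetes \
          -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.lifecycle.preStop}{"\n"}{end}'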
      
      
      $ oc get pods -n openshift-ovn-kubernetes 
      NAME                   READY   STATUS    RESTARTS      AGE
      ovnkube-master-9p9g4   6/6     Running   6             14h
      ovnkube-master-x4hmq   6/6     Running   6             14h
      ovnkube-master-xgbch   6/6     Running   6             14h
      ovnkube-node-25jjs     5/5     Running   0             12h
      ovnkube-node-b2qw8     5/5     Running   0             12h
      ovnkube-node-ckw65     5/5     Running   3 (12h ago)   12h
      ovnkube-node-gqssc     5/5     Running   0             12h
      ovnkube-node-l4dcl     5/5     Running   0             12h
      ovnkube-node-l5b7p     5/5     Running   0             12h
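
      The ovnkube-master pods above are still the 4.11 ones, since the 4.12 DaemonSet was rejected. Because the failing apply happens in the cluster network operator itself, its logs should show the repeated rejection; one way to pull them (a sketch, assuming the standard network-operator deployment name):

      $ oc logs -n openshift-network-operator deployment/network-operator --since=1h | grep -i ovnkube-master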
      
      $ oc logs daemonset/ovnkube-master -n openshift-ovn-kubernetes
      Found 3 pods, using pod/ovnkube-master-x4hmq
      Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
      + [[ -f /env/_master ]]
      + trap quit TERM INT
      ++ date -Iseconds
      + echo '2023-12-24T16:38:59+00:00 - starting ovn-northd'
      2023-12-24T16:38:59+00:00 - starting ovn-northd
      + wait 8
      + exec ovn-northd --no-chdir -vconsole:info -vfile:off '-vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' --ovnnb-db ssl:10.196.2.199:9641,ssl:10.196.2.200:9641,ssl:10.196.2.27:9641 --ovnsb-db ssl:10.196.2.199:9642,ssl:10.196.2.200:9642,ssl:10.196.2.27:9642 --pidfile /var/run/ovn/ovn-northd.pid --n-threads=4 -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt
      2023-12-24T16:39:00.103Z|00001|ovn_northd|INFO|Using 4 threads
      2023-12-24T16:39:00.103Z|00002|ovn_northd|INFO|OVN internal version is : [22.12.3-20.27.0-70.6]
      2023-12-24T16:39:00.104Z|00003|ovn_parallel_hmap|INFO|Setting thread count to 4
      2023-12-24T16:39:00.104Z|00004|ovn_parallel_hmap|INFO|Creating new pool with size 4
      2023-12-24T16:39:00.111Z|00005|reconnect|INFO|ssl:10.196.2.199:9641: connecting...
      2023-12-24T16:39:00.111Z|00006|reconnect|INFO|ssl:10.196.2.199:9641: connection attempt failed (Connection refused)
      2023-12-24T16:39:00.111Z|00007|reconnect|INFO|ssl:10.196.2.27:9641: connecting...
      2023-12-24T16:39:00.111Z|00008|ovn_northd|INFO|OVN NB IDL reconnected, force recompute.
      2023-12-24T16:39:00.111Z|00009|reconnect|INFO|ssl:10.196.2.27:9642: connecting...
      2023-12-24T16:39:00.111Z|00010|ovn_northd|INFO|OVN SB IDL reconnected, force recompute.
      2023-12-24T16:39:00.120Z|00011|reconnect|INFO|ssl:10.196.2.27:9641: connected
      2023-12-24T16:39:00.124Z|00012|reconnect|INFO|ssl:10.196.2.27:9642: connected
      2023-12-24T16:39:10.104Z|00013|memory|INFO|24216 kB peak resident set size after 10.0 seconds
      2023-12-24T16:50:47.153Z|00014|stream_ssl|WARN|SSL_read: unexpected SSL connection close
      2023-12-24T16:50:47.153Z|00015|jsonrpc|WARN|ssl:10.196.2.27:9642: receive error: Protocol error
      2023-12-24T16:50:47.154Z|00016|reconnect|WARN|ssl:10.196.2.27:9642: connection dropped (Protocol error)
      2023-12-24T16:50:47.154Z|00017|reconnect|INFO|ssl:10.196.2.199:9642: connecting...
      2023-12-24T16:50:47.158Z|00018|reconnect|INFO|ssl:10.196.2.199:9642: connected
      2023-12-24T16:50:47.161Z|00019|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
      2023-12-24T16:50:47.162Z|00020|ovsdb_cs|INFO|ssl:10.196.2.199:9642: clustered database server is not cluster leader; trying another server
      2023-12-24T16:50:47.162Z|00021|reconnect|INFO|ssl:10.196.2.199:9642: connection attempt timed out
      2023-12-24T16:50:47.162Z|00022|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
      2023-12-24T16:50:47.162Z|00023|reconnect|INFO|ssl:10.196.2.200:9642: connecting...
      2023-12-24T16:50:47.166Z|00024|reconnect|INFO|ssl:10.196.2.200:9642: connected
      2023-12-24T16:50:47.167Z|00025|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
      2023-12-24T16:50:47.168Z|00026|ovsdb_cs|INFO|ssl:10.196.2.200:9642: clustered database server is not cluster leader; trying another server
      2023-12-24T16:50:47.168Z|00027|reconnect|INFO|ssl:10.196.2.200:9642: connection attempt timed out
      2023-12-24T16:50:47.169Z|00028|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
      2023-12-24T16:50:47.173Z|00029|stream_ssl|WARN|SSL_read: unexpected SSL connection close
      2023-12-24T16:50:47.173Z|00030|jsonrpc|WARN|ssl:10.196.2.27:9641: receive error: Protocol error
      2023-12-24T16:50:47.173Z|00031|reconnect|WARN|ssl:10.196.2.27:9641: connection dropped (Protocol error)
      2023-12-24T16:50:47.173Z|00032|reconnect|INFO|ssl:10.196.2.200:9641: connecting...
      2023-12-24T16:50:47.184Z|00033|reconnect|INFO|ssl:10.196.2.200:9641: connected
      2023-12-24T16:50:47.187Z|00034|ovsdb_cs|INFO|ssl:10.196.2.200:9641: clustered database server is not cluster leader; trying another server
      2023-12-24T16:50:47.187Z|00035|reconnect|INFO|ssl:10.196.2.200:9641: connection attempt timed out
      2023-12-24T16:50:48.169Z|00036|reconnect|INFO|ssl:10.196.2.27:9642: connecting...
      2023-12-24T16:50:48.169Z|00037|reconnect|INFO|ssl:10.196.2.27:9642: connection attempt failed (Connection refused)
      2023-12-24T16:50:48.170Z|00038|reconnect|INFO|ssl:10.196.2.27:9642: waiting 2 seconds before reconnect
      2023-12-24T16:50:48.188Z|00039|reconnect|INFO|ssl:10.196.2.199:9641: connecting...
      2023-12-24T16:50:48.201Z|00040|reconnect|INFO|ssl:10.196.2.199:9641: connected
      2023-12-24T16:50:48.205Z|00041|ovsdb_cs|INFO|ssl:10.196.2.199:9641: clustered database server is not cluster leader; trying another server
      2023-12-24T16:50:48.205Z|00042|reconnect|INFO|ssl:10.196.2.199:9641: connection attempt timed out
      2023-12-24T16:50:48.205Z|00043|reconnect|INFO|ssl:10.196.2.199:9641: waiting 2 seconds before reconnect
      2023-12-24T16:50:50.171Z|00044|reconnect|INFO|ssl:10.196.2.199:9642: connecting...
      2023-12-24T16:50:50.176Z|00045|reconnect|INFO|ssl:10.196.2.199:9642: connected
      2023-12-24T16:50:50.179Z|00046|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
      2023-12-24T16:50:50.180Z|00047|ovsdb_cs|INFO|ssl:10.196.2.199:9642: clustered database server is not cluster leader; trying another server
      2023-12-24T16:50:50.180Z|00048|reconnect|INFO|ssl:10.196.2.199:9642: connection attempt timed out
      2023-12-24T16:50:50.181Z|00049|reconnect|INFO|ssl:10.196.2.199:9642: waiting 4 seconds before reconnect
      2023-12-24T16:50:50.181Z|00050|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
      2023-12-24T16:50:50.208Z|00051|reconnect|INFO|ssl:10.196.2.27:9641: connecting...
      2023-12-24T16:50:50.208Z|00052|reconnect|INFO|ssl:10.196.2.27:9641: connection attempt failed (Connection refused)
      2023-12-24T16:50:50.208Z|00053|reconnect|INFO|ssl:10.196.2.27:9641: waiting 4 seconds before reconnect
      2023-12-24T16:50:54.184Z|00054|reconnect|INFO|ssl:10.196.2.200:9642: connecting...
      2023-12-24T16:50:54.190Z|00055|reconnect|INFO|ssl:10.196.2.200:9642: connected
      2023-12-24T16:50:54.191Z|00056|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
      2023-12-24T16:50:54.192Z|00057|ovsdb_cs|INFO|ssl:10.196.2.200:9642: clustered database server is not cluster leader; trying another server
      2023-12-24T16:50:54.192Z|00058|reconnect|INFO|ssl:10.196.2.200:9642: connection attempt timed out
      2023-12-24T16:50:54.192Z|00059|reconnect|INFO|ssl:10.196.2.200:9642: continuing to reconnect in the background but suppressing further logging
      2023-12-24T16:50:54.192Z|00060|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
      2023-12-24T16:50:54.209Z|00061|reconnect|INFO|ssl:10.196.2.200:9641: connecting...
      2023-12-24T16:50:54.212Z|00062|reconnect|INFO|ssl:10.196.2.200:9641: connected
      2023-12-24T16:50:54.216Z|00063|ovsdb_cs|INFO|ssl:10.196.2.200:9641: clustered database server is not cluster leader; trying another server
      2023-12-24T16:50:54.216Z|00064|reconnect|INFO|ssl:10.196.2.200:9641: connection attempt timed out
      2023-12-24T16:50:54.216Z|00065|reconnect|INFO|ssl:10.196.2.200:9641: continuing to reconnect in the background but suppressing further logging
      2023-12-24T16:51:02.224Z|00066|reconnect|INFO|ssl:10.196.2.199:9641: connected
      2023-12-24T16:51:02.226Z|00067|ovsdb_cs|INFO|ssl:10.196.2.199:9641: clustered database server is not cluster leader; trying another server
      2023-12-24T16:51:10.218Z|00068|reconnect|INFO|ssl:10.196.2.199:9642: connected
      2023-12-24T16:51:18.252Z|00069|reconnect|INFO|ssl:10.196.2.200:9641: connected
      2023-12-24T16:58:32.608Z|00070|ovsdb_cs|INFO|ssl:10.196.2.199:9642: clustered database server is not cluster leader; trying another server
      2023-12-24T16:58:32.609Z|00071|reconnect|INFO|ssl:10.196.2.200:9642: connecting...
      2023-12-24T16:58:32.622Z|00072|reconnect|INFO|ssl:10.196.2.200:9642: connected
      2023-12-24T16:58:32.627Z|00073|ovsdb_cs|INFO|ssl:10.196.2.200:9642: clustered database server is not cluster leader; trying another server
      2023-12-24T16:58:32.627Z|00074|reconnect|INFO|ssl:10.196.2.200:9642: connection attempt timed out
      2023-12-24T16:58:32.630Z|00075|reconnect|INFO|ssl:10.196.2.27:9642: connecting...
      2023-12-24T16:58:32.635Z|00076|reconnect|INFO|ssl:10.196.2.27:9642: connected
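
      The northd log above mostly shows it tracking the NB/SB raft leaders across the three database members rather than a hard failure. To check whether the OVN database raft clusters themselves are healthy, one option is to query cluster/status from the nbdb and sbdb containers (pod and container names taken from the outputs above; the control-socket paths assume the standard ovnkube-master image layout):

      $ oc exec -n openshift-ovn-kubernetes ovnkube-master-x4hmq -c nbdb -- \
          ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
      $ oc exec -n openshift-ovn-kubernetes ovnkube-master-x4hmq -c sbdb -- \
          ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound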
      

            Assignee: Jamo Luhrsen
            Reporter: Yaakov Khodorkovski
            QA Contact: Anurag Saxena
            Votes: 0
            Watchers: 4