OpenShift Bugs / OCPBUGS-35385

After recovering from an etcd backup, ovnkube-node pods on the lost control plane hosts are in CrashLoopBackOff state


    • No
    • SDN Sprint 259, SDN Sprint 260, SDN Sprint 261, SDN Sprint 262, SDN Sprint 263, SDN Sprint 264, SDN Sprint 265
    • 7
    • False

      Description of problem:
      In a UPI-installed OCP cluster, after recovering from an etcd backup, the ovnkube-node pods located on the lost control plane hosts are in CrashLoopBackOff state.

      1. oc get po -o wide -n openshift-ovn-kubernetes
        NAME                                     READY   STATUS             RESTARTS         AGE   IP              NODE                        NOMINATED NODE   READINESS GATES
        ovnkube-control-plane-5cdd496c9b-dbxq2   2/2     Running            0                40m   192.168.1.104   master-2.ocp4.example.com   <none>           <none>
        ovnkube-control-plane-5cdd496c9b-wb5bn   2/2     Running            0                40m   192.168.1.103   master-1.ocp4.example.com   <none>           <none>
        ovnkube-node-4cltd                       8/8     Running            1 (32m ago)      32m   192.168.1.106   worker-1.ocp4.example.com   <none>           <none>
        ovnkube-node-6dvlh                       8/8     Running            1 (33m ago)      33m   192.168.1.105   worker-0.ocp4.example.com   <none>           <none>
        ovnkube-node-6jvsj                       8/8     Running            1 (38m ago)      38m   192.168.1.102   master-0.ocp4.example.com   <none>           <none>
        ovnkube-node-8mvjw                       7/8     CrashLoopBackOff   8 (20s ago)      35m   192.168.1.104   master-2.ocp4.example.com   <none>           <none>
        ovnkube-node-z2kv8                       7/8     CrashLoopBackOff   12 (2m1s ago)    36m   192.168.1.103   master-1.ocp4.example.com   <none>           <none>
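
To pick the failing pods out of a listing like the one above, a minimal Python sketch follows. It assumes the whitespace-separated column layout shown in this report and embeds three of the rows as sample data; the helper name `unhealthy_pods` is hypothetical and not part of any OpenShift tooling.

```python
# Sample rows copied from the `oc get po -o wide` output in this report.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-node-6jvsj 8/8 Running 1 (38m ago) 38m 192.168.1.102 master-0.ocp4.example.com <none> <none>
ovnkube-node-8mvjw 7/8 CrashLoopBackOff 8 (20s ago) 35m 192.168.1.104 master-2.ocp4.example.com <none> <none>
ovnkube-node-z2kv8 7/8 CrashLoopBackOff 12 (2m1s ago) 36m 192.168.1.103 master-1.ocp4.example.com <none> <none>
"""


def unhealthy_pods(listing):
    """Return (pod, status, node) for every row whose STATUS is not Running."""
    result = []
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or truncated lines
        name, status = fields[0], fields[2]
        # RESTARTS may span several tokens ("8 (20s ago)"), so NODE is read
        # from the end: ... NODE, NOMINATED NODE, READINESS GATES.
        node = fields[-3]
        if status != "Running":
            result.append((name, status, node))
    return result


for pod in unhealthy_pods(SAMPLE):
    print(pod)
```

In this report both failing pods run on the two control plane hosts that were replaced during the restore.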

      Logs from the ovnkube-node-8mvjw pod:

      2024-06-13T03:25:14.487Z|00218|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"192.168.1.104\") for index on columns \"type\" and \"ip\". First row, with UUID 738f583e-5dde-4ed8-a447-cd9af95a0d53, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 4e73bcf3-49b8-44ab-be82-d8c3a78da4bb, was inserted by this transaction.","error":"constraint violation"}
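
The JSON payload in that log line names the table and both conflicting rows. As a rough sketch, the following Python pulls those fields out so the pre-existing row can be told apart from the newly inserted one. The function name `summarize` and the regular expressions are assumptions made for illustration; they are matched only against the message text quoted above.

```python
import json
import re

# JSON payload copied verbatim from the log line in this report.
ERROR = r'''{"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"192.168.1.104\") for index on columns \"type\" and \"ip\". First row, with UUID 738f583e-5dde-4ed8-a447-cd9af95a0d53, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 4e73bcf3-49b8-44ab-be82-d8c3a78da4bb, was inserted by this transaction.","error":"constraint violation"}'''

UUID_RE = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"


def summarize(error_json):
    """Extract the table name and the two row UUIDs from the error payload."""
    payload = json.loads(error_json)
    details = payload["details"]
    table = re.search(r'rows in "(\w+)" table', details).group(1)
    uuids = re.findall(UUID_RE, details)
    return {
        "error": payload["error"],
        "table": table,
        "existing_row": uuids[0],  # "First row ... existed in the database"
        "inserted_row": uuids[1],  # "Second row ... was inserted"
    }


print(summarize(ERROR))
```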

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:

      1. Back up etcd:
      https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#backing-up-etcd-data_backup-etcd

      2. Restore two master nodes following the documentation below:
      https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html

      Actual results:
      The ovnkube-node pods located on the lost control plane hosts are in CrashLoopBackOff state.

      Expected results:
      The cluster is restored successfully.

      Additional info:
      The issue does not occur when the new master node's IP address changes. Tested in an AWS IPI environment.
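
The dependence on the IP address matches the error message, which reports an index on the `type` and `ip` columns of the `Encap` table. The toy model below illustrates that behaviour only; it is not OVSDB's implementation. The class and function names are hypothetical, the address `192.168.1.200` is made up, and the idea that the pre-existing row was left behind by the lost host is an assumption.

```python
class ConstraintViolation(Exception):
    """Raised when an insert would duplicate an indexed (type, ip) pair."""


class EncapTable:
    """Toy table that enforces uniqueness on the (type, ip) pair."""

    def __init__(self):
        self._rows = {}  # (type, ip) -> row id

    def insert(self, row_id, encap_type, ip):
        key = (encap_type, ip)
        if key in self._rows:
            raise ConstraintViolation(
                f"{key} already used by row {self._rows[key]}"
            )
        self._rows[key] = row_id


def try_register(table, row_id, ip):
    """Return True if a geneve encap row for `ip` can be inserted."""
    try:
        table.insert(row_id, "geneve", ip)
        return True
    except ConstraintViolation:
        return False


table = EncapTable()
# Assumption: a row for the lost host is still present after the restore.
table.insert("stale-row", "geneve", "192.168.1.104")

print(try_register(table, "new-row", "192.168.1.104"))  # same IP: False
print(try_register(table, "new-row", "192.168.1.200"))  # different IP: True
```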

              jcaamano@redhat.com Jaime Caamaño Ruiz
              rhn-support-yhuang Ying Huang
              Anurag Saxena Anurag Saxena