Type: Bug
Resolution: Won't Do
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.15.z
Component/s: Networking / ovn-kubernetes
Labels:
- SDN:Platform:OVNK

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
SDN Sprint 259, SDN Sprint 260, SDN Sprint 261, SDN Sprint 262, SDN Sprint 263, SDN Sprint 264, SDN Sprint 265
sprint_count:
7

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
PX Priority Data:
PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:
In a UPI-installed OCP cluster, after restoring from an etcd backup, the ovnkube-node pods located on the lost control plane hosts are in CrashLoopBackOff state.

oc get po -o wide -n openshift-ovn-kubernetes
NAME                                     READY   STATUS             RESTARTS        AGE   IP              NODE                        NOMINATED NODE   READINESS GATES
ovnkube-control-plane-5cdd496c9b-dbxq2   2/2     Running            0               40m   192.168.1.104   master-2.ocp4.example.com   <none>           <none>
ovnkube-control-plane-5cdd496c9b-wb5bn   2/2     Running            0               40m   192.168.1.103   master-1.ocp4.example.com   <none>           <none>
ovnkube-node-4cltd                       8/8     Running            1 (32m ago)     32m   192.168.1.106   worker-1.ocp4.example.com   <none>           <none>
ovnkube-node-6dvlh                       8/8     Running            1 (33m ago)     33m   192.168.1.105   worker-0.ocp4.example.com   <none>           <none>
ovnkube-node-6jvsj                       8/8     Running            1 (38m ago)     38m   192.168.1.102   master-0.ocp4.example.com   <none>           <none>
ovnkube-node-8mvjw                       7/8     CrashLoopBackOff   8 (20s ago)     35m   192.168.1.104   master-2.ocp4.example.com   <none>           <none>
ovnkube-node-z2kv8                       7/8     CrashLoopBackOff   12 (2m1s ago)   36m   192.168.1.103   master-1.ocp4.example.com   <none>           <none>
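
To make the affected hosts explicit, the listing above can be filtered for pods that are not fully ready. This is only a minimal sketch: the helper name failing_pods is hypothetical, and it assumes the whitespace-separated column layout that `oc get po -o wide` printed in this report.

```python
# Rows copied from the `oc get po -o wide` output in this report.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-control-plane-5cdd496c9b-dbxq2 2/2 Running 0 40m 192.168.1.104 master-2.ocp4.example.com <none> <none>
ovnkube-node-6jvsj 8/8 Running 1 (38m ago) 38m 192.168.1.102 master-0.ocp4.example.com <none> <none>
ovnkube-node-8mvjw 7/8 CrashLoopBackOff 8 (20s ago) 35m 192.168.1.104 master-2.ocp4.example.com <none> <none>
ovnkube-node-z2kv8 7/8 CrashLoopBackOff 12 (2m1s ago) 36m 192.168.1.103 master-1.ocp4.example.com <none> <none>
"""


def failing_pods(oc_output: str) -> list[tuple[str, str, str]]:
    """Return (pod, status, node) for pods that are not fully ready.

    Hypothetical helper: assumes the last three fields of each row are
    NODE, NOMINATED NODE and READINESS GATES, as in the sample above.
    """
    failing = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, ready, status = fields[0], fields[1], fields[2]
        node = fields[-3]
        ready_now, ready_total = ready.split("/")
        if ready_now != ready_total or status != "Running":
            failing.append((name, status, node))
    return failing


for pod, status, node in failing_pods(SAMPLE):
    print(f"{pod} on {node}: {status}")
```

In this report, both failing pods run on the restored hosts, master-1 and master-2.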

Logs from the ovnkube-node-8mvjw pod:

2024-06-13T03:25:14.487Z|00218|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"192.168.1.104\") for index on columns \"type\" and \"ip\". First row, with UUID 738f583e-5dde-4ed8-a447-cd9af95a0d53, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 4e73bcf3-49b8-44ab-be82-d8c3a78da4bb, was inserted by this transaction.","error":"constraint violation"}
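
The key facts in the transaction error (table, conflicting IP, and the two row UUIDs) can be pulled out programmatically. The following is a minimal sketch; the function name parse_constraint_violation and the regular expressions are assumptions fitted to the wording of this one message, not a general parser for OVSDB errors.

```python
import json
import re

# JSON payload copied verbatim from the log line in this report.
ERROR = r'''{"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"192.168.1.104\") for index on columns \"type\" and \"ip\". First row, with UUID 738f583e-5dde-4ed8-a447-cd9af95a0d53, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 4e73bcf3-49b8-44ab-be82-d8c3a78da4bb, was inserted by this transaction.","error":"constraint violation"}'''


def parse_constraint_violation(raw: str) -> dict:
    """Extract table, IP and row UUIDs from the error payload.

    Hypothetical helper: the patterns match the wording seen in this
    report and may not cover other constraint-violation messages.
    """
    payload = json.loads(raw)
    details = payload["details"]
    table = re.search(r'rows in "(\w+)" table', details).group(1)
    ip = re.search(r'"(\d+\.\d+\.\d+\.\d+)"', details).group(1)
    existing, inserted = re.findall(r"with UUID ([0-9a-f-]{36})", details)
    return {
        "error": payload["error"],
        "table": table,
        "ip": ip,
        "existing_row": existing,   # row already in the database
        "inserted_row": inserted,   # row the failed transaction tried to add
    }


info = parse_constraint_violation(ERROR)
print(info)
```

Read this way, the message says that an Encap row for geneve and 192.168.1.104 already existed when the restored node tried to insert its own.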

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Back up etcd:
https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#backing-up-etcd-data_backup-etcd

2. Restore two master nodes following the documentation below:
https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html

Actual results:
The ovnkube-node pods located on the lost control plane hosts are in CrashLoopBackOff state.

Expected results:
The cluster is restored successfully.

Additional info:
The issue does not occur when the new master node's IP address changes. Tested in an AWS IPI environment.
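
One reading of this observation, based only on the error message quoted in the description, is that the unique index on the "type" and "ip" columns rejects a second row only when the IP is reused. The toy model below illustrates that reading. It is not OVN or OVSDB code: the class ToyEncapTable, the UUID placeholders, and the address 192.168.1.110 are hypothetical.

```python
class ToyEncapTable:
    """Toy table with a unique index on (type, ip).

    Illustration only; this does not reproduce OVSDB behaviour beyond
    the uniqueness rule named in the error message.
    """

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], str] = {}  # (type, ip) -> row UUID

    def insert(self, uuid: str, encap_type: str, ip: str) -> None:
        key = (encap_type, ip)
        if key in self._index:
            raise ValueError(
                f"constraint violation: {key} already used by row {self._index[key]}"
            )
        self._index[key] = uuid


table = ToyEncapTable()
# Row left behind for the lost control plane host (hypothetical UUID label).
table.insert("old-row", "geneve", "192.168.1.104")

# Replacement node that keeps the same IP: the insert is rejected.
try:
    table.insert("new-row", "geneve", "192.168.1.104")
except ValueError as exc:
    print(exc)

# Replacement node with a different (hypothetical) IP: no conflict.
table.insert("new-row", "geneve", "192.168.1.110")
```

Under this reading, the restore fails only when a replacement node reuses the address of a row that survived in the database.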

is depended on by

OCPSTRAT-989 Backup/restore for Hosted Clusters for Self-Managed HCP Part I

Closed

Assignee: Jaime Caamaño Ruiz
Reporter: Ying Huang
Need Info From: None
Contributors: None
QA Contact: Anurag Saxena
Doc Contact: None
Votes: 0
Watchers: 9
Created: 2024/06/13 3:40 AM
Updated: 2025/09/13 6:52 PM
Resolved: 2025/01/14 7:18 PM
