- Bug
- Resolution: Done
- Major
- None
- 4.12
- Moderate
- None
- Rejected
- False
Description of problem:
After node reboot some pods on the rebooted node fail to start:

oc describe po -n openshift-kube-controller-manager kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com
...
Events:
  Type     Reason                  Age  From          Message
  ----     ------                  ---- ----          -------
  Warning  ErrorAddingLogicalPort  41m  controlplane  deleteLogicalPort failed for pod openshift-kube-controller-manager_kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com: cannot delete GR SNAT for pod openshift-kube-controller-manager/kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com: failed to delete SNAT rule for pod on gateway router GR_master-1.kni-qe-31.lab.eng.rdu2.redhat.com: error in transact with ops [{Op:delete Table:NAT Row:map[] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {49c251e2-d559-49b1-ad66-56e2f95f3c4e}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:delete Table:NAT Row:map[] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {ad95fc45-9360-4203-8ec5-95d79367dca1}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:delete Table:NAT Row:map[] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {61cd5b51-4b35-4808-bb8b-fda76212340b}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:delete Table:NAT Row:map[] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {7053c099-d7e1-4f55-955d-6cb36f82091e}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:nat Mutator:delete Value:{GoSet:[{GoUUID:49c251e2-d559-49b1-ad66-56e2f95f3c4e} {GoUUID:ad95fc45-9360-4203-8ec5-95d79367dca1} {GoUUID:61cd5b51-4b35-4808-bb8b-fda76212340b} {GoUUID:7053c099-d7e1-4f55-955d-6cb36f82091e}]}}] Timeout:<nil> Where:[where column _uuid == {bb579280-ea53-4d11-8c4f-d9fc6702314b}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}] results [{Count:1 Error: Details: UUID:{GoUUID:} Rows:[]} {Count:1 Error: Details: UUID:{GoUUID:} Rows:[]} {Count:1 Error: Details: UUID:{GoUUID:} Rows:[]} {Count:1 Error: Details: UUID:{GoUUID:} Rows:[]} {Count:1 Error: Details: UUID:{GoUUID:} Rows:[]} {Count:0 Error:referential integrity violation Details:cannot delete NAT row ad95fc45-9360-4203-8ec5-95d79367dca1 because of 1 remaining reference(s) UUID:{GoUUID:} Rows:[]}] and errors []: referential integrity violation: cannot delete NAT row ad95fc45-9360-4203-8ec5-95d79367dca1 because of 1 remaining reference(s)
and after some time a new error appears:

  Warning  FailedCreatePodSandBox  38m  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com_openshift-kube-controller-manager_cbc6c67a-3b3a-4441-ae7c-f433a9b56895_0(f580c2844bd2bc42ce314871f38ca69bc642b80714b2c10bc0d212cfed75bf02): error adding pod openshift-kube-controller-manager_kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-kube-controller-manager/kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com/cbc6c67a-3b3a-4441-ae7c-f433a9b56895:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-kube-controller-manager/kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com f580c2844bd2bc42ce314871f38ca69bc642b80714b2c10bc0d212cfed75bf02] [openshift-kube-controller-manager/kube-controller-manager-guard-master-1.kni-qe-31.lab.eng.rdu2.redhat.com f580c2844bd2bc42ce314871f38ca69bc642b80714b2c10bc0d212cfed75bf02] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
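For reference, the leftover NAT reference reported by the transaction can likely be confirmed directly in the OVN northbound database. This is a minimal diagnostic sketch, assuming the centralized ovnkube-master pod layout of this release (the exact ovnkube-master pod name must be taken from the cluster; container and router names are from the events above):

# find the ovnkube-master pod that holds the NB database
oc -n openshift-ovn-kubernetes get pods -o wide | grep ovnkube-master

# list the NAT rules still attached to the node's gateway router
oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c nbdb -- \
  ovn-nbctl lr-nat-list GR_master-1.kni-qe-31.lab.eng.rdu2.redhat.com

# inspect the NAT row the transaction could not delete, then check which
# Logical_Router rows still reference it in their nat column
oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c nbdb -- \
  ovn-nbctl list NAT ad95fc45-9360-4203-8ec5-95d79367dca1
oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c nbdb -- \
  ovn-nbctl --columns=_uuid,name,nat find Logical_Router | grep -B2 ad95fc45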
Version-Release number of selected component (if applicable):
4.12.0-rc.0 (updated from 4.12.0-ec.5)
How reproducible:
So far reproduced once, on the first attempt to perform the update.
Steps to Reproduce:
1. Cordon and drain the node
2. Reboot the node
3. Check the pods scheduled on the rebooted node after the node is back online (see the command sketch below)
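A possible way to run these steps from the CLI (the node name is illustrative, taken from the pod events above; the reboot can also be done via BMC or console):

oc adm cordon master-1.kni-qe-31.lab.eng.rdu2.redhat.com
oc adm drain master-1.kni-qe-31.lab.eng.rdu2.redhat.com --ignore-daemonsets --delete-emptydir-data
# reboot the node from a debug shell
oc debug node/master-1.kni-qe-31.lab.eng.rdu2.redhat.com -- chroot /host systemctl reboot
# once the node is Ready again, uncordon it and check the pods scheduled on it
oc adm uncordon master-1.kni-qe-31.lab.eng.rdu2.redhat.com
oc get pods -A -o wide --field-selector spec.nodeName=master-1.kni-qe-31.lab.eng.rdu2.redhat.com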
Actual results:
Some pods fail to start on the rebooted node
Expected results:
All pods start on the rebooted node
Additional info:
Bare-metal dual-stack cluster with schedulable masters and 2 workers (in another network), deployed with the on-premise Assisted Installer