Type: Bug
Resolution: Not a Bug
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.12
Component/s: Networking / ovn-kubernetes
Labels: None
Severity: Moderate
Regression: None
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
Target Version: 4.12


Description of problem:

The leader reported by oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations' does not match the northbound DB leader reported by oc exec -n openshift-ovn-kubernetes ovnkube-master-hc4tc -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound

Version-Release number of selected component (if applicable):

I tested this against 4.12, but I assume it affects other versions as well.

How reproducible:

Intermittent. The mismatch does not occur every time, but I have observed it twice on a 4.12 cluster.

Steps to Reproduce:

1. Run:
oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations'
It reports:
{
  "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"ip-10-0-223-230.us-west-1.compute.internal\",\"leaseDurationSeconds\":60,\"acquireTime\":\"2023-01-10T09:18:16Z\",\"renewTime\":\"2023-01-10T10:53:33Z\",\"leaderTransitions\":2}"
}
So this output indicates that ip-10-0-223-230.us-west-1.compute.internal is the leader. The pod running on that node, ovnkube-master-hc4tc, is the supposed leader in the listing below:

$ oc get po -o wide -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS       AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
ovnkube-master-2msln   6/6     Running   1 (111m ago)   117m   10.0.159.67    ip-10-0-159-67.us-west-1.compute.internal    <none>           <none>
ovnkube-master-hc4tc   6/6     Running   0              117m   10.0.223.230   ip-10-0-223-230.us-west-1.compute.internal   <none>           <none>
ovnkube-master-w7p9l   6/6     Running   1 (103m ago)   117m   10.0.162.177   ip-10-0-162-177.us-west-1.compute.internal   <none>           <none>
ovnkube-node-4ggb2     5/5     Running   0              117m   10.0.159.67    ip-10-0-159-67.us-west-1.compute.internal    <none>           <none>
ovnkube-node-54wmz     5/5     Running   0              108m   10.0.146.216   ip-10-0-146-216.us-west-1.compute.internal   <none>           <none>
ovnkube-node-7j7rl     5/5     Running   0              117m   10.0.162.177   ip-10-0-162-177.us-west-1.compute.internal   <none>           <none>
ovnkube-node-j2tqd     5/5     Running   0              107m   10.0.171.199   ip-10-0-171-199.us-west-1.compute.internal   <none>           <none>
ovnkube-node-k4fxw     5/5     Running   1 (108m ago)   108m   10.0.212.66    ip-10-0-212-66.us-west-1.compute.internal    <none>           <none>
ovnkube-node-srks9     5/5     Running   0              117m   10.0.223.230   ip-10-0-223-230.us-west-1.compute.internal   <none>           <none>
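
The holderIdentity from step 1 can also be pulled out programmatically, which makes it easier to compare against other outputs. The following is a minimal Python sketch, assuming the ConfigMap JSON has the shape shown above, with the annotation value being a JSON document stored as a string. The names holder_identity and SAMPLE_CONFIGMAP are hypothetical and exist only for illustration; the sample data is copied from the annotation in step 1.

```python
import json

# Annotation key as it appears in the ConfigMap output in step 1.
LEADER_ANNOTATION = "control-plane.alpha.kubernetes.io/leader"


def holder_identity(configmap: dict) -> str:
    """Return the holderIdentity recorded in the leader annotation.

    `configmap` is the parsed output of `oc get cm ... -o json`.
    The annotation value is a JSON string, so it is decoded a second time.
    """
    raw = configmap["metadata"]["annotations"][LEADER_ANNOTATION]
    record = json.loads(raw)
    return record["holderIdentity"]


# Hypothetical sample mirroring the annotation reported in step 1.
SAMPLE_CONFIGMAP = {
    "metadata": {
        "annotations": {
            LEADER_ANNOTATION: json.dumps(
                {
                    "holderIdentity": "ip-10-0-223-230.us-west-1.compute.internal",
                    "leaseDurationSeconds": 60,
                    "acquireTime": "2023-01-10T09:18:16Z",
                    "renewTime": "2023-01-10T10:53:33Z",
                    "leaderTransitions": 2,
                }
            )
        }
    }
}

print(holder_identity(SAMPLE_CONFIGMAP))
```

On a live cluster, the same function could be fed the result of json.loads applied to the output of the oc get cm command from step 1.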

2. I ran a status check on this pod. It reports its role as follower and the leader as 9155, which is 10.0.159.67. That suggests the leader is ovnkube-master-2msln, so the two outputs look inconsistent and one of them appears to be reporting the leader incorrectly.
$ oc exec -n openshift-ovn-kubernetes ovnkube-master-hc4tc -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
12c4
Name: OVN_Northbound
Cluster ID: cecc (cecc7ea8-9fbc-457d-889d-c01fd278aae5)
Server ID: 12c4 (12c43ba4-90bc-48b2-9cd5-d599266b20f6)
Address: ssl:10.0.223.230:9643
Status: cluster member
Role: follower
Term: 2
Leader: 9155
Vote: unknown

Election timer: 10000
Log: [2, 2333]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-9155 <-7c91 ->7c91
Disconnections: 0
Servers:
    7c91 (7c91 at ssl:10.0.162.177:9643) last msg 7114020 ms ago
    12c4 (12c4 at ssl:10.0.223.230:9643) (self)
    9155 (9155 at ssl:10.0.159.67:9643) last msg 2486 ms ago

According to this output, 10.0.159.67 is the leader, which does not match the output of oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations'
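
Mapping the Leader ID to an address by eye is error-prone, so the lookup done in step 2 can be sketched in Python. This assumes the cluster/status text has the layout shown above (a "Role:" line, a "Leader:" line, and a "Servers:" list with entries of the form "ID (ID at ADDRESS)"). The function name parse_cluster_status and the SAMPLE_STATUS constant are hypothetical; the sample text is an excerpt of the output above.

```python
import re

# Matches server entries such as "9155 (9155 at ssl:10.0.159.67:9643) ...".
SERVER_LINE = re.compile(r"^([0-9a-f]+) \(\1 at (\S+?)\)")


def parse_cluster_status(text: str) -> dict:
    """Extract the local role, the leader ID, and the leader's address."""
    role = None
    leader_id = None
    servers = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Role:"):
            role = line.split(":", 1)[1].strip()
        elif line.startswith("Leader:"):
            leader_id = line.split(":", 1)[1].strip()
        else:
            match = SERVER_LINE.match(line)
            if match:
                servers[match.group(1)] = match.group(2)
    return {
        "role": role,
        "leader_id": leader_id,
        # None if the leader ID is not present in the Servers list.
        "leader_address": servers.get(leader_id),
    }


# Excerpt of the cluster/status output from step 2.
SAMPLE_STATUS = """\
Name: OVN_Northbound
Server ID: 12c4 (12c43ba4-90bc-48b2-9cd5-d599266b20f6)
Address: ssl:10.0.223.230:9643
Status: cluster member
Role: follower
Term: 2
Leader: 9155
Servers:
    7c91 (7c91 at ssl:10.0.162.177:9643) last msg 7114020 ms ago
    12c4 (12c4 at ssl:10.0.223.230:9643) (self)
    9155 (9155 at ssl:10.0.159.67:9643) last msg 2486 ms ago
"""

status = parse_cluster_status(SAMPLE_STATUS)
print(status["role"], status["leader_id"], status["leader_address"])
```

The resulting leader address can then be matched against the IP column of oc get po -o wide to find the corresponding ovnkube-master pod.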

Actual results:

The two commands report different pods as the leader.

Expected results:

I would expect both commands to report the same leader.

Additional info:

NOTE: That command runs as a readiness probe in the ovnkube-master pods.  You can see it like this: oc get pods/ovnkube-master-jljfs -n openshift-ovn-kubernetes -o json | jq '.spec.containers[] | select(.name=="nbdb") | .readinessProbe'

Assignee: Ben Bennett

Reporter: Kevin Quinn

QA Contact: Anurag Saxena

Votes: 0

Watchers: 4

Created: 2023/01/10 11:17 AM

Updated: 2023/01/13 11:37 AM

Resolved: 2023/01/13 11:37 AM
