-
Bug
-
Resolution: Not a Bug
-
Undefined
-
None
-
4.12
-
None
-
Moderate
-
None
-
Rejected
-
False
-
Description of problem:
The leader reported by oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations' is at odds with the northbound DB leader reported by oc exec -n openshift-ovn-kubernetes ovnkube-master-hc4tc -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
Version-Release number of selected component (if applicable):
I tested this against 4.12, but I assume it affects other versions.
How reproducible:
Intermittent. The mismatch does not occur every time, but I have observed it twice on a 4.12 cluster.
Steps to Reproduce:
1. Run:
   oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations'
   It reports:
   {
     "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"ip-10-0-223-230.us-west-1.compute.internal\",\"leaseDurationSeconds\":60,\"acquireTime\":\"2023-01-10T09:18:16Z\",\"renewTime\":\"2023-01-10T10:53:33Z\",\"leaderTransitions\":2}"
   }
   This output indicates that ip-10-0-223-230.us-west-1.compute.internal is the leader. The supposed leader is the ovnkube-master-hc4tc pod in the listing below:
   $ oc get po -o wide -n openshift-ovn-kubernetes
   NAME                  READY   STATUS    RESTARTS       AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
   ovnkube-master-2msln  6/6     Running   1 (111m ago)   117m   10.0.159.67    ip-10-0-159-67.us-west-1.compute.internal    <none>           <none>
   ovnkube-master-hc4tc  6/6     Running   0              117m   10.0.223.230   ip-10-0-223-230.us-west-1.compute.internal   <none>           <none>
   ovnkube-master-w7p9l  6/6     Running   1 (103m ago)   117m   10.0.162.177   ip-10-0-162-177.us-west-1.compute.internal   <none>           <none>
   ovnkube-node-4ggb2    5/5     Running   0              117m   10.0.159.67    ip-10-0-159-67.us-west-1.compute.internal    <none>           <none>
   ovnkube-node-54wmz    5/5     Running   0              108m   10.0.146.216   ip-10-0-146-216.us-west-1.compute.internal   <none>           <none>
   ovnkube-node-7j7rl    5/5     Running   0              117m   10.0.162.177   ip-10-0-162-177.us-west-1.compute.internal   <none>           <none>
   ovnkube-node-j2tqd    5/5     Running   0              107m   10.0.171.199   ip-10-0-171-199.us-west-1.compute.internal   <none>           <none>
   ovnkube-node-k4fxw    5/5     Running   1 (108m ago)   108m   10.0.212.66    ip-10-0-212-66.us-west-1.compute.internal    <none>           <none>
   ovnkube-node-srks9    5/5     Running   0              117m   10.0.223.230   ip-10-0-223-230.us-west-1.compute.internal   <none>           <none>
2. I ran a status check on this pod. It says its role is follower and reports Leader: 9155, which is 10.0.159.67, which suggests the leader is ovnkube-master-2msln. That looks inconsistent; one of the two commands is reporting the leader incorrectly.
   $ oc exec -n openshift-ovn-kubernetes ovnkube-master-hc4tc -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
   Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
   12c4
   Name: OVN_Northbound
   Cluster ID: cecc (cecc7ea8-9fbc-457d-889d-c01fd278aae5)
   Server ID: 12c4 (12c43ba4-90bc-48b2-9cd5-d599266b20f6)
   Address: ssl:10.0.223.230:9643
   Status: cluster member
   Role: follower
   Term: 2
   Leader: 9155
   Vote: unknown
   Election timer: 10000
   Log: [2, 2333]
   Entries not yet committed: 0
   Entries not yet applied: 0
   Connections: ->0000 <-9155 <-7c91 ->7c91
   Disconnections: 0
   Servers:
       7c91 (7c91 at ssl:10.0.162.177:9643) last msg 7114020 ms ago
       12c4 (12c4 at ssl:10.0.223.230:9643) (self)
       9155 (9155 at ssl:10.0.159.67:9643) last msg 2486 ms ago
   According to this output, 10.0.159.67 is the leader, which does not match the output of oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations'
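The lookup done by hand in step 2 (resolving Leader: 9155 to 10.0.159.67 through the Servers list) could be scripted roughly as follows. This is only a sketch: raft_leader_ip is a hypothetical helper name, and it assumes the cluster/status output has one field per line, with server entries shaped like the ones in this report.

```shell
# Sketch: read `cluster/status` text on stdin and print the IP address of the
# server whose short ID matches the "Leader:" field.
# raft_leader_ip is a hypothetical helper name, not part of any OVN tooling.
raft_leader_ip() {
  awk '
    $1 == "Leader:"  { leader = $2 }            # short server ID, e.g. 9155
    $1 == "Servers:" { in_servers = 1; next }   # server entries follow this line
    in_servers && $1 == leader {
      # Entry shape: 9155 (9155 at ssl:10.0.159.67:9643) last msg 2486 ms ago
      addr = $4
      sub(/\)$/, "", addr)        # drop the trailing ")"
      sub(/^[a-z]+:/, "", addr)   # drop the protocol prefix such as "ssl:"
      sub(/:[0-9]+$/, "", addr)   # drop the port
      print addr
      exit
    }
  '
}
```

It would be fed the command output directly, for example by piping the oc exec ... cluster/status OVN_Northbound command from step 2 into raft_leader_ip. If the Leader field does not match any server entry, nothing is printed.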
Actual results:
The two commands report different pods as the leader.
Expected results:
I would expect both commands to report the same leader.
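Comparing the two reports means mapping the node name in holderIdentity to the IP of the ovnkube-master pod on that node, since cluster/status identifies servers by address. A minimal sketch of that mapping is below. master_ip_on_node is a hypothetical helper name, and the column handling assumes the oc get po -o wide layout shown in this report.

```shell
# Sketch: read `oc get po -o wide` text on stdin and print the IP of the
# ovnkube-master pod scheduled on the node passed as the first argument.
# master_ip_on_node is a hypothetical helper name.
master_ip_on_node() {
  awk -v node="$1" '
    # RESTARTS can print as "1 (111m ago)", which shifts the later columns,
    # so NODE and IP are counted from the end of the line instead.
    $1 ~ /^ovnkube-master-/ && $(NF-2) == node { print $(NF-3); exit }
  '
}
```

With the listing from step 1, passing ip-10-0-223-230.us-west-1.compute.internal yields 10.0.223.230, which can then be compared with the address of the leader in the cluster/status output. The node name itself can be pulled out of the annotation with jq's fromjson builtin, for example jq -r '.metadata.annotations["control-plane.alpha.kubernetes.io/leader"] | fromjson | .holderIdentity'.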
Additional info:
NOTE: The ovn-appctl cluster/status command also runs as a readiness probe in the ovnkube-master pods. You can see it like this: oc get pods/ovnkube-master-jljfs -n openshift-ovn-kubernetes -o json | jq '.spec.containers[] | select(.name=="nbdb") | .readinessProbe'