Bug
Resolution: Not a Bug
Undefined
4.12
Quality / Stability / Reliability
False
Moderate
Rejected
Description of problem:
The output of oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations' is at odds with the northbound DB leader reported by oc exec -n openshift-ovn-kubernetes ovnkube-master-hc4tc -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound.
Version-Release number of selected component (if applicable):
I tested this against 4.12, but I assume it affects other versions as well.
How reproducible:
It does not report the wrong leader every time, but I have observed the mismatch twice on a 4.12 cluster.
Steps to Reproduce:
1. Run:
oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations'
It reports:
{
"control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"ip-10-0-223-230.us-west-1.compute.internal\",\"leaseDurationSeconds\":60,\"acquireTime\":\"2023-01-10T09:18:16Z\",\"renewTime\":\"2023-01-10T10:53:33Z\",\"leaderTransitions\":2}"
}
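For a quicker check, the holder identity can be pulled straight out of the nested annotation JSON. A minimal sketch, assuming jq is available on the client (fromjson parses the embedded JSON string):
# Print only the holderIdentity recorded in the leader-election annotation
oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json \
  | jq -r '.metadata.annotations["control-plane.alpha.kubernetes.io/leader"] | fromjson | .holderIdentity'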
So this output indicates that ip-10-0-223-230.us-west-1.compute.internal is the leader. The corresponding pod, ovnkube-master-hc4tc, appears in the listing below:
$ oc get po -o wide -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS       AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
ovnkube-master-2msln   6/6     Running   1 (111m ago)   117m   10.0.159.67    ip-10-0-159-67.us-west-1.compute.internal    <none>           <none>
ovnkube-master-hc4tc   6/6     Running   0              117m   10.0.223.230   ip-10-0-223-230.us-west-1.compute.internal   <none>           <none>
ovnkube-master-w7p9l   6/6     Running   1 (103m ago)   117m   10.0.162.177   ip-10-0-162-177.us-west-1.compute.internal   <none>           <none>
ovnkube-node-4ggb2     5/5     Running   0              117m   10.0.159.67    ip-10-0-159-67.us-west-1.compute.internal    <none>           <none>
ovnkube-node-54wmz     5/5     Running   0              108m   10.0.146.216   ip-10-0-146-216.us-west-1.compute.internal   <none>           <none>
ovnkube-node-7j7rl     5/5     Running   0              117m   10.0.162.177   ip-10-0-162-177.us-west-1.compute.internal   <none>           <none>
ovnkube-node-j2tqd     5/5     Running   0              107m   10.0.171.199   ip-10-0-171-199.us-west-1.compute.internal   <none>           <none>
ovnkube-node-k4fxw     5/5     Running   1 (108m ago)   108m   10.0.212.66    ip-10-0-212-66.us-west-1.compute.internal    <none>           <none>
ovnkube-node-srks9     5/5     Running   0              117m   10.0.223.230   ip-10-0-223-230.us-west-1.compute.internal   <none>           <none>
2. I ran a cluster status check against this pod, and it reports its role as follower, with the leader being 9155, which maps to 10.0.159.67, i.e. ovnkube-master-2msln. That is inconsistent with step 1, so one of the two outputs is reporting the leader incorrectly.
$ oc exec -n openshift-ovn-kubernetes ovnkube-master-hc4tc -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
12c4
Name: OVN_Northbound
Cluster ID: cecc (cecc7ea8-9fbc-457d-889d-c01fd278aae5)
Server ID: 12c4 (12c43ba4-90bc-48b2-9cd5-d599266b20f6)
Address: ssl:10.0.223.230:9643
Status: cluster member
Role: follower
Term: 2
Leader: 9155
Vote: unknown
Election timer: 10000
Log: [2, 2333]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-9155 <-7c91 ->7c91
Disconnections: 0
Servers:
7c91 (7c91 at ssl:10.0.162.177:9643) last msg 7114020 ms ago
12c4 (12c4 at ssl:10.0.223.230:9643) (self)
9155 (9155 at ssl:10.0.159.67:9643) last msg 2486 ms ago
According to this output, 10.0.159.67 is the leader, which does not match the output of oc get cm -n openshift-ovn-kubernetes ovn-kubernetes-master -o json | jq '.metadata.annotations'.
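To see which nbdb instance actually claims leadership at any given moment, the same cluster/status check can be run against every master pod. A rough sketch, assuming the master pods carry the app=ovnkube-master label (the label selector is an assumption) and targeting the nbdb container explicitly to avoid the "Defaulted container" message:
# Print the Role and Leader reported by each nbdb instance
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master -o jsonpath='{.items[*].metadata.name}'); do
  echo "== $pod =="
  oc exec -n openshift-ovn-kubernetes "$pod" -c nbdb -- \
    /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound \
    | grep -E '^(Role|Leader):'
done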
Actual results:
The two commands report different pods as the northbound DB leader.
Expected results:
I would expect both commands to report the same leader.
Additional info:
NOTE: That command runs as a readiness probe in the ovnkube-master pods. You can see it like this: oc get pods/ovnkube-master-jljfs -n openshift-ovn-kubernetes -o json | jq '.spec.containers[] | select(.name=="nbdb") | .readinessProbe'
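As a further cross-check that the probe runs the same cluster/status call, the probe's exec command can be printed for each master pod. This is only a sketch, assuming the probe is an exec-type probe and reusing the app=ovnkube-master label assumption from above:
# Show the nbdb readiness probe command for every ovnkube-master pod
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master -o jsonpath='{.items[*].metadata.name}'); do
  echo "== $pod =="
  oc get pod "$pod" -n openshift-ovn-kubernetes -o json \
    | jq -r '.spec.containers[] | select(.name=="nbdb") | .readinessProbe.exec.command | join(" ")'
done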