Bug
Resolution: Done-Errata
Priority: Normal
Target Version: 4.17
This is a clone of issue OCPBUGS-42200. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41920. The following is the description of the original issue:
—
Description of problem:
When we move a node from one custom MCP to another custom MCP, the MCPs report a wrong number of nodes.
For example, we can reach this situation, where the worker-perf MCP does not report the right number of nodes:
$ oc get mcp,nodes
NAME                                                                     CONFIG                                                         UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master               rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6               True      False      False      3              3                   3                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker               rendered-worker-36ee1fdc485685ac9c324769889c3348               True      False      False      1              1                   1                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker-perf          rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556          True      False      False      2              2                   2                     0                      24m
machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary   rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556   True      False      False      1              1                   1                     0                      7m52s

NAME                                             STATUS   ROLES                       AGE    VERSION
node/ip-10-0-13-228.us-east-2.compute.internal   Ready    worker,worker-perf-canary   138m   v1.30.4
node/ip-10-0-2-250.us-east-2.compute.internal    Ready    control-plane,master        145m   v1.30.4
node/ip-10-0-34-223.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-35-61.us-east-2.compute.internal    Ready    worker,worker-perf          136m   v1.30.4
node/ip-10-0-79-232.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-86-124.us-east-2.compute.internal   Ready    worker                      139m   v1.30.4
After roughly 20 to 30 minutes, the MCPs start reporting the right number of nodes.
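A simple way to watch the counts converge (a minimal sketch; the polling interval is arbitrary, and the 20 to 30 minute window comes from this reproduction, not from any documented bound):
$ watch -n 30 'oc get mcp worker-perf worker-perf-canary'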
Version-Release number of selected component (if applicable):
IPI on AWS:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.17.0-0.nightly-2024-09-13-040101   True        False         124m    Cluster version is 4.17.0-0.nightly-2024-09-13-040101
How reproducible:
Always
Steps to Reproduce:
1. Create an MCP
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf
spec:
  machineConfigSelector:
    matchExpressions:
      - {
          key: machineconfiguration.openshift.io/role,
          operator: In,
          values: [worker, worker-perf]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf: ""
EOF
2. Add 2 nodes to the MCP
$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf=
$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[1].metadata.name}") node-role.kubernetes.io/worker-perf=
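Optional sanity check (assuming the standard MachineConfigPool status fields): at this point the worker-perf pool should report 2 machines.
$ oc get mcp worker-perf -o jsonpath='{.status.machineCount}{"\n"}'
2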
3. Create another MCP
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {
          key: machineconfiguration.openshift.io/role,
          operator: In,
          values: [worker, worker-perf, worker-perf-canary]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf-canary: ""
EOF
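Optional sanity check (same assumption about the MachineConfigPool status fields): before any node is moved, the new pool should report 0 machines.
$ oc get mcp worker-perf-canary -o jsonpath='{.status.machineCount}{"\n"}'
0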
4. Move one node from the MCP created in step 1 to the MCP created in step 3
$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-canary=
$ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-
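Optional sanity check, using only the labels set above: after the move, the node should match the canary selector and no longer match the worker-perf one.
$ oc get nodes -l node-role.kubernetes.io/worker-perf-canary --no-headers
$ oc get nodes -l node-role.kubernetes.io/worker-perf --no-headers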
Actual results:
The worker-perf pool does not report the right number of nodes. It continues to report 2 nodes even though one of them was moved to the worker-perf-canary MCP.
$ oc get mcp,nodes
NAME                                                                     CONFIG                                                         UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master               rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6               True      False      False      3              3                   3                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker               rendered-worker-36ee1fdc485685ac9c324769889c3348               True      False      False      1              1                   1                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker-perf          rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556          True      False      False      2              2                   2                     0                      24m
machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary   rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556   True      False      False      1              1                   1                     0                      7m52s

NAME                                             STATUS   ROLES                       AGE    VERSION
node/ip-10-0-13-228.us-east-2.compute.internal   Ready    worker,worker-perf-canary   138m   v1.30.4
node/ip-10-0-2-250.us-east-2.compute.internal    Ready    control-plane,master        145m   v1.30.4
node/ip-10-0-34-223.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-35-61.us-east-2.compute.internal    Ready    worker,worker-perf          136m   v1.30.4
node/ip-10-0-79-232.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-86-124.us-east-2.compute.internal   Ready    worker                      139m   v1.30.4
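A hedged one-liner to quantify the mismatch, comparing the pool's reported machineCount against the number of nodes that actually carry the role label (field and label names as used above); on this reproduction it prints something like reported=2 actual=1:
$ echo "reported=$(oc get mcp worker-perf -o jsonpath='{.status.machineCount}') actual=$(oc get nodes -l node-role.kubernetes.io/worker-perf --no-headers | wc -l)"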
Expected results:
MCPs should always report the right number of nodes.
Additional info:
It is very similar to this other issue
https://bugzilla.redhat.com/show_bug.cgi?id=2090436
That was discussed in this slack conversation
https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1653479831004619
blocks: OCPBUGS-43575 "MCPs report wrong number of nodes when we move nodes from one custom MCP to another custom MCP" (Closed)
clones: OCPBUGS-42200 "MCPs report wrong number of nodes when we move nodes from one custom MCP to another custom MCP" (Closed)
is blocked by: OCPBUGS-42200 "MCPs report wrong number of nodes when we move nodes from one custom MCP to another custom MCP" (Closed)
is cloned by: OCPBUGS-43575 "MCPs report wrong number of nodes when we move nodes from one custom MCP to another custom MCP" (Closed)
links to: RHBA-2024:8415 "OpenShift Container Platform 4.16.z bug fix update"