OCPBUGS-43575: MCPs report wrong number of nodes when we move nodes from one custom MCP to another custom MCP



      This is a clone of issue OCPBUGS-42719. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-42200. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-41920. The following is the description of the original issue:

      Description of problem:

      When we move a node from one custom MCP to another custom MCP, the MCPs report a wrong number of nodes.
      
      For example, we reach this situation (the worker-perf MCP is not reporting the right number of nodes):
      
      $ oc get mcp,nodes
      NAME                                                                     CONFIG                                                         UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      machineconfigpool.machineconfiguration.openshift.io/master               rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6               True      False      False      3              3                   3                     0                      142m
      machineconfigpool.machineconfiguration.openshift.io/worker               rendered-worker-36ee1fdc485685ac9c324769889c3348               True      False      False      1              1                   1                     0                      142m
      machineconfigpool.machineconfiguration.openshift.io/worker-perf          rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556          True      False      False      2              2                   2                     0                      24m
      machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary   rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556   True      False      False      1              1                   1                     0                      7m52s
      
      NAME                                             STATUS   ROLES                       AGE    VERSION
      node/ip-10-0-13-228.us-east-2.compute.internal   Ready    worker,worker-perf-canary   138m   v1.30.4
      node/ip-10-0-2-250.us-east-2.compute.internal    Ready    control-plane,master        145m   v1.30.4
      node/ip-10-0-34-223.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
      node/ip-10-0-35-61.us-east-2.compute.internal    Ready    worker,worker-perf          136m   v1.30.4
      node/ip-10-0-79-232.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
      node/ip-10-0-86-124.us-east-2.compute.internal   Ready    worker                      139m   v1.30.4
      
      
      
      After 20 to 30 minutes, the MCPs start reporting the right number of nodes.
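
      A quick way to see the discrepancy (a suggested cross-check, not part of the original report) is to compare the machineCount reported in the pool status with the number of nodes that actually carry the pool's node-role label:

      $ oc get mcp worker-perf -o jsonpath='{.status.machineCount}{"\n"}'
      $ oc get nodes -l node-role.kubernetes.io/worker-perf --no-headers | wc -l

      In the situation above, the first command would still return 2 while the second one already returns 1.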
      
          

      Version-Release number of selected component (if applicable):
      IPI on AWS version:

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.17.0-0.nightly-2024-09-13-040101   True        False         124m    Cluster version is 4.17.0-0.nightly-2024-09-13-040101

          

      How reproducible:
      Always

          

      Steps to Reproduce:

          1. Create an MCP
          
           oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: worker-perf
      spec:
        machineConfigSelector:
          matchExpressions:
            - {
               key: machineconfiguration.openshift.io/role,
               operator: In,
               values: [worker,worker-perf]
              }
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker-perf: ""
      EOF
      
          
          2. Add 2 nodes to the MCP
          
         $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf=
         $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[1].metadata.name}") node-role.kubernetes.io/worker-perf=
      
          3. Create another MCP
          oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: worker-perf-canary
      spec:
        machineConfigSelector:
          matchExpressions:
            - {
               key: machineconfiguration.openshift.io/role,
               operator: In,
               values: [worker,worker-perf,worker-perf-canary]
              }
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker-perf-canary: ""
      EOF
      
          4. Move one node from the MCP created in step 1 to the MCP created in step 3 (see the verification sketch after these steps)
          $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-canary=
          $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-
          
          
          
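      As a hedged verification sketch (these commands are not part of the original reproduction steps), after step 4 the pool counts and the labelled nodes can be compared directly; once the node move is processed, worker-perf should report 1 machine and worker-perf-canary should report 1 machine:

      $ oc get mcp worker-perf worker-perf-canary
      $ oc get nodes -l node-role.kubernetes.io/worker-perf --no-headers | wc -l
      $ oc get nodes -l node-role.kubernetes.io/worker-perf-canary --no-headers | wc -l

      While the bug is present, worker-perf keeps MACHINECOUNT=2 even though only one node still carries the node-role.kubernetes.io/worker-perf label.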

      Actual results:

      The worker-perf pool does not report the right number of nodes. It keeps reporting 2 nodes even though one of them was moved to the worker-perf-canary MCP.
      $ oc get mcp,nodes
      NAME                                                                     CONFIG                                                         UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      machineconfigpool.machineconfiguration.openshift.io/master               rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6               True      False      False      3              3                   3                     0                      142m
      machineconfigpool.machineconfiguration.openshift.io/worker               rendered-worker-36ee1fdc485685ac9c324769889c3348               True      False      False      1              1                   1                     0                      142m
      machineconfigpool.machineconfiguration.openshift.io/worker-perf          rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556          True      False      False      2              2                   2                     0                      24m
      machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary   rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556   True      False      False      1              1                   1                     0                      7m52s
      
      NAME                                             STATUS   ROLES                       AGE    VERSION
      node/ip-10-0-13-228.us-east-2.compute.internal   Ready    worker,worker-perf-canary   138m   v1.30.4
      node/ip-10-0-2-250.us-east-2.compute.internal    Ready    control-plane,master        145m   v1.30.4
      node/ip-10-0-34-223.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
      node/ip-10-0-35-61.us-east-2.compute.internal    Ready    worker,worker-perf          136m   v1.30.4
      node/ip-10-0-79-232.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
      node/ip-10-0-86-124.us-east-2.compute.internal   Ready    worker                      139m   v1.30.4
      
      
          

      Expected results:

      MCPs should always report the right number of nodes.
          

      Additional info:

      It is very similar to this other issue:
      https://bugzilla.redhat.com/show_bug.cgi?id=2090436
      which was discussed in this Slack conversation:
      https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1653479831004619
          
