OCPBUGS-33134

Nodes are drained twice when an OCB image is applied


    • Severity: Moderate
    • Sprint: MCO Sprint 256, MCO Sprint 257
    • Release Note Text:
      * Previously, nodes could be drained twice because the node was queued multiple times in the drain controller. This behavior might have been caused by increased activity on the node object from the on-cluster layering functionality. With this fix, a node is queued for drain only once. (link:https://issues.redhat.com/browse/OCPBUGS-33134[*OCPBUGS-33134*])
    • Release Note Type: Bug Fix
    • Release Note Status: Done

      Description of problem:

      When on-cluster builds (OCB) are enabled and a new MachineConfig (MC) is created, nodes are drained twice when the resulting osImage build is applied.
      
          

      Version-Release number of selected component (if applicable):

      4.16
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Enable OCB in the worker pool
      
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: MachineOSConfig
      metadata:
        name: worker
      spec:
        machineConfigPool:
          name: worker
        buildInputs:
          imageBuilder:
            imageBuilderType: PodImageBuilder
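          # The inline $(...) below copies the cluster pull secret into the openshift-machine-config-operator namespace as "pull-copy" and references it by that name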
          baseImagePullSecret:
            name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
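          # Reuses the builder service account's first secret (its dockercfg secret) as the push credential for the internal registry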
          renderedImagePushSecret:
            name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
          renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
      EOF
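      
      A possible sanity check that the MachineOSConfig and the copied pull secret were created (commands assume the names used above):
      
      oc get machineosconfig worker
      oc -n openshift-machine-config-operator get secret pull-copy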
      
      
      
          2. Wait for the image to be built
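      
      One way to follow the build (a sketch; the MachineOSBuild resource belongs to the same v1alpha1 tech-preview API used above, and the builder pod runs in the MCO namespace):
      
      oc get machineosbuild -w
      oc -n openshift-machine-config-operator get pods -w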
      
          3. Once the opt-in image has finished building and been applied, create a new MC
      
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: test-machine-config-1
      spec:
        config:
          ignition:
            version: 3.1.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,dGVzdA==
              filesystem: root
              mode: 420
              path: /etc/test-file-1.test
      EOF
      
          4. Wait for the image to be built
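      
      Once this second build completes, the pool rolls the new image onto the worker nodes; the drains happen during this rollout. A simple way to follow it (sketch):
      
      oc get mcp worker -w
      oc get nodes -w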
          

      Actual results:

      Once the image is built, it is applied to the worker nodes.
      
      If we look at the drain operations, we can see that every worker node was drained twice instead of once:
      
      oc -n openshift-machine-config-operator logs $(oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller -o jsonpath='{.items[0].metadata.name}') -c machine-config-controller | grep "initiating drain"
      I0430 13:28:48.740300       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:30:08.330051       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:32:32.431789       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
      I0430 13:33:50.643544       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
      I0430 13:48:08.183488       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:49:01.379416       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:50:52.933337       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
      I0430 13:52:12.191203       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
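      
      A quick way to tally the drains per node from the same controller log (a sketch; field 6 of these klog lines is the node name):
      
      oc -n openshift-machine-config-operator logs $(oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller -o jsonpath='{.items[0].metadata.name}') -c machine-config-controller | grep "initiating drain" | awk '{print $6}' | sort | uniq -c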
      
      
          

      Expected results:

      Nodes should be drained only once when applying a new MC.
          

      Additional info:

          

            Assignee: David Joshy
            Reporter: Sergio Regidor de la Rosa