OpenShift Bugs / OCPBUGS-37470

Nodes are drained twice when an OCB image is applied


    • Moderate
    • None
    • MCO Sprint 257
    • 1
    • False
    • Previously, the same node was queued multiple times in the drain controller, which caused the same node to be drained twice. With this release, a node will only be drained once.
      __________________
      This was happening due to the same node being queued multiple times in the drain controller. This may have been due to increased activity on the node object by OCL functionality. By being more specific about the object diff before nodes are queued for drain, a node will only be queued for drain once.
    • Bug Fix
    • Done

      Description of problem:

      When OCB (on-cluster builds) is enabled and a new MC is created, nodes are drained twice when the resulting osImage build is applied.
      
          

      Version-Release number of selected component (if applicable):

      4.16
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Enable OCB in the worker pool
      
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: MachineOSConfig
      metadata:
        name: worker
      spec:
        machineConfigPool:
          name: worker
        buildInputs:
          imageBuilder:
            imageBuilderType: PodImageBuilder
          baseImagePullSecret:
            name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
          renderedImagePushSecret:
            name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
          renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
      EOF
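
      As an optional check (not part of the original report), the following commands can confirm that the MachineOSConfig was accepted and that a build was started; they assume the 4.16 tech-preview v1alpha1 CRDs are installed:

      # Confirm the MachineOSConfig created above exists
      oc get machineosconfig worker -o yaml

      # A MachineOSBuild object should appear shortly afterwards
      oc get machineosbuild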
      
      
      
          2. Wait for the image to be built
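
      A rough sketch of how the build can be followed (the Succeeded condition name used with oc wait is an assumption, not taken from this report):

      # Watch the builder pod in the MCO namespace
      oc -n openshift-machine-config-operator get pods -w

      # Or wait for the MachineOSBuild to report success
      oc wait machineosbuild --all --for=condition=Succeeded --timeout=30m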
      
          3. Once the opt-in image has been built and applied, create a new MC
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: test-machine-config-1
      spec:
        config:
          ignition:
            version: 3.1.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,dGVzdA==
              filesystem: root
              mode: 420
              path: /etc/test-file-1.test
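
      A minimal sketch for applying the MachineConfig above and watching the rollout (the file name test-machine-config-1.yaml is illustrative):

      # Assuming the MachineConfig above was saved to test-machine-config-1.yaml
      oc create -f test-machine-config-1.yaml

      # The worker pool reports Updating while the rebuilt image is rolled out
      oc get machineconfigpool worker -w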
      
          4. Wait for the image to be built
          

      Actual results:

      Once the image is built, it is applied to the worker nodes.
      
      Looking at the drain operations in the machine-config-controller logs, we can see that every worker node was drained twice instead of once:
      
      oc -n openshift-machine-config-operator logs $(oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller -o jsonpath='{.items[0].metadata.name}') -c machine-config-controller | grep "initiating drain"
      I0430 13:28:48.740300       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:30:08.330051       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:32:32.431789       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
      I0430 13:33:50.643544       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
      I0430 13:48:08.183488       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:49:01.379416       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
      I0430 13:50:52.933337       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
      I0430 13:52:12.191203       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
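
      A convenience one-liner (not part of the original report) to count drain initiations per node from the same log; with the fix, each node should appear only once per rollout:

      oc -n openshift-machine-config-operator logs $(oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller -o jsonpath='{.items[0].metadata.name}') -c machine-config-controller \
        | grep "initiating drain" | grep -o 'node [^:]*' | sort | uniq -c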
      
      
          

      Expected results:

      Nodes should be drained only once when applying a new MC.
          

      Additional info:

          

            Assignee: David Joshy (djoshy)
            Reporter: Sergio Regidor de la Rosa (sregidor@redhat.com)
            QA Contact: Sergio Regidor de la Rosa