Bug
Resolution: Done-Errata
Normal
4.19.0
Quality / Stability / Reliability
False
3
Important
No
None
Proposed
MCO Sprint 266, MCO Sprint 267, MCO Sprint 268, MCO Sprint 269
4
In Progress
Release Note Not Required
N/A
None
None
None
None
Description of problem:
In some scenarios, after a MachineOSBuild (MOSB) is interrupted, setting the rebuild annotation on its MachineOSConfig (MOSC) does not rebuild the image: the new build is immediately interrupted again, so the image is never rebuilt.
Version-Release number of selected component (if applicable):
4.19
How reproducible:
Always
Steps to Reproduce:
The steps below use the new API. The behavior should be the same with the old API.
1. Create an infra pool
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
2. Create a MOSC for the infra pool
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: mosc-infra
spec:
  machineConfigPool:
    name: infra
  currentImagePullSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  renderedImagePushSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  renderedImagePushSpec: "quay.io/mcoqe/layering:ocl"
EOF
3. Wait for the image to be created
4. Delete the MOSC resource created in step 2
5. Delete the infra pool
6. Create a MOSC for the worker pool
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: mosc-worker
spec:
  machineConfigPool:
    name: worker
  currentImagePullSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  renderedImagePushSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  renderedImagePushSpec: "quay.io/mcoqe/layering:ocl"
EOF
7. Wait until the new Job is created and delete it to interrupt the MOSB resource
$ oc get machineosbuild
NAME                                           PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED   AGE
mosc-infra-b9e2aca7838fb9be42ee2755c9ff35fc    False      False      True        False         False    8m
mosc-worker-2b5aaa0c9933f34e763039e753b7aefa   False      True       False       False         False    37s
8. Add the rebuild annotation to the MOSC to rebuild the interrupted MOSB
$ oc patch machineosconfig mosc-worker --type json -p '[{"op": "add", "path": "/metadata/annotations/machineconfiguration.openshift.io~1rebuild", "value":""}]'
machineosconfig.machineconfiguration.openshift.io/mosc-worker patched
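The `~1` in the patch path of step 8 is the JSON Pointer (RFC 6901) escape for `/`, which is needed because the annotation key itself contains a slash. The following is a minimal sketch of how that path can be built from the plain annotation key. The helper names `json_pointer_escape` and `annotation_add_patch` are hypothetical and are not part of `oc`.

```shell
# Escape a key for use as one JSON Pointer segment (RFC 6901):
# "~" is rewritten to "~0" first, then "/" is rewritten to "~1".
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Build a JSON patch document that adds an annotation with an empty value.
# Hypothetical helper: it only formats the patch string, it does not call oc.
annotation_add_patch() {
  printf '[{"op": "add", "path": "/metadata/annotations/%s", "value": ""}]' \
    "$(json_pointer_escape "$1")"
}

# Usage against a cluster (not executed here):
#   oc patch machineosconfig mosc-worker --type json \
#     -p "$(annotation_add_patch machineconfiguration.openshift.io/rebuild)"
```

A JSON patch `add` on this path assumes the object already has a `metadata.annotations` map. If it does not, the patch is expected to be rejected.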
Actual results:
A new job is triggered, and then it is immediately terminated
$ oc get job -n openshift-machine-config-operator
NAME                                                 STATUS        COMPLETIONS   DURATION   AGE
build-mosc-worker-2b5aaa0c9933f34e763039e753b7aefa   Terminating   0/1           21s        21s
The MOSB resource is recreated, but it is immediately reported as Interrupted again
$ oc get machineosbuild
NAME                                           PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED   AGE
mosc-infra-b9e2aca7838fb9be42ee2755c9ff35fc    False      False      True        False         False    47m
mosc-worker-2b5aaa0c9933f34e763039e753b7aefa   False      False      False       True          False    119s
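When reproducing this, the MOSB state can be read from the `oc get machineosbuild` table instead of being checked by eye. The following is a minimal sketch that assumes the column layout shown in this report (NAME, the condition columns, then AGE). `mosb_state` and `wait_for_mosb_state` are hypothetical helper names, and the 10-second poll interval is an arbitrary choice.

```shell
# Read `oc get machineosbuild` table output on stdin and print the name of
# the first condition column that is "True" for the given MOSB.
# Prints NONE if no condition is True, NOTFOUND if the MOSB is not listed.
mosb_state() {
  awk -v name="$1" '
    NR == 1 { for (i = 2; i < NF; i++) header[i] = $i; next }  # skip NAME and AGE
    $1 == name {
      found = 1
      for (i = 2; i < NF; i++) if ($i == "True") { print header[i]; exit }
      print "NONE"; exit
    }
    END { if (!found) print "NOTFOUND" }
  '
}

# Poll until the MOSB reaches the wanted state, or give up after N tries.
# Requires cluster access through oc; defined here but not executed.
wait_for_mosb_state() {
  name="$1"; want="$2"; tries="${3:-60}"; i=0
  while [ "$i" -lt "$tries" ]; do
    state="$(oc get machineosbuild 2>/dev/null | mosb_state "$name")"
    if [ "$state" = "$want" ]; then return 0; fi
    sleep 10
    i=$((i + 1))
  done
  return 1
}
```

For example, after adding the rebuild annotation, `wait_for_mosb_state mosc-worker-2b5aaa0c9933f34e763039e753b7aefa SUCCEEDED` would be expected to return 0 once the bug is fixed, whereas with the behavior reported here the state goes back to INTERRUPTED.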
Expected results:
The interrupted MOSB should be rebuilt, and the new build should run to completion without being interrupted again.
Additional info:
- relates to: OCPBUGS-48675 In OCL. Error rebuilding a failed build after fixing the failure root cause (Closed)
- links to: RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update