Bug
Resolution: Unresolved
4.21
Quality / Stability / Reliability
Moderate
CLOUD Sprint 279, CLOUD Sprint 280
Description of problem:
AWSMachine objects go into a re-create loop, and the Machine reports a sync error, when securityGroups or subnet reference a resource by id.
Version-Release number of selected component (if applicable):
4.21.0-0.nightly-2025-10-15-162146
How reproducible:
Always
Steps to Reproduce:
1. Install a private AWS cluster (its securityGroups and subnet reference an id) with TechPreview enabled. We used the automated template ipi-on-aws/versioned-installer-private_cluster-ci with the parameter feature_set: "TechPreviewNoUpgrade". The cluster installs successfully.
2. Observe the sync error on the Machine, and observe that the AWSMachine goes into a re-create loop:
liuhuali@Lius-MacBook-Pro huali-test % oc get machine -n openshift-machine-api -oyaml
...
- apiVersion: machine.openshift.io/v1beta1
  kind: Machine
  metadata:
    annotations:
      machine.openshift.io/instance-state: running
    creationTimestamp: "2025-10-17T04:13:43Z"
    finalizers:
    - sync.machine.openshift.io/finalizer
    - machine.machine.openshift.io
    generateName: huliu-aws1017c-zkdqq-worker-us-east-2a-
    generation: 2
    labels:
      machine.openshift.io/cluster-api-cluster: huliu-aws1017c-zkdqq
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: huliu-aws1017c-zkdqq-worker-us-east-2a
      machine.openshift.io/instance-type: m6i.xlarge
      machine.openshift.io/region: us-east-2
      machine.openshift.io/zone: us-east-2a
    name: huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
    namespace: openshift-machine-api
    ownerReferences:
    - apiVersion: machine.openshift.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: MachineSet
      name: huliu-aws1017c-zkdqq-worker-us-east-2a
      uid: 49582657-0521-4c38-9191-a78707e3377e
    resourceVersion: "199579"
    uid: a39dadcc-d2ad-467e-888c-dd106a063ecb
  spec:
    authoritativeAPI: MachineAPI
    lifecycleHooks: {}
    metadata: {}
    providerID: aws:///us-east-2a/i-0fe7455cb532722bc
    providerSpec:
      value:
        ami:
          id: ami-082a55a580d5538ed
        apiVersion: machine.openshift.io/v1beta1
        blockDevices:
        - ebs:
            encrypted: true
            iops: 0
            kmsKey:
              arn: ""
            volumeSize: 120
            volumeType: gp3
        capacityReservationId: ""
        credentialsSecret:
          name: aws-cloud-credentials
        deviceIndex: 0
        iamInstanceProfile:
          id: huliu-aws1017c-zkdqq-worker-profile
        instanceType: m6i.xlarge
        kind: AWSMachineProviderConfig
        metadata:
          creationTimestamp: null
        metadataServiceOptions: {}
        placement:
          availabilityZone: us-east-2a
          region: us-east-2
        securityGroups:
        - filters:
          - name: tag:Name
            values:
            - huliu-aws1017c-zkdqq-node
        - filters:
          - name: tag:Name
            values:
            - huliu-aws1017c-zkdqq-lb
        - id: sg-0b5ca4c09a70e5d09
        subnet:
          id: subnet-08b46039fcd2c66bc
        tags:
        - name: kubernetes.io/cluster/huliu-aws1017c-zkdqq
          value: owned
        userDataSecret:
          name: worker-user-data
  status:
    addresses:
    - address: 10.0.50.13
      type: InternalIP
    - address: ip-10-0-50-13.us-east-2.compute.internal
      type: InternalDNS
    - address: ip-10-0-50-13.us-east-2.compute.internal
      type: Hostname
    authoritativeAPI: MachineAPI
    conditions:
    - lastTransitionTime: "2025-10-17T04:14:08Z"
      status: "True"
      type: Drainable
    - lastTransitionTime: "2025-10-17T04:14:22Z"
      status: "True"
      type: InstanceExists
    - lastTransitionTime: "2025-10-17T04:14:08Z"
      message: The AuthoritativeAPI status is set to 'MachineAPI'
      reason: AuthoritativeAPIMachineAPI
      severity: Info
      status: "False"
      type: Paused
    - lastTransitionTime: "2025-10-17T05:48:21Z"
      message: 'failed to remove finalizer for deleting Cluster API infra machine:
        Operation cannot be fulfilled on awsmachines.infrastructure.cluster.x-k8s.io
        "huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz": the object has been modified;
        please apply your changes to the latest version and try again'
      reason: FailedToUpdateCAPIInfraMachine
      severity: Error
      status: "False"
      type: Synchronized
    - lastTransitionTime: "2025-10-17T04:14:08Z"
      status: "True"
      type: Terminable
    lastUpdated: "2025-10-17T05:48:20Z"
    nodeRef:
      kind: Node
      name: ip-10-0-50-13.us-east-2.compute.internal
      uid: f8a71d2a-7774-42e7-9251-f68a0f7d23c9
    phase: Running
    providerStatus:
      conditions:
      - lastTransitionTime: "2025-10-17T04:14:15Z"
        message: Machine successfully created
        reason: MachineCreationSucceeded
        status: "True"
        type: MachineCreation
      instanceId: i-0fe7455cb532722bc
      instanceState: running
    synchronizedGeneration: 2
...
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn   huliu-aws1017c-zkdqq                   aws:///us-east-2b/i-0d1ff395da901eeb2   huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn   huliu-aws1017c-zkdqq                   aws:///us-east-2b/i-0d1ff395da901eeb2   huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn   huliu-aws1017c-zkdqq                   aws:///us-east-2b/i-0d1ff395da901eeb2   huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn
liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
liuhuali@Lius-MacBook-Pro huali-test % oc logs cluster-capi-operator-78bd56b648-5llff -c machine-api-migration
...
I1017 05:31:03.795937 1 machine_sync_controller.go:816] "Successfully updated Cluster API machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b"
I1017 05:31:03.796046 1 machine_sync_controller.go:691] "Deleting the corresponding Cluster API infra machine as it is out of date, it will be recreated" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b" diff="map[.spec:[AdditionalSecurityGroups.slice[2].Filters: <nil slice> != [] Subnet.Filters: <nil slice> != []]]"
E1017 05:31:03.864591 1 machine_sync_controller.go:710] "Failed to remove finalizer for deleting Cluster API infra machine" err="Operation cannot be fulfilled on awsmachines.infrastructure.cluster.x-k8s.io \"huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz\": the object has been modified; please apply your changes to the latest version and try again" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b"
E1017 05:31:03.875550 1 controller.go:347] "Reconciler error" err="unable to ensure Cluster API infra machine: failed to remove finalizer for deleting Cluster API infra machine: Operation cannot be fulfilled on awsmachines.infrastructure.cluster.x-k8s.io \"huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz\": the object has been modified; please apply your changes to the latest version and try again" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b"
I1017 05:31:03.920950 1 machine_sync_controller.go:818] "No changes detected for Cluster API machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" reconcileID="61e8f786-b916-4787-a944-e4b46b54e0f9"
I1017 05:31:03.968823 1 machine_sync_controller.go:654] "Successfully created Cluster API infra machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" reconcileID="61e8f786-b916-4787-a944-e4b46b54e0f9"
I1017 05:31:03.982149 1 machine_sync_controller.go:818] "No changes detected for Cluster API machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="619e9d68-a4b4-44ae-b5d9-29ee7be3caf4"
I1017 05:31:03.982254 1 machine_sync_controller.go:691] "Deleting the corresponding Cluster API infra machine as it is out of date, it will be recreated" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="619e9d68-a4b4-44ae-b5d9-29ee7be3caf4" diff="map[.spec:[AdditionalSecurityGroups.slice[2].Filters: <nil slice> != [] Subnet.Filters: <nil slice> != []]]"
Actual results:
The AWSMachine goes into a re-create loop and the Machine reports a sync error.
Expected results:
The AWSMachine should not go into a re-create loop, and the Machine should sync successfully.
Additional info:
must-gather: https://drive.google.com/file/d/1-5d_8A4bDR3AogvCDVmG5GJJ6Zjx4PjI/view?usp=sharing
This was found during new feature testing for https://issues.redhat.com//browse/OCPCLOUD-2709, but the issue does not seem to be related to it.
is related to: OCPCLOUD-3172 cluster-capi-operator: refactor approach for diffing