OCPBUGS-63229

[AWS][CAPI] awsmachine goes into a re-create loop and the machine reports a sync error when securityGroups or subnet contain an id


      Description of problem:

          The awsmachine goes into a re-create loop and the machine reports a sync error when securityGroups or subnet contain an id.

      Version-Release number of selected component (if applicable):

          4.21.0-0.nightly-2025-10-15-162146

      How reproducible:

          always

      Steps to Reproduce:

          1. Install an AWS private cluster (its securityGroups and subnet contain an id) with TechPreview enabled. We used the automated template ipi-on-aws/versioned-installer-private_cluster-ci with the parameter feature_set: "TechPreviewNoUpgrade". The cluster installed successfully.
      
          2. Observe sync errors on the machine, and observe the awsmachine going into a re-create loop:
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine -n openshift-machine-api -oyaml
      ...
      - apiVersion: machine.openshift.io/v1beta1
        kind: Machine
        metadata:
          annotations:
            machine.openshift.io/instance-state: running
          creationTimestamp: "2025-10-17T04:13:43Z"
          finalizers:
          - sync.machine.openshift.io/finalizer
          - machine.machine.openshift.io
          generateName: huliu-aws1017c-zkdqq-worker-us-east-2a-
          generation: 2
          labels:
            machine.openshift.io/cluster-api-cluster: huliu-aws1017c-zkdqq
            machine.openshift.io/cluster-api-machine-role: worker
            machine.openshift.io/cluster-api-machine-type: worker
            machine.openshift.io/cluster-api-machineset: huliu-aws1017c-zkdqq-worker-us-east-2a
            machine.openshift.io/instance-type: m6i.xlarge
            machine.openshift.io/region: us-east-2
            machine.openshift.io/zone: us-east-2a
          name: huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
          namespace: openshift-machine-api
          ownerReferences:
          - apiVersion: machine.openshift.io/v1beta1
            blockOwnerDeletion: true
            controller: true
            kind: MachineSet
            name: huliu-aws1017c-zkdqq-worker-us-east-2a
            uid: 49582657-0521-4c38-9191-a78707e3377e
          resourceVersion: "199579"
          uid: a39dadcc-d2ad-467e-888c-dd106a063ecb
        spec:
          authoritativeAPI: MachineAPI
          lifecycleHooks: {}
          metadata: {}
          providerID: aws:///us-east-2a/i-0fe7455cb532722bc
          providerSpec:
            value:
              ami:
                id: ami-082a55a580d5538ed
              apiVersion: machine.openshift.io/v1beta1
              blockDevices:
              - ebs:
                  encrypted: true
                  iops: 0
                  kmsKey:
                    arn: ""
                  volumeSize: 120
                  volumeType: gp3
              capacityReservationId: ""
              credentialsSecret:
                name: aws-cloud-credentials
              deviceIndex: 0
              iamInstanceProfile:
                id: huliu-aws1017c-zkdqq-worker-profile
              instanceType: m6i.xlarge
              kind: AWSMachineProviderConfig
              metadata:
                creationTimestamp: null
              metadataServiceOptions: {}
              placement:
                availabilityZone: us-east-2a
                region: us-east-2
              securityGroups:
              - filters:
                - name: tag:Name
                  values:
                  - huliu-aws1017c-zkdqq-node
              - filters:
                - name: tag:Name
                  values:
                  - huliu-aws1017c-zkdqq-lb
              - id: sg-0b5ca4c09a70e5d09
              subnet:
                id: subnet-08b46039fcd2c66bc
              tags:
              - name: kubernetes.io/cluster/huliu-aws1017c-zkdqq
                value: owned
              userDataSecret:
                name: worker-user-data
        status:
          addresses:
          - address: 10.0.50.13
            type: InternalIP
          - address: ip-10-0-50-13.us-east-2.compute.internal
            type: InternalDNS
          - address: ip-10-0-50-13.us-east-2.compute.internal
            type: Hostname
          authoritativeAPI: MachineAPI
          conditions:
          - lastTransitionTime: "2025-10-17T04:14:08Z"
            status: "True"
            type: Drainable
          - lastTransitionTime: "2025-10-17T04:14:22Z"
            status: "True"
            type: InstanceExists
          - lastTransitionTime: "2025-10-17T04:14:08Z"
            message: The AuthoritativeAPI status is set to 'MachineAPI'
            reason: AuthoritativeAPIMachineAPI
            severity: Info
            status: "False"
            type: Paused
          - lastTransitionTime: "2025-10-17T05:48:21Z"
            message: 'failed to remove finalizer for deleting Cluster API infra machine:
              Operation cannot be fulfilled on awsmachines.infrastructure.cluster.x-k8s.io
              "huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz": the object has been modified;
              please apply your changes to the latest version and try again'
            reason: FailedToUpdateCAPIInfraMachine
            severity: Error
            status: "False"
            type: Synchronized
          - lastTransitionTime: "2025-10-17T04:14:08Z"
            status: "True"
            type: Terminable
          lastUpdated: "2025-10-17T05:48:20Z"
          nodeRef:
            kind: Node
            name: ip-10-0-50-13.us-east-2.compute.internal
            uid: f8a71d2a-7774-42e7-9251-f68a0f7d23c9
          phase: Running
          providerStatus:
            conditions:
            - lastTransitionTime: "2025-10-17T04:14:15Z"
              message: Machine successfully created
              reason: MachineCreationSucceeded
              status: "True"
              type: MachineCreation
            instanceId: i-0fe7455cb532722bc
            instanceState: running
          synchronizedGeneration: 2
      ...
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
      NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
      huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
      huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
      huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn   huliu-aws1017c-zkdqq                   aws:///us-east-2b/i-0d1ff395da901eeb2   huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
      NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
      huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
      huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
      NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
      huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
      huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn   huliu-aws1017c-zkdqq                   aws:///us-east-2b/i-0d1ff395da901eeb2   huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
      NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
      huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
      huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
      huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn   huliu-aws1017c-zkdqq                   aws:///us-east-2b/i-0d1ff395da901eeb2   huliu-aws1017c-zkdqq-worker-us-east-2b-2l8vn
      liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachine -n openshift-cluster-api
      NAME                                           CLUSTER                STATE   READY   INSTANCEID                              MACHINE
      huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-0fe7455cb532722bc   huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz
      huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn   huliu-aws1017c-zkdqq                   aws:///us-east-2a/i-020b35563810f5fd2   huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn
      liuhuali@Lius-MacBook-Pro huali-test %     
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc logs cluster-capi-operator-78bd56b648-5llff -c machine-api-migration
      ...
      I1017 05:31:03.795937       1 machine_sync_controller.go:816] "Successfully updated Cluster API machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b"
      I1017 05:31:03.796046       1 machine_sync_controller.go:691] "Deleting the corresponding Cluster API infra machine as it is out of date, it will be recreated" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b" diff="map[.spec:[AdditionalSecurityGroups.slice[2].Filters: <nil slice> != [] Subnet.Filters: <nil slice> != []]]"
      E1017 05:31:03.864591       1 machine_sync_controller.go:710] "Failed to remove finalizer for deleting Cluster API infra machine" err="Operation cannot be fulfilled on awsmachines.infrastructure.cluster.x-k8s.io \"huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz\": the object has been modified; please apply your changes to the latest version and try again" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b"
      E1017 05:31:03.875550       1 controller.go:347] "Reconciler error" err="unable to ensure Cluster API infra machine: failed to remove finalizer for deleting Cluster API infra machine: Operation cannot be fulfilled on awsmachines.infrastructure.cluster.x-k8s.io \"huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz\": the object has been modified; please apply your changes to the latest version and try again" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="9f6c68e6-afbc-489d-9b6c-0326441aa48b"
      I1017 05:31:03.920950       1 machine_sync_controller.go:818] "No changes detected for Cluster API machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" reconcileID="61e8f786-b916-4787-a944-e4b46b54e0f9"
      I1017 05:31:03.968823       1 machine_sync_controller.go:654] "Successfully created Cluster API infra machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-cf2zn" reconcileID="61e8f786-b916-4787-a944-e4b46b54e0f9"
      I1017 05:31:03.982149       1 machine_sync_controller.go:818] "No changes detected for Cluster API machine" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="619e9d68-a4b4-44ae-b5d9-29ee7be3caf4"
      I1017 05:31:03.982254       1 machine_sync_controller.go:691] "Deleting the corresponding Cluster API infra machine as it is out of date, it will be recreated" controller="MachineSyncController" controllerGroup="machine.openshift.io" controllerKind="Machine" Machine="openshift-machine-api/huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" namespace="openshift-machine-api" name="huliu-aws1017c-zkdqq-worker-us-east-2a-bj8gz" reconcileID="619e9d68-a4b4-44ae-b5d9-29ee7be3caf4" diff="map[.spec:[AdditionalSecurityGroups.slice[2].Filters: <nil slice> != [] Subnet.Filters: <nil slice> != []]]"
       

      Actual results:

          The awsmachine goes into a re-create loop and the machine reports a sync error.

      Expected results:

          The awsmachine should not go into a re-create loop, and the machine should sync successfully.

      Additional info:

          must-gather: https://drive.google.com/file/d/1-5d_8A4bDR3AogvCDVmG5GJJ6Zjx4PjI/view?usp=sharing
      
      Found during new-feature testing for https://issues.redhat.com//browse/OCPCLOUD-2709, but the issue does not appear to be related to it.
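
      The diff reported in the machine-api-migration log (Filters: <nil slice> != []) suggests that the two specs differ only in how "no filters" is represented: a nil slice on one side and an empty slice on the other. The following is a minimal Go sketch of that effect, not the operator's actual comparison code. Filter, ResourceRef, strictEqual, normalize and semanticEqual are hypothetical names introduced for illustration only.

```go
package main

import (
	"fmt"
	"reflect"
)

// Filter and ResourceRef are hypothetical stand-ins for the provider types.
// They are NOT the real Cluster API Provider AWS definitions.
type Filter struct {
	Name   string
	Values []string
}

type ResourceRef struct {
	ID      *string
	Filters []Filter
}

// strictEqual compares with reflect.DeepEqual, which treats a nil slice and
// an empty non-nil slice as different values.
func strictEqual(a, b ResourceRef) bool {
	return reflect.DeepEqual(a, b)
}

// normalize returns a copy in which an empty Filters slice is replaced by
// nil, so that "no filters" has a single representation.
func normalize(r ResourceRef) ResourceRef {
	if len(r.Filters) == 0 {
		r.Filters = nil
	}
	return r
}

// semanticEqual compares the two references after normalization.
func semanticEqual(a, b ResourceRef) bool {
	return reflect.DeepEqual(normalize(a), normalize(b))
}

func main() {
	id := "subnet-08b46039fcd2c66bc"
	withNil := ResourceRef{ID: &id}                      // Filters is nil
	withEmpty := ResourceRef{ID: &id, Filters: []Filter{}} // Filters is empty but non-nil

	fmt.Println("strict equal:  ", strictEqual(withNil, withEmpty))   // false
	fmt.Println("semantic equal:", semanticEqual(withNil, withEmpty)) // true
}
```

      If the sync controller compares the converted spec with the existing awsmachine spec in a way that distinguishes nil from empty, an id-only securityGroups entry or subnet would always look out of date, which would match the repeated "Deleting the corresponding Cluster API infra machine as it is out of date" messages. This is an assumption based on the log output, not a confirmed root cause.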

              rh-ee-nbrubake Nolan Brubaker
              huliu@redhat.com Huali Liu
              Huali Liu Huali Liu