  1. OpenShift Bugs
  2. OCPBUGS-35979

Old machines not deleted when rolling out a new config


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 4.16
    • HyperShift
    • None
    • Yes
    • Hypershift Sprint 255, Hypershift Sprint 256, Hypershift Sprint 257
    • 3
    • Rejected
    • False

      Description of problem:

      When integrating the latest HyperShift Operator (0.0.39) with a 4.15.x hosted cluster:
      
      1. Apply a new hostedCluster.Spec.Configuration.Image (insecureRegistries).
      2. The config is rolled out to all the node pools.
      3. Nodes with the previous config get stuck because their machines can't be deleted, so the rollout never progresses.
      

      CAPI shows this log:

      I0624 14:38:22.520708       1 logger.go:67] "Handling deleted AWSMachine"
      E0624 14:38:22.520786       1 logger.go:83] "unable to delete machine" err="failed to get raw userdata: failed to retrieve bootstrap data secret for AWSMachine ocm-int-2c3is2isdhgqcu5qat4a7qbo8j6vqm62-ad-int1/ad-int1-workers-16fe3af3-mdvv6: Secret \"user-data-ad-int1-workers-b14ee318\" not found"
      E0624 14:38:22.521364       1 controller.go:324] "Reconciler error" err="failed to get raw userdata: failed to retrieve bootstrap data secret for AWSMachine ocm-int-2c3is2isdhgqcu5qat4a7qbo8j6vqm62-ad-int1/ad-int1-workers-16fe3af3-mdvv6: Secret \"user-data-ad-int1-workers-b14ee318\" not found" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="ocm-int-2c3is2isdhgqcu5qat4a7qbo8j6vqm62-ad-int1/ad-int1-workers-16fe3af3-mdvv6" namespace="ocm-int-2c3is2isdhgqcu5qat4a7qbo8j6vqm62-ad-int1" name="ad-int1-workers-16fe3af3-mdvv6" reconcileID="8ca6fbef-1031-45df-b0cc-78d2f25607da"
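
For triage, the error above can be reduced to the three identifiers that matter: namespace, machine, and missing secret. The following is a minimal sketch, assuming the controller log has been saved as text; the regular expression and the function name `find_missing_secrets` are hypothetical and derived only from the log line quoted in this report, not from the provider's source code.

```python
import re

# Pattern derived from the single error line quoted in this report (an assumption,
# not taken from the controller's source). The backslash before each quote is
# optional so that both escaped and unescaped log output match.
MISSING_SECRET = re.compile(
    r'bootstrap data secret for AWSMachine '
    r'(?P<namespace>[^/\s]+)/(?P<machine>[^:\s]+): '
    r'Secret \\?"(?P<secret>[^"\\]+)\\?" not found'
)


def find_missing_secrets(log_text):
    """Return unique (namespace, machine, secret) tuples in first-seen order."""
    seen = []
    for match in MISSING_SECRET.finditer(log_text):
        item = (match.group("namespace"), match.group("machine"), match.group("secret"))
        if item not in seen:
            seen.append(item)
    return seen
```

Running it over a saved log lists each stuck machine once, together with the secret it still expects, even though the reconciler repeats the error on every retry.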
      

      The secret seems to be deleted by the HyperShift Operator (HO) too early, before the machines that still reference it are deleted.
      Found https://github.com/openshift/hypershift/pull/3969, which may be related.
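
The ordering concern can be illustrated with a small model. This is a sketch under stated assumptions, not HyperShift's implementation and not the change in the linked pull request: `Machine`, `bootstrap_secret`, and `deletable_secrets` are hypothetical names. The idea is that a stale user-data secret is only safe to remove once no existing machine still references it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical stand-in for a machine and the user-data secret it was created with."""
    name: str
    bootstrap_secret: str


def deletable_secrets(secrets, machines, current_secret):
    """Return stale secrets that no existing machine still references.

    secrets: names of all user-data secrets present for the node pool.
    machines: machines that still exist, including ones pending deletion.
    current_secret: the secret for the newly rolled-out config (never deleted).
    """
    in_use = {m.bootstrap_secret for m in machines}
    return sorted(s for s in secrets if s != current_secret and s not in in_use)
```

With an old machine still present, its secret is held back; once that machine is gone, the same call returns the old secret as safe to delete. In this model the deletion of the old machine can always read its bootstrap data, which is the step that fails in the log above.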

      Version-Release number of selected component (if applicable):
      
      

      How reproducible:

      Always, in the ROSA int environment.
      
      

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      
      

      Expected results:

      
      

      Additional info:

      Patch example
                image:
                  additionalTrustedCA:
                    name: ""
                  registrySources:
                    blockedRegistries:
                    - badregistry.io
      
      

      Slack thread: https://redhat-external.slack.com/archives/C01C8502FMM/p1719221463858639

            agarcial@redhat.com Alberto Garcia Lamela
            rh-ee-adecorte Andrea Decorte
            Feilian Xie Feilian Xie