  1. OpenShift Bugs
  2. OCPBUGS-17846

HyperShift hosted cluster cannot be created when the TechPreviewNoUpgrade (TP) feature gate is enabled on the management cluster


    • Bug
    • Resolution: Obsolete
    • Normal
    • None
    • 4.14
    • HyperShift
    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • No

      Description of problem:

      When TechPreviewNoUpgrade is enabled on the management cluster, I cannot create a hosted cluster successfully (the failure was observed on the AWS platform).

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a management cluster with the TechPreviewNoUpgrade feature gate enabled.
      2. Install the HyperShift operator (HO) and create a hosted cluster on the management cluster.
      
      HO version:
      
      {"level":"info","ts":"2023-08-16T05:38:31Z","logger":"setup","msg":"Starting hypershift-operator-manager","version":"openshift/hypershift: 57b3cb46c41000033aaaa08480ed2256701c033b. Latest supported OCP: 4.14.0"}
      
      # install HO
      bin/hypershift install \
          --hypershift-image=registry.build05.ci.openshift.org/ci-op-d1rbyhk6/pipeline@sha256:2ee89ab56aa2a48293ae462120a3b9afc7c2deb6bdaa9d3d5452d162a8f46d39 \
          --oidc-storage-provider-s3-credentials=/var/run/secrets/ci.openshift.io/cluster-profile/.awscred \
          --oidc-storage-provider-s3-bucket-name=0491705694d18d9fdaef \
          --oidc-storage-provider-s3-region=us-east-1 \
          --wait-until-available
      
      # create the hosted cluster
      /usr/bin/hypershift create cluster aws \
          --image-content-sources /tmp/secret/mgmt_iscp.yaml \
          --name 0491705694d18d9fdaef \
          --node-pool-replicas 3 \
          --instance-type m5.xlarge \
          --base-domain qe.devcluster.openshift.com \
          --region us-east-1 \
          --control-plane-availability-policy HighlyAvailable \
          --infra-availability-policy HighlyAvailable \
          --pull-secret=/etc/ci-pull-credentials/.dockerconfigjson \
          --aws-creds=/var/run/secrets/ci.openshift.io/cluster-profile/.awscred \
          --release-image registry.build05.ci.openshift.org/ci-op-d1rbyhk6/release@sha256:8a5507bf897252cab6d1957d9477bce45e7427f4f798450605d3503aed936594 \
          --additional-tags=expirationDate=2023-08-16T09:39+00:00
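
For step 1, it can help to confirm that the management cluster really has the feature set enabled before the hosted cluster is created. The following is a minimal sketch, not part of the original reproduction. It assumes `oc` is on the PATH and logged in to the management cluster, and that the feature set is read from the cluster-scoped FeatureGate object named `cluster`. The function names `feature_set` and `is_tech_preview` are hypothetical.

```shell
#!/bin/sh
# Print the feature set configured on the management cluster.
# Assumption: `oc` is on PATH and logged in to the management cluster.
feature_set() {
    oc get featuregate cluster -o jsonpath='{.spec.featureSet}'
}

# Return 0 when TechPreviewNoUpgrade is enabled, non-zero otherwise.
is_tech_preview() {
    [ "$(feature_set)" = "TechPreviewNoUpgrade" ]
}
```

A possible use is `is_tech_preview && echo "TechPreviewNoUpgrade is enabled"`, run before the `hypershift create cluster aws` command above.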
      
      

      Actual results:

      The HyperShift operator (HO) logs errors such as the following:
      
      {"level":"error","ts":"2023-08-16T05:59:06Z","msg":"Failed to reconcile NodePool","controller":"nodepool","controllerGroup":"hypershift.openshift.io","controllerKind":"NodePool","NodePool":{"name":"0491705694d18d9fdaef-us-east-1a","namespace":"clusters"},"namespace":"clusters","name":"0491705694d18d9fdaef-us-east-1a","reconcileID":"2e9d0bbe-2ed4-4e76-97df-e0ae09396fc2","error":"admission webhook \"validation.awsmachinetemplate.infrastructure.cluster.x-k8s.io\" denied the request: AWSMachineTemplate.infrastructure.cluster.x-k8s.io \"0491705694d18d9fdaef-us-east-1a\" is invalid: spec.template.spec.cloudInit.secureSecretsBackend: Forbidden: cannot be set if spec.template.spec.cloudInit.insecureSkipSecretsManager is true","stacktrace":"github.com/openshift/hypershift/hypershift-operator/controllers/nodepool.(*NodePoolReconciler).Reconcile\n\t/hypershift/hypershift-operator/controllers/nodepool/nodepool_controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
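
The webhook denial is buried in the `error` field of a long JSON log line. As a rough sketch for pulling that field out of saved HO logs, the helper below uses only `sed`, so it has no extra dependencies; a JSON-aware tool would be more robust. The function name `extract_error` and the file name `ho.log` are hypothetical.

```shell
# Print only the "error" field of JSON log lines read from stdin.
# Lines without an "error" field produce no output.
# Only escaped double quotes (\") are unescaped; other escapes are left as-is.
extract_error() {
    sed -nE 's/.*"error":"(([^"\\]|\\.)*)".*/\1/p' | sed 's/\\"/"/g'
}
```

For example, `extract_error < ho.log` would reduce the line above to the message starting with `admission webhook "validation.awsmachinetemplate.infrastructure.cluster.x-k8s.io" denied the request`.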

      Expected results:

      The hosted cluster can be created successfully

      Additional info:

      slack discussion:

      https://redhat-internal.slack.com/archives/C01C8502FMM/p1692167982285039

       

      From the slack discussion:

      "so we (Hypershift) don’t install the cluster-api-provider-aws webhook, which explains why we didn’t see this error before.

      But something here is installing the webhook on the mgmt cluster https://github.com/openshift/cluster-api-provider-aws/blob/master/config/webhook/manifests.yaml"
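
To find out which object on the management cluster carries the webhook named in the error, one option is to list the ValidatingWebhookConfigurations together with their webhook names and filter on the name. This is a sketch under the assumption that `oc` is logged in to the management cluster; the function name `find_webhook_config` is hypothetical.

```shell
# Print the names of ValidatingWebhookConfigurations whose output line
# (configuration name plus its webhook names) contains the string in $1.
# Assumption: `oc` is on PATH and logged in to the management cluster.
find_webhook_config() {
    oc get validatingwebhookconfigurations \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .webhooks[*]}{.name}{" "}{end}{"\n"}{end}' |
        grep -F -- "$1" | cut -f1
}
```

Calling `find_webhook_config validation.awsmachinetemplate.infrastructure.cluster.x-k8s.io` prints the matching configuration names, whose owner references and labels may then hint at which component installed them.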

       

              Unassigned Unassigned
              rhn-support-heli He Liu
              None
              None
              He Liu He Liu
              None