OpenShift Hosted Control Plane / HOSTEDCP-1182

Strange timeout behavior with create/destroy hostedclusters

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined

      Steps to reproduce:

      create:

      hypershift create cluster aws --name test --additional-tags auto-stop=stop,owner=jrickard --node-pool-replicas=3 --instance-type=m5.4xlarge --base-domain <subdomain> --pull-secret ~/.pullsecret.json --aws-creds ~/.aws/credentials --region us-west-1 --infra-id test --wait

      delete:

      hypershift destroy cluster aws --name test --base-domain <subdomain> --aws-creds ~/.aws/credentials --region us-west-1 --infra-id test

      When creating a HostedCluster with the `--wait` flag, the create command occasionally reports a failure. However, checking a few minutes later (`oc get hostedclusters -n clusters`) shows the cluster up and available.

      We see this behavior more consistently when destroying clusters: the destroy command fails after 10 minutes, and running the same command again succeeds.
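
      Because a second run of the destroy command succeeds, one possible stopgap while the root cause is investigated is to rerun it automatically. The following is a minimal sketch under that assumption, not part of the HyperShift CLI: `retry` is a hypothetical shell helper (its name and arguments are illustrative) that reruns a command a fixed number of times with a pause between attempts. It assumes, as reported here, that rerunning `hypershift destroy cluster` after a timeout is safe; it only masks the symptom and does not explain the timeout.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of hypershift): runs COMMAND up to ATTEMPTS
# times, sleeping DELAY_SECONDS after each failure. Returns 0 on the first
# success, or the exit status of the last failed attempt.
retry() {
  local attempts=$1 delay=$2 n=1 status
  shift 2
  while :; do
    if "$@"; then
      return 0
    else
      status=$?            # exit status of the failed attempt
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $n attempt(s): $*" >&2
      return "$status"
    fi
    echo "retry: attempt $n failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example: allow one retry of the destroy command from this report after a
# 30-second pause. Replace <subdomain> with the real base domain first.
#   retry 2 30 hypershift destroy cluster aws --name test --base-domain <subdomain> \
#     --aws-creds ~/.aws/credentials --region us-west-1 --infra-id test
```

      Using `if "$@"; then ... else` keeps a failing attempt from aborting a caller that runs with `set -e`, while still capturing its exit status.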

      Example 1 Cluster Destroy:

      Using project "default".
      2023-08-31T03:32:41Z	INFO	Found hosted cluster	{"namespace": "clusters", "name": "beekhof"}
      2023-08-31T03:32:42Z	INFO	Updated finalizer for hosted cluster	{"namespace": "clusters", "name": "beekhof"}
      2023-08-31T03:32:42Z	INFO	Deleting hosted cluster	{"namespace": "clusters", "name": "beekhof"}
      2023-08-31T0342Z	ERROR	Failed to destroy cluster	{"error": "hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition"}
      github.com/spf13/cobra.(*Command).execute
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:916
      github.com/spf13/cobra.(*Command).ExecuteC
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:1044
      github.com/spf13/cobra.(*Command).Execute
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:968
      github.com/spf13/cobra.(*Command).ExecuteContext
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:961
      main.main
      	/remote-source/app/main.go:70
      runtime.main
      	/usr/lib/golang/src/runtime/proc.go:250
      Error: hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition
      hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition

      Example 2 Cluster Destroy:

      hypershift destroy cluster aws --aws-creds ~/.aws/credentials --base-domain aws.validatedpatterns.io --region us-west-2 --destroy-cloud-resources --name claudiol-operator --infra-id claudiol-operator-vg94j
      2023-08-31T18:54:28-06:00	INFO	Found hosted cluster	{"namespace": "clusters", "name": "claudiol-operator"}
      2023-08-31T18:54:28-06:00	INFO	Updated finalizer for hosted cluster	{"namespace": "clusters", "name": "claudiol-operator"}
      2023-08-31T18:54:28-06:00	INFO	Deleting hosted cluster	{"namespace": "clusters", "name": "claudiol-operator"}
      
      2023-08-31T19:04:28-06:00	ERROR	Failed to destroy cluster	{"error": "hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition"}
      github.com/spf13/cobra.(*Command).execute
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:916
      github.com/spf13/cobra.(*Command).ExecuteC
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:1044
      github.com/spf13/cobra.(*Command).Execute
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:968
      github.com/spf13/cobra.(*Command).ExecuteContext
      	/remote-source/app/vendor/github.com/spf13/cobra/command.go:961
      main.main
      	/remote-source/app/main.go:70
      runtime.main
      	/usr/lib/golang/src/runtime/proc.go:250
      Error: hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition
      hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition 
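
      In Example 2 the ERROR line is logged 600 seconds after "Deleting hosted cluster", matching the 10-minute window described in this report. The error text "timed out waiting for the condition" is also the message of `ErrWaitTimeout` in the Kubernetes `k8s.io/apimachinery/pkg/util/wait` package, which suggests a fixed client-side wait expiring rather than an error returned by the cluster; that is an inference from the logs, not something they confirm. The interval can be checked with the small sketch below; `to_seconds` is a hypothetical helper written for this illustration.

```shell
# to_seconds TIMESTAMP
# Hypothetical helper: converts the time-of-day part of a log timestamp such as
# 2023-08-31T18:54:28-06:00 or 2023-08-31T03:32:42Z into seconds since midnight.
# It ignores the date, the UTC offset, and fractional seconds, so it is only
# meaningful for two timestamps on the same day with the same offset.
to_seconds() {
  local t h m s
  t=${1#*T}          # drop the date:        18:54:28-06:00
  t=${t%%[-+Z]*}     # drop the UTC offset:  18:54:28
  h=${t%%:*}
  m=${t#*:}; m=${m%%:*}
  s=${t##*:}
  # Strip one leading zero so values like 08 are not parsed as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( h * 3600 + m * 60 + s ))
}

start=$(to_seconds "2023-08-31T18:54:28-06:00")  # "Deleting hosted cluster"
end=$(to_seconds "2023-08-31T19:04:28-06:00")    # "Failed to destroy cluster"
echo "elapsed: $((end - start))s"                # prints: elapsed: 600s
```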

      OpenShift Version: 4.13.8

      Multicluster-Engine Version: stable-2.3

      If specific logs would help with troubleshooting, let me know and I'll collect them.

              Assignee: Unassigned
              Reporter: Jonathan Rickard (rhn-jrickard)
              Votes: 0
              Watchers: 1