Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- etcd
- validated-pattern

Blocked: False
Blocked Reason: None
Ready: False
Steps to Reproduce:

create:

hypershift create cluster aws --name test --additional-tags auto-stop=stop,owner=jrickard --node-pool-replicas=3 --instance-type=m5.4xlarge --base-domain <subdomain> --pull-secret ~/.pullsecret.json --aws-creds ~/.aws/credentials --region us-west-1 --infra-id test --wait

delete:

hypershift destroy cluster aws --name test --base-domain <subdomain> --aws-creds ~/.aws/credentials --region us-west-1 --infra-id test

When creating a hosted cluster with the `--wait` flag, we have noticed that cluster creation occasionally reports a failure. However, when you check a few minutes later (`oc get hostedclusters -n clusters`), the cluster is up and available.
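As a possible way to confirm the cluster state after `--wait` reports a failure, the sketch below polls the HostedCluster until it reports Available. This is a workaround idea, not part of the hypershift CLI: `poll_until` and `hosted_cluster_available` are hypothetical helper names, and the sketch assumes the `Available` condition on the HostedCluster is the right signal to check.

```shell
#!/usr/bin/env bash
# Poll a check command until it succeeds or a deadline passes.
# Usage: poll_until <timeout_seconds> <interval_seconds> <command> [args...]
poll_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # deadline passed and the check never succeeded
    fi
    sleep "$interval"
  done
  return 0
}

# Hypothetical check: succeeds when the HostedCluster reports Available=True.
# Assumes `oc` is logged in to the management cluster.
hosted_cluster_available() {
  local name="$1" namespace="${2:-clusters}" status
  status=$(oc get hostedcluster "$name" -n "$namespace" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}') || return 1
  [ "$status" = "True" ]
}

# Example (not run here): wait up to 20 minutes, checking every 30 seconds.
#   poll_until 1200 30 hosted_cluster_available test clusters
```

The polling loop is kept separate from the check so the same helper can wrap any other readiness test.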

We have seen this behavior more consistently when destroying clusters: after 10 minutes the destroy fails, but when we run the same command again it succeeds.
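Until the underlying cause is found, the manual "run it again" step could be scripted. The following is only a sketch of that workaround: `retry` is a hypothetical helper, not a hypershift feature, and the attempt count and delay are arbitrary assumptions. It does not explain or fix the timeout itself.

```shell
#!/usr/bin/env bash
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "command failed after $n attempt(s): $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example (not run here), using the destroy command from Steps to Reproduce:
#   retry 2 30 hypershift destroy cluster aws --name test \
#     --base-domain <subdomain> --aws-creds ~/.aws/credentials \
#     --region us-west-1 --infra-id test
```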

Example 1 Cluster Destroy:

Using project "default".
2023-08-31T03:32:41Z	INFO	Found hosted cluster	{"namespace": "clusters", "name": "beekhof"}
2023-08-31T03:32:42Z	INFO	Updated finalizer for hosted cluster	{"namespace": "clusters", "name": "beekhof"}
2023-08-31T03:32:42Z	INFO	Deleting hosted cluster	{"namespace": "clusters", "name": "beekhof"}
2023-08-31T03:42:42Z	ERROR	Failed to destroy cluster	{"error": "hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition"}
github.com/spf13/cobra.(*Command).execute
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:1044
github.com/spf13/cobra.(*Command).Execute
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:968
github.com/spf13/cobra.(*Command).ExecuteContext
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:961
main.main
	/remote-source/app/main.go:70
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
Error: hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition
hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition

Example 2 Cluster Destroy:

hypershift destroy cluster aws --aws-creds ~/.aws/credentials --base-domain aws.validatedpatterns.io --region us-west-2 --destroy-cloud-resources --name claudiol-operator --infra-id claudiol-operator-vg94j
2023-08-31T18:54:28-06:00	INFO	Found hosted cluster	{"namespace": "clusters", "name": "claudiol-operator"}
2023-08-31T18:54:28-06:00	INFO	Updated finalizer for hosted cluster	{"namespace": "clusters", "name": "claudiol-operator"}
2023-08-31T18:54:28-06:00	INFO	Deleting hosted cluster	{"namespace": "clusters", "name": "claudiol-operator"}

2023-08-31T19:04:28-06:00	ERROR	Failed to destroy cluster	{"error": "hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition"}
github.com/spf13/cobra.(*Command).execute
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:1044
github.com/spf13/cobra.(*Command).Execute
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:968
github.com/spf13/cobra.(*Command).ExecuteContext
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:961
main.main
	/remote-source/app/main.go:70
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
Error: hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition
hostedcluster wasn't finalized, aborting delete: timed out waiting for the condition
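
To check that the failure lines up with the 10-minute timeout, the gap between two log timestamps can be computed directly. This is a small sketch that assumes GNU `date` (its `-d` option parses these ISO 8601 timestamps); `elapsed_seconds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the number of seconds between two ISO 8601 timestamps.
# Assumes GNU date, whose -d option accepts values such as
# 2023-08-31T18:54:28-06:00.
elapsed_seconds() {
  local start end
  start=$(date -d "$1" +%s) || return 1
  end=$(date -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Timestamps copied from the "Deleting hosted cluster" and
# "Failed to destroy cluster" lines of Example 2.
elapsed_seconds "2023-08-31T18:54:28-06:00" "2023-08-31T19:04:28-06:00"
# prints 600
```

For Example 2 this gives 600 seconds, so the error appears exactly 10 minutes after the delete was issued.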

OpenShift Version: 4.13.8

Multicluster-Engine Version: stable-2.3

If there are certain logs that are useful in troubleshooting, lmk and I'll pull them.
