Task
Resolution: Done
ACM 2.10.0
Create an informative issue (See each section, incomplete templates/issues won't be triaged)
Using the current documentation as a model, please complete the issue template.
Note: Doc team updates the current version and the two previous versions (n-2). For earlier versions, we will address only high-priority, customer-reported issues for releases in support.
Prerequisite: Start with what we have
Always look at the current documentation to describe the change that is needed. Use the source or portal link for Step 4:
- Use the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes
- Use the GitHub link to find the staged docs in the repository: https://github.com/stolostron/rhacm-docs
Describe the changes in the doc and link to your dev story
Provide info for the following steps:
1. - [x] Mandatory Add the required version to the Fix version/s field.
2. - [ ] Mandatory Choose the type of documentation change.
- [x] New topic in an existing section or new section
- [ ] Update to an existing topic
3. - [ ] Mandatory for GA content:
- [ ] Add steps and/or other important conceptual information here:
- [x] Add Required access level for the user to complete the task here: Admin
- [x] Add verification at the end of the task, how does the user verify success (a command to run or a result to see?)
Verify all of the hypershift CRs are removed
$ oc get hostedcluster -A
No resources found
$ oc get hostedcontrolplane -A
No resources found
$ oc get nodepool -A
No resources found
Verify all agents are ready to be reused
$ oc get agent -A -ojsonpath='{.items[*]}' | jq '{"agent name": .metadata.name, "agentMachineRef label": .metadata.labels["agentMachineRef"], "name": .status.inventory.hostname, "validations": [(.status.conditions[] | (select(.type == "Bound")), select(.type == "Validated") | .type + " " + .status )]}'
Example output:
{
  "agent name": "173357da-436a-409c-ac41-ba913cee49b1",
  "agentMachineRef label": null,
  "name": "worker-0-2",
  "validations": [
    "Validated True",
    "Bound False"
  ]
}
For every agent, the agentMachineRef label must be null, Bound must be False, and Validated must be True.
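The two verification checks above could also be scripted so that they return a pass or fail status. The following is a minimal sketch, not a tested procedure: the function names are hypothetical, it assumes a logged-in `oc` client, and it reads the same fields as the jq command above through `oc` JSONPath output instead of jq.

```shell
# Sketch only: function names are hypothetical; assumes a logged-in `oc`.

# Succeeds only when no HostedCluster, HostedControlPlane, or NodePool remains.
hypershift_crs_removed() {
  leftover=0
  for kind in hostedcluster hostedcontrolplane nodepool; do
    # -o name prints one line per resource and nothing when none exist.
    found=$(oc get "$kind" -A -o name) || return 2
    if [ -n "$found" ]; then
      echo "remaining $kind resources: $found"
      leftover=1
    fi
  done
  return "$leftover"
}

# Succeeds only when every agent has no agentMachineRef label,
# Bound=False, and Validated=True.
agents_ready_for_reuse() {
  not_ready=0
  # One line per agent: name|agentMachineRef|Bound status|Validated status
  rows=$(oc get agent -A -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.metadata.labels.agentMachineRef}{"|"}{.status.conditions[?(@.type=="Bound")].status}{"|"}{.status.conditions[?(@.type=="Validated")].status}{"\n"}{end}') || return 2
  while IFS='|' read -r name ref bound validated; do
    [ -n "$name" ] || continue
    if [ -n "$ref" ] || [ "$bound" != "False" ] || [ "$validated" != "True" ]; then
      echo "agent $name not ready: agentMachineRef='$ref' Bound=$bound Validated=$validated"
      not_ready=1
    fi
  done <<EOF
$rows
EOF
  return "$not_ready"
}
```

Running `hypershift_crs_removed && agents_ready_for_reuse` would then exit with status 0 only when both checks pass. A status of 2 means the `oc` call itself failed.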
- [x] Add link to dev story here: https://issues.redhat.com/browse/OCPBUGS-29854
4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:
This is a new troubleshooting topic for ACM 2.10 and might be added to this section: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/clusters/cluster_mce_overview#troubleshooting-mce
Suggested title: Troubleshooting failure to destroy hosted control plane clusters (agent platform)
Symptom:
Destroying a hosted control plane cluster (agent platform) fails with an error similar to the following:
$ hcp destroy cluster agent --name hosted-0 --cluster-grace-period 20m0s
2024-02-22T09:36:19-05:00 INFO Found hosted cluster {"namespace": "clusters", "name": "hosted-0"}
2024-02-22T09:36:19-05:00 INFO Deleting hosted cluster {"namespace": "clusters", "name": "hosted-0"}
2024-02-22T09:56:19-05:00 ERROR HostedCluster deletion failed {"namespace": "clusters", "name": "hosted-0", "error": "context deadline exceeded"}
github.com/openshift/hypershift/cmd/cluster/core.waitForClusterDeletion
  /home/hypershift/cmd/cluster/core/destroy.go:268
github.com/openshift/hypershift/cmd/cluster/core.DestroyCluster
  /home/hypershift/cmd/cluster/core/destroy.go:130
github.com/openshift/hypershift/cmd/cluster/none.DestroyCluster
  /home/hypershift/cmd/cluster/none/destroy.go:47
github.com/openshift/hypershift/cmd/cluster/agent.DestroyCluster
  /home/hypershift/cmd/cluster/agent/destroy.go:40
github.com/openshift/hypershift/product-cli/cmd/cluster/agent.NewDestroyCommand.func1
  /home/hypershift/product-cli/cmd/cluster/agent/destroy.go:19
github.com/spf13/cobra.(*Command).execute
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
  /home/hypershift/product-cli/main.go:60
runtime.main
  /home/hypershift/go/src/runtime/proc.go:267
2024-02-22T09:56:19-05:00 ERROR Failed to destroy cluster {"error": "context deadline exceeded"}
github.com/openshift/hypershift/product-cli/cmd/cluster/agent.NewDestroyCommand.func1
  /home/hypershift/product-cli/cmd/cluster/agent/destroy.go:20
github.com/spf13/cobra.(*Command).execute
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
  /home/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
  /home/hypershift/product-cli/main.go:60
runtime.main
  /home/hypershift/go/src/runtime/proc.go:267
Error: context deadline exceeded
context deadline exceeded
After the failure, one or more Machine CRs remain, but no AgentMachine CRs exist:
$ oc get machine -A
NAMESPACE           NAME             CLUSTER          NODENAME   PROVIDERID   PHASE      AGE   VERSION
clusters-hosted-0   hosted-0-9gg8b   hosted-0-nhdbp                           Deleting   10h   4.15.0-rc.8
$ oc get agentmachine -A
No resources found
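The symptom check could be combined into a single helper that lists leftover Machine CRs only when no AgentMachine CRs exist. This is a sketch under assumptions: the function name is hypothetical, and it uses the same `machine` and `agentmachine` resource names as the commands above.

```shell
# Sketch only: function name is hypothetical; assumes a logged-in `oc`.
# Prints namespace/name for each leftover Machine CR when no AgentMachine
# CRs exist. Prints nothing when AgentMachine CRs are still present.
list_orphaned_machines() {
  agentmachines=$(oc get agentmachine -A -o name) || return 2
  if [ -n "$agentmachines" ]; then
    return 0
  fi
  oc get machine -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'
}
```

Any line printed by `list_orphaned_machines` identifies a Machine CR that matches the symptom described here.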
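Resolving the problem: Manually remove the remaining Machine CR(s)
- Edit each remaining Machine CR and remove the finalizer. This example uses the namespace and name from the Machine CR listing in the symptom:
$ oc edit machine -n clusters-hosted-0 hosted-0-9gg8b
- Re-run the hcp destroy cluster command:
$ hcp destroy cluster agent --name hosted-0 --cluster-grace-period 20m0s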
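As a possible non-interactive alternative to editing the Machine CR by hand, the finalizer could be cleared with `oc patch`. This is only a sketch: the function name is hypothetical, and the merge patch shown clears every finalizer on the Machine CR, which assumes that no other finalizer needs to stay in place. The ticket itself only confirms the interactive edit.

```shell
# Sketch only: function name is hypothetical; assumes a logged-in `oc`.
# Clears ALL finalizers on one Machine CR (assumption: none must be kept).
# Arguments: <namespace> <machine-name>
remove_machine_finalizers() {
  ns=$1
  name=$2
  oc patch machine "$name" -n "$ns" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}
```

With the values from the example output, the call would be `remove_machine_finalizers clusters-hosted-0 hosted-0-9gg8b`, followed by re-running the `hcp destroy cluster agent` command.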
documents: OCPBUGS-29854 Agent-based HostedCluster deletion failing (Closed)