Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: MCE 2.5.0
Affects Version/s: ACM 2.10.0
Component/s: Documentation, HyperShift
Labels:
- doc-ack

Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

Regression:
No

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Create an informative issue (See each section, incomplete templates/issues won't be triaged)

Using the current documentation as a model, please complete the issue template.

Note: Doc team updates the current version and the two previous versions (n-2). For earlier versions, we will address only high-priority, customer-reported issues for releases in support.

Prerequisite: Start with what we have

Always look at the current documentation to describe the change that is needed. Use the source or portal link for Step 4:

- Use the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes

- Use the GitHub link to find the staged docs in the repository: https://github.com/stolostron/rhacm-docs

Describe the changes in the doc and link to your dev story

Provide info for the following steps:

1. - [x] Mandatory Add the required version to the Fix version/s field.

2. - [ ] Mandatory Choose the type of documentation change.

- [x] New topic in an existing section or new section
- [ ] Update to an existing topic

3. - [ ] Mandatory for GA content:

- [ ] Add steps and/or other important conceptual information here:

- [x] Add Required access level for the user to complete the task here: Admin

- [x] Add verification at the end of the task, how does the user verify success (a command to run or a result to see?)
Verify that all of the HyperShift CRs are removed:

$ oc get hostedcluster -A
No resources found
$ oc get hostedcontrolplane -A
No resources found
$ oc get nodepool -A
No resources found

Verify that all agents are ready to be reused:

$ oc get agent -A -ojsonpath='{.items[*]}' | jq '{"agent name": .metadata.name, "agentMachineRef label": .metadata.labels["agentMachineRef"], "name": .status.inventory.hostname, "validations": [(.status.conditions[] | (select(.type == "Bound")), select(.type == "Validated") | .type + " " + .status )]}'

Example output:

{
  "agent name": "173357da-436a-409c-ac41-ba913cee49b1",
  "agentMachineRef label": null,
  "name": "worker-0-2",
  "validations": [
    "Validated True",
    "Bound False"
  ]
}

For every agent, the agentMachineRef label should be null, the Bound condition should be False, and the Validated condition should be True.
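
The three checks above can also be run in one pass over all agents. The following is a minimal sketch, not a supported tool: the function name `agents_not_reusable` is hypothetical, and it assumes the Agent list has the shape returned by `oc get agent -A -o json` (an `items` array with `metadata.labels` and `status.conditions`).

```shell
# Hypothetical helper: reads an Agent list as JSON on stdin and prints the
# name of every agent that is NOT ready to be reused.
agents_not_reusable() {
  jq -r '
    .items[]
    | select(
        # The agentMachineRef label must be absent (null).
        (.metadata.labels.agentMachineRef != null)
        # The Bound condition must be exactly "False".
        or ([.status.conditions[]? | select(.type == "Bound") | .status] != ["False"])
        # The Validated condition must be exactly "True".
        or ([.status.conditions[]? | select(.type == "Validated") | .status] != ["True"])
      )
    | .metadata.name
  '
}
```

Usage, assuming `jq` is installed: `oc get agent -A -o json | agents_not_reusable`. Empty output means that every agent passed all three checks.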

- [x] Add link to dev story here: https://issues.redhat.com/browse/OCPBUGS-29854

4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:

This is a new troubleshooting topic for ACM 2.10 and might be added to this section: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/clusters/cluster_mce_overview#troubleshooting-mce

Suggested title: Troubleshooting failure to destroy hosted control plane clusters (agent platform)

Symptom:

Destroying a hosted control plane cluster on the agent platform fails with an error similar to the following:

$ hcp destroy cluster agent --name hosted-0 --cluster-grace-period 20m0s
2024-02-22T09:36:19-05:00    INFO    Found hosted cluster    {"namespace": "clusters", "name": "hosted-0"}
2024-02-22T09:36:19-05:00    INFO    Deleting hosted cluster    {"namespace": "clusters", "name": "hosted-0"}
2024-02-22T09:56:19-05:00    ERROR    HostedCluster deletion failed    {"namespace": "clusters", "name": "hosted-0", "error": "context deadline exceeded"}
github.com/openshift/hypershift/cmd/cluster/core.waitForClusterDeletion
    /home/hypershift/cmd/cluster/core/destroy.go:268
github.com/openshift/hypershift/cmd/cluster/core.DestroyCluster
    /home/hypershift/cmd/cluster/core/destroy.go:130
github.com/openshift/hypershift/cmd/cluster/none.DestroyCluster
    /home/hypershift/cmd/cluster/none/destroy.go:47
github.com/openshift/hypershift/cmd/cluster/agent.DestroyCluster
    /home/hypershift/cmd/cluster/agent/destroy.go:40
github.com/openshift/hypershift/product-cli/cmd/cluster/agent.NewDestroyCommand.func1
    /home/hypershift/product-cli/cmd/cluster/agent/destroy.go:19
github.com/spf13/cobra.(*Command).execute
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
    /home/hypershift/product-cli/main.go:60
runtime.main
    /home/hypershift/go/src/runtime/proc.go:267
2024-02-22T09:56:19-05:00    ERROR    Failed to destroy cluster    {"error": "context deadline exceeded"}
github.com/openshift/hypershift/product-cli/cmd/cluster/agent.NewDestroyCommand.func1
    /home/hypershift/product-cli/cmd/cluster/agent/destroy.go:20
github.com/spf13/cobra.(*Command).execute
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
    /home/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
    /home/hypershift/product-cli/main.go:60
runtime.main
    /home/hypershift/go/src/runtime/proc.go:267
Error: context deadline exceeded
context deadline exceeded

In addition, one or more Machine CRs remain, but no AgentMachine CRs exist:

$ oc get machine -A
NAMESPACE           NAME             CLUSTER          NODENAME   PROVIDERID   PHASE      AGE   VERSION
clusters-hosted-0   hosted-0-9gg8b   hosted-0-nhdbp                           Deleting   10h   4.15.0-rc.8

$ oc get agentmachine -A
No resources found
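
The two lookups above can be combined into a single yes/no check. This is a sketch under assumptions: the function name `has_orphaned_machines` is hypothetical, and it relies on `oc get ... -o name` printing nothing to standard output when no resources exist.

```shell
# Hypothetical helper: exits 0 only when at least one Machine CR remains
# and no AgentMachine CRs exist in any namespace.
has_orphaned_machines() {
  machines=$(oc get machine -A -o name 2>/dev/null)
  agentmachines=$(oc get agentmachine -A -o name 2>/dev/null)
  [ -n "$machines" ] && [ -z "$agentmachines" ]
}
```

Usage: `has_orphaned_machines && echo "symptom matches"`.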

Resolving the problem: Manually remove the remaining Machine CRs

Edit the remaining Machine CRs in the hosted control plane namespace and remove the entries under metadata.finalizers:

$ oc edit machine -n clusters-hosted-0

Re-run the hcp destroy cluster command:

$ hcp destroy cluster agent --name hosted-0 --cluster-grace-period 20m0s
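
For the finalizer-removal step above, a non-interactive alternative to editing each Machine CR by hand is `oc patch`. This is a sketch, not the documented procedure: the function name `clear_machine_finalizers` is hypothetical, and the namespace and Machine name in the usage line come from the example output in the symptom. Clearing finalizers skips any cleanup that the owning controller would otherwise perform, so it is only appropriate when the symptom matches.

```shell
# Hypothetical helper: clears metadata.finalizers on one Machine CR so that
# its pending deletion can complete.
# Usage: clear_machine_finalizers <namespace> <machine-name>
clear_machine_finalizers() {
  ns="$1"
  name="$2"
  # In a JSON merge patch, setting a field to null removes it.
  oc patch machine "$name" -n "$ns" --type=merge -p '{"metadata":{"finalizers":null}}'
}
```

Usage: `clear_machine_finalizers clusters-hosted-0 hosted-0-9gg8b`, followed by the hcp destroy cluster command.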

documents

OCPBUGS-29854 Agent-based HostedCluster deletion failing

Closed

Assignee: Servesha Dudhgaonkar

Reporter: Crystal Chun

QA Contact: David Huynh

Votes: 0

Watchers: 4

Created: 2024/03/05 5:01 PM

Updated: 2024/03/18 1:07 PM

Resolved: 2024/03/18 1:07 PM
