Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: openstack-operator-dataplane
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Docs Approval:
?
AssignedTeam:
rhos-ops-day1day2-edpm
Regression:
None
Intelligence Requested:
Market:

Severity:
Informational

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

The OpenStackDataPlaneDeployment controller can get stuck if a job fails to be created, or is deleted, while the deployment is in progress. This appears to be because the NodeSet conditions and the job hashes stored on the Deployment are used to decide whether a job should be spawned.

lib-common's DoJob compares against the beforeHash passed in by func AnsibleExecution in ansible_execution.go. Because that hash is already saved on the Deployment, DoJob sees no change and does not CreateOrPatch the job, even if the job is missing.
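To make the failure mode concrete, here is a minimal Go model of the hash guard described above. All names (Deployment, Cluster, doJobHashGuarded) are hypothetical; this is not lib-common's actual DoJob signature, only a simplified sketch assuming the guard compares the saved hash with the desired one and never checks whether the job exists.

```go
package main

import "fmt"

// Deployment is a hypothetical, simplified stand-in for the
// OpenStackDataPlaneDeployment status: it only records the job hash
// saved after a previous reconcile.
type Deployment struct {
	JobHash string // hash persisted on the Deployment status
}

// Cluster is a hypothetical in-memory stand-in for the API server,
// tracking which jobs currently exist.
type Cluster struct {
	Jobs map[string]bool
}

// doJobHashGuarded models the guard described in the issue (hypothetical):
// the job is only (re)created when the desired hash differs from the saved
// beforeHash. It never checks whether the job actually exists.
// It returns true if a job was created.
func doJobHashGuarded(c *Cluster, name, beforeHash, desiredHash string) bool {
	if beforeHash == desiredHash {
		// No hash change: the job is assumed to exist, so nothing happens.
		return false
	}
	c.Jobs[name] = true
	return true
}

func main() {
	c := &Cluster{Jobs: map[string]bool{}}
	d := &Deployment{}

	// First reconcile: no saved hash, so the job is created and the hash saved.
	created := doJobHashGuarded(c, "deploy-job", d.JobHash, "abc123")
	d.JobHash = "abc123"
	fmt.Println("first reconcile created job:", created)

	// The job is deleted (or was never created) while the deployment runs.
	delete(c.Jobs, "deploy-job")

	// Next reconcile: the saved hash still matches, so the job is not recreated.
	created = doJobHashGuarded(c, "deploy-job", d.JobHash, "abc123")
	fmt.Println("second reconcile created job:", created, "job exists:", c.Jobs["deploy-job"])
}
```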

Furthermore, AnsibleExecution is only called by DeployService, which in turn is only called when the NodeSet condition for the job on the Deployment is Unknown (not when it is False).
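The gating on the condition state can be sketched the same way. This is a hypothetical simplification that assumes the per-service NodeSet condition takes one of the three standard Kubernetes condition values (True, False, Unknown); shouldCallDeployService is an illustrative name, not a function in the operator.

```go
package main

import "fmt"

// ConditionStatus mirrors the three standard Kubernetes condition states.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// shouldCallDeployService is a hypothetical model of the gate described in
// the issue: DeployService (and therefore AnsibleExecution) runs only while
// the per-service NodeSet condition on the Deployment is Unknown.
func shouldCallDeployService(status ConditionStatus) bool {
	return status == ConditionUnknown
}

func main() {
	for _, s := range []ConditionStatus{ConditionUnknown, ConditionFalse, ConditionTrue} {
		fmt.Printf("condition=%s -> DeployService called: %v\n", s, shouldCallDeployService(s))
	}
}
```

Under this model, once the condition is False nothing calls AnsibleExecution again, so clearing the hash alone would not be enough to respawn the job.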

This is mainly a debug/dev concern, since a job should never be deleted in practice. However, if a job does fail to be created for some reason, this could leave the Deployment stuck.

The workaround is to edit the Deployment's status subresource and delete both the hash for the job and the NodeSet condition for the job. A possible fix is to stop guarding job creation with the hashes and conditions, and instead always CreateOrPatch the job if it does not exist.
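A minimal sketch of the proposed fix, again using a hypothetical in-memory Cluster rather than a real client: job existence is checked before the hash, so a missing job is recreated even when the saved hash is unchanged. In a real controller the existence check and creation would go through the Kubernetes API (for example controller-runtime's controllerutil.CreateOrPatch); ensureJob and its return values are illustrative only.

```go
package main

import "fmt"

// Cluster is a hypothetical in-memory stand-in for the API server,
// tracking which jobs currently exist.
type Cluster struct {
	Jobs map[string]bool
}

// ensureJob is a hypothetical sketch of the proposed fix. Job existence is
// checked first, so a missing job is recreated even when the hash saved on
// the Deployment is unchanged. The hash comparison only matters for a job
// that already exists.
func ensureJob(c *Cluster, name, savedHash, desiredHash string) string {
	if !c.Jobs[name] {
		// Missing job: create it regardless of what the saved hash says.
		c.Jobs[name] = true
		return "created"
	}
	if savedHash != desiredHash {
		// Existing job with a changed spec: left to the normal hash-driven
		// update path, which this sketch does not model.
		return "hash-changed"
	}
	return "unchanged"
}

func main() {
	c := &Cluster{Jobs: map[string]bool{}}
	fmt.Println(ensureJob(c, "deploy-job", "", "abc123"))       // first reconcile: created
	delete(c.Jobs, "deploy-job")                                // job deleted mid-deployment
	fmt.Println(ensureJob(c, "deploy-job", "abc123", "abc123")) // recreated despite same hash
}
```

Checking existence first keeps the hash as a way to detect spec changes rather than as the only signal that a job was already created.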

Assignee:: Unassigned

Reporter:: James Slagle

Team:: rhos-dfg-df

Votes:: 0

Watchers:: 1

Created:: 2025/04/02 3:41 PM

Updated:: 2025/06/02 8:04 PM
