Type: Bug
Resolution: Not a Bug
Priority: Normal
Fix Version/s: None
Affects Version/s: rhos-18.0.10 FR 3
Component/s: telemetry-operator
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Docs Approval: ?
Regression: None
Severity: Low

The telemetry containers installed on a node are controlled by the edpm_telemetry_enabled_exporters list. If ceilometer_agent_compute is removed from that list (and I think the same applies to other entries), the deployment fails with this error:

TASK [osp.edpm.edpm_telemetry : Wait until container is up and running] ********
task path: /usr/share/ansible/collections/ansible_collections/osp/edpm/roles/edpm_telemetry/tasks/chown_healthcheck.yml:7
FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (5 retries left).
FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (4 retries left).
FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (3 retries left).
FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (2 retries left).
FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (1 retries left).
fatal: [ntwcis001]: FAILED! => {"attempts": 5, "changed": false, "containers": [], "stderr": "Error: no such container ceilometer_agent_compute\n", "stderr_lines": ["Error: no such container ceilometer_agent_compute"]}

This happens because the healthcheck task lists the contents of the healthcheck directory (/var/lib/openstack/healthcheck) and, for each entry, looks up the matching container so that it can change the ownership of the healthcheck script to the user that runs inside the container. It does not check whether that container has been disabled through the edpm_telemetry_enabled_exporters list.

chown_healthcheck.yml needs to take into account that the container might not exist and, in that case, leave the healthcheck script untouched.
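The proposed behaviour can be sketched outside Ansible as a simple filter. This is only an illustration of the logic, not the role's actual implementation: the function name select_healthchecks is hypothetical, and it assumes that each entry in the healthcheck directory is named after its container and that those names match the entries in edpm_telemetry_enabled_exporters. The exporter names in the demo other than ceilometer_agent_compute are placeholders.

```python
import os
import tempfile
from pathlib import Path


def select_healthchecks(healthcheck_dir, enabled_exporters):
    """Return the healthcheck entries whose container is enabled.

    Entries found on disk but absent from enabled_exporters are skipped,
    so no container lookup is attempted for them.
    Assumption: directory entry names equal container/exporter names.
    """
    root = Path(healthcheck_dir)
    if not root.is_dir():
        # No healthcheck directory at all: nothing to process.
        return []
    enabled = set(enabled_exporters)
    return sorted(entry.name for entry in root.iterdir() if entry.name in enabled)


if __name__ == "__main__":
    # Simulate a networker node: a healthcheck entry exists for
    # ceilometer_agent_compute, but that exporter is not enabled.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("ceilometer_agent_compute", "node_exporter"):
            os.mkdir(os.path.join(tmp, name))
        print(select_healthchecks(tmp, ["node_exporter"]))
```

Filtering on the enabled list before any container lookup means a leftover healthcheck entry for a disabled container is ignored instead of triggering the retry loop shown in the log above.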

The impact is limited because this only happens on alternative setups such as a networker node, where the compute agent is not needed because the Nova container is not present on the node.

Assignee: Juan Larriba
Reporter: Juan Larriba
Team: rhos-conplat-observability
Votes: 0
Watchers: 2
Created: 2025/09/04 8:06 AM
Updated: 2025/09/04 8:09 AM
Resolved: 2025/09/04 8:09 AM
