Red Hat OpenStack Services on OpenShift / OSPRH-19651

EDPM fails when ceilometer_agent_compute is removed from the list of containers


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • rhos-18.0.10 FR 3
    • telemetry-operator
    • Low

      The telemetry containers deployed on an EDPM node are controlled by the edpm_telemetry_enabled_exporters list. If ceilometer_agent_compute is removed from that list (and I think the same applies to other containers), the deployment fails with this error:

      TASK [osp.edpm.edpm_telemetry : Wait until container is up and running] ********
      task path: /usr/share/ansible/collections/ansible_collections/osp/edpm/roles/edpm_telemetry/tasks/chown_healthcheck.yml:7
      FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (5 retries left).
      FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (4 retries left).
      FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (3 retries left).
      FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (2 retries left).
      FAILED - RETRYING: [ntwcis001]: Wait until container is up and running (1 retries left).
      fatal: [ntwcis001]: FAILED! => {"attempts": 5, "changed": false, "containers": [], "stderr": "Error: no such container ceilometer_agent_compute\n", "stderr_lines": ["Error: no such container ceilometer_agent_compute"]} 

      This happens because chown_healthcheck.yml lists the contents of the healthcheck directory (/var/lib/openstack/healthcheck) and, for each entry, looks up the matching container to find the user it runs as, so that the healthcheck script can be chowned to that user. It never checks whether the container was disabled through the edpm_telemetry_enabled_exporters list, so the lookup fails for a container that was never deployed.

      chown_healthcheck.yml needs to take into account that the container might not exist and, in that case, leave the healthcheck script untouched.
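      A minimal Python sketch of the guard described above, for illustration only: the real fix belongs in the Ansible task file, and split_healthchecks and podman_container_exists are hypothetical helpers, not part of the edpm_telemetry role. It assumes each entry in /var/lib/openstack/healthcheck is named after the container it belongs to (which the "no such container ceilometer_agent_compute" error suggests). It asks podman whether each container exists (`podman container exists` exits 0 when it does) and only passes existing containers on to the chown step; the rest are skipped instead of failing the run.

```python
import os
import shutil
import subprocess
from typing import Callable, Iterable, List, Tuple

# Directory named in the bug report.
HEALTHCHECK_DIR = "/var/lib/openstack/healthcheck"


def podman_container_exists(name: str) -> bool:
    """Return True if podman knows a container called `name`.

    `podman container exists` exits 0 when the container exists and
    non-zero otherwise. If podman is not installed, report False.
    """
    if shutil.which("podman") is None:
        return False
    result = subprocess.run(
        ["podman", "container", "exists", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def split_healthchecks(
    entries: Iterable[str],
    container_exists: Callable[[str], bool] = podman_container_exists,
) -> Tuple[List[str], List[str]]:
    """Split healthcheck entries into (to_chown, skipped).

    Hypothetical helper: assumes each entry name equals its container name.
    Entries whose container is missing (e.g. disabled via
    edpm_telemetry_enabled_exporters) are skipped rather than treated as errors.
    """
    to_chown: List[str] = []
    skipped: List[str] = []
    for name in sorted(entries):
        if container_exists(name):
            to_chown.append(name)
        else:
            skipped.append(name)
    return to_chown, skipped


if __name__ == "__main__":
    entries = os.listdir(HEALTHCHECK_DIR) if os.path.isdir(HEALTHCHECK_DIR) else []
    to_chown, skipped = split_healthchecks(entries)
    print("chown:", to_chown)
    print("skip (container not deployed):", skipped)
```

      Passing container_exists in as a parameter keeps the filtering logic testable without podman. One possible equivalent in the role would be to skip the chown for any entry whose container lookup returns no containers, rather than retrying until the task fails.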

      The impact is limited because this only happens on alternative setups such as a networker node, where the compute agent is not needed because no Nova container runs on the node.

              rhn-engineering-jlarriba Juan Larriba
              rhos-conplat-observability
