Bug
Resolution: Done
Normal
rhos-18.0.0
Moderate
Data plane deployment with telemetry enabled can fail on HCI nodes if Ceph was deployed on them beforehand. Error:
fatal: [edpm-compute-2]: FAILED! =>
[root@edpm-compute-0 ~]# journalctl -u edpm_node_exporter.service
Oct 08 16:05:52 edpm-compute-0 systemd[1]: Starting node_exporter container...
Oct 08 16:05:54 edpm-compute-0 podman[336628]: Error: unable to start container "a8201d4b08a4067a242771a99ecda59314405cafe7d669703f22ef956ab318ed": cannot listen on the TCP port: listen tcp4 :9100: bind: address already in use
Oct 08 16:05:54 edpm-compute-0 systemd[1]: edpm_node_exporter.service: Control process exited, code=exited, status=125/n/a
Inspecting the HCI node shows that node_exporter is already running from the earlier Ceph deployment:
[root@edpm-compute-0 ~]# netstat -ltnp | grep 9100
tcp6 0 0 :::9100 :::* LISTEN 345974/node_exporte
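The same check could be scripted as a pre-flight step before the data plane deployment. The following is only a sketch: the helper name listeners_on_port is hypothetical and not part of any OpenStack or Ceph tooling, and it assumes input in the `netstat -ltnp` column layout shown above, where the fourth field is the local address.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any product tooling): filter
# `netstat -ltnp` style output read from stdin and print only the
# listeners bound to the given port.
listeners_on_port() {
    local port="$1"
    # Field 4 is the local address (for example ":::9100" or "0.0.0.0:9100").
    # Match only addresses that end in ":<port>", so 19100 does not match 9100.
    awk -v port="$port" '$4 ~ (":" port "$") { print }'
}

# Example use on a node (run as root so process names are visible):
#   netstat -ltnp | listeners_on_port 9100
```

Any output means something is already bound to the port, and the edpm_node_exporter container would fail to start with the "address already in use" error shown above.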
Because Ceph and OpenStack are no longer managed by the same lifecycle tooling, it is easy to end up in this situation with OSP18 and an HCI configuration.
Workaround:
Either remove the node-exporter service from Ceph:
[root@edpm-compute-0 ~]# ceph orch rm node-exporter
or deploy Ceph with the monitoring stack disabled:
--skip-monitoring-stack
Neither option appears to be documented. It is also unclear to me what the right solution is when both monitoring systems are required.