  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-10800

Autoscaling doesn't get configured to use TLS for reaching Prometheus


    • Bug
    • Resolution: Unresolved
    • Normal
    • rhos-18.0 Feature Release 1 (Nov 2024)
    • telemetry-operator
    • CloudOps 2024 Sprint 24
    • Low

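      Autoscaling is not configured to reach Prometheus over TLS when the autoscaling resource is created before the metricstorage resource. The autoscaling configuration is not updated when metric storage is enabled later, so aodh keeps sending plain HTTP requests to a Prometheus endpoint that expects HTTPS.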

      Steps to reproduce:

      1. Disable autoscaling and metric storage
      2. Enable autoscaling
      3. Wait for autoscaling to fully start
      4. Enable metric storage
      5. Try creating an alarm

      There should be an error in the aodh-evaluator logs similar to the following:

      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator [-] Failed to evaluate alarm 89b8de0b-8ce3-40e4-8cb6-d676209ba5e2: observabilityclient.prometheus_client.PrometheusAPIClientError: [400] Client sent an HTTP request to an HTTPS server.
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator Traceback (most recent call last):
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/aodh/evaluator/__init__.py", line 297, in _evaluate_alarm
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     self.evaluators[alarm.type].obj.evaluate(alarm)
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/aodh/evaluator/threshold.py", line 174, in evaluate
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     evaluation = self.evaluate_rule(alarm.rule)
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/aodh/evaluator/prometheus.py", line 64, in evaluate_rule
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     metrics = self._get_metric_data(alarm_rule['query'])
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/aodh/evaluator/prometheus.py", line 47, in _get_metric_data
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     return self._prom.query.query(query, disable_rbac=self._no_rbac)
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/observabilityclient/v1/python_api.py", line 66, in query
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     query = self.client.rbac.enrich_query(query, disable_rbac=disable_rbac)
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/observabilityclient/v1/rbac.py", line 103, in enrich_query
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     metric_names = self.client.query.list(disable_rbac=False)
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/observabilityclient/v1/python_api.py", line 31, in list
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     metrics = self.prom.series(match)
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/observabilityclient/prometheus_client.py", line 118, in series
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     decoded = self._get("series", {"match[]": matches})
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator   File "/usr/lib/python3.9/site-packages/observabilityclient/prometheus_client.py", line 77, in _get
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator     raise PrometheusAPIClientError(resp)
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator observabilityclient.prometheus_client.PrometheusAPIClientError: [400] Client sent an HTTP request to an HTTPS server.
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator 
      2024-10-18 19:58:56.947 16 ERROR aodh.evaluator 
      2024-10-18 19:58:56.951 16 DEBUG aodh.evaluator [-] Evaluating alarm 6ca7258f-b9c3-4351-9c71-f1690f11b549 _evaluate_alarm /usr/lib/python3.9/site-packages/aodh/evaluator/__init__.py:295 
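
To confirm which alarms are affected, the log can be scanned for the error shown above. The following is a minimal sketch, not part of aodh or the telemetry-operator: the function name `alarms_hit_by_tls_mismatch` is hypothetical, and the regular expression assumes log lines shaped like the sample in this report.

```python
import re

# Message quoted from the traceback above; the regex layout is an assumption
# based on the sample log lines, not a documented aodh log format.
TLS_MISMATCH = "Client sent an HTTP request to an HTTPS server"
FAILED_ALARM = re.compile(
    r"ERROR aodh\.evaluator \[-\] Failed to evaluate alarm "
    r"(?P<alarm_id>[0-9a-f-]{36}): (?P<reason>.*)$"
)


def alarms_hit_by_tls_mismatch(log_lines):
    """Return IDs of alarms whose evaluation failed with the TLS mismatch error."""
    alarm_ids = []
    for line in log_lines:
        match = FAILED_ALARM.search(line.rstrip())
        if match and TLS_MISMATCH in match.group("reason"):
            alarm_id = match.group("alarm_id")
            if alarm_id not in alarm_ids:  # keep first-seen order, no duplicates
                alarm_ids.append(alarm_id)
    return alarm_ids


sample = [
    "2024-10-18 19:58:56.947 16 ERROR aodh.evaluator [-] Failed to evaluate alarm "
    "89b8de0b-8ce3-40e4-8cb6-d676209ba5e2: "
    "observabilityclient.prometheus_client.PrometheusAPIClientError: "
    "[400] Client sent an HTTP request to an HTTPS server.",
    "2024-10-18 19:58:56.951 16 DEBUG aodh.evaluator [-] Evaluating alarm "
    "6ca7258f-b9c3-4351-9c71-f1690f11b549 _evaluate_alarm",
]
print(alarms_hit_by_tls_mismatch(sample))
```

Only the "Failed to evaluate alarm" line is matched. The traceback lines that follow it repeat the same message and are ignored, so each alarm is reported once.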

      Workaround:

      Delete the autoscaling resource with `oc delete autoscaling autoscaling`. It is recreated automatically with the correct configuration.

       

      Proposed fix:

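      The telemetry-operator's autoscaling controller should watch the metricstorage resource, so that enabling or changing metric storage triggers a reconcile of autoscaling and its Prometheus connection settings are regenerated.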
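
The ordering problem and the effect of the proposed watch can be illustrated with a toy model. This is a sketch under stated assumptions, not the operator's code: the `Cluster` class and all function names are hypothetical, and it assumes the scheme is derived from the metric storage state at reconcile time and falls back to plain HTTP when metric storage is absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Toy stand-in for the cluster state; every name here is hypothetical."""
    metric_storage_tls: Optional[bool] = None  # None: metric storage does not exist
    autoscaling_exists: bool = False
    rendered_scheme: Optional[str] = None      # scheme baked into the aodh config


def reconcile_autoscaling(cluster):
    """Render the Prometheus scheme from whatever metric storage looks like now."""
    if cluster.autoscaling_exists:
        cluster.rendered_scheme = "https" if cluster.metric_storage_tls else "http"


def enable_autoscaling(cluster):
    cluster.autoscaling_exists = True
    reconcile_autoscaling(cluster)  # creating the resource triggers a reconcile


def enable_metric_storage(cluster, tls, autoscaling_watches_metric_storage):
    cluster.metric_storage_tls = tls
    if autoscaling_watches_metric_storage:
        reconcile_autoscaling(cluster)  # proposed fix: the watch re-triggers reconcile


def recreate_autoscaling(cluster):
    """Workaround from the report: delete the resource and let it be recreated."""
    cluster.autoscaling_exists = False
    cluster.rendered_scheme = None
    enable_autoscaling(cluster)


def run(watch):
    """Replay the reproduction steps: autoscaling first, metric storage second."""
    cluster = Cluster()
    enable_autoscaling(cluster)
    enable_metric_storage(cluster, tls=True, autoscaling_watches_metric_storage=watch)
    return cluster


print(run(watch=False).rendered_scheme, run(watch=True).rendered_scheme)
```

Without the watch, the model keeps the stale `http` scheme until `recreate_autoscaling` is called, which mirrors the workaround. With the watch, the scheme is updated as soon as metric storage appears.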

            rh-ee-jwysogla Jaromir Wysoglad
            rhos-dfg-cloudops
