  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-1052

Detect whether the Services are down

    • detect downtime
    • False
    • None
    • False
    • OBSDA-574 - Container health feature
    • Committed
    • Committed
    • To Do
    • OBSDA-574 - Container health feature
    • Committed
    • Committed
    • 0% To Do, 0% In Progress, 100% Done
    • .Improved metrics for RHOSO Observability

      You can now use new metrics for monitoring the health of RHOSO services, including the following:

      * `kube_pod_status_phase`
      * `kube_pod_status_ready`
      * `node_systemd_unit_state`
      * `podman_container_state`
      * `podman_container_health`

      You can use the `kube_pod_status_phase` and `kube_pod_status_ready` metrics to monitor control plane services.

      * `kube_pod_status_phase` - The relevant parameter is `Phase`, with values of Pending, Running, Succeeded, Failed, or Unknown. Each phase is reported with a Boolean value: `1` if the pod is currently in that phase, and `0` if it is not.

      * `kube_pod_status_ready` - This metric also has Boolean values, with `1` indicating that all of the pod's containers are running and the readiness probes are succeeding, and `0` indicating that not all of the containers are running or that a readiness probe did not succeed.
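As a rough illustration of how these two metrics could be combined, the following Python sketch takes samples that have already been fetched from the monitoring stack and lists the pods that look down. The sample layout (a dict with `labels` and `value`), the function name `pods_down`, and the label names `namespace`, `pod`, `phase`, and `condition` are assumptions based on how kube-state-metrics commonly exposes these series; check them against your own deployment.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sample shape: {"labels": {...}, "value": 0 or 1}
Sample = Dict[str, Any]


def pods_down(phase_samples: List[Sample],
              ready_samples: List[Sample]) -> List[Tuple[str, str]]:
    """Return sorted (namespace, pod) pairs that are not Running or not ready."""
    # The series with value 1 identifies the phase each pod is currently in.
    current_phase: Dict[Tuple[str, str], str] = {}
    for sample in phase_samples:
        if sample["value"] == 1:
            labels = sample["labels"]
            current_phase[(labels["namespace"], labels["pod"])] = labels["phase"]

    # Pending, Failed, and Unknown are treated as "down" in this sketch.
    down = {pod for pod, phase in current_phase.items()
            if phase in ("Pending", "Failed", "Unknown")}

    # A ready value of 0 also counts as down, except for pods that completed.
    for sample in ready_samples:
        labels = sample["labels"]
        pod = (labels["namespace"], labels["pod"])
        is_ready_series = labels.get("condition", "true") == "true"
        if is_ready_series and sample["value"] == 0 \
                and current_phase.get(pod) != "Succeeded":
            down.add(pod)

    return sorted(down)
```

Treating Succeeded pods as healthy is a design choice for this sketch: completed jobs are never ready, so counting them would produce false alarms.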

      You can use the `node_systemd_unit_state` metric to monitor the running state of data plane services.

      * `node_systemd_unit_state` - The relevant parameter is `State`, with values of activating, active, deactivating, failed, or inactive. Each state is reported with a Boolean value: `1` if the unit is currently in that state, and `0` if it is not.
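A minimal sketch of how this metric might be read, assuming the samples have already been fetched and that each one carries `name` (the unit) and `state` labels, as the node_exporter systemd collector commonly provides. The function name `units_not_active` and the sample layout are hypothetical.

```python
from typing import Any, Dict, List


def units_not_active(samples: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each systemd unit that is not active to its current state."""
    current_state: Dict[str, str] = {}
    for sample in samples:
        # Only the series with value 1 reflects the unit's current state.
        if sample["value"] == 1:
            labels = sample["labels"]
            current_state[labels["name"]] = labels["state"]

    return {unit: state for unit, state in current_state.items()
            if state != "active"}
```

A unit reported as `failed` or `inactive` would appear in the result, while a unit reported as `active` would not.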

      You can use the `podman_container_state` and `podman_container_health` metrics to monitor the health of data plane containerized services.
       
      * `podman_container_state` - This metric can have the following values: -1=unknown, 0=created, 1=initialized, 2=running, 3=stopped, 4=paused, 5=exited, 6=removing, 7=stopping.

      * `podman_container_health` - This metric can have the following values: -1=unknown, 0=healthy, 1=unhealthy, 2=starting.
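The numeric values listed above can be decoded with a small lookup. The following Python sketch uses the value tables from this note; the function names and the rule for what counts as needing attention (not running, or explicitly unhealthy) are assumptions made for illustration, not part of the feature.

```python
# Value tables taken from the metric descriptions in this note.
CONTAINER_STATES = {
    -1: "unknown", 0: "created", 1: "initialized", 2: "running",
    3: "stopped", 4: "paused", 5: "exited", 6: "removing", 7: "stopping",
}
CONTAINER_HEALTH = {-1: "unknown", 0: "healthy", 1: "unhealthy", 2: "starting"}


def describe_container(state_value: float, health_value: float) -> str:
    """Return a readable 'state/health' string for a pair of metric values."""
    state = CONTAINER_STATES.get(int(state_value), "unknown")
    health = CONTAINER_HEALTH.get(int(health_value), "unknown")
    return f"{state}/{health}"


def container_needs_attention(state_value: float, health_value: float) -> bool:
    """Hypothetical policy: flag containers that are not running or are unhealthy."""
    return int(state_value) != 2 or int(health_value) == 1
```

In this sketch a health value of `-1` (unknown) on a running container is not flagged, on the assumption that it may simply mean no health check is defined.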




    • Feature
    • Done
    • Proposed

      Sensubility used to do this in pre-18 environments, but it has been removed from the OSP18 release, so we can no longer rely on it.

      We need to detect when a Service is not responding and generate a metric for it.

      The problem has two sides with very different characteristics, so we will most likely need two separate solutions:

      • Control Plane: Try to use OpenShift/Kubernetes features to achieve this
      • Compute nodes: A dedicated exporter might be needed

              mmagr@redhat.com Martin Magr
              rhn-engineering-jlarriba Juan Larriba
              rhos-dfg-cloudops