[LOG-1098] Ensure Critical alerts have complete playbook entries

Type: Task
Resolution: Done
Priority: Major
Fix Version/s: Logging 5.1
Affects Version/s: None
Component/s: Log Storage
Labels:
None

Story Points:
3
Blocked:
False
Ready:
False
Epic Link:
[ES] Deep Insights
Docs QE Status:
NEW
QE Status:
NEW
Release Note Text:
Undefined
Market:

Sprint:
Logging (LogExp) - Sprint 200

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Provide copy/paste commands to issue.

Provide easy-to-follow checks and next steps (imagine being woken in the middle of the night to resolve the alert).

Current upstream doc is here: https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md

(Currently we are focusing on Critical in this task to make this more bite-sized – we will have follow up for warning and info as well)

Acceptance Criteria:

Ensure that every alert marked as critical has proper diagnostic steps and action steps so that a user can resolve the alert.
Possibly also ensure that our alerts use the runbook_url variable to point to our playbook location (TBD whether this will be upstream or downstream).
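
As a rough illustration of the second criterion, the check could be sketched in Python as below. This is only a sketch under assumptions: it assumes the rules follow the standard Prometheus rule-file layout (`groups` containing `rules` with `alert`, `labels`, and `annotations`), that the playbook link lives in a `runbook_url` annotation, and that severity is carried in a `severity` label. The function name and the sample alert names and URL are hypothetical and not taken from the operator.

```python
from typing import Any, Dict, List


def critical_alerts_missing_runbook(rule_file: Dict[str, Any]) -> List[str]:
    """List critical alerts that have no non-empty runbook_url annotation.

    `rule_file` is an already-parsed Prometheus rule file (a plain dict).
    """
    missing = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            name = rule.get("alert")
            if name is None:
                continue  # recording rules have no "alert" key
            severity = (rule.get("labels") or {}).get("severity")
            runbook = (rule.get("annotations") or {}).get("runbook_url")
            if severity == "critical" and not runbook:
                missing.append(name)
    return missing


# Hypothetical sample data, for illustration only.
SAMPLE_RULES = {
    "groups": [
        {
            "name": "example.rules",
            "rules": [
                {
                    "alert": "ExampleCriticalWithRunbook",
                    "labels": {"severity": "critical"},
                    "annotations": {"runbook_url": "https://example.com/alerts.md"},
                },
                {
                    "alert": "ExampleCriticalWithoutRunbook",
                    "labels": {"severity": "critical"},
                    "annotations": {"summary": "no link yet"},
                },
                {
                    "alert": "ExampleWarning",
                    "labels": {"severity": "warning"},
                },
                {"record": "example:recording:rule", "expr": "vector(1)"},
            ],
        }
    ]
}

if __name__ == "__main__":
    for alert_name in critical_alerts_missing_runbook(SAMPLE_RULES):
        print(alert_name)
```

Warning and info alerts are deliberately ignored here, matching the Critical-only scope of this task.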

is documented by

RHDEVDOCS-2791 Document: Ensure Critical alerts have complete playbook entries

Closed

links to

openshift/elasticsearch-operator#673: [WIP] Log 1098 - Playbook for Critical Alerts

openshift/elasticsearch-operator#701: Critical alerts playbook fixes

Errata Tool added a comment - 2021/07/21 2:59 PM

This issue has been addressed in the following products:

OpenShift Logging 5.1

Via RHBA-2021:2112 https://access.redhat.com/errata/RHBA-2021:2112

Qiaoling Tang added a comment - 2021/05/06 2:48 AM

The new changes lgtm.

Qiaoling Tang added a comment - 2021/04/19 7:46 AM

Hi sasagarw@redhat.com,

I did some testing based on the changes in https://github.com/openshift/elasticsearch-operator/pull/673/files and found some issues with lowering the number of replicas:

1. Executing `oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name>/_settings?pretty -X PUT -d '{"index.number_of_replicas":<number_of_replicas>}'` does not take effect; the EO always rewrites the setting.

2. When the `Elasticsearch Node Disk Flood Watermark Reached` alert fires, executing `oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name>/_settings?pretty -X PUT -d '{"index.number_of_replicas":<number_of_replicas>}'` returns the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}
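
A possible way to triage a response like the one above, sketched in Python and not taken from the playbook or the PR: detect the read-only block in the response, then build (without running) an `es_util` command that resets it. The assumptions here are that `FORBIDDEN/12` corresponds to the Elasticsearch index setting `index.blocks.read_only_allow_delete`, that setting it to `null` is an acceptable step once disk space has been freed, and that the EO does not reapply it. The helper names are hypothetical, and the command shape simply mirrors the one quoted in this comment.

```python
import json
from typing import List

# Reason fragment reported for the flood-stage read-only block.
READ_ONLY_BLOCK = "FORBIDDEN/12/index read-only / allow delete (api)"


def is_read_only_block(response_text: str) -> bool:
    """Return True if the response body reports the read-only/allow-delete block."""
    try:
        body = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(body, dict):
        return False
    error = body.get("error")
    if not isinstance(error, dict):
        return False
    return (
        body.get("status") == 403
        and error.get("type") == "cluster_block_exception"
        and READ_ONLY_BLOCK in error.get("reason", "")
    )


def clear_block_command(pod: str, index: str,
                        namespace: str = "openshift-logging") -> List[str]:
    """Build the argv for an `oc exec ... es_util` call that resets the block.

    The command is only constructed here, never executed.
    """
    payload = json.dumps({"index.blocks.read_only_allow_delete": None})
    return [
        "oc", "exec", "-n", namespace, "-c", "elasticsearch", pod, "--",
        "es_util", f"--query={index}/_settings?pretty",
        "-X", "PUT", "-d", payload,
    ]
```

Returning an argument list rather than a shell string sidesteps quoting problems with the JSON payload; it can be passed to `subprocess.run` as-is if the command is actually meant to be executed.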

Eric Wolinetz (Inactive) added a comment - 2021/03/10 10:36 AM

rdlugyhe that sounds good to me. I think we should also have a line item specifically for sasagarw@redhat.com: open a PR against our EO repo to update the current alert guide with his content (it can be based on your edits). I think it would be useful to have it in both an upstream and a downstream location.

Rolfe Dlugy-Hegwer added a comment - 2021/03/10 9:54 AM - edited

sasagarw@redhat.com ewolinet@redhat.com: How do these next steps look to you?

1. Sashank composes a first draft of the diagnostic and troubleshooting steps in the Google doc.
2. Rolfe edits the content.
3. QE verifies the content.
4. Rolfe converts and publishes the content as topics in the OpenShift docs.

Does this meet the needs of the design you are working on?

Sashank Agarwal (Inactive) added a comment - 2021/03/08 10:12 PM

rdlugyhe how should we go about this? Can you take a look at it and let me know? Thanks.

Sashank Agarwal (Inactive) added a comment - 2021/03/07 10:18 PM

Link to the document:

https://docs.google.com/document/d/1EJjqixIxEPLf5pWQcqWqyYfRgvsDktSzBDCc6JcxrYU/edit?usp=sharing

Assignee: Sashank Agarwal (Inactive)

Reporter: Eric Wolinetz (Inactive)

QA Contact: Qiaoling Tang

Votes: 0

Watchers: 5

Created: 2021/02/10 4:23 PM

Updated: 2022/03/16 3:28 PM

Resolved: 2021/04/14 2:34 AM
