Type: Task
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- kafka-integrations-apac-refinement-done
- kafka-integrations-europe-refinement-done

Epic Link:
MGDSTRM-10487
Blocked:
False
Blocked Reason:
None
Ready:
False
Discussed with Team:
No
Git Pull Request:
https://github.com/bf2fc6cc711aee1a0c2a/observability-resources-mk/pull/268, https://github.com/bf2fc6cc711aee1a0c2a/kas-sre-sops/pull/431
[QE] How to address?:
---
[QE] Why QE missed?:
---
Intelligence Requested:
Market:

Sprint:
MK - Sprint 233

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

WHAT

If openshift-router decides that it cannot reconfigure haproxy dynamically, it falls back to restarting the haproxy process. This drops all existing customer connections. We have configured the openshift-routers so that haproxy restarts should be rare. Let's add an alert to tell us if this is ever not the case.

WHY

haproxy config reloads disconnect established Kafka connections, which is disruptive to customer applications.

HOW

It appears that template_router_reload_seconds_count might be what we are looking for.
This is the count of the number of observations, so it appears to correspond to the number of reloads.

increase(template_router_reload_seconds_count{job=~".*kas.*"}[5m]) > 0

We'd need to make sure we understand the behaviour of the counter:

- How does it behave on first startup?
- How does it increment after that?

Turning off the dynamicConfigManager will help us learn that.

We should then be able to add a new alert.
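
A minimal sketch of what such an alert could look like as a Prometheus rule group, built around the query above. The group name, alert name, severity and annotation text are hypothetical placeholders, not what was merged. Note that increase() needs at least two samples in the window, so a counter that first appears already at a non-zero value would not register as an increase on its own; this is one reason the startup behaviour needs checking.

```yaml
groups:
  - name: kas-ingress-haproxy            # hypothetical group name
    rules:
      - alert: KasIngressHAProxyReload   # hypothetical alert name
        # Fires when the reload observation count has grown in the last 5 minutes.
        expr: increase(template_router_reload_seconds_count{job=~".*kas.*"}[5m]) > 0
        labels:
          severity: warning              # assumed severity
        annotations:
          summary: "haproxy was reloaded on a kas ingress router"
          description: "Reload count increased for job {{ $labels.job }}; established Kafka connections may have been dropped."
```

No `for:` clause is set, so a single reload within the window is enough to fire. Whether that is too sensitive depends on the counter behaviour still to be confirmed.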

We also need a SOP. If the alert fires, restarting the ingress pods might help. If that fails, SRE should escalate to Engineering.

DONE

Alert/unit test
SOP written.

relates to

MGDSTRM-9181 Ingress disconnects established connections whenever kafka instances are provisioned/deprovisioned

Closed

mentioned on

Merge request - [production] RHOSAK - bump observability tag to v1.50.0

Merge request - [rhosak] [stage] bump observability to v1.50.0

Assignee: Keith Wall

Reporter: Keith Wall

Team: Kafka Integrations

Votes: 0

Watchers: 4

Created: 2023/01/19 4:15 PM

Updated: 2023/03/15 4:51 PM

Resolved: 2023/03/15 4:51 PM
