Issue type: Story
Resolution: Done
Priority: Major
In https://issues.redhat.com/browse/OCPBUGS-13543 we think we may have found a pattern that explains our long-standing SLB workload disruption of around 8s per run. This was discovered by accident when promtail pods in some clusters had a shutdown problem that slowed down the node rebooting and going NotReady.
To test this theory we want to:
- deploy a DaemonSet in origin, prior to running the upgrade suite, that runs a process delaying shutdown by ~10s on all workers and masters.
- get this merged assuming it proves safe.
- open multiple PRs which increase this 10s to 1m, 2m, 3m, and 5m, and run /payload against each. Technically periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade would probably be enough, but we might want to examine other clouds, though AWS is presently the most visible.
- look to see whether any of the delays affects disruption to the SLB backend.
If we find that at some threshold we consistently see better disruption results, we have a case for improving node shutdown.
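A minimal sketch of the kind of DaemonSet described above. All names, the namespace, and the image choice are illustrative assumptions, not the actual origin implementation; the idea is simply a per-node pod that traps SIGTERM and sleeps before exiting, so pod termination (and therefore node shutdown) is delayed by roughly the sleep duration:

```yaml
# Hypothetical shutdown-delay DaemonSet (names and namespace are made up).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: shutdown-delayer
  namespace: e2e-shutdown-delay
spec:
  selector:
    matchLabels:
      app: shutdown-delayer
  template:
    metadata:
      labels:
        app: shutdown-delayer
    spec:
      tolerations:
      - operator: Exists           # schedule on masters as well as workers
      terminationGracePeriodSeconds: 30   # must exceed the sleep below
      containers:
      - name: delay
        image: registry.access.redhat.com/ubi8/ubi-minimal
        command:
        - /bin/sh
        - -c
        # On SIGTERM, sleep ~10s before exiting; bump the sleep to 1m/2m/etc.
        # in the follow-up PRs to vary the delay.
        - |
          trap 'sleep 10; exit 0' TERM
          while true; do sleep 1; done
```

The delay value is the only knob the follow-up PRs would need to change when testing the 1m through 5m variants.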