Bug
Resolution: Cannot Reproduce
Priority: Major
Affects Version: 4.13
Category: Quality / Stability / Reliability
Status: Rejected
Description of problem:
Per the discussion on Slack, the 4.12 to 4.13 upgrade job has been failing frequently. The failures appear to be caused by a node going NotReady at an unexpected time, and the logs show failed calls to api-int, which suggests a problem with the internal load balancer.
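As a quick way to confirm whether the internal load balancer path is healthy from an affected node, something like the following can be run. This is a sketch, not part of the original report; the cluster domain is a placeholder, and /readyz is the apiserver's standard readiness endpoint:

  # Hypothetical spot check from an affected node. On on-prem platforms,
  # api-int resolves to the internal VIP fronted by the internal load balancer.
  # A timeout or connection refused here points at the load balancer path
  # rather than the apiserver itself.
  curl -k --max-time 5 https://api-int.<cluster-name>.<base-domain>:6443/readyz

  # Watching node readiness alongside the upgrade shows when a node
  # flips NotReady relative to its reboot window.
  oc get nodes -w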
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Frequently in the 4.12 to 4.13 upgrade CI job
Steps to Reproduce:
1. Run the 4.12 to 4.13 upgrade job.
2. Watch for a node going NotReady before its upgrade reboot.
3. Check the node logs for failed calls to api-int.
Actual results:
A node goes NotReady at an unexpected time, before it reboots for the upgrade
Expected results:
A node goes NotReady only when it reboots for the upgrade
Additional info:
This is difficult to debug because the on-prem service logs are not persisted across reboots, and the spurious NotReady state happens before the node reboots, so the relevant logs are lost by the time we can inspect the node. The first step in fixing this will be to fix the logging so we have usable data from before the reboot.
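As a minimal sketch of one way to do that: if the on-prem services log to the journal, forcing journald onto persistent storage keeps their logs across reboots. The drop-in path below is an assumption for illustration, not the actual OPNET-194 implementation:

  # Hypothetical sketch: persist the journal across reboots on a node.
  # The presence of /var/log/journal moves journald off volatile /run storage;
  # Storage=persistent makes that explicit.
  mkdir -p /var/log/journal
  cat >/etc/systemd/journald.conf.d/10-persistent.conf <<'EOF'
  [Journal]
  Storage=persistent
  EOF
  systemctl restart systemd-journald

With that in place, journalctl -b -1 -u <service> would show a given service's logs from the previous boot after the node comes back up.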
- depends on: OPNET-194 Persist service logs over reboot (To Do)
- links to