OpenShift Bugs / OCPBUGS-33562

Cluster API is not accessible after all nodes are stopped and restarted during chaos testing

    Priority: Critical
      Description of problem:

      There are various use cases where customers prefer to turn their clusters off and on depending on load, or to be cost-effective when they run multiple clusters.

      During the chaos testing (https://github.com/redhat-chaos/krkn-hub/blob/main/docs/power-outages.md), the cluster API is not accessible after the nodes are stopped and started, irrespective of the cluster installation or shutdown time frame. This is a regression that we are able to reproduce multiple times on 4.16; the problem does not exist in previous releases (tested on 4.14 and 4.15).

       

      The issue might be caused by the nodes not registering properly after the restart. Logs are not accessible because the API is down. We will try to add a node with a public IP to the cluster using a custom MachineSet, so that we can SSH in and look at the logs.
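      For reference, the debug MachineSet could look roughly like the fragment below. This is a sketch, not a tested manifest: `<infra-id>` is a placeholder for the cluster's infrastructure ID, and every field not shown (instance type, AMI, subnet, security groups, the cluster label, and the rest of the providerSpec) would be copied from an existing worker MachineSet. The relevant knob is the AWS provider's `publicIp` field.

      ```yaml
      # Hypothetical debug MachineSet fragment (AWS); <infra-id> is a placeholder.
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      metadata:
        name: <infra-id>-debug-a
        namespace: openshift-machine-api
      spec:
        replicas: 1
        selector:
          matchLabels:
            machine.openshift.io/cluster-api-machineset: <infra-id>-debug-a
        template:
          metadata:
            labels:
              machine.openshift.io/cluster-api-machineset: <infra-id>-debug-a
          spec:
            providerSpec:
              value:
                # publicIp: true asks the AWS machine provider to attach a
                # public address, so the node stays reachable over SSH even
                # while the cluster API is down.
                publicIp: true
      ```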

       

      Version-Release number of selected component (if applicable):

      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-ec.6   True        False         7h27m   Cluster version is 4.16.0-ec.6

      How reproducible:

      Always

      Steps to Reproduce:

          1. Install a 4.16 cluster on the AWS cloud provider (IPI) using one of the nightly builds or dev-preview releases: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/
       
          2. Run the power-outage chaos test using the following commands after setting up an AWS profile for aws-cli to access the AWS APIs; alternatively, log in to the console and turn off the nodes manually.
             $ export SHUTDOWN_DURATION=60 
             $ export CHECK_CRITICAL_ALERTS=True
             $ podman run --name=outage --net=host --env-host=true -v /root/.kube/config:/root/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:power-outages
             $ podman logs -f outage
      
          3. Try to access the cluster after the nodes are back online at the end of the scenario, e.g. $ oc get nodes or any other command
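
      Step 3 amounts to polling the API until the nodes are back. A small helper along these lines can automate the check (a sketch; `wait_for_api` and `oc_probe` are hypothetical names, not part of krkn):

      ```python
      import subprocess
      import time

      def wait_for_api(probe, timeout=600.0, interval=5.0):
          """Poll `probe` (a zero-arg callable that returns True once the API
          answers) until it succeeds or `timeout` seconds elapse.
          Returns True on success, False on timeout."""
          deadline = time.monotonic() + timeout
          while time.monotonic() < deadline:
              if probe():
                  return True
              time.sleep(interval)
          return False

      def oc_probe():
          """One probe attempt: `oc get --raw /readyz` exits 0 when the
          kube-apiserver is serving."""
          return subprocess.run(
              ["oc", "get", "--raw", "/readyz"],
              capture_output=True,
          ).returncode == 0
      ```

      On a healthy cluster `wait_for_api(oc_probe)` returns True within a few minutes of the nodes coming back; on the affected 4.16 clusters it times out.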
          

      Actual results:

        Cluster is not accessible - The connection to the server api.ravicluster.aws.rhperfscale.org:6443 was refused - did you specify the right host or port?
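
      Note that "connection refused" means the TCP handshake reached a host with nothing listening on 6443 (kube-apiserver or its load balancer target down), as opposed to a timeout, which would suggest the node itself is unreachable. A minimal probe (a hypothetical helper, not part of the reproducer) can tell the cases apart:

      ```python
      import socket

      def probe_api(host, port=6443, timeout=3.0):
          """Classify why an API endpoint is unreachable: 'open' (listening),
          'refused' (host reachable, nothing listening on the port), or
          'timeout' (host down, or traffic filtered/unroutable)."""
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return "open"
          except ConnectionRefusedError:
              return "refused"
          except (TimeoutError, OSError):
              return "timeout"
      ```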

      Expected results:

       Cluster APIs are accessible and healthy. 

      Additional info:

          

              Vadim Rutkovsky (vrutkovs@redhat.com)
              Naga Ravi Chaitanya Elluri (nelluri)
              Ke Wang