Type: Bug
Resolution: Cannot Reproduce
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.12
Component/s: Networking / ovn-kubernetes
Labels:

Regression:
None
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:
None
Internal Whiteboard:
Latest Status Summary:

11/30: changed Telco rank/bucket to 2 since it's not clear how soon or how frequently Telco customers will need to do this
11/29: added to the Telco-Grade OCP 4.12 gating list, appears to be a regression with the master node replacement procedure

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

A replaced bare metal master node cannot reach the API server. Requests fail with: Get "https://[fd02::1]:443/api/v1/namespaces/ dial tcp [fd02::1]:443: i/o timeout

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2022-11-05-000615

How reproducible:

Reproduced once

Steps to Reproduce:

1. Deploy a bare metal cluster with 3 master nodes.

2. Replace one of the master nodes by following the steps in https://access.redhat.com/documentation/en-us/openshift_container_platform/4.11/html-single/backup_and_restore/index#restore-replace-stopped-baremetal-etcd-member_replacing-unhealthy-etcd-member

3. After the bare metal host is provisioned and the node is registered, check the etcd pods running on the new node:

oc -n openshift-etcd logs installer-14-retry-5-openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com | tail -5

Actual results:

Calls to https://[fd02::1]:443/ are timing out

oc -n openshift-etcd logs installer-14-retry-5-openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com | tail -5
I1108 11:02:41.018359       1 cmd.go:423] Waiting for installer revisions to settle for node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com
W1108 11:02:55.019114       1 cmd.go:467] Error getting installer pods on current node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-etcd/pods?labelSelector=app%3Dinstaller": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W1108 11:03:19.023778       1 cmd.go:467] Error getting installer pods on current node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-etcd/pods?labelSelector=app%3Dinstaller": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W1108 11:03:39.023304       1 cmd.go:467] Error getting installer pods on current node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-etcd/pods?labelSelector=app%3Dinstaller": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
F1108 11:03:44.689025       1 cmd.go:106] timed out waiting for the condition
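
For triage, the pasted log can be summarized with a short script. The following is only a sketch, not part of any OpenShift tooling: it assumes the standard klog line header (severity letter, MMDD, time, thread id, file:line), and the names `KLOG_LINE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# klog header: severity letter, MMDD, time with microseconds, thread id,
# source file:line, then "] " followed by the message.
KLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Count log entries per severity and count client timeouts per API host."""
    severities = Counter()
    timed_out_hosts = Counter()
    for line in lines:
        m = KLOG_LINE.match(line)
        if not m:
            continue  # skip shell prompts, blank lines, and other non-klog text
        severities[m["sev"]] += 1
        if "Client.Timeout exceeded" in m["msg"]:
            # Pull the host:port out of the quoted URL in the error message.
            url = re.search(r'Get "https://([^/"]+)', m["msg"])
            if url:
                timed_out_hosts[url.group(1)] += 1
    return severities, timed_out_hosts
```

Applied to the five log lines above, this counts one I, three W, and one F entry, with all three client timeouts against `[fd02::1]:443`.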

Expected results:

Pods can successfully reach the API server.
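
One way to check this independently of the installer pod is a plain TCP dial to the address from the error message. This is a minimal sketch, assuming a Python interpreter is available wherever it is run (for example in a debug container); `probe_tcp` is a hypothetical helper name, and it only tests TCP reachability, not TLS or authentication.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Return "ok" if a TCP connection is established, else a short error label."""
    try:
        # create_connection resolves the host and handles IPv4 and IPv6 literals.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer within the timeout, comparable to the "i/o timeout" in this report.
        return "timeout"
    except OSError as exc:
        # Refused, unreachable, or other socket-level failure.
        return f"error: {exc}"


# Example call, using the address and port from the error message in this report:
#   probe_tcp("fd02::1", 443)
```

A "timeout" result from the replaced node combined with "ok" from the other masters would help narrow the problem to the new node.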

Additional info:

Running must-gather times out with `dial tcp [fd02::1]:443: i/o timeout`, so I am attaching the pod logs captured from /var/log/pods on the nodes.

Assignee: Nadia Pinaeva

Reporter: Marius Cornea

QA Contact: Anurag Saxena

Votes: 0

Watchers: 3

Created: 2022/11/08 11:21 AM

Updated: 2023/01/31 1:11 PM

Resolved: 2022/12/01 4:54 PM
