Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.14.z
Affects Version/s: 4.12.z
Component/s: Networking / ovn-kubernetes
Labels:
- TestBlocker
- release-blocker

Severity:
Important
Regression:
Yes
Sprint:
SDN Sprint 243, SDN Sprint 244, SDN Sprint 245, SDN Sprint 246
sprint_count:
4
Release Blocker:
Rejected
Blocked:
True
Blocked Reason:

All GCP SNO installations would fail because of this issue.
Release Note Text:
N/A
Release Note Type:
Release Note Not Required
Target Version:

4.14.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

I tried upgrading a 4.14 SNO cluster from one nightly image to another and, while on AWS the upgrade works fine, it fails on GCP.

The Cluster Network Operator successfully upgrades ovn-kubernetes but then gets stuck on the cloud network config controller (CNCC), which is in CrashLoopBackOff because it receives the wrong IP address from the name server when trying to reach the API server. The node IP is 10.0.0.3, but the name server returns 10.0.0.2. I suspect that is the bootstrap node IP, but that is only a guess.

Some relevant logs:

$ oc get co network
network                                    4.14.0-0.nightly-2023-08-15-200133   True        True          False      86m     Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)

$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                                     READY   STATUS    RESTARTS       AGE   IP         NODE                                 NOMINATED NODE   READINESS GATES
ovnkube-control-plane-844c8f76fb-q4tvp   2/2     Running   3              24m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>
ovnkube-node-24kb7                       10/10   Running   12 (13m ago)   25m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>

$ oc get pods -n openshift-cloud-network-config-controller -o wide
openshift-cloud-network-config-controller          cloud-network-config-controller-d65ccbc5b-dnt69               0/1     CrashLoopBackOff   15 (2m37s ago)   40m    10.128.0.141   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>

$ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-d65ccbc5b-dnt69
W0816 11:06:00.666825       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
F0816 11:06:30.673952       1 main.go:345] Error building controller runtime client: Get "https://api-int.ci-ln-rij2p1b-72292.gcp-2.ci.openshift.org:6443/api?timeout=32s": dial tcp 10.0.0.2:6443: i/o timeout

I also get 10.0.0.2 if I run a DNS query from the node itself or from a pod:

$ dig api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org
...
;; ANSWER SECTION:
api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org. 60 IN A 10.0.0.2
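
The comparison described above (the address returned for api-int versus the node's own IP) can be scripted. The following is a minimal sketch, not part of the original report: the function names `compare_api_int` and `check_api_int` are hypothetical, and the live check assumes a single-node cluster, a logged-in `oc` session, and `dig` available on the PATH. A mismatch only confirms the symptom; it does not by itself identify the root cause.

```shell
#!/bin/sh
# Sketch: compare the address api-int resolves to with the node's InternalIP.
# Function names are hypothetical; only dig and oc are real tools.

# Pure comparison helper: prints OK or MISMATCH and returns 0 or 1.
compare_api_int() {
    resolved="$1"
    node_ip="$2"
    if [ "$resolved" = "$node_ip" ]; then
        echo "OK: api-int resolves to node IP $node_ip"
        return 0
    fi
    echo "MISMATCH: api-int resolves to $resolved, node IP is $node_ip"
    return 1
}

# Live check. Assumes one node (items[0]), a logged-in oc, and dig on PATH.
check_api_int() {
    api_int_host="$1"
    # First A record returned for the api-int name.
    resolved=$(dig +short "$api_int_host" | head -n 1)
    # InternalIP of the only node in the cluster.
    node_ip=$(oc get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
    compare_api_int "$resolved" "$node_ip"
}
```

After sourcing the script, it could be run as `check_api_int api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org`, substituting the api-int hostname of the cluster under test.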

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always.

Steps to Reproduce:

1. On clusterbot: launch 4.14 gcp,single-node
2. In a terminal: oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-15-200133 --allow-explicit-upgrade --force

Actual results:

The name server returns 10.0.0.2 for api-int, so CNCC fails to reach the API server.

Expected results:

The name server should return 10.0.0.3.

Must-gather: https://drive.google.com/file/d/1MDbsMgIQz7dE6e76z4ad95dwaxbSNrJM/view?usp=sharing

I'm assigning this bug to the network edge team for a first pass. Please reassign it if necessary.

clones

OCPBUGS-17841 GCP SNO installation fails because redirect ipt doesn't take effect on SGW

Closed

is blocked by

OCPBUGS-17841 GCP SNO installation fails because redirect ipt doesn't take effect on SGW

Closed

links to

openshift/machine-config-operator#3973: [release-4.14] OCPBUGS-20554: Ensure gcp-routes hack for internalLB hairpin traffic works for SGW

RHBA-2023:7682 OpenShift Container Platform 4.14.z bug fix update

Assignee: Surya Seetharaman

Reporter: Riccardo Ravaioli

QA Contact: Huiran Wang

Votes: 0

Watchers: 9

Created: 2023/10/13 9:54 AM

Updated: 2024/04/29 4:59 PM

Resolved: 2023/12/12 9:48 AM
