- Bug
- Resolution: Done-Errata
- Critical
- 4.12.z
- Important
- Yes
- SDN Sprint 243, SDN Sprint 244, SDN Sprint 245, SDN Sprint 246
- 4
- Rejected
- True
- N/A
- Release Note Not Required
I tried upgrading a 4.14 SNO cluster from one nightly image to another. On AWS the upgrade works fine, but on GCP it fails.
The Cluster Network Operator successfully upgrades ovn-kubernetes but then gets stuck on the cloud network config controller (CNCC), which is in CrashLoopBackOff because the name server gives it the wrong IP address when it tries to reach the API server. The node IP is actually 10.0.0.3, but the name server returns 10.0.0.2. I suspect that is the bootstrap node IP, but that's only a guess.
Some relevant logs:
$ oc get co network
network   4.14.0-0.nightly-2023-08-15-200133   True   True   False   86m   Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)

$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                                     READY   STATUS    RESTARTS       AGE   IP         NODE                                 NOMINATED NODE   READINESS GATES
ovnkube-control-plane-844c8f76fb-q4tvp   2/2     Running   3              24m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>
ovnkube-node-24kb7                       10/10   Running   12 (13m ago)   25m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>

$ oc get pods -n openshift-cloud-network-config-controller -o wide
openshift-cloud-network-config-controller   cloud-network-config-controller-d65ccbc5b-dnt69   0/1   CrashLoopBackOff   15 (2m37s ago)   40m   10.128.0.141   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>   <none>

$ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-d65ccbc5b-dnt69
W0816 11:06:00.666825 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
F0816 11:06:30.673952 1 main.go:345] Error building controller runtime client: Get "https://api-int.ci-ln-rij2p1b-72292.gcp-2.ci.openshift.org:6443/api?timeout=32s": dial tcp 10.0.0.2:6443: i/o timeout
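For anyone triaging similar logs, the address the controller actually dialed can be pulled out of the fatal line mechanically. The following is only a minimal Python sketch: the helper name `dialed_endpoint` and the regular expression are illustrative assumptions, not part of CNCC or client-go, and it only handles IPv4 addresses in the `dial tcp <ip>:<port>` form seen above.

```python
import re

# Matches the "dial tcp <ipv4>:<port>" fragment seen in the fatal log line.
# Assumption: IPv4 only; an IPv6 endpoint would need a different pattern.
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def dialed_endpoint(log_line):
    """Return (ip, port) the client tried to reach, or None if absent."""
    match = DIAL_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# The fatal line from the CNCC pod log in this report.
fatal_line = (
    'F0816 11:06:30.673952 1 main.go:345] Error building controller runtime '
    'client: Get "https://api-int.ci-ln-rij2p1b-72292.gcp-2.ci.openshift.org'
    ':6443/api?timeout=32s": dial tcp 10.0.0.2:6443: i/o timeout'
)

print(dialed_endpoint(fatal_line))
```

Comparing the extracted address with the node IP from the `oc get pods -o wide` output shows the mismatch directly.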
I also get 10.0.0.2 if I run a DNS query from the node itself or from a pod:
dig api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org
...
;; ANSWER SECTION:
api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org. 60 IN A 10.0.0.2
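The same comparison can be scripted against saved dig output. This is a rough Python sketch under stated assumptions: `answer_ips` and `stale_answers` are hypothetical helper names, the parser only understands plain `<name>. <ttl> IN A <address>` lines, and the node IP of 10.0.0.3 is the value given in this report rather than something the script discovers.

```python
def answer_ips(dig_output, name):
    """Collect A-record addresses for `name` from dig's text output."""
    wanted = name.rstrip(".").lower() + "."
    ips = []
    for raw in dig_output.splitlines():
        fields = raw.split()
        # An A-record line looks like: <name>. <ttl> IN A <address>
        if len(fields) == 5 and fields[2] == "IN" and fields[3] == "A":
            if fields[0].lower() == wanted:
                ips.append(fields[4])
    return ips


def stale_answers(dig_output, name, node_ip):
    """Return every resolved address that is not the expected node IP."""
    return [ip for ip in answer_ips(dig_output, name) if ip != node_ip]


# Answer section as pasted in this report.
sample = """;; ANSWER SECTION:
api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org. 60 IN A 10.0.0.2
"""

api_int = "api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org"
# 10.0.0.3 is the node IP stated in the report (assumed, not discovered).
print(stale_answers(sample, api_int, "10.0.0.3"))
```

An empty list would mean the name server already returns the node IP; any other result lists the addresses that should not be there.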
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always.
Steps to Reproduce:
1. On clusterbot: launch 4.14 gcp,single-node
2. On a terminal: oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-15-200133 --allow-explicit-upgrade --force
Actual results:
The name server returns 10.0.0.2, so CNCC fails to reach the API server.
Expected results:
The name server should return 10.0.0.3.
Must-gather: https://drive.google.com/file/d/1MDbsMgIQz7dE6e76z4ad95dwaxbSNrJM/view?usp=sharing
I'm assigning this bug to the network edge team for a first pass. Please reassign it if necessary.
- clones: OCPBUGS-17841 GCP SNO installation fails because redirect ipt doesn't take effect on SGW (Closed)
- is blocked by: OCPBUGS-17841 GCP SNO installation fails because redirect ipt doesn't take effect on SGW (Closed)
- links to: RHBA-2023:7682 OpenShift Container Platform 4.14.z bug fix update