
GCP SNO installation fails because redirect ipt doesn't take effect on SGW


    • Important
    • Yes
    • SDN Sprint 243, SDN Sprint 244, SDN Sprint 245, SDN Sprint 246
    • 4
    • Rejected
    • True

      All GCP SNO installations would fail due to this issue.
    • N/A
    • Release Note Not Required

      I tried upgrading a 4.14 SNO cluster from one nightly image to another and, while on AWS the upgrade works fine, it fails on GCP.

      The Cluster Network Operator successfully upgrades ovn-kubernetes but gets stuck on the cloud network config controller (CNCC), which is in CrashLoopBackOff because it receives a wrong IP address from the name server when trying to reach the API server. The node IP is actually and the name server returns, which I suspect is the bootstrap node IP, but that's only my guess.

      Some relevant logs:


      $ oc get co network
      network                                    4.14.0-0.nightly-2023-08-15-200133   True        True          False      86m     Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)
      $ oc get pods -n openshift-ovn-kubernetes -o wide
      NAME                                     READY   STATUS    RESTARTS       AGE   IP         NODE                                 NOMINATED NODE   READINESS GATES
      ovnkube-control-plane-844c8f76fb-q4tvp   2/2     Running   3              24m              ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>
      ovnkube-node-24kb7                       10/10   Running   12 (13m ago)   25m              ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>
      $ oc get pods -n openshift-cloud-network-config-controller -o wide
      openshift-cloud-network-config-controller          cloud-network-config-controller-d65ccbc5b-dnt69               0/1     CrashLoopBackOff   15 (2m37s ago)   40m   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>
      $ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-d65ccbc5b-dnt69
      W0816 11:06:00.666825       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
      F0816 11:06:30.673952       1 main.go:345] Error building controller runtime client: Get "https://api-int.ci-ln-rij2p1b-72292.gcp-2.ci.openshift.org:6443/api?timeout=32s": dial tcp i/o timeout


      I also get if I run a DNS query from the node itself or from a pod:

      dig api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org
      api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org. 60 IN A
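The DNS check above can be turned into a small repeatable comparison. This is a minimal sketch, not part of the original report. The function name `compare_api_int` is hypothetical, and what counts as the "expected" address is an assumption the caller supplies (the report expects the node IP). The `dig` and `oc` commands in the trailing comments only show one way the two inputs might be gathered.

```shell
#!/bin/sh
# compare_api_int RESOLVED EXPECTED   (hypothetical helper, for illustration only)
# Prints "match" when the address returned by the name server equals the
# expected address, "mismatch" when it differs, and "unknown" when either
# value is empty (for example, because the DNS query returned nothing).
compare_api_int() {
    resolved="$1"
    expected="$2"
    if [ -z "$resolved" ] || [ -z "$expected" ]; then
        echo "unknown"
        return 2
    fi
    if [ "$resolved" = "$expected" ]; then
        echo "match"
        return 0
    fi
    echo "mismatch"
    return 1
}

# Possible usage on a live cluster (assumes dig and oc are available and that
# API_INT_HOST and NODE are set by the caller):
#   resolved=$(dig +short "$API_INT_HOST" | head -n 1)
#   expected=$(oc get node "$NODE" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
#   compare_api_int "$resolved" "$expected"
```

A "mismatch" result only confirms the symptom described here. It does not by itself show whether the name server or the expectation is wrong.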


      Version-Release number of selected component (if applicable):


      How reproducible:


      Steps to Reproduce:

      1. On clusterbot: launch 4.14 gcp,single-node
      2. In a terminal: oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-15-200133 --allow-explicit-upgrade --force

      Actual results:

      name server returns, so CNCC fails to reach the API server

      Expected results:

      name server should return


      Must-gather: https://drive.google.com/file/d/1MDbsMgIQz7dE6e76z4ad95dwaxbSNrJM/view?usp=sharing

      I'm assigning this bug first to the network edge team for a first pass. Please do reassign it if necessary.


            sseethar Surya Seetharaman
            rravaiol@redhat.com Riccardo Ravaioli
            Huiran Wang Huiran Wang