OpenShift Bugs / OCPBUGS-11949

IPI installation in GCP fails when installing in shared VPC with IPsec enabled


    Description

      Description of problem:

       With the OpenShift 4.12.7 installer, IPI installation in GCP using the
      OVN-Kubernetes SDN and the tech-preview feature "install in shared VPC"
      fails. Several installations have been tested:
      -> Test results:
        - no shared VPC, no IPsec: works as expected
        - no shared VPC, IPsec enabled: works as expected
        - shared VPC, no IPsec: works as expected
        *- shared VPC, IPsec enabled: fails*
      
      On a shared-VPC cluster, the failure occurs both when IPsec is enabled
      at IPI install time and when it is enabled as a day-2 task.
      
      Initial findings show an issue with the cluster network: connections
      between control-plane nodes do not work. The problem is likely not
      limited to the control-plane nodes.
      
      - The cluster network MTU is correctly set to 1354 (1500 - 100 (OVN
        overhead) - 46 (ESP header)) for IPsec to be enabled on the cluster.
      - The ports required for IPsec (UDP 500 and 4500) are open on the
        cluster, as the listing below shows.
      
      ~~~
      
      $ cat ***sos_commands/networking/netstat_-W_-neopa |grep -e 500 -e 4500|grep 'udp '
      udp        0      0 127.0.0.1:4500          0.0.0.0:*                           0          66850      7038/pluto           off (0.00/0/0)
      udp        0      0 100.81.0.2:4500         0.0.0.0:*                           0          66848      7038/pluto           off (0.00/0/0)
      udp        0      0 10.66.76.97:4500        0.0.0.0:*                           0          66846      7038/pluto           off (0.00/0/0)
      udp        0      0 169.254.169.2:4500      0.0.0.0:*                           0          66844      7038/pluto           off (0.00/0/0)
      udp        0      0 127.0.0.1:500           0.0.0.0:*                           0          66849      7038/pluto           off (0.00/0/0)
      udp        0      0 100.81.0.2:500          0.0.0.0:*                           0          66847      7038/pluto           off (0.00/0/0)
      udp        0      0 10.66.76.97:500         0.0.0.0:*                           0          66845      7038/pluto           off (0.00/0/0)
      udp        0      0 169.254.169.2:500       0.0.0.0:*                           0          66843      7038/pluto           off (0.00/0/0)
      ~~~
      
      - The Network custom resource has been configured correctly to enable IPsec:
      
      ~~~
      spec:
        defaultNetwork:
          ovnKubernetesConfig:
            ipsecConfig: {}
      ~~~
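      
      For reference, enabling IPsec as a day-2 task amounts to merging that
      same ipsecConfig stanza into the cluster Network operator CR. A minimal
      sketch using oc (the standard merge-patch approach; run as
      cluster-admin):
      
      ~~~
      # Enable OVN-Kubernetes IPsec on a running cluster (day-2 task)
      oc patch networks.operator.openshift.io cluster --type=merge \
        -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{}}}}}'
      ~~~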
      
      The installation log shows no connectivity-related issues so far. The
      main cause of the failure appears to be that no router-default pods are
      available; as a result, routes cannot be reached, and the
      authentication and ingress operators are Degraded.
      
      ~~~
      time="2023-03-30T15:45:12Z" level=error msg="Cluster operator ingress Degraded is True with IngressDegraded: The \"default\" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod \"router-default-65c57f7ff9-9jpn5\" cannot be scheduled: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }
      ~~~
      
      All other operators are Degraded because the desired number of replicas is unavailable:
      
      ~~~
      time="2023-03-30T15:45:12Z" level=error msg="Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas"time="2023-03-30T15:45:12Z" level=error msg="Cluster operator image-registry Degraded is True with Unavailable: Degraded: The deployment does not have available replicas"
      ~~~
      
      In 4.13, the workaround is: run "create manifests", delete
      <dir>/openshift/99_feature-gate.yaml, and then run "create cluster". To
      activate IPsec, cluster-network-03-config.yml is added to the manifests
      before "create cluster" is issued. A sketch of these steps follows.
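      
      A minimal sketch of the workaround, assuming a 4.13 installer and an
      install directory of ./mycluster (the directory name is illustrative):
      
      ~~~
      # Generate the installation manifests from install-config.yaml
      openshift-install create manifests --dir ./mycluster
      
      # Workaround: remove the feature-gate manifest
      rm ./mycluster/openshift/99_feature-gate.yaml
      
      # Add manifests/cluster-network-03-config.yml (contents below), then:
      openshift-install create cluster --dir ./mycluster
      ~~~
      
      where cluster-network-03-config.yml wraps the ipsecConfig stanza shown
      above in a Network operator CR:
      
      ~~~
      apiVersion: operator.openshift.io/v1
      kind: Network
      metadata:
        name: cluster
      spec:
        defaultNetwork:
          type: OVNKubernetes
          ovnKubernetesConfig:
            ipsecConfig: {}
      ~~~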

      Version-Release number of selected component (if applicable):

      4.12.7

      How reproducible:

      Consistently, with a GCP IPI install using shared VPC and IPsec enabled.

      Steps to Reproduce:

      1. Create an install-config.yaml for GCP with the tech-preview shared
         VPC fields and IPsec enablement prepared (see the sketch below)
      2. Run "openshift-install create cluster"
      3. Watch the installation fail with the ingress and authentication
         operators Degraded
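      
      A minimal install-config.yaml sketch for the shared-VPC scenario. All
      values are placeholders: the project, network, and subnet names are
      illustrative, not taken from the affected environment.
      
      ~~~
      apiVersion: v1
      baseDomain: example.com
      metadata:
        name: mycluster
      # Shared VPC on GCP is a tech-preview feature in 4.12
      featureSet: TechPreviewNoUpgrade
      credentialsMode: Passthrough
      networking:
        networkType: OVNKubernetes
      platform:
        gcp:
          projectID: service-project
          region: us-central1
          networkProjectID: host-project   # project that owns the shared VPC
          network: shared-vpc
          controlPlaneSubnet: control-plane-subnet
          computeSubnet: compute-subnet
      pullSecret: '...'
      sshKey: '...'
      ~~~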
      

      Actual results:

      The installation fails at an early stage.

      Expected results:

      The installation should succeed.

      Additional info:

      It fails both at install time and when IPsec is enabled as a day-2 task.

            People

              rh-ee-bbarbach Brent Barbachem
              rhn-support-pkhedeka Parikshit Khedekar
              Jianli Wei Jianli Wei