OpenShift Bugs / OCPBUGS-50849

IPsec cluster failed to install on GCP and IPsec pods were in Pending status on the worker nodes


    • Critical
    • Yes
    • 5
    • MCO Sprint 267
    • 1
    • Rejected
    • False
    • Release Note Not Required
    • In Progress

      Description of problem:

      An IPsec-enabled cluster failed to install on GCP: the ovn-ipsec-host pods on the worker nodes were stuck in Pending status.

      Version-Release number of selected component (if applicable):

      4.19 nightly payloads (see the CI jobs below; nodes report v1.32.1)

      How reproducible:

      Seen in multiple periodic Prow CI runs:

      periodic-ci-openshift-openshift-tests-private-release-4.19-multi-nightly-gcp-ipi-ovn-ipsec-arm-mixarch-f14 #1890061783440297984
      periodic-ci-openshift-openshift-tests-private-release-4.19-multi-nightly-gcp-ipi-ovn-ipsec-amd-mixarch-f28-destructive #1890035862469611520
      periodic-ci-openshift-openshift-tests-private-release-4.19-multi-nightly-gcp-ipi-ovn-ipsec-arm-mixarch-f14 #1890279505117843456

      must-gather logs for the second job listed: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-o[…]r-must-gather/artifacts/must-gather.tar
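
      For reference, the listings below were read from that must-gather with the omg tool. A minimal sketch of loading it, assuming the tarball was downloaded locally and extracts to ./must-gather:

      % tar -xf must-gather.tar     # unpack the archive
      % omg use ./must-gather       # point omg at the extracted data
      % omg get nodes               # then query it like a read-only cluster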

      % omg get nodes
      NAME                                       STATUS  ROLES                 AGE    VERSION
      ci-op-9pmd0iim-3eaf1-dcw66-master-0        Ready   control-plane,master  1h12m  v1.32.1
      ci-op-9pmd0iim-3eaf1-dcw66-master-1        Ready   control-plane,master  1h13m  v1.32.1
      ci-op-9pmd0iim-3eaf1-dcw66-master-2        Ready   control-plane,master  1h11m  v1.32.1
      ci-op-9pmd0iim-3eaf1-dcw66-worker-a-d6sw7  Ready   worker                1h0m   v1.32.1
      ci-op-9pmd0iim-3eaf1-dcw66-worker-b-97qfp  Ready   worker                58m    v1.32.1
      % omg get pods -n openshift-ovn-kubernetes -o wide
      NAME                                    READY  STATUS   RESTARTS  AGE   IP          NODE
      ovn-ipsec-host-2qfqh                    2/2    Running  0         33m   10.0.0.4    ci-op-9pmd0iim-3eaf1-dcw66-master-2
      ovn-ipsec-host-bqh5n                    0/2    Pending  0         33m   10.0.128.3  ci-op-9pmd0iim-3eaf1-dcw66-worker-b-97qfp
      ovn-ipsec-host-hdjtx                    2/2    Running  0         33m   10.0.0.3    ci-op-9pmd0iim-3eaf1-dcw66-master-1
      ovn-ipsec-host-jwn8s                    2/2    Running  0         33m   10.0.0.6    ci-op-9pmd0iim-3eaf1-dcw66-master-0
      ovn-ipsec-host-n4cpv                    0/2    Pending  0         33m   10.0.128.2  ci-op-9pmd0iim-3eaf1-dcw66-worker-a-d6sw7
      ovnkube-control-plane-85cbb47f9d-n6rps  2/2    Running  1         55m   10.0.0.6    ci-op-9pmd0iim-3eaf1-dcw66-master-0
      ovnkube-control-plane-85cbb47f9d-slb94  2/2    Running  0         47m   10.0.0.3    ci-op-9pmd0iim-3eaf1-dcw66-master-1
      ovnkube-node-2hwb6                      8/8    Running  0         1h0m  10.0.128.2  ci-op-9pmd0iim-3eaf1-dcw66-worker-a-d6sw7
      ovnkube-node-9nhj6                      8/8    Running  1         53m   10.0.0.4    ci-op-9pmd0iim-3eaf1-dcw66-master-2
      ovnkube-node-h2fd2                      8/8    Running  2         53m   10.0.0.3    ci-op-9pmd0iim-3eaf1-dcw66-master-1
      ovnkube-node-hwng4                      8/8    Running  0         56m   10.0.0.6    ci-op-9pmd0iim-3eaf1-dcw66-master-0
      ovnkube-node-k6rfl                      8/8    Running  0         58m   10.0.128.3  ci-op-9pmd0iim-3eaf1-dcw66-worker-b-97qfp
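
      Both worker pods are 0/2 Pending while every master pod is Running. A quick way to look for a scheduling or admission reason is to check the recorded events (a sketch; event retention in a must-gather can be limited):

      % omg get events -n openshift-ovn-kubernetes | grep ovn-ipsec-host
      # on a live cluster the same events appear in:
      % oc describe pod ovn-ipsec-host-n4cpv -n openshift-ovn-kubernetes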
      
      % omg get pod ovn-ipsec-host-n4cpv -n openshift-ovn-kubernetes -o yaml
          apiVersion: v1
          kind: Pod
          metadata:
            annotations:
              cluster-autoscaler.kubernetes.io/enable-ds-eviction: 'false'
            creationTimestamp: '2025-02-13T14:54:05Z'
            generateName: ovn-ipsec-host-
            labels:
              app: ovn-ipsec
              component: network
              controller-revision-hash: 8b4dd5dc7
              kubernetes.io/os: linux
              openshift.io/component: network
              pod-template-generation: '1'
              type: infra
            managedFields:
            - apiVersion: v1
              fieldsType: FieldsV1
              fieldsV1:
                f:metadata:
                  f:annotations:
                    .: {}
                    f:cluster-autoscaler.kubernetes.io/enable-ds-eviction: {}
                    f:target.workload.openshift.io/management: {}
                  f:generateName: {}
                  f:labels:
                    .: {}
                    f:app: {}
                    f:component: {}
                    f:controller-revision-hash: {}
                    f:kubernetes.io/os: {}
                    f:openshift.io/component: {}
                    f:pod-template-generation: {}
                    f:type: {}
                  f:ownerReferences:
                    .: {}
                    k:{"uid":"61870386-d205-465b-832c-061c3bf7366e"}: {}
                f:spec:
                  f:affinity:
                    .: {}
                    f:nodeAffinity:
                      .: {}
                      f:requiredDuringSchedulingIgnoredDuringExecution: {}
                  f:containers:
                    k:{"name":"ovn-ipsec"}:
                      .: {}
                      f:command: {}
                      f:env:
                        .: {}
                        k:{"name":"K8S_NODE"}:
                          .: {}
                          f:name: {}
                          f:valueFrom:
                            .: {}
                            f:fieldRef: {}
                      f:image: {}
                      f:imagePullPolicy: {}
                      f:lifecycle:
                        .: {}
                        f:preStop:
                          .: {}
                          f:exec:
                            .: {}
                            f:command: {}
                      f:livenessProbe:
                        .: {}
                        f:exec:
                          .: {}
                          f:command: {}
                        f:failureThreshold: {}
                        f:initialDelaySeconds: {}
                        f:periodSeconds: {}
                        f:successThreshold: {}
                        f:timeoutSeconds: {}
                      f:name: {}
                      f:resources:
                        .: {}
                        f:requests:
                          .: {}
                          f:cpu: {}
                          f:memory: {}
                      f:securityContext:
                        .: {}
                        f:privileged: {}
                      f:terminationMessagePath: {}
                      f:terminationMessagePolicy: {}
                      f:volumeMounts:
                        .: {}
                        k:{"mountPath":"/etc"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/etc/cni/net.d"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/etc/openvswitch"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/usr/libexec/ipsec"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/usr/sbin/ipsec"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/var/lib"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/var/log/openvswitch/"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/var/run"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                    k:{"name":"ovn-ipsec-cleanup"}:
                      .: {}
                      f:command: {}
                      f:image: {}
                      f:imagePullPolicy: {}
                      f:name: {}
                      f:resources:
                        .: {}
                        f:requests:
                          .: {}
                          f:cpu: {}
                          f:memory: {}
                      f:securityContext:
                        .: {}
                        f:privileged: {}
                      f:terminationMessagePath: {}
                      f:terminationMessagePolicy: {}
                      f:volumeMounts:
                        .: {}
                        k:{"mountPath":"/etc"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/etc/ovn/"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/var/run"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                  f:dnsPolicy: {}
                  f:enableServiceLinks: {}
                  f:hostNetwork: {}
                  f:hostPID: {}
                  f:initContainers:
                    .: {}
                    k:{"name":"ovn-keys"}:
                      .: {}
                      f:command: {}
                      f:env:
                        .: {}
                        k:{"name":"K8S_NODE"}:
                          .: {}
                          f:name: {}
                          f:valueFrom:
                            .: {}
                            f:fieldRef: {}
                      f:image: {}
                      f:imagePullPolicy: {}
                      f:name: {}
                      f:resources:
                        .: {}
                        f:requests:
                          .: {}
                          f:cpu: {}
                          f:memory: {}
                      f:securityContext:
                        .: {}
                        f:privileged: {}
                      f:terminationMessagePath: {}
                      f:terminationMessagePolicy: {}
                      f:volumeMounts:
                        .: {}
                        k:{"mountPath":"/etc"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/etc/openvswitch"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/etc/ovn/"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/signer-ca"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                        k:{"mountPath":"/var/run"}:
                          .: {}
                          f:mountPath: {}
                          f:name: {}
                  f:nodeSelector: {}
                  f:priorityClassName: {}
                  f:restartPolicy: {}
                  f:schedulerName: {}
                  f:securityContext: {}
                  f:serviceAccount: {}
                  f:serviceAccountName: {}
                  f:terminationGracePeriodSeconds: {}
                  f:tolerations: {}
                  f:volumes:
                    .: {}
                    k:{"name":"etc-openvswitch"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"etc-ovn"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"host-cni-netd"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"host-etc"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"host-var-lib"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"host-var-log-ovs"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"host-var-run"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"ipsec-bin"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"ipsec-lib"}:
                      .: {}
                      f:hostPath:
                        .: {}
                        f:path: {}
                        f:type: {}
                      f:name: {}
                    k:{"name":"signer-ca"}:
                      .: {}
                      f:configMap:
                        .: {}
                        f:defaultMode: {}
                        f:name: {}
                      f:name: {}
              manager: kube-controller-manager
              operation: Update
              time: '2025-02-13T14:54:04Z'
            - apiVersion: v1
              fieldsType: FieldsV1
              fieldsV1:
                f:status:
                  f:conditions:
                    k:{"type":"ContainersReady"}:
                      .: {}
                      f:lastProbeTime: {}
                      f:lastTransitionTime: {}
                      f:message: {}
                      f:reason: {}
                      f:status: {}
                      f:type: {}
                    k:{"type":"Initialized"}:
                      .: {}
                      f:lastProbeTime: {}
                      f:lastTransitionTime: {}
                      f:message: {}
                      f:reason: {}
                      f:status: {}
                      f:type: {}
                    k:{"type":"PodReadyToStartContainers"}:
                      .: {}
                      f:lastProbeTime: {}
                      f:lastTransitionTime: {}
                      f:status: {}
                      f:type: {}
                    k:{"type":"Ready"}:
                      .: {}
                      f:lastProbeTime: {}
                      f:lastTransitionTime: {}
                      f:message: {}
                      f:reason: {}
                      f:status: {}
                      f:type: {}
                  f:containerStatuses: {}
                  f:hostIP: {}
                  f:hostIPs: {}
                  f:initContainerStatuses: {}
                  f:podIP: {}
                  f:podIPs:
                    .: {}
                    k:{"ip":"10.0.128.2"}:
                      .: {}
                      f:ip: {}
                  f:startTime: {}
              manager: kubelet
              operation: Update
              subresource: status
              time: '2025-02-13T14:54:05Z'
            name: ovn-ipsec-host-n4cpv
            namespace: openshift-ovn-kubernetes
            ownerReferences:
            - apiVersion: apps/v1
              blockOwnerDeletion: true
              controller: true
              kind: DaemonSet
              name: ovn-ipsec-host
              uid: 61870386-d205-465b-832c-061c3bf7366e
            resourceVersion: '38812'
            uid: ce7f6619-3015-414d-9de4-5991d74258fd
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchFields:
                    - key: metadata.name
                      operator: In
                      values:
                      - ci-op-9pmd0iim-3eaf1-dcw66-worker-a-d6sw7
            containers:
            - command:
              - /bin/bash
              - -c
              - "#!/bin/bash\nset -exuo pipefail\n\n# Don't start IPsec until ovnkube-node has\
                \ finished setting up the node\ncounter=0\nuntil [ -f /etc/cni/net.d/10-ovn-kubernetes.conf\
                \ ]\ndo\n  counter=$((counter+1))\n  sleep 1\n  if [ $counter -gt 300 ];\n \
                \ then\n          echo \"ovnkube-node pod has not started after $counter seconds\"\
                \n          exit 1\n  fi\ndone\necho \"ovnkube-node has configured node.\"\n\
                \nif ! pgrep pluto; then\n  echo \"pluto is not running, enable the service\
                \ and/or check system logs\"\n  exit 2\nfi\n\n# The ovs-monitor-ipsec doesn't\
                \ set authby, so when it calls ipsec auto --start\n# the default ones defined\
                \ at Libreswan's compile time will be used. On restart,\n# Libreswan will use\
                \ authby from libreswan.config. If libreswan.config is\n# incompatible with\
                \ the Libreswan's compiled-in defaults, then we'll have an\n# authentication\
                \ problem. But OTOH, ovs-monitor-ipsec does set ike and esp algorithms,\n# so\
                \ those may be incompatible with libreswan.config as well. Hence commenting\
                \ out the\n# \"include\" from libreswan.conf to avoid such conflicts.\ndefaultcpinclude=\"\
                include \\/etc\\/crypto-policies\\/back-ends\\/libreswan.config\"\nif ! grep\
                \ -q \"# ${defaultcpinclude}\" /etc/ipsec.conf; then\n  sed -i \"/${defaultcpinclude}/s/^/#\
                \ /\" /etc/ipsec.conf\n  # since pluto is on the host, we need to restart it\
                \ after changing connection\n  # parameters.\n  chroot /proc/1/root ipsec restart\n\
                \n  counter=0\n  until [ -r /run/pluto/pluto.ctl ]; do\n    counter=$((counter+1))\n\
                \    sleep 1\n    if [ $counter -gt 300 ];\n    then\n      echo \"ipsec has\
                \ not started after $counter seconds\"\n      exit 1\n    fi\n  done\n  echo\
                \ \"ipsec service is restarted\"\nfi\n\n# Workaround for https://github.com/libreswan/libreswan/issues/373\n\
                ulimit -n 1024\n\n/usr/libexec/ipsec/addconn --config /etc/ipsec.conf --checkconfig\n\
                # Check kernel modules\n/usr/libexec/ipsec/_stackmanager start\n# Check nss\
                \ database status\n/usr/sbin/ipsec --checknss\n\n# Start ovs-monitor-ipsec which\
                \ will monitor for changes in the ovs\n# tunnelling configuration (for example\
                \ addition of a node) and configures\n# libreswan appropriately.\n# We are running\
                \ this in the foreground so that the container will be restarted when ovs-monitor-ipsec\
                \ fails.\n/usr/libexec/platform-python /usr/share/openvswitch/scripts/ovs-monitor-ipsec\
                \ \\\n  --pidfile=/var/run/openvswitch/ovs-monitor-ipsec.pid --ike-daemon=libreswan\
                \ --no-restart-ike-daemon \\\n  --ipsec-conf /etc/ipsec.d/openshift.conf --ipsec-d\
                \ /var/lib/ipsec/nss \\\n  --log-file --monitor unix:/var/run/openvswitch/db.sock\n"
              env:
              - name: K8S_NODE
                valueFrom:
                  fieldRef:
                    apiVersion: v1
                    fieldPath: spec.nodeName
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e262b9ed22e74a3a8d7a345b775645267acfbcd571b510e1ace519cc2f658bf
              imagePullPolicy: IfNotPresent
              lifecycle:
                preStop:
                  exec:
                    command:
                    - /bin/bash
                    - -c
                    - '#!/bin/bash
           
                      set -exuo pipefail
           
                      # In order to maintain traffic flows during container restart, we
           
                      # need to ensure that xfrm state and policies are not flushed.
           
           
                      # Don''t allow ovs monitor to cleanup persistent state
           
                      kill "$(cat /var/run/openvswitch/ovs-monitor-ipsec.pid 2>/dev/null)" 2>/dev/null
                      || true
           
                      '
              livenessProbe:
                exec:
                  command:
                  - /bin/bash
                  - -c
                  - "#!/bin/bash\nif [[ $(ipsec whack --trafficstatus | wc -l) -eq 0 ]]; then\n\
                    \  echo \"no ipsec traffic configured\"\n  exit 10\nfi\n"
                failureThreshold: 3
                initialDelaySeconds: 15
                periodSeconds: 60
                successThreshold: 1
                timeoutSeconds: 1
              name: ovn-ipsec
              resources:
                requests:
                  cpu: 10m
                  memory: 100Mi
              securityContext:
                privileged: true
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: FallbackToLogsOnError
              volumeMounts:
              - mountPath: /etc/cni/net.d
                name: host-cni-netd
              - mountPath: /var/run
                name: host-var-run
              - mountPath: /var/log/openvswitch/
                name: host-var-log-ovs
              - mountPath: /etc/openvswitch
                name: etc-openvswitch
              - mountPath: /var/lib
                name: host-var-lib
              - mountPath: /etc
                name: host-etc
              - mountPath: /usr/sbin/ipsec
                name: ipsec-bin
              - mountPath: /usr/libexec/ipsec
                name: ipsec-lib
              - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                name: kube-api-access-7rvbc
                readOnly: true
            - command:
              - /bin/bash
              - -c
              - "#!/bin/bash\n\n# When NETWORK_NODE_IDENTITY_ENABLE is true, use the per-node\
                \ certificate to create a kubeconfig\n# that will be used to talk to the API\n\
                \n\n# Wait for cert file\nretries=0\ntries=20\nkey_cert=\"/etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem\"\
                \nwhile [ ! -f \"${key_cert}\" ]; do\n  (( retries += 1 ))\n  if [[ \"${retries}\"\
                \ -gt ${tries} ]]; then\n    echo \"$(date -Iseconds) - ERROR - ${key_cert}\
                \ not found\"\n    return 1\n  fi\n  sleep 1\ndone\n\ncat << EOF > /var/run/ovnkube-kubeconfig\n\
                apiVersion: v1\nclusters:\n  - cluster:\n      certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt\n\
                \      server: https://api-int.ci-op-9pmd0iim-3eaf1.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:6443\n\
                \    name: default-cluster\ncontexts:\n  - context:\n      cluster: default-cluster\n\
                \      namespace: default\n      user: default-auth\n    name: default-context\n\
                current-context: default-context\nkind: Config\npreferences: {}\nusers:\n  -\
                \ name: default-auth\n    user:\n      client-certificate: /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem\n\
                \      client-key: /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem\n\
                EOF\nexport KUBECONFIG=/var/run/ovnkube-kubeconfig\n\n\n# It is safe to flush\
                \ xfrm states and policies and delete openshift.conf\n# file when east-west\
                \ ipsec is disabled. This fixes a race condition when\n# ovs-monitor-ipsec is\
                \ not fast enough to notice ipsec config change and\n# delete entries before\
                \ it's being killed.\n# Since it's cleaning up all xfrm states and policies,\
                \ it may cause slight\n# interruption until ipsec is restarted in case of external\
                \ ipsec config.\n# We must do this before killing ovs-monitor-ipsec script,\
                \ otherwise\n# preStop hook doesn't get a chance to run it because ovn-ipsec\
                \ container\n# is abruptly terminated.\n# When east-west ipsec is not disabled,\
                \ then do not flush xfrm states and\n# policies in order to maintain traffic\
                \ flows during container restart.\nipsecflush() {\n  if [ \"$(kubectl get networks.operator.openshift.io\
                \ cluster -ojsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.ipsecConfig.mode}')\"\
                \ != \"Full\" ] && \\\n     [ \"$(kubectl get networks.operator.openshift.io\
                \ cluster -ojsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.ipsecConfig}')\"\
                \ != \"{}\" ]; then\n    ip x s flush\n    ip x p flush\n    rm -f /etc/ipsec.d/openshift.conf\n\
                \    # since pluto is on the host, we need to restart it after the flush\n \
                \   chroot /proc/1/root ipsec restart\n  fi\n}\n\n# Function to handle SIGTERM\n\
                cleanup() {\n  echo \"received SIGTERM, flushing ipsec config\"\n  # Wait upto\
                \ 15 seconds for ovs-monitor-ipsec process to terminate before\n  # cleaning\
                \ up ipsec entries.\n  counter=0\n  while kill -0 \"$(cat /var/run/openvswitch/ovs-monitor-ipsec.pid\
                \ 2>/dev/null)\"; do\n    counter=$((counter+1))\n    sleep 1\n    if [ $counter\
                \ -gt 15 ];\n    then\n      echo \"ovs-monitor-ipsec has not terminated after\
                \ $counter seconds\"\n      break\n    fi\n  done\n  ipsecflush\n  exit 0\n\
                }\n\n# Trap SIGTERM and call cleanup function\ntrap cleanup SIGTERM\n\ncounter=0\n\
                until [ -r /var/run/openvswitch/ovs-monitor-ipsec.pid ]; do\n  counter=$((counter+1))\n\
                \  sleep 1\n  if [ $counter -gt 300 ];\n  then\n    echo \"ovs-monitor-ipsec\
                \ has not started after $counter seconds\"\n    exit 1\n  fi\ndone\necho \"\
                ovs-monitor-ipsec is started\"\n\n# Monitor the ovs-monitor-ipsec process.\n\
                while kill -0 \"$(cat /var/run/openvswitch/ovs-monitor-ipsec.pid 2>/dev/null)\"\
                ; do\n  sleep 1\ndone\n\n# Once the ovs-monitor-ipsec process terminates, execute\
                \ the cleanup command.\necho \"ovs-monitor-ipsec is terminated, flushing ipsec\
                \ config\"\nipsecflush\n\n# Continue running until SIGTERM is received (or exit\
                \ naturally)\nwhile true; do\n  sleep 1\ndone\n"
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e262b9ed22e74a3a8d7a345b775645267acfbcd571b510e1ace519cc2f658bf
              imagePullPolicy: IfNotPresent
              name: ovn-ipsec-cleanup
              resources:
                requests:
                  cpu: 10m
                  memory: 50Mi
              securityContext:
                privileged: true
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: FallbackToLogsOnError
              volumeMounts:
              - mountPath: /etc/ovn/
                name: etc-ovn
              - mountPath: /var/run
                name: host-var-run
              - mountPath: /etc
                name: host-etc
              - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                name: kube-api-access-7rvbc
                readOnly: true
            dnsPolicy: Default
            enableServiceLinks: true
            hostNetwork: true
            hostPID: true
            imagePullSecrets:
            - name: ovn-kubernetes-node-dockercfg-sds8g
            initContainers:
            - command:
              - /bin/bash
              - -c
              - "#!/bin/bash\nset -exuo pipefail\n\n# When NETWORK_NODE_IDENTITY_ENABLE is true,\
                \ use the per-node certificate to create a kubeconfig\n# that will be used to\
                \ talk to the API\n\n\n# Wait for cert file\nretries=0\ntries=20\nkey_cert=\"\
                /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem\"\nwhile [ ! -f \"${key_cert}\"\
                \ ]; do\n  (( retries += 1 ))\n  if [[ \"${retries}\" -gt ${tries} ]]; then\n\
                \    echo \"$(date -Iseconds) - ERROR - ${key_cert} not found\"\n    return\
                \ 1\n  fi\n  sleep 1\ndone\n\ncat << EOF > /var/run/ovnkube-kubeconfig\napiVersion:\
                \ v1\nclusters:\n  - cluster:\n      certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt\n\
                \      server: https://api-int.ci-op-9pmd0iim-3eaf1.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:6443\n\
                \    name: default-cluster\ncontexts:\n  - context:\n      cluster: default-cluster\n\
                \      namespace: default\n      user: default-auth\n    name: default-context\n\
                current-context: default-context\nkind: Config\npreferences: {}\nusers:\n  -\
                \ name: default-auth\n    user:\n      client-certificate: /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem\n\
                \      client-key: /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem\n\
                EOF\nexport KUBECONFIG=/var/run/ovnkube-kubeconfig\n\n\n# Every time we restart\
                \ this container, we will create a new key pair if\n# we are close to key expiration\
                \ or if we do not already have a signed key pair.\n#\n# Each node has a key\
                \ pair which is used by OVS to encrypt/decrypt/authenticate traffic\n# between\
                \ each node. The CA cert is used as the root of trust for all certs so we need\n\
                # the CA to sign our certificate signing requests with the CA private key. In\
                \ this way,\n# we can validate that any signed certificates that we receive\
                \ from other nodes are\n# authentic.\necho \"Configuring IPsec keys\"\n\ncert_pem=/etc/openvswitch/keys/ipsec-cert.pem\n\
                \n# If the certificate does not exist or it will expire in the next 6 months\n\
                # (15770000 seconds), we will generate a new one.\nif ! openssl x509 -noout\
                \ -dates -checkend 15770000 -in $cert_pem; then\n  # We use the system-id as\
                \ the CN for our certificate signing request. This\n  # is a requirement by\
                \ OVN.\n  cn=$(ovs-vsctl --retry -t 60 get Open_vSwitch . external-ids:system-id\
                \ | tr -d \"\\\"\")\n\n  mkdir -p /etc/openvswitch/keys\n\n  # Generate an SSL\
                \ private key and use the key to create a certitificate signing request\n  umask\
                \ 077 && openssl genrsa -out /etc/openvswitch/keys/ipsec-privkey.pem 2048\n\
                \  openssl req -new -text \\\n              -extensions v3_req \\\n        \
                \      -addext \"subjectAltName = DNS:${cn}\" \\\n              -subj \"/C=US/O=ovnkubernetes/OU=kind/CN=${cn}\"\
                \ \\\n              -key /etc/openvswitch/keys/ipsec-privkey.pem \\\n      \
                \        -out /etc/openvswitch/keys/ipsec-req.pem\n\n  csr_64=$(base64 -w0 /etc/openvswitch/keys/ipsec-req.pem)\
                \ # -w0 to avoid line-wrap\n\n  # Request that our generated certificate signing\
                \ request is\n  # signed by the \"network.openshift.io/signer\" signer that\
                \ is\n  # implemented by the CNO signer controller. This will sign the\n  #\
                \ certificate signing request using the signer-ca which has been\n  # set up\
                \ by the OperatorPKI. In this way, we have a signed certificate\n  # and our\
                \ private key has remained private on this host.\n  cat <<EOF | kubectl create\
                \ -f -\n  apiVersion: certificates.k8s.io/v1\n  kind: CertificateSigningRequest\n\
                \  metadata:\n    generateName: ipsec-csr-$(hostname)-\n    labels:\n      k8s.ovn.org/ipsec-csr:\
                \ $(hostname)\n  spec:\n    request: ${csr_64}\n    signerName: network.openshift.io/signer\n\
                \    usages:\n    - ipsec tunnel\nEOF\n  # Wait until the certificate signing\
                \ request has been signed.\n  counter=0\n  until [ -n \"$(kubectl get csr -lk8s.ovn.org/ipsec-csr=\"\
                $(hostname)\" --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1:].status.certificate}'\
                \ 2>/dev/null)\" ]\n  do\n    counter=$((counter+1))\n    sleep 1\n    if [\
                \ $counter -gt 60 ];\n    then\n            echo \"Unable to sign certificate\
                \ after $counter seconds\"\n            exit 1\n    fi\n  done\n\n  # Decode\
                \ the signed certificate.\n  kubectl get csr -lk8s.ovn.org/ipsec-csr=\"$(hostname)\"\
                \ --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1:].status.certificate}'\
                \ | base64 -d | openssl x509 -outform pem -text -out $cert_pem\n\n  # kubectl\
                \ delete csr/$(hostname)\n\n  # Get the CA certificate so we can authenticate\
                \ peer nodes.\n  openssl x509 -in /signer-ca/ca-bundle.crt -outform pem -text\
                \ -out /etc/openvswitch/keys/ipsec-cacert.pem\nfi\n\n# Configure OVS with the\
                \ relevant keys for this node. This is required by ovs-monitor-ipsec.\n#\n#\
                \ Updating the certificates does not need to be an atomic operation as\n# the\
                \ will get read and loaded into NSS by the ovs-monitor-ipsec process\n# which\
                \ has not started yet.\novs-vsctl --retry -t 60 set Open_vSwitch . other_config:certificate=$cert_pem\
                \ \\\n                                           other_config:private_key=/etc/openvswitch/keys/ipsec-privkey.pem\
                \ \\\n                                           other_config:ca_cert=/etc/openvswitch/keys/ipsec-cacert.pem\n"
              env:
              - name: K8S_NODE
                valueFrom:
                  fieldRef:
                    apiVersion: v1
                    fieldPath: spec.nodeName
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e262b9ed22e74a3a8d7a345b775645267acfbcd571b510e1ace519cc2f658bf
              imagePullPolicy: IfNotPresent
              name: ovn-keys
              resources:
                requests:
                  cpu: 10m
                  memory: 100Mi
              securityContext:
                privileged: true
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: FallbackToLogsOnError
              volumeMounts:
              - mountPath: /etc/ovn/
                name: etc-ovn
              - mountPath: /var/run
                name: host-var-run
              - mountPath: /signer-ca
                name: signer-ca
              - mountPath: /etc/openvswitch
                name: etc-openvswitch
              - mountPath: /etc
                name: host-etc
              - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                name: kube-api-access-7rvbc
                readOnly: true
            nodeName: ci-op-9pmd0iim-3eaf1-dcw66-worker-a-d6sw7
            nodeSelector:
              kubernetes.io/os: linux
            preemptionPolicy: PreemptLowerPriority
            priority: 2000001000
            priorityClassName: system-node-critical
            restartPolicy: Always
            schedulerName: default-scheduler
            securityContext: {}
            serviceAccount: ovn-kubernetes-node
            serviceAccountName: ovn-kubernetes-node
            terminationGracePeriodSeconds: 10
            tolerations:
            - operator: Exists
            volumes:
            - hostPath:
                path: /var/lib/ovn-ic/etc
                type: ''
              name: etc-ovn
            - hostPath:
                path: /var/log/openvswitch
                type: DirectoryOrCreate
              name: host-var-log-ovs
            - configMap:
                defaultMode: 420
                name: signer-ca
              name: signer-ca
            - hostPath:
                path: /var/lib/openvswitch/etc
                type: DirectoryOrCreate
              name: etc-openvswitch
            - hostPath:
                path: /var/run/multus/cni/net.d
                type: ''
              name: host-cni-netd
            - hostPath:
                path: /var/run
                type: DirectoryOrCreate
              name: host-var-run
            - hostPath:
                path: /var/lib
                type: DirectoryOrCreate
              name: host-var-lib
            - hostPath:
                path: /etc
                type: Directory
              name: host-etc
            - hostPath:
                path: /usr/sbin/ipsec
                type: File
              name: ipsec-bin
            - hostPath:
                path: /usr/libexec/ipsec
                type: Directory
              name: ipsec-lib
            - name: kube-api-access-7rvbc
              projected:
                defaultMode: 420
                sources:
                - serviceAccountToken:
                    expirationSeconds: 3607
                    path: token
                - configMap:
                    items:
                    - key: ca.crt
                      path: ca.crt
                    name: kube-root-ca.crt
                - downwardAPI:
                    items:
                    - fieldRef:
                        apiVersion: v1
                        fieldPath: metadata.namespace
                      path: namespace
                - configMap:
                    items:
                    - key: service-ca.crt
                      path: service-ca.crt
                    name: openshift-service-ca.crt
          status:
            conditions:
            - lastProbeTime: null
              lastTransitionTime: '2025-02-13T14:54:05Z'
              status: 'False'
              type: PodReadyToStartContainers
            - lastProbeTime: null
              lastTransitionTime: '2025-02-13T14:54:05Z'
              message: 'containers with incomplete status: [ovn-keys]'
              reason: ContainersNotInitialized
              status: 'False'
              type: Initialized
            - lastProbeTime: null
              lastTransitionTime: '2025-02-13T14:54:05Z'
              message: 'containers with unready status: [ovn-ipsec ovn-ipsec-cleanup]'
              reason: ContainersNotReady
              status: 'False'
              type: Ready
            - lastProbeTime: null
              lastTransitionTime: '2025-02-13T14:54:05Z'
              message: 'containers with unready status: [ovn-ipsec ovn-ipsec-cleanup]'
              reason: ContainersNotReady
              status: 'False'
              type: ContainersReady
            - lastProbeTime: null
              lastTransitionTime: '2025-02-13T14:54:05Z'
              status: 'True'
              type: PodScheduled
            containerStatuses:
            - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e262b9ed22e74a3a8d7a345b775645267acfbcd571b510e1ace519cc2f658bf
              imageID: ''
              lastState: {}
              name: ovn-ipsec
              ready: false
              restartCount: 0
              started: false
              state:
                waiting:
                  reason: PodInitializing
              volumeMounts:
              - mountPath: /etc/cni/net.d
                name: host-cni-netd
              - mountPath: /var/run
                name: host-var-run
              - mountPath: /var/log/openvswitch/
                name: host-var-log-ovs
              - mountPath: /etc/openvswitch
                name: etc-openvswitch
              - mountPath: /var/lib
                name: host-var-lib
              - mountPath: /etc
                name: host-etc
              - mountPath: /usr/sbin/ipsec
                name: ipsec-bin
              - mountPath: /usr/libexec/ipsec
                name: ipsec-lib
              - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                name: kube-api-access-7rvbc
                readOnly: true
                recursiveReadOnly: Disabled
            - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e262b9ed22e74a3a8d7a345b775645267acfbcd571b510e1ace519cc2f658bf
              imageID: ''
              lastState: {}
              name: ovn-ipsec-cleanup
              ready: false
              restartCount: 0
              started: false
              state:
                waiting:
                  reason: PodInitializing
              volumeMounts:
              - mountPath: /etc/ovn/
                name: etc-ovn
              - mountPath: /var/run
                name: host-var-run
              - mountPath: /etc
                name: host-etc
              - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                name: kube-api-access-7rvbc
                readOnly: true
                recursiveReadOnly: Disabled
            hostIP: 10.0.128.2
            hostIPs:
            - ip: 10.0.128.2
            initContainerStatuses:
            - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e262b9ed22e74a3a8d7a345b775645267acfbcd571b510e1ace519cc2f658bf
              imageID: ''
              lastState: {}
              name: ovn-keys
              ready: false
              restartCount: 0
              started: false
              state:
                waiting:
                  reason: PodInitializing
              volumeMounts:
              - mountPath: /etc/ovn/
                name: etc-ovn
              - mountPath: /var/run
                name: host-var-run
              - mountPath: /signer-ca
                name: signer-ca
              - mountPath: /etc/openvswitch
                name: etc-openvswitch
              - mountPath: /etc
                name: host-etc
              - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                name: kube-api-access-7rvbc
                readOnly: true
                recursiveReadOnly: Disabled
            phase: Pending
            podIP: 10.0.128.2
            podIPs:
            - ip: 10.0.128.2
            qosClass: Burstable
            startTime: '2025-02-13T14:54:05Z'
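
      Note what the status above shows: PodScheduled is True and the node affinity pins the pod to worker-a, yet the ovn-keys init container is still waiting with reason PodInitializing and an empty imageID, i.e. the pod was scheduled but no container was ever started on the node. On a live reproducer, the next place to look would be the kubelet and container runtime on the affected worker; a hedged sketch, using the node name from this run:

      % oc adm node-logs ci-op-9pmd0iim-3eaf1-dcw66-worker-a-d6sw7 -u kubelet | grep ovn-ipsec-host-n4cpv
      % oc debug node/ci-op-9pmd0iim-3eaf1-dcw66-worker-a-d6sw7 -- chroot /host crictl pods --name ovn-ipsec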
      
      
      
      
      

      Steps to Reproduce:

      1. Install a cluster on GCP with OVN-Kubernetes and IPsec enabled (IPI install, as in the CI jobs above).

      2. After the install attempt, check the ovn-ipsec-host pods in the openshift-ovn-kubernetes namespace.

      3. Observe the pods on the worker nodes stuck in Pending.

      Actual results:

      The installation failed; the ovn-ipsec-host pods on both worker nodes stayed 0/2 Pending with the ovn-keys init container never starting.

      Expected results:

      All ovn-ipsec-host pods reach 2/2 Running and the cluster installs successfully.

      Additional info:
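
      Once the pods do run, IPsec state on a node can be checked directly with the same commands the DaemonSet itself uses (ipsec whack --trafficstatus is what the liveness probe runs); a sketch with a placeholder node name:

      % oc debug node/<node> -- chroot /host ipsec whack --trafficstatus
      % oc debug node/<node> -- chroot /host ip xfrm state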

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms:

      Is it an

      1. internal CI failure
      2. customer issue / SD
      3. internal Red Hat testing failure

      If it is an internal Red Hat testing failure:

      • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

      If it is a CI failure:

      • Did it happen in different CI lanes? If so, please provide links to multiple failures with the same error instance.
      • Did it happen in both sdn and ovn jobs? If so, please provide links to multiple failures with the same error instance.
      • Did it happen on other platforms (e.g. aws, azure, gcp, baremetal, etc.)? If so, please provide links to multiple failures with the same error instance.
      • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run.
      • If it's a connectivity issue:
      • What is the srcNode, srcIP, srcNamespace and srcPodName?
      • What is the dstNode, dstIP, dstNamespace and dstPodName?
      • What is the traffic path? (examples: pod2pod, pod2external, pod2svc, pod2node, etc.)

      If it is a customer / SD issue:

      • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
      • Don’t presume that Engineering has access to Salesforce.
      • Do presume that Engineering will access attachments through supportshell.
      • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
      • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
        • If the issue is in a customer namespace then provide a namespace inspect.
        • If it is a connectivity issue:
          • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
          • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
          • What is the traffic path? (examples: pod2pod, pod2external, pod2svc, pod2node, etc.)
          • Please provide the UTC timestamp of the networking outage window from the must-gather.
          • Please provide tcpdump pcaps taken during the outage, filtered on the src/dst IPs provided above (see the sketch after this list).
        • If it is not a connectivity issue:
          • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure, etc.) and the actual component where the issue was seen, based on the attached must-gather. Please attach snippets of relevant logs around the window when the problem happened, if any.
      • When showing the results from commands, include the entire command in the output.
      • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
      • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
      • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
      • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
      • For guidance on using this template, please see OCPBUGS Template Training for Networking components.
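
      For the tcpdump request above, a minimal sketch of capturing on the source node; the node name and src/dst pod IPs are placeholders to be filled in from the actual case:

      % oc debug node/<srcNode> -- chroot /host timeout 60 \
          tcpdump -i any -w /tmp/outage.pcap host <srcPodIP> and host <dstPodIP>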

              djoshy David Joshy
              huirwang Huiran Wang
              Sergio Regidor de la Rosa
              Votes: 0
              Watchers: 9