OpenShift Bugs / OCPBUGS-18159

Azure; NLB; OVN-K: Requests from CNI pods to the internal API server domain always fail


Details

    • Important
    • No
    • SDN Sprint 241, SDN Sprint 242
    • 2
    • Rejected
    • False

    Description

      Description of problem:

      Installation of an Azure SNO (single-node OpenShift) cluster failed because the CNCC (cloud-network-config-controller) pod crashed. The failure was first found in CI jobs and then reproduced with a flexy job.
      
      

      Version-Release number of selected component (if applicable):

      4.14.0-ec.4
      
      

      How reproducible:

      Not sure
      
      

      Steps to Reproduce:

      1. Install a cluster with the flexy job aos-4_14/ipi-on-azure/versioned-installer-sno-ci, with networkType set to "OVNKubernetes".
      
      

      Actual results:

      Installation failed
      % oc get co
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.0-ec.4   True        False         False      64m     
      baremetal                                  4.14.0-ec.4   True        False         False      89m     
      cloud-controller-manager                   4.14.0-ec.4   True        False         False      92m     
      cloud-credential                           4.14.0-ec.4   True        False         False      97m     
      cluster-autoscaler                         4.14.0-ec.4   True        False         False      89m     
      config-operator                            4.14.0-ec.4   True        False         False      90m     
      console                                    4.14.0-ec.4   True        False         False      72m     
      control-plane-machine-set                  4.14.0-ec.4   True        False         False      89m     
      csi-snapshot-controller                    4.14.0-ec.4   True        False         False      89m     
      dns                                        4.14.0-ec.4   True        False         False      89m     
      etcd                                       4.14.0-ec.4   True        False         False      84m     
      image-registry                             4.14.0-ec.4   True        False         False      75m     
      ingress                                    4.14.0-ec.4   True        False         False      75m     
      insights                                   4.14.0-ec.4   True        False         False      83m     
      kube-apiserver                             4.14.0-ec.4   True        False         False      80m     
      kube-controller-manager                    4.14.0-ec.4   True        False         False      83m     
      kube-scheduler                             4.14.0-ec.4   True        False         False      80m     
      kube-storage-version-migrator              4.14.0-ec.4   True        False         False      90m     
      machine-api                                4.14.0-ec.4   True        False         False      84m     
      machine-approver                           4.14.0-ec.4   True        False         False      89m     
      machine-config                             4.14.0-ec.4   True        False         False      88m     
      marketplace                                4.14.0-ec.4   True        False         False      89m     
      monitoring                                 4.14.0-ec.4   True        False         False      70m     
      network                                    4.14.0-ec.4   True        True          False      92m     Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)
      node-tuning                                4.14.0-ec.4   True        False         False      89m     
      openshift-apiserver                        4.14.0-ec.4   True        False         False      75m     
      openshift-controller-manager               4.14.0-ec.4   True        False         False      75m     
      openshift-samples                          4.14.0-ec.4   True        False         False      75m     
      operator-lifecycle-manager                 4.14.0-ec.4   True        False         False      89m     
      operator-lifecycle-manager-catalog         4.14.0-ec.4   True        False         False      89m     
      operator-lifecycle-manager-packageserver   4.14.0-ec.4   True        False         False      80m     
      service-ca                                 4.14.0-ec.4   True        False         False      90m     
      storage                                    4.14.0-ec.4   True        False         False      89m 
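
To pick out the affected operator from a listing like the one above without reading every row, the text output can be filtered for any row that is not Available=True, Progressing=False, Degraded=False. The following is a minimal sketch, not part of the original report; the function name `unhealthy_operators` is hypothetical, and the sample rows are copied from the output above (with the long MESSAGE column shortened).

```python
def unhealthy_operators(oc_output: str) -> list[str]:
    """Return operator names whose status is not Available/not Progressing/not Degraded."""
    flagged = []
    # Skip the header row (NAME VERSION AVAILABLE PROGRESSING DEGRADED ...).
    for line in oc_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or truncated rows
        name, _version, available, progressing, degraded = fields[:5]
        if available != "True" or progressing != "False" or degraded != "False":
            flagged.append(name)
    return flagged


# Sample rows taken from the `oc get co` output in this report.
SAMPLE = """
NAME      VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
dns       4.14.0-ec.4   True        False         False      89m
network   4.14.0-ec.4   True        True          False      92m     Deployment is not available
storage   4.14.0-ec.4   True        False         False      89m
"""

print(unhealthy_operators(SAMPLE))  # prints ['network']
```

In this report only the network operator is flagged, because it is still Progressing while it waits for the cloud-network-config-controller deployment.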
      
      % oc get pods -n openshift-cloud-network-config-controller
      NAME                                               READY   STATUS   RESTARTS         AGE
      cloud-network-config-controller-565df6f4b5-sb8kv   0/1     Error    19 (5m58s ago)   93m
      % oc describe pod cloud-network-config-controller-565df6f4b5-sb8kv -n openshift-cloud-network-config-controller  
      Name:                 cloud-network-config-controller-565df6f4b5-sb8kv
      Namespace:            openshift-cloud-network-config-controller
      Priority:             2000000000
      Priority Class Name:  system-cluster-critical
      Service Account:      cloud-network-config-controller
      Node:                 huirwang-0828d-s424j-master-0/10.0.0.6
      Start Time:           Mon, 28 Aug 2023 12:57:02 +0800
      Labels:               app=cloud-network-config-controller
                            component=network
                            openshift.io/component=network
                            pod-template-hash=565df6f4b5
                            type=infra
      Annotations:          k8s.ovn.org/pod-networks:
                              {"default":{"ip_addresses":["10.128.0.30/23"],"mac_address":"0a:58:0a:80:00:1e","gateway_ips":["10.128.0.1"],"routes":[{"dest":"10.128.0.0...
                            k8s.v1.cni.cncf.io/network-status:
                              [{
                                  "name": "ovn-kubernetes",
                                  "interface": "eth0",
                                  "ips": [
                                      "10.128.0.30"
                                  ],
                                  "mac": "0a:58:0a:80:00:1e",
                                  "default": true,
                                  "dns": {}
                              }]
                            openshift.io/scc: restricted-v2
                            seccomp.security.alpha.kubernetes.io/pod: runtime/default
      Status:               Running
      IP:                   10.128.0.30
      IPs:
        IP:           10.128.0.30
      Controlled By:  ReplicaSet/cloud-network-config-controller-565df6f4b5
      Containers:
        controller:
          Container ID:  cri-o://35683ef6222fac819b8cbca5a0a22b047bd8950570a4f1783f9fb515acbde6bd
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970
          Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970
          Port:          <none>
          Host Port:     <none>
          Command:
            /usr/bin/cloud-network-config-controller
          Args:
            -platform-type
            Azure
            -platform-region=
            -platform-api-url=
            -platform-aws-ca-override=
            -platform-azure-environment=AzurePublicCloud
            -secret-name
            cloud-credentials
          State:       Waiting
            Reason:    CrashLoopBackOff
          Last State:  Terminated
            Reason:    Error
            Message:   W0828 06:27:53.509786       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
      F0828 06:28:23.512457       1 main.go:345] Error building controller runtime client: Get "https://api-int.huirwang-0828d.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 10.0.0.4:6443: i/o timeout
      
            Exit Code:    1
            Started:      Mon, 28 Aug 2023 14:27:53 +0800
            Finished:     Mon, 28 Aug 2023 14:28:23 +0800
          Ready:          False
          Restart Count:  19
          Requests:
            cpu:     10m
            memory:  50Mi
          Environment:
            CONTROLLER_NAMESPACE:     openshift-cloud-network-config-controller (v1:metadata.namespace)
            CONTROLLER_NAME:          cloud-network-config-controller-565df6f4b5-sb8kv (v1:metadata.name)
            KUBERNETES_SERVICE_PORT:  6443
            KUBERNETES_SERVICE_HOST:  api-int.huirwang-0828d.qe.azure.devcluster.openshift.com
            RELEASE_VERSION:          4.14.0-ec.4
          Mounts:
            /etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
            /etc/secret/cloudprovider from cloud-provider-secret (ro)
            /kube-cloud-config from kube-cloud-config (ro)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b9hp9 (ro)
            /var/run/secrets/openshift/serviceaccount from bound-sa-token (ro)
      Conditions:
        Type              Status
        Initialized       True 
        Ready             False 
        ContainersReady   False 
        PodScheduled      True 
      Volumes:
        cloud-provider-secret:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cloud-credentials
          Optional:    false
        kube-cloud-config:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      kube-cloud-config
          Optional:  false
        trusted-ca:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      trusted-ca
          Optional:  false
        bound-sa-token:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3600
        kube-api-access-b9hp9:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              node-role.kubernetes.io/master=
      Tolerations:                 node-role.kubernetes.io/master:NoSchedule op=Exists
                                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason            Age                   From               Message
        ----     ------            ----                  ----               -------
        Warning  FailedScheduling  93m                   default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
        Normal   Scheduled         92m                   default-scheduler  Successfully assigned openshift-cloud-network-config-controller/cloud-network-config-controller-565df6f4b5-sb8kv to huirwang-0828d-s424j-master-0
        Normal   AddedInterface    92m                   multus             Add eth0 [10.128.0.30/23] from ovn-kubernetes
        Normal   Pulling           92m                   kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970"
        Normal   Pulled            91m                   kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970" in 14.561801668s (14.561823468s including waiting)
        Normal   Created           85m (x5 over 91m)     kubelet            Created container controller
        Normal   Started           85m (x5 over 91m)     kubelet            Started container controller
        Normal   Pulled            6m58s (x18 over 91m)  kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970" already present on machine
        Warning  BackOff           114s (x332 over 91m)  kubelet            Back-off restarting failed container controller in pod cloud-network-config-controller-565df6f4b5-sb8kv_openshift-cloud-network-config-controller(ab850390-97a3-4fe5-83b7-1bd3c1628470
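
The fatal log line above shows a plain TCP dial to 10.0.0.4:6443 timing out from a pod on the OVN-Kubernetes network (pod IP 10.128.0.30). A minimal sketch of an equivalent connectivity probe follows. It is not the controller's code; `can_connect` is a hypothetical helper, and it only checks that a TCP connection can be opened, not that the API server answers over TLS.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

For example, calling `can_connect("api-int.huirwang-0828d.qe.azure.devcluster.openshift.com", 6443)` from inside a pod on the cluster network would be expected to return False while the problem is present. Running the same call from the node's host network would help show whether the failure is specific to pod traffic.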
      
      

      Expected results:

      The CNCC pod starts and keeps running without errors.
      
      

      Additional info:

      
      

            People

              sseethar Surya Seetharaman
              huirwang Huiran Wang
              Huiran Wang Huiran Wang
              Riccardo Ravaioli