Bug
Resolution: Done-Errata
Critical
4.12.z
Important
No
SDN Sprint 241, SDN Sprint 242
2
Rejected
False
Description of problem:
An Azure single-node OpenShift (SNO) cluster installation failed because the cloud-network-config-controller (CNCC) pod crashed. The failure was first found in CI jobs and then reproduced with a Flexy job.
Version-Release number of selected component (if applicable):
4.14.0-ec.4
How reproducible:
Not sure
Steps to Reproduce:
1. Install a cluster with the Flexy job aos-4_14/ipi-on-azure/versioned-installer-sno-ci, setting networkType: "OVNKubernetes".
Actual results:
Installation failed.

% oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-ec.4   True        False         False      64m
baremetal                                  4.14.0-ec.4   True        False         False      89m
cloud-controller-manager                   4.14.0-ec.4   True        False         False      92m
cloud-credential                           4.14.0-ec.4   True        False         False      97m
cluster-autoscaler                         4.14.0-ec.4   True        False         False      89m
config-operator                            4.14.0-ec.4   True        False         False      90m
console                                    4.14.0-ec.4   True        False         False      72m
control-plane-machine-set                  4.14.0-ec.4   True        False         False      89m
csi-snapshot-controller                    4.14.0-ec.4   True        False         False      89m
dns                                        4.14.0-ec.4   True        False         False      89m
etcd                                       4.14.0-ec.4   True        False         False      84m
image-registry                             4.14.0-ec.4   True        False         False      75m
ingress                                    4.14.0-ec.4   True        False         False      75m
insights                                   4.14.0-ec.4   True        False         False      83m
kube-apiserver                             4.14.0-ec.4   True        False         False      80m
kube-controller-manager                    4.14.0-ec.4   True        False         False      83m
kube-scheduler                             4.14.0-ec.4   True        False         False      80m
kube-storage-version-migrator              4.14.0-ec.4   True        False         False      90m
machine-api                                4.14.0-ec.4   True        False         False      84m
machine-approver                           4.14.0-ec.4   True        False         False      89m
machine-config                             4.14.0-ec.4   True        False         False      88m
marketplace                                4.14.0-ec.4   True        False         False      89m
monitoring                                 4.14.0-ec.4   True        False         False      70m
network                                    4.14.0-ec.4   True        True          False      92m     Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)
node-tuning                                4.14.0-ec.4   True        False         False      89m
openshift-apiserver                        4.14.0-ec.4   True        False         False      75m
openshift-controller-manager               4.14.0-ec.4   True        False         False      75m
openshift-samples                          4.14.0-ec.4   True        False         False      75m
operator-lifecycle-manager                 4.14.0-ec.4   True        False         False      89m
operator-lifecycle-manager-catalog         4.14.0-ec.4   True        False         False      89m
operator-lifecycle-manager-packageserver   4.14.0-ec.4   True        False         False      80m
service-ca                                 4.14.0-ec.4   True        False         False      90m
storage                                    4.14.0-ec.4   True        False         False      89m

% oc get pods -n openshift-cloud-network-config-controller
NAME                                               READY   STATUS   RESTARTS         AGE
cloud-network-config-controller-565df6f4b5-sb8kv   0/1     Error    19 (5m58s ago)   93m

% oc describe pod cloud-network-config-controller-565df6f4b5-sb8kv -n openshift-cloud-network-config-controller
Name:                 cloud-network-config-controller-565df6f4b5-sb8kv
Namespace:            openshift-cloud-network-config-controller
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      cloud-network-config-controller
Node:                 huirwang-0828d-s424j-master-0/10.0.0.6
Start Time:           Mon, 28 Aug 2023 12:57:02 +0800
Labels:               app=cloud-network-config-controller
                      component=network
                      openshift.io/component=network
                      pod-template-hash=565df6f4b5
                      type=infra
Annotations:          k8s.ovn.org/pod-networks:
                        {"default":{"ip_addresses":["10.128.0.30/23"],"mac_address":"0a:58:0a:80:00:1e","gateway_ips":["10.128.0.1"],"routes":[{"dest":"10.128.0.0...
                      k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "ovn-kubernetes",
                            "interface": "eth0",
                            "ips": [
                                "10.128.0.30"
                            ],
                            "mac": "0a:58:0a:80:00:1e",
                            "default": true,
                            "dns": {}
                        }]
                      openshift.io/scc: restricted-v2
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   10.128.0.30
IPs:
  IP:           10.128.0.30
Controlled By:  ReplicaSet/cloud-network-config-controller-565df6f4b5
Containers:
  controller:
    Container ID:  cri-o://35683ef6222fac819b8cbca5a0a22b047bd8950570a4f1783f9fb515acbde6bd
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/cloud-network-config-controller
    Args:
      -platform-type
      Azure
      -platform-region=
      -platform-api-url=
      -platform-aws-ca-override=
      -platform-azure-environment=AzurePublicCloud
      -secret-name
      cloud-credentials
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      W0828 06:27:53.509786 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
F0828 06:28:23.512457 1 main.go:345] Error building controller runtime client: Get "https://api-int.huirwang-0828d.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 10.0.0.4:6443: i/o timeout

      Exit Code:    1
      Started:      Mon, 28 Aug 2023 14:27:53 +0800
      Finished:     Mon, 28 Aug 2023 14:28:23 +0800
    Ready:          False
    Restart Count:  19
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      CONTROLLER_NAMESPACE:     openshift-cloud-network-config-controller (v1:metadata.namespace)
      CONTROLLER_NAME:          cloud-network-config-controller-565df6f4b5-sb8kv (v1:metadata.name)
      KUBERNETES_SERVICE_PORT:  6443
      KUBERNETES_SERVICE_HOST:  api-int.huirwang-0828d.qe.azure.devcluster.openshift.com
      RELEASE_VERSION:          4.14.0-ec.4
    Mounts:
      /etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
      /etc/secret/cloudprovider from cloud-provider-secret (ro)
      /kube-cloud-config from kube-cloud-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b9hp9 (ro)
      /var/run/secrets/openshift/serviceaccount from bound-sa-token (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cloud-provider-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-credentials
    Optional:    false
  kube-cloud-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-cloud-config
    Optional:  false
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      trusted-ca
    Optional:  false
  bound-sa-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
  kube-api-access-b9hp9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  93m                   default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled         92m                   default-scheduler  Successfully assigned openshift-cloud-network-config-controller/cloud-network-config-controller-565df6f4b5-sb8kv to huirwang-0828d-s424j-master-0
  Normal   AddedInterface    92m                   multus             Add eth0 [10.128.0.30/23] from ovn-kubernetes
  Normal   Pulling           92m                   kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970"
  Normal   Pulled            91m                   kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970" in 14.561801668s (14.561823468s including waiting)
  Normal   Created           85m (x5 over 91m)     kubelet            Created container controller
  Normal   Started           85m (x5 over 91m)     kubelet            Started container controller
  Normal   Pulled            6m58s (x18 over 91m)  kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4d8734ad517b36d7ac0baef72c183b3183e0fa8aa4a465f691bfbf262348970" already present on machine
  Warning  BackOff           114s (x332 over 91m)  kubelet            Back-off restarting failed container controller in pod cloud-network-config-controller-565df6f4b5-sb8kv_openshift-cloud-network-config-controller(ab850390-97a3-4fe5-83b7-1bd3c1628470
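When triaging a failed install, it helps to filter `oc get co` output down to the ClusterOperators that have not settled (anything other than Available=True, Progressing=False, Degraded=False). The function below is a minimal, hypothetical sketch of that filter, not part of `oc`. It assumes the default column order NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE and a non-empty VERSION column; because awk splits on whitespace, an empty column would shift the fields.

```shell
# Hypothetical helper (not an oc feature): read `oc get co` table output on
# stdin and print the NAME of every ClusterOperator that is not
# Available=True, Progressing=False, Degraded=False.
# Assumes columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
# and that VERSION is populated, so AVAILABLE/PROGRESSING/DEGRADED are $3/$4/$5.
unsettled_operators() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }'
}

# Usage against a live cluster:
#   oc get co | unsettled_operators
```

Applied to the `oc get co` output in this report, the filter would print only `network`, the operator waiting on the cloud-network-config-controller deployment.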
Expected results:
The CNCC pod runs without crashing.
Additional info:
- is blocked by: OCPBUGS-9972 Azure; NLB; OVN-K: Requests from CNI pods to internalAPI server domain fails intermittently (Closed)
- links to: RHBA-2023:5382 OpenShift Container Platform 4.13.z bug fix update
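OCPBUGS-9972 describes requests from pods to the internal API server domain failing intermittently, while the CNCC log in this report shows `dial tcp 10.0.0.4:6443: i/o timeout`. A repeated TCP probe can help tell whether the connection times out every time or only sometimes, and whether it times out or is refused. This is a minimal sketch under stated assumptions: it needs bash with `/dev/tcp` support and coreutils `timeout` (which exits with status 124 when the time limit is reached), the function name `probe_tcp` is hypothetical, and where you run it from (for example a shell in a pod on the cluster network, to match the CNCC pod's path) is up to you.

```shell
# Hypothetical helper: attempt one TCP connection to HOST:PORT, giving up
# after SECS seconds (default 5). Prints:
#   ok      - the connection was established
#   timeout - no answer within SECS (the failure mode in the CNCC log)
#   failed  - any other error, e.g. connection refused or no route
# Requires bash (for /dev/tcp) and coreutils `timeout`.
probe_tcp() {
  local host=$1 port=$2 secs=${3:-5} rc
  # Host and port are passed as $0/$1 to the inner bash instead of being
  # spliced into the command string.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
  rc=$?
  case $rc in
    0) echo ok ;;
    124) echo timeout ;;
    *) echo failed ;;
  esac
}

# Usage: count outcomes over 20 attempts against the address api-int
# resolved to in the log above.
#   for i in $(seq 1 20); do probe_tcp 10.0.0.4 6443 5; done | sort | uniq -c
```

A timeout, unlike a refusal, means nothing came back at all, which points at packets being dropped on the path rather than actively rejected by the API server.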