Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version: 4.14.z
Description of problem:
To validate offline SDN migration (OpenShift SDN to OVN-IC) at large scale, an SDN-to-OVN-Kubernetes migration was performed on a cluster pre-loaded with the cluster-density-v2 workload. After updating the networkType field of the Network.config.openshift.io CR to OVNKubernetes and rebooting the nodes, the monitoring ClusterOperator reported Available=False.
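For reference, the networkType update described above boils down to a single CR patch. A minimal sketch (the command is echoed for review rather than executed; applying it for real requires a cluster-admin kubeconfig, and the e2e-benchmarking run.sh automates this step plus the node reboots):

```shell
# Build and print the merge patch that flips the default CNI to OVN-Kubernetes.
# Remove the `echo` to actually apply it against a live cluster.
PATCH='{"spec":{"networkType":"OVNKubernetes"}}'
echo oc patch Network.config.openshift.io cluster --type=merge --patch "${PATCH}"
```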
Version-Release number of selected component (if applicable):
OCP Version: 4.14.10
ovs-vswitchd (Open vSwitch) 3.1.2
How reproducible:
Reproducible at Scale (252 nodes)
The steps listed below perform the SDN ---> OVN-K migration:
1. git clone https://github.com/krishvoor/e2e-benchmarking
2. cd e2e-benchmarking/workloads/sdn2ovn/
3. ./run.sh
Actual results:
[root@vkommadi aws_249_nodes]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.10   True        False         False      10m
baremetal                                  4.14.10   True        False         False      6h17m
cloud-controller-manager                   4.14.10   True        False         False      6h20m
cloud-credential                           4.14.10   True        False         False      6h21m
cluster-autoscaler                         4.14.10   True        False         False      6h18m
config-operator                            4.14.10   True        False         False      6h19m
console                                    4.14.10   True        False         False      10m
control-plane-machine-set                  4.14.10   True        False         False      17m
csi-snapshot-controller                    4.14.10   True        False         False      21m
dns                                        4.14.10   True        False         False      6h17m
etcd                                       4.14.10   True        False         False      6h16m
image-registry                             4.14.10   True        False         False      14m
ingress                                    4.14.10   True        False         False      19m
insights                                   4.14.10   True        False         False      6h12m
kube-apiserver                             4.14.10   True        False         False      6h14m
kube-controller-manager                    4.14.10   True        False         False      6h15m
kube-scheduler                             4.14.10   True        False         False      6h15m
kube-storage-version-migrator              4.14.10   True        False         False      20m
machine-api                                4.14.10   True        False         False      6h14m
machine-approver                           4.14.10   True        False         False      6h18m
machine-config                             4.14.10   True        False         False      122m
marketplace                                4.14.10   True        False         False      6h18m
monitoring                                 4.14.10   False       True          True       11m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
network                                    4.14.10   True        True          True       6h19m   DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-4vq2k is in CrashLoopBackOff State...
node-tuning                                4.14.10   True        False         False      6h17m
openshift-apiserver                        4.14.10   True        False         False      19m
openshift-controller-manager               4.14.10   True        False         False      6h17m
openshift-samples                          4.14.10   True        False         False      6h11m
operator-lifecycle-manager                 4.14.10   True        False         False      6h18m
operator-lifecycle-manager-catalog         4.14.10   True        False         False      6h18m
operator-lifecycle-manager-packageserver   4.14.10   True        False         False      19m
service-ca                                 4.14.10   True        False         False      6h18m
storage                                    4.14.10   True        False         False      18m
[root@vkommadi aws_249_nodes]#
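With 30+ ClusterOperators in a listing like the one above, a quick filter for anything that is not in the healthy True/False/False state saves scrolling. A sketch assuming the standard `oc get co` column order (AVAILABLE, PROGRESSING, DEGRADED are fields 3-5):

```shell
# Print only ClusterOperators whose AVAILABLE/PROGRESSING/DEGRADED columns
# deviate from the healthy True/False/False pattern.
oc get co --no-headers | awk '$3 != "True" || $4 != "False" || $5 != "False"'
```

Against the output above, this would surface only the monitoring and network rows.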
Expected results:
The CNI is successfully migrated to OVN-Kubernetes, with all nodes up and active and all ClusterOperators healthy.
Additional info:
[root@vkommadi aws_249_nodes]# oc get po -n openshift-monitoring -o wide | grep -v Running
NAME                                                     READY   STATUS              RESTARTS   AGE    IP       NODE                                        NOMINATED NODE   READINESS GATES
monitoring-plugin-764d5bd484-4nmks                       0/1     ContainerCreating   0          20m    <none>   ip-10-0-68-231.us-west-2.compute.internal   <none>           <none>
monitoring-plugin-764d5bd484-t9dhq                       0/1     ContainerCreating   1          86m    <none>   ip-10-0-45-234.us-west-2.compute.internal   <none>           <none>
prometheus-operator-admission-webhook-6f5668f5dd-g2j6d   0/1     ContainerCreating   1          135m   <none>   ip-10-0-20-163.us-west-2.compute.internal   <none>           <none>
prometheus-operator-admission-webhook-6f5668f5dd-gq5sh   0/1     ContainerCreating   1          86m    <none>   ip-10-0-57-31.us-west-2.compute.internal    <none>           <none>

[root@vkommadi aws_249_nodes]# oc get no/ip-10-0-20-163.us-west-2.compute.internal -oyaml | grep -i machineConfig
    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
    machineconfiguration.openshift.io/currentConfig: rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: "1470650"
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done

[root@vkommadi aws_249_nodes]# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-6d95af83deed644562dc33d38a3712ba   True      False      False      3              3                   3                     0                      6h20m
worker   rendered-worker-3e2e53c81c94205dce819f2824ea82ff   False     True       False      252            3                   252                   0                      6h20m
[root@vkommadi aws_249_nodes]#
==================================================
[root@vkommadi aws_249_nodes]# oc describe po prometheus-operator-admission-webhook-6f5668f5dd-g2j6d -n openshift-monitoring
......
  Type     Reason           Age                    From               Message
  ----     ------           ----                   ----               -------
  Normal   Scheduled        137m                   default-scheduler  Successfully assigned openshift-monitoring/prometheus-operator-admission-webhook-6f5668f5dd-g2j6d to ip-10-0-20-163.us-west-2.compute.internal
  Normal   AddedInterface   137m                   multus             Add eth0 [10.130.40.9/23] from openshift-sdn
  Normal   Pulling          137m                   kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d85ff677a4e42abc3d951b761e61421eb3b9c92e5bd7e33a2085a18580349d5"
  Normal   Pulled           137m                   kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d85ff677a4e42abc3d951b761e61421eb3b9c92e5bd7e33a2085a18580349d5" in 3.127892796s (3.127910981s including waiting)
  Normal   Created          137m                   kubelet            Created container prometheus-operator-admission-webhook
  Normal   Started          137m                   kubelet            Started container prometheus-operator-admission-webhook
  Warning  FailedMount      29m                    kubelet            MountVolume.SetUp failed for volume "tls-certificates" : object "openshift-monitoring"/"prometheus-operator-admission-webhook-tls" not registered
  Warning  NetworkNotReady  4m16s (x730 over 29m)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
[root@vkommadi aws_249_nodes]#
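The NetworkNotReady events above point at a missing CNI config on the node. A triage sketch (the node name is taken from the events above; the command is echoed for review rather than executed, and running it for real needs cluster-admin access):

```shell
# Check whether OVN-K ever wrote a CNI config on the affected node: an empty
# /etc/kubernetes/cni/net.d/ matches the kubelet error in the events above.
NODE=ip-10-0-20-163.us-west-2.compute.internal
echo oc debug node/"${NODE}" -- chroot /host ls -l /etc/kubernetes/cni/net.d/
```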