Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.14.z
Component/s: Networking / ovn-kubernetes
Labels: None
Regression: No
Blocked: False
Blocked Reason: None
Target Version: 4.14.z

SFDC Cases Counter:
SFDC Cases Links:

Description of problem:

To validate offline SDN migration from OpenShift SDN to OVN-IC at large scale, we performed an SDN-to-OVN-Kubernetes migration on a cluster pre-loaded with the cluster-density-v2 workload.

After updating the networkType field of the Network.config.openshift.io CR to OVNKubernetes and rebooting the nodes, the monitoring cluster operator reported Available "False", and monitoring pods on the affected nodes were stuck in ContainerCreating.
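
For reference, the networkType change described above is usually applied as a merge patch against the Network.config.openshift.io object named cluster. The sketch below is an assumption about what that step looks like; it is not copied from run.sh, and the helper names network_type_patch and set_network_type are hypothetical.

```shell
# Build the JSON merge patch that sets spec.networkType.
# network_type_patch is a hypothetical helper, not part of any tool.
network_type_patch() {
    printf '{"spec":{"networkType":"%s"}}' "$1"
}

# Apply the patch to the cluster-wide Network config object.
# Assumes a logged-in oc session with cluster-admin rights.
set_network_type() {
    oc patch Network.config.openshift.io cluster \
        --type=merge \
        --patch "$(network_type_patch "$1")"
}

# Usage (not executed here):
#   set_network_type OVNKubernetes
```

Keeping the patch body in its own function makes it easy to inspect the exact JSON before it is sent to the cluster.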

Version-Release number of selected component (if applicable):

    OCP Version: 4.14.10
    ovs-vswitchd (Open vSwitch) 3.1.2

How reproducible:

    Reproducible at Scale (252 nodes)

The steps listed below perform the SDN-to-OVN-Kubernetes migration:

    1. git clone https://github.com/krishvoor/e2e-benchmarking
    2. cd e2e-benchmarking/workloads/sdn2ovn/
    3. ./run.sh

Actual results:

[root@vkommadi aws_249_nodes]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.10   True        False         False      10m     
baremetal                                  4.14.10   True        False         False      6h17m   
cloud-controller-manager                   4.14.10   True        False         False      6h20m   
cloud-credential                           4.14.10   True        False         False      6h21m   
cluster-autoscaler                         4.14.10   True        False         False      6h18m   
config-operator                            4.14.10   True        False         False      6h19m   
console                                    4.14.10   True        False         False      10m     
control-plane-machine-set                  4.14.10   True        False         False      17m     
csi-snapshot-controller                    4.14.10   True        False         False      21m     
dns                                        4.14.10   True        False         False      6h17m   
etcd                                       4.14.10   True        False         False      6h16m   
image-registry                             4.14.10   True        False         False      14m     
ingress                                    4.14.10   True        False         False      19m     
insights                                   4.14.10   True        False         False      6h12m   
kube-apiserver                             4.14.10   True        False         False      6h14m   
kube-controller-manager                    4.14.10   True        False         False      6h15m   
kube-scheduler                             4.14.10   True        False         False      6h15m   
kube-storage-version-migrator              4.14.10   True        False         False      20m     
machine-api                                4.14.10   True        False         False      6h14m   
machine-approver                           4.14.10   True        False         False      6h18m   
machine-config                             4.14.10   True        False         False      122m    
marketplace                                4.14.10   True        False         False      6h18m   
monitoring                                 4.14.10   False       True          True       11m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
network                                    4.14.10   True        True          True       6h19m   DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-4vq2k is in CrashLoopBackOff State...
node-tuning                                4.14.10   True        False         False      6h17m   
openshift-apiserver                        4.14.10   True        False         False      19m     
openshift-controller-manager               4.14.10   True        False         False      6h17m   
openshift-samples                          4.14.10   True        False         False      6h11m   
operator-lifecycle-manager                 4.14.10   True        False         False      6h18m   
operator-lifecycle-manager-catalog         4.14.10   True        False         False      6h18m   
operator-lifecycle-manager-packageserver   4.14.10   True        False         False      19m     
service-ca                                 4.14.10   True        False         False      6h18m   
storage                                    4.14.10   True        False         False      18m     
[root@vkommadi aws_249_nodes]#
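
With this many operators, the two unhealthy rows are easy to miss. A minimal sketch of a filter for the table above, assuming the column order NAME VERSION AVAILABLE PROGRESSING DEGRADED shown in this output; unhealthy_operators is a hypothetical helper name.

```shell
# Read `oc get co` table output on stdin and print the name of every
# cluster operator that is not Available=True or that is Degraded=True.
# Assumes columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED ...
unhealthy_operators() {
    # NR > 1 skips the header row; NF >= 5 skips blank or short lines.
    awk 'NR > 1 && NF >= 5 && ($3 != "True" || $5 == "True") { print $1 }'
}

# Usage against a live cluster (not executed here):
#   oc get co | unhealthy_operators
```

Applied to the output in this report, the filter would list monitoring and network.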

Expected results:

The CNI is successfully migrated to OVN-Kubernetes, and all nodes are up and active.

Additional info:

[root@vkommadi aws_249_nodes]# oc get po -n openshift-monitoring -o wide | grep -v Running
NAME                                                     READY   STATUS              RESTARTS   AGE     IP              NODE                                        NOMINATED NODE   READINESS GATES
monitoring-plugin-764d5bd484-4nmks                       0/1     ContainerCreating   0          20m     <none>          ip-10-0-68-231.us-west-2.compute.internal   <none>           <none>
monitoring-plugin-764d5bd484-t9dhq                       0/1     ContainerCreating   1          86m     <none>          ip-10-0-45-234.us-west-2.compute.internal   <none>           <none>
prometheus-operator-admission-webhook-6f5668f5dd-g2j6d   0/1     ContainerCreating   1          135m    <none>          ip-10-0-20-163.us-west-2.compute.internal   <none>           <none>
prometheus-operator-admission-webhook-6f5668f5dd-gq5sh   0/1     ContainerCreating   1          86m     <none>          ip-10-0-57-31.us-west-2.compute.internal    <none>           <none>
[root@vkommadi aws_249_nodes]# oc get no/ip-10-0-20-163.us-west-2.compute.internal -oyaml | grep -i machineConfig
    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
    machineconfiguration.openshift.io/currentConfig: rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-3e2e53c81c94205dce819f2824ea82ff
    machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: "1470650"
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
[root@vkommadi aws_249_nodes]# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-6d95af83deed644562dc33d38a3712ba   True      False      False      3              3                   3                     0                      6h20m
worker   rendered-worker-3e2e53c81c94205dce819f2824ea82ff   False     True       False      252            3                   252                   0                      6h20m
[root@vkommadi aws_249_nodes]#
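
The oc get mcp output above shows the worker pool with all 252 machines updated but only 3 ready. A small sketch that flags such pools, assuming the column order printed above; lagging_pools is a hypothetical helper name.

```shell
# Read `oc get mcp` table output on stdin and print each pool whose
# READYMACHINECOUNT (column 7) is below MACHINECOUNT (column 6),
# formatted as "<pool> <ready>/<total>".
lagging_pools() {
    # "+ 0" forces a numeric comparison instead of a string comparison.
    awk 'NR > 1 && NF >= 7 && ($7 + 0) < ($6 + 0) { print $1, $7 "/" $6 }'
}

# Usage against a live cluster (not executed here):
#   oc get mcp | lagging_pools
```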
==================================================

[root@vkommadi aws_249_nodes]# oc describe po prometheus-operator-admission-webhook-6f5668f5dd-g2j6d -n openshift-monitoring
......
  Type     Reason           Age                    From               Message
  ----     ------           ----                   ----               -------
  Normal   Scheduled        137m                   default-scheduler  Successfully assigned openshift-monitoring/prometheus-operator-admission-webhook-6f5668f5dd-g2j6d to ip-10-0-20-163.us-west-2.compute.internal
  Normal   AddedInterface   137m                   multus             Add eth0 [10.130.40.9/23] from openshift-sdn
  Normal   Pulling          137m                   kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d85ff677a4e42abc3d951b761e61421eb3b9c92e5bd7e33a2085a18580349d5"
  Normal   Pulled           137m                   kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d85ff677a4e42abc3d951b761e61421eb3b9c92e5bd7e33a2085a18580349d5" in 3.127892796s (3.127910981s including waiting)
  Normal   Created          137m                   kubelet            Created container prometheus-operator-admission-webhook
  Normal   Started          137m                   kubelet            Started container prometheus-operator-admission-webhook
  Warning  FailedMount      29m                    kubelet            MountVolume.SetUp failed for volume "tls-certificates" : object "openshift-monitoring"/"prometheus-operator-admission-webhook-tls" not registered
  Warning  NetworkNotReady  4m16s (x730 over 29m)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
[root@vkommadi aws_249_nodes]#
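
The NetworkNotReady event reports that no CNI configuration file exists in /etc/kubernetes/cni/net.d/ on the node. A minimal sketch for confirming this directly on an affected node; cni_config_present is a hypothetical helper, and the oc debug invocation in the usage comment is an assumption about how one might reach the host filesystem, not something taken from this report.

```shell
# Return 0 if the given directory contains at least one regular file,
# non-zero otherwise. cni_config_present is a hypothetical helper name.
cni_config_present() {
    dir=$1
    [ -d "$dir" ] || return 1
    for f in "$dir"/*; do
        # When the directory is empty the glob stays literal and -f fails.
        [ -f "$f" ] && return 0
    done
    return 1
}

# Usage on the node itself (not executed here):
#   cni_config_present /etc/kubernetes/cni/net.d || echo "no CNI config found"
# Listing the directory from outside the node (assumed invocation):
#   oc debug node/ip-10-0-20-163.us-west-2.compute.internal -- chroot /host ls -l /etc/kubernetes/cni/net.d/
```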

Assignee: Peng Liu
Reporter: Krishna Harsha Voora
QA Contact: Anurag Saxena
Votes: 0
Watchers: 3
Created: 2024/04/22 2:48 PM
Updated: 2024/04/30 2:12 PM
