OCPBUGS-4285 (OpenShift Bugs)

Windows node failed to become ready after a while.


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • 4.12
    • 4.11.z
    • Documentation

      This bug is a backport clone of [Bugzilla Bug 2072418](https://bugzilla.redhat.com/show_bug.cgi?id=2072418). The following is the description of the original bug:

      Must-gather logs:

      1. Issue:
      2022-04-04T22:31:03.565344145Z {"level":"error","ts":1649111463.5605578,"logger":"controller-runtime.manager.controller.configmap","msg":"Reconciler error","reconciler group":"","reconciler kind":"ConfigMap","name":"windows-instances","namespace":"openshift-windows-machine-config-operator","error":"error configuring host with address 10.49.XXX.xxx: configuring node network failed: error waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation for usddceqap71215: timeout waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation: timed out waiting for the condition","errorVerbose":"timed out waiting for the condition\ntimeout waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).waitForNodeAnnotation\n\t/remote-source/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:352\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).configureNetwork\n\t/remote-source/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:283\ngithub.com

      2. WMCO & OpenShift Version
      OCP 4.9 on bare metal (UPI), with BYOH Windows instances.

      3. Platform - AWS/Azure/vSphere/Platform=none
      Bare metal (Platform=none)
      4. If the platform is vSphere, what is the VMware tools version?
      5. Is it a new test case or an old test case?
      If it is an old test case, is it a regression or first-time tested?
      Is it platform-specific or consistent across all platforms?
      6. Steps to Reproduce
      7. Actual Result and Expected Result
      The node is not Ready; it should be in the Ready state.
      8. Has a possible workaround been tried? Is there a way to recover from the issue?
      Not working.

      9. Logs
      WMCO pod logs:
      2022-04-04T22:18:17.697597006Z {"level":"info","ts":1649110697.6975124,"logger":"wc 10.49.168.80","msg":"configure","service":"hybrid-overlay-node","args":"--node usddceqap71215 --k8s-kubeconfig c:\\k\\kubeconfig --windows-service --logfile C:\\var\\log\\hybrid-overlay\\hybrid-overlay.log"}

      2022-04-04T22:20:18.300356816Z {"level":"info","ts":1649110818.2968366,"logger":"wc 10.49.168.80","msg":"configured","service":"hybrid-overlay-node","args":"--node usddceqap71215 --k8s-kubeconfig c:\\k\\kubeconfig --windows-service --logfile C:\\var\\log\\hybrid-overlay\\hybrid-overlay.log"}

      2022-04-04T22:31:03.565344145Z {"level":"error","ts":1649111463.5605578,"logger":"controller-runtime.manager.controller.configmap","msg":"Reconciler error","reconciler group":"","reconciler kind":"ConfigMap","name":"windows-instances","namespace":"openshift-windows-machine-config-operator","error":"error configuring host with address 10.49.xx.xx: configuring node network failed: error waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation for usddceqap71215: timeout waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation: timed out waiting for the condition","errorVerbose":"timed out waiting for the condition\ntimeout waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).waitForNodeAnnotation\n\t/remote-source/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:352\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).configureNetwork\n\t/remote-source/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:283\ngithub.com

      spec:
        cloudConfig:
          name: ''
        platformSpec:
          type: None
      status:
        apiServerInternalURI: https://api-int.np-ocp.us.kworld.xxxxx.com:6443
        apiServerURL: https://api.np-ocp.us.kworld.xxxx.com:6443
        controlPlaneTopology: HighlyAvailable
        etcdDiscoveryDomain: ''
        infrastructureName: np-ocp-wj2xn
        infrastructureTopology: HighlyAvailable
        platform: None
        platformStatus:
          type: None

      NAME             STATUS                     ROLES    AGE     VERSION                       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION    CONTAINER-RUNTIME
      usddceqap71215   Ready,SchedulingDisabled   worker   2h18m   v1.22.1-1739+c8538fcbd98efa   10.49.xx.xx   <none>        Windows Server 2019 Standard   10.0.17763.2686   docker://20.10.9

      Optional logs:
      Anything that can be useful to debug the issue.

              mburke@redhat.com Michael Burke
              openshift-crt-jira-prow OpenShift Prow Bot