OpenShift Bugs / OCPBUGS-20255

WMCO logs a misleading error each time it reboots an instance


    • Bug
    • Resolution: Unresolved
    • Major
    • 4.16.0
    • 4.14, 4.15
    • Windows Containers
    • None
    • Moderate
    • No
    • 3
    • WINC - Sprint 251, WINC - Sprint 252
    • 2
    • False
    • WMCO was logging an error message whenever a command run on a Windows instance through SSH failed. This was incorrect behavior because some commands are expected to fail. This has been fixed so that only true errors are logged. (link:https://issues.redhat.com/browse/OCPBUGS-20255[*OCPBUGS-20255*])
    • Bug Fix
    • In Progress

      Description of problem:

      When WMCO reboots a node, it confirms that the reboot is taking place by repeatedly running PowerShell commands on the instance until one fails, which means the SSH connection has gone down as expected due to the ongoing reboot. However, WMCO logs an error for this expected failure, which can be misleading.

      Version-Release number of selected component (if applicable):

      4.15, 4.14

      How reproducible:

      Always

      Steps to Reproduce:

      1. Cause a Windows instance to reboot (either by having WMCO enable the Containers feature on the OS or by editing global cluster-wide proxy settings)
      2. Inspect the WMCO logs for an "error running" entry that references a PowerShell command
      

      Actual results:

      WMCO erroneously logs an error during instance reboots

      Expected results:

      WMCO should not log an error when the command failure is expected, as such a log entry is a false positive

      Additional info:

      WMCO's vm.Run() method logs any error running a command -- we should add a mechanism to not log errors if they are expected.
      
      Example log:
      {"level":"info","ts":"2023-10-09T13:44:08Z","logger":"wc 192.168.221.240","msg":"rebooting instance"}
      {"level":"error","ts":"2023-10-09T13:44:14Z","logger":"wc 192.168.221.240","msg":"error running","cmd":"powershell.exe -NonInteractive -ExecutionPolicy Bypass \"Get-Help\"","out":"","error":"read tcp 192.168.221.205:49170->192.168.221.240:22: read: connection reset by peer","stacktrace":"github.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).Run\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:387\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).waitUntilUnreachable.func1\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:972\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/remote-source/build/windows-machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:73\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/remote-source/build/windows-machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:74\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout\n\t/remote-source/build/windows-machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:48\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).waitUntilUnreachable\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:970\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).RebootAndReinitialize\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:402\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).ensureHostNameAndContainersFeature\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:589\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).Bootstrap\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:450\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure\n\t/remote-source/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:165\ngithub.com/openshift/windows-machine-config-operator/controllers.(*instanceReconciler).ensureInstanceIsUpToDate\n\t/remote-source/build/windows-machine-config-operator/controllers/controllers.go:110\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).configureMachine\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:426\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
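
      When triaging logs on an affected version, entries like the one above could be separated from real failures with a small filter. The sketch below is a heuristic under stated assumptions, not part of WMCO: the `logEntry` type and `isExpectedRebootError` function are hypothetical names, and the rule (error level, the "error running" message, and `waitUntilUnreachable` in the stack trace) is inferred only from the field names and stack trace visible in this example.

```go
package main

import (
	"encoding/json"
	"strings"
)

// logEntry mirrors the JSON fields visible in the example log line.
type logEntry struct {
	Level      string `json:"level"`
	Msg        string `json:"msg"`
	Cmd        string `json:"cmd"`
	Error      string `json:"error"`
	Stacktrace string `json:"stacktrace"`
}

// isExpectedRebootError reports whether a single JSON log line looks like
// the misleading entry from this bug: an error-level "error running"
// message raised while WMCO was waiting for the instance to go down.
// It returns an error if the line is not valid JSON.
func isExpectedRebootError(line string) (bool, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return false, err
	}
	expected := e.Level == "error" &&
		e.Msg == "error running" &&
		strings.Contains(e.Stacktrace, "waitUntilUnreachable")
	return expected, nil
}
```

      An "error running" entry whose stack trace does not pass through waitUntilUnreachable would not match this filter and should still be treated as a genuine failure.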

       

            rh-ee-ssoto Sebastian Soto
            mohashai Mohammad Shaikh
            Aharon Rasouli Aharon Rasouli