OpenShift Bugs / OCPBUGS-14700

BYOH node failed to upgrade: Cannot remove item C:\var\log\containerd\containerd.log: The process cannot access the file 'containerd.log' because it is being used by another process


    • Critical
    • Yes
    • 3
    • WINC - Sprint 238
    • 1
    • False
    • Fixes an issue that prevented Windows nodes from being deconfigured because the containerd log file could not be removed. containerd is now stopped properly before the log file is removed.
    • Bug Fix
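The ordering described in the release note above (stop containerd, then remove the log directory) can be sketched as follows. This is only an illustration, not the operator's actual code: the service name `containerd`, the helper names `deconfigure_commands` and `deconfigure`, and the injected `run` callable are assumptions. The `Remove-Item` command mirrors the one shown in the error log of this report.

```python
from typing import Callable, List

# Assumptions: the Windows service is named "containerd" and the directory
# is the one named in this report. Neither is taken from the operator source.
CONTAINERD_SERVICE = "containerd"
LOG_DIR = r"C:\var\log"


def deconfigure_commands(service: str = CONTAINERD_SERVICE,
                         log_dir: str = LOG_DIR) -> List[str]:
    """Return the PowerShell commands in the order they should run."""
    return [
        # Stop the service first so it releases its handle on containerd.log.
        f"Stop-Service -Name {service} -Force",
        # Only then remove the directory, mirroring the command from the log.
        f"if(Test-Path {log_dir}) {{Remove-Item -Recurse -Force {log_dir}}}",
    ]


def deconfigure(run: Callable[[str], None]) -> None:
    """Pass each command to the injected runner, in order.

    The runner (hypothetical) is expected to raise on a non-zero exit,
    which stops the sequence before the directory removal is attempted.
    """
    for command in deconfigure_commands():
        run(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    deconfigure(print)
```

Injecting the runner keeps the ordering testable without a Windows host; a real runner would execute each command over whatever remote channel is in use.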

      Description of problem:

      Upgrading a BYOH node fails. After the upgrade, the BYOH node remains in NotReady,SchedulingDisabled.
      
      {"level":"error","ts":"2023-06-07T16:52:46Z","msg":"Reconciler error","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","ConfigMap":{"name":"windows-instances","namespace":"openshift-windows-machine-config-operator"},"namespace":"openshift-windows-machine-config-operator","name":"windows-instances","reconcileID":"8e6cc51d-9fd4-4e44-b39e-6b6d678c6422","error":"error configuring host with address 10.0.128.7: error deconfiguring instance: unable to remove created directories: unable to remove directory C:\\var\\log, out: Remove-Item : Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : WriteError: (containerd.log:FileInfo) [Remove-Item], IOException\r\n    + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\nRemove-Item : Cannot remove item C:\\var\\log\\containerd: The directory is not empty.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : WriteError: (containerd:DirectoryInfo) [Remove-Item], IOException\r\n    + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\nRemove-Item : Cannot remove item C:\\var\\log: The directory is not empty.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : WriteError: (C:\\var\\log:DirectoryInfo) [Remove-Item], IOException\r\n    + FullyQualifiedErrorId : 
RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\n, err: error running powershell.exe -NonInteractive -ExecutionPolicy Bypass \"if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\": Process exited with status 1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
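For triage, the paths that could not be removed can be pulled out of a structured log entry like the one above. The following is a minimal sketch: the sample is an abbreviated reconstruction of that entry (only the `level`, `msg`, and `error` fields, with the PowerShell position and category lines omitted), and the function name `blocked_paths` is hypothetical.

```python
import json
import re
from typing import List

# Abbreviated reconstruction of the "error" field from the entry above.
SAMPLE_ERROR = (
    "error configuring host with address 10.0.128.7: error deconfiguring instance: "
    "unable to remove created directories: unable to remove directory C:\\var\\log, out: "
    "Remove-Item : Cannot remove item C:\\var\\log\\containerd\\containerd.log: "
    "The process cannot access the file \r\n'containerd.log' because it is being used "
    "by another process.\r\n"
    "Remove-Item : Cannot remove item C:\\var\\log\\containerd: The directory is not empty.\r\n"
    "Remove-Item : Cannot remove item C:\\var\\log: The directory is not empty.\r\n"
)
SAMPLE_LINE = json.dumps(
    {"level": "error", "msg": "Reconciler error", "error": SAMPLE_ERROR}
)

# A drive letter, a colon, a backslash, then everything up to the next colon.
CANNOT_REMOVE = re.compile(r"Cannot remove item ([A-Za-z]:\\[^:\r\n]+):")


def blocked_paths(log_line: str) -> List[str]:
    """Return every path that Remove-Item reported it could not remove."""
    entry = json.loads(log_line)
    return CANNOT_REMOVE.findall(entry.get("error", ""))


if __name__ == "__main__":
    for path in blocked_paths(SAMPLE_LINE):
        print(path)
```

The first path listed is the file that is actually locked; the parent directories fail only because they are not empty.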

      Version-Release number of selected component (if applicable):

      Upgrading from:
      windows-services-7.0.1-bc9473b         2      6h13m
      To:
      windows-services-8.0.1-01a3618         2      53m

      How reproducible:

      Most likely reproducible; the same upgrade passed on AWS.

      Steps to Reproduce:

      1. Install a Windows Server 2022 BYOH node on GCP (not via a MachineSet)
      2. Perform an upgrade from 4.12 (7.0.1-bc9473b) to 4.13 (8.0.1-01a3618)
      3. Wait until the upgrade completes
      

      Actual results:

      While MachineSet nodes are upgraded, the BYOH node is stuck in NotReady,SchedulingDisabled.

      Expected results:

      After the upgrade, nodes should be Ready and report the correct kubelet version.

      Additional info:

       oc get nodes -owide
      NAME                                                        STATUS                        ROLES                  AGE     VERSION                       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
      mgcp-byoh-0.c.openshift-qe.internal                         NotReady,SchedulingDisabled   worker                 4h25m   v1.25.0-2653+a34b9e9499e6c3   10.0.128.7    <none>        Windows Server 2022 Datacenter                                 10.0.20348.1726                containerd://1.19
      rrasouli-397-x7hdb-master-0.c.openshift-qe.internal         Ready                         control-plane,master   6h55m   v1.26.5+7a891f0               10.0.0.3      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-master-1.c.openshift-qe.internal         Ready                         control-plane,master   6h56m   v1.26.5+7a891f0               10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-master-2.c.openshift-qe.internal         Ready                         control-plane,master   6h54m   v1.26.5+7a891f0               10.0.0.4      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-worker-a-5872s.c.openshift-qe.internal   Ready                         worker                 6h44m   v1.26.5+7a891f0               10.0.128.3    <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-worker-b-fsc8d.c.openshift-qe.internal   Ready                         worker                 6h44m   v1.26.5+7a891f0               10.0.128.2    <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
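A quick check of output like the listing above can flag nodes that are not Ready or whose kubelet lags behind the rest of the cluster. This is a rough sketch, assuming the first five columns (NAME, STATUS, ROLES, AGE, VERSION) never contain spaces; the sample is trimmed to a few rows and columns of the listing, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Trimmed sample: a few rows and the leading columns of the listing above.
SAMPLE = """\
NAME                                                  STATUS                        ROLES                  AGE     VERSION                       INTERNAL-IP
mgcp-byoh-0.c.openshift-qe.internal                   NotReady,SchedulingDisabled   worker                 4h25m   v1.25.0-2653+a34b9e9499e6c3   10.0.128.7
rrasouli-397-x7hdb-master-0.c.openshift-qe.internal   Ready                         control-plane,master   6h55m   v1.26.5+7a891f0               10.0.0.3
rrasouli-397-x7hdb-worker-a-5872s.c.openshift-qe.internal   Ready                   worker                 6h44m   v1.26.5+7a891f0               10.0.128.3
"""


def parse_nodes(text: str) -> List[Dict[str, str]]:
    """Parse the first five whitespace-separated columns of each row."""
    rows = [line for line in text.splitlines() if line.strip()]
    nodes = []
    for line in rows[1:]:  # skip the header row
        name, status, roles, age, version = line.split()[:5]
        nodes.append({"name": name, "status": status, "roles": roles,
                      "age": age, "version": version})
    return nodes


def minor_version(version: str) -> Tuple[int, int]:
    """Turn 'v1.25.0-2653+abc' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def not_ready(nodes: List[Dict[str, str]]) -> List[str]:
    """Names of nodes whose comma-separated STATUS lacks 'Ready'."""
    return [n["name"] for n in nodes if "Ready" not in n["status"].split(",")]


def lagging(nodes: List[Dict[str, str]]) -> List[str]:
    """Names of nodes whose kubelet minor version is below the newest seen."""
    newest = max(minor_version(n["version"]) for n in nodes)
    return [n["name"] for n in nodes if minor_version(n["version"]) < newest]


if __name__ == "__main__":
    nodes = parse_nodes(SAMPLE)
    print("not ready:", not_ready(nodes))
    print("lagging kubelet:", lagging(nodes))
```

On the sample, both checks point at the BYOH node, which matches the symptom described in this report.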

            rh-ee-ssoto Sebastian Soto
            rrasouli Aharon Rasouli
            Aharon Rasouli Aharon Rasouli