OpenShift Bugs / OCPBUGS-20083

BYOH node failed to upgrade: Cannot remove item C:\var\log\containerd\containerd.log: The process cannot access the file 'containerd.log' because it is being used by another process


    • Critical
    • No
    • 0
    • Rejected
    • False

      This is a clone of issue OCPBUGS-14700. The following is the description of the original issue:

      Description of problem:

      Upgrading a BYOH node fails. After the upgrade, the BYOH node remains in NotReady,SchedulingDisabled. The operator logs the following error:
      
      {"level":"error","ts":"2023-06-07T16:52:46Z","msg":"Reconciler error","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","ConfigMap":{"name":"windows-instances","namespace":"openshift-windows-machine-config-operator"},"namespace":"openshift-windows-machine-config-operator","name":"windows-instances","reconcileID":"8e6cc51d-9fd4-4e44-b39e-6b6d678c6422","error":"error configuring host with address 10.0.128.7: error deconfiguring instance: unable to remove created directories: unable to remove directory C:\\var\\log, out: Remove-Item : Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : WriteError: (containerd.log:FileInfo) [Remove-Item], IOException\r\n    + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\nRemove-Item : Cannot remove item C:\\var\\log\\containerd: The directory is not empty.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : WriteError: (containerd:DirectoryInfo) [Remove-Item], IOException\r\n    + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\nRemove-Item : Cannot remove item C:\\var\\log: The directory is not empty.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : WriteError: (C:\\var\\log:DirectoryInfo) [Remove-Item], IOException\r\n    + FullyQualifiedErrorId : 
RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\n, err: error running powershell.exe -NonInteractive -ExecutionPolicy Bypass \"if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\": Process exited with status 1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
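      For triage, it can help to pull just the blocked paths out of a reconciler log entry like the one above. The following is a minimal sketch, not part of the operator: the function name `blocked_paths` is hypothetical, and it assumes the entry is a single JSON object with an `error` field containing the `Remove-Item` output, as in this report.

```python
import json
import re

# Matches the "Cannot remove item <path>:" messages that Remove-Item emits.
# The path is assumed to be an absolute Windows path such as C:\var\log.
CANNOT_REMOVE = re.compile(r"Cannot remove item (?P<path>[A-Za-z]:\\[^:\r\n]+):")


def blocked_paths(log_line: str) -> list[str]:
    """Return the paths that Remove-Item reported it could not remove.

    log_line is one JSON log entry. strict=False lets json.loads accept
    raw line breaks inside string values, which can appear when a log
    entry is copied out of a terminal or a web page.
    """
    entry = json.loads(log_line, strict=False)
    error_text = entry.get("error", "")
    return [match.group("path") for match in CANNOT_REMOVE.finditer(error_text)]
```

      Applied to the entry in this report, the first path returned would be the containerd.log file that is still held open; the parent directories follow because they cannot be removed while that file exists.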

      Version-Release number of selected component (if applicable):

      Upgrading from:
      windows-services-7.0.1-bc9473b         2      6h13m
      To:
      windows-services-8.0.1-01a3618         2      53m

      How reproducible:

      Most likely reproducible on GCP; the same upgrade passed on AWS.

      Steps to Reproduce:

      1. Install a Windows Server 2022 BYOH node on GCP (not via a MachineSet).
      2. Perform the upgrade from 7.0.1-bc9473b (4.12) to 8.0.1-01a3618 (4.13).
      3. Wait until the upgrade completes.
      

      Actual results:

      The MachineSet node gets upgraded, but the BYOH node is stuck in NotReady,SchedulingDisabled.

      Expected results:

      All nodes should be Ready after the upgrade and report the correct kubelet version.

      Additional info:

       oc get nodes -owide
      NAME                                                        STATUS                        ROLES                  AGE     VERSION                       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
      mgcp-byoh-0.c.openshift-qe.internal                         NotReady,SchedulingDisabled   worker                 4h25m   v1.25.0-2653+a34b9e9499e6c3   10.0.128.7    <none>        Windows Server 2022 Datacenter                                 10.0.20348.1726                containerd://1.19
      rrasouli-397-x7hdb-master-0.c.openshift-qe.internal         Ready                         control-plane,master   6h55m   v1.26.5+7a891f0               10.0.0.3      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-master-1.c.openshift-qe.internal         Ready                         control-plane,master   6h56m   v1.26.5+7a891f0               10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-master-2.c.openshift-qe.internal         Ready                         control-plane,master   6h54m   v1.26.5+7a891f0               10.0.0.4      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-worker-a-5872s.c.openshift-qe.internal   Ready                         worker                 6h44m   v1.26.5+7a891f0               10.0.128.3    <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-worker-b-fsc8d.c.openshift-qe.internal   Ready                         worker                 6h44m   v1.26.5+7a891f0               10.0.128.2    <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
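      To spot stuck nodes in output like the listing above without reading the whole table, a small filter can be used. This is a rough sketch under stated assumptions: the helper name `not_ready_nodes` is hypothetical, and the input is assumed to be the text printed by `oc get nodes`, with a header row and with NAME and STATUS as the first two whitespace-separated columns.

```python
def not_ready_nodes(table: str) -> list[tuple[str, str]]:
    """Return (name, status) for nodes that are not plainly Ready.

    A node is flagged when its STATUS column lacks the "Ready" condition
    or includes "SchedulingDisabled".
    """
    rows = [line.split() for line in table.strip().splitlines()]
    flagged = []
    for fields in rows[1:]:  # skip the header row
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        name, status = fields[0], fields[1]
        conditions = status.split(",")
        if "Ready" not in conditions or "SchedulingDisabled" in conditions:
            flagged.append((name, status))
    return flagged
```

      For the listing in this report, only the BYOH node mgcp-byoh-0.c.openshift-qe.internal would be flagged.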

              rh-ee-mankulka Mansi Kulkarni
              openshift-crt-jira-prow OpenShift Prow Bot
              Aharon Rasouli Aharon Rasouli